<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Julian Rex &#187; google</title>
	<atom:link href="http://rexy.co.uk/tag/google/feed/" rel="self" type="application/rss+xml" />
	<link>http://rexy.co.uk</link>
	<description>iPhone Game Developer</description>
	<lastBuildDate>Thu, 10 Dec 2009 04:31:41 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Google and sitemap.xml</title>
		<link>http://rexy.co.uk/2006/05/google-and-sitemapxml/</link>
		<comments>http://rexy.co.uk/2006/05/google-and-sitemapxml/#comments</comments>
		<pubDate>Sat, 20 May 2006 15:36:36 +0000</pubDate>
		<dc:creator>Julian</dc:creator>
				<category><![CDATA[web]]></category>
		<category><![CDATA[34sp]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[sitemap]]></category>

		<guid isPermaLink="false">http://rexy.co.uk/wordpress/?p=12</guid>
		<description><![CDATA[I finally got around to adding a robots.txt and sitemap.xml to my site. But I had some issues along the way&#8230;
First, I wanted to use Google&#8217;s sitemap.xml generator to generate the XML file. Here are the steps I took to get this working with my host, 34sp. The instructions on Google&#8217;s site are very good, though there [...]]]></description>
			<content:encoded><![CDATA[<p>I finally got around to adding a robots.txt and sitemap.xml to my site. But I had some issues along the way&#8230;</p>
<p>First, I wanted to use <a href="http://sourceforge.net/projects/goog-sitemapgen">Google&#8217;s sitemap.xml generator</a> to generate the XML file. Here are the steps I took to get this working with my host, 34sp. The instructions on Google&#8217;s site are very good, though there was one minor point of confusion.</p>
<p>I have a mirrored setup that lets me test changes to the site on my iBook before uploading them to the website. These are the 3 files I used:</p>
<ul>
<li>sitemap_gen.py &#8211; the script; you shouldn&#8217;t need to modify this</li>
<li>config.xml &#8211; could be called anything, but I stuck with the default</li>
<li>urllist.txt &#8211; a list of the URLs I want the script to use</li>
</ul>
<p>The last two are the ones you need to change. The docs say to delete the sections (in config.xml) that you don&#8217;t need. However, I&#8217;ve been commenting them out instead as, no doubt, I&#8217;ll start using the other sections later. For the moment I&#8217;m only using urllist.txt, as this seemed the easiest approach.</p>
<p>Here are my sections (with the comments and paths removed):</p>
<div class="codey">
<code>&lt;site<br />
  base_url="http://rexy.co.uk"<br />
  store_into="/absolute/path/to/httpdocs/sitemap.xml.gz"<br />
  verbose="1"<br />
  &gt;<br />
  &lt;urllist  path="/absolute/path/to/private/sitemap/urllist.txt"  encoding="UTF-8"  /&gt;<br />
&lt;/site&gt;</code>
</div>
<p>Notice that I&#8217;ve put all 3 files in a location that is <strong>not</strong> web-accessible. It&#8217;s probably best not to put them in your document root. (On 34sp, the private directory should already be set up for you.)</p>
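<p>Since both paths in the config need to be absolute, a quick check before uploading can save a failed run. This is a minimal sketch, assuming the config file is well-formed XML with a single site element at its root; the function name relative_paths_in_config is made up for illustration and is not part of sitemap_gen.py.</p>

```python
import os
import xml.etree.ElementTree as ET


def relative_paths_in_config(config_path):
    """Return (element, attribute, value) for each path that is not absolute."""
    # Assumption: the root element of the file is the site element.
    site = ET.parse(config_path).getroot()
    problems = []
    # store_into lives on the site element itself.
    store_into = site.get("store_into", "")
    if not os.path.isabs(store_into):
        problems.append(("site", "store_into", store_into))
    # Each urllist element carries its own path attribute.
    for urllist in site.iter("urllist"):
        path = urllist.get("path", "")
        if not os.path.isabs(path):
            problems.append(("urllist", "path", path))
    return problems
```

<p>An empty list means every path it looked at is absolute. A value such as ~/private/sitemap/urllist.txt would be reported, because a path starting with a tilde is not absolute until something expands it.</p>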
<p>Now, I don&#8217;t have SSH access, but running the script</p>
<div class="codey">
<code>python sitemap_gen.py --config=config.xml --testing</code>
</div>
<p>locally worked fine (remember to remove the --testing switch when it&#8217;s all working). To get this working on my site, I set up a cron job, and here&#8217;s where things went odd. Initially I had it looking like:</p>
<div class="codey">
<code>0 0 * * * /usr/local/bin/python ~/private/sitemap/sitemap_gen.py --config=config.xml</code>
</div>
<p>This is how some docs list it. It failed with the Python error &#8216;ValueError: unknown url type: ~/private/sitemap/config.xml&#8217;. So I tried setting the path:</p>
<div class="codey"><code>0 0 * * * /usr/local/bin/python ~/private/sitemap/sitemap_gen.py --config=~/private/sitemap/config.xml</code></div>
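<p>Again, this failed. I then realised that the cron jobs were probably being run as a different user, so using &#8216;~&#8217; in the path was not going to work. (Another likely factor: the shell only expands &#8216;~&#8217; at the start of a word, so after --config= it is passed through to the script literally.) Switching to the absolute path worked a treat:</p>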
<div class="codey"><code>0 0 * * * /usr/local/bin/python ~/private/sitemap/sitemap_gen.py --config=/absolute/path/to/private/sitemap/config.xml</code></div>
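<p>The error messages above show the script receiving the tilde literally, so nothing had expanded it by the time Python saw the argument. As a rough illustration, here is how a small wrapper could resolve such an argument itself. The helper resolve_config_arg is hypothetical and not part of sitemap_gen.py, and it assumes the HOME environment variable is set for the cron job.</p>

```python
import os


def resolve_config_arg(arg):
    """Rewrite '--config=VALUE' so that VALUE is an absolute path."""
    name, sep, value = arg.partition("=")
    if not sep:
        return arg  # no '=VALUE' part, nothing to resolve
    # expanduser replaces a leading '~' with the home directory;
    # abspath anchors any remaining relative name to the working directory.
    return name + sep + os.path.abspath(os.path.expanduser(value))
```

<p>With HOME set to /home/me, the argument --config=~/private/sitemap/config.xml becomes --config=/home/me/private/sitemap/config.xml. Writing the absolute path directly in the crontab, as above, is still the simpler fix.</p>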
<p>This generates a sitemap.xml.gz in the document root, so it is available at <a href="http://rexy.co.uk/sitemap.xml.gz">http://rexy.co.uk/sitemap.xml.gz</a>. Now Google needs to be told about the sitemap. You <strong>can</strong> &#8216;upload&#8217; the gz file as it is; it&#8217;s fine.</p>
<p>However, in my case, even having done all this, it still wasn&#8217;t working; I was getting an &#8216;unsupported format&#8217; error.</p>
<p>In the end I traced this to the rewrite rules in my root .htaccess file not behaving. What I thought should have been the sitemap file was actually being served as a blosxom post.</p>
<p>As it turned out, this was also affecting my robots.txt, but I hadn&#8217;t realised it.</p>
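<p>A problem like this is easy to miss, because the URL still returns something. A quick way to catch it is to fetch the URL and check that the bytes that come back really are a gzipped sitemap. The following is only a sketch: looks_like_sitemap is a made-up helper, and it assumes a plain sitemap whose root element is urlset.</p>

```python
import gzip
import zlib
import xml.etree.ElementTree as ET


def looks_like_sitemap(data):
    """Return True if data is gzip-compressed XML whose root element is urlset."""
    if data[:2] != b"\x1f\x8b":  # gzip magic number
        return False
    try:
        root = ET.fromstring(gzip.decompress(data))
    except (OSError, EOFError, zlib.error, ET.ParseError):
        return False
    # Strip any '{namespace}' prefix before comparing the tag name.
    return root.tag.rsplit("}", 1)[-1] == "urlset"
```

<p>Passing it the bytes from urllib.request.urlopen(url).read() would return False if the server was handing back an HTML page in place of the sitemap.</p>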
]]></content:encoded>
			<wfw:commentRss>http://rexy.co.uk/2006/05/google-and-sitemapxml/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
