<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Morpheus Media Mlog &#187; googlebot</title>
	<atom:link href="http://www.morpheusmedia.com/mlog/tag/googlebot/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.morpheusmedia.com/mlog</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Thu, 06 Jan 2011 22:20:36 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>SES NYC 2007 &#8211; Robots.txt</title>
		<link>http://www.morpheusmedia.com/mlog/archive/ses-nyc-2007-robotstxt/</link>
		<comments>http://www.morpheusmedia.com/mlog/archive/ses-nyc-2007-robotstxt/#comments</comments>
		<pubDate>Thu, 12 Apr 2007 14:36:43 +0000</pubDate>
		<dc:creator>Toby Evers</dc:creator>
				<category><![CDATA[Archive]]></category>
		<category><![CDATA[crawlers]]></category>
		<category><![CDATA[googlebot]]></category>
		<category><![CDATA[robots.txt]]></category>
		<category><![CDATA[SES NYC]]></category>
		<category><![CDATA[sitemaps]]></category>
		<category><![CDATA[spiders]]></category>
		<category><![CDATA[Web Development]]></category>

		<guid isPermaLink="false">http://www.morpheusmedia.com/mlog/?p=42</guid>
		<description><![CDATA[Here&#8217;s a breakdown of the session
Keith Hogan – Ask.com

Less than 35% of servers have a robots.txt file
Most are copied from one found online
Typically 23 characters (100 is about the max)
Format is not well understood
May change to xml format for better control in the near future
Ask: can find info about a crawler on site in the [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal">Here&#8217;s a breakdown of the session</p>
<p class="MsoNormal"><strong>Keith Hogan</strong> – Ask.com</p>
<ul type="disc" style="margin-top: 0in">
<li class="MsoNormal">Less than 35% of servers have a robots.txt file</li>
<li class="MsoNormal">Most are copied from one found online</li>
<li class="MsoNormal">Typically 23 characters (100 is about the max)</li>
<li class="MsoNormal">Format is not well understood</li>
<li class="MsoNormal">May change to xml format for better control in the near future</li>
<li class="MsoNormal">Ask: info about the crawler can be found on the site&#8217;s about page.</li>
</ul>
<p class="MsoNormal"><strong>Eytan Seidman</strong> – MSN Live Search</p>
<ul type="disc" style="margin-top: 0in">
<li class="MsoNormal">Used example <a href="http://www.hilton.com/robots.txt">www.hilton.com/robots.txt</a> (it tells crawlers not to visit the site during the day… funny)</li>
</ul>
<p class="MsoNormal"><strong>Dan Crow </strong>– Google</p>
<ul type="disc" style="margin-top: 0in">
<li class="MsoNormal">Also need to focus on robots exclusion – robots.txt + robots meta tags</li>
<li class="MsoNormal">Exclusion Protocol: tells search engines what not to index</li>
<li class="MsoNormal">Compared to sitemaps, which tell search engines what to crawl</li>
<li class="MsoNormal">Search engines have a lot of differences between them</li>
<li class="MsoNormal">There is interest in standardizing the protocol across all search engines</li>
</ul>
<p class="MsoNormal"><strong>Sean Suchter </strong>– Yahoo</p>
<ul type="disc" style="margin-top: 0in">
<li class="MsoNormal">Yahoo slurp is the web crawling robot</li>
<li class="MsoNormal">Yahoo! Slurp = user-agent identifier</li>
<li class="MsoNormal">Supports all standard robots.txt commands (robotstxt.org)</li>
<li class="MsoNormal">Custom</li>
<li style="list-style-type: none; list-style-image: none; list-style-position: outside">
<ul type="circle" style="margin-top: 0in">
<li class="MsoNormal">Crawl delay</li>
<li class="MsoNormal">Sitemap – specify location</li>
<li class="MsoNormal">Wildcards – specify patterns of urls to disallow/allow</li>
<li class="MsoNormal">Custom meta extensions – NOODP and NOYDIR – do not use Open Directory Project (NOODP) or Yahoo directory (NOYDIR) titles</li>
</ul>
</li>
<li class="MsoNormal">Please address only the Yahoo robot in the Yahoo-specific section of the robots.txt file</li>
<li class="MsoNormal">Currently supports crawl-delay</li>
<li style="list-style-type: none; list-style-image: none; list-style-position: outside">
<ul type="circle" style="margin-top: 0in">
<li class="MsoNormal">Often misused</li>
</ul>
</li>
<li class="MsoNormal">Microformats.org/wiki/robots-exclusion</li>
<li style="list-style-type: none; list-style-image: none; list-style-position: outside">
<ul type="circle" style="margin-top: 0in">
<li class="MsoNormal">Demark sections of html you don’t want robots to use for matching</li>
<li class="MsoNormal">Used to demark useless template text, ad text, etc. that would otherwise bring irrelevant traffic</li>
</ul>
</li>
<li class="MsoNormal">Historically, robots exclusion was used to identify which pages not to show in search results</li>
<li class="MsoNormal">But… there is more beyond pages (css, images, inline text, iframes). Do we need a mechanism to exclude those in certain respects?</li>
</ul>
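<p class="MsoNormal">To make the directives in these notes concrete, here is a minimal sketch of how a Yahoo-specific section with a crawl delay and a sitemap location might be read, using Python&#8217;s standard <code>urllib.robotparser</code>. The file contents, the paths and the example.com URLs are assumptions made up for illustration, not anything shown in the session. Wildcard patterns are left out because, as far as I know, this parser only matches path prefixes.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; every path and value here is an assumption.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 5
Disallow: /private/

User-agent: *
Disallow: /tmp/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

# The Slurp section applies only to agents whose name contains "slurp".
slurp_private = parser.can_fetch("Slurp", "http://www.example.com/private/page.html")
slurp_public = parser.can_fetch("Slurp", "http://www.example.com/index.html")

# Any other robot falls back to the "*" section.
other_tmp = parser.can_fetch("OtherBot", "http://www.example.com/tmp/cache.html")

# Crawl-delay is read per user agent; Sitemap lines apply to the whole file.
slurp_delay = parser.crawl_delay("Slurp")
sitemaps = parser.site_maps()  # requires Python 3.8 or later

print(slurp_private, slurp_public, other_tmp, slurp_delay, sitemaps)
```

<p class="MsoNormal">Keeping the crawl delay inside the Slurp section, rather than under <code>*</code>, follows the request above to address only the Yahoo robot with Yahoo-specific settings.</p>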
<p class="MsoNormal"><strong>Danny Sullivan</strong> – Host</p>
<ul type="disc" style="margin-top: 0in">
<li class="MsoNormal">Check out WebmasterWorld’s robots.txt file – has lots of notes</li>
<li class="MsoNormal">Wonders whether robots.txt should move to an XML format to make it easier</li>
<li style="list-style-type: none; list-style-image: none; list-style-position: outside">
<ul type="circle" style="margin-top: 0in">
<li class="MsoNormal">Maybe combine the sitemap and robots into one file – check and prevent in one shot</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.morpheusmedia.com/mlog/archive/ses-nyc-2007-robotstxt/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Due to SEO Abuse &#8211; Wikipedia Nofollows links</title>
		<link>http://www.morpheusmedia.com/mlog/archive/due-to-seo-abuse-wikipedia-nofollows-links/</link>
		<comments>http://www.morpheusmedia.com/mlog/archive/due-to-seo-abuse-wikipedia-nofollows-links/#comments</comments>
		<pubDate>Wed, 31 Jan 2007 22:33:25 +0000</pubDate>
		<dc:creator>Toby Evers</dc:creator>
				<category><![CDATA[googlebot]]></category>
		<category><![CDATA[links]]></category>
		<category><![CDATA[nofollow]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.morpheusmedia.com/mlog/?p=10</guid>
		<description><![CDATA[As reported by Search Engine Land here, Wikipedia has put &#8220;no follow&#8221; tags on all its outbound links.  Gray and Black hat SEOs have been using Wiki to create &#8220;hot&#8221; inbound links to their sites that they are trying to promote.  Even with all of the algorithm changes the search engines have made, [...]]]></description>
			<content:encoded><![CDATA[<p>As reported by <a target="_blank" href="http://searchengineland.com">Search Engine Land</a> <a target="_blank" href="http://searchengineland.com/070122-091812.php">here</a>, Wikipedia has put &#8220;no follow&#8221; tags on all its outbound links.  Gray and Black hat SEOs have been using Wiki to create &#8220;hot&#8221; inbound links to their sites that they are trying to promote.  Even with all of the algorithm changes the search engines have made, inbound links from high ranking sites are still holding their stature.  Since Wikipedia is a well traveled site, it just seems natural to promote a site via spamming links all over this &#8220;open source&#8221; site.  So, once again, the site gets spammed and has to ruin it for others by putting a no follow tag on all links.  What this means is that when a crawler makes its way to a Wikipedia article, it will not grab the link and follow it to the site it&#8217;s linking to, rendering this &#8220;gray/black&#8221; hat SEO technique useless.</p>
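<p>As a rough sketch of that crawler behavior, the snippet below filters a list of links so that anything marked nofollow is dropped before crawling. It assumes the links have already been pulled out of the page as (href, rel) pairs. The function name and the sample URLs are hypothetical, and real crawlers may well treat the hint differently.</p>

```python
from typing import Iterable, List, Tuple


def followable_links(links: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the hrefs whose rel value does not include the nofollow token."""
    result = []
    for href, rel in links:
        # rel is a space-separated token list; compare case-insensitively.
        tokens = rel.lower().split()
        if "nofollow" not in tokens:
            result.append(href)
    return result


# Hypothetical links, given as (href, rel) pairs extracted from an article.
extracted = [
    ("http://www.example.com/promoted-site", "nofollow"),
    ("http://en.wikipedia.org/wiki/Web_crawler", ""),
    ("http://www.example.org/reference", "external NoFollow"),
]

to_crawl = followable_links(extracted)
print(to_crawl)
```

<p>Only the link without a nofollow token is left in the crawl list, so the spammed outbound links pass nothing along.</p>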
]]></content:encoded>
			<wfw:commentRss>http://www.morpheusmedia.com/mlog/archive/due-to-seo-abuse-wikipedia-nofollows-links/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

