Major search engines support robots.txt standard

Google, Yahoo, and Microsoft’s Live Search recently announced standard support for the major robots.txt directives. This means that you can use the same syntax in robots.txt to control the activities of those three major search engine crawlers. The common directives are: Disallow, Allow, and Sitemap. In addition, all three support the use of wildcards (* and $) in specifying paths for Allow and Disallow. It’s interesting to note that Yahoo says they support “$ Wildcards,” whereas Google and Microsoft say that they support “* Wildcards” as well as “$ Wildcards.” From reading Yahoo’s documentation, though, I’d say that they support “* Wildcards” too.
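
As a rough illustration, here is a minimal robots.txt sketch that exercises all of the common directives, including both wildcard forms (the paths and sitemap URL are just placeholders):

```
# Applies to all crawlers (Googlebot, Slurp, msnbot, etc.)
User-agent: *

# Block everything under /private/
Disallow: /private/

# ...but allow one file inside it (Allow overrides the broader Disallow)
Allow: /private/public-report.html

# * matches any sequence of characters: block any URL containing "sessionid"
Disallow: /*sessionid

# $ anchors the match to the end of the URL: block all .pdf files
Disallow: /*.pdf$

# Sitemap location (absolute URL)
Sitemap: http://www.example.com/sitemap.xml
```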

All three also support several HTML META tags, such as NOINDEX and NOFOLLOW, that give content authors much tighter, page-by-page control over crawlers than can be accomplished with robots.txt.
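
For example, a page that should not be indexed, and whose links should not be followed, might include something like this in its head (NOINDEX and NOFOLLOW can also be used separately, or aimed at a single crawler by name):

```
<html>
  <head>
    <title>Example page</title>
    <!-- Tell all crawlers: don't index this page, don't follow its links -->
    <meta name="robots" content="noindex, nofollow">
    <!-- Or target one crawler specifically, e.g. Googlebot -->
    <meta name="googlebot" content="noindex">
  </head>
  <body>...</body>
</html>
```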

This isn’t exactly a new step. The three major search engines have been collaborating for the last few years to make Webmasters’ jobs easier. For example, back in February they announced common support for cross-submission of Sitemaps.

Unfortunately, all three also support individual extensions to the Robots Exclusion Protocol. For example, Yahoo and Microsoft support the Crawl-Delay directive, which Google does not. Google and Yahoo each also support some META tags that the others don’t.
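
As a sketch, a site that wanted to slow down Yahoo’s crawler (Slurp) and Microsoft’s crawler (msnbot), but not Google’s, which ignores the directive, could add sections like these; the delay values here are arbitrary:

```
# Yahoo's crawler: wait 5 seconds between requests
User-agent: Slurp
Crawl-delay: 5

# Microsoft Live Search's crawler: wait 10 seconds between requests
User-agent: msnbot
Crawl-delay: 10
```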

Even with the incompatibilities, this is a big step in the right direction. With unified support of the major robots.txt directives among the three major search engine crawlers, we can expect to see more support from smaller crawlers. I know that many authors of smaller-scale crawlers look to the majors to see what they should support. Having all three support the same directives in the same way makes other developers’ jobs (including mine!) easier.

But ultimately it’s Webmasters who benefit the most: they now have a standard way to control crawlers’ access to their sites.