Robots Exclusion Standard

From Just Solve the File Format Problem
Revision as of 12:56, 5 December 2012 by Dan Tobias (Talk | contribs)

File Format
Name: Robots Exclusion Standard
Extension(s): .txt

The Robots Exclusion Standard is a method by which webmasters can specify which parts of their site they don't want robots to scan, index, or retrieve. This is done with a file named robots.txt in the root directory of the site. Well-behaved robots check this file before doing anything else with a site, which is why web access logs show requests for this filename even when no such file exists. Less-well-behaved robots, such as spambots and malware, ignore the file. It is a voluntary standard with no means of enforcement, so it is useful only for giving instructions to reasonable robots such as Googlebot.
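
As a rough illustration of the "root directory" rule, the following Python sketch works out where a robot would look for robots.txt given any page URL on a site. The function name robots_url and the example.com addresses are made up for this example; only the standard library's urllib.parse module is used.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL for the site that serves page_url.

    robots_url is a hypothetical helper name, not part of any standard.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace the path
    # with /robots.txt and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/articles/page.html?id=7#top"))
# prints https://example.com/robots.txt
```

Note that each host (and port) has its own robots.txt, so the file is always looked up relative to the site root rather than the directory of the page being fetched.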

To keep robots out of your cgi-bin directory you can use:

User-agent: *
Disallow: /cgi-bin/

The asterisk means the rule applies to all user agents. It's also possible to address specific robots by their user-agent strings and give them their own rules, excluding them from parts of the site without affecting other robots.
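
A robot written in Python could check rules like these with the standard library's urllib.robotparser module. The sketch below is only an illustration: "BadBot" is a made-up robot name, the example.com URLs are placeholders, load_rules is a hypothetical helper, and the rules are parsed from a string rather than fetched from a live site.

```python
from urllib.robotparser import RobotFileParser

# Sample rules: a made-up robot "BadBot" is shut out of the whole site,
# while every other robot is only kept out of /cgi-bin/.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

def load_rules(text):
    """Build a parser from robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = load_rules(RULES)

# The robot-specific record applies to BadBot everywhere on the site.
print(rules.can_fetch("BadBot", "http://example.com/index.html"))        # False
# Other robots fall back to the "*" record.
print(rules.can_fetch("Googlebot", "http://example.com/index.html"))     # True
print(rules.can_fetch("Googlebot", "http://example.com/cgi-bin/search")) # False
```

A real robot would more likely point the parser at the live file with set_url() and read(), then call can_fetch() before each request.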

Related effects can be achieved within individual HTML pages by using the robots meta tag, whose content values include "noindex" and "nofollow".
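
For illustration, a robot might read these per-page instructions roughly as follows. This is a minimal sketch that assumes the common form of the tag, a meta element with name="robots" and a comma-separated content attribute; the class and function names are invented for the example.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # Assumed format: directives separated by commas, case-insensitive.
        content = attr.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())

def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(page)))
# prints ['nofollow', 'noindex']
```

Unlike robots.txt, a robot only sees these values after it has already retrieved the page, so they cannot stop the retrieval itself.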


