Robots Exclusion Standard
The Robots Exclusion Standard is a convention by which webmasters specify which parts of a site they do not want robots to crawl, index, or retrieve. This is done with a plain-text file named robots.txt placed in the root directory of the site. Well-behaved robots fetch and obey this file before taking any other action on a site, which is why web access logs show requests for this filename even when no such file exists. Less-well-behaved robots such as spambots and malware harvesters simply ignore it; since the standard is voluntary and has no enforcement mechanism, its use is limited to giving instructions to cooperative robots such as Googlebot.
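For the well-behaved case, Python's standard library includes urllib.robotparser for exactly this check. A minimal sketch, assuming a hypothetical site and user-agent name:

    from urllib.robotparser import RobotFileParser

    # Fetch and parse the site's robots.txt (example.com is a placeholder).
    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # A well-behaved robot asks before retrieving each URL.
    # "ExampleBot" is a hypothetical user-agent name.
    if rp.can_fetch("ExampleBot", "https://example.com/cgi-bin/search"):
        print("allowed to fetch")
    else:
        print("disallowed by robots.txt")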
To keep robots out of your cgi-bin directory you can use:
    User-agent: *
    Disallow: /cgi-bin/
The asterisk means the rule applies to all user agents. It is also possible to single out specific robots by their user-agent strings and exclude them from particular paths without affecting the others, as in the sketch below.
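For instance, a robots.txt that shuts one particular robot out of the entire site while leaving everything open to everyone else could look like this (BadBot is a hypothetical user-agent name):

    User-agent: BadBot
    Disallow: /

    User-agent: *
    Disallow:

An empty Disallow line means nothing is disallowed, so the catch-all record explicitly permits every other robot to crawl the whole site.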
For page-level control there is also the robots meta tag in HTML, whose values include "noindex" (do not add this page to the search index) and "nofollow" (do not follow the links on this page).
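Such a tag goes in the document's head; for example:

    <meta name="robots" content="noindex, nofollow">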
References
- Robots Exclusion Standard (Wikipedia)
- Standards document (actually a non-binding consensus, not a formal standard)
- Robots.txt generator/tutorial