Although it sounds far-fetched, blocking search engine spiders is exactly what a robots.txt file does. Search engines use spiders (also called robots or bots) to crawl and index your website, collecting keywords that determine when your site appears in search results. A robots.txt file is a simple text file you can create to tell a spider not to crawl your site, or certain parts of it.
Open your favorite text editor. It doesn't matter what text editor you use. Notepad works just fine if you're on a PC, and can be found under "Accessories."
Enter two lines: one naming the spider that will crawl your web page, and one giving the directory or file name you want to exclude from the search. This is the syntax:
User-Agent: [Spider or Bot name]
Disallow: [Directory or File Name]
For example:
User-Agent: Googlebot
Disallow: /mywebsite/private.html
where "Googlebot" is the robot sent out by Google, and "private.html" is the file in the directory "mywebsite" that you do not want the robot to index.
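If you have Python handy, you can sanity-check rules like these before publishing them. The standard library's urllib.robotparser module applies the same matching rules spiders follow; the file contents and URLs below are just the illustrative ones from this example:

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, as a string instead of a file.
rules = """\
User-Agent: Googlebot
Disallow: /mywebsite/private.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot is blocked from the private page...
print(parser.can_fetch("Googlebot", "http://www.mywebsite.com/mywebsite/private.html"))  # False
# ...but may still fetch everything else.
print(parser.can_fetch("Googlebot", "http://www.mywebsite.com/index.html"))  # True
# Other bots are unaffected, because the rule names only Googlebot.
print(parser.can_fetch("Bingbot", "http://www.mywebsite.com/mywebsite/private.html"))  # True
```

Note that the rule applies only to the bot it names; any spider not listed is free to crawl the page.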
Exclude a section of your site from all spiders. If you do not want any robots to index a certain section of your site, use the "*" wildcard as the User-Agent value. Your file would look like this:
User-Agent: *
Disallow: /mywebsite/private.html
Exclude your whole site from all robots. If you don't want any of your site to be visible to robots (e.g. if you are still building your website and it is not ready to be viewed by the public), use the "*" character after User-Agent and "/" after Disallow. For example:
User-Agent: *
Disallow: /
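The same optional Python check (again using the standard library's urllib.robotparser) confirms that this rule refuses every bot on every path:

```python
from urllib.robotparser import RobotFileParser

# Block the whole site from all robots.
rules = """\
User-Agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# No bot may fetch any page (the bot names here are just examples).
print(parser.can_fetch("Googlebot", "http://www.mywebsite.com/"))          # False
print(parser.can_fetch("Bingbot", "http://www.mywebsite.com/about.html"))  # False
```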
If you want to allow all robots to access your whole site, simply add the asterisk as before, and leave the Disallow section empty, as follows:
User-Agent: *
Disallow:
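And one last check with urllib.robotparser shows that an empty Disallow value permits everything:

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow means nothing is blocked.
rules = """\
User-Agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Every bot may fetch every page, including the earlier "private" example.
print(parser.can_fetch("Googlebot", "http://www.mywebsite.com/mywebsite/private.html"))  # True
```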
Save the file as robots.txt (not robot.txt), and place it in the root directory of your website, so that it can be found at, for example, http://www.mywebsite.com/robots.txt.