Although it sounds far-fetched, blocking search engine spiders is exactly what a robots.txt file does. Search engines use spiders (also called robots or bots) to crawl and index your website, gathering the keywords that bring your site up in search results. A robots.txt file is a simple text file you can create to tell a spider not to crawl your site, or certain parts of it.
Open  your favorite text editor. It doesn't matter what text editor you use.  Notepad works just fine if you're on a PC, and can be found under  "Accessories."
Enter two lines: one naming the spider that will be crawling your web page, and one naming the directory or file you want to exclude from indexing. This is the syntax:
User-Agent: [Spider or Bot name]
Disallow: [Directory or File Name]
For example:
User-Agent: Googlebot
Disallow: /mywebsite/private.html
where "Googlebot" is the robot sent out by Google, and "private.html" is the file in the directory "mywebsite" that you do not want the robot to index.
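If you want to check that a rule behaves as intended before uploading it, one way is Python's standard urllib.robotparser module, which can evaluate robots.txt lines directly. A minimal sketch, reusing the example rule above (the domain is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Feed in the two-line rule from the example above as a list of lines.
parser = RobotFileParser()
parser.parse([
    "User-Agent: Googlebot",
    "Disallow: /mywebsite/private.html",
])

# Googlebot is blocked from the private page...
print(parser.can_fetch("Googlebot", "http://www.mywebsite.com/mywebsite/private.html"))
# ...but may still crawl the rest of the site.
print(parser.can_fetch("Googlebot", "http://www.mywebsite.com/index.html"))
```

Note that the rule only names Googlebot, so other robots are unaffected by it.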
Exclude a section of your site from all spiders. If you do not want any robots to index a certain section of your site, use the "*" character after User-Agent. Your file would look like this:
User-Agent: *
Disallow: /mywebsite/private.html
Exclude your whole site from all robots. If you don't want any of your site to be visible to robots (e.g., if you are still building your website and it is not ready to be viewed by the public), insert a "*" character after User-Agent and a "/" after Disallow. For example:
User-Agent: *
Disallow: /
If you want to allow all robots to access your whole site, add the asterisk as before and leave the Disallow line empty, as follows:
User-Agent: *
Disallow:
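These last two variants can be verified the same way with urllib.robotparser; a quick sketch (the URL is just a placeholder used for the check):

```python
from urllib.robotparser import RobotFileParser

url = "http://www.mywebsite.com/index.html"  # placeholder page to test against

# "Disallow: /" under "User-Agent: *" blocks every robot from every page.
block_all = RobotFileParser()
block_all.parse(["User-Agent: *", "Disallow: /"])
print(block_all.can_fetch("Googlebot", url))

# An empty Disallow line allows every robot everywhere.
allow_all = RobotFileParser()
allow_all.parse(["User-Agent: *", "Disallow:"])
print(allow_all.can_fetch("Googlebot", url))
```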
Save the file as robots.txt (not robot.txt), and place it in the root directory of your website so it can be reached at, for example, http://www.mywebsite.com/robots.txt.
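As a final sketch, the "site under construction" file described above could be written out like this (the filename must be exactly robots.txt, all lowercase):

```python
# Write a minimal robots.txt that blocks all robots from the whole site.
content = "User-Agent: *\nDisallow: /\n"
with open("robots.txt", "w") as f:
    f.write(content)

# Read it back to confirm what was saved.
with open("robots.txt") as f:
    print(f.read())
```

Uploading that file to your site's root directory is what actually puts the rules into effect.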
