robots.txt file
Robots (also called crawlers or spiders) are programs that automatically crawl the Web and retrieve documents. When search engines such as Google, Yahoo, Bing, or MSN visit a site, the robots.txt file tells them which of the site's files to crawl or spider and which to leave alone. Search engines request the robots.txt file first when indexing a site, then visit the other files according to its directives.
Create a robots.txt file
Open Notepad (or any plain-text editor) and save the file as robots.txt, keeping the .txt extension. Write your directives in the file, save it, and upload it to your site's root directory.
Your robots.txt file's URL will then be:
http://www.mydomainname.com/robots.txt
Syntax
A robots.txt file is built from two main directives:
User-agent: { bot name }
Disallow: { file }
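For example, a minimal file (the directory name here is just an illustration) that keeps every robot out of a /cgi-bin/ directory looks like this:

# Hypothetical example: block all robots from /cgi-bin/
User-agent: *
Disallow: /cgi-bin/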
There is one additional directive, supported by Googlebot:
Allow:
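As a quick sketch (the paths here are made up for illustration), Allow: can re-open a single file inside an otherwise blocked folder; a fuller Googlebot example appears further below:

# Hypothetical example: block /images/ but allow one file in it
User-agent: Googlebot
Allow: /images/logo.png
Disallow: /images/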
Some common robot names are:
Robot Name - Search Engine
Googlebot - Google
Googlebot-Image - Google Images
Slurp - Inktomi
ZyBorg - WiseNut/LookSmart
fast - Fast/AllTheWeb
Openbot - OpenFind
Scooter - AltaVista
For a complete list of robot names, consult a robots database.
Putting it together, a basic robots.txt file looks like this:
User-agent: Googlebot
Disallow: /images/
Disallow: /projects/
Disallow: /contact.html
# This is a comment

User-agent: *
Disallow: /support/
Disallow: /contact.html
Allow indexing of everything
The wildcard "*" specifies all robots
User-agent: *
Disallow:
Disallow indexing of everything
User-agent: *
Disallow: /
Disallow a file or directory
Disallow: /help    # disallows both /help.html and /help/index.html
Disallow: /help/   # would disallow /help/index.html but allow /help.html
Disallow Googlebot from indexing a folder, while allowing it to index one file in that folder
User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html
Disallow indexing for every robot, but allow Googlebot to index everything
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
Disallow indexing for every robot, but allow Googlebot and Openbot to index everything
User-agent: Googlebot
User-agent: Openbot
Disallow:

User-agent: *
Disallow: /
Allow indexing for Googlebot and Scooter, and disallow indexing for Openbot
User-agent: Googlebot
User-agent: Scooter
Disallow:

User-agent: Openbot
Disallow: /
Allow Scooter and Googlebot to visit all files. All other bots can access all files except those located in /private/
User-agent: Scooter
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /private/
Allow all except paths beginning with /private
Allow Googlebot and Scooter to visit all files. All other bots can access all files except those whose paths begin with /private: they may not crawl anything in a folder called /private/, nor any file whose name starts with "private", e.g. /privatepics.html.
User-agent: Googlebot
User-agent: Scooter
Disallow:

User-agent: *
Disallow: /private
Disallow a directory for Openbot and fast
Allow Openbot and fast to visit all files except those in /private/ and the file /personal.html. Deny all other robots access to all files
User-agent: Openbot
User-agent: fast
Disallow: /private/
Disallow: /personal.html

User-agent: *
Disallow: /
Allow specific bots and disallow a specific file or location
Allow Openbot and fast to visit all files except those in /private/ and files whose paths begin with /personal. Deny all other robots access to all files. Openbot and fast will not be allowed to access /personal.html either. This example is much the same as the one above, but without naming a particular file.
User-agent: Openbot
User-agent: fast
Disallow: /private/
Disallow: /personal

User-agent: *
Disallow: /
Allow Googlebot
Allow Googlebot to visit all files and deny all other robots access to the website, for the case where the only search engine you care about is Google.
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
Allow and disallow for different bots
Allow Googlebot and msnbot to visit all files except those in /private/, with the exception of /private/file.html, which they may visit. Deny all other robots access to the website. Use this when the only search engines you care about are Google and Bing and you want to show them one piece of otherwise private information. It is wise to keep the Allow rule(s) above the Disallow rule(s), since some crawlers apply the first pattern that matches (important here for Bing, a.k.a. msnbot).
User-agent: Googlebot
User-agent: msnbot
Allow: /private/file.html
Disallow: /private/

User-agent: *
Disallow: /
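For contrast, here is a sketch of the ordering pitfall just described, using the same hypothetical paths; a crawler that applies the first pattern that matches would hit the Disallow line first and could skip the exception entirely:

# Risky ordering for a first-match crawler such as msnbot
User-agent: msnbot
Disallow: /private/
Allow: /private/file.html # may never be applied by a first-match crawler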
Match URLs that include a certain character
To block access to all URLs that include a question mark (?), you could use the following entry:
User-agent: *
Disallow: /*?
$ character - matching the end of the URL
You can use the $ character to match the end of the URL. For instance, to block any URLs that end with .asp, you could use the following entry:
User-agent: Googlebot
Disallow: /*.asp$
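The two wildcards can also be combined. Whereas Disallow: /*? blocks every URL that contains a question mark, the following entry (a pattern shown in Google's documentation) blocks only URLs that end with one:

User-agent: Googlebot
Disallow: /*?$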