
robots.txt tutorial and installation on a web server

In this post we will describe the robots.txt file, which is very important for completing Search Engine Optimization (SEO). It tells search engines which web content to crawl: it can restrict search engines from certain content and also grant them access to crawl it. So it is very important. In this post we will learn about robots.txt and install it on a server.

robots.txt file

Robots (also called crawlers or spiders) are programs that automatically crawl the Web and retrieve documents. When Google, Yahoo, Bing, MSN, etc. visit a site, the robots.txt file tells these search engines which of the website's files to crawl or spider and which to skip. When indexing a site, search engines request the robots.txt file first and then visit the other files according to its rules.
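A well-behaved crawler does exactly this. As a minimal sketch, here is how you could perform the same check yourself with Python's standard urllib.robotparser module (the domain below is a placeholder):

from urllib import robotparser

# Point the parser at the site's robots.txt (placeholder domain).
rp = robotparser.RobotFileParser()
rp.set_url("http://www.mydomainname.com/robots.txt")
rp.read()  # download and parse the live robots.txt

# Ask whether a given bot may crawl a given URL before requesting it.
print(rp.can_fetch("Googlebot", "http://www.mydomainname.com/private/page.html"))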

Create a robots.txt file

Open Notepad (or any plain-text editor) and save the file as robots.txt, with the .txt extension. After saving the file, write your rules in it and upload it to your root directory. Your robots.txt file's URL will then be:
http://www.mydomainname.com/robots.txt
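If your host offers FTP access, you can create and upload the file from a script as well. Below is a minimal sketch using Python's standard ftplib module; the host name, username, and password are hypothetical placeholders for your own hosting details.

from ftplib import FTP

# The rules to publish; adjust them to your own site.
rules = "User-agent: *\nDisallow: /private/\n"

with open("robots.txt", "w") as f:
    f.write(rules)

# Hypothetical FTP host and credentials - replace with your own.
with FTP("ftp.mydomainname.com") as ftp:
    ftp.login("username", "password")
    with open("robots.txt", "rb") as fh:
        ftp.storbinary("STOR robots.txt", fh)  # upload into the web root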

Syntax

A robots.txt file includes two commands:

User-agent: { bot name }
Disallow: { file }

There is one more command, originally a Googlebot extension:

Allow:


Here are the names of several bots:

Robot Name - Search Engine
Googlebot - Google
Googlebot-Image - Google Images
Slurp - Inktomi (Yahoo!)
ZyBorg - WiseNut/LookSmart
fast - Fast/AllTheWeb
Openbot - OpenFind
Scooter - AltaVista

For a full list of bot names, consult a robots database such as the one at robotstxt.org.

The basic syntax of a robots.txt file is:
User-agent: googlebot
Disallow: /images/
Disallow: /projects/
Disallow: /contact.html

# This is a comment

User-agent: *
Disallow: /support/
Disallow: /contact.html
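You can sanity-check rules like these without uploading anything, by feeding the lines straight into Python's standard urllib.robotparser:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: googlebot",
    "Disallow: /images/",
    "Disallow: /projects/",
    "Disallow: /contact.html",
    "",
    "User-agent: *",
    "Disallow: /support/",
    "Disallow: /contact.html",
])

print(rp.can_fetch("googlebot", "/images/logo.png"))      # False: blocked for googlebot
print(rp.can_fetch("googlebot", "/support/faq.html"))     # True: the googlebot record does not block it
print(rp.can_fetch("SomeOtherBot", "/support/faq.html"))  # False: falls under User-agent: *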

Allow indexing of everything

The wildcard "*" specifies all robots
User-agent: *
Disallow: 

Disallow indexing of everything

User-agent: *
Disallow: /

Disallow a file or directory

Disallow: /help
# disallows both /help.html and /help/index.html, whereas

Disallow: /help/
# would disallow /help/index.html but allow /help.html
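Since Disallow matches URLs by prefix, you can verify the difference between the two forms with urllib.robotparser (the bot name AnyBot is just an example):

from urllib import robotparser

without_slash = robotparser.RobotFileParser()
without_slash.parse(["User-agent: *", "Disallow: /help"])
print(without_slash.can_fetch("AnyBot", "/help.html"))        # False
print(without_slash.can_fetch("AnyBot", "/help/index.html"))  # False

with_slash = robotparser.RobotFileParser()
with_slash.parse(["User-agent: *", "Disallow: /help/"])
print(with_slash.can_fetch("AnyBot", "/help.html"))        # True: only the folder is blocked
print(with_slash.can_fetch("AnyBot", "/help/index.html"))  # False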

Disallow Googlebot from indexing a folder, while allowing it to index one file in that folder

User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html
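You can test this exception with urllib.robotparser as well. Note that this parser applies the first rule that matches, so when testing, place the Allow line above the Disallow line (the same ordering advice given for msnbot later in this post):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Allow: /folder1/myfile.html",  # the exception, listed first
    "Disallow: /folder1/",
])

print(rp.can_fetch("Googlebot", "/folder1/myfile.html"))  # True: the Allow wins
print(rp.can_fetch("Googlebot", "/folder1/other.html"))   # False: the folder is blocked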

Disallow indexing for everyone, but allow Googlebot to index everything

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

Disallow indexing for everyone, but allow Googlebot and Openbot to index everything

User-agent: Googlebot
User-agent: Openbot
Disallow:

User-agent: *
Disallow: /
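Stacking several User-agent lines over one rule block applies the block to all of the named bots, which you can confirm with urllib.robotparser (SomeOtherBot is just an example name):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "User-agent: Openbot",
    "Disallow:",
    "",
    "User-agent: *",
    "Disallow: /",
])

print(rp.can_fetch("Googlebot", "/page.html"))     # True
print(rp.can_fetch("Openbot", "/page.html"))       # True
print(rp.can_fetch("SomeOtherBot", "/page.html"))  # False: everyone else is blocked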

Allow indexing for Googlebot and Scooter, and disallow indexing for Openbot

User-agent: Googlebot
User-agent: Scooter
Disallow:

User-agent: Openbot
Disallow: /

Allow Scooter and Googlebot to visit all files. All other bots can access all files except those located in /private/.

User-agent: Scooter
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /private/

Allow all except paths beginning with /private

Allow Googlebot and Scooter to visit all files. All other bots can access all files except those beginning with /private: they may not crawl files in a folder called /private/, nor files whose names start with "private", e.g. /privatepics.html.
User-agent: Googlebot
User-agent: Scooter
Disallow:

User-agent: *
Disallow: /private

Disallow a directory for Openbot and fast

Allow Openbot and fast to visit all files except those in /private/ and /personal.html. Deny all other robots access to all files.
User-agent: Openbot
User-agent: fast
Disallow: /private/
Disallow: /personal.html

User-agent: *
Disallow: /

Allow specific bots and disallow a specific file or location

Allow Openbot and fast to visit all files except those in /private/ and files beginning with /personal. Deny all other robots access to all files. Note that Openbot and fast are not allowed to access personal.html either. This example is nearly the same as the one above, but without specifying an exact file.
User-agent: Openbot
User-agent: fast
Disallow: /private/
Disallow: /personal

User-agent: *
Disallow: /

Allow Googlebot

Allow Googlebot to visit all files and deny all other robots access to the web site, in case the only search engine you care about is Google.
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /

Allow and disallow for different bots

Allow Googlebot and msnbot to visit all files except those in /private/, with the exception of /private/file.html. Deny all other robots access to the web site. Use this when the only search engines you care about are Google and Bing and you want to share some private information with them. It is wise to keep the Allow rule(s) above the Disallow rule(s), since the first matching pattern wins (in this case important for Bing, a.k.a. msnbot).
User-agent: Googlebot
User-agent: msnbot
Allow: /private/file.html
Disallow: /private/

User-agent: *
Disallow: /

Matching a character inside URLs

To block access to all URLs that include a question mark (?), you could use the following entry (a sketch of how this wildcard matching works follows the next example):
User-agent: *
Disallow: /*?

$ character - matching the end of the URL

You can use the $ character to match the end of the URL. For instance, to block all URLs that end with .asp, you could use the following entry:
User-agent: Googlebot
Disallow: /*.asp$
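Keep in mind that * and $ are pattern extensions understood by Googlebot and some other major crawlers; many simple parsers (including Python's urllib.robotparser) ignore them. As a rough sketch, not Google's actual implementation, such patterns can be approximated by translating them into regular expressions:

import re

def pattern_to_regex(pattern):
    # "$" at the end anchors the match; "*" matches any run of characters.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = re.escape(body).replace(r"\*", ".*")
    return re.compile("^" + regex + ("$" if anchored else ""))

def is_blocked(disallow_pattern, path):
    return pattern_to_regex(disallow_pattern).match(path) is not None

print(is_blocked("/*?", "/search?q=robots"))  # True: the URL contains "?"
print(is_blocked("/*.asp$", "/page.asp"))     # True: ends with .asp
print(is_blocked("/*.asp$", "/page.aspx"))    # False: "$" anchors the end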
