mjc
05-29-2001, 01:51 AM
Ixl,
Robots
A robots.txt file can exclude directories from being scanned by robots.
The robot will simply look for a "/robots.txt" URL on your site, where a site is defined as an HTTP server running on a particular host and port number. For example: http://w3.org/robots.txt
Also, remember that URLs are case sensitive, and "/robots.txt" must be all lower-case.
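To make the "one file per host and port" point concrete, here is a rough Python sketch of how a robot might work out where to look. The function name robots_url and the example.com address are mine, made up for illustration; they are not from Herb's page.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url.

    Scheme, host and port are kept as-is; the path is replaced with the
    fixed, all-lower-case "/robots.txt" and any query string is dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://w3.org/some/page.html?x=1"))  # http://w3.org/robots.txt
print(robots_url("http://example.com:8080/a/b/"))      # http://example.com:8080/robots.txt
```

A server on another port counts as a separate site, so it needs its own robots.txt.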
The "/robots.txt" file usually contains a record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
In this example, three directories are excluded.
What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve.
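If you want to check what a record like the one above actually blocks, Python's standard urllib.robotparser module can do it without touching the network. This is only a sketch: the robot name "ExampleBot" and the example.com host are placeholders I made up.

```python
from urllib.robotparser import RobotFileParser

# The three-directory record from the post.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse in memory, no download

def may_fetch(path, agent="ExampleBot"):
    """True if the rules let `agent` retrieve `path` on the (made-up) host."""
    return parser.can_fetch(agent, "http://example.com" + path)

print(may_fetch("/tmp/scratch.txt"))  # False, inside an excluded directory
print(may_fetch("/cgi-bin/search"))   # False
print(may_fetch("/index.html"))       # True, not disallowed so fair game
print(may_fetch("/tmpfile.html"))     # True, "/tmp/" is matched as a path prefix
```

Keep in mind this only tells you what a well-behaved robot would do; nothing forces a robot to read or obey the file.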
To exclude all robots from the entire server:
User-agent: *
Disallow: /
To allow all robots complete access:
User-agent: *
Disallow:
To exclude a single robot:
User-agent: BadBot
Disallow: /
To allow a single robot and exclude all others (the two records are separated by a blank line):
User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /
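The four variants above can be tried out the same way. Again just a sketch with Python's urllib.robotparser; the helper name allowed, the robot name "SomeOtherBot" and the example.com host are my own placeholders.

```python
from urllib.robotparser import RobotFileParser

def allowed(rules, agent, path="/index.html"):
    """Parse `rules` in memory and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)

EXCLUDE_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"
EXCLUDE_ONE = "User-agent: BadBot\nDisallow: /\n"
# Two records, separated by a blank line.
ALLOW_ONE = "User-agent: WebCrawler\nDisallow:\n\nUser-agent: *\nDisallow: /\n"

print(allowed(EXCLUDE_ALL, "SomeOtherBot"))  # False
print(allowed(ALLOW_ALL, "SomeOtherBot"))    # True, an empty Disallow blocks nothing
print(allowed(EXCLUDE_ONE, "BadBot"))        # False
print(allowed(EXCLUDE_ONE, "SomeOtherBot"))  # True, no record applies to it
print(allowed(ALLOW_ONE, "WebCrawler"))      # True, its own record wins over "*"
print(allowed(ALLOW_ONE, "SomeOtherBot"))    # False, falls through to "*"
```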
Picked it up here: Tips from Herb (http://webwi.de/data/web.htm)
------------------
mjc
Links list: Computer Links (http://www.fortunecity.com/skyscraper/highrise/11/index.htm)
Celts are the men that heaven made mad, For all their battles are merry and their songs are all sad.