What is robots.txt?

Technically, robots.txt is a plain text file placed in the root directory of a website (e.g., www.example.com/robots.txt). It follows the Robots Exclusion Standard (also known as the Robots Exclusion Protocol, formalized in RFC 9309), a set of guidelines for how web crawlers should behave when visiting a website. The file contains instructions called "directives" that tell bots which parts of the website they may and may not crawl.
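Because the file always sits at the root of the host, its location can be derived from any page URL on the same site. The snippet below is a minimal sketch of that idea in Python; the function name `robots_txt_url` and the sample page URL are illustrative assumptions, not something defined by the standard.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url.

    Hypothetical helper: keeps the scheme and host, replaces the path
    with /robots.txt, and drops any query string or fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/post?id=1"))
```

Note that each host has its own file: a subdomain such as blog.example.com would be governed by blog.example.com/robots.txt, not by the one on www.example.com.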

| Directive | Description | Example |
| --- | --- | --- |
| Disallow | Specifies paths or patterns that the bot should not crawl. | `Disallow: /admin/` (disallow access to the admin directory) |
| Allow | Explicitly permits the bot to crawl specific paths or patterns, even if they fall under a broader Disallow rule. | `Allow: /public/` (allow access to the public directory) |
| Crawl-delay | Sets a delay (in seconds) between successive requests from the bot to avoid overloading the server. | `Crawl-delay: 10` (10-second delay between requests) |
| Sitemap | Provides the URL to an XML sitemap for more efficient crawling. | `Sitemap: https://www.example.com/sitemap.xml` |
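To see how these directives behave together, the sketch below feeds a small sample file to Python's standard-library `urllib.robotparser` and queries it. The file contents, the bot name `ExampleBot`, and the URLs are made up for illustration. The `User-agent: *` line is not listed in the table above; it is added here as an assumption because rules only take effect inside a user-agent group, and `*` addresses all bots.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining the four directives from the table.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /public/
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is needed here.
parser.parse(ROBOTS_TXT.splitlines())

# Paths under /admin/ are blocked; everything else is crawlable.
print(parser.can_fetch("ExampleBot", "https://www.example.com/admin/settings"))
print(parser.can_fetch("ExampleBot", "https://www.example.com/public/page.html"))

# Delay in seconds that applies to this bot, if one is declared.
print(parser.crawl_delay("ExampleBot"))

# Sitemap URLs listed in the file (site_maps() requires Python 3.8+).
print(parser.site_maps())
```

Keep in mind that crawlers differ in how they resolve overlapping Allow and Disallow rules. Python's parser applies the first matching rule in file order, whereas RFC 9309 specifies that the most specific (longest) matching path wins, so results can differ when rules overlap.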