What is a robots.txt file?
robots.txt (always lowercase) is a plain text file stored in the root directory of a website. It acts as a channel of communication between the site owner and web crawlers (also known as spiders or bots): the owner uses it to state which parts of the site crawlers may or may not request.
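Technical Requirement: Because URL paths are case-sensitive on many systems, the filename must be exactly robots.txt, and the file must sit at the root of the host (for example, https://example.com/robots.txt). A copy placed in a subdirectory is not consulted by crawlers, and each subdomain needs its own file.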
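The location rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of any standard tooling: the helper name robots_txt_url and the example.com addresses are assumptions made up for the example. It shows that, whatever page a crawler starts from, the file it looks for is always /robots.txt on that same scheme and host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url.

    Hypothetical helper for illustration: keeps the scheme and host
    (including any port) and replaces the path with /robots.txt,
    dropping the query string and fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # A deep page and a subdomain each map to their own host-level file.
    print(robots_txt_url("https://example.com/blog/post?id=1#top"))
    print(robots_txt_url("http://shop.example.com:8080/cart"))
```

Note that the subdomain in the second call resolves to a different robots.txt than the main site, which is why each host needs its own file.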
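The protocol is based on the "Robots Exclusion Standard" (also known as the Robots Exclusion Protocol and published as RFC 9309), an ethical framework established by the global internet community. Compliance is voluntary: well-behaved crawlers honor the file, but it does not technically block access. It is built on two core principles: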
- Search technology should serve humanity while respecting the provider's intent and privacy.
- Websites have an obligation to protect user information from unauthorized access.
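To make the first principle concrete, here is a rough sketch of how a cooperative crawler might respect the site owner's intent before requesting a page. It uses Python's standard urllib.robotparser module; the sample rules, the bot name ExampleBot, the example.com URLs, and the helper is_allowed are all assumptions invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to stay
# out of /private/ and is free to fetch everything else.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_text: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/index.html",
        "https://example.com/private/report.html",
    ):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url) else "disallowed"
        print(url, "->", verdict)
```

The check runs entirely on the crawler's side, which illustrates why the file is a statement of intent rather than a security control: a crawler that skips this step is not stopped by robots.txt itself.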