What is a robots.txt file?
A robots.txt file is a text document located in a website's root directory that tells search engine crawlers which pages or sections they should or should not crawl. It is a vital tool for managing crawl budget and preventing the indexing of sensitive or redundant parts of a site.
The robots.txt file is essentially a set of instructions for web robots (crawlers). When a search engine like Google visits a site, the first thing it looks for is this file. It follows the Robots Exclusion Protocol, using directives such as 'User-agent' (who the rule applies to) and 'Disallow' (which paths should be ignored). It is excellent for preventing crawlers from wasting time on low-value pages, like login screens, internal search results, or admin folders.

However, robots.txt is not a guaranteed way to keep a page out of Google's index. If a page is blocked in robots.txt but has external links pointing to it, Google may still index the URL. To truly prevent a page from appearing in search results, a 'noindex' tag is required. Misconfiguring your robots.txt file is a common technical SEO mistake; accidentally disallowing your entire site can lead to a total loss of search visibility, so handle it with care.
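For illustration, a minimal robots.txt using these directives might look like this (the paths are placeholders, not recommendations for any particular site):

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /search/

# Additional rules for Google's main crawler only
User-agent: Googlebot
Disallow: /internal-reports/
```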
Step-by-step guide
Locate or Create
Ensure a file named robots.txt exists in your site's root directory (e.g., example.com/robots.txt).
Define User-Agents
Specify which bots the rules apply to, using an asterisk (*) for all bots or a specific crawler name such as 'Googlebot'.
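For example, a rule group can target every crawler or a single named one (the paths here are hypothetical):

```
# Applies to all crawlers
User-agent: *
Disallow: /tmp/

# Applies only to Googlebot
User-agent: Googlebot
Disallow: /experiments/
```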
Set Disallow Rules
List the directories or specific file paths you want to keep private from search engine crawlers.
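A sketch of common Disallow patterns (example paths; note that major crawlers such as Google and Bing also honor an 'Allow' directive for carving out exceptions):

```
User-agent: *
Disallow: /wp-admin/             # an entire directory
Disallow: /search/               # internal search results
Disallow: /drafts/preview.html   # a single file
Allow: /wp-admin/admin-ajax.php  # exception inside a blocked directory
```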
Add Sitemap Link
Include a direct link to your XML sitemap at the bottom of the file to help bots find your content.
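The Sitemap directive takes an absolute URL and sits outside any User-agent group, so it is conventionally placed at the end of the file (the URL below is a placeholder):

```
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```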
Test for Errors
Use the robots.txt report in Google Search Console (which replaced the older robots.txt Tester) to verify that you aren't blocking important pages.
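As a quick local sanity check alongside Search Console, Python's standard-library urllib.robotparser can evaluate a rule set against sample URLs. This is a minimal sketch; the rules and URLs are placeholders:

```python
from urllib import robotparser

# Example rules; replace with the contents of your own robots.txt.
rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /search/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# An ordinary page should be crawlable; a disallowed path should not be.
print(rp.can_fetch("*", "https://example.com/blog/post"))  # True
print(rp.can_fetch("*", "https://example.com/wp-admin/"))  # False
```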
Pro tips
- Use 'Disallow' for private or repetitive pages like /wp-admin/ or /search/.
- Never use robots.txt to hide sensitive user data; use password protection instead.
- Keep the syntax simple; complex rules can lead to crawling errors.
How pSeoMatic helps
pSeoMatic automatically monitors your robots.txt file for unexpected changes. If a developer accidentally blocks a high-traffic section of your site, our system sends an immediate alert, preventing catastrophic drops in organic visibility before they impact your bottom line.
Related questions
Can robots.txt stop a page from being indexed?
It stops crawling, but indexing can still occur if other sites link to that page. Use a noindex tag for full removal.
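For reference, a noindex directive is set in the page itself rather than in robots.txt; one caveat is that the page must remain crawlable, since a crawler blocked by robots.txt never sees the tag:

```
<!-- In the <head> of the page to keep out of search results -->
<meta name="robots" content="noindex">
```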
Where do I put the robots.txt file?
It must be placed in the root directory of your host (e.g., https://example.com/robots.txt); crawlers do not look for it in subdirectories.
Is robots.txt case sensitive?
Yes, both the filename and the directory paths listed within it are case sensitive.
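For example, under case-sensitive matching these two rules are not interchangeable (example paths):

```
Disallow: /Private/   # blocks /Private/ but not /private/
Disallow: /private/   # blocks /private/ but not /Private/
```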
Related guides
Ready to take action?
pSeoMatic generates thousands of SEO-optimized pages from your data.