What is Robots.txt Checker?
A robots.txt checker is a tool that fetches and displays a website's robots.txt file. A robots.txt file is a plain-text file at the root of a website (such as example.com/robots.txt) that tells search engine robots which directories or pages they may or may not crawl.
Our Robots.txt Checker fetches the file directly from the target server and shows you its contents, so you can verify at a glance what crawlers are and aren't allowed to do.
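The fetch step can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the helper names `robots_url` and `fetch_robots_txt`, the HTTPS default for bare domains, and the 10-second timeout are all hypothetical choices for this example.

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(target: str) -> str:
    """Return the robots.txt URL for a bare domain or any URL on the site.

    robots.txt lives at the root of the host, so the path, query, and
    fragment of the input are discarded. HTTPS is assumed when no scheme
    is given (an assumption of this sketch).
    """
    if "://" not in target:
        target = "https://" + target
    parts = urlsplit(target)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots_txt(target: str, timeout: float = 10.0) -> tuple[int, str]:
    """Fetch robots.txt and return (HTTP status, body text).

    HTTP error statuses such as 404 are returned with an empty body;
    network failures (URLError) are not handled here.
    """
    url = robots_url(target)
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except HTTPError as err:
        # e.g. 404: no robots.txt, so crawlers treat the site as unrestricted
        return err.code, ""
```

For example, `fetch_robots_txt("example.com")` requests https://example.com/robots.txt, and `robots_url("https://example.com/blog/post?id=1")` points at the same file, because a site has only one robots.txt per host.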
What is Robots.txt?
Robots.txt is part of the Robots Exclusion Protocol, a standard that websites use to communicate with web crawlers and other bots. It gives search engine spiders instructions about your website through these directives:
- User-agent names the crawler a group of rules applies to (e.g., Googlebot, or * for all crawlers).
- Disallow excludes a specific path or directory from crawling.
- Allow permits access to a specific path inside an otherwise disallowed directory.
- Sitemap points crawlers to the location of your XML sitemap.
- Crawl-delay asks crawlers to wait between requests to reduce server load. Not every crawler honors it; Googlebot, for example, ignores it.
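To see how a crawler reads these directives, Python's standard-library `urllib.robotparser` can parse a small file. This is a sketch: the rules are a made-up sample, and `MyBot` is a hypothetical user agent.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt using the directives described above.
SAMPLE = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# Disallow: /private/ is a prefix rule, so everything under it is blocked.
print(parser.can_fetch("MyBot", "https://example.com/private/report.html"))  # False
print(parser.can_fetch("MyBot", "https://example.com/blog/"))  # True

# Crawl-delay and Sitemap values are exposed separately.
print(parser.crawl_delay("MyBot"))  # 5
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

Because `MyBot` is not named anywhere, the parser falls back to the `User-agent: *` group.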
Use the Robots.txt Checker to check a site's robots.txt for errors:
- Enter a domain name (such as example.com).
- Click Check Robots.txt.
- Review the raw robots.txt file that is returned.
Why Check Robots.txt?
- SEO health: make sure important pages aren't accidentally blocked from search engine crawlers by a robots.txt rule.
- Check permissions: make sure private or sensitive directories (e.g., /admin or /wp-admin) are disallowed for crawlers. This only asks crawlers to stay away; it does not restrict access.
- Check sitemap reference: make sure the Sitemap URL is correct so crawlers can locate your XML sitemap.
- Competitive analysis: look at competitors' robots.txt files to understand their site structure and SEO strategy.
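The SEO health and sitemap checks can be partly automated. The sketch below assumes you already have the robots.txt text and a list of URLs you consider important; `find_blocked` and `has_sitemap` are hypothetical helper names, and `Googlebot` is just an example user agent. `urllib.robotparser` applies Allow/Disallow lines in file order rather than by longest match, so treat its answers as a first pass.

```python
from urllib.robotparser import RobotFileParser


def find_blocked(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given user agent is not allowed to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


def has_sitemap(robots_txt: str) -> bool:
    """True if the file declares at least one Sitemap URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return bool(parser.site_maps())  # site_maps() returns None when there are none


# A made-up file where "Disallow: /products" accidentally blocks product pages.
ROBOTS = """\
User-agent: *
Disallow: /checkout/
Disallow: /products
"""

important = [
    "https://example.com/",
    "https://example.com/products/blue-widget",
    "https://example.com/about",
]
print(find_blocked(ROBOTS, important))  # ['https://example.com/products/blue-widget']
print(has_sitemap(ROBOTS))  # False
```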
When creating a robots.txt file, avoid these common mistakes:
- Blocking the whole site: a Disallow: / rule for all crawlers blocks every page and can cause pages to drop out of search results.
- No Sitemap directive: without it, crawlers have to find your XML sitemap some other way, such as a sitemap submitted through a search engine's webmaster tools.
- Conflicting rules: when an Allow and a Disallow rule both match the same path, crawlers may resolve the conflict differently. Google, for example, follows the most specific (longest) matching rule and, on a tie, the less restrictive one.
- Exposing private paths: robots.txt is publicly readable, so listing a directory in it reveals a structure you may not want to make public.
- Missing wildcard group: if there is no User-agent: * group, crawlers that are not named in any group have no rules to follow and may crawl the entire site.
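Several of these mistakes can be caught with a simple line-based check. This is a heuristic sketch, not a full parser: `lint_robots_txt` is a hypothetical helper that treats consecutive User-agent lines as one group and flags only a whole-site block for all crawlers, a missing Sitemap line, a missing User-agent: * group, and identical Allow/Disallow paths. Whether a listed path exposes something private still needs human judgment.

```python
def lint_robots_txt(text: str) -> list[str]:
    """Return warnings for a few common robots.txt mistakes (heuristic only)."""
    groups: list[tuple[list[str], list[tuple[str, str]]]] = []  # (agents, rules)
    has_sitemap = False
    previous_key = None

    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            # Consecutive User-agent lines share one group of rules.
            if previous_key != "user-agent":
                groups.append(([], []))
            groups[-1][0].append(value.lower())
        elif key in ("allow", "disallow") and groups:
            groups[-1][1].append((key, value))
        elif key == "sitemap":
            has_sitemap = True
        previous_key = key

    warnings = []
    if not any("*" in agents for agents, _ in groups):
        warnings.append("No 'User-agent: *' group: crawlers not named elsewhere have no rules.")
    for agents, rules in groups:
        if "*" in agents and ("disallow", "/") in rules:
            warnings.append("'Disallow: /' under 'User-agent: *' blocks the entire site.")
        allowed = {path for kind, path in rules if kind == "allow"}
        disallowed = {path for kind, path in rules if kind == "disallow"}
        for path in sorted(allowed & disallowed):
            warnings.append(f"Path {path!r} is both allowed and disallowed for {', '.join(agents)}.")
    if not has_sitemap:
        warnings.append("No Sitemap directive.")
    return warnings


# A made-up file that contains three of the mistakes.
SAMPLE = """\
User-agent: Googlebot
Allow: /shop/
Disallow: /shop/

User-agent: *
Disallow: /
"""

for warning in lint_robots_txt(SAMPLE):
    print(warning)
```

Running it on the sample prints the /shop/ conflict for googlebot, the whole-site block, and the missing Sitemap line.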
Example Robots.txt
A basic robots.txt for a WordPress site might look like this:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/
Allow: /wp-admin/admin-ajax.php
Allow: /wp-content/uploads/
Sitemap: https://example.com/sitemap.xml
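The Allow lines in this example only work because of rule precedence. The sketch below applies the precedence that RFC 9309 and Google's documentation describe: the longest matching path wins, and Allow wins a tie. It also handles the * wildcard and a trailing $ anchor. The `is_allowed` helper and the (kind, path) tuple format are assumptions for illustration, and user-agent grouping is left out. Python's standard `urllib.robotparser`, by contrast, stops at the first matching line in file order, so it would report admin-ajax.php as blocked for this file.

```python
import re


def _rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern ('*' wildcard, optional trailing '$')."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Without '$' a rule is a prefix match; with '$' it must match the whole path.
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path) is not None


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence; on a tie, Allow wins."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern or not _rule_matches(pattern, path):
            continue  # an empty Allow/Disallow value matches nothing
        is_allow = kind == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed


# The rules from the WordPress example above.
WORDPRESS_RULES = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-content/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("allow", "/wp-content/uploads/"),
]

for path in [
    "/wp-admin/admin-ajax.php",
    "/wp-admin/options.php",
    "/wp-content/uploads/2024/logo.png",
    "/wp-content/themes/site/style.css",
    "/blog/hello-world/",
]:
    verdict = "allowed" if is_allowed(WORDPRESS_RULES, path) else "blocked"
    print(f"{path}: {verdict}")
```

With these rules, admin-ajax.php and the uploaded image come out allowed, options.php and the theme stylesheet come out blocked, and the blog post is allowed because no rule matches it. The blocked stylesheet is worth a second look: search engines may need theme CSS and JavaScript to render pages properly.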
Privacy
Our tool downloads the robots.txt file from the target server. There is no saving, logging or sharing of domains, results or IP addresses. No accounts, rate limits or captchas.
Frequently Asked Questions
Where is the robots.txt file located?
It always lives at the root of the site, for example https://example.com/robots.txt. If the file is not found, crawlers assume there are no restrictions and may crawl the whole site.
Is the Robots.txt Checker free?
Yes. It's 100% free, with no sign-up, no limits, and no captchas.
Does robots.txt block access to a web page?
No. Robots.txt is not an enforced security mechanism. Well-behaved crawlers (such as Googlebot) follow its directives, but malicious crawlers and scrapers can simply ignore them. Do not use robots.txt to protect sensitive information.
How do I block a particular file in robots.txt?
Add a Disallow rule with the file's path. For example, Disallow: /private-page.html under User-agent: * asks all compliant crawlers not to fetch that file.
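Disallow values are prefix matches, so a rule for one file also covers any path that starts with the same characters. The sketch below shows this with Python's `urllib.robotparser`; `AnyBot` and the extra paths are made-up examples. Crawlers that support the $ end anchor (Google does) let you write Disallow: /private-page.html$ to match only that exact path, but `urllib.robotparser` does not interpret $.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /private-page.html",
])

# The rule is a prefix match, so the query-string and .bak variants are blocked too.
for path in [
    "/private-page.html",
    "/private-page.html?ref=nav",
    "/private-page.html.bak",
    "/public-page.html",
]:
    allowed = parser.can_fetch("AnyBot", "https://example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```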
What is the difference between noindex and robots.txt?
A robots.txt rule tells crawlers not to fetch a page, but the page can still appear in search results if other pages link to it. A noindex meta tag tells search engines not to include the page in their results. To keep a page out of search results, use noindex, not robots.txt, and leave the page crawlable: a crawler that is blocked from the page never sees its noindex tag.
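The last point can be checked mechanically. The sketch below assumes you have both the robots.txt text and the page's HTML, and flags pages where a robots.txt rule would stop crawlers from ever seeing a noindex tag. The class and function names are hypothetical, and it only looks at `<meta name="robots">` tags in the HTML.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            self.directives.extend(part.strip().lower() for part in content.split(","))


def has_noindex(html: str) -> bool:
    """True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


def noindex_is_hidden(robots_txt: str, url: str, html: str, agent: str = "Googlebot") -> bool:
    """True if the page asks for noindex but robots.txt keeps crawlers from fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return has_noindex(html) and not parser.can_fetch(agent, url)


ROBOTS = "User-agent: *\nDisallow: /promo/\n"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(noindex_is_hidden(ROBOTS, "https://example.com/promo/old.html", PAGE))  # True
print(noindex_is_hidden(ROBOTS, "https://example.com/blog/old.html", PAGE))  # False
```

A True result means the Disallow rule should be removed for that page so the noindex tag can take effect.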
Does robots.txt impact my website's SEO?
Yes. If a faulty robots.txt rule blocks important pages, search engines can't crawl them, so those pages may get less visibility in search results. Check your robots.txt after every change.