Robots.txt Tester

Test and validate robots.txt files. Check which URLs are blocked or allowed for any crawler.

Enter any URL — we'll automatically fetch /robots.txt from that domain

What is robots.txt and Why Does It Matter?

The robots.txt file is a standard text file placed at the root of your website (e.g., https://example.com/robots.txt) that tells search engine crawlers which pages or sections of your site they may or may not crawl. It follows the Robots Exclusion Protocol, a standard honored by all major search engines, including Google, Bing, and Yahoo.

How does this tool work?

  • Fetch & Parse: Enter any URL and the tool automatically fetches the site's robots.txt, then parses all directives grouped by user-agent.
  • URL Testing: Type any URL path and select a user-agent to instantly see if that path is allowed or blocked, with the matching rule highlighted.
  • Validation: The tool flags common mistakes like missing User-agent: *, typos in directives (e.g., Dissallow), and overly broad blocking rules.
  • Sitemap Discovery: Extracts and displays all Sitemap: references so you can verify your sitemaps are correctly declared.
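The fetch-and-test flow above can be sketched with Python's standard library. This is a simplified stand-in, not the tool's actual implementation; note that urllib.robotparser uses first-match rule resolution and has no wildcard support, unlike Google's parser. The robots.txt content is inlined here; a live run would call set_url() and read() to fetch it over HTTP.

```python
from urllib.robotparser import RobotFileParser

# Stand-in robots.txt content; a live tool would instead do:
#   rp.set_url("https://example.com/robots.txt"); rp.read()
ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /login/
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

print(rp.can_fetch("*", "/admin/settings"))  # False: matches Disallow: /admin/
print(rp.can_fetch("*", "/blog/post"))       # True: no rule matches
```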

Frequently Asked Questions

How do I test my robots.txt file?

Enter your website URL in the tool above. It will fetch your robots.txt, parse all directives, and let you test specific URL paths against any user-agent to see if they are allowed or blocked.

What does "Disallow: /" mean in robots.txt?

Disallow: / tells the specified crawler not to access any page on your site. Under a User-agent: * block, this blocks all search engines from crawling your entire site. This is commonly used on staging or development environments.

Why is my page not being indexed by Google?

Your robots.txt may be blocking Googlebot from crawling the page. Use the URL Tester above to check if your page's path is blocked. Common causes include overly broad Disallow rules or missing Allow exceptions for specific paths. Keep in mind that robots.txt controls crawling, not indexing: a blocked URL can still appear in search results if other pages link to it.

What's the difference between Allow and Disallow?

Disallow tells crawlers not to access a path. Allow creates an exception within a Disallow rule. For example, you can Disallow: /private/ but Allow: /private/public-page. When rules conflict, the longest (most specific) matching rule wins.
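The longest-match behavior can be sketched in a few lines of Python. This is a hypothetical helper for illustration, not this tool's actual implementation, and it ignores wildcards for simplicity:

```python
def check(path, rules):
    """Resolve a path Google-style: the longest matching rule wins.

    rules: list of (directive, prefix) tuples, e.g. ("Disallow", "/private/").
    Default verdict is allowed when no rule matches.
    """
    best = ("Allow", "")  # (winning directive, its prefix)
    for directive, prefix in rules:
        if prefix and path.startswith(prefix) and len(prefix) > len(best[1]):
            best = (directive, prefix)
    return best[0] == "Allow"

rules = [("Disallow", "/private/"), ("Allow", "/private/public-page")]
print(check("/private/secret", rules))       # False: only Disallow matches
print(check("/private/public-page", rules))  # True: the longer Allow wins
```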

robots.txt Examples & Patterns

These ready-to-use robots.txt examples cover the most common configurations. Click any example to load it into the tester above.

Block all crawlers

Use on staging and development environments to prevent search engines from indexing your site.

User-agent: *
Disallow: /

Test it: Enter any path like /page for user-agent * → BLOCKED

Allow only Googlebot, block everything else

A common pattern on soft-launch sites — indexed by Google but invisible to other crawlers and scrapers.

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

Test it: Path /blog/post for Googlebot → ALLOWED. Same path for Bingbot → BLOCKED
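You can reproduce this per-user-agent behavior with Python's standard library; a minimal sketch (an empty Disallow: means "nothing is disallowed", so the Googlebot group allows everything):

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Googlebot matches its own group; everyone else falls back to User-agent: *
print(rp.can_fetch("Googlebot", "/blog/post"))  # True
print(rp.can_fetch("Bingbot", "/blog/post"))    # False
```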

Typical website configuration

Blocks admin and internal areas while allowing all public content. Includes a Sitemap reference — best practice for SEO.

User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /private/
Allow: /private/press-kit/
Crawl-delay: 2

Sitemap: https://example.com/sitemap.xml

Test it: Path /private/press-kit/brochure.pdf → ALLOWED (the Allow rule overrides the Disallow)
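A note on Crawl-delay: it is not part of the original Robots Exclusion Protocol, and Google ignores it, though Bing and some other crawlers honor it. Python's standard urllib.robotparser can read it, as this small sketch shows:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
User-agent: *
Crawl-delay: 2
""".splitlines())

# Seconds a compliant crawler should wait between requests
print(rp.crawl_delay("*"))  # 2
```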

E-commerce site

Protects cart, checkout, and account pages from indexing while keeping product and category pages fully crawlable.

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /order-confirmation/
Disallow: /search?*
Allow: /products/
Allow: /categories/

User-agent: GPTBot
Disallow: /

Sitemap: https://shop.example.com/sitemap.xml
Sitemap: https://shop.example.com/sitemap-products.xml

Test it: Path /search?q=shoes → BLOCKED (wildcard * after ?). Path /products/running-shoes → ALLOWED
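Wildcard support is a Google extension to the protocol (Python's stdlib parser, for instance, does not implement it). The matching can be sketched by translating a rule value into a regex, with * matching any character sequence and $ anchoring the end of the URL; this is an illustrative helper, not this tool's actual code:

```python
import re

def rule_to_regex(rule_value):
    """Translate a robots.txt rule value into an anchored-prefix regex.

    Google syntax: '*' matches any run of characters, '$' matches end of URL.
    """
    pattern = ""
    for ch in rule_value:
        if ch == "*":
            pattern += ".*"
        elif ch == "$":
            pattern += "$"
        else:
            pattern += re.escape(ch)  # treat everything else literally
    return re.compile(pattern)

rx = rule_to_regex("/search?*")
print(bool(rx.match("/search?q=shoes")))           # True: blocked
print(bool(rx.match("/products/running-shoes")))   # False: not matched
```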

Common mistakes — can you spot them? ⚠

This example contains 3 real errors that will silently break your robots.txt. Load it to see our validator flag each one.

# No User-agent: * catch-all!
Disallow: /admin/

User-agent: Googlebot
Dissallow: /private/
Allow: /private/press/

Sitemap: https://example.com/sitemap.xml

  • Typo: Dissallow instead of Disallow — the rule is silently ignored
  • Missing catch-all: No User-agent: * group means other crawlers get no guidance
  • Rule before User-agent: Disallow: /admin/ appears outside any group — it's ignored
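The three checks above can be sketched as a small linter. This is a hypothetical simplification of what a validator does, not this tool's actual implementation (for instance, it strips comments naively and keeps only a handful of known directives):

```python
KNOWN = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint(robots_text):
    """Flag unknown directives, rules outside any group, and a missing catch-all."""
    warnings, in_group, saw_star = [], False, False
    for n, raw in enumerate(robots_text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            in_group = True
            saw_star = saw_star or value == "*"
        elif key == "sitemap":
            pass  # Sitemap lines are valid anywhere in the file
        elif key not in KNOWN:
            warnings.append(f"line {n}: unknown directive '{key}'")
        elif not in_group:
            warnings.append(f"line {n}: rule before any User-agent group")
    if not saw_star:
        warnings.append("missing 'User-agent: *' catch-all group")
    return warnings

BROKEN = """\
# No User-agent: * catch-all!
Disallow: /admin/

User-agent: Googlebot
Dissallow: /private/
Allow: /private/press/

Sitemap: https://example.com/sitemap.xml
"""
for w in lint(BROKEN):
    print(w)  # one warning per mistake described above
```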