Create a valid robots.txt file for your website. Configure crawl rules for search engines, set sitemaps, and control bot access.
The robots.txt file tells search engine crawlers which pages or sections of your site they can or cannot access. It lives at the root of your domain (e.g., https://example.com/robots.txt) and is the first file crawlers check before indexing.
# Comments start with hash
User-agent: *           # Which bot this applies to (* = all)
Disallow: /private/     # Block this path
Allow: /private/public  # But allow this sub-path
Crawl-delay: 10         # Wait 10 seconds between requests
Sitemap: https://example.com/sitemap.xml
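You can sanity-check rules like these with Python's standard-library urllib.robotparser. A minimal sketch, assuming rules equivalent to the example above; note that the stdlib parser applies rule lines in file order (first match wins) rather than Google's most-specific-match, so the more specific Allow line is listed first here:

```python
from urllib.robotparser import RobotFileParser

# Rules equivalent to the example above. parse() accepts the file's
# lines directly, so no network fetch is needed.
rules = """\
User-agent: *
Allow: /private/public
Disallow: /private/
Crawl-delay: 10
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://example.com/private/secret"))  # False
print(parser.can_fetch("*", "https://example.com/private/public"))  # True
print(parser.crawl_delay("*"))  # 10
```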
- Use User-agent: * to apply a rule to all bots.
- robots.txt controls crawling, not indexing; use a noindex meta tag for that.
- Google and Bing support Allow directives; many other bots only understand Disallow.
- The file must live at /robots.txt (not /pages/robots.txt).

To prevent AI training bots from scraping your content:
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /
A robots.txt file is a text file placed at the root of your website that tells search engine crawlers which pages or sections to crawl or skip. It's part of the Robots Exclusion Protocol. While not mandatory, it helps manage crawl budget, prevent indexing of duplicate or private content, and communicate sitemap location to bots.
Robots.txt disallows crawling, not indexing. If other sites link to a disallowed page, Google may still index the URL without crawling its content. To prevent indexing entirely, use a noindex meta tag or X-Robots-Tag response header instead. Robots.txt is best for managing crawl efficiency, not enforcing content privacy.
User-agent: * applies rules to all web crawlers. Specific bot names like Googlebot, Bingbot, or GPTBot target individual crawlers. Specific rules override the wildcard for that bot. Use the wildcard as a default and add specific rules to allow or restrict individual bots differently, such as blocking AI training crawlers while allowing search engines.
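This precedence is easy to verify with urllib.robotparser. A small sketch with a hypothetical policy (block GPTBot everywhere, block only /admin/ for everyone else): the parser matches a bot against its own User-agent group when one exists and falls back to the * group otherwise.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: GPTBot is blocked entirely; all other bots
# may crawl everything except /admin/.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# GPTBot hits its own group; Googlebot falls back to the * group.
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))  # True
print(parser.can_fetch("Googlebot", "https://example.com/admin/"))     # False
```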
Add a Sitemap directive at the end of your robots.txt file with the full URL: Sitemap: https://yourdomain.com/sitemap.xml. You can list multiple sitemaps on separate lines. Google and Bing both support this directive and will use it to discover and crawl your sitemap automatically during their next visit.
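Crawlers and tools read those Sitemap lines straight out of the file. A quick sketch using the stdlib parser's site_maps() method (available in Python 3.8+), with hypothetical sitemap URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt listing two sitemaps on separate lines.
# The Sitemap directive is independent of any User-agent group.
rules = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/news-sitemap.xml']
```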
Yes. Use Disallow: /private/ to block an entire directory, or Disallow: /*.pdf$ to block all PDF files using wildcards. The * wildcard matches any sequence of characters, and $ anchors to the end of the URL. You can combine multiple Disallow lines under a single User-agent to build granular crawl rules.
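The * and $ wildcards are Google/Bing extensions beyond the original Robots Exclusion Protocol, and not every parser honors them (Python's urllib.robotparser, for one, treats patterns literally). As a rough sketch of the matching semantics, a hypothetical helper that translates a Google-style path pattern into a regex:

```python
import re

def google_pattern_to_regex(pattern: str) -> "re.Pattern":
    # Google-style semantics: '*' matches any run of characters,
    # a trailing '$' anchors the end of the URL, everything else is
    # literal, and patterns match as URL-path prefixes otherwise.
    # This is an illustrative sketch, not a full robots.txt matcher.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.compile(regex)

blocked_pdfs = google_pattern_to_regex("/*.pdf$")
print(bool(blocked_pdfs.match("/docs/report.pdf")))      # True
print(bool(blocked_pdfs.match("/docs/report.pdf?v=2")))  # False (query string after .pdf)
print(bool(blocked_pdfs.match("/docs/report.html")))     # False
```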