100% FREE · NO SIGNUP

Robots.txt Generator

Create a valid robots.txt file for your website. Configure crawl rules for search engines, set sitemaps, and control bot access.

Quick Presets

Allow All Bots: standard open crawling
Block All Bots: prevent all indexing
WordPress: block wp-admin, allow wp-content
Next.js / React: block _next/static and API routes
E-commerce: block cart, checkout, account
Block AI Crawlers: GPTBot, CCBot, Google-Extended

Sitemaps

Crawler Rules

Common Disallow Paths (Quick Add)

Generated robots.txt


What Is robots.txt?

The robots.txt file tells search engine crawlers which pages or sections of your site they can or cannot access. It lives at the root of your domain (e.g., https://example.com/robots.txt) and is the first file crawlers check before indexing.
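Its effect is easy to check programmatically. Here is a minimal sketch using Python's standard-library `urllib.robotparser`; the rules and URLs are illustrative, not from any real site:

```python
from urllib import robotparser

# Parse a robots.txt body directly (no network fetch needed).
rules = """
User-agent: *
Disallow: /private/
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

# Blocked path is refused, everything else is allowed.
print(parser.can_fetch("*", "https://example.com/private/page"))  # False
print(parser.can_fetch("*", "https://example.com/public/page"))   # True
```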

robots.txt Syntax

# Comments start with hash
User-agent: *          # Which bot this applies to (* = all)
Disallow: /private/    # Block this path
Allow: /private/public # But allow this sub-path
Crawl-delay: 10        # Wait 10 seconds between requests (Bing honors this; Google ignores it)

Sitemap: https://example.com/sitemap.xml
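The syntax above can be exercised with Python's stdlib parser. One caveat for this sketch: `urllib.robotparser` applies rules in file order (first match wins), so the Allow line is placed before the Disallow line it carves an exception from; Google instead uses longest-match precedence, where order does not matter:

```python
from urllib import robotparser

# Same directives as the example above, reordered for the
# order-sensitive stdlib parser.
example = """
User-agent: *
Allow: /private/public
Disallow: /private/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(example)

print(rp.crawl_delay("*"))                   # 10
print(rp.site_maps())                        # ['https://example.com/sitemap.xml']
print(rp.can_fetch("*", "/private/secret"))  # False
print(rp.can_fetch("*", "/private/public"))  # True
```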

Common Directives

Important Notes

Blocking AI Crawlers

To prevent AI training bots from scraping your content:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /
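A block like this can be sanity-checked with Python's stdlib parser; this sketch confirms that GPTBot is shut out while an ordinary search crawler, matched only by the wildcard group, stays allowed (URLs are illustrative):

```python
from urllib import robotparser

ai_block = """
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(ai_block)

# GPTBot hits its own group; Googlebot falls through to the wildcard.
print(rp.can_fetch("GPTBot", "https://example.com/article"))     # False
print(rp.can_fetch("Googlebot", "https://example.com/article"))  # True
```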

Frequently Asked Questions

What is a robots.txt file and why do I need one?

A robots.txt file is a text file placed at the root of your website that tells search engine crawlers which pages or sections to crawl or skip. It's part of the Robots Exclusion Protocol. While not mandatory, it helps manage crawl budget, prevent indexing of duplicate or private content, and communicate sitemap location to bots.

Does robots.txt actually block search engines from indexing pages?

Robots.txt disallows crawling, not indexing. If other sites link to a disallowed page, Google may still index the URL without crawling its content. To prevent indexing entirely, use a noindex meta tag or X-Robots-Tag response header instead. Robots.txt is best for managing crawl efficiency, not enforcing content privacy.

What is the difference between User-agent: * and specific bot names?

User-agent: * applies rules to all web crawlers. Specific bot names like Googlebot, Bingbot, or GPTBot target individual crawlers. Specific rules override the wildcard for that bot. Use the wildcard as a default and add specific rules to allow or restrict individual bots differently, such as blocking AI training crawlers while allowing search engines.
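One consequence worth seeing concretely: a bot that matches a specific group uses only that group, so shared rules must be repeated in it. A sketch with Python's stdlib parser (bot names and paths are illustrative):

```python
from urllib import robotparser

rules = """
User-agent: *
Disallow: /drafts/

User-agent: Googlebot
Disallow: /drafts/
Disallow: /beta/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Googlebot obeys only its own group; everyone else gets the wildcard.
print(rp.can_fetch("Googlebot", "/beta/page"))  # False
print(rp.can_fetch("Bingbot", "/beta/page"))    # True
print(rp.can_fetch("Bingbot", "/drafts/page"))  # False
```

Note that `/drafts/` is repeated in the Googlebot group; if it were omitted, Googlebot would be free to crawl it even though the wildcard group blocks it.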

How do I add my sitemap to robots.txt?

Add a Sitemap directive at the end of your robots.txt file with the full URL: Sitemap: https://yourdomain.com/sitemap.xml. You can list multiple sitemaps on separate lines. Google and Bing both support this directive and will use it to discover and crawl your sitemap automatically during their next visit.
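Multiple Sitemap lines can be verified with Python's stdlib parser, whose `site_maps()` method (available since Python 3.8) collects every declared URL; the sitemap URLs here are illustrative:

```python
from urllib import robotparser

rules = """
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog/sitemap.xml
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Sitemap lines are independent of any User-agent group.
print(rp.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/blog/sitemap.xml']
```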

Can I block specific directories or file types in robots.txt?

Yes. Use Disallow: /private/ to block an entire directory, or Disallow: /*.pdf$ to block all PDF files using wildcards. The * wildcard matches any sequence of characters, and $ anchors to the end of the URL. You can combine multiple Disallow lines under a single User-agent to build granular crawl rules.
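Wildcard support varies by tool (Python's stdlib `urllib.robotparser`, for instance, does not implement `*` or `$`), so here is a hand-rolled sketch of Google-style path matching, where `*` matches any run of characters and a trailing `$` anchors the pattern to the end of the URL:

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt path matching:
    '*' is a wildcard, a trailing '$' anchors to the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as '.*'.
    regex = "^" + re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(robots_pattern_matches("/*.pdf$", "/files/report.pdf"))      # True
print(robots_pattern_matches("/*.pdf$", "/files/report.pdf?v=2"))  # False
print(robots_pattern_matches("/private/", "/private/page"))        # True
```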

Related Free Tools

Meta Tag Generator · Sitemap · Regex Generator · HTML Minifier · HTTP Status Codes