Create a valid robots.txt file for your website. Configure crawl rules for search engines, set sitemaps, and control bot access.
The robots.txt file tells search engine crawlers which pages or sections of your site they can or cannot access. It lives at the root of your domain (e.g., https://example.com/robots.txt) and is the first file crawlers check before indexing.
# Comments start with hash
User-agent: *           # Which bot this applies to (* = all)
Disallow: /private/     # Block this path
Allow: /private/public  # But allow this sub-path
Crawl-delay: 10         # Wait 10 seconds between requests
Sitemap: https://example.com/sitemap.xml
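You can sanity-check rules like these with Python's standard-library urllib.robotparser. A minimal sketch, assuming rules equivalent to the example above; note that the stdlib parser applies rule lines in file order (first match wins) rather than Google's most-specific-match, so the more specific Allow line is listed first here:

```python
from urllib.robotparser import RobotFileParser

# Rules equivalent to the example above. parse() accepts the file's
# lines directly, so no network fetch is needed.
rules = """\
User-agent: *
Allow: /private/public
Disallow: /private/
Crawl-delay: 10
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://example.com/private/secret"))  # False
print(parser.can_fetch("*", "https://example.com/private/public"))  # True
print(parser.crawl_delay("*"))  # 10
```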
- Use User-agent: * to apply a rule to all bots.
- robots.txt controls crawling, not indexing; use a noindex meta tag for that.
- Google and Bing support Allow directives; many other bots only understand Disallow.
- The file must live at /robots.txt (not /pages/robots.txt).

To prevent AI training bots from scraping your content:
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /
A robots.txt file is a text file placed at the root of your website that tells search engine crawlers which pages or sections to crawl or skip. It's part of the Robots Exclusion Protocol. While not mandatory, it helps manage crawl budget, prevent indexing of duplicate or private content, and communicate sitemap location to bots.
Robots.txt disallows crawling, not indexing. If other sites link to a disallowed page, Google may still index the URL without crawling its content. To prevent indexing entirely, use a noindex meta tag or X-Robots-Tag response header instead. Robots.txt is best for managing crawl efficiency, not enforcing content privacy.
User-agent: * applies rules to all web crawlers. Specific bot names like Googlebot, Bingbot, or GPTBot target individual crawlers. Specific rules override the wildcard for that bot. Use the wildcard as a default and add specific rules to allow or restrict individual bots differently, such as blocking AI training crawlers while allowing search engines.
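This precedence is easy to verify with urllib.robotparser. A small sketch with a hypothetical policy (block GPTBot everywhere, block only /admin/ for everyone else): the parser matches a bot against its own User-agent group when one exists and falls back to the * group otherwise.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: GPTBot is blocked entirely; all other bots
# may crawl everything except /admin/.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# GPTBot hits its own group; Googlebot falls back to the * group.
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))  # True
print(parser.can_fetch("Googlebot", "https://example.com/admin/"))     # False
```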
Add a Sitemap directive at the end of your robots.txt file with the full URL: Sitemap: https://yourdomain.com/sitemap.xml. You can list multiple sitemaps on separate lines. Google and Bing both support this directive and will use it to discover and crawl your sitemap automatically during their next visit.
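Crawlers and tools read those Sitemap lines straight out of the file. A quick sketch using the stdlib parser's site_maps() method (available in Python 3.8+), with hypothetical sitemap URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt listing two sitemaps on separate lines.
# The Sitemap directive is independent of any User-agent group.
rules = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/news-sitemap.xml']
```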
Yes. Use Disallow: /private/ to block an entire directory, or Disallow: /*.pdf$ to block all PDF files using wildcards. The * wildcard matches any sequence of characters, and $ anchors to the end of the URL. You can combine multiple Disallow lines under a single User-agent to build granular crawl rules.
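The * and $ wildcards are Google/Bing extensions beyond the original Robots Exclusion Protocol, and not every parser honors them (Python's urllib.robotparser, for one, treats patterns literally). As a rough sketch of the matching semantics, a hypothetical helper that translates a Google-style path pattern into a regex:

```python
import re

def google_pattern_to_regex(pattern: str) -> "re.Pattern":
    # Google-style semantics: '*' matches any run of characters,
    # a trailing '$' anchors the end of the URL, everything else is
    # literal, and patterns match as URL-path prefixes otherwise.
    # This is an illustrative sketch, not a full robots.txt matcher.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.compile(regex)

blocked_pdfs = google_pattern_to_regex("/*.pdf$")
print(bool(blocked_pdfs.match("/docs/report.pdf")))      # True
print(bool(blocked_pdfs.match("/docs/report.pdf?v=2")))  # False (query string after .pdf)
print(bool(blocked_pdfs.match("/docs/report.html")))     # False
```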