Before You Fix It: What This Check Means
AI crawler policy checks whether major AI user agents receive intentional, explicit directives. In plain terms, this tells you whether AI crawlers and answer systems can understand and reuse your content correctly. Scavo reads `robots.txt` status/content and evaluates root-path policy for key AI crawlers.
Why this matters in practice: unclear machine-facing signals can reduce retrieval quality and citation consistency.
How to use this result: treat this as directional evidence, not final truth. Bot access outcomes can vary by edge controls, geo policies, and temporary WAF behavior. First, confirm the issue in live output: verify bot-facing output and policy files on the final URL. Then ship one controlled change: normalize robots groups and remove contradictory root rules. Finally, re-scan the same URL to confirm the result improves.
Background sources
TL;DR: Your robots.txt doesn't specify rules for AI crawlers like GPTBot or ClaudeBot, leaving your AI visibility to chance.
GPTBot is blocked by 5.89% of all websites, with 35.7% of the top 1,000 sites blocking it (Ahrefs, 2024). Nearly 38% of indexed sites now have AI-specific restrictions, up from 8% in 2023 (EngageCoders). If you don't set explicit policy, you can't control whether your content appears in AI products or training data. A deliberate policy — whether allowing or blocking — is better than leaving it undefined.
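A deliberate policy can be as short as one group per crawler you care about. The sketch below is an illustration only: the allow/block choices per bot are placeholders for your own stance, not recommendations.

```
# Explicit stance per AI crawler (choices below are examples, not advice)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Everything else keeps the default policy
User-agent: *
Allow: /
```

The point is that every major AI agent hits an explicit group instead of inheriting the wildcard by accident.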
What Scavo checks (plain English)
Scavo reads robots.txt status/content and evaluates root-path policy for key AI crawlers:
- GPTBot
- ChatGPT-User
- OAI-SearchBot
- ClaudeBot
- anthropic-ai
- PerplexityBot
- Google-Extended
- CCBot
Scavo classifies each agent policy as explicit/wildcard and state (allowed, blocked, mixed, unspecified).
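As a sketch of what such an evaluation looks like, Python's standard-library `urllib.robotparser` can report each agent's root-path verdict. This is an illustration under simplified assumptions (allowed/blocked only), not Scavo's implementation, and the sample robots.txt is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The agent list from above; matching against robots.txt groups is case-insensitive.
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
    "anthropic-ai", "PerplexityBot", "Google-Extended", "CCBot",
]

def root_policy(robots_txt: str) -> dict:
    """Return each agent's root-path ('/') verdict under a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: "allowed" if parser.can_fetch(agent, "/") else "blocked"
            for agent in AI_AGENTS}

sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(root_policy(sample)["GPTBot"])  # blocked (explicit group overrides the wildcard)
```

Note that `urllib.robotparser` collapses everything to allowed/blocked; detecting "mixed" or "unspecified" states requires inspecting the raw groups yourself.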
How Scavo scores this check
Result behavior:
- Warning: robots.txt missing/empty (`404` or blank)
- Fail: wildcard root block with no explicit per-bot exceptions
- Info: robots policy unavailable in scan
- Info: default/wildcard policy is clean but no explicit bot rules
- Warning: one or more agents blocked or mixed/conflicting
- Pass: explicit, consistent per-bot policy with no critical conflicts
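The ladder above can be read roughly as the following sketch. It is an illustrative approximation, not Scavo's actual scoring code; `states` is assumed to map each agent to its classified state.

```python
def classify(robots_found: bool, wildcard_root_block: bool, states: dict) -> str:
    """Approximate the result ladder: states maps agent -> 'allowed' | 'blocked' | 'mixed' | 'unspecified'."""
    if not robots_found:
        return "warning"   # robots.txt missing or empty (404/blank)
    if wildcard_root_block and not any(s == "allowed" for s in states.values()):
        return "fail"      # blanket Disallow: / with no per-bot exceptions
    if all(s == "unspecified" for s in states.values()):
        return "info"      # clean default policy, but no explicit bot rules
    if any(s in ("blocked", "mixed") for s in states.values()):
        return "warning"   # at least one agent blocked or conflicting
    return "pass"          # explicit, consistent per-bot policy
```

The "policy unavailable in scan" Info case is omitted here for brevity; it would sit before the other branches.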
In your scan report, this appears under What failed / What needs attention / What is working for ai_crawler_policy, followed by Recommended next steps and Technical evidence (for developers) when needed.
- Scan key: `ai_crawler_policy`
- Category: `AI_VISIBILITY`
Why fixing this matters
Policy clarity matters more than hype. Teams need to know whether they are intentionally visible, intentionally restricted, or unintentionally drifting due to inherited wildcard rules.
Without explicit policy, legal, content, and engineering can make conflicting assumptions about AI usage rights and discoverability.
Common reasons this check flags
- `User-agent: *` with broad `Disallow: /` and no AI exceptions.
- Duplicate groups produce mixed allow/disallow outcomes.
- Robots exists but does not mention any modern AI agents explicitly.
- Production robots differs from documented policy in legal/content docs.
If you are not technical
- Decide business stance for each major crawler class (allow, block, conditional).
- Ensure legal/comms language matches technical robots policy.
- Ask engineering for one plain-language matrix by bot.
- Re-run Scavo and check blocked/mixed counts.
Technical handoff message
Copy and share this with your developer.
Scavo flagged AI Crawler Policy (`ai_crawler_policy`). Please clean up robots.txt so each target AI bot has clear root-path intent (allow/block), remove mixed directives, and document policy decisions for legal/content stakeholders.

If you are technical
- Normalize robots groups and remove contradictory root rules.
- Add explicit directives for bots you intentionally manage.
- Avoid relying on ambiguous inherited wildcard behavior for critical decisions.
- Keep one source-of-truth robots file under version control.
- Reconcile legal policy text with actual robots directives.
How to verify
- Fetch live `robots.txt` from production.
- Parse each target bot and confirm single clear root policy.
- Confirm wildcard behavior does not accidentally override intent.
- Re-run Scavo and verify policy score + blocked/mixed counts improve.
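The "single clear root policy" step can be scripted. The sketch below is a deliberately simplified robots.txt reader for illustration: it only tracks root-path `Allow`/`Disallow` lines per user-agent group and ignores path patterns entirely.

```python
def root_directives(robots_txt: str) -> dict:
    """Collect the root-path ('/') directives declared for each User-agent group."""
    groups: dict[str, set] = {}
    current: list[str] = []   # agents named by the group currently being parsed
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:       # a rule line ended the previous group
                current = []
            in_rules = False
            current.append(value)
            groups.setdefault(value, set())
        elif field in ("allow", "disallow"):
            in_rules = True
            if value == "/":
                for agent in current:
                    groups[agent].add(field)
    return groups

def has_single_clear_root_policy(directives: set) -> bool:
    """True when a group declares at most one kind of root directive (no conflict)."""
    return not {"allow", "disallow"} <= directives
```

A group that accumulates both `allow` and `disallow` for `/` is exactly the "mixed/conflicting" condition the scan warns about.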
What this scan cannot confirm
- It does not enforce contractual/legal licensing terms by itself.
- It does not guarantee third-party compliance beyond published robots norms.
- It does not test all non-standard/private crawler identities.
Owner checklist
- [ ] Assign owner for AI crawler stance and robots implementation.
- [ ] Keep a reviewed bot policy matrix (intent + technical directive).
- [ ] Version control robots updates with approval trail.
- [ ] Audit robots policy after CDN/security migrations.
FAQ
Is blocking all AI crawlers always wrong?
No. It can be intentional. The issue is unintentional or undocumented blocking.
Why does missing robots.txt return warning instead of fail?
Because an undefined policy is less severe than an explicitly contradictory one, though it still increases control risk.
Why include Google-Extended separately?
It allows policy distinction for AI-related usage controls beyond standard crawl/index behavior.
Should we list every bot on earth?
No. Prioritize major bots relevant to your business and review regularly.
Sources
- RFC 9309: Robots Exclusion Protocol
- Google Search Central: robots.txt reference
- OpenAI: Search crawler and GPTBot documentation
- Anthropic: Web crawler guidance
- Perplexity: Crawler and robots policy docs
Need a bot-policy matrix draft (intent + robots syntax) your team can approve quickly? Send support your preferred allow/block stance per crawler.