AI Bot Policy Not Set in robots.txt

GPTBot is blocked by 5.89% of all websites, with 35.7% of the top 1,000 sites blocking it (Ahrefs, 2024). Nearly 38% of indexed sites now have AI-specific restrictions, up from 8% in 2023 (EngageCoders). If you don't set explicit policy, you can't control whether your content appears in AI products or training data. A deliberate policy — whether allowing or blocking — is better than leaving it undefined.

Before You Fix It: What This Check Means

The AI crawler policy check verifies that major AI user agents receive intentional, explicit directives. In plain terms, it tells you whether AI crawlers and answer systems get a clear, deliberate signal about access to your content. Scavo reads `robots.txt` status/content and evaluates root-path policy for key AI crawlers.

Why this matters in practice: unclear machine-facing signals can reduce retrieval quality and citation consistency.

How to use this result: treat it as directional evidence, not final truth. Bot access outcomes can vary with edge controls, geo policies, and temporary WAF behavior. First, confirm the issue in live output: verify bot-facing output and policy files on the final URL. Then ship one controlled change: normalize robots groups and remove contradictory root rules. Finally, re-scan the same URL to confirm the result improves.

TL;DR: Your robots.txt doesn't specify rules for AI crawlers like GPTBot or ClaudeBot, leaving your AI visibility to chance.

What Scavo checks (plain English)

Scavo reads robots.txt status/content and evaluates root-path policy for key AI crawlers:

  • GPTBot
  • ChatGPT-User
  • OAI-SearchBot
  • ClaudeBot
  • anthropic-ai
  • PerplexityBot
  • Google-Extended
  • CCBot

Scavo classifies each agent's policy by how it is set (explicit rule or inherited wildcard) and by state (allowed, blocked, mixed, or unspecified).
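
For illustration, here is a hypothetical robots.txt that takes an explicit stance on each managed agent. The allow/block choices below are placeholders, not recommendations — the point is that every decision is deliberate:

```
# Explicit per-bot policy — every managed agent gets a deliberate directive
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Disallow: /

# Default for all agents not listed above
User-agent: *
Allow: /
```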

How Scavo scores this check

Result behavior:

  • Warning: robots.txt missing/empty (404 or blank)
  • Fail: wildcard root block with no explicit per-bot exceptions
  • Info: robots policy unavailable in scan
  • Info: default/wildcard policy is clean but no explicit bot rules
  • Warning: one or more agents blocked or mixed/conflicting
  • Pass: explicit, consistent per-bot policy with no critical conflicts
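
The ladder above can be sketched in a few lines. This is our own simplification for illustration — not Scavo's actual implementation — and it omits the Fail case (wildcard root block) for brevity:

```python
# Hypothetical simplification of the result ladder — not Scavo's real code.
def score(states: dict, robots_present: bool) -> str:
    """states maps each AI agent to 'allowed', 'blocked', 'mixed',
    or 'unspecified' (covered only by the wildcard group)."""
    if not robots_present:
        return "warning"  # robots.txt missing or empty
    if any(s in ("blocked", "mixed") for s in states.values()):
        return "warning"  # at least one agent blocked or conflicting
    if all(s == "unspecified" for s in states.values()):
        return "info"     # clean wildcard default, no explicit bot rules
    return "pass"         # explicit, consistent per-bot policy
```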

In your scan report, this appears under What failed / What needs attention / What is working for ai_crawler_policy, followed by Recommended next steps and Technical evidence (for developers) when needed.

  • Scan key: ai_crawler_policy
  • Category: AI_VISIBILITY

Why fixing this matters

Policy clarity matters more than hype. Teams need to know whether they are intentionally visible, intentionally restricted, or unintentionally drifting due to inherited wildcard rules.

Without explicit policy, legal, content, and engineering can make conflicting assumptions about AI usage rights and discoverability.

Common reasons this check flags

  • User-agent: * with broad Disallow: / and no AI exceptions.
  • Duplicate groups produce mixed allow/disallow outcomes.
  • Robots exists but does not mention any modern AI agents explicitly.
  • Production robots differs from documented policy in legal/content docs.
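
As an example of the duplicate-group problem, consider this hypothetical fragment. The second GPTBot group contradicts the first, and different parsers may merge the groups or pick one of them, producing mixed outcomes:

```
User-agent: GPTBot
Disallow: /

# ...added later by another team, far down the same file...
User-agent: GPTBot
Allow: /
```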

If you are not technical

  1. Decide business stance for each major crawler class (allow, block, conditional).
  2. Ensure legal/comms language matches technical robots policy.
  3. Ask engineering for one plain-language matrix by bot.
  4. Re-run Scavo and check blocked/mixed counts.

Technical handoff message

Copy and share this with your developer.

Scavo flagged AI Crawler Policy (ai_crawler_policy). Please clean up robots.txt so each target AI bot has clear root-path intent (allow/block), remove mixed directives, and document policy decisions for legal/content stakeholders.

If you are technical

  1. Normalize robots groups and remove contradictory root rules.
  2. Add explicit directives for bots you intentionally manage.
  3. Avoid relying on ambiguous inherited wildcard behavior for critical decisions.
  4. Keep one source-of-truth robots file under version control.
  5. Reconcile legal policy text with actual robots directives.

How to verify

  • Fetch live robots.txt from production.
  • Parse each target bot and confirm single clear root policy.
  • Confirm wildcard behavior does not accidentally override intent.
  • Re-run Scavo and verify policy score + blocked/mixed counts improve.
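
The parsing steps above can be sketched with Python's standard-library robots parser. This is a minimal illustration, assuming you fetch the text of your live production robots.txt yourself (e.g. with urllib.request) and pass it in; the agent list mirrors the one Scavo checks:

```python
# Minimal root-path policy check for major AI user agents.
# Feed it the text of your live production robots.txt.
import urllib.robotparser

AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
    "anthropic-ai", "PerplexityBot", "Google-Extended", "CCBot",
]

def root_policy(robots_txt: str) -> dict:
    """Return {agent: allowed-at-root} as the stdlib parser reads the file."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, "/") for bot in AI_BOTS}
```

On a file whose wildcard group disallows everything but explicitly allows GPTBot, this reports GPTBot as allowed and every other agent as blocked — exactly the kind of asymmetry you want to confirm is intentional.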

What this scan cannot confirm

  • It does not enforce contractual/legal licensing terms by itself.
  • It does not guarantee third-party compliance beyond published robots norms.
  • It does not test all non-standard/private crawler identities.

Owner checklist

  • [ ] Assign owner for AI crawler stance and robots implementation.
  • [ ] Keep a reviewed bot policy matrix (intent + technical directive).
  • [ ] Version control robots updates with approval trail.
  • [ ] Audit robots policy after CDN/security migrations.

FAQ

Is blocking all AI crawlers always wrong?

No. It can be intentional. The issue is unintentional or undocumented blocking.

Why does missing robots.txt return warning instead of fail?

Because a missing file means the policy is undefined rather than explicitly contradictory. It still increases control risk, which is why it is flagged at all.

Why include Google-Extended separately?

Because Google-Extended controls AI-related usage of your content separately from Googlebot's standard crawl/index behavior, it deserves its own explicit directive.

Should we list every bot on earth?

No. Prioritize major bots relevant to your business and review regularly.

Need a bot-policy matrix draft (intent + robots syntax) your team can approve quickly? Send support your preferred allow/block stance per crawler.

More checks in this area

ai_bot_access_parity

AI Crawlers Blocked More Restrictively Than Search Engines

ClaudeBot saw the highest growth in block rates — increasing 32.67% year-over-year (EngageCoders, 2024). If you block AI crawlers while allowing Googlebot, you're letting Google use your content in its AI products (Gemini, AI Overviews) while excluding others. Consider whether this asymmetry aligns with your content strategy, or whether parity across all bots better serves your interests.

Open guide

ai_chunkability

Content Not Structured for AI Processing

44.2% of AI citations come from the first 30% of content (Profound), so front-loading key facts matters. AI models work better with structured, chunked content — clear headers, concise paragraphs, fact boxes, and attributed claims. Walls of unstructured text force AI to guess at relevance, reducing your chances of being cited or recommended in AI-generated responses.

Open guide

ai_citation_readiness

Content Not Structured for AI Citation

44.2% of all LLM citations come from the first 30% of text, with content depth and readability being the most important factors for citation (Profound). AI-driven referral traffic increased more than tenfold from July 2024 to February 2025, with 87.4% coming from ChatGPT (Adobe). To be cited, your content needs clear, fact-based claims with attribution — not just narrative prose.

Open guide