AI visibility is now a real operational topic, not a speculative one.
But a lot of teams are still mixing up three different things:
- search eligibility,
- training crawlers,
- user-triggered fetchers.
If you do not separate those clearly, you end up with robots rules and infrastructure behavior that look intentional on paper but behave inconsistently in production.
## Start with the simplest truth
There is no single "AI bot" setting.
Different systems use different agents for different jobs. That matters because your policy may need to allow one and restrict another.
## Google is still the baseline for search eligibility
Google's current guidance is straightforward:
- AI Overviews and AI Mode are part of Search,
- pages need to be indexed and eligible to show with a snippet,
- standard Search crawling and preview controls still apply.
That means blocking, noindexing, or snippet-restricting a page can also limit how it appears in Google's AI search features.
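For example, Google's documented preview controls carry over. A page that ships a restrictive robots meta tag like the sketch below (the values are illustrative; pick the stance you actually want) constrains snippet display, which in turn can constrain AI-feature display:

```html
<!-- Restrictive preview control: suppresses the text snippet in Search,
     which can also limit how the page surfaces in Google's AI features. -->
<meta name="robots" content="nosnippet">
```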
## OpenAI separates search, training, and user-triggered access
OpenAI now documents three distinct agents:
- `OAI-SearchBot` for ChatGPT search features,
- `GPTBot` for training-related crawling,
- `ChatGPT-User` for certain user-triggered fetches.
That is an important distinction. A site may reasonably allow OAI-SearchBot while disallowing GPTBot. And ChatGPT-User is not the same thing as automatic web crawling.
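Expressed in robots.txt, that split could look like this sketch (the stance shown, allow search surfacing but block training crawls, is one example policy, not a recommendation):

```
# Allow ChatGPT search features to surface this site.
User-agent: OAI-SearchBot
Allow: /

# Block training-related crawling.
User-agent: GPTBot
Disallow: /
```

Check each vendor's current documentation for how user-triggered agents such as `ChatGPT-User` honor robots rules before assuming the same pattern covers them.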
## Anthropic and Perplexity need the same careful reading
Anthropic documents separate agents such as:
- `ClaudeBot`
- `Claude-SearchBot`
- `Claude-User`
Perplexity also documents its crawler behavior and robots handling separately.
So if your current rule is just "allow AI" or "block AI", it is probably too blunt to reflect what you actually want.
## Where parity problems happen
Bot parity means different search or AI agents should not get materially different results unless you intended that difference.
Common examples:
- one bot gets HTTP 200 while another gets 429,
- one bot sees different canonical or robots directives,
- one bot gets a thinner response because of edge rate limiting or bot handling,
- a CDN or WAF treats one crawler more harshly than another.
When that happens, visibility becomes inconsistent and hard to debug.
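A simple way to catch the first two failure modes is to fetch the same URL as each agent and diff what comes back. The sketch below assumes you have already collected per-agent results (agent names and result fields are illustrative); it only does the comparison, which is the part teams usually skip:

```python
def find_parity_gaps(results):
    """Given {agent: {"status": int, "canonical": str}}, return a list of
    human-readable gaps where agents see materially different responses.

    The first agent (sorted order) is used as the comparison baseline.
    """
    gaps = []
    agents = sorted(results)
    baseline_agent = agents[0]
    baseline = results[baseline_agent]
    for agent in agents[1:]:
        seen = results[agent]
        for field in ("status", "canonical"):
            if seen.get(field) != baseline.get(field):
                gaps.append(
                    f"{field}: {baseline_agent}={baseline.get(field)!r} "
                    f"vs {agent}={seen.get(field)!r}"
                )
    return gaps


# Example: one agent is rate limited while another gets a normal page.
results = {
    "Googlebot": {"status": 200, "canonical": "https://example.com/page"},
    "OAI-SearchBot": {"status": 429, "canonical": "https://example.com/page"},
}
print(find_parity_gaps(results))
```

An empty list means the agents you checked are in parity for those fields; anything else is a difference you either intended or need to fix.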
## llms.txt is useful, but it is not magic
Treat llms.txt as:
- a useful AI-facing context layer,
- a machine-readable guide to important content and policy,
- optional and still emerging.
Do not treat it as:
- access control,
- a substitute for robots rules,
- a replacement for clean page metadata and crawlable HTML.
If you publish llms.txt, keep it aligned with your actual robots policy and public legal position.
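A minimal llms.txt consistent with that advice follows the proposed format of an H1 title, a blockquote summary, and H2 sections of annotated links. Everything below is a placeholder sketch, not content for any real site:

```md
# Example Company

> One-sentence description of what the site offers and who it is for.

## Docs

- [Getting started](https://example.com/docs/start): product overview
- [Pricing](https://example.com/pricing): current plans and limits
```

Every URL listed here should also be crawlable under your robots policy; a file that points AI systems at pages they are blocked from fetching is exactly the contradiction to avoid.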
## A practical AI visibility policy model
This is the simplest clean approach:
- Decide your search-bot policy.
- Decide your training-bot policy.
- Decide your user-triggered fetch policy.
- Keep canonical, robots, and preview controls consistent across allowed bots.
- Watch for rate limiting or WAF rules that accidentally create parity problems.
That gets you much closer to a deliberate policy than a one-line robots.txt edit ever will.
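The three-way model above can be written down explicitly, which makes the policy reviewable instead of implicit in scattered robots rules. In this sketch, the agent-to-category mapping reflects the vendor documentation cited below as of this writing; verify it against current docs, and treat the allow/disallow stance as one example:

```python
# Which job each documented agent performs (verify against vendor docs).
AGENT_CATEGORY = {
    "Googlebot": "search",
    "OAI-SearchBot": "search",
    "Claude-SearchBot": "search",
    "GPTBot": "training",
    "ClaudeBot": "training",
    "ChatGPT-User": "user_fetch",
    "Claude-User": "user_fetch",
}

# Example stance: allow search and user-triggered fetches, block training.
POLICY = {
    "search": "allow",
    "training": "disallow",
    "user_fetch": "allow",
}


def decision(agent):
    """Return 'allow' or 'disallow' for a known agent, 'review' otherwise.

    Unknown agents get 'review' rather than a silent default, so new
    crawlers surface as explicit decisions instead of accidents.
    """
    category = AGENT_CATEGORY.get(agent)
    return POLICY.get(category, "review")


print(decision("OAI-SearchBot"))  # allow
print(decision("GPTBot"))         # disallow
print(decision("SomeNewBot"))     # review
```

Whatever you decide per category then needs to be mirrored consistently in robots.txt, edge rules, and WAF configuration.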
## What to check first when something looks wrong
- Does the page return the same HTTP status to major search/AI agents?
- Do canonical and robots signals match across those agents?
- Is the page still indexable and snippet-eligible for Search?
- Is your CDN or WAF rate limiting one agent differently?
- Is `llms.txt` aligned with the public policy you actually want?
## Owner checklist
- [ ] Search, training, and user-triggered bot policies are separated clearly.
- [ ] Allowed bots receive consistent HTTP status, canonical, and robots signals.
- [ ] Google preview controls are set intentionally, not by accident.
- [ ] `llms.txt` supports the policy rather than contradicting it.
## Where Scavo helps
Scavo checks AI crawler policy, bot parity, citation readiness, snippet controls, and machine-readable AI guidance so teams can see both the policy and the runtime outcome.
That matters because most AI visibility problems are caused by inconsistency, not by the total absence of a file.
## Sources
- Google: AI Features and Your Website
- OpenAI: Overview of OpenAI crawlers
- Anthropic: Does Anthropic crawl data from the web, and how can site owners block the crawler?
- Perplexity: How does Perplexity follow robots.txt?
- llms.txt
## What to do next in Scavo
- Run a fresh scan on your main domain.
- Open the matching help guide in `/help`, assign an owner, and ship the smallest safe fix.
- Re-scan after deployment and confirm the trend is moving in the right direction.