In April 2026, Cloudflare put a public number on something many website teams have been feeling for a while: most sites are still built for humans and search engines, not for AI agents.
That does not mean every business needs to publish an MCP server, accept agent payments, or rebuild its website around a new acronym. It does mean your public site now has another operational surface to keep healthy: can automated systems discover the right pages, understand what they are allowed to do, read the content cleanly, and cite the current version instead of an outdated one?
That is the practical version of agent readiness.
What changed
Cloudflare launched isitagentready.com on 17 April 2026 and added Agent Readiness signals to Cloudflare Radar and URL Scanner. Their scan looks at discoverability, content accessibility, bot access control, protocol discovery, and commerce-related standards.
The early data is useful because it shows the web is not there yet:
- Cloudflare scanned a filtered set of the 200,000 most visited domains.
- 78% had a `robots.txt` file, but most were written for traditional crawlers rather than agents.
- 4% declared AI usage preferences through Content Signals.
- 3.9% supported markdown content negotiation.
- MCP Server Cards and API Catalogs appeared on fewer than 15 sites in the dataset.
The important takeaway is not "chase every new standard today". It is that agent-facing website signals are becoming measurable, and the easy basics are still where most teams are exposed.
The difference between search, training, and agent access
This is where a lot of advice gets muddled.
Search visibility, model training, and user-driven agent access are not the same thing.
OpenAI documents separate crawler identities for different purposes. For example, OAI-SearchBot is used for surfacing websites in ChatGPT search, while GPTBot is associated with training. That means a site may reasonably allow one and disallow the other.
Google's guidance is also deliberately conservative: to appear in Google AI features, pages still need to be crawlable, indexable, and eligible for snippets. Google says there is no special schema.org markup, AI file, or machine-readable file required for AI Overviews or AI Mode.
So the safe baseline is:
- keep public pages crawlable and indexable if you want discovery,
- use `noindex` and snippet controls only when you mean them,
- distinguish search access from AI training access,
- document AI preferences in one place instead of scattering contradictory rules across CDN, app, and robots layers.
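For pages where snippet behavior matters, the controls live in the standard robots meta tag. A minimal sketch (the 160-character cap is an illustrative value, not a recommendation):

```html
<!-- Allow indexing but cap text snippets shown in results -->
<meta name="robots" content="index, max-snippet:160">

<!-- Keep a page out of search results entirely -->
<meta name="robots" content="noindex">
```

Use `noindex` only on pages you genuinely do not want discovered; it removes the page from search and, per Google's guidance, from AI features that depend on indexability.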
What to fix first
If you run a normal SaaS, agency, publisher, ecommerce, or brochure site, start here.
1. Make discovery boring and reliable
Your robots.txt should return 200 OK, be parseable as plain text, include the right sitemap URLs, and avoid accidentally blocking your main public pages.
This sounds basic, but it is still the root of many AI visibility issues. If agents, search crawlers, and scanners cannot find the same canonical pages, everything downstream gets noisier.
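As a sketch, a boring-and-reliable robots.txt for a site with nothing sensitive to hide can be this small (the domain is a placeholder):

```txt
# Serve at /robots.txt with a 200 status and Content-Type: text/plain.
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
```

The empty `Disallow:` allows everything; the `Sitemap` line is what gives crawlers and scanners one agreed starting point.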
2. Make the policy explicit
A modern crawl policy should answer four questions:
- Can search crawlers index and show this content?
- Can AI answer engines use this content for grounding or real-time answers?
- Can training crawlers use this content for model training?
- Are private, account, checkout, or internal paths blocked consistently?
Cloudflare's managed robots.txt documentation is a useful reminder that robots rules are voluntary. They express preferences; they do not technically prevent every crawler from accessing content. If you need enforcement, you still need CDN/WAF controls.
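A robots.txt that answers those four questions explicitly might look like the following sketch. The crawler names are OpenAI's documented identities; the paths and domain are placeholders, and remember these rules are preferences, not enforcement:

```txt
# Allow search surfacing in ChatGPT search
User-agent: OAI-SearchBot
Allow: /

# Refuse the training crawler
User-agent: GPTBot
Disallow: /

# Everyone else: public pages yes, private paths no
User-agent: *
Disallow: /account/
Disallow: /checkout/

Sitemap: https://www.example.com/sitemap.xml
```

Keeping search, training, and private-path rules in one file makes it much easier to spot contradictions with whatever your CDN or WAF is doing.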
3. Add Content Signals if your position is clear
Content Signals let a site express preferences such as ai-train=no, ai-input=yes, and search=yes inside robots.txt.
This is still emerging, so do not treat it as magic protection. Treat it as a clearer machine-readable declaration of intent. It is especially useful when the business position is nuanced: for example, "we want search and answer-engine citation, but we do not want model training".
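In Cloudflare's rollout, the declaration is a `Content-Signal` line inside robots.txt. A sketch of the nuanced position described above, assuming that syntax:

```txt
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
```

Because the standard is emerging, pair it with matching robots rules and CDN controls rather than relying on the signal alone.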
4. Publish a useful llms.txt, not a huge one
A good llms.txt is not a dump of every URL on your site. It is a short, curated reading list for agents:
- who you are,
- what the site offers,
- which pages are authoritative,
- which docs, pricing, policies, or guides should be preferred,
- when the information was last reviewed.
Cloudflare's write-up is useful here because they point out a real failure mode: huge files and low-value directory pages force agents into repeated searching, which increases tokens, latency, and error risk.
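A curated llms.txt following the emerging convention (H1 title, blockquote summary, short link sections) might look like this. Every name and URL here is hypothetical:

```markdown
# Example Co

> Example Co makes invoicing software for small agencies. The pages below
> are authoritative; prefer them over blog or archive pages.
> Last reviewed: 2026-04-01.

## Docs

- [Getting started](https://example.com/docs/start): setup and first invoice
- [API reference](https://example.com/docs/api): current v2 endpoints

## Policies

- [Pricing](https://example.com/pricing): current plans and limits
- [Privacy policy](https://example.com/privacy): data handling terms
```

A dozen well-chosen links with one-line descriptions beats hundreds of bare URLs.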
5. Keep canonical content current
AI systems can reuse stale content for a long time. If old docs, retired product pages, or legacy policy pages remain online, agents may treat them as current unless the machine-readable signals are strong.
That means canonical tags, redirects, sitemap freshness, visible deprecation copy, and internal links need to agree. Cloudflare's separate work on redirecting verified AI training crawlers away from deprecated canonical content is a useful signal of where this is going.
6. Consider markdown negotiation for documentation-heavy sites
If you have docs, guides, API references, or long knowledge-base pages, markdown negotiation is becoming worth testing.
The idea is simple: when a client requests Accept: text/markdown, return a cleaner markdown version instead of heavy HTML. Cloudflare says this can reduce token usage substantially, and their own docs benchmark reported lower token use and faster answers after refining their agent-facing structure.
For a small marketing site, this is optional. For a large documentation or support site, it is moving into "worth planning" territory.
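A quick way to test whether a page honors markdown negotiation is to request it with the `Accept` header and inspect the returned `Content-Type`. A minimal stdlib sketch (the URL is a placeholder):

```python
from urllib.request import Request, urlopen


def honors_markdown(content_type: str) -> bool:
    """True if the Content-Type says the server returned markdown."""
    return content_type.split(";")[0].strip().lower() == "text/markdown"


def fetch_content_type(url: str) -> str:
    """Request a page asking for markdown and return its Content-Type."""
    req = Request(url, headers={"Accept": "text/markdown"})
    with urlopen(req) as resp:
        return resp.headers.get("Content-Type", "")


# Usage (hypothetical docs page):
# print(honors_markdown(fetch_content_type("https://example.com/docs/start")))
```

If the server ignores the header, you will typically see `text/html` come back unchanged, which is also useful to record as a baseline before investing in negotiation.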
7. Do not implement protocol standards just to collect badges
MCP Server Cards, API Catalogs, Agent Skills, OAuth protected resource metadata, WebMCP, x402, and agentic commerce standards are real, but they are not all relevant to every business.
Use this rule:
- If you only publish public marketing content, focus on discovery, crawl policy, citations, and readable content.
- If you have public APIs or developer docs, consider API Catalogs, markdown content, and agent-friendly docs structure.
- If agents need to act on behalf of users, look at OAuth discovery and protected resource metadata.
- If agents should call your tools, then MCP and related discovery documents become relevant.
- If agents should buy from you, then commerce protocols are worth tracking, but probably not your first fix.
Agent readiness should describe what your site genuinely supports, not invent a fake capability layer.
A practical 30-minute audit
Here is the quick version I would run today:
- Fetch `/robots.txt` and confirm it returns clean text, includes sitemaps, and does not contradict your business policy.
- Check whether AI crawlers are treated differently from normal search crawlers, and whether that is intentional.
- Review `noindex`, `nosnippet`, `max-snippet`, and canonical tags on your main commercial pages.
- Open `/llms.txt` if you have one. If it is missing, decide whether a short curated version would help.
- Test one content page with `Accept: text/markdown` if your site has docs or long-form guides.
- Confirm stale pages redirect or clearly point to current canonical versions.
- Re-run after CDN, CMS, or plugin changes, because these signals drift quietly.
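The robots.txt part of that audit is easy to script. A rough sketch that parses a fetched robots.txt body, lists declared sitemaps, and flags the classic accident of blocking the whole site for all crawlers (group handling is approximate, not a full RFC 9309 parser):

```python
def audit_robots(body: str) -> dict:
    """Quick checks on a robots.txt body: sitemap declarations and
    whether the default (*) group disallows the entire site."""
    sitemaps = []
    blocked_all = False
    in_star_group = False
    seen_rule = False
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a rule line ended the previous group
                in_star_group = False
                seen_rule = False
            in_star_group = in_star_group or value == "*"
        elif field == "sitemap":
            sitemaps.append(value)
        elif field in ("allow", "disallow"):
            seen_rule = True
            if field == "disallow" and value == "/" and in_star_group:
                blocked_all = True
    return {"sitemaps": sitemaps, "blocks_everything_for_all": blocked_all}
```

Fetch the body with `urllib.request.urlopen("https://example.com/robots.txt")` (domain is a placeholder) and run this after every CDN or plugin change; the point is catching drift, not a one-off pass.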
Where Scavo helps
Scavo is not trying to make every site pretend it is an agent platform.
The useful part is continuous monitoring of the signals that matter for normal businesses:
- `robots.txt` health and sitemap discovery,
- AI crawler policy and access parity,
- Content Signals parsing,
- `llms.txt` availability and quality,
- citation metadata and snippet-control safety,
- markdown negotiation where relevant,
- protocol discovery documents when they are actually published,
- Web Bot Auth signals for bots and agents that need verifiable identity.
That makes agent readiness less of a one-off launch task and more like uptime, security headers, or Core Web Vitals: a living website health check that can regress after a theme update, CDN toggle, plugin change, or rushed deployment.
What to do next in Scavo
- Run a fresh scan on your homepage and your most important landing page.
- Open the AI Visibility checks first: crawler policy, bot parity, `llms.txt`, Content Signals, citation readiness, and markdown negotiation.
- Ignore optional emerging-protocol checks unless they match something your site genuinely offers.
- Re-scan after you change robots, CDN bot settings, canonical tags, or public docs.
Sources
- Cloudflare: Introducing the Agent Readiness score
- Cloudflare: Is Your Site Agent-Ready?
- Cloudflare Docs: Managed robots.txt and Content Signals
- Cloudflare: Redirects for AI Training enforces canonical content
- OpenAI: Overview of OpenAI Crawlers
- OpenAI Help: Publishers and Developers FAQ
- Google Search Central: AI Features and Your Website