AI Agent Readiness Is the New Website Health Check: What to Fix First
Cloudflare's 2026 Agent Readiness data shows the web still has basic AI visibility gaps.
AI crawler access, llms.txt, citation signals, and machine-readable structure.
Content Signals let you publish machine-readable AI usage preferences in robots.txt. This check is optional, but if you use it, the syntax must be valid and aligned with your actual crawl policy.
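As a minimal sketch, a robots.txt using the `Content-Signal` syntax from the contentsignals.org proposal might look like this (the specific signal choices are illustrative, not a recommendation):

```
# Content Signals: machine-readable usage preferences
# search   = appearing in search results
# ai-input = being quoted in real-time AI answers
# ai-train = being used for model training
User-Agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
```

The point of the check is that these lines must parse cleanly and must not contradict the crawl rules elsewhere in the same file.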
This check is about consistency. Scavo compares page-level opt-out directives, wildcard robots behavior, and `/llms.txt` so you can catch mixed signals before they create visibility drift or internal confusion.
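As an illustrative sketch (not Scavo's actual implementation), the core of such a consistency check fits in a few lines of Python; the bot list and message wording are assumptions:

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]  # illustrative list

def find_mixed_signals(robots_txt: str, has_llms_txt: bool) -> list[str]:
    """Flag sites that publish /llms.txt (inviting AI use) while
    robots.txt blocks the very crawlers that would read it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    issues = []
    for bot in AI_BOTS:
        if has_llms_txt and not parser.can_fetch(bot, "/"):
            issues.append(f"{bot} blocked in robots.txt despite /llms.txt")
    return issues
```

A real checker would also fetch page-level meta directives and compare them against these file-level rules, but the shape of the comparison is the same.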
Link headers give agents a lightweight way to discover machine-readable resources such as API catalogs and service descriptions. If you advertise them, they need to resolve cleanly.
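For example, a response can advertise an API catalog with an HTTP Link header (RFC 8288), here pointing at the well-known API catalog location; the host is hypothetical:

```
HTTP/1.1 200 OK
Content-Type: text/html
Link: <https://example.com/.well-known/api-catalog>; rel="api-catalog"
```

The check is simply that the advertised URL returns the resource it claims to, not a 404 behind a confident-looking header.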
Serving Markdown or plain text when a client explicitly asks for it can make content cheaper and easier for AI systems to parse. It is optional, but the request should never break your route.
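A minimal content-negotiation sketch in Python (q-values are ignored for brevity, and the offered types are assumptions):

```python
def negotiate(accept_header: str) -> str:
    """Pick a response type from the Accept header, honoring an explicit
    request for Markdown or plain text and never failing the route."""
    offered = {"text/html", "text/markdown", "text/plain"}
    for part in accept_header.split(","):
        media = part.split(";")[0].strip().lower()  # drop q-value params
        if media in offered:
            return media
    return "text/html"  # safe default: unknown Accept values still get a page
```

The fallback branch is the point of the check: an `Accept` header you do not recognize should degrade to HTML, not to an error.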
If agents or third-party clients need OAuth to access your service, they should be able to discover the correct authorization metadata without guessing endpoints by hand.
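RFC 8414 defines the standard discovery endpoint, `/.well-known/oauth-authorization-server`, which returns metadata along these lines (all URLs and values below are hypothetical):

```json
{
  "issuer": "https://auth.example.com",
  "authorization_endpoint": "https://auth.example.com/authorize",
  "token_endpoint": "https://auth.example.com/token",
  "response_types_supported": ["code"],
  "grant_types_supported": ["authorization_code"],
  "code_challenge_methods_supported": ["S256"]
}
```

With this document in place, a client needs only your issuer URL; every other endpoint is discovered, not guessed.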
If your service exposes machine-usable tools, APIs, or agent endpoints, discovery documents can tell clients what exists before they start guessing. Broken discovery docs create more friction than having none at all.
Web Bot Auth lets bots prove who they are with signed HTTP requests and published keys. It is optional for most sites, but if you use it, the key directory must be valid and public.
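Under the current Web Bot Auth draft, the bot operator publishes a JWKS-style key directory (commonly at `/.well-known/http-message-signatures-directory`) that verifiers fetch to check signatures; the draft may still change, and the key material below is a placeholder:

```json
{
  "keys": [
    {
      "kty": "OKP",
      "crv": "Ed25519",
      "x": "placeholder-public-key-bytes"
    }
  ]
}
```

The check here is that this directory exists, parses as valid JSON, and is publicly reachable, since a signature that cannot be verified is worse than no signature.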
ClaudeBot saw the highest growth in block rates, increasing 32.67% year-over-year (EngageCoders, 2024). If you block AI crawlers while allowing Googlebot, you're letting Google use your content in its AI products (Gemini, AI Overviews) while excluding everyone else. Consider whether this asymmetry aligns with your content strategy, or whether parity across all bots better serves your interests.
44.2% of AI citations come from the first 30% of content (Profound), so front-loading key facts matters. AI models work better with structured, chunked content: clear headers, concise paragraphs, fact boxes, and attributed claims. Walls of unstructured text force AI to guess at relevance, reducing your chances of being cited or recommended in AI-generated responses.
None of the major AI crawlers (GPTBot, ClaudeBot) render JavaScript: analysis of over half a billion GPTBot requests found zero evidence of JS execution (Vercel). ChatGPT fetches HTML content 57.7% of the time, while Claude focuses on images at 35.2% (SearchViu). If your important content only exists after JavaScript runs, AI models can't see it, cite it, or recommend it.
If you set nosnippet to control how Google displays snippets, you're also preventing AI models from processing your content. Similarly, max-snippet:50 limits what AI can extract. This may be exactly what you want, but if you're trying to increase AI visibility while restricting snippets, these directives work against you. Review whether your snippet controls match your AI strategy.
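The directives in question live in the robots meta tag; a minimal illustration of the three postures:

```html
<!-- Blocks search snippets entirely, which also blocks AI extraction -->
<meta name="robots" content="nosnippet">

<!-- Caps extractable text at 50 characters -->
<meta name="robots" content="max-snippet:50">

<!-- Explicitly allows unlimited snippets -->
<meta name="robots" content="max-snippet:-1">
```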
AI models have context window limits: typically 128K tokens (roughly 90K words) for the largest models, though effective processing degrades well before that limit. Extremely long pages get truncated, and AI models struggle to extract meaning from walls of undifferentiated text. Breaking content into clearly headed, focused sections lets AI extract the most relevant parts even from longer pages.
44.2% of all LLM citations come from the first 30% of text, with content depth and readability the most important factors for citation (Profound). AI-driven referral traffic increased more than tenfold from July 2024 to February 2025, with 87.4% coming from ChatGPT (Adobe). To be cited, your content needs clear, fact-based claims with attribution, not just narrative prose.
GPTBot is blocked by 5.89% of all websites, and by 35.7% of the top 1,000 sites (Ahrefs, 2024). Nearly 38% of indexed sites now have AI-specific restrictions, up from 8% in 2023 (EngageCoders). If you don't set an explicit policy, you can't control whether your content appears in AI products or training data. A deliberate policy, whether allowing or blocking, beats leaving it undefined.
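Whichever direction you choose, the policy is expressed per user agent in robots.txt; a sketch that allows one crawler and blocks another (the bot choices here are illustrative, not advice):

```
# Deliberate per-bot policy instead of an undefined default
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
```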
llms.txt is an emerging standard that tells AI models how you want your content cited and used. Adoption is still early (only 10.13% of domains have one; llms-txt.io, 2025), but among developer-focused companies it grew 600% in early 2025. Cloudflare, Vercel, Anthropic, and Astro already support it. Adding one now puts you ahead of 90% of websites.
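A minimal llms.txt following the proposed format (an H1 name, a blockquote summary, then H2 link sections); the site and URLs below are hypothetical:

```markdown
# Example Co

> Example Co builds developer tooling. The documents below are the
> canonical sources to cite.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): setup in five minutes
- [API reference](https://example.com/docs/api.md): endpoints and auth

## Optional

- [Blog](https://example.com/blog): release notes and essays
```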
How to think about AI crawler access, user-triggered fetchers, and parity problems without guesswork or hype.