llms.txt File Missing — No AI Model Guidance

llms.txt is an emerging standard that tells AI models how you want your content cited and used. Adoption is still early — only 10.13% of domains have one (llms-txt.io, 2025) — but adoption among developer-focused companies grew 600% in early 2025. Cloudflare, Vercel, Anthropic, and Astro already support it. Adding one now puts you ahead of 90% of websites.

Before You Fix It: What This Check Means

llms.txt is an emerging convention for machine-readable guidance aimed at LLM retrieval workflows. In plain terms, this tells you whether AI crawlers and answer systems can understand and reuse your content correctly. Scavo requests `https://your-domain/llms.txt` and scores it for baseline quality.

Why this matters in practice: unclear machine-facing signals can reduce retrieval quality and citation consistency.

How to use this result: treat this as directional evidence, not final truth. Answer-engine retrieval behavior can shift over time even when your technical setup is stable. First, confirm the issue in live output: verify the bot-facing response and policy files on the final URL. Then ship one controlled change: serve `/llms.txt` as plain text/markdown on your primary host. Finally, re-scan the same URL to confirm the result improves.

TL;DR: Your site doesn't have an llms.txt file, so AI models have no structured guidance on how to use or cite your content.

What Scavo checks (plain English)

Scavo requests https://your-domain/llms.txt and scores it for baseline quality.

What is evaluated:

  • endpoint availability (200 vs missing/unreachable)
  • whether the file has at least one Markdown heading
  • whether a summary/overview section is present
  • whether a links/resources section is present
  • whether at least 2 valid absolute URLs are included
  • whether there is a freshness hint (for example Last updated: 2026-03-02)
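The criteria above can be approximated with a small local checker before you publish. This is a sketch only: the function name and regexes are assumptions for illustration, not Scavo's actual scoring logic.

```python
import re

def check_llms_txt(text: str) -> dict:
    """Approximate the quality signals listed above (hypothetical helper)."""
    urls = re.findall(r"https?://[^\s)>\]]+", text)
    return {
        # at least one Markdown heading anywhere in the file
        "has_heading": bool(re.search(r"^#{1,6}\s+\S", text, re.M)),
        # a summary/overview-style section heading
        "has_summary": bool(re.search(r"^#{1,6}\s*(summary|overview|about)", text, re.M | re.I)),
        # a links/resources-style section heading
        "has_links_section": bool(re.search(r"^#{1,6}\s*(links|resources)", text, re.M | re.I)),
        # at least two absolute URLs
        "has_two_urls": len(urls) >= 2,
        # a freshness hint such as "Last updated: 2026-03-02"
        "has_freshness": bool(re.search(r"last updated[:\s]", text, re.I)),
    }

sample = """# Summary
Example site overview.

## Resources
- https://example.com/
- https://example.com/pricing

Last updated: 2026-03-02
"""
print(check_llms_txt(sample))
```

Running this on your draft before deploying catches missing sections early; it will not predict the exact scan result, only the structural signals.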

How results are assigned:

  • Info: file is missing (404) and Scavo suggests publishing one
  • Warning: file exists but is empty, unreachable, or missing key quality signals
  • Fail: file exists but is structurally weak (for example no valid links and no core sections)
  • Pass: structure, links, and freshness signals are all present

How Scavo scores this check

Scavo assigns one result state for this check on the tested page:

  • Pass: baseline signals for this check were found.
  • Warning: partial coverage or risk signals were found and should be reviewed.
  • Fail: required signals were missing or risky behavior was confirmed.
  • Info: Scavo could not gather enough reliable evidence on this run to score pass/fail confidently.

In your scan report, this appears under What failed / What needs attention / What is working for llms_txt, followed by Recommended next steps and Technical evidence (for developers) when needed.

  • Scan key: llms_txt
  • Category: AI_VISIBILITY

Why fixing this matters

Without a clear machine-readable summary, AI systems can rely on scattered pages, stale cached text, or low-value URLs when forming answers. That can reduce citation quality and increase incorrect summaries.

A good llms.txt also helps your own team: it forces one canonical list of "what should be read first" for docs, pricing, policies, and contact/support pages.

If you are not technical

  1. Ask for one owner of llms.txt (usually content or SEO + engineering).
  2. Provide a list of your top canonical pages (homepage, pricing, docs/help, legal, contact).
  3. Require plain-language summary text that matches your current product positioning.
  4. Re-run Scavo after publishing and confirm the check improves.

Technical handoff message

Copy and share this with your developer.

Scavo flagged llms.txt quality (llms_txt). Please publish or improve /llms.txt with clear headings, a short summary, at least two valid canonical URLs, and an update timestamp. Share the live file and re-run Scavo.

If you are technical

  1. Serve /llms.txt as plain text/markdown on your primary host.
  2. Add a concise summary section describing what the site offers and who it serves.
  3. Add a resources section with canonical, stable URLs only.
  4. Add a freshness line and update it when core pages change materially.
  5. Keep this file in version control and review after major navigation/content releases.
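One way to keep the freshness line accurate is to generate the file at build time instead of editing it by hand. The sketch below assumes a hypothetical canonical URL set and summary text; swap in your own pages.

```python
from datetime import date
from pathlib import Path

# Hypothetical canonical URL set -- replace with your own stable pages.
RESOURCES = [
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/help",
    "https://example.com/contact",
]

def render_llms_txt(summary: str, resources: list[str]) -> str:
    """Render a minimal llms.txt body with a current freshness line."""
    lines = ["# Summary", summary, "", "## Resources"]
    lines += [f"- {url}" for url in resources]
    lines += ["", f"Last updated: {date.today().isoformat()}"]
    return "\n".join(lines) + "\n"

# Write the file as part of your build/deploy step.
Path("llms.txt").write_text(
    render_llms_txt("Example Co helps teams do X.", RESOURCES),
    encoding="utf-8",
)
```

Because the script lives in version control next to the URL list, step 5 above (review after major releases) becomes a one-line diff rather than a manual edit.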

Minimal structure that usually passes

# Summary
Scavo helps teams monitor website health, compliance, and SEO signals.

## Resources
- https://example.com/
- https://example.com/pricing
- https://example.com/help
- https://example.com/contact

Last updated: 2026-03-02

How to verify

  • `curl -i https://your-domain/llms.txt` returns 200.
  • Content is readable and not HTML fallback output.
  • At least two valid absolute URLs resolve to canonical public pages.
  • Freshness hint is present and current enough for your release cadence.
  • Re-run Scavo and confirm llms_txt moves to Pass or a smaller warning set.
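The verification steps above can be scripted. This is a minimal sketch using only the standard library; the User-Agent string and the HTML-fallback heuristic are assumptions, and you would want timeouts/retries tuned for CI use.

```python
import urllib.request

def looks_like_html(body: str) -> bool:
    """Heuristic: detect an HTML fallback page served instead of plain text."""
    head = body.lstrip().lower()
    return head.startswith("<!doctype html") or head.startswith("<html")

def fetch_llms_txt(base_url: str):
    """Fetch /llms.txt and return (status, content_type, body). Sketch only."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/llms.txt",
        headers={"User-Agent": "llms-txt-check/0.1"},  # hypothetical UA string
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status, resp.headers.get_content_type(), body

# Example usage (requires network access):
#   status, ctype, body = fetch_llms_txt("https://your-domain")
#   assert status == 200 and not looks_like_html(body)
```

Note that `urlopen` raises `urllib.error.HTTPError` on a 404, so a missing file surfaces as an exception rather than a non-200 status; handle that case explicitly if you wire this into a smoke test.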

What this scan cannot confirm

  • It cannot guarantee how every AI product will rank or cite your content.
  • It does not validate business/legal correctness of linked pages.
  • It does not replace broader AI visibility controls such as robots policy and structured data.

Owner checklist

  • [ ] Assign owner for llms.txt and document where it lives in the codebase.
  • [ ] Keep URLs canonical and remove temporary campaign/query links.
  • [ ] Review after navigation, docs IA, or pricing/legal URL changes.
  • [ ] Include llms.txt in deployment smoke tests.

FAQ

Is llms.txt an official web standard like robots.txt?

Not today. It is an emerging convention, so quality and consistency matter more than strict syntax perfection.

Should we put every URL in llms.txt?

No. Keep it curated. Add only high-value canonical pages you want systems to prioritize.

Can we include private URLs?

Do not include private/internal endpoints. Treat this as a public guidance file.

How often should we update it?

Update whenever your core positioning, product docs structure, pricing URL, or support/legal paths change.

Need a reviewed llms.txt draft aligned to your live information architecture? Send support your canonical URL set.

More checks in this area

ai_bot_access_parity

AI Crawlers Blocked More Restrictively Than Search Engines

ClaudeBot saw the highest growth in block rates — increasing 32.67% year-over-year (EngageCoders, 2024). If you block AI crawlers while allowing Googlebot, you're letting Google use your content in its AI products (Gemini, AI Overviews) while excluding others. Consider whether this asymmetry aligns with your content strategy, or whether parity across all bots better serves your interests.

Open guide
ai_chunkability

Content Not Structured for AI Processing

44.2% of AI citations come from the first 30% of content (Profound), so front-loading key facts matters. AI models work better with structured, chunked content — clear headers, concise paragraphs, fact boxes, and attributed claims. Walls of unstructured text force AI to guess at relevance, reducing your chances of being cited or recommended in AI-generated responses.

Open guide
ai_citation_readiness

Content Not Structured for AI Citation

44.2% of all LLM citations come from the first 30% of text, with content depth and readability being the most important factors for citation (Profound). AI-driven referral traffic increased more than tenfold from July 2024 to February 2025, with 87.4% coming from ChatGPT (Adobe). To be cited, your content needs clear, fact-based claims with attribution — not just narrative prose.

Open guide