GEO Research · May 11, 2026

llms.txt: what it is, who needs it, and how to write one

Learn what llms.txt is, whether your site needs one in 2026, and how to write a file that AI tools and IDE agents will actually use.

The file most sites don't have yet

Somewhere around November 2024, a quiet shift happened across thousands of documentation sites. Mintlify, a documentation platform used by companies like Anthropic and Cursor, rolled out automatic llms.txt support for every site it hosts. Practically overnight, a niche proposal by Jeremy Howard of Answer.AI became a real standard with real adoption. By late 2025, BuiltWith was tracking over 844,000 sites with an llms.txt file in place.

And yet, a crawl of nearly 300,000 domains found that only 10.13% had implemented one. That gap is either a missed opportunity or a reasonable wait-and-see, depending on what your site does and who reads it. This article gives you a clear answer on which camp you're in, and a practical path to writing a file that works.

What llms.txt actually is

llms.txt is a Markdown file placed at the root of your domain, at the path https://yourdomain.com/llms.txt. It lists your most important pages, groups them by topic, and gives each one a short plain-language description. The format was proposed by Jeremy Howard in September 2024 as a way to give large language models a structured, reliable map of your content, without forcing them to crawl every page and interpret HTML, JavaScript, and navigation chrome along the way.

Think of it as sitting one layer above robots.txt and sitemap.xml. robots.txt tells crawlers what they can't touch. sitemap.xml lists URLs. llms.txt explains what matters and why, in a format that an AI agent can read in a single context window. It doesn't enforce rules. It provides guidance. The distinction matters because it shifts the framing from access control to content curation.

The file is not a ranking signal. It's a clarity signal. When an AI tool fetches your llms.txt, it gets a curated summary of your site's authoritative content instead of probabilistic guesses assembled from partial crawls.

The problem it's solving

Large language models have limited context windows. A full website, including all its HTML, navigation, footers, ads, and duplicate content, rarely fits cleanly inside one. Even when an LLM can fetch a page, converting messy HTML into useful plain text is lossy. The result is that AI systems often produce answers about a product or library that are approximately right but subtly wrong.

For documentation-heavy sites, this is a real cost. A developer asks their AI coding assistant how to authenticate with your API. The assistant pulls from a stale crawl, misses a parameter change from six months ago, and generates broken code. The developer files a bug report or posts a confused question in your community forum. That sequence plays out thousands of times a day across the ecosystem of AI-assisted development.

An llms.txt file short-circuits that chain by giving AI agents a single, maintained, authoritative source. It reduces the hallucination surface area. It doesn't eliminate wrong answers, but it narrows the gap between what the model guesses and what you actually want it to say.

Who actually needs one right now

The honest answer in mid-2026 is that llms.txt is a developer-experience play, not an SEO play. Google's John Mueller has said directly: "None of the AI services have said they're using llms.txt, and you can tell when you look at your server logs that they don't even check for it." ChatGPT, Claude, Gemini, and Perplexity do not currently show measurable citation improvements for sites that have implemented the file.

But IDE agents are a different story. Cursor, Continue, Cline, and Aider increasingly fetch llms.txt when a user points them at a documentation site. MCP server integrations from platforms like Mintlify and GitBook consume llms.txt directly. If your users are developers who work inside AI-assisted coding tools, an accurate llms.txt file gives those tools current, authoritative reference material, which makes code suggestions that reference your library more likely to be correct. That means fewer support tickets, fewer GitHub issues, and a better experience for the people most likely to champion your product.

Companies and projects that have already moved include Anthropic, Cloudflare, Stripe, Vercel, Next.js, Zapier, and Supabase. The pattern is clear: API-first, developer-facing products with structured documentation. If your site fits that profile, the cost-benefit calculation is straightforward. The work takes about one afternoon, and it improves how the AI tools that read the file represent your product.

For content marketing sites, SaaS landing pages, and local business sites, the calculus is murkier. SuggestedByGPT's GEO benchmark, which tracked 100 queries over the past 14 days, shows our own site cited in 10 of them. Competitors like Semrush appeared in 19, Moz in 12, and Profound in 11. None of those citation rates correlate cleanly with llms.txt adoption. The factors driving AI citations right now are content quality, topical authority, and how often your brand appears in training data, not file structure. That will likely change, but it hasn't yet.

How to write an llms.txt file that works

The spec is minimal by design. The only required element is an H1 heading with your project or site name. After that, you add a blockquote with a one-to-two sentence summary of what the site covers, followed by grouped sections of links with short descriptions. That's it.
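To make that structure concrete, here is a minimal Python sketch that renders a file in this shape. The H2 section headings and the "- [title](url): description" list format reflect the published proposal as we understand it; the class names, the function name, and the example.com URL are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    description: str


@dataclass
class Section:
    name: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(site_name: str, summary: str, sections: list[Section]) -> str:
    """Render an H1, a blockquote summary, then one H2 block of links per section."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for section in sections:
        lines.append(f"## {section.name}")
        lines.append("")
        for link in section.links:
            lines.append(f"- [{link.title}]({link.url}): {link.description}")
        lines.append("")
    # Collapse trailing blank lines to a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site and page, used only to show the output shape.
example = render_llms_txt(
    "Example Product",
    "Example Product is a hypothetical payments API. These pages cover setup and the API reference.",
    [
        Section(
            "Docs",
            [
                Link(
                    "Authentication",
                    "https://example.com/docs/auth.md",
                    "API authentication reference covering OAuth 2.0 and API key flows",
                )
            ],
        )
    ],
)
print(example)
```

Generating the file from a small data structure like this is optional. For a small site, a hand-written file in the same shape works just as well.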

For a site under 100 pages, write it by hand. Pick your best 20 to 30 pages. Group them into three to five logical sections. Write a one-sentence description for each link that explains what a reader will find there, not what you hope they'll feel. "API authentication reference covering OAuth 2.0 and API key flows" is useful. "Our comprehensive authentication guide" is not.
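A quick way to keep a hand-written file honest is to lint it against these guidelines. The sketch below is one possible check, not part of the spec: the thresholds come from the advice above, the VAGUE_WORDS list is a hypothetical starting point you would tune yourself, and the sample file and its URLs are made up.

```python
import re

# Matches "- [title](url)" with an optional ": description" suffix.
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$"
)
# Hypothetical list of filler words; adjust for your own docs.
VAGUE_WORDS = {"comprehensive", "powerful", "seamless", "ultimate", "best-in-class"}


def lint_llms_txt(text, max_links=30, min_sections=3, max_sections=5):
    """Return a list of human-readable problems found in an llms.txt string."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_empty = [line for line in lines if line]
    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first non-empty line should be an H1 with the site name")
    section_count = sum(1 for line in lines if line.startswith("## "))
    if not min_sections <= section_count <= max_sections:
        problems.append(
            f"expected {min_sections} to {max_sections} sections, found {section_count}"
        )
    links = [m for line in lines if (m := LINK_RE.match(line))]
    if len(links) > max_links:
        problems.append(f"expected at most {max_links} links, found {len(links)}")
    for m in links:
        title = m.group("title")
        desc = (m.group("desc") or "").strip()
        if not desc:
            problems.append(f"'{title}' has no description")
            continue
        vague = sorted(set(re.findall(r"[a-z-]+", desc.lower())) & VAGUE_WORDS)
        if vague:
            problems.append(f"'{title}' uses vague wording: {', '.join(vague)}")
    return problems


sample = """# Example Product

> Hypothetical payments API docs.

## Docs

- [Auth reference](https://example.com/auth.md): API authentication reference covering OAuth 2.0 and API key flows
- [Auth guide](https://example.com/guide.md): Our comprehensive authentication guide
- [Changelog](https://example.com/changelog.md)
"""

for problem in lint_llms_txt(sample):
    print(problem)
```

On this sample the check flags the single section, the "comprehensive" description, and the link with no description, while the literal OAuth description passes.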

The section most companies skip is the one that matters most: the exact queries you want to be the answer to. Write those out. If someone asks an AI coding assistant "how do I set up webhooks with [Your Product]," which page should it cite? Link that page. Describe it in terms that match the question. The LLM processes what you write literally, so write literally.

A few structural rules that separate effective files from placeholder ones:

For implementation details and worked examples, 2Point Agency's guide covers the file structure clearly, and Codeboxr's complete guide includes templates for different site types.

Real-world implementations worth studying

Zapier's llms.txt is one of the more thorough public examples. Their file is API-oriented, grouping content around specific integration tasks, and they maintain both llms.txt and llms-full.txt in parallel. The llms.txt file acts as an index; llms-full.txt gives agents the full text when they need it.

Anthropic's implementation, made live through their Mintlify documentation setup, focuses on Claude's API docs and model capabilities. The descriptions are technical and direct, which matches the audience: developers building on top of the Claude API who are likely using AI tools to write that code.

Stripe's approach prioritizes API reference pages and getting-started flows, the pages most likely to be queried by a developer mid-implementation. They don't try to include every page. They include the pages that answer the questions developers actually ask.

The pattern across all three: restraint. None of these files try to be a complete site index. They're curated, maintained, and written for a specific reader with a specific intent. That discipline is harder to maintain than it sounds, especially for marketing teams conditioned to treat every page as equally important. The pages aren't equally important. Pick the ones that matter and describe them honestly. For a deeper look at how this connects to your broader AI visibility strategy, the GEO fundamentals guide on the SuggestedByGPT blog covers the content signals that AI systems weight most heavily.

What the spec doesn't cover (and what to do about it)

llms.txt has no enforcement mechanism. An LLM can ignore it entirely, and the major AI search surfaces currently do. GitBook's overview of the standard notes this directly: the file is a convention, not a protocol. There's no handshake, no verification, no callback that confirms an agent read your file.

This means the file's value depends entirely on which AI tools choose to respect it. Right now, IDE agents respect it. AI search surfaces mostly don't. That balance will shift as the standard matures and as more AI tools build explicit support for it, but you're writing for the audience that reads it today, not the hypothetical future audience.

The practical implication: treat llms.txt as infrastructure for your developer-facing content, write it carefully, and don't expect it to move your citation rates in Perplexity or ChatGPT. Those are driven by different factors. Codersera's 2026 analysis puts this well: the file is a floor, not a ceiling. It ensures AI tools that do look for it get something accurate. What they do with that accuracy is still up to them.

The concrete next steps

If you run a developer tool, API product, or documentation-heavy site, write your llms.txt this week. Spend one afternoon on it. Pick 20 to 30 pages, group them by task, write honest descriptions, and put the file at your root domain. Check your server logs in 30 days to see which AI user agents are fetching it. That data will tell you more about your specific audience than any general adoption curve.
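The log check can be as simple as counting user agents on requests for /llms.txt. The sketch below assumes your server writes the common "combined" access log format used by default in nginx and Apache; the user agent names and IP addresses in the sample lines are hypothetical, and a different log format would need a different pattern.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "METHOD path proto" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_fetches(log_lines):
    """Count requests for /llms.txt per user agent string."""
    counts = Counter()
    for line in log_lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # Skip lines that are not in the assumed format.
        path = match.group("path").split("?", 1)[0]  # Ignore any query string.
        if path == "/llms.txt":
            counts[match.group("agent")] += 1
    return counts


# Hypothetical log lines; in practice, pass an open log file instead,
# e.g. `with open("access.log") as f: count_llms_txt_fetches(f)`.
sample_log = [
    '203.0.113.5 - - [11/May/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleIDEAgent/1.0"',
    '203.0.113.5 - - [11/May/2026:10:05:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleIDEAgent/1.0"',
    '198.51.100.7 - - [11/May/2026:11:00:00 +0000] "GET /llms.txt?ref=1 HTTP/1.1" 200 512 "-" "ExampleCrawler/2.0"',
    '192.0.2.9 - - [11/May/2026:12:00:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "ExampleBrowser/5.0"',
]

for agent, hits in count_llms_txt_fetches(sample_log).most_common():
    print(f"{hits:>4}  {agent}")
```

Grouping by the full user agent string keeps the sketch simple. If one tool shows up under many version strings, you may want to normalize the agent name before counting.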

If you run a content site or SaaS marketing site, add it to your roadmap for Q3 and revisit when the major AI search platforms announce explicit support. The cost is low enough that early adoption carries little risk, but the current evidence doesn't justify treating it as urgent.

Either way, don't write a file full of marketing language and assume it helps. LLMs read literally. A description that would embarrass you in a technical conversation should be cut.

Monitoring whether any of this moves your actual citation rates requires tracking infrastructure that most sites don't have yet. SuggestedByGPT was built specifically to track how often AI systems cite your brand across queries, giving you the benchmark data to know whether changes like adding llms.txt are producing measurable results. If you want to know where you actually stand in AI search today, start your free analysis at SuggestedByGPT and get a citation benchmark against the tools your audience uses most.
