What problem does llms.txt solve?
An AI agent landing on your site has a context budget. Crawling every page to figure out what you do burns tokens on navigation chrome, footers, and duplicate content. llms.txt hands the agent a curated index instead: here's who we are, here are the pages that answer real questions, skip the rest. For retrieval-augmented answers, that curation nudges engines toward your best content rather than whatever page happened to get fetched.
What's the format?
Plain markdown, served as text/plain at /llms.txt:
# Your Company — one-line positioning
> Two or three sentences on what you do and who it's for.
## Pages
- [Page title](https://yoursite.com/page): One line on what this page answers.
- [Another page](https://yoursite.com/other): Same idea.
## Blog Posts
- [Post title](https://yoursite.com/blog/slug): What the post covers.
Rules of thumb: absolute URLs, one honest descriptive line per link (the descriptions are what agents read to decide relevance), most important content first, and keep it maintained — a stale llms.txt pointing at dead pages is worse than none.
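Two of these rules, absolute URLs and a description on every link, are mechanical enough to check automatically. The sketch below is one way to do that, assuming link lines follow the `- [Title](url): description` shape from the template above; `lint_llms_txt` and `LINK_LINE` are hypothetical names, not part of any llms.txt tooling. It does not check for dead links, since that needs network access.

```python
import re
from urllib.parse import urlparse

# Matches a link line of the form "- [Title](url): description".
# The description part is optional so that missing ones can be reported.
LINK_LINE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def lint_llms_txt(text: str) -> list[str]:
    """Return a list of human-readable problems found in an llms.txt body."""
    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# "):
        problems.append("line 1: expected a '# Site name' heading")
    for number, line in enumerate(lines, start=1):
        match = LINK_LINE.match(line.strip())
        if match is None:
            continue  # headings, the blockquote, and blank lines are not linted
        parsed = urlparse(match["url"])
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"line {number}: URL is not absolute: {match['url']}")
        if not (match["desc"] or "").strip():
            problems.append(f"line {number}: link has no description")
    return problems


sample = """# Your Company: one-line positioning
> Two or three sentences on what you do and who it's for.

## Pages
- [Page title](https://yoursite.com/page): One line on what this page answers.
- [Another page](/other)
"""

for problem in lint_llms_txt(sample):
    print(problem)
```

Here the second link is flagged twice: once for the relative URL and once for the missing description. Running a check like this in CI is one way to keep the file from drifting.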
llms.txt vs llms-full.txt vs robots.txt
| File | Job | Audience |
|---|---|---|
| robots.txt | Access control: what crawlers may fetch | All crawlers |
| llms.txt | Curation: what's worth reading, with descriptions | LLM agents |
| llms-full.txt | Content delivery: full text of key pages in one file | LLM agents with bigger context budgets |
They compose: robots.txt should allow AI crawlers and can point to your sitemap; llms.txt indexes your best pages; llms-full.txt (optional) inlines their content. We serve all three and generate the latter two dynamically, so new blog posts appear without manual edits.
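Because robots.txt sits in front of the other two files, it is worth confirming that it does not block the crawlers you want reading llms.txt. A minimal sketch using Python's standard `urllib.robotparser` follows. The user-agent tokens in `AI_CRAWLERS` are examples only (check each vendor's documentation for current names), and the sample robots.txt is deliberately misconfigured to show what a failure looks like.

```python
from urllib.robotparser import RobotFileParser

# Example crawler tokens (an assumption; vendors document their own names).
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def blocked_crawlers(robots_txt: str, url: str, agents=AI_CRAWLERS) -> list[str]:
    """Return the agents that this robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Sample file: blocks one crawler by name, allows everyone else.
robots_txt = """\
# LLM index: https://yoursite.com/llms.txt
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
"""

print(blocked_crawlers(robots_txt, "https://yoursite.com/llms.txt"))
```

An empty list means none of the listed agents are blocked from the URL. The comment on the first line of the sample is ignored by the parser; it is there for readers of the file, not for rule matching.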
How do you add one?
- Write the header: site name, one-line positioning, a blockquote summary an agent could quote verbatim.
- List 10-30 pages grouped into sections (Pages, Guides, Blog), each with a one-line description that says what question the page answers.
- Serve it at /llms.txt with Content-Type: text/plain. A static file works; generating it dynamically (from your CMS or database) keeps it current automatically.
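- Reference it: robots.txt has no standard directive for llms.txt and parsers ignore comments, but a comment line pointing at it costs nothing and may help agents or people who read the file find it.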
- Verify: run curl -i https://yoursite.com/llms.txt, check that it returns 200 with a text/plain Content-Type, and read it as a stranger: does it actually explain your site?
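The steps above can be sketched end to end in a few lines. This is an illustration under assumptions, not the setup described in this article: `SITE` is a hypothetical stand-in for records pulled from a CMS or database, `render_llms_txt` and `call` are made-up helper names, and the WSGI app uses only Python's standard calling convention so it can sit behind any WSGI server.

```python
# Hypothetical page records; in practice these would come from a CMS or database.
SITE = {
    "title": "Your Company: one-line positioning",
    "summary": "Two or three sentences on what you do and who it's for.",
    "sections": {
        "Pages": [
            ("Page title", "https://yoursite.com/page",
             "One line on what this page answers."),
        ],
        "Blog Posts": [
            ("Post title", "https://yoursite.com/blog/slug",
             "What the post covers."),
        ],
    },
}


def render_llms_txt(site: dict) -> str:
    """Render the header, blockquote summary, and one link list per section."""
    parts = [f"# {site['title']}", "", f"> {site['summary']}"]
    for heading, pages in site["sections"].items():
        parts += ["", f"## {heading}", ""]
        parts += [f"- [{title}]({url}): {desc}" for title, url, desc in pages]
    return "\n".join(parts) + "\n"


def app(environ, start_response):
    """Minimal WSGI app: serve /llms.txt as plain text, 404 for everything else."""
    if environ.get("PATH_INFO") == "/llms.txt":
        status, body = "200 OK", render_llms_txt(SITE).encode("utf-8")
    else:
        status, body = "404 Not Found", b"Not found\n"
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call(path: str):
    """Invoke the app in-process and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call("/llms.txt")
print(status, headers["Content-Type"])
print(body.decode("utf-8"))
```

Rendering on each request is what keeps the file current: a new entry in `SITE` shows up without anyone editing a static file. To try it over HTTP locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` is enough, after which the curl check from the last step can be pointed at localhost.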
Does llms.txt actually matter yet?
Honest answer: adoption by the engines is uneven and the standard is young. OpenAI, Anthropic, and Perplexity crawlers will fetch it, and agent frameworks increasingly look for it, but nobody publishes hard numbers on citation lift from llms.txt alone. The case for doing it anyway: it costs an afternoon, a maintained file carries little risk, it improves how coding and research agents (a fast-growing traffic source) understand your site, and it signals machine-readability that correlates with the practices that do measurably matter: crawler access, structure, and freshness. We track AI crawler behavior across our own site, and llms.txt is among the most-fetched non-HTML resources.