Analyses · May 25, 2026

Make your website citable by AI: robots.txt, llms.txt and more

robots.txt, .md, licences, llms.txt: what really makes a website readable and citable by AI — and what’s pure myth, tested from Guadeloupe.

You may have been pitched the llms.txt file as the key to getting cited by ChatGPT, Perplexity or Google. Here’s a figure that puts things in perspective: out of more than 500 million AI-bot visits measured over three months, barely 408 targeted that famous file, fewer than one in a million. Making your site readable by an artificial intelligence is useful and doable, but the recipes you read everywhere are not how you get there. Here’s what really works, what does nothing, and what we do ourselves at Kimoun, in full transparency.

Important
Read this before you panic. You may have seen the news: Chrome Lighthouse now checks for llms.txt, and a few hasty analysts read it as Google walking back its “llms.txt is useless for SEO” line. Too quick a read. Why? The answer is further down — it comes down to a single distinction drawn by John Mueller on May 20, 2026.

What an AI actually “reads” from your site

Tip
An AI that wants to cite your site reads the same thing a visitor and Google read: your pages. No special file, no hidden version — your pages as they are.

The big answer engines rely on classic web search to ground their answers. In practice, they crawl your site like any indexing robot and pick up what they find. There’s no second entrance reserved for AI. It’s the same logic as Google’s official guide on AI search: a clean, readable page is enough on its own.
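To check what "your pages as they are" means for a given page, here is a minimal sketch that extracts the text present in the raw HTML, the way a client that does not run JavaScript would see it. Whether a particular AI crawler executes JavaScript is an assumption made here, not something measured in this article, and both sample pages are invented.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects the text found in raw HTML, ignoring code and styling blocks."""

    # Content of these tags is never shown as page text.
    SKIPPED_TAGS = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the text a non-JavaScript client would find in the HTML."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical pages: one rendered on the server, one filled in by JavaScript.
SERVER_RENDERED = (
    "<html><body><h1>Opening hours</h1>"
    "<p>Open Monday to Saturday.</p></body></html>"
)
JS_ONLY = (
    '<html><body><div id="app"></div>'
    '<script>render("Opening hours")</script></body></html>'
)

print(repr(visible_text(SERVER_RENDERED)))
print(repr(visible_text(JS_ONLY)))
```

The second page yields an empty string: the content exists only after a script runs, so a client reading the raw HTML finds nothing to cite.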

So what follows isn’t a list of magic files. It’s the sorting between the levers that have a real effect and the ones you’re sold for nothing.

robots.txt: the only file that really controls AI bots

Tip
The only file with real power over AI bots is the one most often forgotten: robots.txt.

It, and it alone, decides who’s allowed to crawl your site. You can name the AI bots in it and allow or deny them: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, or Google-Extended (not a separate crawler, but a token Google honours to decide whether your content may feed its Gemini models). OpenAI itself documents robots.txt as the right way to control its crawler.

Two important nuances. First, it’s an instruction, not a wall: well-behaved bots respect it, but it doesn’t make your site technically inaccessible. Second, it’s a trade-off. Blocking everything protects your content… and removes you from AI answers at the same time. For a local business after visibility, the goal isn’t to close the door, but to hold it open properly.
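As a sketch of how those named rules play out, Python's standard urllib.robotparser module can evaluate a robots.txt against each bot's user-agent token. The policy below is a hypothetical example (opt out of GPTBot and Google-Extended, leave the rest open, keep /admin/ private), not a recommendation, and example.com is a placeholder. It only models what a compliant bot would conclude; it cannot tell you whether a given bot actually complies.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /admin/
"""


def crawl_permissions(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Map each user-agent token to whether the rules let it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

for agent, allowed in crawl_permissions(
    SAMPLE_ROBOTS_TXT, "https://example.com/blog/post", AGENTS
).items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

In this example ClaudeBot has no group of its own, so it falls back to the rules under the wildcard group: the blog post is open to it, the /admin/ area is not.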

llms.txt and llms-full.txt: what they’re good for, and what they’re not

This is the topic with the most loose claims, so let’s be precise. The llms.txt is a Markdown file listing your important pages; the llms-full.txt contains your site’s full text in a single document. The starting idea is honest: help an AI find its way.

The problem is that it doesn’t work the way it’s sold. Google has confirmed it doesn’t read the file, and one of its spokespeople compared it to the old “keywords” meta tag, abandoned long ago because it’s declared by the site itself and therefore easy to manipulate. No major AI player (OpenAI, Google, Anthropic, Meta, Mistral) has announced using it in its search answers. A study of 300,000 sites found no link between having an llms.txt and being cited; a prediction model even improved when that variable was removed, a sign it added noise rather than signal. And the bots that matter almost never go looking for it, hence the 408 visits out of 500 million.

It does have a real use, but elsewhere: developer tools. Coding assistants like Cursor, GitHub Copilot or Claude fetch documentation in real time, and there, a clean Markdown index saves them time. That’s actually why Anthropic publishes an llms.txt for its own documentation. None of that has anything to do with how your hardware store or guesthouse site ranks in search.

There’s a second reason, more forward-looking: experimentation. We watch the weak signals, and one of them deserves attention. Large platforms are starting to roll the file out by default: Wix now generates and maintains it automatically, and Shopify has added native /llms.txt and /agents.md routes for its stores. These files don’t serve classic SEO, but they point to resources for AI agents: search, catalog, MCP endpoints. If the “agentic web” takes hold, that may be where llms.txt finally finds a real use. Meanwhile, we keep experimenting on kimoun.com: it costs almost nothing, and it teaches us where the ground is moving before our clients get there.

Note
Our position, and Google’s latest move. At Kimoun, we maintain an llms-full.txt. In full transparency, it brings us no AI citations, and that’s not why we keep it: it’s a clean index of our pages, almost free, handy for our own tools. A telling detail: when a browsing-capable AI assistant recently needed to read our pages, it didn’t go looking for that file on its own; it had to be pointed to it. And the news confirms the read: in May 2026, Chrome Lighthouse started checking for an llms.txt (“Agentic Browsing” audits), but John Mueller (Google) settled it at once by separating discovery (being found, SEO) from functionality (helping an agent, especially coding tools, once the page is found). llms.txt belongs to the second: a “crutch” for AI tools, not a ranking lever.
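To show why such an index is "almost free", here is a minimal sketch that builds an llms.txt-style Markdown file from a list of pages. It assumes the layout described in the public llms.txt proposal (a title, a one-line summary as a blockquote, then sections of links); the Page type, the site name and the URLs are all hypothetical, and this is not Kimoun's actual generator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    section: str   # heading the link is grouped under
    title: str
    url: str
    summary: str   # one-line description shown after the link


def build_llms_txt(site_name: str, tagline: str, pages: list[Page]) -> str:
    """Render a Markdown index: title, summary, then one link list per section."""
    lines = [f"# {site_name}", "", f"> {tagline}"]

    # Group pages by section, keeping the order in which sections first appear.
    sections: dict[str, list[Page]] = {}
    for page in pages:
        sections.setdefault(page.section, []).append(page)

    for section, members in sections.items():
        lines += ["", f"## {section}", ""]
        lines += [f"- [{p.title}]({p.url}): {p.summary}" for p in members]

    return "\n".join(lines) + "\n"


# Hypothetical site content.
EXAMPLE_PAGES = [
    Page("Guides", "Opening hours", "https://example.com/hours", "When the shop is open."),
    Page("Guides", "Delivery", "https://example.com/delivery", "Areas served."),
    Page("About", "Team", "https://example.com/team", "Who we are."),
]

print(build_llms_txt("Example Site", "A hypothetical local shop.", EXAMPLE_PAGES))
```

Generating the file from the same page list that feeds the site keeps it in sync at no extra cost, which is the only reason given here for maintaining one.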

.md pages and the myth of the “version for robots”

Warning
Creating a Markdown version of every page “for the robots” is a bad idea dressed up as a good one. If those files are indexable, you manufacture duplicate content at scale and dilute your crawl budget. And serving a different version to robots and humans verges on cloaking, which Google penalizes.

The logic looks tempting: “my site is heavy on JavaScript, let’s give the AIs a lightweight text version.” In practice, you create a second version of your site that nobody sees, with the technical risks that come with it. Google, for its part, has bluntly called it a bad idea to serve Markdown pages meant only for robots. The right approach is simpler, and cheaper: write real pages, clean and fast, once, for your readers. The machines read the same ones.

AccessiWeb: the French-language web accessibility standard, maintained by the BrailleNet association.

Photo: Olivier Watte in the AccessiWeb working group (BrailleNet / IPEOS, 2005). IPEOS was the first FLOSS company in the Caribbean; Olivier Watte was the first certified web accessibility evaluator from the French overseas territories.

I knew this principle long before AI arrived. I spent years on digital accessibility: I was certified as an AccessiWeb evaluation expert by the BrailleNet association in March 2005, while at IPEOS (the first free and open-source software services company in the Caribbean), and was the first certified web accessibility evaluator from the French overseas territories. And it’s exactly the same rule that applies here.

“In accessibility, we never solved the problem by rigging up a separate page for people with disabilities. You make a single resource accessible to everyone, by following shared rules. Building a parallel site for one audience means creating a digital ghetto, and it always ends badly. With AI, it’s the same: a single clean page, readable by everyone, humans and machines alike. And the real benefit is right there: that interoperability is what grows both your search ranking and your pickup by AI.”
— Olivier Watte, known as Oliver · founder of Kimoun

A page that respects web standards is readable by a screen reader, by Google and by an AI. The same work serves all three — not three separate projects, just one, done well.

Licences and terms: stating what an AI may do

You can express usage preferences. Disallowing training-related user agents in robots.txt (for example Google-Extended or GPTBot) signals that you don’t want your content used to train models. Under European law (and Guadeloupe is French territory), you can also reserve your rights against text and data mining with a machine-readable signal, the opt-out provided for in Article 4 of the EU’s 2019 Copyright Directive.

Let’s be honest about the limits: enforcing those reservations remains imperfect, and the legal framework is changing fast. I’m not a lawyer, so for a real content-protection strategy, get advice suited to your situation. And above all, ask the right question: most businesses here want more visibility, not less. The goal is almost never to block the AI — it’s to become the source it cites.

What actually makes you citable

It always comes back to the same fundamentals, and that’s good news, because they’re within your reach.

A clear structure, first: logical headings, one question per section, crisp answers. Real freshness next: an update date that matches genuinely revised content, not a cosmetic timestamp. Then a healthy technical base: a site that’s fast on mobile and well indexed. On Guadeloupe’s 4G, every second counts, and speed is largely a matter of hosting and clean configuration. And above all, what nobody can copy: your first-hand experience and your local data. An AI can only cite what exists somewhere; if you’re the most precise source on a local reality, you become the citation.
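The "logical headings" point can be checked mechanically. The sketch below takes one possible reading of it, which is an assumption and not a rule stated in this article: a heading should never skip a level (an h1 followed directly by an h3, for instance). The sample pages are invented.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records (level, text) for every h1 to h6 element, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[tuple[int, str]] = []
        self._level: int | None = None
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            text = " ".join("".join(self._buffer).split())
            self.headings.append((self._level, text))
            self._level = None


def skipped_levels(html: str) -> list[str]:
    """List the places where a heading jumps more than one level deeper."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()

    problems = []
    previous = 0
    for level, text in collector.headings:
        if previous and level > previous + 1:
            problems.append(f"h{previous} jumps to h{level} at '{text}'")
        previous = level
    return problems


# Hypothetical pages.
CLEAN_PAGE = "<h1>Bakery</h1><h2>Opening hours</h2><h3>Public holidays</h3><h2>Prices</h2>"
BROKEN_PAGE = "<h1>Bakery</h1><h3>Prices</h3>"

print(skipped_levels(CLEAN_PAGE))
print(skipped_levels(BROKEN_PAGE))
```

A check like this only covers the mechanical part of structure. Whether each section answers one question crisply still takes a human read.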

No file replaces that. If you’d like us to look concretely at where your site stands (what makes it readable, citable and locally visible), that’s the service described on our SEO & local visibility page.

This blog has no comment forum, and the question I’ll leave you with deserves better than silence: which “magic trick” to please the AI have you been sold, or advised to install?

So let’s continue elsewhere. Come discuss it on LinkedIn: I reply to comments, and the best exchanges often turn into a future article. And if you’d rather get the next one without thinking about it, subscribe to the Kimoun newsletter: once or twice a month, a curated watch that gets to the point, breakdowns of local cases, and advice you can apply the following week — sized for realistic budgets and teams. One-click unsubscribe.

Frequently asked questions

Does an llms.txt file improve my chances of being cited by ChatGPT or Perplexity?

So far, nothing shows it does. A study of 300,000 sites found no correlation between having an llms.txt and being cited by AI. Google has confirmed it doesn’t read it, and no major player has announced using it in its answers. Having one isn’t forbidden, but don’t count on it for your visibility.

Should I create a Markdown (.md) version of every page for the robots?

No, it’s actually risky. If those files are indexable, you create duplicate content at scale and dilute your crawl budget. Serving a different version to robots and humans verges on cloaking, which Google penalizes. Write good pages once, for your readers: that’s what the machines read too.

Can robots.txt really block AI bots?

Yes, it’s the only file with a real effect. You can allow or deny named user agents like GPTBot, ClaudeBot, PerplexityBot or Google-Extended in it. It’s an instruction that well-behaved bots respect, not an unbreakable wall. And beware: blocking everything also means disappearing from AI answers. It’s a trade-off, not a default setting.

Does Kimoun use an llms.txt?

Yes, an llms-full.txt. In full transparency: it brings us no AI citations, and that’s not why we keep it. It’s a clean index of our pages, almost free to generate, handy for our own tools and our workflow. We keep it because it costs almost nothing, not because it ranks us.

Do a licence or terms on my site stop an AI from using my content?

Only partly. Under European law, which applies in Guadeloupe, you can reserve your rights against text and data mining with a machine-readable signal, and refuse training bots in robots.txt. But enforcement remains imperfect and the framework keeps changing. I’m not a lawyer: for a real protection strategy, get proper advice. For most local businesses, the question isn’t blocking the AI anyway: it’s becoming the source it cites.