The llms.txt standard emerged in 2024 as a way to tell AI systems which pages on your site actually matter. Think of it as a curated menu for ChatGPT, Claude, and Perplexity — directing them to your best content instead of leaving them to guess. The concept is sound. The execution, from most generators, is not.
We tested four different llms.txt generators on the same website — a calculator site with roughly 150 pages. The results ranged from 21 pages to 147. Same site. Same pages. Wildly different outputs. And the differences matter more than the numbers suggest.
The Promise of llms.txt
The llms.txt specification is straightforward: a text file at your domain root that lists the pages you want AI systems to prioritise when they reference your site. It's like robots.txt for AI visibility — except instead of telling crawlers what to avoid, it tells them what to focus on.
When an AI engine encounters your llms.txt file, it uses it as a guide to understand which pages contain your most valuable, authoritative content. A well-structured llms.txt file can be the difference between AI citing your best work and AI citing your cookie policy.
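The spec itself is plain markdown: an H1 title, a blockquote summary, and H2 sections of annotated links, with an "Optional" section for lower-priority pages. A minimal sketch for a calculator site might look like this (all URLs and descriptions here are illustrative, not taken from any real file):

```
# FreecalcHub

> Free online calculators for finance, health, and everyday maths.

## Calculators

- [Mortgage Calculator](https://freecalchub.com/mortgage): Amortisation schedules with extra-payment support
- [Compound Interest Calculator](https://freecalchub.com/compound-interest): Daily, monthly, and annual compounding

## Guides

- [How Amortisation Works](https://freecalchub.com/guides/amortisation): Step-by-step walkthrough with worked examples

## Optional

- [About](https://freecalchub.com/about)
```

The structure is the point: sections and ordering tell an AI system which pages to weight, which is exactly what a flat URL dump fails to do.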
The problem isn't the standard. The problem is the tools generating these files.
Same Site, Four Generators, Four Different Answers
We ran four llms.txt generators against the same website — FreecalcHub.com, a calculator site with approximately 150 pages of content. Here's what each tool produced:
| Generator | Pages Found | Quality Scoring |
|---|---|---|
| LLM.txt Mastery | 147 | Yes (1–10 scale) |
| Keploy (free) | 133 | No |
| Writesonic | 53 | No |
| SiteSpeakAI | 21 | No |
That's a 7x difference between the lowest and highest page counts. SiteSpeakAI found 21 pages on a 150-page site. That means it missed 86% of the content. Writesonic found barely a third. Even Keploy, which found 133 pages, missed 14 pages that LLM.txt Mastery detected.
Page count alone doesn't tell the full story — but when a generator misses the majority of your content, the llms.txt file it produces is fundamentally incomplete. AI systems relying on it won't know your best pages exist.
The Junk Problem
Finding pages is one thing. Deciding which ones to include is another.
Keploy found 133 pages — close to the full site. But its output was a raw URL dump with no filtering or prioritisation. The generated file included the homepage, the privacy policy, the terms of service, the cookie consent page, and the contact form. All listed alongside the site's actual calculator tools — the pages that AI systems should be referencing.
An llms.txt file that lists everything is functionally useless. The whole point of the standard is curation. If you're telling AI to treat your cookie policy with the same weight as your flagship product page, you're not guiding AI discovery — you're creating noise.
This is where quality scoring matters. A page that explains a complex financial calculation in depth is not equivalent to a page that says "We use cookies to improve your experience." But without scoring, most generators treat them identically.
The JavaScript Rendering Gap
Between 60% and 80% of modern websites are built with JavaScript frameworks — React, Vue, Angular, Next.js, and others. These frameworks render content dynamically in the browser. When you visit one of these sites, your browser executes JavaScript to build the page you see.
Most llms.txt generators don't execute JavaScript. They fetch the raw HTML — the initial shell — and that's all they see. For a React or Vue application, the raw HTML is often nearly empty: a `<div id="root"></div>` and a script tag. No content. No headings. No text.
A generator that can't render JavaScript literally cannot see the content on a majority of modern websites. It's generating an llms.txt file based on an empty page. The resulting file will either be empty, incomplete, or filled with navigation elements and boilerplate that happened to exist in the initial HTML.
Full JavaScript rendering requires a headless browser — Puppeteer, Playwright, or equivalent. This is significantly more complex and expensive to operate than a simple HTTP fetch. It's also non-negotiable if the generator claims to work with modern websites.
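The gap is easy to demonstrate without a browser. One rough heuristic (a sketch of the idea, not how any particular generator works) is to strip scripts and tags from the fetched HTML and count the visible text that remains; an SPA shell yields almost none:

```python
import re

def visible_text_length(raw_html: str) -> int:
    """Approximate visible text in raw HTML by stripping
    script/style blocks and tags, then counting what's left."""
    html = re.sub(r"<(script|style)\b.*?</\1>", "", raw_html,
                  flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<[^>]+>", " ", html)
    return len(" ".join(text.split()))

def looks_like_spa_shell(raw_html: str, threshold: int = 50) -> bool:
    """A fetch-only crawler sees almost no text on an SPA shell."""
    return visible_text_length(raw_html) < threshold

# A typical React shell vs. a server-rendered static page
spa_shell = ('<html><body><div id="root"></div>'
             '<script src="/app.js"></script></body></html>')
static_page = ('<html><body><h1>Mortgage Calculator</h1><p>'
               + "Work out your monthly repayments. " * 10
               + '</p></body></html>')
```

A fetch-only generator processing `spa_shell` has nothing to work with; a headless browser running the same page would see the fully rendered content.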
Why Quality Scoring Matters
The llms.txt specification supports a hierarchical structure: pages can be marked as primary, secondary, or optional. This hierarchy should reflect the actual value of each page to AI systems.
Quality scoring evaluates each page against criteria that predict AI citation value:
- Content depth and uniqueness — Does the page contain substantial, original information?
- Technical relevance — Is the content topically focused and well-structured?
- Information architecture — Are headings, sections, and metadata well-organised?
- AI citation potential — How likely is an AI system to reference this page when answering a relevant question?
On a 1–10 scale, a deep technical guide with structured data, clear headings, and expert authorship might score an 8 or 9. A cookie policy scores a 3. Your homepage — important for navigation but rarely cited by AI — might score a 5.
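A toy version of such a scorer can be sketched in a few lines. The signals and weights below are illustrative assumptions of mine, not LLM.txt Mastery's actual model:

```python
from dataclasses import dataclass

@dataclass
class PageSignals:
    word_count: int            # content depth: substantial body text
    heading_count: int         # information architecture
    has_structured_data: bool  # schema.org markup, rich metadata
    is_boilerplate: bool       # legal/cookie/contact-style page

def citation_score(p: PageSignals) -> int:
    """Map crude page signals onto a 1-10 citation-value score.
    Purely illustrative weighting."""
    score = 1
    score += min(p.word_count // 300, 4)   # up to +4 for depth
    score += min(p.heading_count // 3, 2)  # up to +2 for structure
    if p.has_structured_data:
        score += 2
    if p.is_boilerplate:
        score = min(score, 3)              # cap boilerplate pages
    return max(1, min(score, 10))

deep_guide = PageSignals(word_count=2400, heading_count=9,
                         has_structured_data=True, is_boilerplate=False)
cookie_policy = PageSignals(word_count=600, heading_count=2,
                            has_structured_data=False, is_boilerplate=True)
```

Even a heuristic this crude separates a 2,400-word structured guide from a cookie policy; the point is that some scoring model has to exist before the generator can build a meaningful hierarchy.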
Without this scoring, every page looks the same to the generator. And the llms.txt file it produces tells AI systems that every page is equally important — which is the same as telling them nothing at all.
The Generator Landscape in 2025
The market for llms.txt generators currently splits into three tiers:
Free Tools
Tools like Keploy work for basic page discovery. They'll crawl your site and produce a list of URLs. If your site is simple, static, and small, these tools may be adequate. But they lack quality scoring, JavaScript rendering, framework detection, and ongoing updates. You get a URL list, not a curated AI guide.
Enterprise Platforms
At the other end, enterprise solutions priced at $879 per year and above offer comprehensive features but are cost-prohibitive for solo founders, small businesses, and indie SaaS builders. These tools were built for organisations with dedicated SEO teams and substantial budgets.
The Middle Ground
Between free and enterprise sits a gap that most of the market hasn't addressed. Small businesses and solopreneurs need quality scoring, framework detection, and JavaScript rendering — but at a price point that doesn't require a procurement process.
This is the gap that LLM.txt Mastery was built to fill: a dedicated llms.txt platform with quality scoring, detection for 17+ deployment platforms, and JavaScript rendering — starting with a free tier and scaling to $20 per month.
What to Look for in a Generator
If you're evaluating llms.txt generators, here's what separates a useful tool from a URL dumper:
- Page discovery completeness — Can it find all your pages, not just the ones linked from your homepage? Test it against your sitemap count.
- Quality scoring — Does it differentiate between high-value content and boilerplate? Or does every page get the same treatment?
- JavaScript rendering — If your site uses React, Vue, Angular, or any SPA framework, the generator must execute JavaScript to see your content.
- robots.txt awareness — Does the generator respect your existing crawl rules? A generator that ignores robots.txt may include pages you've deliberately excluded from crawlers.
- Platform detection — Different deployment platforms (Netlify, Vercel, WordPress, Shopify) serve llms.txt differently. Your generator should know how to deploy on your stack.
- Ongoing updates — Your site changes. Your llms.txt file should change with it. A one-time generation is a snapshot; you need a tool that regenerates as your content evolves.
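The first and fourth checks above can be automated in a few minutes. A sketch, assuming you have the sitemap XML, the generated llms.txt, and robots.txt as strings (the helper names are my own):

```python
import re
import xml.etree.ElementTree as ET
from urllib import robotparser

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Extract every <loc> entry from a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")}

def llms_txt_urls(llms_txt: str) -> set[str]:
    """Pull http(s) URLs out of a generated llms.txt file."""
    return set(re.findall(r"https?://[^\s)\]]+", llms_txt))

def coverage(sitemap_xml: str, llms_txt: str) -> float:
    """Fraction of sitemap pages the generator actually found."""
    site = sitemap_urls(sitemap_xml)
    return len(site & llms_txt_urls(llms_txt)) / len(site) if site else 0.0

def disallowed(urls: set[str], robots_txt: str) -> set[str]:
    """URLs the generator should have skipped per robots.txt."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {u for u in urls if not rp.can_fetch("*", u)}
```

Run `coverage()` against your real sitemap: anything well below 1.0 means the generator is missing content, and a non-empty `disallowed()` set means it ignored your crawl rules.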
The llms.txt standard is one of the most practical things you can do for AI visibility right now. But a bad llms.txt file — one that lists every page indiscriminately or misses most of your content entirely — is worse than having no file at all. It actively misleads the AI systems you're trying to reach.
Choose your generator carefully. The file it produces is your introduction to every AI system that visits your site.