Agentic crawlers (GPTBot, ClaudeBot, PerplexityBot, GoogleOther, CCBot) consume what they can read most cleanly. When a site serves both HTML and Markdown variants, the crawlers preferentially ingest the Markdown — fewer rendering quirks, no JavaScript layer, no boilerplate to strip. By April 2026 the citation-rate uplift from exposing Markdown variants is consistent enough to call it a standards play, not an experiment.
The architecture has three artefacts: an llms.txt at the site root that indexes the high-value pages, an AGENTS.md file (where the site ships agentic features) that mirrors the agent-relevant repo structure, and a Markdown variant served at the same canonical URL with ?format=md or under a parallel /md/... path. This post is the spec.
- 01 · Crawlers prefer Markdown to HTML when both are offered. Anthropic and OpenAI both publish guidance pointing to Markdown variants as the preferred format. Cloudflare AI rendering data shows Markdown ingestion rates 3-4× higher than HTML on sites that expose both. The preference is consistent across crawlers; not exposing Markdown is leaving citation rate on the table.
- 02 · llms.txt is the agentic-crawler equivalent of sitemap.xml. It tells the crawler which pages on your site are worth indexing for agent purposes. The syntax is plain Markdown with a defined sectioning convention. Most agencies skip it because it does not exist in any traditional SEO tool's checklist; the citation-rate lift on indexed pages is consistent in the 30-50% range.
- 03 · AGENTS.md is for sites that ship agentic features, not for marketing sites. If your site has a chat agent, a search-with-AI feature, a code-gen surface, or any tool that calls external agents — AGENTS.md is the standard for telling those agents where to find documentation and capabilities. Marketing-only sites do not need AGENTS.md; they only need llms.txt + Markdown variants.
- 04 · Markdown rendering is a one-handler change in modern frameworks. In Next.js, Astro, Nuxt, and SvelteKit, the change is a route handler that returns the same source content with Content-Type: text/markdown. Engineering cost: 1-2 days for a typical site. The cost is small enough that the question is why this is not already shipped, not whether to ship it.
- 05 · The 7-point audit is what we run to verify a site is agent-ready. llms.txt valid, AGENTS.md valid (where applicable), Markdown variants render, robots.txt allows agentic crawlers, schema valid, TTFB under 1.5s for crawler IPs, sample queries return citations. Pass all 7 to ship; revisit quarterly.
01 — Premise: Why agentic crawlers prefer Markdown.
HTML is rendered for browsers. The boilerplate, the rendering scripts, the lazy-loaded chrome, the analytics tags — none of that helps a crawler extracting content for an answer. The crawler has to strip it all to recover the actual prose. Each stripping step is an opportunity for content loss or rendering bugs.
Markdown is the source format. No boilerplate; no rendering layer; the link structure is explicit; the headings are unambiguous. Crawlers built to ingest Markdown — and the major agentic crawlers all are — get cleaner content for less work. They reciprocate by indexing the Markdown route more exhaustively than the HTML route.
"We added Markdown variants on a Friday afternoon. By the end of the next month, ChatGPT was citing the site three times more often. We spent half a day building it."
— Engineering lead, B2B SaaS, January 2026
02 — File tree: The file-tree spec.
The minimal agent-ready file tree:
/                  # site root
├─ llms.txt        # required · index for agentic crawlers
├─ AGENTS.md       # if you ship agentic features
├─ robots.txt      # allow GPTBot, ClaudeBot, PerplexityBot
├─ sitemap.xml     # traditional crawlers · still required
├─ /index.md       # Markdown variant of the homepage
└─ /<page>/        # for each canonical page
   ├─ page.html    # rendered HTML
   └─ page.md      # Markdown variant

The Markdown variant should be at a predictable URL: either /<page>.md, /<page>?format=md, or /md/<page>. All three patterns work; pick one and use it consistently. We default to /<page>?format=md in our reference implementations because it preserves the canonical URL.
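For the robots.txt entry in the tree above, a minimal allow-list sketch; the sitemap URL is a placeholder, and crawler user-agent tokens should be verified against each vendor's current docs before shipping:

```
# Allow the major agentic crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```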
03 — llms.txt: Syntax + index.
llms.txt is a plain-text file at the site root, written in Markdown, with a defined sectioning convention. The minimum structure:
# Digital Applied
> Senior agentic-AI strategists for engineering and product
> teams. We design GEO programs, multi-agent workflows, and
> token-budget operations.
## Docs
- [Services overview](https://www.digitalapplied.com/services): Practice areas and engagement types.
- [Agentic AI service](https://www.digitalapplied.com/services/agentic-ai): Reference architectures and rollout patterns.
## Optional
- [Blog index](https://www.digitalapplied.com/blog): Original research, frameworks, and playbooks.
- [Case studies](https://www.digitalapplied.com/case-studies): Anonymised engagement outcomes.

Syntax conventions: an H1 with the brand name, a blockquote with a one-paragraph elevator pitch, an ## Docs section listing the high-value canonical pages with one-line descriptions, and an ## Optional section listing supporting content the crawler can index but is lower priority.
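Those conventions are mechanical enough to lint in CI. A minimal shape check, as a sketch — the function name and error strings are illustrative, not part of any spec:

```typescript
// Verifies the llms.txt conventions above: an H1 first, a blockquote pitch,
// a "## Docs" section, and "- [title](url): description" entries.

type LlmsCheck = { ok: boolean; errors: string[] };

function validateLlmsTxt(source: string): LlmsCheck {
  const lines = source.split("\n").map((l) => l.trim()).filter(Boolean);
  const errors: string[] = [];

  if (!lines[0]?.startsWith("# ")) errors.push("first line must be an H1 with the brand name");
  if (!lines.some((l) => l.startsWith("> "))) errors.push("missing blockquote elevator pitch");
  if (!lines.includes("## Docs")) errors.push("missing ## Docs section");

  // Every list entry should read "- [title](url): one-line description".
  const entry = /^- \[[^\]]+\]\(https?:\/\/[^)]+\): .+$/;
  for (const l of lines.filter((l) => l.startsWith("- "))) {
    if (!entry.test(l)) errors.push(`malformed entry: ${l}`);
  }

  return { ok: errors.length === 0, errors };
}
```

Wire it into the build so a malformed llms.txt fails the deploy rather than silently degrading crawler ingestion.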
04 — AGENTS.md: Sectioning standard.
AGENTS.md is for sites that ship agentic features. The file tells calling agents where to find documentation, where the tool-use endpoints are, and what the conventions are for interacting with the site programmatically. Five required sections.
## Overview · what does this site do agentically · Required
One paragraph describing the agentic surface — chat agent, search-with-AI, code-gen, embedded copilot. Include links to the user-facing docs.

## Capabilities · what tools / endpoints exist · Required
Bullet list of tools available to calling agents. For each: name, purpose, endpoint, schema link. The capabilities section is what allows another agent to plan a multi-step task that involves your site.

## Conventions · rate limits, auth, structured output · Required
Rate limits per agent, auth requirements (API key, OAuth, anonymous), structured-output conventions, idempotency keys. The conventions section is what stops calling agents from making bad assumptions.

## Examples · worked tool-use examples · Required
Two or three end-to-end examples of an agent calling the site successfully. Examples accelerate adoption — the calling agent can pattern-match instead of inferring from the spec.

## Changelog · dated changes to the agentic surface · Required
Dated entries listing changes to the agentic surface. Calling agents read this to understand whether the spec they cached is current. Skip the changelog and you guarantee a long tail of agents calling against stale conventions.
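Assembled, a skeleton AGENTS.md covering the five required sections might look like the following; every endpoint, tool name, and date here is a placeholder for illustration, not a recommendation:

```markdown
# AGENTS.md

## Overview
Example.com ships an embedded search-with-AI surface. User docs: https://www.example.com/docs

## Capabilities
- search_products · query the catalogue · POST /api/agent/search · schema: /api/agent/search/schema.json

## Conventions
- Auth: API key via `Authorization: Bearer <key>`
- Rate limit: 60 requests/min per key
- Responses are JSON; send an `Idempotency-Key` header on writes

## Examples
1. POST /api/agent/search with {"q": "widgets"} returns ranked results as JSON

## Changelog
- 2026-03-01: search_products schema v2 (added `locale` parameter)
```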
05 — Rendering: Markdown alongside HTML.
The implementation is straightforward in modern frameworks. Most sites already author content in Markdown or MDX; the change is a route handler that returns the source instead of the rendered HTML when the request is for the Markdown variant.
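The ?format=md handler can be sketched framework-agnostically using the Fetch API Request/Response globals (available in Node 18+ and the shape Next.js App Router handlers use). loadMarkdownSource and the canonical host are placeholders for however your site stores page sources:

```typescript
// Demo content store — stands in for reading the source MDX/Markdown from disk or a CMS.
const loadMarkdownSource = async (path: string): Promise<string | null> =>
  path === "/services" ? "# Services\n\nPractice areas and engagement types." : null;

export async function GET(req: Request): Promise<Response> {
  const url = new URL(req.url);

  if (url.searchParams.get("format") !== "md") {
    // Not the Markdown variant — fall through to the normal HTML render path.
    return new Response("(render HTML as usual)", { status: 200 });
  }

  const md = await loadMarkdownSource(url.pathname);
  if (md === null) return new Response("Not found", { status: 404 });

  return new Response(md, {
    status: 200,
    headers: {
      // text/markdown is the registered media type (RFC 7763).
      "Content-Type": "text/markdown; charset=utf-8",
      // Keep the canonical URL stable for crawlers that see both variants.
      "Link": `<https://www.example.com${url.pathname}>; rel="canonical"`,
    },
  });
}
```

The same shape drops into a Next.js route handler almost verbatim; other frameworks differ only in how the request reaches the function.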
Next.js (App Router) · Route handler
Add a route handler at /<page>/route.ts that reads the source MDX and returns it with Content-Type: text/markdown. Or use the file-routing convention with route segments — both work. Most Next.js apps ship this in 4-8 hours including testing.

Astro · Content collection
Astro has built-in support — the .md content collection can be served as plain Markdown via a route segment with Content-Type: text/markdown. Often the cheapest stack to add Markdown variants to.

WordPress · Plugin or custom
Plugin or PHP route. The WP Markdown export is acceptable but often loses fidelity (shortcodes, embeds). Most agencies on WordPress regenerate Markdown from the source content rather than converting from HTML.

Static-site generators (Hugo, Jekyll, Eleventy) · Output format
Trivial — the source IS Markdown. Add an output format that serves the .md file with the right Content-Type. Often a one-line config change.

06 — Audit: The 7-point readiness audit.
1. llms.txt valid + indexed (Foundation). File exists at /llms.txt, follows the spec sections (H1, blockquote, ## Docs, ## Optional), each linked URL returns 200, descriptions are one line each.
2. AGENTS.md valid where applicable (Conditional). If the site ships agentic features, AGENTS.md exists with all 5 required sections. If it is a marketing-only site, this check is N/A.
3. Markdown variants render (Critical). For 10 sample pages, the Markdown variant returns 200 with Content-Type: text/markdown and the content matches the HTML version's source. No JavaScript needed to render.
4. robots.txt allows agentic crawlers (Quiet failure). GPTBot, ClaudeBot, PerplexityBot, GoogleOther, CCBot all allowed. Most sites have a legacy robots.txt that blocks one or more by accident.
5. Schema valid + minimal (Multiplier). Article + Organization + WebSite + BreadcrumbList. No HowTo, FAQPage, or Review (forbidden by Google policy or restricted to specific verticals). Validate with Schema.org's validator.
6. TTFB < 1.5s under crawler load (Performance budget). Sample 20 pages with crawler user-agents during peak hours; P75 TTFB under 1.5 s. Pages over 1.5 s get sampled, not exhaustively crawled.
7. Sample queries return citations (Smoke test). Run the brand's top 30 query intents through ChatGPT, Claude, and Perplexity. Confirm the brand domain appears in citations on at least 30% of queries (baseline). Below 30% on a site that has passed checks 1-6 indicates an editorial-layer issue.

07 — Results: What we measure post-rollout.
Once the architecture ships, the relevant signals to track for the next 90 days:
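The first of those signals comes straight out of the server access logs. A sketch of the computation, assuming combined-log-format lines; the bot list and the quoted "GET /path HTTP" request shape are assumptions to adapt to your own logs:

```typescript
// Share of agentic-crawler requests that hit the Markdown variant,
// computed from raw access-log lines.
const AGENT_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"];

function markdownHitShare(logLines: string[]): number {
  let mdHits = 0;
  let total = 0;
  for (const line of logLines) {
    // Only count traffic from the agentic crawlers we care about.
    if (!AGENT_BOTS.some((bot) => line.includes(bot))) continue;
    const m = line.match(/"(?:GET|HEAD) (\S+) HTTP/);
    if (!m) continue;
    total++;
    const path = m[1];
    if (path.endsWith(".md") || path.includes("format=md")) mdHits++;
  }
  return total === 0 ? 0 : mdHits / total;
}
```

Run it over a day of logs at a time and chart the share; the 30-60% adoption band below is the range to expect within 30 days.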
- Crawler hit-rate on .md routes (Adoption signal) · source: server logs. GPTBot, ClaudeBot, PerplexityBot user-agents hitting the .md routes. Expect the ratio of .md hits to .html hits to climb from a 0% baseline to 30-60% within 30 days.
- Citation rate per engine (Headline signal) · source: 100-prompt monthly basket. Track citation rate (CR) per engine. Expect a 30-50% lift on engines that have actively recrawled (ChatGPT and Claude move first; Perplexity follows; Gemini lags 2-4 weeks).
- Answer share + position (Quality signal) · source: AISVS sub-metrics. Answer share (% of answer text sourced to the brand) and position score (where in the answer the citation appears) both lift after CR climbs. Lifts here indicate the editorial layer is also working.
- Re-audit + drift check (Sustain) · source: the 7-point checklist. Re-run the 7-point audit each quarter. Common drift: schema breaks during a CMS upgrade, the Markdown route loses its Content-Type header, robots.txt gets tightened. Catch drift early before citation rate slides.

08 — Conclusion: Standards-setting, cheaply.
Markdown-first is one of those small standards plays where adopting early is most of the win — and the engineering cost is small enough that the question is why every site has not already shipped it.
llms.txt indexes the site for crawlers. AGENTS.md tells calling agents where to find capabilities and conventions. Markdown variants give crawlers a clean source. Together the three artefacts make a site agent-ready in a way that traditional SEO architecture does not.
Ship them. The engineering cost is 1-2 days for a typical site. The citation-rate uplift is consistent in the 30-50% range over 30 days. Run the 7-point audit before declaring done; revisit quarterly to catch drift.
The standards landscape is still moving — llms.txt and AGENTS.md will likely converge or get superseded over the next 18 months. That is fine. Adopt the current spec; track the standards body; migrate when the time comes. Sitting out the standards play until things settle leaves citation rate on the table for 12-24 months.