Agentic crawlers (GPTBot, ClaudeBot, PerplexityBot, GoogleOther, CCBot) consume what they can read most cleanly. When a site serves both HTML and Markdown variants, the crawlers preferentially ingest the Markdown — fewer rendering quirks, no JavaScript layer, no boilerplate to strip. By April 2026 the citation-rate uplift from exposing Markdown variants is consistent enough to call it a standards play, not an experiment.
The architecture has three artefacts: an llms.txt at the site root that indexes the high-value pages, an AGENTS.md file (where the site ships agentic features) that mirrors the agent-relevant repo structure, and a Markdown variant served at the same canonical URL with ?format=md or under a parallel /md/... path. This post is the spec.
- 01 · Crawlers prefer Markdown to HTML when both are offered. Anthropic and OpenAI both publish guidance pointing to Markdown variants as the preferred format. Cloudflare AI rendering data shows Markdown ingestion rates 3-4× higher than HTML on sites that expose both. The preference is consistent across crawlers; not exposing Markdown is leaving citation rate on the table.
- 02 · llms.txt is the agentic-crawler equivalent of sitemap.xml. It tells the crawler which pages on your site are worth indexing for agent purposes. The syntax is plain Markdown with a defined sectioning convention. Most agencies skip it because it does not exist in any traditional SEO tool's checklist; the citation-rate lift on indexed pages is consistent in the 30-50% range.
- 03 · AGENTS.md is for sites that ship agentic features, not for marketing sites. If your site has a chat agent, a search-with-AI feature, a code-gen surface, or any tool that calls external agents — AGENTS.md is the standard for telling those agents where to find documentation and capabilities. Marketing-only sites do not need AGENTS.md; they only need llms.txt + Markdown variants.
- 04 · Markdown rendering is a one-handler change in modern frameworks. In Next.js, Astro, Nuxt, and SvelteKit, the change is a route handler that returns the same source content with Content-Type: text/markdown. Engineering cost: 1-2 days for a typical site. The cost is small enough that the question is why this is not already shipped, not whether to ship it.
- 05 · The 7-point audit is what we run to verify a site is agent-ready. llms.txt valid, AGENTS.md valid (where applicable), Markdown variants render, robots.txt allows agentic crawlers, schema valid, TTFB under 1.5s for crawler IPs, sample queries return citations. Pass all 7 to ship; revisit quarterly.
01 — Premise: Why agentic crawlers prefer Markdown.
HTML is rendered for browsers. The boilerplate, the rendering scripts, the lazy-loaded chrome, the analytics tags — none of that helps a crawler extracting content for an answer. The crawler has to strip it all to recover the actual prose. Each stripping step is an opportunity for content loss or rendering bugs.
Markdown is the source format. No boilerplate; no rendering layer; the link structure is explicit; the headings are unambiguous. Crawlers built to ingest Markdown — and the major agentic crawlers all are — get cleaner content for less work. They reciprocate by indexing the Markdown route more exhaustively than the HTML route.
"We added Markdown variants on a Friday afternoon. By the end of the next month, ChatGPT was citing the site three times more often. We spent half a day building it."
— Engineering lead, B2B SaaS, January 2026
02 — File tree: The file-tree spec.
The minimal agent-ready file tree:
/                  # site root
├─ llms.txt        # required · index for agentic crawlers
├─ AGENTS.md       # if you ship agentic features
├─ robots.txt      # allow GPTBot, ClaudeBot, PerplexityBot
├─ sitemap.xml     # traditional crawlers · still required
├─ /index.md       # Markdown variant of the homepage
└─ /<page>/        # for each canonical page
   ├─ page.html    # rendered HTML
   └─ page.md      # Markdown variant

The Markdown variant should be at a predictable URL: either /<page>.md, /<page>?format=md, or /md/<page>. All three patterns work; pick one and use it consistently. We default to /<page>?format=md in our reference implementations because it preserves the canonical URL.
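For the robots.txt entry in the tree above, a minimal allow-list sketch; the sitemap URL is a placeholder, and crawler user-agent tokens should be verified against each vendor's current docs before shipping:

```
# Allow the major agentic crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```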
03 — llms.txt: Syntax + index.
llms.txt is a plain-text file at the site root, written in Markdown, with a defined sectioning convention. The minimum structure:
# Digital Applied
> Senior agentic-AI strategists for engineering and product
> teams. We design GEO programs, multi-agent workflows, and
> token-budget operations.
## Docs
- [Services overview](https://www.digitalapplied.com/services): Practice areas and engagement types.
- [Agentic AI service](https://www.digitalapplied.com/services/agentic-ai): Reference architectures and rollout patterns.
## Optional
- [Blog index](https://www.digitalapplied.com/blog): Original research, frameworks, and playbooks.
- [Case studies](https://www.digitalapplied.com/case-studies): Anonymised engagement outcomes.

Syntax conventions: an H1 with the brand name, a blockquote with a one-paragraph elevator pitch, an ## Docs section listing the high-value canonical pages with one-line descriptions, and an ## Optional section listing supporting content the crawler can index but is lower priority.
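Those conventions are mechanical enough to lint in CI. A minimal shape check, as a sketch — the function name and error strings are illustrative, not part of any spec:

```typescript
// Verifies the llms.txt conventions above: an H1 first, a blockquote pitch,
// a "## Docs" section, and "- [title](url): description" entries.

type LlmsCheck = { ok: boolean; errors: string[] };

function validateLlmsTxt(source: string): LlmsCheck {
  const lines = source.split("\n").map((l) => l.trim()).filter(Boolean);
  const errors: string[] = [];

  if (!lines[0]?.startsWith("# ")) errors.push("first line must be an H1 with the brand name");
  if (!lines.some((l) => l.startsWith("> "))) errors.push("missing blockquote elevator pitch");
  if (!lines.includes("## Docs")) errors.push("missing ## Docs section");

  // Every list entry should read "- [title](url): one-line description".
  const entry = /^- \[[^\]]+\]\(https?:\/\/[^)]+\): .+$/;
  for (const l of lines.filter((l) => l.startsWith("- "))) {
    if (!entry.test(l)) errors.push(`malformed entry: ${l}`);
  }

  return { ok: errors.length === 0, errors };
}
```

Wire it into the build so a malformed llms.txt fails the deploy rather than silently degrading crawler ingestion.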
04 — AGENTS.md: Sectioning standard.
AGENTS.md is for sites that ship agentic features. The file tells calling agents where to find documentation, where the tool-use endpoints are, and what the conventions are for interacting with the site programmatically. Five required sections.
## Overview · what does this site do agentically · Required
One paragraph describing the agentic surface — chat agent, search-with-AI, code-gen, embedded copilot. Include links to the user-facing docs.

## Capabilities · what tools / endpoints exist · Required
Bullet list of tools available to calling agents. For each: name, purpose, endpoint, schema link. The capabilities section is what allows another agent to plan a multi-step task that involves your site.

## Conventions · rate limits, auth, structured output · Required
Rate limits per agent, auth requirements (API key, OAuth, anonymous), structured-output conventions, idempotency keys. The conventions section is what stops calling agents from making bad assumptions.

## Examples · worked tool-use examples · Required
Two or three end-to-end examples of an agent calling the site successfully. Examples accelerate adoption — the calling agent can pattern-match instead of inferring from the spec.

## Changelog · dated changes to the agentic surface · Required
Dated entries listing changes to the agentic surface. Calling agents read this to understand whether the spec they cached is current. Skip the changelog and you guarantee a long tail of agents calling against stale conventions.
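Assembled, a skeleton AGENTS.md covering the five required sections might look like the following; every endpoint, tool name, and date here is a placeholder for illustration, not a recommendation:

```markdown
# AGENTS.md

## Overview
Example.com ships an embedded search-with-AI surface. User docs: https://www.example.com/docs

## Capabilities
- search_products · query the catalogue · POST /api/agent/search · schema: /api/agent/search/schema.json

## Conventions
- Auth: API key via `Authorization: Bearer <key>`
- Rate limit: 60 requests/min per key
- Responses are JSON; send an `Idempotency-Key` header on writes

## Examples
1. POST /api/agent/search with {"q": "widgets"} returns ranked results as JSON

## Changelog
- 2026-03-01: search_products schema v2 (added `locale` parameter)
```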
05 — Rendering: Markdown alongside HTML.
The implementation is straightforward in modern frameworks. Most sites already author content in Markdown or MDX; the change is a route handler that returns the source instead of the rendered HTML when the request is for the Markdown variant.
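The ?format=md handler can be sketched framework-agnostically using the Fetch API Request/Response globals (available in Node 18+ and the shape Next.js App Router handlers use). loadMarkdownSource and the canonical host are placeholders for however your site stores page sources:

```typescript
// Demo content store — stands in for reading the source MDX/Markdown from disk or a CMS.
const loadMarkdownSource = async (path: string): Promise<string | null> =>
  path === "/services" ? "# Services\n\nPractice areas and engagement types." : null;

export async function GET(req: Request): Promise<Response> {
  const url = new URL(req.url);

  if (url.searchParams.get("format") !== "md") {
    // Not the Markdown variant — fall through to the normal HTML render path.
    return new Response("(render HTML as usual)", { status: 200 });
  }

  const md = await loadMarkdownSource(url.pathname);
  if (md === null) return new Response("Not found", { status: 404 });

  return new Response(md, {
    status: 200,
    headers: {
      // text/markdown is the registered media type (RFC 7763).
      "Content-Type": "text/markdown; charset=utf-8",
      // Keep the canonical URL stable for crawlers that see both variants.
      "Link": `<https://www.example.com${url.pathname}>; rel="canonical"`,
    },
  });
}
```

The same shape drops into a Next.js route handler almost verbatim; other frameworks differ only in how the request reaches the function.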
Next.js (App Router) · Route handler
Add a route handler at /<page>/route.ts that reads the source MDX and returns it with Content-Type: text/markdown. Or use the file-routing convention with route segments — both work. Most Next.js apps ship this in 4-8 hours including testing.

Astro · Content collection
Astro has built-in support — the .md content collection can be served as plain Markdown via a route segment with Content-Type: text/markdown. Often the cheapest stack to add Markdown variants to.

WordPress · Plugin or custom
Plugin or PHP route. The WP Markdown export is acceptable but often loses fidelity (shortcodes, embeds). Most agencies on WordPress regenerate Markdown from the source content rather than converting from HTML.

Static-site generators (Hugo, Jekyll, Eleventy) · Output format
Trivial — the source IS Markdown. Add an output format that serves the .md file with the right Content-Type. Often a one-line config change.

06 — Audit: The 7-point readiness audit.
1. llms.txt valid + indexed (Foundation). File exists at /llms.txt, follows the spec sections (H1, blockquote, ## Docs, ## Optional), each linked URL returns 200, descriptions are one line each.
2. AGENTS.md valid where applicable (Conditional). If the site ships agentic features, AGENTS.md exists with all 5 required sections. If it is a marketing-only site, this check is N/A.
3. Markdown variants render (Critical). For 10 sample pages, the Markdown variant returns 200 with Content-Type: text/markdown and the content matches the HTML version's source. No JavaScript needed to render.
4. robots.txt allows agentic crawlers (Quiet failure). GPTBot, ClaudeBot, PerplexityBot, GoogleOther, CCBot all allowed. Most sites have a legacy robots.txt that blocks one or more by accident.
5. Schema valid + minimal (Multiplier). Article + Organization + WebSite + BreadcrumbList. No HowTo, FAQPage, or Review (forbidden by Google policy or restricted to specific verticals). Validate with Schema.org's validator.
6. TTFB < 1.5s under crawler load (Performance budget). Sample 20 pages with crawler user-agents during peak hours; P75 TTFB under 1.5 s. Pages over 1.5 s get sampled, not exhaustively crawled.
7. Sample queries return citations (Smoke test). Run the brand's top 30 query intents through ChatGPT, Claude, and Perplexity. Confirm the brand domain appears in citations on at least 30% of queries (baseline). Below 30% on a site that has passed checks 1-6 indicates an editorial-layer issue.

07 — Results: What we measure post-rollout.
Once the architecture ships, the relevant signals to track for the next 90 days:
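The first of those signals comes straight out of the server access logs. A sketch of the computation, assuming combined-log-format lines; the bot list and the quoted "GET /path HTTP" request shape are assumptions to adapt to your own logs:

```typescript
// Share of agentic-crawler requests that hit the Markdown variant,
// computed from raw access-log lines.
const AGENT_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"];

function markdownHitShare(logLines: string[]): number {
  let mdHits = 0;
  let total = 0;
  for (const line of logLines) {
    // Only count traffic from the agentic crawlers we care about.
    if (!AGENT_BOTS.some((bot) => line.includes(bot))) continue;
    const m = line.match(/"(?:GET|HEAD) (\S+) HTTP/);
    if (!m) continue;
    total++;
    const path = m[1];
    if (path.endsWith(".md") || path.includes("format=md")) mdHits++;
  }
  return total === 0 ? 0 : mdHits / total;
}
```

Run it over a day of logs at a time and chart the share; the 30-60% adoption band below is the range to expect within 30 days.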
- Crawler hit-rate on .md routes (Adoption signal) · source: server logs. GPTBot, ClaudeBot, PerplexityBot user-agents hitting the .md routes. Expect the ratio of .md hits to .html hits to climb from a 0% baseline to 30-60% within 30 days.
- Citation rate per engine (Headline signal) · source: 100-prompt monthly basket. Track citation rate (CR) per engine. Expect a 30-50% lift on engines that have actively recrawled (ChatGPT and Claude move first; Perplexity follows; Gemini lags 2-4 weeks).
- Answer share + position (Quality signal) · source: AISVS sub-metrics. Answer share (% of answer text sourced to the brand) and position score (where in the answer the citation appears) both lift after CR climbs. Lifts here indicate the editorial layer is also working.
- Re-audit + drift check (Sustain) · source: the 7-point checklist. Re-run the 7-point audit each quarter. Common drift: schema breaks during a CMS upgrade, the Markdown route loses its Content-Type header, robots.txt gets tightened. Catch drift early before citation rate slides.

08 — Conclusion: Standards-setting, cheaply.
Markdown-first is one of those small standards plays where adopting early is most of the win — and the engineering cost is small enough that the question is why every site has not already shipped it.
llms.txt indexes the site for crawlers. AGENTS.md tells calling agents where to find capabilities and conventions. Markdown variants give crawlers a clean source. Together the three artefacts make a site agent-ready in a way that traditional SEO architecture does not.
Ship them. The engineering cost is 1-2 days for a typical site. The citation-rate uplift is consistent in the 30-50% range over 30 days. Run the 7-point audit before declaring done; revisit quarterly to catch drift.
The standards landscape is still moving — llms.txt and AGENTS.md will likely converge or get superseded over the next 18 months. That is fine. Adopt the current spec; track the standards body; migrate when the time comes. Sitting out the standards play until things settle leaves citation rate on the table for 12-24 months.