AI Development · Spec · 3 min read · Published Apr 27, 2026

llms.txt · AGENTS.md · text/markdown variants · 7-point audit

Markdown-First Content Architecture

Agentic crawlers — GPTBot, ClaudeBot, PerplexityBot — prefer Markdown to HTML when both are offered. Most sites do not offer Markdown. The fix is a small standards play: llms.txt at the root, an AGENTS.md sectioning standard, and text/markdown variants alongside the HTML pages.

Digital Applied Team · Senior strategists
Published: Apr 27, 2026 · Read time: 3 min
Sources: llms.txt spec · AGENTS.md spec · Anthropic + OpenAI research
Citation lift: 30-50% · median, first 30 days, pages exposed via llms.txt (field data)
Files: 2 · llms.txt + AGENTS.md (where applicable)
Engineering cost: 1-2 days · for a typical Next.js / WordPress site
Audit: 7 points · the checklist we run on every engagement

Agentic crawlers (GPTBot, ClaudeBot, PerplexityBot, GoogleOther, CCBot) consume what they can read most cleanly. When a site serves both HTML and Markdown variants, the crawlers preferentially ingest the Markdown — fewer rendering quirks, no JavaScript layer, no boilerplate to strip. By April 2026 the citation-rate uplift from exposing Markdown variants is consistent enough to call it a standards play, not an experiment.

The architecture has three artefacts: an llms.txt at the site root that indexes the high-value pages, an AGENTS.md file (where the site ships agentic features) that mirrors the agent-relevant repo structure, and a Markdown variant served at the same canonical URL with ?format=md or under a parallel /md/... path. This post is the spec.

Key takeaways
  1. Crawlers prefer Markdown to HTML when both are offered. Anthropic and OpenAI both publish guidance pointing to Markdown variants as the preferred format. Cloudflare AI rendering data shows Markdown ingestion rates 3-4× higher than HTML on sites that expose both. The preference is consistent across crawlers; not exposing Markdown leaves citation rate on the table.
  2. llms.txt is the agentic-crawler equivalent of sitemap.xml. It tells the crawler which pages on your site are worth indexing for agent purposes. The syntax is plain Markdown with a defined sectioning convention. Most agencies skip it because it does not exist in any traditional SEO tool's checklist; the citation-rate lift on indexed pages is consistent in the 30-50% range.
  3. AGENTS.md is for sites that ship agentic features, not for marketing sites. If your site has a chat agent, a search-with-AI feature, a code-gen surface, or any tool that calls external agents, AGENTS.md is the standard for telling those agents where to find documentation and capabilities. Marketing-only sites do not need AGENTS.md; they only need llms.txt + Markdown variants.
  4. Markdown rendering is a one-handler change in modern frameworks. In Next.js, Astro, Nuxt, and SvelteKit, the change is a route handler that returns the same source content with Content-Type: text/markdown. Engineering cost: 1-2 days for a typical site. The cost is small enough that the question is why this is not already shipped, not whether to ship it.
  5. The 7-point audit is what we run to verify a site is agent-ready. llms.txt valid; AGENTS.md valid (where applicable); Markdown variants render; robots.txt allows agentic crawlers; schema valid; TTFB under 1.5s for crawler IPs; sample queries return citations. Pass all 7 to ship; revisit quarterly.

01 · Premise · Why agentic crawlers prefer Markdown.

HTML is rendered for browsers. The boilerplate, the rendering scripts, the lazy-loaded chrome, the analytics tags — none of that helps a crawler extracting content for an answer. The crawler has to strip it all to recover the actual prose. Each stripping step is an opportunity for content loss or rendering bugs.

Markdown is the source format. No boilerplate; no rendering layer; the link structure is explicit; the headings are unambiguous. Crawlers built to ingest Markdown — and the major agentic crawlers all are — get cleaner content for less work. They reciprocate by indexing the Markdown route more exhaustively than the HTML route.

"We added Markdown variants on a Friday afternoon. By the end of the next month, ChatGPT was citing the site three times more often. We spent half a day building it."— Engineering lead, B2B SaaS, January 2026

02 · File tree · The file-tree spec.

The minimal agent-ready file tree:

/                       # site root
├─ llms.txt             # required · index for agentic crawlers
├─ AGENTS.md            # if you ship agentic features
├─ robots.txt           # allow GPTBot, ClaudeBot, PerplexityBot
├─ sitemap.xml          # traditional crawlers · still required
├─ /index.md            # Markdown variant of the homepage
└─ /<page>/             # for each canonical page
   ├─ page.html         # rendered HTML
   └─ page.md           # Markdown variant

The Markdown variant should be at a predictable URL: either /<page>.md, /<page>?format=md, or /md/<page>. All three patterns work; pick one and use it consistently. We default to /<page>?format=md in our reference implementations because it preserves the canonical URL.
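The three URL patterns reduce to a one-line transform on the canonical URL. A minimal sketch of the mapping (the helper name and pattern labels are ours, not part of any spec):

```typescript
// Map a canonical page URL to its Markdown-variant URL.
// "query" preserves the canonical URL (our default); "suffix" and
// "prefix" are the /<page>.md and /md/<page> alternatives.
type VariantPattern = "query" | "suffix" | "prefix";

function markdownVariantUrl(
  canonical: string,
  pattern: VariantPattern = "query",
): string {
  const url = new URL(canonical);
  switch (pattern) {
    case "query":
      url.searchParams.set("format", "md");
      return url.toString();
    case "suffix":
      // Strip a trailing slash before appending the extension.
      url.pathname = url.pathname.replace(/\/$/, "") + ".md";
      return url.toString();
    case "prefix":
      url.pathname = "/md" + url.pathname;
      return url.toString();
  }
}
```

Whichever pattern you pick, routing every variant through one helper like this keeps the convention consistent across the site.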

03 · llms.txt · Syntax + index.

llms.txt is a plain-text file at the site root, written in Markdown, with a defined sectioning convention. The minimum structure:

# Digital Applied

> Senior agentic-AI strategists for engineering and product
> teams. We design GEO programs, multi-agent workflows, and
> token-budget operations.

## Docs

- [Services overview](https://www.digitalapplied.com/services): Practice areas and engagement types.
- [Agentic AI service](https://www.digitalapplied.com/services/agentic-ai): Reference architectures and rollout patterns.

## Optional

- [Blog index](https://www.digitalapplied.com/blog): Original research, frameworks, and playbooks.
- [Case studies](https://www.digitalapplied.com/case-studies): Anonymised engagement outcomes.

Syntax conventions: an H1 with the brand name, a blockquote with a one-paragraph elevator pitch, an ## Docs section listing the high-value canonical pages with one-line descriptions, and an ## Optional section listing supporting content the crawler can index but is lower priority.
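Those conventions are mechanical enough to lint in CI. A minimal sketch of a structural check, assuming the conventions as described above (the function name and error strings are ours):

```typescript
// Lint the minimum llms.txt structure: an H1 brand line, a
// blockquote pitch, a "## Docs" section, and a one-line
// description after every listed link.
function lintLlmsTxt(source: string): string[] {
  const errors: string[] = [];
  const lines = source.split("\n");
  if (!lines.some((l) => /^# \S/.test(l))) errors.push("missing H1 brand line");
  if (!lines.some((l) => l.startsWith("> "))) errors.push("missing blockquote pitch");
  if (!lines.includes("## Docs")) errors.push("missing ## Docs section");
  for (const l of lines) {
    // A Markdown link bullet must carry a ": description" suffix.
    if (/^- \[.+\]\(.+\)/.test(l) && !/\):\s*\S/.test(l)) {
      errors.push(`link without description: ${l}`);
    }
  }
  return errors; // empty array = structurally valid
}
```

Running this against /llms.txt on every deploy catches the most common regression: a link added without its one-line description.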

04 · AGENTS.md · Sectioning standard.

AGENTS.md is for sites that ship agentic features. The file tells calling agents where to find documentation, where the tool-use endpoints are, and what the conventions are for interacting with the site programmatically. Five required sections.

Section 1 · ## Overview · what the site does agentically (required)

One paragraph describing the agentic surface: chat agent, search-with-AI, code-gen, embedded copilot. Include links to the user-facing docs.

Section 2 · ## Capabilities · what tools / endpoints exist (required)

Bullet list of the tools available to calling agents. For each: name, purpose, endpoint, schema link. The capabilities section is what allows another agent to plan a multi-step task that involves your site.

Section 3 · ## Conventions · rate limits, auth, structured output (required)

Rate limits per agent, auth requirements (API key, OAuth, anonymous), structured-output conventions, idempotency keys. The conventions section is what stops calling agents from making bad assumptions.

Section 4 · ## Examples · worked tool-use examples (required)

Two or three end-to-end examples of an agent calling the site successfully. Examples accelerate adoption: the calling agent can pattern-match instead of inferring from the spec.

Section 5 · ## Changelog · dated changes to the agentic surface (required)

Dated entries listing changes to the agentic surface. Calling agents read this to check whether the spec they cached is current. Skip the changelog and you guarantee a long tail of agents calling against stale conventions.
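Put together, a minimal AGENTS.md skeleton following the five sections might look like this (every endpoint, URL, and limit below is illustrative, not part of the spec):

```markdown
# AGENTS.md

## Overview
Example.com ships an embedded search-with-AI feature. User-facing
docs live at https://example.com/docs (illustrative URL).

## Capabilities
- search: query the product catalogue · POST /api/agent/search · schema: /schemas/search.json

## Conventions
- Rate limit: 60 req/min per agent · Auth: API key via `X-Api-Key` header
- All responses are JSON; retries must send an `Idempotency-Key` header.

## Examples
1. Search then fetch: POST /api/agent/search with {"q": "pricing"},
   then follow the `url` field of the top hit.

## Changelog
- 2026-04-27: initial agentic surface published.
```

The skeleton is deliberately terse; each section grows with the agentic surface, but the five headings stay fixed.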

05 · Rendering · Markdown alongside HTML.

The implementation is straightforward in modern frameworks. Most sites already author content in Markdown or MDX; the change is a route handler that returns the source instead of the rendered HTML when the request is for the Markdown variant.

Stack · Next.js (App Router) · route handler

Add a route handler at /<page>/route.ts that reads the source MDX and returns it with Content-Type: text/markdown. Or use the file-routing convention with route segments; both work. Most Next.js apps ship this in 4-8 hours including testing.
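A minimal sketch of such a handler, assuming source files live under a content/ directory keyed by slug (the file layout and the markdownResponse helper are ours; this is one way to wire it, and a route.ts cannot share a segment with a page.tsx, so adapt the routing to your app):

```typescript
// app/[slug]/route.ts (sketch): serve the Markdown source when
// ?format=md is present; otherwise defer to the HTML rendering path.
import { readFile } from "node:fs/promises";
import path from "node:path";

// Pure helper: wrap Markdown source in a correctly typed Response.
export function markdownResponse(source: string): Response {
  return new Response(source, {
    headers: { "Content-Type": "text/markdown; charset=utf-8" },
  });
}

export async function GET(request: Request): Promise<Response> {
  const url = new URL(request.url);
  if (url.searchParams.get("format") !== "md") {
    // Not a variant request; let the HTML route answer it.
    return new Response(null, { status: 404 });
  }
  const slug = url.pathname.replace(/^\/|\/$/g, "") || "index";
  const source = await readFile(
    path.join(process.cwd(), "content", `${slug}.mdx`),
    "utf8",
  );
  return markdownResponse(source);
}
```

The key detail is the Content-Type header: without text/markdown, crawlers treat the response as plain text of unknown provenance.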
Stack · Astro · content collection

Astro has built-in support: the .md content collection can be served as plain Markdown via a route segment with Content-Type: text/markdown. Often the cheapest stack to add Markdown variants to.
Stack · WordPress · plugin or custom route

Plugin or PHP route. The WP Markdown export is acceptable but often loses fidelity (shortcodes, embeds). Most agencies on WordPress regenerate Markdown from the source content rather than converting from HTML.
Stack · Static-site generators (Hugo, Jekyll, Eleventy) · output format

Trivial: the source already is Markdown. Add an output format that serves the .md file with the right Content-Type. Often a one-line config change.
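In Hugo, for example, the extra output can be declared in the site config. A sketch along these lines (section names follow Hugo's mediaTypes / outputFormats / outputs config; verify the exact keys against your Hugo version):

```toml
# hugo.toml (sketch): publish a plain-Markdown output alongside HTML.
[mediaTypes."text/markdown"]
  suffixes = ["md"]

[outputFormats.Markdown]
  mediaType = "text/markdown"
  baseName = "index"
  isPlainText = true

[outputs]
  page = ["HTML", "Markdown"]
```

Jekyll and Eleventy have equivalent mechanisms (collections output and permalink templates respectively); the pattern is the same in each.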

06 · Audit · The 7-point readiness audit.

Audit 1 · llms.txt valid + indexed (foundation)

File exists at /llms.txt, follows the spec sections (H1, blockquote, ## Docs, ## Optional), each linked URL returns 200, descriptions are one line.
Audit 2 · AGENTS.md valid where applicable (conditional)

If the site ships agentic features, AGENTS.md exists with all 5 required sections. For a marketing-only site, this audit is N/A.
Audit 3 · Markdown variants render (critical)

For 10 sample pages, the Markdown variant returns 200 with Content-Type: text/markdown and the content matches the HTML version's source. No JavaScript needed to render.
Audit 4 · robots.txt allows agentic crawlers (quiet failure)

GPTBot, ClaudeBot, PerplexityBot, GoogleOther, CCBot are all allowed. Many sites carry a legacy robots.txt that blocks one or more by accident.
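A robots.txt stanza that passes this audit might look like the following (the user-agent tokens are the ones the five crawlers publish; add your own disallow rules as needed):

```text
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: CCBot
Allow: /
```

Per-agent stanzas matter because a crawler obeys the most specific group that names it; a broad `User-agent: *` disallow added later will not override these explicit allows.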
Audit 5 · Schema valid + minimal (multiplier)

Article + Organization + WebSite + BreadcrumbList. No HowTo, FAQPage, or Review (forbidden by Google policy or restricted to specific verticals). Validate with the Schema.org validator.
Audit 6 · TTFB < 1.5s under crawler load (performance budget)

Sample 20 pages with crawler-IP user-agents during peak hours. P75 TTFB under 1.5s. Pages over 1.5s get sampled, not exhaustively crawled.
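The pass/fail arithmetic on that sample is a percentile check. A minimal sketch, assuming TTFB samples in milliseconds and the nearest-rank percentile method (function names are ours):

```typescript
// P75: the value at or below which 75% of samples fall
// (nearest-rank method, 1-based rank).
function p75(samplesMs: number[]): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length);
  return sorted[rank - 1];
}

// Audit 6 passes when P75 TTFB is under the 1.5s budget.
function passesTtfbBudget(samplesMs: number[], budgetMs = 1500): boolean {
  return p75(samplesMs) < budgetMs;
}
```

P75 rather than the mean because crawler schedulers react to tail latency: one slow page in four is enough to downgrade a site to sampling.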
Audit 7 · Sample queries return citations (smoke test)

Run the brand's top 30 query intents through ChatGPT, Claude, and Perplexity. Confirm the brand domain appears in citations on at least 30% of queries (baseline). Below 30% on a site that has passed audits 1-6 indicates an editorial-layer issue.
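Scoring that smoke test is a simple rate over the query basket. A sketch, assuming each engine response has been reduced to the list of domains it cited (the types and the 0.3 threshold mirror the baseline above; names are ours):

```typescript
// One answer-engine response: the query and the domains it cited.
interface QueryResult {
  query: string;
  citedDomains: string[];
}

// Fraction of queries whose citations include the brand domain.
function citationRate(results: QueryResult[], brandDomain: string): number {
  const hits = results.filter((r) => r.citedDomains.includes(brandDomain));
  return hits.length / results.length;
}

// Audit 7 passes at the 30% baseline.
function passesCitationSmokeTest(
  results: QueryResult[],
  brandDomain: string,
): boolean {
  return citationRate(results, brandDomain) >= 0.3;
}
```

Collecting the citedDomains lists is the manual part; once they are in a spreadsheet or JSON file, the pass/fail is mechanical.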

07 · Results · What we measure post-rollout.

Once the architecture ships, the relevant signals to track for the next 90 days:

Day 0-30 · Crawler hit-rate on .md routes · server logs (adoption signal)

GPTBot, ClaudeBot, and PerplexityBot user-agents hitting the .md routes. Expect the ratio of .md hits to .html hits to climb from a 0% baseline to 30-60% within 30 days.
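Computing that ratio from access logs is a small script. A sketch, assuming combined-format log lines with the user-agent in the final quoted field (parsing details vary by server config; the function name is ours):

```typescript
// Share of agentic-crawler requests that hit a Markdown variant
// (.md suffix or ?format=md) rather than the HTML route.
const AGENTIC_UA = /GPTBot|ClaudeBot|PerplexityBot/;

function mdHitRatio(logLines: string[]): number {
  let md = 0;
  let html = 0;
  for (const line of logLines) {
    if (!AGENTIC_UA.test(line)) continue; // only agentic crawlers
    // Request path is the token after the method in the quoted request line.
    const match = line.match(/"(?:GET|HEAD) (\S+)/);
    if (!match) continue;
    const reqPath = match[1];
    if (reqPath.endsWith(".md") || reqPath.includes("format=md")) md += 1;
    else html += 1;
  }
  return md + html === 0 ? 0 : md / (md + html);
}
```

Run it over a rolling 7-day window; a ratio stuck at 0% after two weeks usually means the variants are not discoverable from llms.txt.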
Day 30-60 · Citation rate per engine · 100-prompt monthly basket (headline signal)

Track citation rate (CR) per engine. Expect a 30-50% lift on engines that have actively recrawled (ChatGPT and Claude move first; Perplexity follows; Gemini lags 2-4 weeks).
Day 60-90 · Answer share + position · AISVS sub-metrics (quality signal)

Answer share (% of answer text sourced to the brand) and position score (where in the answer the citation appears) both lift after CR climbs. Lifts here indicate the editorial layer is also working.
Quarterly · Re-audit + drift check · 7-point checklist (sustain)

Re-run the 7-point audit each quarter. Common drift: schema breaks during a CMS upgrade, the Markdown route loses its Content-Type header, robots.txt gets tightened. Catch drift early, before citation rate slides.

08 · Conclusion · Standards-setting, cheaply.

Markdown-first content architecture, April 2026

Markdown-first is one of those small standards plays where adopting early is most of the win — and the engineering cost is small enough that the question is why every site has not already shipped it.

llms.txt indexes the site for crawlers. AGENTS.md tells calling agents where to find capabilities and conventions. Markdown variants give crawlers a clean source. Together the three artefacts make a site agent-ready in a way that traditional SEO architecture does not.

Ship them. The engineering cost is 1-2 days for a typical site. The citation-rate uplift is consistent in the 30-50% range over 30 days. Run the 7-point audit before declaring done; revisit quarterly to catch drift.

The standards landscape is still moving — llms.txt and AGENTS.md will likely converge or get superseded over the next 18 months. That is fine. Adopt the current spec; track the standards body; migrate when the time comes. Sitting out the standards play until things settle leaves citation rate on the table for 12-24 months.

Agentic-readiness rollouts

Stop optimising HTML. Ship Markdown.

We design and ship markdown-first content architectures for B2B SaaS, DTC, and B2B services brands — llms.txt, AGENTS.md (where applicable), Markdown variants, robots.txt audit, and the 7-point readiness audit. Most engagements ship within 30 days.

Free consultation · Expert guidance · Tailored solutions
What we work on

Markdown-first engagements

  • llms.txt drafting + indexing
  • AGENTS.md drafting (sites with agentic features)
  • Markdown route implementation across stacks
  • robots.txt + crawler-access audit
  • 7-point readiness audit + quarterly re-check
FAQ · Markdown-first content architecture

The questions we get every week.

Will llms.txt and AGENTS.md change under us, and should we wait for the specs to settle?

Possibly, and no. Both specs are still evolving: the llms.txt spec by Jeremy Howard has gone through three minor revisions since launch, and the AGENTS.md spec at agents.md is even newer. The migration cost from spec v1 to spec v2 is small, usually a sectioning change and occasionally a new required field. Adopting the current spec is the right move because the citation-rate lift compounds over time; waiting for the spec to settle leaves 12-24 months of compounding lift on the table. Track the spec; migrate when needed; the marginal cost of migration is much smaller than the marginal cost of waiting.