This agentic SEO case study covers a six-month rollout at a mid-market SaaS — anonymised at the client's request, but with the operational shape preserved in full. Going in, the archive was four hundred pages strong but the organic traffic curve had been declining for three quarters, ranking volatility from Google's AI Mode rollouts was eating featured snippets week-over-week, and citation share across the major LLM engines was effectively zero. Six months later, organic recovered to within two percent of its prior peak, citation share lifted to twenty-two percent, and the publishing engine that produced those numbers is now in steady-state operation.
The rollout followed the cadence we run across SaaS engagements of this size — a hundred-point audit in month one, a three-subagent agentic crawler stood up alongside it, a velocity ramp from twelve to eighty posts per quarter staged against a topical map, citation tracking installed in month two, and a quarterly refresh queue that turns crawler-surfaced decay into writer tickets. The cadence is not novel; the outcomes are. What is worth documenting is the discipline that held the program on the curve when the early indicators went quiet between weeks four and seven, when the typical client instinct is to abandon the plan.
What follows is the engagement broken down by phase: the situation the team inherited, the audit and crawler that constituted month one, the velocity and citation workstreams that filled months two and three, the outcomes at the day-one-eighty board review, and the lessons that replicate to similar mid-market SaaS rollouts. The numbers are real, the patterns are reusable, and the failure modes worth naming are the ones that nearly derailed this program despite a well-specified plan.
- 01 — Audit-first surfaces the highest-leverage fixes before any new content ships. A severity-weighted hundred-point audit run in week one identified twelve critical findings — broken canonicals, schema drift, orphaned high-value pages — that would have eaten roughly forty percent of any new-content lift if the velocity ramp had started unscaffolded.
- 02 — The agentic crawler scales editorial judgment without scaling the team. A three-subagent crawler — auditor, reporter, and citation tracker under a single orchestrator — running on a weekly cron caught template regressions, surfaced refresh candidates, and tracked citations without an analyst running spreadsheets. One engineering week of build, six months of compounding leverage.
- 03 — Velocity ramp must lag readiness, not lead it. The team held publishing at twelve posts in month one while the audit closed and the topical map was approved. Ramping to forty posts in month two and eighty in month three only after the foundation was clean is what kept quality from drifting and per-post citation share from collapsing.
- 04 — Citation tracking is the new ranking proxy that moves first. Share of citation across ChatGPT, Claude, and Perplexity climbed from near zero to fifteen percent by week eight — four to six weeks before organic sessions inflected. Without the tracker, the program would have read as flat for two months longer than it actually was.
- 05 — Quarterly refresh prevents the corpus from decaying back to baseline. Two refresh tickets per writer per week, sourced from the crawler and prioritised by traffic-at-risk, kept the published corpus compounding through the second quarter; a sketch of that ticket queue follows this list. Without refresh discipline, the curve flattens by month nine on every program we have run.
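That ticket queue is small enough to sketch. A minimal version of the assignment step, assuming crawler output that carries a traffic-at-risk score per candidate; the field names, the week label, and the helper function are illustrative, not the engagement's actual schema:

```typescript
// Sketch of the refresh-ticket assignment described above: take crawler-surfaced
// refresh candidates, sort by traffic-at-risk, and deal out two tickets per
// writer per week. Field names are illustrative, not the engagement's schema.
interface RefreshCandidate {
  url: string;
  reason: string;        // e.g. "stale stats", "citation landed on a thin section"
  trafficAtRisk: number; // organic sessions the page stands to lose
}

interface RefreshTicket extends RefreshCandidate {
  assignee: string;
  week: string; // ISO week label, e.g. "2025-W14"
}

const TICKETS_PER_WRITER_PER_WEEK = 2;

function buildWeeklyQueue(
  candidates: RefreshCandidate[],
  writers: string[],
  week: string,
): RefreshTicket[] {
  const capacity = writers.length * TICKETS_PER_WRITER_PER_WEEK;
  return [...candidates]
    .sort((a, b) => b.trafficAtRisk - a.trafficAtRisk) // highest traffic-at-risk first
    .slice(0, capacity)
    .map((c, i) => ({ ...c, assignee: writers[i % writers.length], week }));
}

// e.g. buildWeeklyQueue(crawlerOutput, ["writer-a", "writer-b", "writer-c"], "2025-W14")
```

The sort-then-deal shape is the point: the highest-risk pages always land in the current week's tickets, and capacity stays pinned at two per writer.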
01 — Situation
A four-hundred-page archive in slow decline.
The client was a vertical-SaaS company in a regulated industry with roughly four hundred indexed pages, an in-house content team of one editor and two writers, and an organic traffic curve that had been declining for three quarters at the time of engagement kickoff. The decline was not catastrophic — a roughly eighteen-percent fall from the prior peak — but the trend line was unambiguous, and the team had already cycled through two consultants without moving the curve.
The diagnosis from the first kickoff conversation was three overlapping problems. First, ranking volatility from Google AI Mode rollouts had been chewing featured snippets and position-zero placements at a roughly monthly cadence for two quarters. Second, the archive had accumulated technical debt — schema drift, broken canonicals after a CMS migration the prior year, internal-link rot — that no one had time to triage. Third, the corpus was effectively invisible to LLM engines: a manual sample of fifty priority queries against ChatGPT, Claude, and Perplexity surfaced the client's domain zero times. The team knew the program needed a reset; they did not have the capacity to design one in-house.
- Indexed corpus at kickoff: Roughly half pillar / cluster content, half blog posts from a 2023-2024 sprint that had aged out. About seventy pages drove eighty percent of remaining organic traffic — typical long-tail decay. (80/20 traffic skew)
- Decline from prior peak: Three quarters of sequential decline. The slope had been steepening — losses concentrated in head-term clusters where AI Mode had begun summarising answers directly in the SERP. (AI Mode pressure)
- Citations across ChatGPT · Claude · Perplexity: Manual sample of fifty priority queries. The domain was effectively invisible to LLM engines despite being a recognised brand inside the vertical. No author schema, no FAQ schema, no citation hooks. (Invisible to LLMs)
- Editor + 2 writers: Capable team but operating at a publishing rhythm of three posts per month — too thin for cluster execution at the archive size. No bandwidth for technical SEO or crawler infrastructure. (Bandwidth-constrained)

The board mandate was explicit and quantified: recover organic traffic to within five percent of the prior peak within two quarters, establish a measurable LLM citation share, and put an operating model in place that the in-house team could run without external help by month seven. Budget was tight enough to take a Tier 100 velocity engagement off the table; the answer landed on Tier 40 — forty posts per quarter through the first half of the engagement, ramping to eighty in the second half as the crawler and templates took over the mechanical work.
One element worth naming at the start: the client had run two prior SEO engagements with traditional agencies in the preceding eighteen months. Both had produced volume — roughly a hundred and fifty posts between them — and neither had moved the organic curve. The team was sceptical going in, and rightly so. The argument we made at kickoff was that volume without scaffolding does not compound — that audit, crawler, and refresh discipline were the missing pieces, not more content. The first thirty days were structured to prove that argument before any new posts shipped.
"The team had already shipped a hundred and fifty posts in eighteen months. The problem was not volume — it was that none of it was scaffolded into a system that compounded."— Engagement kickoff retrospective
02 — Approach: 100-Point Audit
Five domains, severity-weighted, twelve critical findings.
Week one ran the hundred-point audit across the archive. The checklist is partitioned into five domains — crawl and index, on-page editorial, schema and structured data, internal link graph, and LLM citation readiness — with each finding scored for severity and traffic-at-risk. Out of one hundred checks the audit surfaced twelve critical findings, twenty-eight high findings, and roughly fifty medium-or-low items that would route to the refresh queue rather than block the velocity ramp.
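To make that routing concrete, here is a minimal sketch of how severity-weighted findings can be split into ramp-blocking work and refresh-queue items; the field names, weights, and scoring formula are illustrative assumptions rather than the audit's actual rubric:

```typescript
// Minimal sketch of severity-weighted audit routing (assumed field names).
// Critical/high findings block the velocity ramp; medium/low route to the refresh queue.
type Severity = "critical" | "high" | "medium" | "low";

interface AuditFinding {
  url: string;
  check: string;          // e.g. "canonical-target-valid", "article-schema-present"
  severity: Severity;
  trafficAtRisk: number;  // monthly organic sessions on the affected page(s)
}

// Weighting is illustrative: severity sets the base, traffic-at-risk scales it.
const SEVERITY_WEIGHT: Record<Severity, number> = { critical: 8, high: 4, medium: 2, low: 1 };

function priorityScore(f: AuditFinding): number {
  return SEVERITY_WEIGHT[f.severity] * Math.log10(1 + f.trafficAtRisk);
}

function routeFindings(findings: AuditFinding[]) {
  const blocking = findings
    .filter((f) => f.severity === "critical" || f.severity === "high")
    .sort((a, b) => priorityScore(b) - priorityScore(a));
  const refreshQueue = findings
    .filter((f) => f.severity === "medium" || f.severity === "low")
    .sort((a, b) => priorityScore(b) - priorityScore(a));
  return { blocking, refreshQueue };
}
```

In this sketch severity sets the base weight and traffic-at-risk scales it logarithmically, so a medium finding on a high-traffic pillar can still outrank a low finding on a page nobody visits.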
The five domain summaries below show where the archive actually broke down. The twelve critical findings clustered heavily in schema and canonicals — the legacy of the prior CMS migration — with the LLM citation readiness gap essentially universal across the corpus. None of this was surprising; the corpus had been built before structured data and author schema became table stakes for citation, and the team had not been resourced to retrofit.
- Crawl & index (Robots · Sitemap · Canonical · Depth): The audit surfaced thirty-four orphaned pages at depth four or deeper, eleven canonicals pointing to deprecated URLs, and a sitemap that had not been resubmitted to Search Console in six months. Three critical findings, eight high.
- On-page editorial (Title · Meta · H1 · Lede · Intent): Roughly forty percent of posts had title tags above the sixty-character truncation threshold; twenty-two posts shared duplicate H1s with their cluster pillars. Two critical findings, six high — all scheduled into early phase two.
- Schema & structured data (Article · Author · Organization · FAQ): The largest single concentration of critical findings — five of the twelve, plus four high. Article schema missing on forty percent of posts, no Author schema with sameAs, Organization schema present but stale. The schema CI gate became the highest-leverage week-six install.
- Internal link graph (Pillar reach · Anchor distribution): Three pillars reachable in two clicks from the homepage, four pillars at depth three-plus. Anchor distribution heavy on exact match — a legacy pattern from earlier agency engagements. Two critical findings, six high.
- LLM citation readiness (Authorship · Citability · Extracts): Effectively zero. No author bios with credentials, no sameAs links to verifiable profiles, no clean extracts answering specific questions. Zero critical findings, four high — but this was the gap that explained the citation-share number, so it was treated as foundational, not optional.

Remediation ran in parallel with the crawler build through weeks two and three. Schema critical findings closed first — the highest-leverage class because they affected both traditional ranking and LLM citation. Canonical and sitemap fixes followed in week two. Orphaned-page consolidation ran into week three: of the thirty-four orphans, eleven were consolidated into existing cluster posts via redirects, nine were promoted into the internal-link graph via the topical map, and the remaining fourteen were marked for refresh-or-retire in the steady-state queue.
The remaining piece of the audit — the topical map — was the week-three deliverable. The map identified six priority clusters covering the product's core value propositions and the three sub-verticals the business was actively expanding into. Each cluster received a pillar designation (head-term, four-to-five-thousand-word definitive piece) and a supporting list of eight to twelve posts in deliberate publication order. The map was signed off by the in-house editor at the end of week three. Phase two opened on schedule.
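For teams reproducing the deliverable, the topical map itself is a small data structure. A sketch of its shape under the constraints described above (six clusters, a head-term pillar per cluster, eight to twelve supporting posts in publication order); every name and value here is hypothetical:

```typescript
// Illustrative shape of the topical map deliverable, not the engagement's format.
interface SupportingPost {
  workingTitle: string;
  intent: "sub-question" | "comparison" | "decision-framework" | "long-tail";
  publishOrder: number; // position within the cluster's publication sequence
}

interface Cluster {
  pillar: {
    headTerm: string;
    targetWordCount: number; // the four-to-five-thousand-word definitive piece
  };
  supporting: SupportingPost[]; // eight to twelve posts per cluster
}

type TopicalMap = Cluster[]; // six priority clusters at sign-off

// Hypothetical example entry:
const exampleCluster: Cluster = {
  pillar: { headTerm: "<core value-proposition head term>", targetWordCount: 4500 },
  supporting: [
    { workingTitle: "<high-volume sub-question>", intent: "sub-question", publishOrder: 1 },
    { workingTitle: "<comparison post>", intent: "comparison", publishOrder: 2 },
  ],
};
```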
03 — Approach: Agentic Crawler
Three subagents, one orchestrator, weekly cron.
The agentic crawler is the program's production infrastructure from week two onward — not a tooling indulgence. It runs on a weekly cron against the production domain and produces three artefacts: a delta report on audit-checklist findings, a refresh-candidate queue with traffic-at-risk priority, and a citation-tracker dataset covering the priority query set. The build is documented in our agentic crawler tutorial; here we cover how it was adapted to this engagement and what it caught.
The architecture is three subagents managed by a single orchestrator. The orchestrator schedules the run, hands work to each subagent, and reconciles the outputs into a single weekly report consumed by the content team. The auditor subagent walks the canonical URL list and runs the hundred-point checklist against each page. The reporter subagent surfaces deltas — new findings, closed findings, regressed findings — and writes the report. A third subagent handles citation tracking against the LLM engines on the same cadence. None of this is exotic; the leverage is in the weekly rhythm and the discipline of feeding the output back into the writer ticket queue.
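The reconciliation step is the part of the orchestrator worth sketching. The actual build sits on the Claude Code subagent runtime; the version below strips that away and shows only the weekly delta logic, with illustrative types and a hypothetical record of previously closed findings:

```typescript
// Illustrative shape of the weekly reconciliation: compare this week's auditor
// output against last week's and classify each finding as new, closed, or regressed.
// Types and function names are stand-ins, not the engagement's actual code.
interface Finding {
  url: string;
  check: string;
  severity: "critical" | "high" | "medium" | "low";
}

interface DeltaReport {
  newFindings: Finding[];       // present this week, never seen or closed before
  closedFindings: Finding[];    // present last week, absent this week
  regressedFindings: Finding[]; // closed in an earlier run, reappeared this week
}

const key = (f: Finding) => `${f.url}::${f.check}`;

function reconcile(
  current: Finding[],
  previous: Finding[],
  everClosed: Set<string>, // keys of findings closed in any earlier run
): DeltaReport {
  const prevKeys = new Set(previous.map(key));
  const currKeys = new Set(current.map(key));

  return {
    newFindings: current.filter((f) => !prevKeys.has(key(f)) && !everClosed.has(key(f))),
    regressedFindings: current.filter((f) => !prevKeys.has(key(f)) && everClosed.has(key(f))),
    closedFindings: previous.filter((f) => !currKeys.has(key(f))),
  };
}
```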
- Orchestrator + scheduler (Weekly cron · Queue · Reconcile): Owns the weekly run. Splits the URL set across auditor instances, dispatches the citation tracker, reconciles outputs into a single delta report and ticket queue. Roughly two hundred lines of TypeScript on top of the Claude Code subagent runtime. Cadence: weekly, Sunday 02:00.
- Auditor (100-point checklist · Per-URL): Crawls each URL, validates against the audit checklist, emits findings with severity and traffic-at-risk score. Catches schema regressions, canonical drift, and template-level changes the CMS introduced. Roughly 800 URLs per run, fifteen-minute runtime. Output: findings.json.
- Citation tracker (ChatGPT · Claude · Perplexity): Queries the three engines for the priority query set (started at fifty queries, grew to two hundred by month four). Logs citation hits, position, and snippet. The leading-indicator feed for the visibility scorecard. Output: citations.json.
- Weekly digest (Deltas · Ticket queue · Scorecard): Reconciles auditor + citation outputs against the prior week. Writes a markdown digest read by the content team Monday morning, plus a JSON ticket queue consumed by the refresh playbook. The handoff that makes the crawler operational. Cadence: Monday 06:00 digest.
- Engineering investment (1 week · 1 engineer): Total build: roughly one engineering week spread across weeks two and three of phase one. Subsequent maintenance: roughly two hours per month tuning rules. The leverage ratio is what makes the agentic crawler a foundational install, not an optional one. ROI: 6+ months compounding.

One detail worth emphasising. The crawler caught a CMS template regression in week six that would have stripped canonical tags from one of the largest pillar clusters had the team not been alerted within twenty-four hours. The engineering fix was thirty minutes; the traffic loss avoided, estimated against historical clickthrough rates, would have been roughly fifteen percent of cluster organic sessions for the duration of the regression. That single catch paid back the crawler build cost inside the first quarter — and the crawler caught two more template regressions of similar shape across the six months of the engagement. For teams considering whether to skip the crawler build in phase one, that pattern is the answer.
The citation-tracker subagent deserves its own note. The initial query set of fifty priority queries was selected by the in-house editor based on intent fit and commercial value, not on search volume. By month four the set had grown to two hundred queries as the visibility scorecard became the primary monthly artefact and broader coverage became more useful than narrower precision. The growth curve — fifty to a hundred to two hundred queries — is the cadence we suggest to most engagements; starting tight keeps the tracker honest, expanding as the data accumulates keeps it relevant.
04 — Approach: Velocity Ramp
Twelve to eighty posts per quarter, staged against the map.
The velocity ramp ran in three stages over the six months of the engagement. The shape mattered: ramping too aggressively in month two would have run ahead of the audit remediation and produced thin content stacked onto an unfixed foundation; ramping too slowly would have failed the board mandate by month six. The middle path — twelve, forty, eighty posts across the three stages of effective publication — is the cadence that worked here and that we have replicated in three subsequent engagements of similar shape.
The team shape backed the ramp at each stage. Month one held at the existing rhythm — three writers, three posts per week, working on cluster fillers while the audit closed. Month two added a second editor and two contracted writer pairs to absorb the jump from twelve to forty posts per quarter. Month four added a third contracted writer pair and tightened the editorial review loop to absorb the ramp to eighty per quarter. Each step was deliberate; each step had a gate that needed to close before the next opened.
Velocity ramp · posts per quarter, staged against the map
Source: engagement publication log

The publishing order against the topical map mattered as much as the volume. The team published pillar pieces first in month two — six clusters meant six pillars, scheduled two per week across weeks five through seven. The supporting cluster posts began landing in week eight in deliberate order: high-volume sub-questions first, then comparison and decision-framework posts, then long-tail specifics. The internal-link graph was wired post-by-post as the cluster filled out, not deferred to a sweep at the end. By the close of month three, four of the six clusters were structurally complete; the remaining two filled out through months four and five.
The quality side of the ramp was non-negotiable. Schema validation was wired into the deploy pipeline at the end of week six: invalid Article, Author, or Organization schema blocked the merge. The agentic crawler caught what the CI gate missed on a weekly cycle. The editorial review loop ran two passes per post — structural fit against the map on draft, citability and schema check before publish. Twenty percent of drafts came back for a substantive rewrite in month two; that rate dropped to under five percent by month four as the writer pairs internalised the style guide and the schema requirements.
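The CI gate itself can be small. A minimal sketch of the idea, assuming statically built HTML in a dist/ directory and JSON-LD markup; the path, the required-type list, and the treatment of author markup as a Person node are assumptions, not the client's actual pipeline:

```typescript
// Minimal CI gate sketch: scan built HTML for JSON-LD blocks and fail the build
// when required schema types are missing or a block does not parse.
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

const REQUIRED_TYPES = ["Article", "Person", "Organization"]; // author markup checked as a Person node
const JSON_LD_RE = /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/gi;

function schemaTypesIn(html: string): { types: string[]; parseErrors: number } {
  const types: string[] = [];
  let parseErrors = 0;
  for (const match of html.matchAll(JSON_LD_RE)) {
    try {
      const data = JSON.parse(match[1]);
      // A block may be a single node, an array of nodes, or an @graph wrapper.
      const nodes = Array.isArray(data) ? data : data["@graph"] ?? [data];
      for (const node of nodes) {
        const t = node?.["@type"];
        if (typeof t === "string") types.push(t);
        else if (Array.isArray(t)) types.push(...t);
      }
    } catch {
      parseErrors++;
    }
  }
  return { types, parseErrors };
}

let failures = 0;
for (const file of readdirSync("dist").filter((f) => f.endsWith(".html"))) {
  const { types, parseErrors } = schemaTypesIn(readFileSync(join("dist", file), "utf8"));
  const missing = REQUIRED_TYPES.filter((t) => !types.includes(t));
  if (missing.length > 0 || parseErrors > 0) {
    console.error(`${file}: missing [${missing.join(", ")}], unparsable blocks: ${parseErrors}`);
    failures += 1;
  }
}
process.exit(failures > 0 ? 1 : 0); // a non-zero exit is what blocks the merge
```

Run as a build step, the non-zero exit is what blocks the merge; the weekly crawler then catches anything that regresses after deploy.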
"Ramping velocity without team shape produces thin posts on a broken base. The middle path — twelve, forty, eighty — is what worked when the audit gated it correctly."— Engagement retrospective, month-three review
05 — Approach: Citation Tracking
The leading indicator that moves first.
Citation tracking went live in week six, two weeks ahead of the standard cadence — the team wanted the leading-indicator feed up before the velocity ramp made attribution noisy. The choice of engines was deliberate: ChatGPT for breadth, Claude for citability quality, Perplexity for the LLM-with-citations baseline. Gemini was added in month four when its answer-engine surfaces became more relevant in the client's vertical. The four-way breakdown below shows the reasoning that drove the engine selection and the metric design.
- ChatGPT (breadth): Largest user base, broadest query distribution, ChatGPT search and answer modes both relevant. Tracked weekly across the full priority query set. The volume baseline against which the other engines normalise. Track from week 1.
- Claude (citability): Highest citation-quality bar in benchmark testing — Claude tends to cite primary sources with proper attribution when content is structured for citability. Best signal for whether the citation-readiness work is paying off. Track from week 1.
- Perplexity (baseline): LLM-with-citations product whose surfaces explicitly favour citable content with author credentials and structured data. Treated as the calibration baseline — the floor at which citation share should land if the readiness work has held. Track from week 1.
- Gemini (expansion): Added in month four as AI Overviews and answer-engine surfaces became more relevant in the client's vertical. The choice point is when an engine's surface impacts the client's actual traffic — earlier is over-investment, later misses signal. Track from month 4.

The citation-share metric design matters more than the engines chosen. The tracker logged three things per query per engine: whether the domain was cited, the position in the citation list, and the snippet text. Share-of-citation was reported weekly as the percentage of priority queries on which the domain was cited at all, segmented by engine. Position was tracked but not aggregated — a citation at position three is materially different from position one, but aggregating positions across queries produces a misleading single number. Snippet text became the basis for the refresh queue's citability tickets when citations existed but landed on stale or thin content.
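The rollup is deliberately simple. A sketch of the record shape and the weekly share-of-citation computation, segmented by engine and leaving positions unaggregated; field names are illustrative:

```typescript
// Sketch of the share-of-citation rollup described above: per query per engine,
// the tracker logs whether the domain was cited, the citation position, and the
// snippet. Share-of-citation is the percentage of priority queries cited at all,
// reported per engine; positions stay on the raw records rather than being averaged.
type Engine = "chatgpt" | "claude" | "perplexity" | "gemini";

interface CitationRecord {
  query: string;
  engine: Engine;
  cited: boolean;
  position: number | null; // 1-based position in the citation list, null if not cited
  snippet: string | null;  // snippet text, used to source citability refresh tickets
}

function shareOfCitation(records: CitationRecord[]): Partial<Record<Engine, number>> {
  const byEngine = new Map<Engine, { cited: Set<string>; total: Set<string> }>();
  for (const r of records) {
    const bucket = byEngine.get(r.engine) ?? { cited: new Set<string>(), total: new Set<string>() };
    bucket.total.add(r.query);
    if (r.cited) bucket.cited.add(r.query);
    byEngine.set(r.engine, bucket);
  }
  const share: Partial<Record<Engine, number>> = {};
  for (const [engine, { cited, total }] of byEngine) {
    share[engine] = Math.round((cited.size / total.size) * 100); // % of priority queries cited
  }
  return share;
}
```

Keeping position and snippet on the raw record is what lets the refresh queue pull citability tickets straight from the tracker output.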
The trajectory of share-of-citation across the six months is the cleanest leading-indicator chart from the engagement. Effectively zero at kickoff, the metric began climbing in week four as the schema and authorship critical findings closed, accelerated in weeks six through ten as the velocity ramp added citable content against the topical map, and stabilised in the high teens through month four before climbing again to twenty-two percent at month six. Crucially, share-of-citation moved approximately five weeks before organic sessions did. The monthly board review used the citation curve to set expectations for the lagging session curve — a conversation that would have been impossible without the tracker.
06 — Outcomes
Day-180 board review: three numbers that moved.
The day-one-eighty board review was structured around three numbers: organic traffic recovery against prior peak, LLM citation share, and ranking stability against the AI Mode volatility that had eroded the prior baseline. Each was reported as actuals against the projection set at the day-ninety review, with narrative on where the program had outperformed or underperformed expectations and what the month-seven steady-state operating model would look like.
The honest framing is that the program did not deliver on every projection. Organic sessions recovered to within two percent of the prior peak — the projection had been within five percent, so this was an outperform. LLM citation share landed at twenty-two percent — the projection had been fifteen percent, also an outperform. Ranking stability was mixed — featured snippet retention improved materially, but position-one stability on head terms remained volatile, consistent with the broader trend across the vertical. Two of the three board expectations were met, with outperformance on the two metrics the board cared about most.
- Organic sessions vs prior peak (projection: −5% · actual: −2%): Recovered to within two percent of the eighteen-month peak the prior decline had walked back from. The slope at month six was still positive — the projection for month nine is to clear the prior peak by roughly five percent.
- Citation share, weighted (projection: 15% · actual: 22%): Volume-weighted share across the four engines tracked. Claude led at twenty-eight percent (citability work paid off), Perplexity at twenty-four percent, ChatGPT at nineteen, Gemini at sixteen. All four trending upward at month six.
- Featured snippets retained (mixed: snippets up, position-1 volatile): Featured snippets owned by the domain at month six versus the trailing trend pre-engagement. The schema and snippet-block refresh work compounded with the new-content velocity. AI Mode volatility on head terms was harder to stabilise.
- Cost per published post (per-post economics improved): Cost per net-new published post at month six versus the prior agency baseline. Cluster execution at scale plus the writer-pair model brought the per-post economics down meaningfully, even before factoring in the refresh queue's compounding leverage.

The two outperforming numbers were not coincidence. Organic session recovery came in ahead of projection because the citation-share lift began driving direct-from-LLM traffic earlier than the model assumed — Claude and Perplexity in particular surfaced the domain in their answer panels on roughly fifteen percent of in-vertical queries by month five, which contributed measurably to total sessions before traditional organic had fully recovered. The citation-share outperformance came from the LLM-readiness work being more foundational than the projection assumed: the gap was so large at kickoff that even modest authorship and schema improvements moved the needle disproportionately.
The mixed-result number — ranking stability on head terms — is the honest one to discuss. AI Mode rollouts continued through the six months of the engagement and erased position-one placements on three head-term queries the program had hoped to recover. The mitigation was the cluster strategy: where head-term position-ones were volatile, the supporting cluster posts captured long-tail traffic that the AI Mode summaries did not absorb. Net traffic recovered; volatility on individual high-value queries did not. The board accepted this trade-off as the new operating reality of the vertical.
07 — Lessons + Replication
Four lessons that replicate.
The patterns from this engagement have replicated across three subsequent mid-market SaaS rollouts. The four lessons below are the ones that consistently determine whether a similar program lands its day-one-eighty projections or stalls between month two and month four. None of them are novel; all of them are routinely under-resourced.
- Foundation work compounds: The hundred-point audit and the schema CI gate are not setup tasks — they are foundational infrastructure that compounds across every post published afterwards. Programs that defer the audit to ramp velocity earlier consistently underperform on per-post citation share. Audit first, always.
- Crawler is the moat: One engineering week of agentic crawler build pays back inside the first quarter on regression catches alone — and accumulates leverage every week thereafter via refresh-candidate surfacing and citation tracking. Skipping the build is the single most-violated phase-one discipline. Build the crawler in week 2.
- Citation tracking moves first: Across this engagement and three subsequent ones, share-of-citation moved four to six weeks before organic sessions did. Installing the tracker in month two and using it as the primary monthly artefact is what keeps stakeholders invested through the lagging-indicator quiet period. Track from week 6.
- Refresh prevents decay: Programs that publish for six months but never install a refresh queue plateau by month nine on every comparison we have run. Two refresh tickets per writer per week, crawler-sourced and traffic-at-risk prioritised, is what turns the corpus from one-shot into compounding. Install refresh in month 4.

One replication caveat is worth naming. This engagement was a mid-market SaaS with a four-hundred-page archive, an in-house team capable of absorbing the velocity ramp with the addition of contracted writer pairs, and a board that held the line on a six-month measurement window. The cadence replicates well to similar mid-market SaaS engagements; it does not necessarily replicate to small domains under fifty pages (where the audit overhead is disproportionate to the corpus), to enterprise multi-locale engagements (where the team shape and orchestration complexity step up sharply), or to programs measured on a ninety-day window (where the lagging indicators do not have time to compound). The pattern is honest about its scope.
For teams considering whether this rollout shape fits their context, the natural starting point is the program plan we run for first-engagement clients — see our 90-day program launch playbook for the phased deliverables, gate criteria, and team-shape recommendations by velocity tier. The agentic crawler build is documented step-by-step in our agentic crawler tutorial — the same three-subagent architecture that ran this engagement, with the orchestrator and reporter agents ready to clone.
The broader takeaway from this engagement is the one we opened with: agentic SEO rollouts compound on a quarterly clock, not a monthly one, and the discipline that holds a program on that clock is the audit-plus-crawler-plus-refresh combination — not raw publishing volume. Three replication engagements later, the pattern has held. The failure modes — cancelling at thirty days, ramping velocity without team shape, skipping refresh, measuring vanity metrics — show up in every program we have audited externally. The mid-market SaaS engagement documented here is the cleanest demonstration to date that getting the discipline right is what produces the compounding, not the talent of any single writer or the cleverness of any single tactic.
Agentic SEO rollouts compound — the audit + crawler + cadence pattern replicates.
Six months, four hundred pages, three subagents — and a mid-market SaaS that walked into the engagement losing ground on the curve walked out with organic sessions within two percent of its prior peak, citation share at twenty-two percent across the four LLM engines tracked, and a steady-state operating model the in-house team could run without external help. The mechanism that produced those outcomes was not novel; it was audit-first, crawler-second, velocity-ramped-to-readiness, citation-tracked, refresh-disciplined. The discipline is what compounded.
The lessons that replicate are the ones documented in section seven. Foundation work compounds. The agentic crawler is the moat. Citation tracking moves first. Refresh discipline prevents decay. Three subsequent engagements have followed the same cadence with similar outcomes — different verticals, different archive sizes, different team shapes, same shape of the curve. The program shape is portable; the discipline to hold to it through the lagging-indicator quiet period is the harder part.
For SaaS teams looking at a similar inflection — a stalling archive, AI Mode volatility eating featured snippets, citation share invisible against the major engines — the answer is rarely more content. It is the scaffolding around the content: the audit that surfaces the highest-leverage fixes, the crawler that catches regressions and feeds the refresh queue, the velocity ramp that lags readiness, the citation tracker that measures the leading indicator, and the quarterly refresh that keeps the corpus compounding. Run that pattern and the curve does what it did here.