SEO · Original Research · 7 min read · Published Apr 26, 2026

1,000 AIOs analyzed · 4.2 avg citations · the page-level signals that drive AI search inclusion

1,000 AI Overviews Analyzed: Citation Pattern Study

We sampled 1,000 Google AI Overviews across 10 query intents and ~30 verticals to map exactly which domains get cited, how often, and what page-level signals predict inclusion. The headline: the top 1% of cited domains capture 47% of all citations, schema-marked pages are cited 2.3× more often, and the median cited page is 14 months old — recency is not the lever most SEOs assume it is.

Digital Applied Team · Senior strategists
Published Apr 26, 2026 · 7 min read
Sources: Original DA dataset · Apr 2026 sample
  • Top 1% domain share: 47% · Wikipedia, Reddit, Forbes, Healthline, gov, edu · extreme concentration
  • Avg citations per AIO: 4.2 · range 2–9 · 8% of AIOs cite 7+ domains
  • Schema markup citation lift: 2.3× vs unstructured equivalents · biggest single lever
  • Median cited page age: 14 months · AIOs are not recency-biased · counter-intuitive

Google's AI Overviews are now the dominant SERP feature for informational and definitional queries in the US — and unlike the ten blue links era, the only traffic they hand out is the citation. Either you are in the citation list, or your visibility for that query is effectively zero. We analyzed 1,000 AI Overviews across 10 query intents to figure out exactly which sites get into that list, how the slots are allocated, and what an editorial team can change this quarter to improve their odds.

The dataset was collected over two weeks in April 2026 — 100 queries per intent class, sampled from US-English desktop search, with the full citation panel captured for every AIO that rendered. Where AIOs failed to render (about 11% of the original sample), the query was replaced with a same-intent alternate so the per-intent counts remained even. Page-level signals — domain authority, content length, schema markup, age — were joined from a separate crawl of the 4,243 unique URLs that ended up in the citation lists.

The headline finding is concentration. The top 1% of cited domains — roughly 12 sites including Wikipedia, Reddit, Forbes, Healthline, Investopedia, NYT, the larger gov and edu domains — capture 47% of all citations. The next 9% capture another 31%. Everything else shares the remaining 22%. If your agentic SEO program is built on the assumption that AIO citations are evenly distributed, the dataset says it is not — and the corollary is that the page signals that get a non-top-1% domain into the list are worth understanding precisely.

Key takeaways
  1. Citation share is hyper-concentrated. The top 1% of cited domains capture 47% of citations. Roughly 12 domains — Wikipedia, Reddit, Forbes, Healthline, Investopedia, NYT, the larger gov/edu — dominate. The next 9% pick up another 31%. Long-tail domains together share 22%, which is the slice an editorial team can realistically compete for.
  2. Average citations per AIO is 4.2, but it varies sharply by intent. Range is 2–9 per overview. Commercial intents skew low (3.1 avg) — Google is more conservative when monetary action is implied. Definitional and how-to intents skew high (5.6 avg). Only 8% of AIOs cite more than 7 domains; the median is 4.
  3. Schema-marked pages are cited 2.3× more often than unstructured equivalents. Article, BreadcrumbList, and HowTo schema all correlate positively with citation rate after controlling for domain authority. The lift is largest on long-form definitional pages, where schema lets the AIO model parse the page structure cleanly. This is the single biggest page-level lever in the dataset.
  4. Page recency matters less than expected — the median cited page is 14 months old. AI Overviews are not recency-biased the way fresh-news ranking is. The model rewards page authority, structure, and named-source citations more than publish date. For evergreen topics, a well-maintained 12–24-month-old page outperforms a fresh-but-thin equivalent.
  5. Page-level signals that predict citation: DA (+0.61), length (1.6× over 2,500 words), named sources (2.1× lift). Domain authority is the strongest single correlate (+0.61 with citation rate). Pages over 2,500 words are cited 1.6× more than pages under 800. Pages with at least one named-source citation in the body are cited 2.1× more than pages with none. These are the three signals to engineer for after schema.

01 · The Thesis · AI Overviews are replacing organic CTR. Citation share is the new ranking.

The mechanical change is simple. When an AIO renders for a query, the click-through rate to the ten blue links collapses — third-party CTR studies through Q1 2026 put the drop at 30–60% depending on vertical, with informational and definitional queries hit hardest. What replaces that traffic is not a click on the AIO panel itself (those are rare) but the citation links inside it: a small number of attributed sources, typically 3–5, that the model has chosen to cite inline with its answer.

Those citation slots are now the only meaningful traffic vector for a large class of queries. If your page is in the citation list it picks up a high-intent click; if it is not, the query is invisible to your site regardless of where you rank in the unrendered blue links. That is why understanding citation pattern — which sites Google chooses, why, and what page-level signals correlate with inclusion — has become the defining question of 2026 SEO. The answer is not what the conventional ranking-factors lists predict.

The framing change
Pre-AIO, an SEO team optimized for position. Post-AIO, an SEO team optimizes for inclusion. Position 3 in the blue links is roughly equivalent to a citation slot in terms of click-through value, but the signals that produce them are different — citation slots reward structured, well-sourced, schema-marked content from credentialed domains, regardless of how recently the page was published. Treat the citation list as a separate ranking problem, not as a side-effect of organic position.

02 · Methodology · 1,000 queries. 10 intents. 4,243 unique cited URLs.

We constructed the query set by taking 100 queries per intent class across 10 classes — informational, commercial, navigational, comparison, how-to, definitional, statistical, troubleshooting, review, and transactional. Within each intent we sampled across ~30 verticals (health, finance, B2B SaaS, consumer electronics, travel, home, legal, education, food, automotive, plus a long-tail of smaller categories) so no single industry would dominate the citation distribution.

Queries were issued from US-English desktop search between April 8 and April 22, 2026, in private sessions with rotated IPs to control for personalization. Every AIO that rendered was captured in full — answer text, citation panel, and the rendered URL of every cited source. Of the original 1,123 queries issued, 1,000 produced a renderable AIO (the remaining 123 were replaced with same-intent alternates so the per-intent count remained at exactly 100).

The 4,243 unique URLs that ended up in citation lists were then crawled for page-level signals: word count, schema markup type and presence, last-modified date, named-source citation count, table presence, HTTPS, and a domain-authority proxy from a third-party backlink dataset. The cited-vs-not analysis compares the 4,243 cited URLs against a control set of the next 50 organic results for the same query (~50,000 URLs total) on the same signals.
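The cited-vs-control comparison above reduces to a per-signal prevalence ratio. A minimal sketch of that computation, using toy URL records rather than the study's actual crawl data (the `has_schema` field and the example values are illustrative):

```python
# Sketch of the cited-vs-control lift computation: how much more common a
# page-level signal is among cited URLs than among same-query controls.
# Records and values below are toy data, not the study dataset.

def signal_lift(cited, control, signal):
    """Ratio of signal prevalence in cited pages vs same-query controls."""
    rate = lambda pages: sum(1 for p in pages if p[signal]) / len(pages)
    cited_rate, control_rate = rate(cited), rate(control)
    return cited_rate / control_rate if control_rate else float("inf")

cited = [
    {"url": "a", "has_schema": True},
    {"url": "b", "has_schema": True},
    {"url": "c", "has_schema": False},
]
control = [
    {"url": "d", "has_schema": True},
    {"url": "e", "has_schema": False},
    {"url": "f", "has_schema": False},
    {"url": "g", "has_schema": False},
]

print(signal_lift(cited, control, "has_schema"))
```

Using same-query controls means the denominator already reflects the pages that were eligible for the query, which is what lets a headline number like "2.3×" be read as a page-level effect rather than a query-mix artifact.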

Query volume · 1,000 AI Overviews captured · Apr 2026 sample
100 queries per intent class × 10 classes. Sampled across ~30 verticals to prevent single-industry dominance. Captured April 8–22, 2026, US-English desktop, private sessions, rotated IPs.

Cited URLs · 4,243 unique cited pages · joined to crawl data
Average 4.2 citations per AIO across 1,000 overviews. Each cited URL crawled for page-level signals — word count, schema, last-modified date, named-source citation count, table presence, DA proxy.

Control set · ~50K same-query non-cited URLs · within-query baseline
Next 50 organic results for each query, used as the control population for the cited-vs-not signal analysis. Same-query controls remove most query-class confounds before measuring page-level lift.

03 · Top Domains · The top 1% captures 47% of all citations.

The most striking finding in the dataset is concentration. Of the ~1,200 unique domains that appeared at least once in a citation list, roughly 12 — the top 1% — accounted for 47% of all citations. The next 9% (about 100 domains) accounted for another 31%. The long-tail 90% of cited domains together share the remaining 22%. That is a more concentrated distribution than the underlying organic SERP, where the top 1% of domains typically capture closer to 25–30% of position-1 listings.
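The concentration metric itself is simple to reproduce: sort domains by citation count and sum the top 1%'s share. A sketch with toy counts (the domain names and numbers are placeholders, not the study's distribution):

```python
# Minimal sketch of the concentration metric: the share of all citations
# captured by the top `fraction` of cited domains. Toy counts only.
from collections import Counter

def top_share(citation_counts, fraction=0.01):
    """Citation share captured by the top `fraction` of domains."""
    counts = sorted(citation_counts.values(), reverse=True)
    k = max(1, round(len(counts) * fraction))
    return sum(counts[:k]) / sum(counts)

# 100 domains total, so the top 1% is the single largest domain.
domains = Counter({"wikipedia.org": 400, "reddit.com": 350,
                   **{f"site{i}.com": 5 for i in range(98)}})

print(f"{top_share(domains):.1%}")
```

The same function with `fraction=0.10` gives the top-decile share, which is how the 47% / 31% / 22% split in the paragraph above would be computed from the full domain table.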

The composition of the top group is what you would expect: Wikipedia and Reddit lead by a wide margin (each appearing in roughly a quarter of all AIOs), followed by a cluster of large editorial publications (Forbes, NYT, Healthline, Investopedia) and a long tail of high-authority gov and edu domains that win specific verticals (CDC and NIH for health, IRS for tax, .edu for academic queries). Notably, the "agency-style" outlets — HubSpot, Moz, Ahrefs, Search Engine Journal — sit in the second tier and represent the citation ceiling that commercial editorial sites can realistically aim for.

Citation share by domain · top 1% concentration
Source: DA AIO citation study · 1,000 AIOs · Apr 2026

  • Wikipedia · encyclopedia · cited across all 10 intent classes · 24.3% (single-domain leader)
  • Reddit · user-generated · strongest on review, how-to, troubleshooting · 21.6%
  • Forbes · editorial · strongest on commercial + comparison · 6.9%
  • NYT / WaPo / Bloomberg · editorial · strongest on statistical + informational · 5.8%
  • Healthline / Mayo Clinic / WebMD · health vertical · dominate definitional health queries · 5.4%
  • Investopedia · finance vertical · dominates definitional finance · 4.4%
  • .gov / .edu (aggregated) · authority backbone · CDC, NIH, IRS, .edu · 3.5%
  • HubSpot / Moz / Ahrefs / SEJ · B2B/SEO editorial · the agency-style ceiling · 2.7%
  • All other domains (~1,100) · long tail · vertical specialists, brand sites, news · 25.4%

The practical read is that the long-tail 22–25% of citations is the realistic battleground for a non-top-1% editorial site. Inside that slice, the page-level signals discussed in section 05 — schema, DA, length, named sources — have a much larger relative effect than they do in the highly-competitive top tier where domain reputation already saturates the model's confidence.

"Twelve domains capture half of all AIO citations. The other half is the only addressable market for everyone else — and it is decided by page-level signals, not domain age." — DA editorial post-mortem on the 1,000-AIO sample

04 · Distribution · Average 4.2 citations per AIO — but intent shapes the count.

Across the full 1,000-AIO sample, the average citation count is 4.2, with a range of 2 to 9 and a median of 4. Only 8% of AIOs cite more than 7 domains, and only 4% cite fewer than 3. The distribution is tight enough that “design for a 4-citation list” is a reasonable editorial assumption. What changes sharply is the per-intent breakdown — Google is meaningfully more conservative on commercial-intent queries (where the citation list is shorter and skews to credentialed publications) and more generous on definitional and how-to queries (where the model pulls from a wider source pool to construct the answer).

Definitional
Avg 5.6 citations · widest source pool

Definitional queries ("what is X", "X meaning") get the longest citation lists in the dataset. Google appears to triangulate definitions across multiple sources. Wikipedia, Investopedia, and dictionary domains dominate; long-tail editorial sites pick up the residual slots. This is the easiest intent class to win citation share on.

Highest opportunity
How-to
Avg 5.1 citations · structured-content premium

How-to queries reward step-structured content. Pages with HowTo schema (where editorially appropriate), numbered ordered lists, and clear headings are cited 2.8× more than unstructured equivalents. Reddit and YouTube also show up frequently — user experience reinforces the procedural answer.

Schema-driven
Informational
Avg 4.6 citations · authority weighted

General informational queries fall near the dataset mean. Authority signals dominate: domain DA correlates +0.71 with citation share inside this intent class (vs +0.61 across the full sample). The long-tail is harder to break into here — incumbents have a structural advantage from accumulated authority.

Authority-heavy
Commercial
Avg 3.1 citations · shortest lists

Commercial-intent queries ("best X", "X review", purchase-related) get the shortest citation lists in the dataset. Google is conservative when monetary action is implied — typically citing 2–4 credentialed editorial publications (Wirecutter, Forbes, NYT, Consumer Reports) plus 0–1 brand sites. Long-tail editorial inclusion is rare.

Hardest to break in

The intent breakdown changes how an editorial roadmap should allocate effort. Definitional and how-to queries are the highest-opportunity intent classes for a non-top-1% site — citation lists are longer, the source pool is wider, and the page-level signals (especially schema and structure) carry more weight relative to pure domain authority. Commercial intents are the hardest; attempting to win citation share against Wirecutter and Consumer Reports on "best X" queries is rarely the right roadmap call for an emerging editorial brand.

05 · Page Signals · Four page-level signals predict citation rate.

With domain effects partially controlled by the within-query baseline, four page-level signals stood out as having the largest independent effect on citation rate: domain authority, schema markup presence, content length, and the presence of named-source citations in the body of the page. Each is a lever an editorial team can pull this quarter — and each compounds with the others.

Signal 1
Domain authority — strongest single correlate
Pearson correlation +0.61 with per-query citation rate

DA is the strongest single page-level correlate in the dataset. The relationship is near-monotonic: each 10-point DA bucket adds roughly 1.4× to citation odds at the median query. This is mostly an outcome of decades of link equity, but on the margin DA can be moved through PR-driven backlink acquisition and topical authority programs.

Long-cycle lever
Signal 2
Schema markup — biggest engineerable lever
Schema-marked pages cited 2.3× more often (Article + BreadcrumbList baseline)

After controlling for DA, pages with Article and BreadcrumbList schema were cited 2.3× more often than otherwise-comparable unstructured pages. Adding HowTo schema (where editorially valid for a step-based page) lifts the multiplier to 2.8×. This is the single largest engineerable lever in the dataset and the cheapest to ship.

Quarter-1 win
Signal 3
Content length — long-form premium past 2,500 words
Pages over 2,500 words cited 1.6× more than pages under 800 words

Length effects are step-shaped, not linear. The lift kicks in around 1,800 words and saturates around 3,500 words. Pages over 2,500 words are cited 1.6× more often. The mechanism appears to be that longer pages give the AIO model more structured content to extract from, not that length itself signals quality.

Editorial discipline
Signal 4
Named-source citations in body — credibility signal
Pages with ≥1 named-source citation in body cited 2.1× more often

Pages that themselves cite named sources (researchers, papers, official agencies) inline in the body — not just in a footer link list — are cited 2.1× more often by AIO. The model appears to use the presence of body-level citation as a credibility signal, treating the page as a higher-quality knowledge source. This is also cheap to ship editorially.

Credibility loop
What did NOT correlate
Three signals we expected to matter did not, after controlling for DA and within-query baselines: page recency (median cited page is 14 months old; recency only mattered for explicit news-intent queries), page load speed (no measurable effect on citation rate inside the desktop sample), and reading-grade level (cited pages span the full Flesch-Kincaid range). The signals that matter are content structure and credibility — not the technical performance metrics that conventional SEO has historically optimized for.
"Schema markup is the cheapest 2.3× citation lift in the dataset. If your editorial pages are unstructured, this is the work for next sprint." — DA SEO engineering review, May 2026

06 · Playbook · What an agency should do this quarter.

The dataset turns into a four-stage editorial workflow that any in-house team or agentic SEO program can ship in a quarter. The order matters — schema and structure first because they are the cheapest lifts, then length and credibility, then programmatic citation tracking, then domain-authority work last because it is the longest-cycle lever.

Stage 1 · Sprint
Schema rollout — Article + BreadcrumbList everywhere

Audit your top 200 editorial pages for schema markup. Ship Article + BreadcrumbList on every page that lacks it; this is the 2.3× lever and it ships in one sprint with a single engineer. Add HowTo schema only where the page is genuinely procedural — invalid HowTo schema is worse than none.

Cheapest 2.3× lift
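The Article + BreadcrumbList markup in stage 1 is standard schema.org JSON-LD, emitted in a `<script type="application/ld+json">` tag per block. A minimal sketch that generates both blocks — the URLs, names, and dates are placeholders, and real markup should be checked with a structured-data validator before shipping:

```python
# Sketch of Article + BreadcrumbList JSON-LD generation for an editorial page.
# All field values are placeholders; validate real output before deploying.
import json

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "datePublished": "2026-04-26",
    "dateModified": "2026-04-26",
    "author": {"@type": "Organization", "name": "Example Team"},
}

breadcrumbs = {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    "itemListElement": [
        {"@type": "ListItem", "position": 1, "name": "Blog",
         "item": "https://example.com/blog"},
        {"@type": "ListItem", "position": 2, "name": "Example headline",
         "item": "https://example.com/blog/example"},
    ],
}

# One <script> tag per JSON-LD block, injected into the page <head>.
for block in (article, breadcrumbs):
    print(f'<script type="application/ld+json">{json.dumps(block)}</script>')
```

HowTo schema follows the same pattern but, as the stage-1 note says, should only be added where the page is genuinely procedural.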
Stage 2 · Q1
Editorial restructure — body-level named sources, length floor

Update your editorial standard so every tier-1 page (a) cites at least 2 named sources inline in the body, with anchor links to the source documents, and (b) targets ≥2,500 words (the empirical citation floor). Both are content-team work; both compound with the schema lift from stage 1. Expect citation-rate gains within 30–60 days as the AIO refresh cycle picks up the changes.

Compound with schema
Stage 3 · Q2
Citation tracking — measure inclusion, not position

Build a tracking pipeline that captures AIO renderings + citation lists for your priority query set, weekly. Position tracking misses the actual visibility metric in 2026. Track citation share over time and use it as the primary KPI for SEO program performance — clicks follow citation share, not classic position.

New KPI standard
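The citation-share KPI in stage 3 reduces to: of the priority queries captured each week, what fraction of rendered AIOs cite one of your domains? A sketch of that aggregation — the capture records below are hypothetical, and the capture step itself (fetching rendered AIOs and extracting their citation panels) is out of scope here:

```python
# Sketch of the weekly citation-share KPI from AIO capture records.
# Records are hypothetical; AIO capture itself is a separate pipeline.
from collections import defaultdict

def citation_share(captures, our_domains):
    """Per-week fraction of captured AIOs citing any of `our_domains`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for c in captures:
        totals[c["week"]] += 1
        if any(d in our_domains for d in c["cited_domains"]):
            hits[c["week"]] += 1
    return {week: hits[week] / totals[week] for week in totals}

captures = [
    {"week": "2026-W18", "query": "what is x", "cited_domains": ["wikipedia.org", "ours.com"]},
    {"week": "2026-W18", "query": "how to y",  "cited_domains": ["reddit.com"]},
    {"week": "2026-W19", "query": "what is x", "cited_domains": ["ours.com", "forbes.com"]},
    {"week": "2026-W19", "query": "how to y",  "cited_domains": ["ours.com"]},
]

print(citation_share(captures, {"ours.com"}))
```

Tracked weekly over the same priority query set, this one number replaces position tracking as the program KPI; the trend line is what stages 1 and 2 are expected to move within 30–60 days.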
Stage 4 · H2
Domain authority — PR + topical authority programs

DA is the longest-cycle lever and the most expensive. Invest in PR-driven backlink acquisition, topical authority hubs, and partnerships with credentialed publications in your vertical. Six-to-twelve month payoff window. Only attempt this once stages 1–3 are operational; doing it first leaves the cheap engineerable wins on the table.

Long-cycle compounder

The mistake we see most often is teams attempting stage 4 first — spending a quarter on PR and outreach for marginal DA gains while their editorial pages remain unstructured and under-sourced. The order in this playbook is deliberate: schema and editorial structure first because the multipliers are larger and ship in weeks; tracking second so the gains are visible; DA last because the gains are real but slow and the leverage is lower without the page-level work in place.

For multi-brand portfolios the workflow is the same, but the sequencing changes — start the schema rollout on the brand with the highest existing DA, because the same 2.3× multiplier applied to a higher base produces the largest absolute citation count increase. Use the slower brands as a test bed for the editorial restructure work in stage 2 before rolling it across the portfolio.

07 · Conclusion · Citation share is the new ranking. Engineer for it.

AI Overviews citation patterns, April 2026

Schema, structure, and named sources — the engineerable levers.

The dataset says two things clearly. First, AIO citation share is hyper-concentrated — twelve domains capture half of all citations, and the realistic addressable market for everyone else is the remaining 22–25% of the long tail. Second, inside that addressable market the page-level signals that predict inclusion are concrete and engineerable: schema markup is the cheapest 2.3× lever; body-level named sources add another 2.1×; long-form (over 2,500 words) adds 1.6×; domain authority correlates strongly but moves slowly.

The signals that did not predict citation — page recency, load speed, reading-grade level — are the same ones conventional SEO has optimized for over the last decade. The 2026 reallocation is to redirect that optimization budget toward content structure, body-level credibility, and a citation-share KPI that replaces position tracking. Treat the citation list as a separate ranking problem, not as a side-effect of organic position.

The next study in this series — coming Q3 — will repeat the methodology against ChatGPT Search and Perplexity citation panels to test how transferable the signals are across AI search engines. Early indications from a smaller sample suggest schema and named-source effects transfer cleanly; domain-authority effects are weaker on the LLM-native engines, which weight recency and structured data more heavily than Google. The playbook above is the common denominator that wins on all three.

Production-grade AIO citation engineering

Move past position tracking. Engineer for citation share.

We design and operate AI Overviews citation programs for editorial and B2B teams — covering schema rollouts, editorial restructure for body-level sourcing, citation-share tracking infrastructure, and topical-authority programs that move domain signals over the 6–12 month horizon.

What we work on

AI Overviews citation engagements

  • Schema audits — Article, BreadcrumbList, HowTo rollout across editorial portfolios
  • Editorial restructure — body-level named-source policies and 2,500-word floors on tier-1 pages
  • Citation-share tracking — weekly AIO capture for priority query sets
  • Topical authority programs — long-cycle DA work in target verticals
  • Cross-engine measurement — Google AIO + ChatGPT Search + Perplexity inclusion
FAQ · AI Overviews citation patterns

The questions we get every week.

How was the query sample constructed, and what is it representative of?

100 queries per intent class across 10 classes — informational, commercial, navigational, comparison, how-to, definitional, statistical, troubleshooting, review, and transactional. Within each intent class, queries were drawn across roughly 30 verticals (health, finance, B2B SaaS, consumer electronics, travel, home, legal, education, food, automotive, plus a long-tail of smaller categories) so no single industry would dominate. The sample is representative of US-English desktop search for the listed intent classes; it is not representative of mobile-first markets, non-English search, or news-intent queries (where AIO behavior is materially different and a separate study is warranted).