By April 2026 the schema markup conversation has moved past “does it work for SEO.” It works — that fight ended around 2020. The 2026 question is sharper: among sites that have already deployed structured data, why does a tiny minority dominate Rich Results and AI-Overview citations while the majority sits in a quiet middle, getting nothing measurable from the markup they shipped two years ago?
We pulled a stratified sample of 5,000 production sites across eight CMS platforms — B2B SaaS (28%), ecommerce (24%), agencies (14%), publishers (12%), professional services (10%), and a long tail (12%) — and ran every URL through Google's Rich Results Test plus a cross-check against the schema.org reference vocabulary. The result is the most complete public picture of structured-data health we are aware of for 2026, and it tells a more pointed story than the usual “adoption is rising” headline.
The headline: 71% of sites deploy at least one schema type, but only 22% pass the Rich Results Test cleanly across every @type they emit. The 49-point gap between those two numbers is where the entire opportunity lives — and where most agencies, including the larger ones, are leaving wins on the table.
- 01 · Adoption is high, validation is low — the 49-point gap is the lever. 71% of audited sites deploy at least one schema type, but only 22% pass Google's Rich Results Test cleanly across every detected @type. The largest tier (49% of sites) is in the deploy-but-broken middle — schema is shipped, but it does not actually qualify for rich results or feed AI-search citations.
- 02 · Valid schema correlates +0.34 with AI-search citation rate. Across the 5,000-site sample, sites that pass Rich Results cleanly are cited noticeably more often in AI Overviews, Perplexity, and ChatGPT search. Article + BreadcrumbList combined produces a +47% citation lift vs no-schema baseline; Product + Offer combined produces +29% on commercial-intent queries.
- 03 · Five schema types cover most of the value — Organization, WebPage, BreadcrumbList, Article, Product. Organization (61%), WebPage (54%), BreadcrumbList (38%), Article (29%), and Product (18% overall / 73% of ecommerce) are the high-leverage @types. LocalBusiness, Person, and WebSite SearchAction round out the baseline. Most sites overshoot on quantity and undershoot on quality.
- 04 · Errors cluster in five well-known patterns — and they are all preventable. Missing required props (38% of error pages), invalid ISO-8601 dates (24%), wrong @type for the page content (12%), missing or invalid image dimensions (9%), duplicate @id values (7%). Every one of these is catchable in CI with a 50-line validator before code ever reaches production.
- 05 · WordPress leads adoption (78%); raw HTML lags (19%) — but the quality gap inverts. WordPress wins on volume because Yoast, RankMath, and Schema Pro auto-emit a baseline. Custom/raw-HTML sites are the lowest on adoption (19%) but the highest on per-instance validity — when a hand-rolled team ships schema, they ship it tested. Shopify sits at 89% Product schema by theme default, but only 31% pair it with Organization.
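Finding 04's claim that every dominant error pattern is catchable pre-merge is easy to make concrete. Below is a minimal sketch of the required-props half of such a validator. The REQUIRED table here is our own illustrative subset, not the authoritative list (which lives in Google's rich-results documentation):

```python
import json

# Illustrative required-field table only; consult Google's rich-results
# documentation for the authoritative per-type requirements.
REQUIRED = {
    "Article": {"headline", "datePublished", "author", "image"},
    "Product": {"name", "offers"},
    "Organization": {"name", "url", "logo"},
}

def missing_required(jsonld: str) -> dict:
    """Return {@type: set of missing fields} for every node in a JSON-LD blob."""
    data = json.loads(jsonld)
    # Handle both a single top-level node and an @graph of nodes.
    nodes = data.get("@graph", [data]) if isinstance(data, dict) else data
    problems = {}
    for node in nodes:
        missing = REQUIRED.get(node.get("@type"), set()) - node.keys()
        if missing:
            problems[node["@type"]] = missing
    return problems
```

Running this against every template's emitted JSON-LD in CI and failing the build when the returned dict is non-empty covers the 38% bucket; the other four patterns (dates, type mismatch, image dimensions, @id uniqueness) are similarly small checks.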
01 — The Thesis
The schema gap is the most measurable AI-search lever in 2026.
Structured data has always been one of those technical SEO levers that engineering teams ship in a sprint and then quietly forget about. It works in the sense that Google parses it, awards rich results when eligible, and feeds the entity graph that AI search engines now consume directly. It does not work in the sense that most production deployments are silently broken — emitting JSON-LD that fails Google's own validator on at least one required field.
What changed in 2026 is the visibility of the consequences. AI search engines (Google AI Overviews, Perplexity, ChatGPT search) preferentially cite sources whose entity descriptions are machine-verifiable. Valid schema is the cheapest proof that an entity description is correct and complete. The Pearson correlation in our sample sits at +0.34 between Rich Results Test pass-rate and AI-citation frequency — not enormous, but conclusive at this sample size, and one of the few SEO levers in 2026 that produces a measurable lift in AI-search visibility within 30 days of deployment.
"Schema is the only on-page SEO lever in 2026 where the gap between deployed and valid is bigger than the gap between deployed and missing."— From the audit synthesis, Apr 2026
02 — Methodology
What we audited and how.
The sample was stratified to reflect the realistic distribution of production websites we encounter in agency engagements — not the top-1000-by-traffic skew that public crawl datasets default to. Each site contributed up to twenty representative URLs (homepage, hub pages, two product or article templates, two blog posts), and every URL was processed through Google's Rich Results Test plus a secondary parse against the schema.org vocabulary as of April 2026.
- Sample composition. B2B SaaS 28%, ecommerce 24%, agencies 14%, publishers 12%, professional services 10%, other 12%. Geographic spread skewed US/EU with ~12% APAC.
- CMS distribution. WordPress 38%, Webflow 18%, Shopify 12%, custom/raw HTML 11%, Wix 8%, Squarespace 6%, Framer 4%, others 3%. This roughly tracks the publicly-reported BuiltWith distribution for content sites.
- Validation criteria. A page is “clean” if every detected @type passes the Rich Results Test with zero errors and zero blocking warnings. Non-blocking warnings (missing recommended fields) were counted but did not disqualify.
- AI-citation measurement. Per site, we sampled 25 representative branded and topical queries and measured citation presence across Google AI Overviews, Perplexity, and ChatGPT search over a 30-day window. Citation rate is the percentage of eligible queries that returned the site as a source.
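As defined above, citation rate is cited queries over eligible queries, and the headline +0.34 is a plain Pearson coefficient computed over per-site (validity, citation-rate) pairs. A self-contained sketch of that computation (the arrays below are invented illustration, not audit data):

```python
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Invented illustration: per-site Rich Results pass-rate vs the share
# of 25 sampled queries that returned the site as a source.
pass_rate     = [1.0, 0.8, 0.5, 0.2, 0.0]
citation_rate = [0.40, 0.32, 0.20, 0.16, 0.08]
r = pearson(pass_rate, citation_rate)
```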
03 — Adoption
The headline: 71% deploy, 22% pass.
The two numbers that frame everything else: 71% of audited sites ship at least one schema type, and 22% pass Google's Rich Results Test cleanly across every detected @type. The 49-point gap is the entire conversation — schema is being deployed at scale, but it is being deployed wrong at scale, and the Tier 3 “deployed-but-broken” bucket is the largest single segment in the data.
Sites with ≥1 schema · Deployed
3,550 of 5,000 audited sites emit at least one valid-syntax JSON-LD block. The most common entry point is Organization markup auto-injected by a CMS plugin or theme — fewer than half of these sites manually wrote any schema beyond the default.
Pass Rich Results clean · Validated
1,100 of 5,000 sites pass Google's Rich Results Test on every detected @type with zero errors. The 49-point gap between deploy and validation is the largest under-priced opportunity in 2026 technical SEO.
Tier 1 — comprehensive · The winners
Just 8% of sites deploy ≥5 schema types correctly with the right combinations (Article + BreadcrumbList + Person, or Product + Offer + Organization). This tiny Tier 1 segment dominates Rich Results impressions and AI-search citations across every category we measured.
The shape of the distribution is striking once you split it. Tier 4 (no schema) is 29% of sites — a large but shrinking minority. Tier 3 (deployed but broken) is 49% of sites — the largest segment by far, and the segment where the biggest competitive wins live. Tier 2 (clean but minimal, ≤3 schemas) is 14%, and Tier 1 (clean and comprehensive, ≥5 schemas) is the dominant 8%. Most agencies and in-house teams sit in Tier 3 — schema is checked off as a project, but it does not actually pass validation, and therefore does not qualify for the Rich Results or AI-citation lift it could.
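The tier split reduces to two per-site facts: how many @types are deployed, and whether all of them validate. A sketch of the assignment rule; note the audit states only ≥5 for Tier 1 and ≤3 for Tier 2, so a clean 4-type site is assumed here to fall in Tier 2:

```python
def classify_tier(num_types: int, all_valid: bool) -> int:
    """Assign a site to the audit's four quality tiers.
    Assumption: a clean 4-type site counts as Tier 2, since the
    article only states >=5 for Tier 1 and <=3 for Tier 2."""
    if num_types == 0:
        return 4  # no schema deployed
    if not all_valid:
        return 3  # deployed but broken: the 49% middle
    if num_types >= 5:
        return 1  # clean and comprehensive
    return 2      # clean but minimal
```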
04 — Schema Types
What is actually shipped across 5,000 sites.
Among the 71% of sites that emit any schema, a handful of @types dominate. Organization, WebPage, and BreadcrumbList together form the “baseline” that every well-tended site should ship regardless of vertical. Article, Product, LocalBusiness, and Person are the content-type-specific layer. Below them sit the specialty-vertical schemas (VideoObject, Recipe, Event) that only apply to a subset of the sample.
% of audited sites carrying at least one instance of the given @type
Source: Digital Applied 5K-Site Schema Audit · Apr 2026
05 — Errors
Five error patterns account for most failures.
Validation errors are not random. Across the deploy-but-broken segment (the 49% of sites in Tier 3), five well-known error patterns account for over 90% of total failures. Every one of them is catchable in CI with a small JSON-LD validator wired into the build — yet almost no team in the sample had any continuous validation in place beyond the occasional one-off Rich Results Test on launch day.
Missing required props · 38% of error pages
The single largest failure mode. Article schemas missing headline, datePublished, or author. Product schemas missing offers or aggregateRating (when claimed). Organization schemas missing logo or sameAs. Every schema.org @type has a required-field list — missing any one disqualifies the page from rich results entirely.
Invalid date format (must be ISO-8601) · 24% of error pages
datePublished and dateModified must be ISO-8601 (2026-04-26 or 2026-04-26T14:30:00Z). The most common failure is a CMS that emits 'April 26, 2026' or '04/26/2026' as a string. Google's parser silently rejects the date, and the Article schema falls out of rich-result eligibility even though every other field validates.
Wrong @type for page content · 12% of error pages
Product schema deployed on a non-product page (a category index, an article, a blog tag). Article schema on a homepage. Recipe schema on an editorial 'best of' roundup. Schema must describe the actual primary entity of the page — type-content mismatch is treated as deceptive markup and risks a manual penalty in addition to losing eligibility.
Missing or invalid image dimensions · 9% of error pages
image[].width and image[].height are required for many rich-result types. Many CMS plugins emit the URL but skip dimensions, or emit dimensions but skip the URL. Either combination invalidates the image — and image remains a required field for Article rich results even after the AMP era.
Duplicate @id values across pages · 7% of error pages
Every entity in your structured-data graph should have a unique @id (typically the absolute URL of the entity it describes). Sites that use a single static @id like 'https://example.com/#article' for every article fragment confuse the entity graph — Google merges them into one entity description and the per-page article context is lost.
Two more failure modes round out the long tail: schema present in a location Google does not parse (5% — most often inside <noscript> or rendered client-side after the initial DOM commit), and conflicting structured data describing the same entity (5% — two different schemas with overlapping @id values that contradict each other on basic facts). Both are easy to spot with a Lighthouse audit; neither is what most teams check.
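Two of the five patterns (invalid dates at 24% and duplicate @id values at 7%) are the easiest to screen mechanically. A sketch using only the Python standard library:

```python
from collections import Counter
from datetime import date, datetime

def valid_iso8601(value: str) -> bool:
    """Accept the two shapes named above: a plain date (2026-04-26)
    or a full datetime (2026-04-26T14:30:00Z)."""
    normalized = value.replace("Z", "+00:00")  # pre-3.11 fromisoformat lacks Z support
    for parse in (date.fromisoformat, datetime.fromisoformat):
        try:
            parse(normalized)
            return True
        except ValueError:
            continue
    return False

def duplicate_ids(nodes: list) -> list:
    """Return every @id that appears on more than one node in the graph."""
    counts = Counter(n.get("@id") for n in nodes if n.get("@id"))
    return [node_id for node_id, count in counts.items() if count > 1]
```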
06 — By CMS
The CMS distribution inverts what you would guess.
Adoption rates by CMS produce a counter-intuitive picture. WordPress leads at 78% — the Yoast / RankMath / Schema Pro plugin ecosystem does most of the work. Shopify hits 89% on Product schema thanks to theme defaults, but only 31% of Shopify stores pair Product with Organization (the combination Google rewards on commercial queries). Custom/raw HTML lags at 19% adoption — but the hand-rolled sites that do ship schema overwhelmingly ship it validated.
Schema-adoption rate by CMS platform · % of sites with ≥1 schema
Source: Digital Applied 5K-Site Schema Audit · Apr 2026
The pattern for ecommerce is sharper. 73% of ecommerce sites in the sample emit Product schema — a big number on its surface, but only 41% of those Product schemas pair with an Organization schema, and only 19% include the Offer object that Google requires for the price-and-availability rich result. Shopify themes emit a Product block by default; very few shop owners or developers know to pair it with the rest of the entity graph. The result is a large ecommerce cohort that ships Product schema, fails the Rich Results Test, and then concludes that “schema does not help our store.”
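The Product-without-Offer failure described above is mechanically checkable too. A sketch; the three required Offer fields follow the price, availability, and currency list the article names:

```python
def offer_complete(product: dict) -> bool:
    """True if a Product node carries an Offer with the price, currency,
    and availability fields the price-and-availability rich result needs."""
    offers = product.get("offers")
    if isinstance(offers, dict):
        offers = [offers]  # a single Offer object or a list are both common
    if not offers:
        return False
    needed = {"price", "priceCurrency", "availability"}
    return all(needed <= offer.keys() for offer in offers)
```

Wired into the same CI gate as the required-props check, this catches the 81% of ecommerce Product schemas that ship without a complete Offer.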
07 — AI Citation Lift
Valid schema and AI-search citations correlate cleanly.
The AI-search citation correlation is the single most actionable finding in the audit. Across the 5,000-site sample, sites that pass the Rich Results Test cleanly are cited noticeably more often in Google AI Overviews, Perplexity, and ChatGPT search than sites that either skip schema or deploy it broken. The correlation is +0.34 Pearson — not enormous in absolute terms, but at this sample size it is conclusive, and the per-combination lifts are large enough to matter for any growth team.
Pearson · valid schema vs citation rate · Strongest 2026 SEO signal
Sites that pass Rich Results Test on every detected @type are cited 30-50% more often across AI Overviews, Perplexity, and ChatGPT search than sites that either ship no schema or ship invalid schema. Correlation is consistent across vertical and CMS.
Citation lift vs no-schema baseline · Editorial
The strongest single combination in the data. Editorial sites that ship valid Article + BreadcrumbList together (and pass Rich Results) are cited +47% more often in AI Overviews on informational queries than matched sites with no schema.
Citation lift on commercial queries · Ecommerce
Ecommerce sites that pair valid Product + Offer (with price, availability, and currency) are cited +29% more often in commercial-intent AI search than ecommerce sites with Product alone or with broken Offer markup.
Citation lift on brand queries · Brand
Sites that ship valid Organization + WebSite (with the SearchAction object) are cited +18% more often on branded queries — the entity-graph signal that disambiguates the brand from same-name competitors and reinforces the canonical site URL.
"Three correctly-combined schemas beat five weakly-validated ones every single time." — From the citation-lift sub-analysis
08 — Tiers + Framework
The four tiers and the five-stage audit we run.
Splitting the 5,000-site sample into quality tiers is the cleanest way to see where each cohort actually sits. The pattern is almost comically clear: a small Tier 1 cohort dominates rich-result impressions and AI citations; an enormous Tier 3 cohort has done most of the work but lost most of the value because nobody runs the validator against every deploy.
Tier 1 · Clean, comprehensive (≥5 schemas, all valid) · The winners
Typical stack: Article + BreadcrumbList + Person + Organization + WebSite
The 8% that wins. Ships the right combinations for the content type, validates every deployment in CI, and updates schema when content changes. Dominates Rich Results impressions and AI-search citations across every vertical we measured. Most are publishers, B2B SaaS, and tier-1 ecommerce.
Tier 2 · Clean but minimal (≤3 schemas, all valid) · Halfway there
Typical stack: Organization + WebPage + BreadcrumbList only
Ships the universal baseline correctly but does not add the content-type-specific layer (Article, Product, LocalBusiness). Captures the basic brand-graph and breadcrumb wins but misses the higher-leverage rich results and citation lifts that come from per-content-type schemas.
Tier 3 · Deployed with errors (the broken middle) · Most agencies live here
Schema shipped · ≥1 error per template
The largest single cohort in the data — and the largest under-priced opportunity. Schema is deployed (the project shipped), but it fails Rich Results validation on at least one detected @type. Most teams do not know they are in this tier because nobody ran the validator after launch.
Tier 4 · No schema deployed · Greenfield
Zero JSON-LD blocks emitted
No structured data at all. Predominantly small custom-HTML sites and Squarespace properties without the optional schema add-on. The fastest first-month wins live here, but only after the Tier 3 cohort has been moved to Tier 1 — Tier 3 fixes are higher leverage per hour invested.
The audit framework we run on every client engagement collapses to five sequential stages. Stages 1-2 are the up-front analysis; Stages 3-4 are the deployment work; Stage 5 is the continuous integration that prevents regression. The mistake we see most often is teams that do Stages 1-4 once and then never run Stage 5 — within twelve months they are back in Tier 3 because content edits have broken the schema without anyone noticing.
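The universal baseline in Stage 3 of the framework is just three cross-linked JSON-LD nodes served from the layout. A minimal emitter sketch; the site URL, name, and logo are placeholders, and the SearchAction target follows the schema.org convention:

```python
import json

def baseline_graph(site_url: str, org_name: str, logo_url: str) -> str:
    """Emit Organization + WebSite (with SearchAction) + BreadcrumbList
    as one JSON-LD @graph, with nodes cross-linked by @id."""
    org_id = f"{site_url}#organization"
    graph = {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "@id": org_id,
             "name": org_name, "url": site_url, "logo": logo_url},
            {"@type": "WebSite", "@id": f"{site_url}#website",
             "url": site_url,
             "publisher": {"@id": org_id},  # a link, not a duplicate node
             "potentialAction": {
                 "@type": "SearchAction",
                 "target": f"{site_url}search?q={{search_term_string}}",
                 "query-input": "required name=search_term_string"}},
            {"@type": "BreadcrumbList", "itemListElement": [
                {"@type": "ListItem", "position": 1,
                 "name": "Home", "item": site_url}]},
        ],
    }
    return json.dumps(graph, indent=2)
```

Per-page templates extend itemListElement with their own breadcrumb trail, and Stage 4 schemas reference the same @id values so the site stays one coherent entity description rather than a list of disconnected nodes.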
Stage 1 · Identify content type, choose 2-3 most-eligible schemas · Per-template
For each page template (homepage, product, article, category, contact), identify the actual primary entity and select the schema.org @types that describe it. Most templates need 2-3 schemas; do not over-decorate. Type-content mismatch is the highest-risk error pattern in the audit.
Stage 2 · Validate against Rich Results Test for every template · Pre-launch
Run every chosen template through Google's Rich Results Test before deploying to production. If a template fails, fix it now, not later — Tier 3 is created by skipping this stage. Capture the passing JSON-LD as a fixture for the CI test in Stage 5.
Stage 3 · Add Organization + WebSite + BreadcrumbList as universal baseline · Universal
Every well-tended site in 2026 should ship Organization + WebSite (with SearchAction) + BreadcrumbList as a universal baseline, served from the layout and inherited by every page. This is the entity-graph foundation; everything else stacks on top.
Stage 4 · Deploy content-specific schema (Article, Product, LocalBusiness) · Per content type
Layer the content-type-specific schema on top of the universal baseline, scoped to the right template. Article on editorial; Product + Offer on commerce; LocalBusiness on storefronts. Every additional @type must be on a page where it actually describes the primary entity.
Stage 5 · Continuous monitoring (CI check on every deploy) · Continuous
Wire a JSON-LD validator into CI that runs against every page-template snapshot on every PR. Block merges that introduce schema errors. This is the difference between sustained Tier 1 and inevitable Tier 3 drift — without Stage 5, every team eventually breaks their own schema as content evolves.
09 — Conclusion
The realistic 2026 picture.
Adoption is the ceiling. Validation is the lever.
The conventional wisdom on schema markup — “ship it, you will see traffic gains” — was correct in 2018 and is now actively misleading. In 2026, shipping schema is table stakes; 71% of sites already do. The differentiator is whether the schema you shipped actually validates, and whether you keep it valid as your content evolves.
The single most defensible 2026 SEO investment we see, measured in dollars spent per impression of organic and AI-search lift gained, is moving an existing site from Tier 3 (deployed but broken) to Tier 1 (clean and comprehensive). It is rarely more than two engineering weeks of work, the win is verifiable inside Google's own tools within 48 hours, and the AI-citation lift compounds for the lifetime of the deployment.
The next frontier — already visible in the Tier 1 cohort — is entity-graph completeness: cross-linked @id values, consistent sameAs declarations, and tightly-linked Organization-Person-Article triples that let AI search engines traverse the site as a coherent entity rather than a list of pages. That is the work that separates the 8% that dominates citations from the 14% that just barely passes validation. The audit framework above is the path between them.