SEO · Original Research · 7 min read · Published Apr 26, 2026

5,000 sites · 8 CMS platforms · 11 schema types tracked across the 2026 frontier of technical SEO

Schema Markup Adoption: 5,000-Site Audit and Findings

We audited 5,000 production sites for structured data in April 2026. 71% deploy at least one schema type, but only 22% pass Google's Rich Results Test cleanly across every detected @type. The gap between “deployed” and “valid” is the largest under-priced lever in technical SEO right now — +0.34 Pearson correlation with AI-search citation rate.

Digital Applied Team
Senior strategists · Published Apr 26, 2026
Sources: 5,000 sites · 8 CMS platforms · April 2026
  • Sites with ≥1 schema: 71% (3,550 of 5,000 audited) — deployed
  • Pass Rich Results clean: 22% (1,100 of 5,000 fully valid) — the gap
  • AI-citation correlation: +0.34 (Pearson r, valid schema vs citations) — strong signal
  • WordPress adoption: 78% (vs 19% raw-HTML floor)

By April 2026 the schema markup conversation has moved past “does it work for SEO.” It works — that fight ended around 2020. The 2026 question is sharper: among sites that have already deployed structured data, why does a tiny minority dominate Rich Results and AI-Overview citations while the majority sits in a quiet middle, getting nothing measurable from the markup they shipped two years ago?

We pulled a stratified sample of 5,000 production sites across eight CMS platforms and six verticals — B2B SaaS (28%), ecommerce (24%), agencies (14%), publishers (12%), professional services (10%), and a long tail (12%) — and ran every URL through Google's Rich Results Test plus a cross-check against the schema.org reference vocabulary. The result is the most complete public picture of structured-data health we are aware of for 2026, and it tells a sharper story than the usual “adoption is rising” headline.

The headline: 71% of sites deploy at least one schema type, but only 22% pass the Rich Results Test cleanly across every @type they emit. The 49-point gap between those two numbers is where the entire opportunity lives — and where most agencies, including the larger ones, are leaving wins on the table.

Key takeaways
  1. Adoption is high, validation is low — the 49-point gap is the lever. 71% of audited sites deploy at least one schema type, but only 22% pass Google's Rich Results Test cleanly across every detected @type. The largest tier (49% of sites) is in the deploy-but-broken middle — schema is shipped, but it does not actually qualify for rich results or feed AI-search citations.
  2. Valid schema correlates +0.34 with AI-search citation rate. Across the 5,000-site sample, sites that pass Rich Results cleanly are cited noticeably more often in AI Overviews, Perplexity, and ChatGPT search. Article + BreadcrumbList combined produces a +47% citation lift vs no-schema baseline; Product + Offer combined produces +29% on commercial-intent queries.
  3. Five schema types cover most of the value — Organization, WebPage, BreadcrumbList, Article, Product. Organization (61%), WebPage (54%), BreadcrumbList (38%), Article (29%), and Product (18% overall / 73% of ecommerce) are the high-leverage @types. LocalBusiness, Person, and WebSite SearchAction round out the baseline. Most sites overshoot on quantity and undershoot on quality.
  4. Errors cluster in five well-known patterns — and they are all preventable. Missing required props (38% of error pages), invalid ISO-8601 dates (24%), wrong @type for the page content (12%), missing or invalid image dimensions (9%), duplicate @id values (7%). Every one of these is catchable in CI with a 50-line validator before code ever reaches production.
  5. WordPress leads adoption (78%); raw HTML lags (19%) — but the quality gap inverts. WordPress wins on volume because Yoast, RankMath, and Schema Pro auto-emit a baseline. Custom/raw-HTML sites are the lowest on adoption (19%) but the highest on per-instance validity — when a hand-rolled team ships schema, they ship it tested. Shopify sits at 89% Product schema by theme default, but only 31% pair it with Organization.

01 · The Thesis: The schema gap is the most measurable AI-search lever in 2026.

Structured data has always been one of those technical SEO levers that engineering teams ship in a sprint and then quietly forget about. It works in the sense that Google parses it, awards rich results when eligible, and feeds the entity graph that AI search engines now consume directly. It does not work in the sense that most production deployments are silently broken — emitting JSON-LD that fails Google's own validator on at least one required field.

What changed in 2026 is the visibility of the consequences. AI search engines (Google AI Overviews, Perplexity, ChatGPT search) preferentially cite sources whose entity descriptions are machine-verifiable. Valid schema is the cheapest proof that an entity description is correct and complete. The Pearson correlation in our sample sits at +0.34 between Rich Results Test pass-rate and AI-citation frequency — not enormous, but conclusive at this sample size, and one of the few SEO levers in 2026 that produces a measurable lift in AI-search visibility within 30 days of deployment.

"Schema is the only on-page SEO lever in 2026 where the gap between deployed and valid is bigger than the gap between deployed and missing."— From the audit synthesis, Apr 2026

02 · Methodology: What we audited and how.

The sample was stratified to reflect the realistic distribution of production websites we encounter in agency engagements — not the top-1000-by-traffic skew that public crawl datasets default to. Each site contributed up to twenty representative URLs (homepage, hub pages, two product or article templates, two blog posts), and every URL was processed through Google's Rich Results Test plus a secondary parse against the schema.org vocabulary as of April 2026.

  • Sample composition. B2B SaaS 28%, ecommerce 24%, agencies 14%, publishers 12%, professional services 10%, other 12%. Geographic spread skewed US/EU with ~12% APAC.
  • CMS distribution. WordPress 38%, Webflow 18%, Shopify 12%, custom/raw HTML 11%, Wix 8%, Squarespace 6%, Framer 4%, others 3%. This roughly tracks the publicly-reported BuiltWith distribution for content sites.
  • Validation criteria. A page is “clean” if every detected @type passes the Rich Results Test with zero errors and zero blocking warnings. Non-blocking warnings (missing recommended fields) were counted but did not disqualify.
  • AI-citation measurement. Per site, we sampled 25 representative branded and topical queries and measured citation presence across Google AI Overviews, Perplexity, and ChatGPT search over a 30-day window. Citation rate is the percentage of eligible queries that returned the site as a source.
What this audit is not
This is not a crawl of the entire web — it is a stratified production sample. Sites with under 50 indexed pages were excluded; sites with over 500K indexed pages were sampled at the same per-site URL cap as everyone else. The published correlation numbers describe agency-tier production websites, not the long tail of personal blogs or the enterprise tail of Fortune 50 brands. For your own stack, treat the numbers as a strong directional signal, not as predictive estimates for any single property.

03 · Adoption: The headline: 71% deploy, 22% pass.

The two numbers that frame everything else: 71% of audited sites ship at least one schema type, and 22% pass Google's Rich Results Test cleanly across every detected @type. The 49-point gap is the entire conversation — schema is being deployed at scale, but it is being deployed wrong at scale, and the Tier 3 “deployed-but-broken” bucket is the largest single segment in the data.

Adoption rate
71%
Sites with ≥1 schema

3,550 of 5,000 audited sites emit at least one valid-syntax JSON-LD block. The most common entry point is Organization markup auto-injected by a CMS plugin or theme — fewer than half of these sites manually wrote any schema beyond the default.

Deployed
Validation rate
22%
Pass Rich Results clean

1,100 of 5,000 sites pass Google's Rich Results Test on every detected @type with zero errors. The 49-point gap between deploy and validation is the largest under-priced opportunity in 2026 technical SEO.

Validated
Quality dominance
8%
Tier 1 — comprehensive

Just 8% of sites deploy ≥5 schema types correctly with the right combinations (Article + BreadcrumbList + Person, or Product + Offer + Organization). This tiny Tier 1 segment dominates Rich Results impressions and AI-search citations across every category we measured.

The winners

The shape of the distribution is striking once you split it. Tier 4 (no schema) is 29% of sites — a large but shrinking minority. Tier 3 (deployed but broken) is 49% of sites — the largest segment by far, and the segment where the biggest competitive wins live. Tier 2 (clean but minimal, ≤3 schemas) is 14%, and Tier 1 (clean and comprehensive, ≥5 schemas) is the dominant 8%. Most agencies and in-house teams sit in Tier 3 — schema is checked off as a project, but it does not actually pass validation, and therefore does not qualify for the Rich Results or AI-citation lift it could.

04 · Schema Types: What is actually shipped across 5,000 sites.

Among the 71% of sites that emit any schema, a handful of @types dominate. Organization, WebPage, and BreadcrumbList together form the “baseline” that every well-tended site should ship regardless of vertical. Article, Product, LocalBusiness, and Person are the content-type-specific layer. Below them sit the specialty-vertical schemas (VideoObject, Recipe, Event) that only apply to a subset of the sample.

% of audited sites carrying at least one instance of each @type

Source: Digital Applied 5K-Site Schema Audit · Apr 2026
  • Organization (61%, most-shipped) · brand entity, logo, sameAs · the universal baseline
  • WebPage (54%) · per-page identity · breadcrumb anchor target
  • BreadcrumbList (38%) · hierarchical navigation · drives the breadcrumb rich result
  • Article / BlogPosting / NewsArticle (29%) · editorial content · headline, datePublished, author
  • Product (18% overall, 73% of ecommerce sites) · ecommerce items
  • LocalBusiness (14%) · physical storefronts, NAP, hours · local-pack signal
  • Person, author markup (12%) · E-E-A-T signal · author entity for publishers
  • WebSite with SearchAction (11%) · sitelinks search box · brand-query enhancement
  • VideoObject (9%) · video content · thumbnail, duration, uploadDate
  • Recipe (4%) · mostly publishers · ingredient + instruction lists
  • Event (3%) · conferences, performances · location + date
The combinations matter more than the count
Sites that ship Organization + WebSite + BreadcrumbList as a universal baseline, plus the right content-type schema (Article for editorial, Product for ecommerce, LocalBusiness for storefronts), outperform sites that ship more schemas in random combinations. The pattern is consistent: three correctly-combined schemas beat five weakly-validated ones every time. This is why the Tier 1 (8%) cohort dominates citation rates despite not always shipping the longest schema list.

05 · Errors: Five error patterns account for most failures.

Validation errors are not random. Across the 49% of sites in the deployed-but-broken segment (Tier 3), five well-known error patterns account for over 90% of total failures. Every one of them is catchable in CI with a small JSON-LD validator wired into the build — yet almost no team in the sample had any continuous validation in place beyond the occasional one-off Rich Results Test on launch day.

Pattern 1 · 38%
Missing required props

The single largest failure mode. Article schemas missing headline, datePublished, or author. Product schemas missing offers or aggregateRating (when claimed). Organization schemas missing logo or sameAs. Every schema.org @type has a required-field list — missing any one disqualifies the page from rich results entirely.

38% of error pages
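The missing-required-props failure mode is also the easiest to catch mechanically. Below is a minimal per-node checker in Python; the required-field lists are illustrative stand-ins, not Google's authoritative requirements (those live in the Search Central structured-data documentation and change over time).

```python
# Illustrative required-field lists for a few common @types.
# These are assumptions for the sketch -- verify against Google's
# current per-type documentation before relying on them.
REQUIRED_PROPS = {
    "Article": ["headline", "datePublished", "author", "image"],
    "Product": ["name", "offers"],
    "Organization": ["name", "url", "logo"],
}

def missing_props(node: dict) -> list[str]:
    """Return the required properties absent (or empty) on one JSON-LD node."""
    required = REQUIRED_PROPS.get(node.get("@type", ""), [])
    return [p for p in required if p not in node or node[p] in (None, "", [])]
```

Run against every JSON-LD block a template emits; any non-empty result means the page is disqualified from that rich result before Google ever sees it.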
Pattern 2 · 24%
Invalid date format (must be ISO-8601)

datePublished and dateModified must be ISO-8601 (2026-04-26 or 2026-04-26T14:30:00Z). The most common failure is a CMS that emits 'April 26, 2026' or '04/26/2026' as a string. Google's parser silently rejects the date, and the Article schema falls out of rich-result eligibility even though every other field validates.

24% of error pages
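The date-format failure is equally mechanical to catch. A narrow sketch of a checker for the two shapes cited above (a bare date, or a date-time with timezone); a production validator might accept the full ISO-8601 grammar, including offsets like +02:00 and fractional seconds.

```python
import re

# Matches 2026-04-26 or 2026-04-26T14:30:00Z (or with a +HH:MM offset).
# Deliberately narrower than full ISO-8601 -- widen as needed.
ISO_8601 = re.compile(
    r"^\d{4}-\d{2}-\d{2}"                          # date part
    r"(T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2}))?$"   # optional time + zone
)

def is_valid_schema_date(value: str) -> bool:
    """True if the string is safe to emit as datePublished/dateModified."""
    return bool(ISO_8601.match(value))
```

A CMS emitting 'April 26, 2026' fails this check immediately, which is exactly the point: catch it in the template, not in a quarterly audit.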
Pattern 3 · 12%
Wrong @type for page content

Product schema deployed on a non-product page (a category index, an article, a blog tag). Article schema on a homepage. Recipe schema on an editorial 'best of' roundup. Schema must describe the actual primary entity of the page — type-content mismatch is treated as deceptive markup and risks a manual action in addition to losing eligibility.

12% of error pages
Pattern 4 · 9%
Missing or invalid image dimensions

image[].width and image[].height are required for many rich-result types. Many CMS plugins emit the URL but skip dimensions, or emit dimensions but skip the URL. Either combination invalidates the image — and image remains a required field for Article rich results even after the AMP era.

9% of error pages
Pattern 5 · 7%
Duplicate @id values across pages

Every entity in your structured-data graph should have a unique @id (typically the absolute URL of the entity it describes). Sites that use a single static @id like 'https://example.com/#article' for every article fragment confuse the entity graph — Google merges them into one entity description and the per-page article context is lost.

7% of error pages
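The duplicate-@id pattern has a one-line fix: derive @id from the canonical URL instead of hard-coding a site-wide fragment. A sketch (the URL and fragment convention here are illustrative, not a schema.org mandate — any scheme that is unique per entity works):

```python
def article_node(canonical_url: str, headline: str) -> dict:
    """Build an Article node whose @id is unique to this page.

    A static '@id' like 'https://example.com/#article' shared by every
    article collapses the entity graph into one node; anchoring the
    fragment to the canonical URL keeps each article distinct.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "@id": f"{canonical_url}#article",  # unique per page
        "url": canonical_url,
        "headline": headline,
    }
```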

Two more failure modes round out the long tail: schema present in a location Google does not parse (5% — most often inside <noscript> or rendered client-side after the initial DOM commit), and conflicting structured data describing the same entity (5% — two different schemas with overlapping @id values that contradict each other on basic facts). Both are easy to spot with a Lighthouse audit; neither is what most teams check.
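Both long-tail failure modes are visible in the raw HTML response before any rendering happens. A rough sketch of the check we are describing: fetch the initial payload, drop &lt;noscript&gt; sections, and see which JSON-LD blocks are actually server-rendered. (Regex-based HTML scanning is a shortcut for illustration; a production pipeline would use a real HTML parser.)

```python
import json
import re

NOSCRIPT = re.compile(r"<noscript\b.*?</noscript>", re.S | re.I)
LD_JSON = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.S | re.I,
)

def server_rendered_jsonld(html: str) -> list[dict]:
    """JSON-LD blocks present in the initial HTML payload.

    Blocks inside <noscript> are ignored (Google does not parse them for
    structured data), and blocks injected client-side after the initial
    DOM commit are simply absent from this payload.
    """
    visible = NOSCRIPT.sub("", html)
    blocks = []
    for match in LD_JSON.finditer(visible):
        try:
            blocks.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            pass  # malformed JSON is its own validation failure
    return blocks
```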

Risky schemas observed (do not deploy)
18% of sites using FAQPage schema are not in the gov/health categories where it remains eligible — meaningful manual-action risk. 8% deploy HowTo on retired-eligibility content (Google deprecated HowTo rich results in September 2023). 11% misuse Review schema for editorial content rather than formal product reviews. None of these deployments are eligible for rich results in 2026, and FAQPage misuse in particular is a category where Google has issued spam-policy actions in the past 18 months. If your CMS plugin emits any of these by default, turn the emission off.

06 · By CMS: The CMS distribution inverts what you would guess.

Adoption rates by CMS produce a counter-intuitive picture. WordPress leads at 78% — the Yoast / RankMath / Schema Pro plugin ecosystem does most of the work. Shopify hits 89% on Product schema thanks to theme defaults, but only 31% of Shopify stores pair Product with Organization (the combination Google rewards on commercial queries). Custom/raw HTML lags at 19% adoption — but the hand-rolled sites that do ship schema overwhelmingly ship it validated.

Schema-adoption rate by CMS platform · % of sites with ≥1 schema

Source: Digital Applied 5K-Site Schema Audit · Apr 2026
  • Shopify, Product schema (89%, highest single-type) · theme defaults · 31% pair with Organization
  • WordPress, any schema (78%) · Yoast / RankMath / Schema Pro plugin ecosystem
  • Webflow (41%) · manual or third-party app · Webflow CMS bindings
  • Framer (27%) · newer ecosystem · growing plugin support
  • Custom / raw HTML (19%) · lowest adoption · highest per-instance validity
  • Squarespace (19%) · limited plugin ecosystem · Article on blogs only
  • Wix (18%) · built-in Article + Organization · low customization

The pattern for ecommerce is sharper. 73% of ecommerce sites in the sample emit Product schema — a big number on its surface, but only 41% of those Product schemas pair with an Organization schema, and only 19% include the Offer object that Google requires for the price-and-availability rich result. Shopify themes emit a Product block by default; very few shop owners or developers know to pair it with the rest of the entity graph. The result is a large ecommerce cohort that ships Product schema, fails the Rich Results Test, and then concludes that “schema does not help our store.”
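To make the pairing concrete, here is the shape of a Product + Offer node linked to its Organization via an @id reference, expressed as a Python dict for readability. Every name, URL, and price below is hypothetical.

```python
# Minimal Product + Offer paired with the publisher Organization.
# All values are invented for illustration.
product_graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": "https://shop.example/#org",
            "name": "Example Shop",
            "url": "https://shop.example/",
        },
        {
            "@type": "Product",
            "name": "Example Widget",
            # Reference, not duplication: the brand points at the org node.
            "brand": {"@id": "https://shop.example/#org"},
            "offers": {
                "@type": "Offer",
                "price": "49.00",
                "priceCurrency": "USD",
                "availability": "https://schema.org/InStock",
            },
        },
    ],
}
```

The Offer object with price, priceCurrency, and availability is the piece the 19% minority ships and the majority omits.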

07 · AI Citation Lift: Valid schema and AI-search citations correlate cleanly.

The AI-search citation correlation is the single most actionable finding in the audit. Across the 5,000-site sample, sites that pass the Rich Results Test cleanly are cited noticeably more often in Google AI Overviews, Perplexity, and ChatGPT search than sites that either skip schema or deploy it broken. The correlation is +0.34 Pearson — not enormous in absolute terms, but at this sample size it is conclusive, and the per-combination lifts are large enough to matter for any growth team.

Headline correlation
+0.34
Pearson · valid schema vs citation rate

Sites that pass Rich Results Test on every detected @type are cited 30-50% more often across AI Overviews, Perplexity, and ChatGPT search than sites that either ship no schema or ship invalid schema. Correlation is consistent across vertical and CMS.

Strongest 2026 SEO signal
Article + BreadcrumbList
+47%
Citation lift vs no-schema baseline

The strongest single combination in the data. Editorial sites that ship valid Article + BreadcrumbList together (and pass Rich Results) are cited +47% more often in AI Overviews on informational queries than matched sites with no schema.

Editorial
Product + Offer
+29%
Citation lift on commercial queries

Ecommerce sites that pair valid Product + Offer (with price, availability, and currency) are cited +29% more often in commercial-intent AI search than ecommerce sites with Product alone or with broken Offer markup.

Ecommerce
Organization + WebSite
+18%
Citation lift on brand queries

Sites that ship valid Organization + WebSite (with the SearchAction object) are cited +18% more often on branded queries — the entity-graph signal that disambiguates the brand from same-name competitors and reinforces the canonical site URL.

Brand
"Three correctly-combined schemas beat five weakly-validated ones every single time."— From the citation-lift sub-analysis

08 · Tiers + Framework: The four tiers and the five-stage audit we run.

Splitting the 5,000-site sample into quality tiers is the cleanest way to see where each cohort actually sits. The pattern is almost comically clear: a small Tier 1 cohort dominates rich-result impressions and AI citations; an enormous Tier 3 cohort has done most of the work but lost most of the value because nobody runs the validator against every deploy.

Tier 1 · 8%
Clean, comprehensive (≥5 schemas, all valid)
Article + BreadcrumbList + Person + Organization + WebSite

The 8% that wins. Ships the right combinations for the content type, validates every deployment in CI, and updates schema when content changes. Dominates Rich Results impressions and AI-search citations across every vertical we measured. Most are publishers, B2B SaaS, and tier-1 ecommerce.

The winners
Tier 2 · 14%
Clean but minimal (≤3 schemas, all valid)
Organization + WebPage + BreadcrumbList only

Ships the universal baseline correctly but does not add the content-type-specific layer (Article, Product, LocalBusiness). Captures the basic brand-graph and breadcrumb wins but misses the higher-leverage rich results and citation lifts that come from per-content-type schemas.

Halfway there
Tier 3 · 49%
Deployed with errors (the broken middle)
Schema shipped · ≥1 error per template

The largest single cohort in the data — and the largest under-priced opportunity. Schema is deployed (the project shipped), but it fails Rich Results validation on at least one detected @type. Most teams do not know they are in this tier because nobody ran the validator after launch.

Most agencies live here
Tier 4 · 29%
No schema deployed
Zero JSON-LD blocks emitted

No structured data at all. Predominantly small custom-HTML sites and Squarespace properties without the optional schema add-on. The fastest first-month wins live here, but only after the Tier 3 cohort has been moved to Tier 1 — Tier 3 fixes are higher leverage per hour invested.

Greenfield

The audit framework we run on every client engagement collapses to five sequential stages. Stages 1-2 are the up-front analysis; Stages 3-4 are the deployment work; Stage 5 is the continuous integration that prevents regression. The mistake we see most often is teams that do Stages 1-4 once and then never run Stage 5 — within twelve months they are back in Tier 3 because content edits have broken the schema without anyone noticing.

Stage 1
Identify content type, choose 2-3 most-eligible schemas

For each page template (homepage, product, article, category, contact), identify the actual primary entity and select the schema.org @types that describe it. Most templates need 2-3 schemas; do not over-decorate. Type-content mismatch is the highest-risk error pattern in the audit.

Per-template
Stage 2
Validate against Rich Results Test for every template

Run every chosen template through Google's Rich Results Test before deploying to production. If a template fails, fix it now, not later — Tier 3 is created by skipping this stage. Capture the passing JSON-LD as a fixture for the CI test in Stage 5.

Pre-launch
Stage 3
Add Organization + WebSite + BreadcrumbList as universal baseline

Every well-tended site in 2026 should ship Organization + WebSite (with SearchAction) + BreadcrumbList as a universal baseline, served from the layout and inherited by every page. This is the entity-graph foundation; everything else stacks on top.

Universal
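A sketch of what that layout-level baseline can look like, again as a Python dict. The search-URL template and fragment conventions are assumptions — adjust them to your site's actual routes.

```python
def baseline_graph(site_url: str, site_name: str) -> dict:
    """Universal baseline served from the layout: Organization +
    WebSite (with SearchAction) + a BreadcrumbList root.

    The '?q=' search route below is a placeholder assumption.
    """
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "@id": f"{site_url}#org",
             "name": site_name, "url": site_url},
            {"@type": "WebSite", "@id": f"{site_url}#website",
             "url": site_url, "name": site_name,
             "publisher": {"@id": f"{site_url}#org"},
             "potentialAction": {
                 "@type": "SearchAction",
                 "target": f"{site_url}search?q={{search_term_string}}",
                 "query-input": "required name=search_term_string",
             }},
            {"@type": "BreadcrumbList", "itemListElement": [
                {"@type": "ListItem", "position": 1,
                 "name": "Home", "item": site_url},
            ]},
        ],
    }
```

Per-template schemas from Stage 4 then extend this graph rather than restating it, keeping the entity references consistent across every page.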
Stage 4
Deploy content-specific schema (Article, Product, LocalBusiness)

Layer the content-type-specific schema on top of the universal baseline, scoped to the right template. Article on editorial; Product + Offer on commerce; LocalBusiness on storefronts. Every additional @type must be on a page where it actually describes the primary entity.

Per content type
Stage 5
Continuous monitoring (CI check on every deploy)

Wire a JSON-LD validator into CI that runs against every page-template snapshot on every PR. Block merges that introduce schema errors. This is the difference between sustained Tier 1 and inevitable Tier 3 drift — without Stage 5, every team eventually breaks their own schema as content evolves.

Continuous
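The Stage 5 wiring does not need to be elaborate. A rough sketch of the idea, assuming a (hypothetical) ci/schema-fixtures/ directory of captured per-template JSON-LD, with an illustrative required-field list standing in for whatever validator your team actually uses:

```python
import json
import sys
from pathlib import Path

# Illustrative stand-in for a real validator's required-field lists.
REQUIRED = {
    "Article": ["headline", "datePublished", "author"],
    "Product": ["name", "offers"],
}

def check_fixture(path: Path) -> list[str]:
    """Return human-readable errors for one captured JSON-LD fixture."""
    node = json.loads(path.read_text())
    required = REQUIRED.get(node.get("@type", ""), [])
    return [f"{path.name}: missing {p}" for p in required if p not in node]

def main(fixture_dir: str = "ci/schema-fixtures") -> int:
    errors: list[str] = []
    for path in sorted(Path(fixture_dir).glob("*.json")):
        errors.extend(check_fixture(path))
    for err in errors:
        print(err, file=sys.stderr)
    return 1 if errors else 0  # nonzero exit is what blocks the merge
```

Wire `sys.exit(main())` as the script entrypoint in CI; any PR that regresses a template's schema fails the build instead of silently drifting back to Tier 3.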

09 · Conclusion: The realistic 2026 picture.

Schema markup in 2026

Adoption is the ceiling. Validation is the lever.

The conventional wisdom on schema markup — “ship it, you will see traffic gains” — was correct in 2018 and is now actively misleading. In 2026, shipping schema is table stakes; 71% of sites already do. The differentiator is whether the schema you shipped actually validates, and whether you keep it valid as your content evolves.

The single most defensible 2026 SEO investment we see, in dollars per impression of organic and AI-search lift, is moving an existing site from Tier 3 (deployed but broken) to Tier 1 (clean and comprehensive). It is rarely more than two engineering weeks of work, the win is verifiable inside Google's own tools within 48 hours, and the AI-citation lift compounds for the lifetime of the deployment.

The next frontier — already visible in the Tier 1 cohort — is entity-graph completeness: cross-linked @id values, consistent sameAs declarations, and tightly-linked Organization-Person-Article triples that let AI search engines traverse the site as a coherent entity rather than a list of pages. That is the work that separates the 8% that dominates citations from the 14% that just barely passes validation. The audit framework above is the path between them.

Schema audit + remediation

Move from deployed-but-broken to clean-and-cited.

We run schema audits and entity-graph buildouts for technical SEO teams that want to move from Tier 3 to Tier 1. The engagement is bounded (typically two to four weeks), the validation lift is verifiable in Google's own tools within 48 hours, and the AI-citation lift compounds for the lifetime of the deployment.

Free consultation · Expert guidance · Tailored solutions
What we work on

Schema engagements

  • Full Rich Results Test audit across every page template
  • Entity-graph buildout — Organization + WebSite + Person triples
  • Per-template schema deployment (Article, Product, LocalBusiness)
  • CI validator wired into the build to prevent Tier 3 drift
  • AI-citation tracking across AI Overviews, Perplexity, ChatGPT
FAQ · Schema markup audit

The questions we get every week.

What does the +0.34 correlation actually measure?

Pearson correlation coefficient between two variables: (1) whether a site passes Google's Rich Results Test cleanly across every detected @type, and (2) the site's citation rate across AI Overviews, Perplexity, and ChatGPT search over a 30-day window. +0.34 is a moderate positive correlation — not a deterministic relationship, but at our sample size (n=5,000) it is statistically conclusive, and the per-combination citation lifts (+47% Article+BreadcrumbList, +29% Product+Offer, +18% Organization+WebSite) are large enough to be commercially meaningful. Treat schema validation as one of several inputs to AI-search visibility, not the only one.