Citation visibility — getting cited inside ChatGPT, Perplexity, Claude, and Gemini answers — is the new SaaS organic acquisition channel. Traditional SERP traffic is flat to declining for most B2B SaaS verticals; AI-answer traffic is the line that moves. So we ran the audit nobody had published yet.
We pulled 500 SaaS landing pages — a stratified sample of B2B, prosumer, and developer tooling vendors at $1M to $500M ARR — and tested every page against ChatGPT, Perplexity, Claude (Anthropic Search and Citations), and Gemini (AI Overviews and standalone) over a 30-day window in March-April 2026. We logged every citation, scored every page on a structural rubric, and ran the correlations.
The result is the cleanest signal we have published in the GEO space to date. The top quartile of pages gets cited 8.4× more often than the bottom quartile, and the lift maps to a short list of repeatable structural choices: comparison sections, llms.txt, schema, answer-format headings. The full playbook is below — including the 8-point rubric we now apply inside our agentic SEO services.
- 01 — Top quartile gets cited 8.4× more than the bottom quartile; domain authority does not explain the gap. The top quartile averaged 31 citations per month across the four engines; the bottom quartile averaged 3.7. Domain authority correlates only weakly (+0.18) with the split; page-level structure correlates far more strongly. The implication is that any site can move into the top quartile by changing its pages, not by waiting for DA to grow.
- 02 — Comparison sections (vs Competitor) are the single highest-lift signal at +38%. Pages that include explicit head-to-head comparisons against named competitors get cited 38% more often than equivalent pages without them. ChatGPT and Perplexity in particular aggregate comparisons into answers. The lift holds whether the page is a dedicated /vs/competitor page or an embedded comparison block on the pricing or feature page.
- 03 — llms.txt (+24%) and SoftwareApplication schema (+18%) are cheap, structural wins. An llms.txt file at root, valid against the public spec, lifts citation rate 24% on average. SoftwareApplication schema (with valid props: name, applicationCategory, offers, operatingSystem) lifts it 18%. Both take under an hour to ship, and the lift cuts across all four engines. They are the lowest-effort items on the 8-point rubric.
- 04 — Answer-format H2s lift 22%; visual storytelling does not. Pages with H2s phrased as questions (What is X? How does X work? Why does X matter?) get cited 22% more often. Bottom-quartile pages over-index on animated heroes, video-first storytelling, and minimal prose — the formats that read well to humans but extract poorly into LLM context windows. Structured prose wins citations.
- 05 — The 8-point rubric is the actionable artifact; most sites fail on 4-6 of the 8 items. Out of 500 audited sites, the median score was 3 of 8. Only 12% of sites (the AI-native cohort) scored 7 or 8. The remaining 88% have a clear, prioritized backlog: ship llms.txt, add a comparison page, validate SoftwareApplication schema, and restructure pillar H2s. Most teams can move two quartiles in a single sprint.
01 — The Thesis
Citation visibility is the new SaaS organic acquisition channel.
Two things changed in the SaaS acquisition stack between 2024 and 2026. First, traditional Google SERP traffic for B2B SaaS keywords flattened or declined for most verticals as AI Overviews and ChatGPT zero-click answers absorbed the easy intent. Second, the number of users who start product research inside ChatGPT or Perplexity rather than Google crossed an inflection point — for developer tooling specifically, the share is now well past 40%.
The combined effect is that the SaaS pages getting cited inside AI answers are getting the qualified-intent traffic the SERP used to deliver. The pages that are not getting cited are not getting that traffic — and the gap is measurable in trial signups within a single quarter. Citation visibility has become a leading indicator for pipeline.
That made the audit worth running. We wanted to know two things: how big is the spread between the best- and worst-cited SaaS pages, and what structural choices predict the spread. The answer to the first question is roughly an order of magnitude (8.4×). The answer to the second is a list of eight repeatable factors, of which six are page-level work that any team can ship inside a single sprint.
02 — Methodology
How we audited 500 sites across four engines.
The sample frame: 500 SaaS landing pages drawn from a stratified list of B2B (60%), prosumer (15%), and developer tooling (25%) vendors with reported ARR between $1M and $500M. The cap on ARR kept the audit focused on companies where citation visibility is an actionable lever — at the very top of the market, brand and sales-led motion dominate the acquisition stack and structural page choices matter less.
For each site, we tested the primary product landing page (the page the homepage CTA points at) plus the dedicated pricing page. We submitted a fixed list of 12 query variations per vendor — three branded, three category, three comparison, three jobs-to-be-done — across ChatGPT (GPT-5.5 with web search), Perplexity (Sonar Large), Claude (Anthropic Search and Citations on Sonnet 4.5), and Gemini (3 Pro with AI Overviews). The window was 30 days, with three sampling passes to control for routing variance.
We logged every citation as a binary at the URL level (page X cited or not cited in answer Y) and aggregated to a per-page citation rate. We then scored each page on the 8-point rubric (described in section 06) and ran point-biserial correlations between each rubric item and the citation rate. The lift figures in section 04 are the per-factor correlation expressed as a percentage uplift in citation rate, after controlling for domain authority, page recency, and content depth.
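The two statistics behind the lift figures are standard. Below is a pure-Python sketch of both — the point-biserial correlation (a Pearson correlation between a 0/1 factor and a continuous rate) and the present-vs-absent uplift — run on hypothetical page data; it omits the partial-correlation controls (DA, recency, depth) the audit applied.

```python
from math import sqrt

def point_biserial(present, rates):
    """Pearson correlation between a 0/1 factor and a continuous citation rate."""
    n = len(present)
    mx = sum(present) / n
    my = sum(rates) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(present, rates))
    vx = sum((x - mx) ** 2 for x in present)
    vy = sum((y - my) ** 2 for y in rates)
    return cov / sqrt(vx * vy)

def lift(present, rates):
    """Fractional uplift in mean citation rate when the factor is present."""
    on = [r for p, r in zip(present, rates) if p]
    off = [r for p, r in zip(present, rates) if not p]
    mean_on = sum(on) / len(on)
    mean_off = sum(off) / len(off)
    return (mean_on - mean_off) / mean_off

# Hypothetical pages: factor present (1) or absent (0), citations per month
factor = [1, 1, 1, 1, 0, 0, 0, 0]
cites = [30, 28, 32, 26, 4, 5, 3, 6]
```

With this toy data the factor separates the pages almost perfectly, so the correlation sits near 1.0; real audit factors land far lower, which is why the +0.71 rubric figure is notable.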
SaaS landing pages
B2B (60%) · prosumer (15%) · developer tooling (25%). $1M to $500M ARR. Each vendor contributed two pages — primary product landing + pricing — for 1,000 page-level data points.
Stratified sample

Answer engines tested
ChatGPT (GPT-5.5 + web search), Perplexity (Sonar Large), Claude (Anthropic Search + Citations on Sonnet 4.5), Gemini (3 Pro + AI Overviews). 30-day window, three sampling passes per query.
Cross-engine

Query variations
Three branded, three category, three comparison, three jobs-to-be-done. 6,000 unique queries; 72,000 query-engine sample passes total (4 engines × 3 passes per query). Citation logged binary at URL level.
Stratified queries

The audit explicitly excluded brand-search behavior — we did not credit a vendor for being cited on a query that included its own brand name, since that signal collapses to whether the engine indexed the homepage at all. The interesting variance is in the unbranded queries, where the engines have to choose a citation and structural signals decide what gets picked. About 75% of the citation events in the dataset are from unbranded queries.
03 — The Spread
The 8.4× quartile gap — bigger than expected.
The headline finding is the spread. Bucketing the 500 sites by citation rate quartile, the top quartile averages 31 citations per month across the four engines; the bottom quartile averages 3.7. The middle two quartiles cluster between 8 and 14. The distribution is power-law shaped, not normal — a small share of sites accumulate most of the citation volume, and the long tail of under-cited pages is much longer than the over-cited head.
Average citations per month, by quartile · cross-engine
Source: Digital Applied SaaS Citation Audit · n=500 sites · Mar-Apr 2026

The cohort labels are not arbitrary. Cluster analysis on the rubric scores produces three naturally separating groups across the 500 sites — what we are calling AI-native, retrofit, and legacy. AI-native sites (~12%) were built or rebuilt with AI citation in mind, typically post-mid-2024; their pages score 7 or 8 of 8 on the rubric and they sit almost entirely in the top quartile of citations. Retrofit sites (~38%) added GEO investment after mid-2024 and score 4-7 of 8; they sit across Q2 and Q3. Legacy sites (~50%) have not invested in GEO at all and score 0-3 of 8; they sit overwhelmingly in Q1 with a long tail into Q2.
The mobility implication is the part that matters. Retrofit sites that added the four highest-leverage rubric items (llms.txt, comparison page, schema, answer-format H2s) moved an average of 1.6 quartiles in 90 days — Q1 sites mostly into Q2 or Q3. The top quartile is reachable on a roadmap, not just by AI-native rebuilds.
"Domain authority correlates +0.18 with citation rate. The 8-point rubric correlates +0.71. Page-level structure is the lever, not link equity."
— SaaS Citation Audit, Apr 2026
04 — Correlation Factors
Eight structural factors, ranked by lift.
The eight factors below are the rubric items we scored every page against, ranked by their per-factor lift on citation rate after controlling for domain authority, recency, and content depth. Each lift is the percentage uplift in citation rate associated with the factor being present versus absent — derived from the 500-site point-biserial correlation, then expressed as a per-factor uplift for readability.
Per-factor citation lift · structural rubric
Source: Digital Applied SaaS Citation Audit · n=500 sites · point-biserial correlation, partial on DA + recency + depth

A few observations about the ranking. The top three factors (comparison, llms.txt, answer-format H2s) compound multiplicatively — pages that have all three score in the top decile across all four engines, not just one or two. The bottom two factors (use-cases, plain-language hero) are weaker individually but they map to a different mechanism: they make the page parse cleanly into LLM context, which lifts citation rate even on pages that already score well on the structural items.
The factor that is conspicuously not on the list is page recency. We expected fresh content to outperform — it does not, materially. Median cited page age in the dataset is 14 months. Engines treat SaaS landing pages as evergreen reference material rather than news, and the citation behavior reflects that.
05 — Engine Patterns
Four engines, four citation patterns.
The four engines do not behave the same way. ChatGPT cites SaaS pages most aggressively (average 6.1 SaaS citations per AI Overview-equivalent answer); Perplexity is at 4.8; Claude at 3.6; Gemini at 2.9. More importantly, the engines weight the rubric differently — what wins citation in one engine does not always win in another. The dominant pattern shifts from comparison-bias in ChatGPT to depth-bias in Perplexity to methodology-bias in Claude to schema-bias in Gemini.
ChatGPT — comparison-biased
6.1 SaaS citations / answer · highest volume

GPT-5.5 with web search. The most aggressive SaaS citer. Heavily favors pages with explicit head-to-head comparisons (vs Competitor pages, comparison tables embedded in feature pages). Comparison-section lift is +51% inside ChatGPT specifically — well above the cross-engine +38%. If your audit prioritizes one factor for ChatGPT, prioritize comparison.
Volume leader

Perplexity — depth-biased
4.8 SaaS citations / answer · long-form preference

Sonar Large. Strongly prefers depth — long-form pillar pages, named-source citations, methodology pages over marketing pages. Cite-bait pages get a +47% lift in Perplexity vs +34% cross-engine average. The pricing page often outperforms the product landing page because Perplexity treats structured pricing as deep reference material.
Long-form preference

Claude (Search + Citations) — methodology-biased
3.6 SaaS citations / answer · framework affinity

Anthropic Search + Citations on Sonnet 4.5. Lower volume, higher per-citation quality. Strongly prefers pages that name a methodology, framework, or stepwise process. Use-cases pages with named outcomes lift +28% in Claude vs +12% cross-engine. The model is the most discerning of the four — it cites less, but cites better.
Framework-aware

Gemini — schema-biased
2.9 SaaS citations / answer · structured-data lever

Gemini 3 Pro + AI Overviews. The lowest-volume citer of the four, but the most schema-sensitive. Pages with valid SoftwareApplication schema lift +33% in Gemini vs +18% cross-engine. Gemini also gives the largest lift to markdown docs subdomains (+27%). For Gemini visibility specifically, ship schema and route docs to a clean subdomain.
Schema-sensitive

The pragmatic implication is that an audit needs to weight the engines by traffic mix. A SaaS vendor where most product research happens in ChatGPT should over-invest in comparison pages and structured comparison blocks. A vendor selling into developer audiences (where Perplexity and Claude over-index) should over-invest in depth — long-form pillar content, methodology pages, named frameworks. A vendor optimizing for AI Overviews traffic from Google should over-invest in schema and markdown docs.
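The traffic-mix weighting reduces to a weighted average of per-engine lifts. A minimal sketch — the +51% ChatGPT comparison figure is from the audit; the other engine lifts and the traffic shares are illustrative placeholders, not audit data:

```python
def weighted_lift(engine_lifts, traffic_mix):
    """Expected uplift for one factor, weighted by where the audience researches."""
    return sum(engine_lifts[engine] * share for engine, share in traffic_mix.items())

# Comparison-page factor: ChatGPT lift from the audit, the rest are placeholders
comparison_lift = {"chatgpt": 0.51, "perplexity": 0.30, "claude": 0.30, "gemini": 0.30}

# Hypothetical B2B SaaS research-traffic mix
mix = {"chatgpt": 0.55, "perplexity": 0.20, "claude": 0.15, "gemini": 0.10}
```

A ChatGPT-heavy mix pulls the expected lift for the comparison factor well above its cross-engine average, which is the arithmetic behind "prioritize comparison for ChatGPT audiences."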
The 8-point rubric is the cross-engine baseline. The engine-specific tilt is the second-pass optimization once the baseline is in place.
06 — The Rubric
The 8-point quick audit — score your own pages.
The rubric below is the same one we ran across the 500 sites. Every item is a binary — present or absent — and every item maps to a measured citation lift in the audit. Out of 500 sites, the median score was 3 of 8. Anything above 5 puts a site into the top half of the dataset; anything at 7 or 8 puts it into the top quartile. We use this rubric inside our own engagements as the opening diagnostic.
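Because every item is a binary, the diagnostic is trivially scriptable. A sketch — the item keys are our shorthand, not a published schema, and the bucket thresholds are the ones stated above (above 5 → top half, 7-8 → top quartile):

```python
# Shorthand keys for the eight binary rubric items (our naming, not a spec)
RUBRIC = (
    "llms_txt",            # valid /llms.txt at root
    "comparison_page",     # >=1 explicit vs-competitor page or section
    "sw_app_schema",       # valid SoftwareApplication JSON-LD
    "structured_pricing",  # /pricing with per-tier feature grids
    "answer_h2s",          # question-phrased H2s on pillar pages
    "named_use_cases",     # use-cases page with named customers and outcomes
    "markdown_docs",       # crawlable markdown docs subdomain
    "plain_hero",          # hero copy at FK grade < 10
)

def score(page: dict) -> int:
    """Count the rubric items present on a page (0-8)."""
    return sum(1 for item in RUBRIC if page.get(item, False))

def bucket(s: int) -> str:
    """Map a rubric score to its position in the audit dataset."""
    if s >= 7:
        return "top quartile"
    if s >= 6:
        return "top half"
    return "below top half"
```

A page scoring the dataset median of 3 lands "below top half"; shipping the four highest-leverage items moves it to 7 and into the top-quartile bucket.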
Valid llms.txt at root
Plain-text file at /llms.txt, conforming to the public spec. Lists the priority URLs you want LLMs to crawl, with optional descriptions. Lift: +24% cross-engine. Effort: under 30 minutes. The single highest-leverage item on the rubric per hour invested.
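For reference, a minimal llms.txt in the shape the public llmstxt.org spec describes — an H1 name, a blockquote summary, then sectioned link lists. All URLs and descriptions here are placeholders:

```markdown
# ExampleApp

> ExampleApp is a B2B workflow automation platform. This file lists the
> pages we most want LLM crawlers to read first.

## Product
- [Product overview](https://example.com/product): what ExampleApp does and for whom
- [Pricing](https://example.com/pricing): tiers and per-tier features

## Comparisons
- [ExampleApp vs CompetitorX](https://example.com/vs/competitorx): head-to-head feature comparison
```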
+24% lift · 30 min effort

≥1 explicit comparison page (vs Competitor)
Dedicated /vs/competitor URL, or an embedded comparison section on a primary page. Must include named competitors, not just feature lists. Lift: +38% cross-engine, +51% in ChatGPT. The highest-lift factor in the audit, and the one that compounds best with the rest.
+38% lift · 1-3 day effort

SoftwareApplication schema (valid)
JSON-LD schema with required props (name, applicationCategory, operatingSystem, offers). Validate via the Rich Results Test. Lift: +18% cross-engine, +33% in Gemini. Cheap and structurally clean — schema is the kind of signal LLM index pipelines extract directly.
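A minimal valid block with the four props named above; the name, category, and price values are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "ExampleApp",
  "applicationCategory": "BusinessApplication",
  "operatingSystem": "Web",
  "offers": {
    "@type": "Offer",
    "price": "49.00",
    "priceCurrency": "USD"
  }
}
```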
+18% lift · 1-2 hour effort

Pricing page with structured tiers
Dedicated /pricing URL with per-tier feature grids and consistent feature labels across tiers. Avoid 'Contact Sales' as the only path. Lift: +14% cross-engine. The pricing page is also the page Perplexity cites disproportionately on category-fit queries.
+14% lift · 1-2 day effort

Answer-format H2s on pillar pages
H2s phrased as questions: What is X? How does X work? Why does X matter? Why does this matter for [persona]? At least one per pillar page. Lift: +22% cross-engine. Costs nothing structurally — purely a copy/IA decision — but moves the needle hard.
+22% lift · 0.5 day effort

Use-cases page with named customers
Dedicated use-cases or customers page with named logos and named outcomes (not anonymized). 'Acme reduced X by Y%' beats 'one of our customers reduced X'. Lift: +12% cross-engine, +28% in Claude. The strongest factor for methodology-biased engines.
+12% lift · 1-2 week effort

Markdown docs subdomain
docs.product.com (or equivalent) serving raw or rendered markdown, accessible to GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Robots-allow these crawlers explicitly. Lift: +17% cross-engine, +27% in Gemini.
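The explicit robots-allow amounts to a few lines of robots.txt. The four user-agent tokens below are the ones named in this item; crawler names change, so verify the vendors' current documentation before shipping:

```text
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```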
+17% lift · 1-3 day effort

Plain-language hero (FK <10)
Plain-language copy in the H1 and dek above the fold, held to a Flesch-Kincaid grade level under 10. The cheapest item on the rubric — a copy edit only — and it lifts every other factor by roughly 5-8%, because LLM extraction is more reliable on plain-language input.
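Flesch-Kincaid grade is easy to spot-check in a few lines. The syllable counter below is the usual vowel-group heuristic rather than a dictionary lookup, so treat the result as approximate:

```python
import re

def syllables(word: str) -> int:
    """Approximate syllable count: vowel-group runs, minus a trailing silent 'e'."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and groups > 1:
        groups -= 1
    return max(groups, 1)

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syls = sum(syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syls / len(words)) - 15.59
```

Short sentences of short words score low; a hero written in dense polysyllabic jargon blows well past the grade-10 threshold.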
+9% lift · 0.5 day effort

07 — Conclusion
Citation visibility is page-level work.
Domain authority is not the lever. Page structure is.
The single most important finding in the audit is that the 8.4× quartile gap is not explained by domain authority. DA correlates only +0.18 with citation rate; the 8-point rubric correlates +0.71. That means the lever is page-level work that any team can ship — not a wait-for-link-equity-to-grow problem. The implication for SaaS marketing leaders is direct: prioritize the structural sprint over the link-building program if AI citation visibility is what you are optimizing for.
The cohort data is the second important finding. AI-native sites (~12% of the market) score 7-8 of 8 and dominate the top quartile. Retrofit sites (~38%) score 4-7 and cluster in Q2-Q3. Legacy sites (~50%) score 0-3 and are stuck in Q1. The mobility between cohorts is high — retrofit teams that adopted the four highest-leverage items moved 1.6 quartiles in 90 days on average. The structural rubric is not just a measurement tool; it is a roadmap.
The third finding is engine-specific weighting. ChatGPT rewards comparisons. Perplexity rewards depth. Claude rewards methodology. Gemini rewards schema. Once the cross-engine baseline (the 8-point rubric) is in place, the second-pass optimization is to lean into whichever engine your audience actually uses. For most B2B SaaS audiences in 2026, that means ChatGPT first, Perplexity second.
We will run this audit again in Q3 2026 with a larger sample and an expanded engine list (likely adding Anthropic's standalone consumer surface and a few of the smaller agentic-search startups). Expect the rubric to evolve — we already see early signal that conversational-format pages (Q&A formatted, transcript-style) are emerging as a ninth factor. Until then, the eight items above are the actionable list, and a 90-day structural sprint against them is how we run it for clients.