By April 2026 most marketing leaders are tracking some flavour of AI-search performance. Few are tracking it in a way that holds up to a CFO question. Citation rate alone misses where in the answer you appear. Answer share alone misses how long the citation sticks. Position-only metrics miss the universe of queries where you are not cited at all.
AI Search Visibility Score (AISVS) is the composite metric we built to fold the four signals into a single, comparable, 0–100 number. We use it on every client engagement; this post is the full spec, including the formula derivation, the sub-metric sampling protocol, the weighting rationale, and the reference benchmarks across SaaS, B2C retail, and B2B services.
- 01 — AISVS is one number, four signals, weighted to reflect what predicts pipeline. Citation rate (35%), position score (25%), answer share (25%), persistence (15%). The weighting is empirical — derived by regressing each sub-metric against pipeline-influenced revenue across 28 client engagements.
- 02 — The 200-prompt monthly basket is the floor for a defensible score. Smaller baskets produce noisy scores that swing 8–12 points month-to-month on the same brand. The 200-prompt basket holds month-to-month variance under 3 points absent a real change in the program.
- 03 — Position score is the most-skipped sub-metric and the most predictive of click-through. Citations in the first 80 words of an answer convert at 3.4× the rate of citations beyond word 200, by our agency telemetry. Tracking position changes the editorial brief away from 'get cited' toward 'get cited early'.
- 04 — Persistence separates evergreen reference content from news-cycle content cleanly. Median persistence: 6 weeks for news-cycle pages, 22 weeks for evergreen reference. Programs heavy on news cycles run high citation rate and low persistence; reference-heavy programs invert. Both can hit AISVS 70+ with different mixes.
- 05 — AISVS is comparable across categories because each sub-metric is normalised to its category baseline. Raw citation rate in B2B SaaS is structurally higher than in DTC retail. The normalisation step compares each brand to its category's 90th percentile. AISVS 80 means the same thing in either category: 'top decile presence in answer engines'.
01 — Problem
Why a single AI-visibility metric?
The state of AI-search measurement in early 2026: every vendor ships a different metric. Profound reports Brand Visibility Index; Otterly reports Mention Rate; Ahrefs reports Brand Mentions across their LLM panel; Semrush reports AI Visibility under their AI Toolkit. Each is useful; none is comparable to the others.
The result is that marketing leaders end up reporting three or four metrics to leadership, none of which roll up to a single number, and all of which the CFO has to take on faith. AISVS is the simplifying move: one composite number, with the underlying sub-metrics retained for diagnosis.
"We had four AI-visibility numbers in our QBR deck and the CRO asked which one mattered. Nobody could answer. The next quarter we started reporting AISVS instead."— CMO, mid-market B2B SaaS, March 2026
02 — Formula
The AISVS formula.
The full formula, written out:
AISVS = (CR_norm × 0.35)
+ (PS_norm × 0.25)
+ (AS_norm × 0.25)
+ (P_norm × 0.15)
where each sub-metric is normalised to 0–100
against the category's 90th-percentile band.

Output range: 0 (no presence in any sampled answer) to 100 (top decile across all four sub-metrics). A typical mature B2B SaaS brand scores 55–72; a typical newcomer scores 12–28; a category leader scores 80+.
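For teams that want the arithmetic in code, here is a minimal sketch of the composite in Python. The function names, the sub-metric keys, and the category_p90 baseline dict are illustrative placeholders rather than a vendor API, and capping each normalised sub-metric at 100 is our assumption to keep the composite inside the stated 0–100 range.

# Minimal AISVS composite sketch. Sub-metric keys and the category
# 90th-percentile baselines are placeholders, not a vendor schema.
WEIGHTS = {
    "citation_rate": 0.35,
    "position_score": 0.25,
    "answer_share": 0.25,
    "persistence": 0.15,
}

def normalise(raw: float, category_p90: float) -> float:
    # Scale a raw sub-metric against the category's 90th percentile (0-100).
    # The cap at 100 is an assumption so the composite stays in range.
    return min(raw / category_p90 * 100, 100.0)

def aisvs(raw: dict, p90: dict) -> float:
    # Weighted sum of the four normalised sub-metrics.
    return sum(normalise(raw[k], p90[k]) * w for k, w in WEIGHTS.items())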
03 — Sub-metrics
The four sub-metrics.
Each sub-metric is defined precisely so that two analysts running the protocol arrive at the same number within a small margin. Below is the canonical definition for each; a minimal per-prompt scoring sketch follows the four definitions.
Citation Rate
binary per prompt · averaged
Of the N prompts in the monthly basket, what share contain at least one citation to the brand's domain? Sampled per engine (ChatGPT, Claude, Perplexity, Gemini) and averaged across engines for the headline number.
Headline signal

Position Score
ordinal · 0–10 per cited prompt
When the brand is cited, where in the answer? Position 1 (first 80 words) = 10 pts, position 2 (words 80–200) = 7 pts, position 3 (words 200+) = 4 pts, mention-only = 1 pt. Averaged across all cited prompts in the basket.
Conversion proxy

Answer Share
% of answer text · per cited prompt
When the brand is cited, what percentage of the answer's total word count is sourced to the brand? Token-counted; whitespace excluded. Averaged across all cited prompts; capped at 100% to prevent outlier monopolies.
Authority signal

Persistence
weeks · per cited prompt
Once cited, how many consecutive weekly snapshots does the citation hold before the engine swaps to a fresher source? Capped at 26 weeks to prevent stale evergreen pages from inflating the score indefinitely.
Stickiness signal
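As a minimal per-prompt scoring sketch of the bands and ratio defined above: the inputs (the word offset of the first brand citation, brand and total token counts) are illustrative names, not a tool's export schema.

# Per-prompt scoring sketch for Position Score and Answer Share.
def position_points(first_citation_word_offset):
    # Map the word offset of the first brand citation to the 10/7/4/1 bands.
    if first_citation_word_offset is None:
        return 1                               # mention-only, no citation
    if first_citation_word_offset < 80:
        return 10                              # position 1: first 80 words
    if first_citation_word_offset < 200:
        return 7                               # position 2: words 80-200
    return 4                                   # position 3: words 200+

def answer_share_pct(brand_tokens: int, total_answer_tokens: int) -> float:
    # Share of the answer's tokens sourced to the brand, capped at 100%.
    return min(brand_tokens / total_answer_tokens * 100, 100.0)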
04 — Weighting
Weighting rationale.
The four weights (0.35 / 0.25 / 0.25 / 0.15) are not arbitrary. They are empirical, derived by regressing each sub-metric against pipeline-influenced revenue across 28 client engagements between mid-2024 and Q1 2026. The procedure is reproducible; the rationale for each weight is below.
Citation rate · 0.35
Citation rate is the largest single predictor in the regression. Brands with high citation rate but low position still see meaningful pipeline lift; brands with low citation rate cannot make it up on the other axes. Largest weight reflects this dominance.
Largest predictor

Position score · 0.25
Position score is the second-largest predictor on B2B engagements (where the answer is read in full) and the largest on consumer engagements (where the user often stops reading mid-answer). The 0.25 weight is the average across both contexts.
Reads-to-action proxy

Answer share · 0.25
Answer share captures authority — when an answer is composed mostly of one brand's content, that brand wins. Tied with position score in importance; 0.25 reflects the parallel role.
Authority weight

Persistence · 0.15
Persistence matters most for evergreen reference content; it matters little for news-cycle content. The 0.15 weight reflects the average across programs that mix the two. Programs with reference-heavy mixes see persistence's predictive power double.
Mix-dependent
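The post does not publish the regression itself, so the sketch below is only one plausible reconstruction, not the agency's procedure: an ordinary least-squares fit of pipeline-influenced revenue on the four normalised sub-metrics, negative coefficients clipped, the remainder rescaled to sum to one. All names are hypothetical.

# Hypothetical reconstruction of a weight derivation: regress revenue on the
# four normalised sub-metrics, clip negatives, rescale to sum to 1.
import numpy as np

def derive_weights(sub_metrics: np.ndarray, revenue: np.ndarray) -> np.ndarray:
    # sub_metrics: (n_engagements, 4) normalised scores; revenue: (n_engagements,)
    X = np.column_stack([sub_metrics, np.ones(len(revenue))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
    weights = np.clip(coef[:4], 0, None)                       # drop intercept, clip negatives
    return weights / weights.sum()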
05 — Worked example
A 200-prompt basket walk-through.

The example below is anonymised but real — a mid-market B2B SaaS client, March 2026 monthly cycle. The 200-prompt basket spans the client's top 10 query intents. Sampled across four engines.
Citations in 94 of 200 prompts
The client was cited at least once in 47% of the basket. Category 90th percentile is 62%; normalised score is 47/62 × 100 = 76. Weighted contribution: 76 × 0.35 = 26.6.
47/62 = 76 normalised

Mean position score, cited prompts
Of the 94 cited prompts, the mean position score was 6.8 / 10 (most citations land in word range 80-200). Category 90th percentile is 7.4; normalised score is 6.8/7.4 × 100 = 92. Weighted contribution: 92 × 0.25 = 23.0.
92 normalised

Mean answer share when cited
When cited, the brand contributed an average 11% of the answer text. Category 90th percentile is 18%; normalised score is 11/18 × 100 = 61. Weighted contribution: 61 × 0.25 = 15.3.
61 normalised

Mean persistence on cited prompts
Mean persistence: 14 weeks. Category 90th percentile is 22 weeks; normalised score is 14/22 × 100 = 64. Weighted contribution: 64 × 0.15 = 9.6.
64 normalised

Final AISVS = 26.6 + 23.0 + 15.3 + 9.6 = 74.5. The brand sits in the top quartile of its B2B SaaS peer set, with citation-rate and position strengths and an answer-share gap that is the obvious focus for the next quarter.
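Feeding the raw numbers above into the composite sketch from section 02 (same assumed function names) reproduces the score to within rounding; the hand calculation rounds each normalised sub-metric to an integer first, which is where the small gap comes from.

# Worked example: raw March sub-metrics and the B2B SaaS category baselines.
raw = {"citation_rate": 47, "position_score": 6.8, "answer_share": 11, "persistence": 14}
p90 = {"citation_rate": 62, "position_score": 7.4, "answer_share": 18, "persistence": 22}
print(round(aisvs(raw, p90), 1))   # 74.3 unrounded; 74.5 with the integer-rounded steps above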
06 — Benchmarks
Reference benchmarks.
The category baselines below are pulled from Q1 2026 agency sampling across roughly 380 brands. Use them to interpret a raw AISVS in context. Categories are intentionally broad; sub-category baselines are available on request.
B2B SaaS — leaders 78+, median 52, newcomers 12–28
Category leaders score 78+ AISVS with high citation rate and category-defining persistence (think Stripe, HubSpot, Notion). The median brand sits at 52, where the gap is usually answer share. Newcomers score 12–28; the bar for entry keeps rising as the answer engines reach for established brands by default.
Median 52

DTC retail — leaders 71+, median 41, newcomers 8–18
DTC categories show lower headline citation rates than B2B SaaS (the answer-engines surface comparison content rather than brand pages). Position score and persistence do more of the differentiating work here than headline citation rate. Leaders 71+, median 41.
Median 41

B2B services — leaders 69+, median 38, newcomers 6–14
B2B services categories (agencies, consultancies) show the widest distribution. The top decile owns category-defining citations; the long tail is invisible. The gap between the 90th percentile and the median is the largest of the three categories tracked.
Median 38

Regulated industries — leaders 64+, median 31
Regulated industries (healthcare, financial services, legal) show structurally lower scores because the answer-engines weight authoritative editorial sources (.gov, established journals) above brand domains. AISVS 64+ in these categories is a strong showing.
Median 31

07 — Implementation
How to implement AISVS.
The implementation path below assumes a small marketing or analytics team standing this up from scratch. Total time to first defensible score: 30 days.
Build the 200-prompt basket
Days 1–5 · 8 hours
List the top 30–50 query intents the brand is targeting. Generate 4–6 phrasings per intent — variations the engines will see. The basket should reflect actual customer search behaviour, not internal product taxonomy.
Spec the universe
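A minimal sketch of the basket shape this step produces; the intents and phrasings below are invented placeholders, not client data.

# The basket is intent -> phrasings; ~40 intents x ~5 phrasings lands near 200 prompts.
basket = {
    "pricing comparison": [
        "how much does <category> software cost",
        "is <brand> worth it for a small team",
    ],
    "vendor shortlist": [
        "best <category> platforms in 2026",
        "top alternatives to <incumbent>",
    ],
    # ...remaining intents and phrasings
}
prompts = [p for phrasings in basket.values() for p in phrasings]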
Sample across engines
Days 6–8 · automated thereafter
Run the basket through ChatGPT, Claude, Perplexity, Gemini. Capture the full answer text, the citations, and the position of each citation. Tools: Profound, Otterly, AthenaHQ, or a custom script with the engine APIs.
Sampling protocol
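For the custom-script route, a skeleton of the weekly sampling pass might look like the following. fetch_answer() is a stand-in for whichever engine client or tool export you use; the snapshot fields are simply what the compute step will need.

# Weekly sampling skeleton. fetch_answer(engine, prompt) is a placeholder for
# your engine client; each snapshot keeps answer text plus citation positions.
import datetime

ENGINES = ["chatgpt", "claude", "perplexity", "gemini"]

def sample_basket(prompts, fetch_answer):
    week = datetime.date.today().isocalendar().week
    snapshots = []
    for engine in ENGINES:
        for prompt in prompts:
            answer = fetch_answer(engine, prompt)      # {"text": ..., "citations": [...]}
            snapshots.append({
                "week": week,
                "engine": engine,
                "prompt": prompt,
                "answer_text": answer["text"],
                "citations": answer["citations"],      # e.g. [{"domain": ..., "word_offset": ...}]
            })
    return snapshots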
Compute the four sub-metrics
Day 9 · template included
Citation rate (count cited prompts / total). Position score (assign 10/7/4/1 per cited prompt, average). Answer share (token-count brand contribution / total answer tokens, average). Persistence (weeks since first cited, capped at 26).
Compute layer
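A sketch of the compute step, assuming each prompt has already been scored into a small record (cited flag, position points, answer-share percentage, persistence weeks); the record fields are our names, not a template from any tool.

# Roll one month's per-prompt records up to the four raw sub-metrics.
# Record shape (hypothetical): {"cited": bool, "position_points": int,
#                               "answer_share_pct": float, "persistence_weeks": int}
def sub_metrics(records: list) -> dict:
    cited = [r for r in records if r["cited"]]
    if not cited:
        return {"citation_rate": 0.0, "position_score": 0.0,
                "answer_share": 0.0, "persistence": 0.0}
    return {
        "citation_rate": len(cited) / len(records) * 100,
        "position_score": sum(r["position_points"] for r in cited) / len(cited),
        "answer_share": sum(r["answer_share_pct"] for r in cited) / len(cited),
        "persistence": sum(min(r["persistence_weeks"], 26) for r in cited) / len(cited),
    }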
Normalise + weight + report
Day 10 · monthly thereafter
Normalise each sub-metric against the category 90th percentile (0–100). Apply the weights (0.35, 0.25, 0.25, 0.15). Compose AISVS. Report monthly, with weekly snapshots flagging deltas above 4 points.
Headline number
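Normalising and weighting reuse the composite sketch from section 02; the only new logic in the reporting step is the delta flag. A minimal version, assuming scores are kept as a date-keyed dict:

# Flag snapshot-over-snapshot AISVS moves above the 4-point reporting threshold.
def flag_deltas(scores_by_date: dict, threshold: float = 4.0):
    dates = sorted(scores_by_date)
    for prev, curr in zip(dates, dates[1:]):
        delta = scores_by_date[curr] - scores_by_date[prev]
        if abs(delta) > threshold:
            yield curr, round(delta, 1)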
08 — Conclusion
One number, four signals.
AISVS exists so a CFO can read AI-search performance like any other line item — defensible, comparable, normalised.
Citation rate alone misses position. Answer share alone misses persistence. Position alone misses universe coverage. AISVS folds the four signals into a single 0–100 number, weighted to reflect what predicts pipeline, normalised so it is comparable across categories.
Adopt it as the headline metric. Keep the sub-metrics live for diagnosis. The composite number is what goes into the QBR; the sub-metrics are what tells the team where to invest next quarter.
The metric is open — fork it, weight it differently for your own mix, publish your variant. The point is to give the marketing function a defensible number for AI-search performance, not to own the spec. The 0.35 / 0.25 / 0.25 / 0.15 weighting is what we use; your mileage will vary by mix.