Analytics & Insights · 26 min read · Updated April 16, 2026

Analytics Glossary 2026: 200+ Data and Metrics Terms

200+ analytics terms defined for 2026 covering GA4, attribution models, data warehousing, KPIs, and AI-powered predictive analytics methods.

Analytics vocabulary has exploded since GA4 deprecated Universal Analytics, warehouse-native stacks replaced point tools, and AI features entered every reporting layer. Marketers, product managers, and founders now read dashboards that mix product analytics, attribution science, experimentation statistics, and predictive models in the same view — often without a shared definition of what each number means.

This glossary defines the 200+ analytics terms we reference most often in agency work: the core metrics that power dashboards, GA4-specific concepts, attribution models, data warehousing vocabulary, marketing KPIs, product analytics language, experimentation statistics, and AI-driven predictive techniques. Each definition is short enough to scan and precise enough to use in a stakeholder conversation.

1. Core metrics and dimensions

The foundational building blocks of every analytics tool. These terms predate GA4 and will outlast it — every dashboard, SQL query, or BI view is built from some combination of the concepts below.

Metric: A quantitative measurement, always a number (sessions, revenue, bounce rate). Metrics are what you aggregate; dimensions are what you aggregate by.
Dimension: A qualitative attribute used to slice data (country, device, campaign). A single dimension value pairs with many metric values.
Session: A group of user interactions within a bounded time window. GA4 defaults to a 30-minute inactivity timeout before starting a new session.
User: A unique visitor identified by a client ID, user ID, or device fingerprint. GA4 reports active users (users with engaged sessions in the period) by default.
Pageview: A recorded instance of a page loading in a browser. In GA4, pageviews are a specific event (page_view) rather than a separate hit type.
Event: Any discrete user interaction — click, scroll, video play, form submission. GA4 treats pageviews and conversions as specialized events.
Conversion: An event with business value: purchase, signup, lead form, qualified meeting. In GA4 these are now called "key events."
Engagement rate: The percentage of sessions that lasted longer than 10 seconds, had a conversion, or had two or more pageviews. GA4's replacement for bounce rate.
Bounce rate: The percentage of sessions that were not engaged. Calculated as 1 – engagement rate in GA4; the legacy "single pageview" definition no longer applies.
Engaged session: A GA4-specific concept: a session meeting one of the engagement criteria above. Most reports default to engaged sessions rather than all sessions.
Segment: A subset of users, sessions, or events filtered by a condition. Segments are the primary lens for comparing performance across groups.
Audience: A persistent group of users meeting defined criteria, usable for targeting and reporting. Audiences persist across sessions; segments are typically ad-hoc.
Cohort: A group of users who share an acquisition characteristic — usually the week or month they first visited. Cohort tables show retention over time.
Average engagement time: Time the browser tab was in focus during a session. More accurate than the legacy "average session duration," which counted background time.
Pages per session: Total pageviews divided by total sessions. A quick depth-of-engagement indicator for content sites.
New vs returning users: A split based on whether the client ID had been seen previously. Returning users typically convert at 2–3× the rate of new users.
Exit rate: The percentage of sessions that ended on a given page. A high exit rate on a checkout page matters; on a thank-you page it does not.
Landing page: The first page viewed in a session. The single strongest dimension for diagnosing acquisition and SEO performance.
Source / medium: The origin of a session: source is the platform (google, newsletter_april), medium is the category (organic, cpc, email).
Channel grouping: A rule-based bucketing of source/medium into categories (Organic Search, Paid Social, Direct). GA4 allows custom channel groups.
Goal value: The monetary value assigned to a conversion event. Required for calculating ROAS and comparing campaigns across conversion types.
Sampling: When an analytics tool calculates metrics from a subset of sessions rather than all of them. Triggered by high query complexity; BigQuery export eliminates sampling.
Data threshold: GA4's automatic suppression of rows with low user counts to prevent re-identification. Can cause report totals not to match the sum of their rows.
Cardinality: The number of unique values a dimension has. High-cardinality dimensions (page URL, user ID) are the main cause of rows collapsing into "(other)" in GA4 reports.
Rollup property: A GA4 360 feature that combines data from multiple source properties into a single reporting view, useful for multi-brand and multi-region accounts.
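To make the engaged-session, engagement-rate, and bounce-rate definitions above concrete, here is a minimal Python sketch that classifies sessions using GA4's three criteria. The record fields are illustrative, not GA4's actual export schema:

```python
# Hypothetical session records (field names are for illustration only).
sessions = [
    {"duration_s": 4,  "key_events": 0, "pageviews": 1},  # not engaged (bounce)
    {"duration_s": 45, "key_events": 0, "pageviews": 1},  # engaged: >10s in focus
    {"duration_s": 3,  "key_events": 1, "pageviews": 1},  # engaged: had a key event
    {"duration_s": 8,  "key_events": 0, "pageviews": 3},  # engaged: 2+ pageviews
]

def is_engaged(s):
    # GA4 criteria: longer than 10 seconds, OR a key event, OR 2+ pageviews.
    return s["duration_s"] > 10 or s["key_events"] > 0 or s["pageviews"] >= 2

engagement_rate = sum(is_engaged(s) for s in sessions) / len(sessions)
bounce_rate = 1 - engagement_rate
print(engagement_rate, bounce_rate)  # 0.75 0.25
```

Note how bounce rate falls out of engagement rate by definition — the two can never move independently in GA4.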

2. GA4-specific concepts

Google Analytics 4 introduced a new data model — events, parameters, and user properties replaced the hit-based Universal Analytics schema. These terms show up exclusively in GA4 contexts. For adoption context, see our GA4 adoption statistics.

Data stream: A source of data sent to a GA4 property. Types: web, iOS, Android. A single property can combine multiple streams for cross-platform reporting.
Event parameter: Additional context attached to an event (page_title, value, method). Parameters become custom dimensions after registration.
User property: A persistent attribute of a user (plan_type, account_age). Unlike parameters, user properties persist across all events for that user.
Enhanced measurement: Automatic event tracking GA4 enables by default: scrolls, outbound clicks, site search, video engagement, file downloads, form interactions.
Recommended events: Google's predefined event schema for common use cases (purchase, sign_up, add_to_cart). Using recommended names unlocks extra reporting features.
Custom event: An event you define yourself for business-specific actions. Must follow GA4's naming rules (snake_case, under 40 characters).
Key event (conversion): GA4's current name for conversion events. Marked in admin; only key events are used in Google Ads import and conversion paths.
Explorations: GA4's ad-hoc analysis workspace. Includes free-form tables, funnel, path exploration, segment overlap, cohort, and user lifetime reports.
DebugView: A real-time event stream view in GA4 Admin, used to validate implementations. Requires debug mode enabled on the sending device.
BigQuery export: Free nightly (and streaming) export of raw GA4 event data to BigQuery. Required for any serious analytics work — Explorations sample at scale, BigQuery does not.
GA4 audiences: Persistent user groups defined by event/parameter/property conditions. Can be exported to Google Ads and DV360 for activation.
Predictive audience: A GA4 audience built from ML predictions: likely 7-day purchasers, likely 7-day churners, predicted revenue users. Requires minimum data volume.
Consent Mode v2: Google's framework for respecting user consent signals. When consent is denied, GA4 uses modeled conversions instead of observed data.
Modeled data: Machine-learning estimates GA4 fills in for users who declined cookies or cross-device tracking. Indicated by a small icon on affected metrics.
Attribution paths: The sequence of channels that preceded a conversion. Available in GA4's Attribution reports with lookback windows up to 90 days.
Data retention: How long GA4 keeps user-level data. Default is 2 months; standard properties cap at 14 months. BigQuery export keeps data indefinitely.
Cross-domain tracking: Stitching sessions across owned domains (brand.com → checkout.brand.com). Configured in data stream settings, not via tags.
Internal traffic filter: A GA4 data filter that excludes traffic from specified IP ranges. Critical for small sites where team browsing skews metrics.
Unwanted referrals: A list of domains whose sessions should not be treated as new referrals — typically payment processors and auth providers.
Measurement Protocol: A server-side API for sending events to GA4 directly, used for offline conversions, CRM events, and deduplicating with client-side tracking.
Server-side GTM: A Google Tag Manager container that runs on a server you control. Reduces client-side script load and enables event enrichment before sending to destinations.
User ID: A unique authenticated identifier sent to GA4 to stitch sessions across devices. Enables accurate LTV and retention reporting for logged-in users.
Google Signals: An optional GA4 feature that uses Google's own cross-device graph for signed-in users. Enables demographic reports and cross-device remarketing.
Property vs account: An account can contain many properties; each property represents one business entity. Permissions can be granted at either level.
Data API: Google's official programmatic interface for pulling GA4 reporting data. Powers embedded dashboards, Looker Studio connections, and custom alerts.
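As a sketch of how a server-side Measurement Protocol call is shaped: the `mp/collect` endpoint and its `measurement_id`/`api_secret` query parameters follow Google's Measurement Protocol (GA4) reference, while the event name, client ID, and credentials below are placeholders you would replace with your own:

```python
import json
import urllib.request

# Placeholders -- substitute your own data stream's credentials.
MEASUREMENT_ID = "G-XXXXXXXXXX"
API_SECRET = "your_api_secret"

def build_mp_body(client_id, event_name, params):
    """Measurement Protocol v2 body: a client_id plus a list of events."""
    return {"client_id": client_id,
            "events": [{"name": event_name, "params": params}]}

body = build_mp_body(client_id="555.1234567890",
                     event_name="crm_lead_qualified",   # hypothetical custom event
                     params={"value": 150, "currency": "USD"})

url = ("https://www.google-analytics.com/mp/collect"
       f"?measurement_id={MEASUREMENT_ID}&api_secret={API_SECRET}")
req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                             headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)  # uncomment to actually send the event
```

Pair this with DebugView (send the same event with a debug_mode parameter) to validate the payload before relying on it for offline conversions.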

3. Attribution models

How credit for a conversion is distributed across the touchpoints that preceded it. The rise of privacy-driven modeled data has pushed attribution from deterministic path-based rules toward data-driven and incrementality-based methods.

First-touch attribution: Assigns 100% of credit to the first channel that introduced the user. Good for brand-awareness analysis; poor for measuring closing channels.
Last-touch (last-click) attribution: Assigns 100% of credit to the most recent marketing touchpoint before conversion. The legacy default; systematically over-credits branded search and retargeting.
Last non-direct click: A variation that ignores direct traffic when attributing, on the theory that direct usually represents a user returning after discovery.
Linear attribution: Distributes credit evenly across all touchpoints in the conversion path. Simple and fair, but treats all interactions as equally influential.
Time-decay attribution: Weights touches closer to conversion more heavily, using exponential decay (default half-life: 7 days). Good for short consideration cycles.
U-shaped (position-based) attribution: Assigns 40% to the first touch, 40% to the last touch, and distributes 20% across middle touches. Credits both discovery and closing.
W-shaped attribution: Extends U-shaped by also giving 30% weight to a middle event (typically lead creation), with 30% first, 30% last, and the remainder distributed among other touches.
Data-driven attribution (DDA): Google's default GA4/Ads model that uses machine learning to assign credit based on actual conversion path performance. Uses a Shapley-style method under the hood.
Markov chain attribution: A probabilistic model that represents customer journeys as state transitions; credit is assigned by measuring the "removal effect" — how conversion probability drops when a channel is removed.
Shapley value attribution: A game-theory method that calculates each channel's marginal contribution across all possible channel combinations. Mathematically rigorous but computationally expensive.
Media Mix Modeling (MMM): Top-down regression using aggregated spend and outcome data. Not dependent on cookies — increasingly popular post-iOS 14.5. Examples: Meridian (Google), Robyn (Meta).
Incrementality testing: Holdout or geo-based experiments that measure the true causal lift of a channel, independent of any attribution model. The gold standard for measurement.
Conversion lift study: A randomized holdout test run inside an ad platform (Meta, Google, TikTok) that compares conversions between exposed and unexposed groups.
Geo holdout test: A platform-agnostic incrementality design that pauses a channel in randomly chosen geographies and compares performance to matched control regions.
Lookback window: The maximum time before a conversion during which touchpoints receive credit. GA4 defaults to 30 days for acquisition key events and 90 days for all other key events.
View-through conversion: A conversion where the user saw but did not click the ad. Controversial — prone to over-crediting display and video — but sometimes the only available signal.
Assisted conversion: A conversion where the channel appeared in the path but was not the final click. Reports like GA4's Conversion Paths highlight these.
Attribution window: The combined click + view window used by an ad platform to credit conversions. Meta's default shifted to 7-day click / 1-day view post-iOS 14.5.
Attribution model comparison: A side-by-side view of how credit shifts across models. Always diagnostic; never a substitute for incrementality.
Unified Marketing Measurement (UMM): A hybrid approach that combines MTA (for optimization), MMM (for budget planning), and incrementality (for validation). The modern measurement stack.
Conversion modeling: Estimating conversions that cannot be observed due to consent or tracking limits. Google, Meta, and others fill reporting gaps this way.
Enhanced conversions: Google Ads' feature for sending hashed first-party identifiers with conversion events to improve match rates and reduce modeling reliance.
Conversions API (CAPI): Meta's server-side conversion API — the direct analogue to Enhanced Conversions. Strongly recommended for iOS performance.
Probabilistic attribution: Attribution based on statistical similarity (IP, user-agent, timestamps) rather than deterministic identifiers. Lower accuracy than cookie-based tracking but resilient to privacy changes.
Dark social: Traffic from untracked sharing (WhatsApp, Slack, email, copy-paste). Shows up as direct in analytics and cannot be attributed to a campaign.
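The rule-based models above are simple enough to implement directly. This sketch distributes one conversion's credit across an ordered channel path for the first-touch, last-touch, linear, and U-shaped models (a hypothetical four-touch path; real paths come from your attribution reports or warehouse):

```python
def attribute(path, model="linear"):
    """Split one conversion's credit across an ordered list of channel touches."""
    n = len(path)
    if model == "first_touch":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "last_touch":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "linear":
        weights = [1.0 / n] * n
    elif model == "u_shaped":
        if n <= 2:
            weights = [1.0 / n] * n          # degenerate paths: split evenly
        else:
            middle = 0.2 / (n - 2)           # 20% shared among middle touches
            weights = [0.4] + [middle] * (n - 2) + [0.4]
    credit = {}
    for channel, w in zip(path, weights):
        credit[channel] = credit.get(channel, 0.0) + w
    return credit

path = ["paid_social", "organic_search", "email", "direct"]
print(attribute(path, "u_shaped"))
# paid_social and direct each get 0.4; the two middle touches split the 0.2
```

Running the same path through each model is exactly what an attribution model comparison report does — the totals always sum to one conversion, only the distribution changes.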

4. Data warehousing and modeling

Modern analytics lives in a warehouse — Snowflake, BigQuery, Redshift, or Databricks. These terms describe how data is moved, stored, modeled, and made trustworthy before it reaches a dashboard.

Data warehouse: A columnar, query-optimized store for analytical workloads. Designed for read-heavy aggregation across billions of rows.
Data lake: Cheap object storage (S3, GCS) holding raw, schema-on-read data in formats like Parquet or JSON. Used for archival and flexibility.
Lakehouse: A hybrid architecture combining lake economics with warehouse semantics via transactional table formats (Iceberg, Delta, Hudi). Databricks popularized the term.
ETL: Extract, Transform, Load. The legacy pattern where transformations happen before loading into the warehouse. Bottlenecked by the transformation engine.
ELT: Extract, Load, Transform. The modern pattern — load raw data into the warehouse, then transform inside it using SQL. Leverages warehouse scale and the separation of storage and compute.
Reverse ETL: Pushing warehouse data out to operational tools (Salesforce, HubSpot, Iterable). Vendors: Hightouch, Census, RudderStack.
CDC (Change Data Capture): A technique for streaming database changes in near real time. Powers tools like Debezium, Fivetran HVR, and Airbyte's CDC connectors.
Fact table: A warehouse table that stores measurable, event-like records (orders, sessions, page_views). Usually the largest tables in the schema.
Dimension table: A table holding descriptive attributes joined onto facts (customers, products, campaigns). Usually small but high-impact on report readability.
Star schema: A warehouse design pattern: one fact table surrounded by dimension tables joined on keys. Simple, performant, and easy to explain.
Snowflake schema: A normalized variant where dimension tables are themselves split into sub-dimensions. Reduces storage; increases join complexity.
Slowly Changing Dimension (SCD): A pattern for handling historical changes in dimension attributes. Type 2 (keep history with valid_from/valid_to columns) is the most common.
Surrogate key: An artificial primary key (usually auto-incremented or hashed) used instead of business keys to simplify joins across sources.
Grain: The level of detail a fact table captures (one row per order, per line item, per session). Defining grain is the first modeling decision.
dbt: The de facto SQL transformation tool. Treats SQL models as code: version-controlled, tested, documented. Core to modern ELT.
dbt model: A single SQL file representing one materialized view or table in the warehouse. Models can reference other models via the {{ ref() }} function.
Airflow: Apache Airflow, the open-source orchestrator for scheduling and monitoring pipelines. Dominant in data engineering; increasingly challenged by Dagster, Prefect, and cloud-native schedulers.
Semantic layer: A metric-definition layer between the warehouse and BI tools, used so that "revenue" means the same thing everywhere. Examples: dbt Semantic Layer, Cube, LookML.
Data catalog: A searchable inventory of datasets, owners, schemas, and lineage. Tools: Atlan, Collibra, DataHub, Amundsen.
Data lineage: A graph showing how a field was produced — which sources, transformations, and models fed into it. Essential for impact analysis.
Data contract: An explicit agreement between producers and consumers about the schema, semantics, and SLAs of a dataset. Enforced via schema registries or dbt tests.
Partitioning: Physical organization of a warehouse table by a column (usually date). Reduces scan costs and speeds up queries with partition filters.
Clustering: Sorting or co-locating rows by specific columns to improve filter selectivity. BigQuery clustering keys are a common example.
Idempotency: A pipeline property: re-running with the same input produces the same output. Critical for safe backfills.
Freshness SLA: An agreed-upon maximum age for a dataset at consumption time (e.g. "sales data must be under 2 hours old by 9 AM"). Monitored via tests or observability tools.
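The SCD Type 2 pattern is easier to reason about with a concrete sketch. In a warehouse this is a SQL MERGE; the Python below mimics the same logic on in-memory rows, assuming a hypothetical customer dimension keyed by `key` with valid_from/valid_to columns:

```python
from datetime import date

def scd2_upsert(history, key, attrs, as_of):
    """Type 2 upsert: close the current row and open a new one when attrs change."""
    current = next((r for r in history
                    if r["key"] == key and r["valid_to"] is None), None)
    if current and current["attrs"] == attrs:
        return history                    # unchanged: leave the open row alone
    if current:
        current["valid_to"] = as_of       # close out the previous version
    history.append({"key": key, "attrs": attrs,
                    "valid_from": as_of, "valid_to": None})
    return history

history = []
scd2_upsert(history, "cust_1", {"plan": "free"}, date(2026, 1, 1))
scd2_upsert(history, "cust_1", {"plan": "pro"}, date(2026, 3, 1))
# Two rows survive: the "free" version (closed 2026-03-01) and the open "pro" row.
```

Because every version is preserved, a fact row from February joins to the "free" plan while a row from April joins to "pro" — which is the whole point of Type 2.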

5. Marketing KPIs

The financial and efficiency metrics that translate analytics output into business language. Benchmark context is in our marketing analytics statistics and conversion rate benchmarks.

CAC (Customer Acquisition Cost): Total acquisition spend divided by new customers in the period. Usually calculated with fully loaded cost (ad spend + salaries + tooling).
LTV (Lifetime Value): The total gross margin a customer generates over their lifetime. Simple formula: (ARPU × gross margin) / churn rate.
LTV:CAC ratio: The efficiency ratio of customer value to acquisition cost. 3:1 is a common healthy benchmark for SaaS; below 1:1 is unsustainable.
CAC payback period: The months required to recover CAC from gross margin. Under 12 months is excellent; 18–24 months is typical for B2B SaaS.
MER (Marketing Efficiency Ratio): Total revenue divided by total marketing spend, ignoring attribution. Popular in ecommerce as an attribution-free north star.
ROAS (Return on Ad Spend): Revenue divided by ad spend for a specific channel or campaign. Channel-level; contrast with blended ROAS, which is closer to MER.
Blended ROAS: Total revenue across channels divided by total ad spend. Eliminates cross-channel attribution disputes at the cost of channel-level visibility.
nCAC (new customer CAC): CAC calculated using only new customers acquired. The modern ecommerce north star, separating growth efficiency from retention revenue.
CPC (Cost Per Click): Ad spend divided by clicks. The most common auction bid type; also an outcome of auction dynamics in CPM bidding.
CPM (Cost Per Mille): Cost per 1,000 ad impressions. The natural unit for brand and awareness campaigns, and the underlying currency of programmatic auctions.
CPA (Cost Per Acquisition): Ad spend divided by conversions. "Acquisition" can mean lead, signup, or sale — always clarify the event.
CPL (Cost Per Lead): The CPA variant specific to top-of-funnel lead events. B2B benchmark: $50–$300 depending on industry and lead quality threshold.
AOV (Average Order Value): Total revenue divided by total orders. The primary lever for ecommerce profitability that does not require acquiring new customers.
ACV (Annual Contract Value): The annualized contract value of a SaaS deal. Distinct from TCV (total contract value), which sums across the full term.
ARPU / ARPA: Average revenue per user / per account. ARPA is preferred for B2B; ARPU for consumer subscriptions.
Gross margin: (Revenue – COGS) / revenue. The foundation of LTV and payback calculations — a 90%-margin SaaS business has very different economics than a 25%-margin ecommerce brand.
Contribution margin: Revenue minus all variable costs (COGS, shipping, payment processing, and often attributed ad spend). The more realistic margin for unit-economics decisions.
NPS: Net Promoter Score, calculated as % promoters (9–10) minus % detractors (0–6). Widely criticized methodology, widely used anyway.
CSAT: Customer Satisfaction — usually a 1–5 or 1–7 rating after a specific interaction. Focused on a transaction; NPS is focused on the relationship.
CES: Customer Effort Score — "how easy was it to resolve your issue" — typically on a 1–7 scale. A strong predictor of churn in support contexts.
Conversion rate: The proportion of a defined population that completed a target action. Always report the numerator/denominator definition with the number.
Click-through rate (CTR): Clicks divided by impressions. The efficiency metric for ad creative and email subject lines.
Open rate: Email opens divided by emails delivered. Unreliable since Apple Mail Privacy Protection (2021), which auto-fires opens and inflates rates by roughly 30% for MPP-heavy lists.
Share of voice (SOV): A brand's presence on a given keyword, platform, or topic relative to competitors. Used in SEO (% of available SERP clicks) and PR (% of media mentions).
Brand lift: The incremental change in brand awareness, favorability, or consideration caused by a campaign, measured via pre/post surveys of exposed vs control groups.
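The CAC, LTV, LTV:CAC, and payback definitions above compose into one small calculation. This sketch uses the glossary's simple LTV formula with hypothetical inputs (real models would account for expansion revenue and discounting):

```python
def unit_economics(spend, new_customers, arpa_monthly, gross_margin, monthly_churn):
    cac = spend / new_customers
    # Simple LTV formula from the glossary: (ARPA x gross margin) / churn rate.
    ltv = arpa_monthly * gross_margin / monthly_churn
    # Payback: months of gross-margin contribution needed to recover CAC.
    payback_months = cac / (arpa_monthly * gross_margin)
    return {"cac": cac, "ltv": ltv,
            "ltv_cac": ltv / cac, "payback_months": payback_months}

print(unit_economics(spend=120_000, new_customers=100,
                     arpa_monthly=200, gross_margin=0.8, monthly_churn=0.02))
# CAC $1,200; LTV $8,000; LTV:CAC ~6.7; payback 7.5 months
```

Against the benchmarks above, this hypothetical business clears both bars: LTV:CAC well above 3:1 and payback under 12 months.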

6. Product analytics

The vocabulary of Amplitude, Mixpanel, Heap, and PostHog. Product analytics is about user behavior inside an application rather than marketing performance, with its own canonical metrics for engagement and retention.

DAU (Daily Active Users): Unique users who took a meaningful action in the past day. The definition of "active" varies by product — always document it.
WAU (Weekly Active Users): Unique active users in a rolling 7-day window. More stable than DAU, less lagging than MAU.
MAU (Monthly Active Users): Unique active users in a rolling 28- or 30-day window. The most common headline engagement metric.
Stickiness (DAU/MAU): The ratio of daily actives to monthly actives. 50%+ indicates near-daily use; 20% is typical for weekly-use products.
L28 / LN metrics: The count of days a user was active in the past 28 days. Meta and TikTok use variants internally. L28=28 is a true daily user.
Retention curve: A plot of the share of a cohort still active over time. A flattening curve indicates product-market fit; a continuously decaying curve does not.
N-day retention: The share of a cohort that returns on exactly day N after signup. The strict definition used by most mobile apps.
Rolling retention: The share that returns on day N or later. More generous than N-day retention; favored for products with irregular usage patterns.
Churn rate: The share of customers (logo churn) or revenue (revenue churn) lost in a period. Measured monthly for consumer SaaS, annually for enterprise.
Gross retention rate (GRR): Retained revenue from the starting cohort, excluding expansion. Caps at 100%.
Net retention rate (NRR): Retained revenue including upsell and expansion. 120%+ is elite for B2B SaaS — Snowflake famously reported 158% at IPO.
Activation rate: The share of signups that complete a defined "aha moment" event within a time window. The single highest-leverage metric for most SaaS products.
Aha moment: The behavior correlated with long-term retention. Famous examples: Facebook's "7 friends in 10 days," Slack's "2,000 team messages."
North Star metric: A single metric a product team optimizes for. Should capture delivered customer value — "nights booked" (Airbnb), "messages sent" (Slack) — not vanity volume.
Funnel analysis: A conversion view across a sequence of steps, showing drop-off between each. The workhorse report for onboarding and checkout optimization.
Path analysis: An exploration of the sequences users actually take, rather than ones pre-defined by the analyst. Useful for discovering unexpected behaviors.
Feature adoption: The share of eligible users who have used a specific feature at least once (or N times) in a window. An input to roadmap prioritization.
Power users: Users whose engagement dramatically exceeds the median, typically the top 5–10%. Often drive 50%+ of revenue and referrals.
Session length: Time from first to last event in a session. Should be tracked separately for foreground-only time in mobile apps.
Sessions per user: Total sessions divided by unique users in the period. A habit-formation indicator — trends matter more than absolute values.
Cohort heatmap: A retention matrix with cohorts (rows) × time (columns), colored by retention percentage. The canonical view for spotting onboarding regressions.
Time to value (TTV): The time between signup and the activation event. Shortening TTV is a dominant onboarding strategy.
Engagement score: A composite metric weighting multiple product actions. Used for PQL scoring, health dashboards, and churn prediction.
Product-Qualified Lead (PQL): A user whose in-product behavior indicates sales-readiness. A core concept in product-led-growth sales motions.
Session replay: A recording of a user's screen during a session, overlaid on analytics events. Tools: FullStory, LogRocket, PostHog, Hotjar.
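The strict vs rolling retention distinction above is worth seeing side by side. This sketch computes both from a tiny hypothetical activity log for the January 1 cohort:

```python
from datetime import date

# Hypothetical activity log: (user_id, active_date)
events = [
    ("u1", date(2026, 1, 1)), ("u1", date(2026, 1, 8)),
    ("u2", date(2026, 1, 1)),
    ("u3", date(2026, 1, 1)), ("u3", date(2026, 1, 9)),
]

def n_day_retention(events, cohort_date, n):
    """Strict N-day retention: share of the cohort active exactly N days later."""
    cohort = {u for u, d in events if d == cohort_date}
    returned = {u for u, d in events if (d - cohort_date).days == n}
    return len(cohort & returned) / len(cohort)

def rolling_retention(events, cohort_date, n):
    """Rolling retention: share of the cohort active on day N or later."""
    cohort = {u for u, d in events if d == cohort_date}
    returned = {u for u, d in events if (d - cohort_date).days >= n}
    return len(cohort & returned) / len(cohort)

print(n_day_retention(events, date(2026, 1, 1), 7))   # only u1 returns on day 7 -> 1/3
print(rolling_retention(events, date(2026, 1, 1), 7)) # u1 (day 7) and u3 (day 8) -> 2/3
```

The same data yields 33% strict retention and 67% rolling retention — which is why the definition must always travel with the number.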

7. Experimentation and statistics

The statistical and methodological vocabulary for running trustworthy A/B tests. Treat this section as the minimum glossary a test program needs before it can claim "wins" or "lifts" with any rigor.

A/B test: A randomized controlled experiment comparing two variants. The simplest and most common online experiment design.
A/B/n test: A test with more than two variants. Requires correction for multiple comparisons to avoid false positives.
Multivariate test (MVT): A factorial design testing combinations of multiple elements. Requires much larger samples than A/B tests.
Null hypothesis (H0): The default assumption that there is no difference between variants. The entire frequentist framework is about evidence against H0.
Alternative hypothesis (H1): The hypothesis that there is a real difference. Can be one-sided (variant is better) or two-sided (variant is different).
p-value: The probability of observing a result at least as extreme as the data, assuming H0 is true. Not the probability the variant "wins."
Statistical significance: A p-value below a pre-specified threshold (usually 0.05). Not a measure of business importance.
Confidence interval (CI): A range that would contain the true effect in X% of repeated experiments. A 95% CI that excludes zero implies p < 0.05.
Type I error (α): A false positive — declaring a winner when there is no real effect. The significance threshold equals the accepted Type I error rate.
Type II error (β): A false negative — missing a real effect. Power = 1 – β; a power of 0.8 is typical.
Statistical power: The probability of detecting a real effect of a given size. Determined by sample size, baseline variance, and the detectable effect.
Minimum Detectable Effect (MDE): The smallest effect size a test has reasonable power to detect. Fixed before the test; calculated from sample size and baseline.
Sample Ratio Mismatch (SRM): When the actual split between variants differs from the planned split more than random variation would predict. An SRM is a show-stopper: the test cannot be trusted.
Novelty effect: An early bump in engagement caused by users reacting to newness rather than actual value. Often fades within 1–2 weeks.
Primacy effect: Existing users performing worse on a new variant because they need to relearn. The mirror image of novelty.
Peeking: Checking results before the pre-registered sample size is reached and stopping early if significant. Inflates false positive rates from 5% to 20%+ without sequential-testing corrections.
Sequential testing: Methods (mSPRT, always-valid p-values, group sequential) that allow continuous monitoring without inflating Type I error. The modern answer to peeking.
Bayesian testing: An alternative framework that reports probability of being best instead of p-values. More intuitive for stakeholders; requires prior selection.
CUPED (Controlled-experiment Using Pre-Experiment Data): A variance-reduction technique using pre-period covariates. Pioneered at Microsoft; can cut required sample sizes by 30%+.
Stratified sampling: Randomizing within pre-defined strata (e.g. country, device) to guarantee balance on those dimensions.
Multi-armed bandit: An adaptive allocation algorithm that shifts traffic toward better-performing variants during the test. Trades learning for short-term earnings.
Thompson sampling: A Bayesian bandit algorithm that samples from each variant's posterior to decide allocation. Popular in production personalization systems.
Holdout group: A permanent unexposed segment used to measure the aggregate impact of an entire program (e.g. all personalization, lifecycle email).
Guardrail metric: A metric monitored to catch unintended harm from an experiment, even when the primary metric looks positive (page load time, unsubscribe rate, error rate).
Effect size: The magnitude of the treatment effect — the number your business actually cares about. Often expressed as absolute or relative lift.
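Most of the frequentist terms above meet in one calculation: the two-proportion z-test behind a standard conversion-rate A/B test. A stdlib-only sketch with hypothetical test counts:

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF via the error function (no scipy needed).
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm_cdf(abs(z)))              # two-sided p-value
    return z, p_value

# Hypothetical result: 4.8% control vs 5.6% variant on 10,000 sessions each.
z, p = two_proportion_ztest(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
print(round(z, 2), round(p, 4))  # p is below 0.05 -> statistically significant
```

Note what the p-value does and does not say here: the result is unlikely under H0, but the business question is the effect size (a 0.8-point absolute lift) and whether guardrail metrics held.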

8. AI and predictive analytics

Predictive and generative AI are now native features of analytics platforms. These terms describe the models, tasks, and patterns most relevant to marketing and product teams. For hands-on context, see our AI digital transformation services.

Predictive LTV (pLTV): An ML forecast of a customer's future value based on early behavior. Used to bid for high-value customers in ad platforms via Enhanced Conversions for Leads or CAPI.
Propensity model: A model that scores a user's probability of taking an action (purchase, upgrade, cancel). The workhorse of CRM segmentation.
Churn prediction: A specific propensity model for predicting customer loss. Features typically include engagement decline, support tickets, and billing events.
Uplift modeling: Predicting the incremental effect of a treatment (e.g. a discount email) on each user. Finds "persuadables" — users who only act when treated.
Lookalike modeling: Finding users who resemble an existing high-value seed audience. Native to Meta, Google, and LinkedIn; also implementable in the warehouse with embeddings.
Customer segmentation (clustering): Unsupervised grouping of users by behavior or attributes. Common algorithms: k-means, DBSCAN, hierarchical clustering.
RFM analysis: Scoring customers on Recency, Frequency, and Monetary value. The pre-ML approach to customer segmentation — still surprisingly effective.
Anomaly detection: Automated flagging of statistical outliers in metrics. Approaches: rolling thresholds, STL decomposition, isolation forest, Prophet.
Time-series forecasting: Predicting future values of a metric. Methods: ARIMA, exponential smoothing, Prophet, deep-learning models like the Temporal Fusion Transformer.
Prophet: Meta's open-source forecasting library. Designed to be usable by non-statisticians; strong with seasonality and holiday effects.
Feature engineering: Creating model inputs from raw data — rolling averages, ratios, lags, categorical encodings. Often the single biggest driver of model performance.
Feature store: A shared infrastructure layer for computing and serving features consistently between training and inference. Examples: Feast, Tecton, Databricks Feature Store.
Train/test split: Partitioning data into a model-training set and a held-out evaluation set. A typical split is 70/15/15 train/validation/test.
Cross-validation: A more robust evaluation that averages performance across multiple splits. k-fold is the standard; time-series cross-validation respects temporal ordering.
AUC / ROC: Area Under the ROC Curve — a classification performance metric insensitive to class balance. 0.5 is random; 0.9+ indicates strong discrimination.
Precision / recall: Precision = true positives / predicted positives. Recall = true positives / actual positives. The two always trade off against each other.
Calibration: Whether predicted probabilities match observed frequencies — a bucket of users scored at 80% should convert 80% of the time. Critical when absolute probability matters more than ranking.
Data drift: A shift in input distributions between training and inference that degrades model performance. Monitored with PSI or KS tests.
Concept drift: A shift in the underlying relationship between features and target — even if inputs look the same, the ground truth has moved. Common after major product changes.
MLOps: The practices and tooling around deploying, monitoring, and updating ML models in production. Encompasses CI/CD, observability, and retraining pipelines.
Embedding: A dense numerical representation of an entity (user, product, sentence) produced by a model. Used for similarity search, clustering, and as features in downstream models.
Vector database: A datastore optimized for nearest-neighbor search over embeddings. Examples: Pinecone, Weaviate, pgvector, Qdrant.
Retrieval-Augmented Generation (RAG): A pattern where an LLM answers questions using retrieved context from a vector database. The default architecture for internal document Q&A.
LLM-based insights: Natural-language summaries generated over structured analytics output. Native in GA4 Insights, Amplitude AI, Mixpanel AI. Verify, don't trust.
Text-to-SQL: LLM generation of SQL queries from natural language. Accuracy improves dramatically when grounded in a semantic layer or warehouse schema.
Agentic analytics: Multi-step AI workflows that formulate hypotheses, pull data, run analyses, and deliver findings autonomously. The current frontier in BI tooling.
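RFM is the most approachable technique in this section, so here is a minimal rank-based sketch. Production implementations usually bin into quintiles; with only three hypothetical customers, a simple rank per dimension shows the mechanics:

```python
from datetime import date

# Hypothetical customer summaries: last order date, order count, total spend.
customers = {
    "c1": {"last_order": date(2026, 4, 1),  "orders": 12, "spend": 940.0},
    "c2": {"last_order": date(2025, 11, 3), "orders": 2,  "spend": 80.0},
    "c3": {"last_order": date(2026, 3, 15), "orders": 6,  "spend": 410.0},
}

def rfm(customers, today):
    """Rank-based RFM: each dimension scored 1 (worst) to N (best)."""
    def score(values, value, reverse=False):
        return sorted(values, reverse=reverse).index(value) + 1
    days_since = {c: (today - v["last_order"]).days for c, v in customers.items()}
    orders = [v["orders"] for v in customers.values()]
    spend = [v["spend"] for v in customers.values()]
    return {c: (score(list(days_since.values()), days_since[c], reverse=True),
                score(orders, v["orders"]),   # fewer days since order = higher R score
                score(spend, v["spend"]))
            for c, v in customers.items()}

print(rfm(customers, date(2026, 4, 16)))
# c1 is best on all three dimensions -> (3, 3, 3)
```

The same (R, F, M) tuples then feed segmentation rules like "champions" (high on all three) or "at risk" (high F and M, low R) — no ML required.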

Common definitional pitfalls

Bounce rate (GA4 vs UA): UA defined bounces as single-pageview sessions. GA4 inverts the concept into engagement rate. Year-over-year comparisons across the migration are not meaningful.
Sessions (GA4 vs UA): GA4 sessions are counted differently: no new session on UTM change, no midnight split. Expect GA4 to be 5–15% lower than UA for identical traffic.
Conversions (GA4 vs Ads): GA4 counts unique events by default; Google Ads counts every event. Lookback and attribution windows also differ. Discrepancies of 10–30% are normal.
Users vs sessions vs pageviews: Always confirm the denominator when reading a conversion or bounce rate. A "2% conversion rate" with a session-based denominator is a different number than one with a user-based denominator.
CAC (with or without payroll): Investors usually expect fully loaded CAC; marketers usually quote paid-media-only CAC. The difference is often 2–4×.
LTV horizon: Is LTV lifetime, 24 months, or 12 months? For a subscription product the difference can be huge — always state the horizon.

Putting the vocabulary to work

A glossary alone doesn't improve reporting — aligning the organization on definitions does. The highest-leverage next steps, in order:

  • Document your company's canonical definitions for the 20–30 KPIs that matter most — not all 200. Pin them in your semantic layer or data catalog.
  • Enforce definitions in the stack: dbt models, Looker explores, or a Cube semantic layer are stronger than tribal knowledge in a Notion doc.
  • Separate marketing optimization metrics (ROAS, CPA) from business-truth metrics (contribution margin, net new revenue) — and measure both.
  • Choose attribution per decision, not per company: data-driven for bid optimization, incrementality for budget decisions, MMM for annual planning.
  • When an AI insight surfaces in GA4 or Amplitude, verify against the underlying query. Modeled data is not interchangeable with observed data.

Turn this vocabulary into a working stack

We help product and marketing teams implement GA4, architect warehouses, build dbt models, and deploy predictive analytics — ending with dashboards your executives actually read.