Analytics & Insights · 26 min read · Updated April 16, 2026

Analytics Glossary 2026: 200+ Data and Metrics Terms

200+ analytics terms defined for 2026 covering GA4, attribution models, data warehousing, KPIs, and AI-powered predictive analytics methods.

Analytics vocabulary has exploded since GA4 deprecated Universal Analytics, warehouse-native stacks replaced point tools, and AI features entered every reporting layer. Marketers, product managers, and founders now read dashboards that mix product analytics, attribution science, experimentation statistics, and predictive models in the same view — often without a shared definition of what each number means.

This glossary defines the 200+ analytics terms we reference most often in agency work: the core metrics that power dashboards, GA4-specific concepts, attribution models, data warehousing vocabulary, marketing KPIs, product analytics language, experimentation statistics, and AI-driven predictive techniques. Each definition is short enough to scan and precise enough to use in a stakeholder conversation.

1. Core metrics and dimensions

The foundational building blocks of every analytics tool. These terms predate GA4 and will outlast it — every dashboard, SQL query, or BI view is built from some combination of the concepts below.

Metric: A quantitative measurement, always a number (sessions, revenue, bounce rate). Metrics are what you aggregate; dimensions are what you aggregate by.
Dimension: A qualitative attribute used to slice data (country, device, campaign). A single dimension value pairs with many metric values.
Session: A group of user interactions within a bounded time window. GA4 defaults to a 30-minute inactivity timeout before starting a new session.
User: A unique visitor identified by a client ID, user ID, or device fingerprint. GA4 reports active users (users with engaged sessions in the period) by default.
Pageview: A recorded instance of a page loading in a browser. In GA4, pageviews are a specific event (page_view) rather than a separate hit type.
Event: Any discrete user interaction — click, scroll, video play, form submission. GA4 treats pageviews and conversions as specialized events.
Conversion: An event with business value: purchase, signup, lead form, qualified meeting. In GA4 these are now called "key events."
Engagement rate: The percentage of sessions that lasted longer than 10 seconds, had a conversion, or had two or more pageviews. GA4's replacement for bounce rate.
Bounce rate: The percentage of sessions that were not engaged. Calculated as 1 – engagement rate in GA4; the legacy "single pageview" definition no longer applies.
Engaged session: A GA4-specific concept: a session meeting one of the engagement criteria above. Most reports default to engaged sessions rather than all sessions.
Segment: A subset of users, sessions, or events filtered by a condition. Segments are the primary lens for comparing performance across groups.
Audience: A persistent group of users meeting defined criteria, usable for targeting and reporting. Audiences persist across sessions; segments are typically ad-hoc.
Cohort: A group of users who share an acquisition characteristic — usually the week or month they first visited. Cohort tables show retention over time.
Average engagement time: Time the browser tab was in focus during a session. More accurate than the legacy "average session duration," which counted background time.
Pages per session: Total pageviews divided by total sessions. A quick depth-of-engagement indicator for content sites.
New vs returning users: A split based on whether the client ID had been seen previously. Returning users typically convert at 2–3× the rate of new users.
Exit rate: The percentage of sessions that ended on a given page. A high exit rate on a checkout page matters; on a thank-you page it does not.
Landing page: The first page viewed in a session. The single strongest dimension for diagnosing acquisition and SEO performance.
Source / medium: The origin of a session: source is the platform (google, newsletter_april), medium is the category (organic, cpc, email).
Channel grouping: A rule-based bucketing of source/medium into categories (Organic Search, Paid Social, Direct). GA4 allows custom channel groups.
Goal value: The monetary value assigned to a conversion event. Required for calculating ROAS and comparing campaigns across conversion types.
Sampling: When an analytics tool calculates metrics from a subset of sessions rather than all of them. Triggered by high query complexity; BigQuery export eliminates sampling.
Data threshold: GA4's automatic suppression of rows with low user counts to prevent re-identification. Can cause report totals not to match the sum of their rows.
Cardinality: The number of unique values a dimension has. High-cardinality dimensions (page URL, user ID) are the main cause of rows collapsing into "(other)" in GA4 reports.
Rollup property: A GA4 360 feature that combines data from multiple source properties into a single reporting view, useful for multi-brand and multi-region accounts.
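To make the engaged-session, engagement-rate, and bounce-rate definitions above concrete, here is a minimal Python sketch that classifies sessions using GA4's three criteria. The record fields are illustrative, not GA4's actual export schema:

```python
# Hypothetical session records (field names are for illustration only).
sessions = [
    {"duration_s": 4,  "key_events": 0, "pageviews": 1},  # not engaged (bounce)
    {"duration_s": 45, "key_events": 0, "pageviews": 1},  # engaged: >10s in focus
    {"duration_s": 3,  "key_events": 1, "pageviews": 1},  # engaged: had a key event
    {"duration_s": 8,  "key_events": 0, "pageviews": 3},  # engaged: 2+ pageviews
]

def is_engaged(s):
    # GA4 criteria: longer than 10 seconds, OR a key event, OR 2+ pageviews.
    return s["duration_s"] > 10 or s["key_events"] > 0 or s["pageviews"] >= 2

engagement_rate = sum(is_engaged(s) for s in sessions) / len(sessions)
bounce_rate = 1 - engagement_rate
print(engagement_rate, bounce_rate)  # 0.75 0.25
```

Note how bounce rate falls out of engagement rate by definition — the two can never move independently in GA4.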

2. GA4-specific concepts

Google Analytics 4 introduced a new data model — events, parameters, and user properties replaced the hit-based Universal Analytics schema. These terms show up exclusively in GA4 contexts. For adoption context, see our GA4 adoption statistics.

Data stream: A source of data sent to a GA4 property. Types: web, iOS, Android. A single property can combine multiple streams for cross-platform reporting.
Event parameter: Additional context attached to an event (page_title, value, method). Parameters become custom dimensions after registration.
User property: A persistent attribute of a user (plan_type, account_age). Unlike parameters, user properties persist across all events for that user.
Enhanced measurement: Automatic event tracking GA4 enables by default: scrolls, outbound clicks, site search, video engagement, file downloads, form interactions.
Recommended events: Google's predefined event schema for common use cases (purchase, sign_up, add_to_cart). Using recommended names unlocks extra reporting features.
Custom event: An event you define yourself for business-specific actions. Must follow GA4's naming rules (snake_case, under 40 characters).
Key event (conversion): GA4's current name for conversion events. Marked in admin; only key events are used in Google Ads import and conversion paths.
Explorations: GA4's ad-hoc analysis workspace. Includes free-form tables, funnel, path exploration, segment overlap, cohort, and user lifetime reports.
DebugView: A real-time event stream view in GA4 Admin, used to validate implementations. Requires debug mode enabled on the sending device.
BigQuery export: Free nightly (and streaming) export of raw GA4 event data to BigQuery. Required for any serious analytics work — Explorations sample at scale, BigQuery does not.
GA4 audiences: Persistent user groups defined by event/parameter/property conditions. Can be exported to Google Ads and DV360 for activation.
Predictive audience: A GA4 audience built from ML predictions: likely 7-day purchasers, likely 7-day churners, predicted revenue users. Requires minimum data volume.
Consent Mode v2: Google's framework for respecting user consent signals. When consent is denied, GA4 uses modeled conversions instead of observed data.
Modeled data: Machine-learning estimates GA4 fills in for users who declined cookies or cross-device tracking. Indicated by a small icon on affected metrics.
Attribution paths: The sequence of channels that preceded a conversion. Available in GA4's Attribution reports with lookback windows up to 90 days.
Data retention: How long GA4 keeps user-level data. Default is 2 months; standard properties cap at 14 months. BigQuery export keeps data indefinitely.
Cross-domain tracking: Stitching sessions across owned domains (brand.com → checkout.brand.com). Configured in data stream settings, not via tags.
Internal traffic filter: A GA4 data filter that excludes traffic from specified IP ranges. Critical for small sites where team browsing skews metrics.
Unwanted referrals: A list of domains whose sessions should not be treated as new referrals — typically payment processors and auth providers.
Measurement Protocol: A server-side API for sending events to GA4 directly, used for offline conversions, CRM events, and deduplicating with client-side tracking.
Server-side GTM: A Google Tag Manager container that runs on a server you control. Reduces client-side script load and enables event enrichment before sending to destinations.
User ID: A unique authenticated identifier sent to GA4 to stitch sessions across devices. Enables accurate LTV and retention reporting for logged-in users.
Google Signals: An optional GA4 feature that uses Google's own cross-device graph for signed-in users. Enables demographic reports and cross-device remarketing.
Property vs account: An account can contain many properties; each property represents one business entity. Permissions can be granted at either level.
Data API: Google's official programmatic interface for pulling GA4 reporting data. Powers embedded dashboards, Looker Studio connections, and custom alerts.
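As a sketch of how a server-side Measurement Protocol call is shaped: the `mp/collect` endpoint and its `measurement_id`/`api_secret` query parameters follow Google's Measurement Protocol (GA4) reference, while the event name, client ID, and credentials below are placeholders you would replace with your own:

```python
import json
import urllib.request

# Placeholders -- substitute your own data stream's credentials.
MEASUREMENT_ID = "G-XXXXXXXXXX"
API_SECRET = "your_api_secret"

def build_mp_body(client_id, event_name, params):
    """Measurement Protocol v2 body: a client_id plus a list of events."""
    return {"client_id": client_id,
            "events": [{"name": event_name, "params": params}]}

body = build_mp_body(client_id="555.1234567890",
                     event_name="crm_lead_qualified",   # hypothetical custom event
                     params={"value": 150, "currency": "USD"})

url = ("https://www.google-analytics.com/mp/collect"
       f"?measurement_id={MEASUREMENT_ID}&api_secret={API_SECRET}")
req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                             headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)  # uncomment to actually send the event
```

Pair this with DebugView (send the same event with a debug_mode parameter) to validate the payload before relying on it for offline conversions.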

3. Attribution models

How credit for a conversion is distributed across the touchpoints that preceded it. The rise of privacy-driven modeled data has pushed attribution from deterministic path-based rules toward data-driven and incrementality-based methods.

First-touch attribution: Assigns 100% of credit to the first channel that introduced the user. Good for brand-awareness analysis; poor for measuring closing channels.
Last-touch (last-click) attribution: Assigns 100% of credit to the most recent marketing touchpoint before conversion. The legacy default; systematically over-credits branded search and retargeting.
Last non-direct click: A variation that ignores direct traffic when attributing, on the theory that direct usually represents a user returning after discovery.
Linear attribution: Distributes credit evenly across all touchpoints in the conversion path. Simple and fair, but treats all interactions as equally influential.
Time-decay attribution: Weights touches closer to conversion more heavily, using exponential decay (default half-life: 7 days). Good for short consideration cycles.
U-shaped (position-based) attribution: Assigns 40% to the first touch, 40% to the last touch, and distributes 20% across middle touches. Credits both discovery and closing.
W-shaped attribution: Extends U-shaped by also giving 30% weight to a middle event (typically lead creation), with 30% first, 30% last, and the remainder distributed among other touches.
Data-driven attribution (DDA): Google's default GA4/Ads model that uses machine learning to assign credit based on actual conversion path performance. Uses a Shapley-style method under the hood.
Markov chain attribution: A probabilistic model that represents customer journeys as state transitions; credit is assigned by measuring the "removal effect" — how conversion probability drops when a channel is removed.
Shapley value attribution: A game-theory method that calculates each channel's marginal contribution across all possible channel combinations. Mathematically rigorous but computationally expensive.
Media Mix Modeling (MMM): Top-down regression using aggregated spend and outcome data. Not dependent on cookies — increasingly popular post-iOS 14.5. Examples: Meridian (Google), Robyn (Meta).
Incrementality testing: Holdout or geo-based experiments that measure the true causal lift of a channel, independent of any attribution model. The gold standard for measurement.
Conversion lift study: A randomized holdout test run inside an ad platform (Meta, Google, TikTok) that compares conversions between exposed and unexposed groups.
Geo holdout test: A platform-agnostic incrementality design that pauses a channel in randomly chosen geographies and compares performance to matched control regions.
Lookback window: The maximum time before a conversion during which touchpoints receive credit. GA4 defaults to 30 days for acquisition key events and 90 days for all other key events.
View-through conversion: A conversion where the user saw but did not click the ad. Controversial — prone to over-crediting display and video — but sometimes the only available signal.
Assisted conversion: A conversion where the channel appeared in the path but was not the final click. Reports like GA4's Conversion Paths highlight these.
Attribution window: The combined click + view window used by an ad platform to credit conversions. Meta's default shifted to 7-day click / 1-day view post-iOS 14.5.
Attribution model comparison: A side-by-side view of how credit shifts across models. Always diagnostic; never a substitute for incrementality.
Unified Marketing Measurement (UMM): A hybrid approach that combines MTA (for optimization), MMM (for budget planning), and incrementality (for validation). The modern measurement stack.
Conversion modeling: Estimating conversions that cannot be observed due to consent or tracking limits. Google, Meta, and others fill reporting gaps this way.
Enhanced conversions: Google Ads' feature for sending hashed first-party identifiers with conversion events to improve match rates and reduce modeling reliance.
Conversions API (CAPI): Meta's server-side conversion API — the direct analogue to Enhanced Conversions. Strongly recommended for iOS performance.
Probabilistic attribution: Attribution based on statistical similarity (IP, user-agent, timestamps) rather than deterministic identifiers. Lower accuracy than cookie-based tracking but resilient to privacy changes.
Dark social: Traffic from untracked sharing (WhatsApp, Slack, email, copy-paste). Shows up as direct in analytics and cannot be attributed to a campaign.
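The rule-based models above are simple enough to implement directly. This sketch distributes one conversion's credit across an ordered channel path for the first-touch, last-touch, linear, and U-shaped models (a hypothetical four-touch path; real paths come from your attribution reports or warehouse):

```python
def attribute(path, model="linear"):
    """Split one conversion's credit across an ordered list of channel touches."""
    n = len(path)
    if model == "first_touch":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "last_touch":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "linear":
        weights = [1.0 / n] * n
    elif model == "u_shaped":
        if n <= 2:
            weights = [1.0 / n] * n          # degenerate paths: split evenly
        else:
            middle = 0.2 / (n - 2)           # 20% shared among middle touches
            weights = [0.4] + [middle] * (n - 2) + [0.4]
    credit = {}
    for channel, w in zip(path, weights):
        credit[channel] = credit.get(channel, 0.0) + w
    return credit

path = ["paid_social", "organic_search", "email", "direct"]
print(attribute(path, "u_shaped"))
# paid_social and direct each get 0.4; the two middle touches split the 0.2
```

Running the same path through each model is exactly what an attribution model comparison report does — the totals always sum to one conversion, only the distribution changes.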

4. Data warehousing and modeling

Modern analytics lives in a warehouse — Snowflake, BigQuery, Redshift, or Databricks. These terms describe how data is moved, stored, modeled, and made trustworthy before it reaches a dashboard.

Data warehouse: A columnar, query-optimized store for analytical workloads. Designed for read-heavy aggregation across billions of rows.
Data lake: Cheap object storage (S3, GCS) holding raw, schema-on-read data in formats like Parquet or JSON. Used for archival and flexibility.
Lakehouse: A hybrid architecture combining lake economics with warehouse semantics via transactional table formats (Iceberg, Delta, Hudi). Databricks popularized the term.
ETL: Extract, Transform, Load. The legacy pattern where transformations happen before loading into the warehouse. Bottlenecked by the transformation engine.
ELT: Extract, Load, Transform. The modern pattern — load raw data into the warehouse, then transform inside it using SQL. Leverages warehouse scale and the separation of storage and compute.
Reverse ETL: Pushing warehouse data out to operational tools (Salesforce, HubSpot, Iterable). Vendors: Hightouch, Census, RudderStack.
CDC (Change Data Capture): A technique for streaming database changes in near real time. Powers tools like Debezium, Fivetran HVR, and Airbyte's CDC connectors.
Fact table: A warehouse table that stores measurable, event-like records (orders, sessions, page_views). Usually the largest tables in the schema.
Dimension table: A table holding descriptive attributes joined onto facts (customers, products, campaigns). Usually small but high-impact on report readability.
Star schema: A warehouse design pattern: one fact table surrounded by dimension tables joined on keys. Simple, performant, and easy to explain.
Snowflake schema: A normalized variant where dimension tables are themselves split into sub-dimensions. Reduces storage; increases join complexity.
Slowly Changing Dimension (SCD): A pattern for handling historical changes in dimension attributes. Type 2 (keep history with valid_from/valid_to columns) is the most common.
Surrogate key: An artificial primary key (usually auto-incremented or hashed) used instead of business keys to simplify joins across sources.
Grain: The level of detail a fact table captures (one row per order, per line item, per session). Defining grain is the first modeling decision.
dbt: The de facto SQL transformation tool. Treats SQL models as code: version-controlled, tested, documented. Core to modern ELT.
dbt model: A single SQL file representing one materialized view or table in the warehouse. Models can reference other models via the {{ ref() }} function.
Airflow: Apache Airflow, the open-source orchestrator for scheduling and monitoring pipelines. Dominant in data engineering; increasingly challenged by Dagster, Prefect, and cloud-native schedulers.
Semantic layer: A metric-definition layer between the warehouse and BI tools, used so that "revenue" means the same thing everywhere. Examples: dbt Semantic Layer, Cube, LookML.
Data catalog: A searchable inventory of datasets, owners, schemas, and lineage. Tools: Atlan, Collibra, DataHub, Amundsen.
Data lineage: A graph showing how a field was produced — which sources, transformations, and models fed into it. Essential for impact analysis.
Data contract: An explicit agreement between producers and consumers about the schema, semantics, and SLAs of a dataset. Enforced via schema registries or dbt tests.
Partitioning: Physical organization of a warehouse table by a column (usually date). Reduces scan costs and speeds up queries with partition filters.
Clustering: Sorting or co-locating rows by specific columns to improve filter selectivity. BigQuery clustering keys are a common example.
Idempotency: A pipeline property: re-running with the same input produces the same output. Critical for safe backfills.
Freshness SLA: An agreed-upon maximum age for a dataset at consumption time (e.g. "sales data must be under 2 hours old by 9 AM"). Monitored via tests or observability tools.
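The SCD Type 2 pattern is easier to reason about with a concrete sketch. In a warehouse this is a SQL MERGE; the Python below mimics the same logic on in-memory rows, assuming a hypothetical customer dimension keyed by `key` with valid_from/valid_to columns:

```python
from datetime import date

def scd2_upsert(history, key, attrs, as_of):
    """Type 2 upsert: close the current row and open a new one when attrs change."""
    current = next((r for r in history
                    if r["key"] == key and r["valid_to"] is None), None)
    if current and current["attrs"] == attrs:
        return history                    # unchanged: leave the open row alone
    if current:
        current["valid_to"] = as_of       # close out the previous version
    history.append({"key": key, "attrs": attrs,
                    "valid_from": as_of, "valid_to": None})
    return history

history = []
scd2_upsert(history, "cust_1", {"plan": "free"}, date(2026, 1, 1))
scd2_upsert(history, "cust_1", {"plan": "pro"}, date(2026, 3, 1))
# Two rows survive: the "free" version (closed 2026-03-01) and the open "pro" row.
```

Because every version is preserved, a fact row from February joins to the "free" plan while a row from April joins to "pro" — which is the whole point of Type 2.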

5. Marketing KPIs

The financial and efficiency metrics that translate analytics output into business language. Benchmark context is in our marketing analytics statistics and conversion rate benchmarks.

CAC (Customer Acquisition Cost): Total acquisition spend divided by new customers in the period. Usually calculated with fully loaded cost (ad spend + salaries + tooling).
LTV (Lifetime Value): The total gross margin a customer generates over their lifetime. Simple formula: (ARPU × gross margin) / churn rate.
LTV:CAC ratio: The efficiency ratio of customer value to acquisition cost. 3:1 is a common healthy benchmark for SaaS; below 1:1 is unsustainable.
CAC payback period: The months required to recover CAC from gross margin. Under 12 months is excellent; 18–24 months is typical for B2B SaaS.
MER (Marketing Efficiency Ratio): Total revenue divided by total marketing spend, ignoring attribution. Popular in ecommerce as an attribution-free north star.
ROAS (Return on Ad Spend): Revenue divided by ad spend for a specific channel or campaign. Channel-level; contrast with blended ROAS, which is closer to MER.
Blended ROAS: Total revenue across channels divided by total ad spend. Eliminates cross-channel attribution disputes at the cost of channel-level visibility.
nCAC (new customer CAC): CAC calculated using only new customers acquired. The modern ecommerce north star, separating growth efficiency from retention revenue.
CPC (Cost Per Click): Ad spend divided by clicks. The most common auction bid type; also an outcome of auction dynamics in CPM bidding.
CPM (Cost Per Mille): Cost per 1,000 ad impressions. The natural unit for brand and awareness campaigns, and the underlying currency of programmatic auctions.
CPA (Cost Per Acquisition): Ad spend divided by conversions. "Acquisition" can mean lead, signup, or sale — always clarify the event.
CPL (Cost Per Lead): The CPA variant specific to top-of-funnel lead events. B2B benchmark: $50–$300 depending on industry and lead quality threshold.
AOV (Average Order Value): Total revenue divided by total orders. The primary lever for ecommerce profitability that does not require acquiring new customers.
ACV (Annual Contract Value): The annualized contract value of a SaaS deal. Distinct from TCV (total contract value), which sums across the full term.
ARPU / ARPA: Average revenue per user / per account. ARPA is preferred for B2B; ARPU for consumer subscriptions.
Gross margin: (Revenue – COGS) / revenue. The foundation of LTV and payback calculations — a 90%-margin SaaS business has very different economics than a 25%-margin ecommerce brand.
Contribution margin: Revenue minus all variable costs (COGS, shipping, payment processing, and often attributed ad spend). The more realistic margin for unit-economics decisions.
NPS: Net Promoter Score, calculated as % promoters (9–10) minus % detractors (0–6). Widely criticized methodology, widely used anyway.
CSAT: Customer Satisfaction — usually a 1–5 or 1–7 rating after a specific interaction. Focused on a transaction; NPS is focused on the relationship.
CES: Customer Effort Score — "how easy was it to resolve your issue" — typically on a 1–7 scale. A strong predictor of churn in support contexts.
Conversion rate: The proportion of a defined population that completed a target action. Always report the numerator/denominator definition with the number.
Click-through rate (CTR): Clicks divided by impressions. The efficiency metric for ad creative and email subject lines.
Open rate: Email opens divided by emails delivered. Unreliable since Apple Mail Privacy Protection (2021), which auto-fires opens and inflates rates by roughly 30% for MPP-heavy lists.
Share of voice (SOV): A brand's presence on a given keyword, platform, or topic relative to competitors. Used in SEO (% of available SERP clicks) and PR (% of media mentions).
Brand lift: The incremental change in brand awareness, favorability, or consideration caused by a campaign, measured via pre/post surveys of exposed vs control groups.
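The CAC, LTV, LTV:CAC, and payback definitions above compose into one small calculation. This sketch uses the glossary's simple LTV formula with hypothetical inputs (real models would account for expansion revenue and discounting):

```python
def unit_economics(spend, new_customers, arpa_monthly, gross_margin, monthly_churn):
    cac = spend / new_customers
    # Simple LTV formula from the glossary: (ARPA x gross margin) / churn rate.
    ltv = arpa_monthly * gross_margin / monthly_churn
    # Payback: months of gross-margin contribution needed to recover CAC.
    payback_months = cac / (arpa_monthly * gross_margin)
    return {"cac": cac, "ltv": ltv,
            "ltv_cac": ltv / cac, "payback_months": payback_months}

print(unit_economics(spend=120_000, new_customers=100,
                     arpa_monthly=200, gross_margin=0.8, monthly_churn=0.02))
# CAC $1,200; LTV $8,000; LTV:CAC ~6.7; payback 7.5 months
```

Against the benchmarks above, this hypothetical business clears both bars: LTV:CAC well above 3:1 and payback under 12 months.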

6. Product analytics

The vocabulary of Amplitude, Mixpanel, Heap, and PostHog. Product analytics is about user behavior inside an application rather than marketing performance, with its own canonical metrics for engagement and retention.

DAU (Daily Active Users): Unique users who took a meaningful action in the past day. The definition of "active" varies by product — always document it.
WAU (Weekly Active Users): Unique active users in a rolling 7-day window. More stable than DAU, less lagging than MAU.
MAU (Monthly Active Users): Unique active users in a rolling 28- or 30-day window. The most common headline engagement metric.
Stickiness (DAU/MAU): The ratio of daily actives to monthly actives. 50%+ indicates near-daily use; 20% is typical for weekly-use products.
L28 / LN metrics: The count of days a user was active in the past 28 days. Meta and TikTok use variants internally. L28=28 is a true daily user.
Retention curve: A plot of the share of a cohort still active over time. A flattening curve indicates product-market fit; a continuously decaying curve does not.
N-day retention: The share of a cohort that returns on exactly day N after signup. The strict definition used by most mobile apps.
Rolling retention: The share that returns on day N or later. More generous than N-day retention; favored for products with irregular usage patterns.
Churn rate: The share of customers (logo churn) or revenue (revenue churn) lost in a period. Measured monthly for consumer SaaS, annually for enterprise.
Gross retention rate (GRR): Retained revenue from the starting cohort, excluding expansion. Caps at 100%.
Net retention rate (NRR): Retained revenue including upsell and expansion. 120%+ is elite for B2B SaaS — Snowflake famously reported 158% at IPO.
Activation rate: The share of signups that complete a defined "aha moment" event within a time window. The single highest-leverage metric for most SaaS products.
Aha moment: The behavior correlated with long-term retention. Famous examples: Facebook's "7 friends in 10 days," Slack's "2,000 team messages."
North Star metric: A single metric a product team optimizes for. Should capture delivered customer value — "nights booked" (Airbnb), "messages sent" (Slack) — not vanity volume.
Funnel analysis: A conversion view across a sequence of steps, showing drop-off between each. The workhorse report for onboarding and checkout optimization.
Path analysis: An exploration of the sequences users actually take, rather than ones pre-defined by the analyst. Useful for discovering unexpected behaviors.
Feature adoption: The share of eligible users who have used a specific feature at least once (or N times) in a window. An input to roadmap prioritization.
Power users: Users whose engagement dramatically exceeds the median, typically the top 5–10%. Often drive 50%+ of revenue and referrals.
Session length: Time from first to last event in a session. Should be tracked separately for foreground-only time in mobile apps.
Sessions per user: Total sessions divided by unique users in the period. A habit-formation indicator — trends matter more than absolute values.
Cohort heatmap: A retention matrix with cohorts (rows) × time (columns), colored by retention percentage. The canonical view for spotting onboarding regressions.
Time to value (TTV): The time between signup and the activation event. Shortening TTV is a dominant onboarding strategy.
Engagement score: A composite metric weighting multiple product actions. Used for PQL scoring, health dashboards, and churn prediction.
Product-Qualified Lead (PQL): A user whose in-product behavior indicates sales-readiness. A core concept in product-led-growth sales motions.
Session replay: A recording of a user's screen during a session, overlaid on analytics events. Tools: FullStory, LogRocket, PostHog, Hotjar.
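The strict vs rolling retention distinction above is worth seeing side by side. This sketch computes both from a tiny hypothetical activity log for the January 1 cohort:

```python
from datetime import date

# Hypothetical activity log: (user_id, active_date)
events = [
    ("u1", date(2026, 1, 1)), ("u1", date(2026, 1, 8)),
    ("u2", date(2026, 1, 1)),
    ("u3", date(2026, 1, 1)), ("u3", date(2026, 1, 9)),
]

def n_day_retention(events, cohort_date, n):
    """Strict N-day retention: share of the cohort active exactly N days later."""
    cohort = {u for u, d in events if d == cohort_date}
    returned = {u for u, d in events if (d - cohort_date).days == n}
    return len(cohort & returned) / len(cohort)

def rolling_retention(events, cohort_date, n):
    """Rolling retention: share of the cohort active on day N or later."""
    cohort = {u for u, d in events if d == cohort_date}
    returned = {u for u, d in events if (d - cohort_date).days >= n}
    return len(cohort & returned) / len(cohort)

print(n_day_retention(events, date(2026, 1, 1), 7))   # only u1 returns on day 7 -> 1/3
print(rolling_retention(events, date(2026, 1, 1), 7)) # u1 (day 7) and u3 (day 8) -> 2/3
```

The same data yields 33% strict retention and 67% rolling retention — which is why the definition must always travel with the number.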

7. Experimentation and statistics

The statistical and methodological vocabulary for running trustworthy A/B tests. Treat this section as the minimum glossary a test program needs before it can claim "wins" or "lifts" with any rigor.

A/B test: A randomized controlled experiment comparing two variants. The simplest and most common online experiment design.
A/B/n test: A test with more than two variants. Requires correction for multiple comparisons to avoid false positives.
Multivariate test (MVT): A factorial design testing combinations of multiple elements. Requires much larger samples than A/B tests.
Null hypothesis (H0): The default assumption that there is no difference between variants. The entire frequentist framework is about evidence against H0.
Alternative hypothesis (H1): The hypothesis that there is a real difference. Can be one-sided (variant is better) or two-sided (variant is different).
p-value: The probability of observing a result at least as extreme as the data, assuming H0 is true. Not the probability the variant "wins."
Statistical significance: A p-value below a pre-specified threshold (usually 0.05). Not a measure of business importance.
Confidence interval (CI): A range that would contain the true effect in X% of repeated experiments. A 95% CI that excludes zero implies p < 0.05.
Type I error (α): A false positive — declaring a winner when there is no real effect. The significance threshold equals the accepted Type I error rate.
Type II error (β): A false negative — missing a real effect. Power = 1 – β; a power of 0.8 is typical.
Statistical power: The probability of detecting a real effect of a given size. Determined by sample size, baseline variance, and the detectable effect.
Minimum Detectable Effect (MDE): The smallest effect size a test has reasonable power to detect. Fixed before the test; calculated from sample size and baseline.
Sample Ratio Mismatch (SRM): When the actual split between variants differs from the planned split more than random variation would predict. An SRM is a show-stopper: the test cannot be trusted.
Novelty effect: An early bump in engagement caused by users reacting to newness rather than actual value. Often fades within 1–2 weeks.
Primacy effect: Existing users performing worse on a new variant because they need to relearn. The mirror image of novelty.
Peeking: Checking results before the pre-registered sample size is reached and stopping early if significant. Inflates false positive rates from 5% to 20%+ without sequential-testing corrections.
Sequential testing: Methods (mSPRT, always-valid p-values, group sequential) that allow continuous monitoring without inflating Type I error. The modern answer to peeking.
Bayesian testing: An alternative framework that reports probability of being best instead of p-values. More intuitive for stakeholders; requires prior selection.
CUPED (Controlled-experiment Using Pre-Experiment Data): A variance-reduction technique using pre-period covariates. Pioneered at Microsoft; can cut required sample sizes by 30%+.
Stratified sampling: Randomizing within pre-defined strata (e.g. country, device) to guarantee balance on those dimensions.
Multi-armed bandit: An adaptive allocation algorithm that shifts traffic toward better-performing variants during the test. Trades learning for short-term earnings.
Thompson sampling: A Bayesian bandit algorithm that samples from each variant's posterior to decide allocation. Popular in production personalization systems.
Holdout group: A permanent unexposed segment used to measure the aggregate impact of an entire program (e.g. all personalization, lifecycle email).
Guardrail metric: A metric monitored to catch unintended harm from an experiment, even when the primary metric looks positive (page load time, unsubscribe rate, error rate).
Effect size: The magnitude of the treatment effect — the number your business actually cares about. Often expressed as absolute or relative lift.
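Most of the frequentist terms above meet in one calculation: the two-proportion z-test behind a standard conversion-rate A/B test. A stdlib-only sketch with hypothetical test counts:

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF via the error function (no scipy needed).
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm_cdf(abs(z)))              # two-sided p-value
    return z, p_value

# Hypothetical result: 4.8% control vs 5.6% variant on 10,000 sessions each.
z, p = two_proportion_ztest(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
print(round(z, 2), round(p, 4))  # p is below 0.05 -> statistically significant
```

Note what the p-value does and does not say here: the result is unlikely under H0, but the business question is the effect size (a 0.8-point absolute lift) and whether guardrail metrics held.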

8. AI and predictive analytics

Predictive and generative AI are now native features of analytics platforms. These terms describe the models, tasks, and patterns most relevant to marketing and product teams. For hands-on context, see our AI digital transformation services.

Predictive LTV (pLTV): An ML forecast of a customer's future value based on early behavior. Used to bid for high-value customers in ad platforms via Enhanced Conversions for Leads or CAPI.
Propensity model: A model that scores a user's probability of taking an action (purchase, upgrade, cancel). The workhorse of CRM segmentation.
Churn prediction: A specific propensity model for predicting customer loss. Features typically include engagement decline, support tickets, and billing events.
Uplift modeling: Predicting the incremental effect of a treatment (e.g. a discount email) on each user. Finds "persuadables" — users who only act when treated.
Lookalike modeling: Finding users who resemble an existing high-value seed audience. Native to Meta, Google, and LinkedIn; also implementable in the warehouse with embeddings.
Customer segmentation (clustering): Unsupervised grouping of users by behavior or attributes. Common algorithms: k-means, DBSCAN, hierarchical clustering.
RFM analysis: Scoring customers on Recency, Frequency, and Monetary value. The pre-ML approach to customer segmentation — still surprisingly effective.
Anomaly detection: Automated flagging of statistical outliers in metrics. Approaches: rolling thresholds, STL decomposition, isolation forest, Prophet.
Time-series forecasting: Predicting future values of a metric. Methods: ARIMA, exponential smoothing, Prophet, deep-learning models like the Temporal Fusion Transformer.
Prophet: Meta's open-source forecasting library. Designed to be usable by non-statisticians; strong with seasonality and holiday effects.
Feature engineering: Creating model inputs from raw data — rolling averages, ratios, lags, categorical encodings. Often the single biggest driver of model performance.
Feature store: A shared infrastructure layer for computing and serving features consistently between training and inference. Examples: Feast, Tecton, Databricks Feature Store.
Train/test split: Partitioning data into a model-training set and a held-out evaluation set. A typical split is 70/15/15 train/validation/test.
Cross-validation: A more robust evaluation that averages performance across multiple splits. k-fold is the standard; time-series cross-validation respects temporal ordering.
AUC / ROC: Area Under the ROC Curve — a classification performance metric insensitive to class balance. 0.5 is random; 0.9+ indicates strong discrimination.
Precision / recall: Precision = true positives / predicted positives. Recall = true positives / actual positives. The two always trade off against each other.
Calibration: Whether predicted probabilities match observed frequencies — a bucket of users scored at 80% should convert 80% of the time. Critical when absolute probability matters more than ranking.
Data drift: A shift in input distributions between training and inference that degrades model performance. Monitored with PSI or KS tests.
Concept drift: A shift in the underlying relationship between features and target — even if inputs look the same, the ground truth has moved. Common after major product changes.
MLOps: The practices and tooling around deploying, monitoring, and updating ML models in production. Encompasses CI/CD, observability, and retraining pipelines.
Embedding: A dense numerical representation of an entity (user, product, sentence) produced by a model. Used for similarity search, clustering, and as features in downstream models.
Vector database: A datastore optimized for nearest-neighbor search over embeddings. Examples: Pinecone, Weaviate, pgvector, Qdrant.
Retrieval-Augmented Generation (RAG): A pattern where an LLM answers questions using retrieved context from a vector database. The default architecture for internal document Q&A.
LLM-based insights: Natural-language summaries generated over structured analytics output. Native in GA4 Insights, Amplitude AI, Mixpanel AI. Verify, don't trust.
Text-to-SQL: LLM generation of SQL queries from natural language. Accuracy improves dramatically when grounded in a semantic layer or warehouse schema.
Agentic analytics: Multi-step AI workflows that formulate hypotheses, pull data, run analyses, and deliver findings autonomously. The current frontier in BI tooling.
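RFM is the most approachable technique in this section, so here is a minimal rank-based sketch. Production implementations usually bin into quintiles; with only three hypothetical customers, a simple rank per dimension shows the mechanics:

```python
from datetime import date

# Hypothetical customer summaries: last order date, order count, total spend.
customers = {
    "c1": {"last_order": date(2026, 4, 1),  "orders": 12, "spend": 940.0},
    "c2": {"last_order": date(2025, 11, 3), "orders": 2,  "spend": 80.0},
    "c3": {"last_order": date(2026, 3, 15), "orders": 6,  "spend": 410.0},
}

def rfm(customers, today):
    """Rank-based RFM: each dimension scored 1 (worst) to N (best)."""
    def score(values, value, reverse=False):
        return sorted(values, reverse=reverse).index(value) + 1
    days_since = {c: (today - v["last_order"]).days for c, v in customers.items()}
    orders = [v["orders"] for v in customers.values()]
    spend = [v["spend"] for v in customers.values()]
    return {c: (score(list(days_since.values()), days_since[c], reverse=True),
                score(orders, v["orders"]),   # fewer days since order = higher R score
                score(spend, v["spend"]))
            for c, v in customers.items()}

print(rfm(customers, date(2026, 4, 16)))
# c1 is best on all three dimensions -> (3, 3, 3)
```

The same (R, F, M) tuples then feed segmentation rules like "champions" (high on all three) or "at risk" (high F and M, low R) — no ML required.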

Common definitional pitfalls

Bounce rate (GA4 vs UA): UA defined bounces as single-pageview sessions. GA4 inverts the concept into engagement rate. Year-over-year comparisons across the migration are not meaningful.
Sessions (GA4 vs UA): GA4 sessions are counted differently: no new session on UTM change, no midnight split. Expect GA4 to be 5–15% lower than UA for identical traffic.
Conversions (GA4 vs Ads): GA4 counts unique events by default; Google Ads counts every event. Lookback and attribution windows also differ. Discrepancies of 10–30% are normal.
Users vs sessions vs pageviews: Always confirm the denominator when reading a conversion or bounce rate. A "2% conversion rate" with a session-based denominator is a different number than one with a user-based denominator.
CAC (with or without payroll): Investors usually expect fully loaded CAC; marketers usually quote paid-media-only CAC. The difference is often 2–4×.
LTV horizon: Is LTV lifetime, 24 months, or 12 months? For a subscription product the difference can be huge — always state the horizon.

Putting the vocabulary to work

A glossary alone doesn't improve reporting — aligning the organization on definitions does. The highest-leverage next steps, in order:

  • Document your company's canonical definitions for the 20–30 KPIs that matter most — not all 200. Pin them in your semantic layer or data catalog.
  • Enforce definitions in the stack: dbt models, Looker explores, or a Cube semantic layer are stronger than tribal knowledge in a Notion doc.
  • Separate marketing optimization metrics (ROAS, CPA) from business-truth metrics (contribution margin, net new revenue) — and measure both.
  • Choose attribution per decision, not per company: data-driven for bid optimization, incrementality for budget decisions, MMM for annual planning.
  • When an AI insight surfaces in GA4 or Amplitude, verify against the underlying query. Modeled data is not interchangeable with observed data.

Turn this vocabulary into a working stack

We help product and marketing teams implement GA4, architect warehouses, build dbt models, and deploy predictive analytics — ending with dashboards your executives actually read.