Usage-based pricing is no longer a niche experiment for developer tools — it is the dominant architectural response to a structural shift in SaaS economics: when AI does the work, seat count stops being a proxy for value delivered, and variable LLM inference cost becomes a line item that can destroy gross margin if the pricing model isn't designed around it.

According to OpenView Partners' 2023 survey, 61% of SaaS companies had already adopted some form of usage-based pricing, with a further 21% planning to test it. But pure pay-as-you-go remains the minority position — only 15% of companies run a largely usage-based or PAYG model. Three times as many (46%) take a hybrid approach that bundles usage within seat plans or layers usage fees on top. That gap between theory and practice reflects the real operational tension: UBP aligns pricing with value, but it also introduces revenue unpredictability and, for buyers, bill-shock risk.

This guide covers the three pricing eras in SaaS, the LLM inference cost dynamic that is forcing the transition, the four archetypes available today, a gross margin exposure matrix mapping them against COGS volatility, real-world vendor implementations, and a migration playbook for founders deciding which model fits their product. Current LLM API token costs feed directly into the COGS exposure math, so read them alongside this framework.

Key takeaways

01
Seat-based pricing has a structural AI problem. When AI handles resolutions, completions, and generated output, usage scales independently of human headcount. Charging per seat while absorbing variable LLM inference COGS as a vendor means your gross margin is exposed to a cost you don't control.
02
LLM inference cost has dropped 1,000× in three years, but frontier pricing hasn't. Per a16z's LLMflation research, commodity-tier LLM inference cost fell from $60 per million tokens (GPT-3, November 2021) to $0.06 (Llama 3.2 3B via Together.ai, November 2024). Frontier models intentionally hold premium pricing: OpenAI's o1 launched at $60 per million output tokens, matching GPT-3's 2021 price.
03
Hybrid models dominate: 46% vs 15% pure PAYG. OpenView Partners (February 2023, the most recent published survey) found 46% of SaaS companies use hybrid UBP, bundling usage within seat plans, versus only 15% running pure pay-as-you-go. Hybrid reduces revenue volatility while still aligning pricing with AI consumption.
04
Outcome-based pricing is the frontier model, but it carries the highest billing complexity. Intercom Fin charges $0.99 per resolved customer interaction. Salesforce Agentforce charges approximately $2 per conversation or lead. These outcome-based models pass through all inference COGS to the buyer, but require robust metering infrastructure to track outcomes reliably at scale.
05
Bill shock is a pricing design failure, not a customer education problem. When Cursor generated a $7,225 invoice for a single developer in July 2025, the root cause was uncapped usage in an annually-billed plan, which is a metering architecture decision. Spend caps, usage dashboards, and alert thresholds are engineering requirements, not optional UX add-ons.

01 — Context

Three pricing eras: perpetual, seat, and usage.

Software pricing has moved through three distinct eras, each triggered by a shift in how customers access and consume software value. Understanding the arc matters because pricing transitions do not happen cleanly — most vendors today are navigating two eras simultaneously, with an existing seat-based install base and new AI-powered features that demand a different economic model.

Era 1

Perpetual license

On-premise · one-time fee

Software shipped as physical or digital media; customers paid once for a version and owned it. Upgrade revenue required a new sale cycle. Infrastructure was the customer's problem.

Pre-SaaS era

Era 2

Per-seat subscription

SaaS era · recurring monthly/annual

Seat count became the dominant value metric because access was the scarcity: each human user needed a login. Predictable MRR, simple billing, and straightforward NRR expansion via seat adds.

Late 1990s–2020s

Era 3

Usage and outcome

AI era · metered consumption

AI agents perform work that scales independently of human headcount. Value is delivered in resolutions, tokens, completions, and generated outputs — not in access granted. Seat count becomes a poor proxy for value exchanged.

2020s+

The shift from Era 2 to Era 3 is not uniform across product categories. Infrastructure tools (AWS EC2 moved to per-second billing in 2017), API-native developer platforms (Twilio has charged per SMS sent since its founding), and database platforms (Snowflake has used a credit-based compute model since launch) were already in Era 3 before AI became a mass-market concern. What AI has done is pull every other SaaS category — CRM, customer support, code tooling, content platforms — into the same structural question: when the AI does the work, what is the right unit of value to charge for?

02 — LLM Economics

LLMflation: the COGS curve that changes everything.

Guido Appenzeller of a16z documented what he called LLMflation in November 2024: for LLMs of equivalent performance, inference cost has been decreasing by roughly 10× every year. Over the three years from November 2021 to November 2024, the cost of running a model achieving an MMLU score of approximately 42 fell from $60 per million tokens (GPT-3) to $0.06 per million tokens (Llama 3.2 3B via Together.ai) — a 1,000× decline. At the frontier quality level (MMLU ~83), costs had fallen by approximately 62× since GPT-4's launch in March 2023, per the same research.
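The "10× every year" figure can be checked against the endpoints quoted above. The sketch below is only arithmetic on those numbers: the helper name is made up for illustration, and treating March 2023 to November 2024 as 20 months is an assumption.

```python
def annual_decline_factor(start_price: float, end_price: float, years: float) -> float:
    """Constant yearly factor f such that start_price / f**years == end_price."""
    return (start_price / end_price) ** (1.0 / years)


# Commodity tier: $60 -> $0.06 per million tokens, Nov 2021 to Nov 2024 (3 years).
commodity_factor = annual_decline_factor(60.0, 0.06, 3.0)

# MMLU ~83 tier: a 62x total decline, expressed here as a 62:1 price ratio.
# Assumption: March 2023 to November 2024 is treated as 20 months.
mmlu83_factor = annual_decline_factor(62.0, 1.0, 20 / 12)

print(f"commodity tier: ~{commodity_factor:.1f}x cheaper per year")
print(f"MMLU ~83 tier:  ~{mmlu83_factor:.1f}x cheaper per year")
```

A 1,000× decline over three years works out to roughly 10× per year, which is where the headline rate comes from.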

What this means for SaaS vendors is a two-sided exposure map. On one side, vendors who priced AI add-ons at a fixed monthly fee in 2023 may now be selling at 10× to 100× above their actual inference COGS — healthy margin, but potentially vulnerable to commoditisation pressure from competitors who pass those savings through. On the other side, vendors who locked in fixed-price enterprise contracts with unlimited AI usage face a very different problem if frontier model quality is required: the cost of the best-performing models has not followed the commodity deflation curve. OpenAI's o1 model at launch (as documented in the a16z LLMflation article, November 2024) had the same cost per output token as GPT-3 at its launch — $60 per million output tokens — evidence that frontier providers intentionally hold premium tier pricing even as the commodity tier collapses.

LLM inference cost trajectory · commodity vs frontier tiers

Sources: a16z LLMflation (Nov 2024); fact-pack.md verified API anchors (May 2026)

GPT-3 (Nov 2021): $60/Mtok. MMLU ~42 performance · commodity tier baseline
GPT-4 (Mar 2023): ~$30/Mtok. MMLU ~83 performance · frontier tier at launch
Claude Sonnet 4.6 (May 2026): $3–15/Mtok. Input $3 / Output $15 per million tokens
GPT-5.5 standard (May 2026): $5–30/Mtok. Input $5 / Output $30 per million tokens (under 272K context)
Llama 3.2 3B via Together.ai (Nov 2024): $0.06/Mtok. MMLU ~42 equivalent · commodity tier today

The cost curve · a16z LLMflation

Andreessen Horowitz partner Guido Appenzeller's LLMflation analysis (a16z Infra blog, November 2024) framed the trajectory bluntly: for an LLM of equivalent performance, the cost is decreasing by 10x every year. That deflation is the structural backdrop for every usage-based pricing decision that follows.

The practical consequence for pricing design: if your AI product runs on commodity-tier inference, your COGS floor is falling faster than your pricing should. The question is whether to pass those savings to customers (competitive differentiation through lower prices), hold margin (profitable but vulnerable), or reinvest in capability and charge for outcomes rather than tokens. If your product runs on frontier models, the opposite risk applies — COGS may not deflate at the same rate as commodity, meaning an outcome-based pricing model that priced resolutions at $0.99 in 2024 could face margin compression if your underlying inference cost doesn't fall in line with the commodity curve.
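A rough way to see the asymmetry is to project gross margin per resolution when the market price and the inference cost decline at different rates. This is a sketch under stated assumptions: the $0.40 starting inference cost, the 2× yearly price pressure, and both cost-decline rates are hypothetical. Only the $0.99 price and the 10×/year commodity rate come from the text.

```python
def margin_path(price0: float, cost0: float, price_decline: float,
                cost_decline: float, years: int) -> list[float]:
    """Gross margin fraction per resolution for year 0..years.

    price_decline and cost_decline are yearly division factors
    (1.0 means flat, 10.0 means ten times cheaper each year).
    """
    margins = []
    for year in range(years + 1):
        price = price0 / (price_decline ** year)
        cost = cost0 / (cost_decline ** year)
        margins.append((price - cost) / price)
    return margins


# Hypothetical inputs: $0.40 inference cost per resolution, 2x/year price pressure.
commodity_like = margin_path(0.99, 0.40, price_decline=2.0, cost_decline=10.0, years=2)
frontier_like = margin_path(0.99, 0.40, price_decline=2.0, cost_decline=1.0, years=2)

for year, (c, f) in enumerate(zip(commodity_like, frontier_like)):
    print(f"year {year}: commodity-tier margin {c:.0%}, flat-cost frontier margin {f:.0%}")
```

With these inputs the commodity-tier margin widens each year, while the flat-cost case turns negative by year 2 because price falls and cost does not.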

This asymmetry between frontier and commodity inference cost trajectories is one of the least-discussed risks in AI product pricing. The decision matrix in Section 04 maps each archetype against COGS volatility to help identify which pricing model gives the most margin protection relative to the inference tier you actually operate on. For a deeper dive on current per-token costs across providers, see our AI agent deployment cost tracker.

03 — Archetypes

Four pricing archetypes — what each actually protects.

The SaaS pricing landscape in 2026 has converged on four primary archetypes. Most discussions frame these as a spectrum from "predictable" to "value-aligned" — but that framing obscures the more operationally important question: which archetype protects or exposes your gross margin relative to variable LLM inference COGS?

Archetype 1

Per-seat subscription

Fixed monthly or annual fee per human user. COGS is primarily infrastructure, not AI inference — because AI usage is either absent, capped, or absorbed as overhead. Gross margin is predictable, but the model breaks when AI usage scales independently of seat count.

Best for: low AI intensity

Archetype 2

Per-token / per-call PAYG

Direct metering of AI consumption — customers pay per token, per API call, or per model invocation. Revenue exactly tracks COGS. No margin risk on unused capacity, but revenue is maximally volatile and enterprise buyers resist the unpredictability.

Best for: API-native platforms

Archetype 3

Hybrid seat + usage credits

Seat fee covers platform access and a bundled usage allowance; additional usage is metered separately. Combines revenue predictability (the seat) with gross margin protection on high-usage customers (the overage). The dominant model at 46% of SaaS companies.

Best for: enterprise SaaS + AI overlay

Archetype 4

Outcome-based (per-resolution)

Customers pay per completed unit of AI work — per resolved ticket, per qualified lead, per generated document. All inference COGS is implicitly passed through. Pricing aligns with customer value maximally, but requires robust outcome definition, metering, and dispute resolution infrastructure.

Best for: high-volume AI agents
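The hybrid archetype is the least obvious of the four to compute, so here is a minimal sketch of a monthly bill with a bundled allowance and metered overage. The plan values and the choice to pool the allowance across seats are illustrative assumptions, not any vendor's actual terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridPlan:
    seat_fee: float        # price per seat per month (hypothetical)
    included_units: int    # usage allowance bundled with each seat (hypothetical)
    overage_rate: float    # price per unit beyond the allowance (hypothetical)


def monthly_bill(plan: HybridPlan, seats: int, units_used: int) -> float:
    """Seat floor plus metered overage; the allowance is pooled across seats."""
    pooled_allowance = plan.included_units * seats
    overage_units = max(0, units_used - pooled_allowance)
    return round(seats * plan.seat_fee + overage_units * plan.overage_rate, 2)


plan = HybridPlan(seat_fee=50.0, included_units=200, overage_rate=0.10)
within_allowance = monthly_bill(plan, seats=10, units_used=1500)
with_overage = monthly_bill(plan, seats=10, units_used=3000)
print(within_allowance, with_overage)
```

The seat floor is billed regardless of usage, and only the 1,000 units above the pooled 2,000-unit allowance are metered in the second case.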

The value metric principle

The defining rule for any usage-based model: the value metric should reflect how customers extract value, not your internal infrastructure cost. A video transcription service should charge per completed transcription, not per CPU-second — even though CPU-seconds is what the vendor actually pays. Customers buy outcomes; vendors buy infrastructure. Pricing that exposes the infrastructure layer creates friction and erodes trust. Source: Metronome usage-based billing explainer, July 2025.

04 — Decision Matrix

Gross margin exposure by archetype — mapped.

The matrix below maps each archetype across five operational dimensions: COGS volatility risk, revenue predictability, bill-shock risk for buyers, metering infrastructure complexity, and the AI-intensity profile it best fits. We are not aware of a public framework that combines these dimensions for the AI inference era; most analyst coverage either focuses on buyer cost or addresses UBP adoption as an aggregate trend. The goal here is to give founders and product leaders the breakpoint logic, not just the archetypes.

Pure per-seat · COGS volatility: Low

No metered inference COGS: AI is absent, capped, or absorbed. Gross margin is predictable but the model structurally breaks when AI usage scales beyond what the seat fee can absorb. Highest risk: unlimited AI usage in a seat plan.

Revenue: high predictability

Pure PAYG (per-token) · Revenue volatility: High

Revenue exactly mirrors COGS variability: no lag, no smoothing. Margin is structurally protected because pricing tracks cost, but MRR forecasting requires usage-based FP&A that most finance teams don't have. Enterprise buyers often resist.

COGS risk: fully hedged

Hybrid (seat + credits) · Balance point: Med

The seat floor provides revenue predictability and covers base COGS. Overage credits capture high-usage expansion revenue while maintaining positive margin. The dominant real-world model at 46% of SaaS companies (OpenView, 2023).

46% of SaaS index

Outcome-based · Metering complexity: High

Outcome definition, tracking, and dispute resolution require significant infrastructure investment. Gross margin can be high if inference costs fall faster than outcome pricing. The frontier model, live at Intercom ($0.99/resolution) and Salesforce Agentforce (~$2/conversation).

Highest alignment

Gross margin COGS protection by pricing archetype

Source: Digital Applied analysis (OpenView UBP 2nd Edition, Feb 2023; a16z LLMflation, Nov 2024; Metronome blog, 2025)

Pure per-seat. COGS protection: low (absorbs unlimited AI usage as overhead). Bar: Low
Seat + AI credits bundle. COGS protection: medium (seat floor + overage captures expansion). Bar: Medium
Per-token / per-call PAYG. COGS protection: high (revenue tracks cost exactly). Bar: High
Tiered credits with overage. COGS protection: medium-high (credit blocks smooth variability). Bar: Med-High
Outcome-based (per-resolution). COGS protection: high (all inference COGS passed through). Bar: Highest
Outcome + seat floor. COGS protection: highest (floor guarantees minimum margin). Bar: Highest
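The exposure differences become concrete once the same customer is priced under each archetype at light and heavy usage. The sketch below is a toy model: every price, the $0.20 unit COGS, and the simplification that each AI action counts as one billable outcome are assumptions chosen for illustration.

```python
SEATS = 10
SEAT_FEE = 50.0        # hypothetical monthly price per seat
INCLUDED_UNITS = 1000  # hypothetical pooled allowance in the hybrid plan
UNIT_PRICE = 0.30      # hypothetical PAYG and overage price per AI action
OUTCOME_PRICE = 0.99   # per-resolution price quoted in this guide
UNIT_COGS = 0.20       # hypothetical inference cost per AI action

ARCHETYPES = ("seat", "payg", "hybrid", "outcome")


def revenue(archetype: str, units: int) -> float:
    if archetype == "seat":
        return SEATS * SEAT_FEE
    if archetype == "payg":
        return units * UNIT_PRICE
    if archetype == "hybrid":
        return SEATS * SEAT_FEE + max(0, units - INCLUDED_UNITS) * UNIT_PRICE
    if archetype == "outcome":
        # Simplification: every AI action is treated as one billable resolution.
        return units * OUTCOME_PRICE
    raise ValueError(f"unknown archetype: {archetype}")


def gross_margin(archetype: str, units: int) -> float:
    rev = revenue(archetype, units)
    return (rev - units * UNIT_COGS) / rev


margins = {(a, u): gross_margin(a, u) for a in ARCHETYPES for u in (500, 5000)}
for (archetype, units), margin in margins.items():
    print(f"{archetype:8s} {units:5d} units -> gross margin {margin:7.1%}")
```

In this toy model the pure seat plan goes from a healthy margin at 500 units to a negative one at 5,000, while PAYG holds a constant margin and the hybrid plan stays positive because overage is metered.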

05 — Case Studies

How real products have solved the inference-to-price translation.

Theory becomes tractable when mapped against specific implementations. The four examples below span the full archetype spectrum and represent the most-cited reference points in the current SaaS pricing literature. They are useful not as templates to copy, but as existence proofs that each archetype is operable at scale, and as illustrations of where each breaks down.

Intercom Fin AI

Outcome-based with seat floor

Charges $0.99 per resolved customer interaction — all inference COGS passed through to buyer. Seat-based platform fees ($29/$85/$132 per seat/month by tier) provide the revenue floor. The most widely cited live outcome-based AI pricing implementation as of May 2026. Source: Intercom pricing page, retrieved May 2026.

Archetype: outcome + seat floor

Salesforce Agentforce

Hybrid seat + per-action usage

Approximately $2 per conversation or lead (from Metronome blog, March 2025 — verify at current Salesforce pricing). Existing Salesforce seat fees cover platform access; Agentforce consumption is metered on top. A textbook implementation of the hybrid archetype within an enterprise install base. Always verify current pricing before planning against it — Salesforce has iterated on Agentforce pricing multiple times.

Archetype: hybrid seat + usage

OpenAI API

Per-token PAYG

Pure per-token pricing — input and output tokens metered separately. GPT-5.5 is priced at $5 input / $30 output per million tokens (standard tier, under 272K input), with a long-context surcharge above that threshold. Token pricing is the purest form of PAYG for AI products because it directly mirrors inference COGS. Source: fact-pack.md verified anchors, May 2026.

Archetype: pure PAYG

Snowflake

Credit-based compute

Credits are charged per second of virtual warehouse activity. The credit abstraction shields customers from raw per-second pricing complexity while maintaining a direct relationship to compute consumption. The 'credit as currency' model has become a template for AI platform pricing. Source: Metronome usage-based billing explainer, July 2025.

Archetype: credit bundle with overage

The pattern across all four implementations is the same: the pricing unit is chosen to reflect the customer's experience of value (resolution, conversation, token, compute unit), not the vendor's internal infrastructure metric. Intercom customers understand a resolved ticket; they do not want to reason about underlying inference token counts. OpenAI's developer customers are the exception: they specifically want token-level visibility because tokens are their own COGS input. Knowing which type of customer you serve is a prerequisite to choosing the right unit.

The infrastructure story behind these implementations is also converging. Stripe's acquisition of Metronome (announced 2026, with Sacra Research estimating approximately $1 billion; note that this is an analyst estimate, not a disclosed transaction figure) signals that metering infrastructure is now a core billing-stack requirement. Metronome was already handling metering for OpenAI and Anthropic at scale. For teams building on the CRM automation or AI transformation layer, this consolidation suggests that purpose-built billing infrastructure is increasingly available as a commodity, removing one of the traditional barriers to adopting UBP.

06 — Risk Management

Bill shock: a pricing design failure, not a customer problem.

In July 2025, a single Cursor developer generated a $7,225 invoice in one day when a team member exhausted 500 requests under an annually-billed plan. The X/Twitter post documenting the incident reportedly reached 797,000 views within a week, per Aakash Gupta's February 2026 analysis. The incident became a canonical case study in AI SaaS pricing risk, not because the billing was technically wrong, but because the pricing architecture created a situation where a single user's behavior could generate a four-figure invoice in a day without any warning system intervening.

This is a design failure, not a customer education failure. The operational lesson is that usage-based pricing requires four architectural components that seat-based pricing does not:

Spend caps: hard limits on per-user, per-team, or per-billing-period usage that prevent runaway bills. Caps should be configurable at multiple levels, not just account-wide.
Real-time usage dashboards: customers and administrators need live visibility into consumption against their allowance or budget, not retroactive invoice surprises.
Threshold alerts: configurable notifications when usage crosses 50%, 75%, and 90% of allocated budget — analogous to AWS CloudWatch billing alarms.
Grace period policies: a written, publicly-visible policy on what happens when a user hits their cap — does usage stop, does cost per unit change, or is there a manual approval required? The answer should be the same every time.
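The first, third, and fourth components can be sketched together as a small guard object that checks every charge. The class and exception names are hypothetical, and the hard-stop behaviour at the cap is just one of the policy options listed above, chosen here as an assumption.

```python
from dataclasses import dataclass, field

ALERT_THRESHOLDS = (0.50, 0.75, 0.90)  # the 50% / 75% / 90% levels named above


class CapExceeded(Exception):
    """Raised when a charge would push spend past the configured cap."""


@dataclass
class SpendGuard:
    cap: float                      # hard limit for the billing period
    spent: float = 0.0
    alerts_sent: set = field(default_factory=set)

    def record(self, amount: float) -> list[str]:
        """Apply a charge and return any newly triggered alert messages."""
        if self.spent + amount > self.cap:
            # Assumed policy: usage stops at the cap instead of billing overage.
            raise CapExceeded(f"charge of ${amount:.2f} exceeds ${self.cap:.2f} cap")
        self.spent += amount
        messages = []
        for threshold in ALERT_THRESHOLDS:
            if self.spent >= threshold * self.cap and threshold not in self.alerts_sent:
                self.alerts_sent.add(threshold)  # each alert fires once per period
                messages.append(f"usage at {threshold:.0%} of ${self.cap:.2f} cap")
        return messages


guard = SpendGuard(cap=100.0)
print(guard.record(40.0))  # below every threshold
print(guard.record(40.0))  # crosses the 50% and 75% thresholds
```

In practice one guard would exist per user, per team, and per account, so that caps are configurable at each level rather than only account-wide.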

Kyle Poyar, OpenView Partners

“The bigger risk to me is that companies just don't make the changes that ultimately are necessary. So I think the inertia leads to avoidance, which ultimately hurts the business more than maybe moving early and making a change.” — Metronome Webinar, March 2025. The same logic applies to pricing design: avoiding the hard architecture decisions around caps and dashboards does not avoid the bill-shock risk — it just defers it until a high-profile incident.

07 — Infrastructure

Metering infrastructure: the hidden requirement for UBP at scale.

Usage-based pricing is not a product decision that implementation simply follows. It is simultaneously a product decision and an engineering commitment. The billing infrastructure required to run UBP reliably at scale — event ingestion, aggregation, deduplication, rate-limit enforcement, invoice reconciliation — is a non-trivial engineering investment that most SaaS companies underestimate when they first consider moving off flat-fee subscriptions.

AWS established the infrastructure template in 2017 by introducing per-second billing for EC2 instances, billing in 1-second intervals with a 60-second minimum. Handling that granularity at AWS's scale required an entirely separate billing infrastructure stack. Snowflake built a credit-based abstraction layer precisely to shield customers and internal billing systems from the complexity of per-second compute metering. Twilio has metered per SMS since its founding — building billing infrastructure that handles millions of small events per second became a core competency, not a peripheral concern.
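The per-second rule with a 60-second minimum is simple to state as arithmetic. In the sketch below the hourly rate is a made-up figure, and rounding partial seconds up is an assumption; only the 1-second granularity and the 60-second minimum come from the text.

```python
import math

MINIMUM_BILLED_SECONDS = 60  # minimum charge window described above


def billed_seconds(runtime_seconds: float) -> int:
    """Runtime rounded up to whole seconds (an assumption), floored at the minimum."""
    return max(MINIMUM_BILLED_SECONDS, math.ceil(runtime_seconds))


def instance_charge(runtime_seconds: float, hourly_rate: float) -> float:
    """Charge for one run at a given hourly rate, billed per second."""
    return billed_seconds(runtime_seconds) * hourly_rate / 3600


HOURLY_RATE = 0.36  # hypothetical on-demand rate in dollars per hour
for runtime in (45, 60, 90.2):
    print(runtime, billed_seconds(runtime), f"${instance_charge(runtime, HOURLY_RATE):.4f}")
```

A 45-second run is billed as 60 seconds, while anything longer than a minute is billed for the seconds it actually ran.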

Event ingestion

Usage event pipeline

Core

Every API call, token consumed, or resolved outcome must generate a timestamped event record. At scale, this means a dedicated event bus with guaranteed delivery — not a logging system. Missing events directly equal missing revenue.

Requirement: <100ms latency

Aggregation layer

Usage summarisation

Core

Raw events must be aggregated into billable units — per hour, per day, per billing period. Aggregation logic must handle retries, deduplication, and late arrivals without double-counting. This is where most in-house implementations fail first.

Deduplication required
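One common way to avoid double-counting is to have each producer attach an idempotency key and have the aggregator skip keys it has already seen. The sketch below assumes that approach. The field names are hypothetical, and a production system would keep the seen-key set in durable storage, not in memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    event_id: str     # idempotency key assigned by the producer (assumed)
    customer_id: str
    units: int
    day: str          # day the usage occurred, e.g. "2026-05-01"


def aggregate(events: list[UsageEvent]) -> dict[tuple[str, str], int]:
    """Sum units per (customer, day), ignoring redelivered events."""
    seen: set[str] = set()
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for event in events:
        if event.event_id in seen:
            continue  # retry or duplicate delivery: already counted
        seen.add(event.event_id)
        # Bucket by when the usage happened, so late arrivals land in the right day.
        totals[(event.customer_id, event.day)] += event.units
    return dict(totals)


events = [
    UsageEvent("e1", "acme", 100, "2026-05-01"),
    UsageEvent("e2", "acme", 50, "2026-05-02"),
    UsageEvent("e1", "acme", 100, "2026-05-01"),  # retried delivery of e1
    UsageEvent("e3", "acme", 25, "2026-05-01"),   # late arrival for May 1
]
print(aggregate(events))
```

Bucketing on the event's own timestamp instead of its arrival time is what lets a late event be attributed to the correct billing period.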

Customer visibility

Real-time usage dashboards

Table stakes

Buyers expect to see their consumption in real time. Retroactive invoice surprises are a churn risk. The investment in a usage dashboard is smaller than the cost of a single high-profile bill-shock incident. Stripe/Metronome provides this as a hosted component.

Stakes: trust + retention

The consolidation of metering infrastructure into Stripe's stack (via the Metronome acquisition, 2026) means that most SaaS vendors no longer need to build this in-house. Purpose-built billing platforms now handle millions of usage events per second as a managed service — the same infrastructure that OpenAI and Anthropic used when running their own metering at scale. The build-vs-buy calculus for metering infrastructure has shifted decisively toward buy for all but the largest and most idiosyncratic platforms.

For founders building ecommerce or SaaS products with AI features, the practical decision is: choose a metering platform before choosing a pricing archetype. The archetype you can sustainably operate is constrained by the granularity of usage data you can actually capture and bill against. Outcome-based pricing at $0.99 per resolution requires your infrastructure to definitively detect, count, and attribute resolutions in real time — a harder problem than counting tokens.

08 — Migration Playbook

Moving an existing install base — without destroying NRR.

The hardest UBP challenge is not designing the new pricing model — it is migrating an existing install base that has been budgeting against flat fees. Kyle Poyar of OpenView Partners described this directly in Metronome's March 2025 webinar: "No one really wanted to stick their neck out to totally transform the business. And it was very hard to pivot within a business that already had an install base where customers were paying a certain amount where they might pay dramatically different amounts, say with usage-based pricing."

The migration risk is asymmetric. Customers who are heavy AI users will likely pay more under usage- or outcome-based pricing than under a flat seat plan, and they'll object. Customers who are light AI users may pay dramatically less under the new model: they'll be happy, but their lower spend will show up in your MRR before the expansion from new customers compensates. The practical playbook for avoiding NRR destruction during migration:

Grandfather for 12–18 months: existing customers stay on the old model for a defined period. New customers go straight to UBP. This buys time to validate that UBP generates comparable or better revenue without risking churn on current ARR.
Run a shadow billing period: meter usage on the new model for 60–90 days before switching billing. Show customers their projected bill under the new model before they receive it. This surfaces objections early and gives you data to set fair credit bundles.
Anchor to an equivalent value story: "you used to pay $X per seat per month; under the new model, your own usage over the last billing period would cost $Y". The comparison must be to usage data from their own account, not to a theoretical average.
Build the dashboard before the migration: customers will only accept usage-based billing if they trust their ability to monitor and control it. Shipping the usage dashboard first, independently of billing, reduces resistance by separating "visibility" from "pricing risk" in the customer's mind.
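The shadow billing step amounts to computing, per account, the current seat bill next to the bill the metered usage would have produced. This is a minimal sketch with invented account data and prices; the $0.99 unit price simply reuses the per-resolution figure from earlier in this guide.

```python
def shadow_bill(seats: int, seat_fee: float, metered_units: int, unit_price: float) -> dict:
    """Compare the current seat bill with the projected usage-based bill."""
    current = round(seats * seat_fee, 2)
    projected = round(metered_units * unit_price, 2)
    return {
        "current": current,
        "projected": projected,
        "change": round((projected - current) / current, 3),  # fraction of current bill
    }


# Hypothetical shadow-period data: account -> (seats, metered units).
accounts = {"acme": (20, 5000), "globex": (20, 200)}

report = {
    name: shadow_bill(seats, seat_fee=50.0, metered_units=units, unit_price=0.99)
    for name, (seats, units) in accounts.items()
}
for name, row in report.items():
    print(f"{name}: ${row['current']:.2f} -> ${row['projected']:.2f} ({row['change']:+.1%})")
```

Running this per account before the switch shows which customers face an increase and which a decrease, which is the data needed to size credit bundles and to prepare the conversation with each group.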

The forward-looking picture for SaaS pricing is shaped by one observation: the 10×/year LLM inference deflation rate documented by a16z makes every fixed AI pricing decision a depreciating asset. Vendors who lock in outcome-based pricing today at $0.99 per resolution may find themselves with strong margins in 2026 and competitive pressure to cut prices in 2027 as commodity model alternatives become viable for their use case. The pricing architecture that best survives this dynamic is the one that maintains a flexible value metric, one that can be repriced without changing the fundamental contract with the customer. Hybrid seat + usage models have this property; pure outcome-based models do not. This is the underlying reason why hybrid dominates the current market at 46%, even though outcome-based is the theoretically superior alignment model.

For teams evaluating AI transformation strategy and its intersection with pricing design, our analytics practice and AI transformation engagements both address the metering data and pricing architecture questions together.

The shape of SaaS pricing, 2026

When AI does the work, the seat is no longer the unit of value.

The structural case for usage-based pricing in the AI era is not about ideology — it's about gross margin arithmetic. When your product's primary cost driver is variable LLM inference and your pricing model is a flat seat fee, you have an unhedged COGS position. If inference costs rise (frontier model quality requirements), your margin compresses. If inference costs fall (commodity deflation), you accrue margin that competitors can undercut. Neither outcome is stable.

The 61% UBP adoption figure from OpenView Partners (2023) and the 46% hybrid dominance tell the same story: the industry has already moved, but imperfectly. Hybrid seat + usage models have won the practical adoption contest because they preserve revenue predictability — the thing CFOs and sales teams are most reluctant to give up — while exposing enough usage data to protect margin on high-volume customers. Outcome-based models like Intercom's $0.99/resolution and Salesforce Agentforce's ~$2/conversation represent the frontier of alignment, but they require metering infrastructure and customer trust that most vendors are still building.

The decision matrix offered in this guide is a starting point, not a formula. Your inference-cost exposure, your customer's predictability tolerance, and your engineering capacity for metering infrastructure are all variables specific to your product. The right move is to run shadow billing against your actual usage data before committing to a new archetype — not to project from benchmarks that were set under different cost and competitive conditions.
