Usage-based pricing is no longer a niche experiment for developer tools — it is the dominant architectural response to a structural shift in SaaS economics: when AI does the work, seat count stops being a proxy for value delivered, and variable LLM inference cost becomes a line item that can destroy gross margin if the pricing model isn't designed around it.
According to OpenView Partners' 2023 survey, 61% of SaaS companies had already adopted some form of usage-based pricing, with a further 21% planning to test it. But pure pay-as-you-go remains the minority position — only 15% of companies run a largely usage-based or PAYG model. Three times as many (46%) take a hybrid approach that bundles usage within seat plans or layers usage fees on top. That gap between theory and practice reflects the real operational tension: UBP aligns pricing with value, but it also introduces revenue unpredictability and, for buyers, bill-shock risk.
This guide covers the three pricing eras in SaaS, the LLM inference cost dynamic that is forcing the transition, the four archetypes available today, a gross margin exposure matrix mapping them against COGS volatility, real-world vendor implementations, and a migration playbook for founders deciding which model fits their product. Current LLM API token costs feed directly into the COGS exposure math, so read them alongside this framework.
- 01. Seat-based pricing has a structural AI problem. When AI handles resolutions, completions, and generated output, usage scales independently of human headcount. Charging per seat while absorbing variable LLM inference COGS means the vendor's gross margin is exposed to a cost it doesn't control.
- 02. LLM inference cost has dropped 1,000× in three years, but frontier pricing hasn't. Per a16z's LLMflation research, commodity-tier LLM inference cost fell from $60 per million tokens (GPT-3, November 2021) to $0.06 (Llama 3.2 3B via Together.ai, November 2024). Frontier models intentionally hold premium pricing: OpenAI's o1 launched at $60 per million output tokens, matching GPT-3's 2021 price.
- 03. Hybrid models dominate: 46% vs 15% pure PAYG. OpenView Partners (February 2023, the most recent published survey) found 46% of SaaS companies use hybrid UBP, bundling usage within seat plans, versus only 15% running pure pay-as-you-go. Hybrid reduces revenue volatility while still aligning pricing with AI consumption.
- 04. Outcome-based pricing is the frontier model, but it carries the highest billing complexity. Intercom Fin charges $0.99 per resolved customer interaction. Salesforce Agentforce charges approximately $2 per conversation or lead. These outcome-based models pass through all inference COGS to the buyer, but require robust metering infrastructure to track outcomes reliably at scale.
- 05. Bill shock is a pricing design failure, not a customer education problem. When Cursor generated a $7,225 invoice for a single developer in July 2025, the root cause was uncapped usage in an annually billed plan, which is a metering architecture decision. Spend caps, usage dashboards, and alert thresholds are engineering requirements, not optional UX add-ons.
01 — Context
Three pricing eras: perpetual, seat, and usage.
Software pricing has moved through three distinct eras, each triggered by a shift in how customers access and consume software value. Understanding the arc matters because pricing transitions do not happen cleanly — most vendors today are navigating two eras simultaneously, with an existing seat-based install base and new AI-powered features that demand a different economic model.
Perpetual license
Software shipped as physical or digital media; customers paid once for a version and owned it. Upgrade revenue required a new sale cycle. Infrastructure was the customer's problem.
Per-seat subscription
Seat count became the dominant value metric because access was the scarcity: each human user needed a login. Predictable MRR, simple billing, and straightforward NRR expansion via seat adds.
Usage and outcome
AI agents perform work that scales independently of human headcount. Value is delivered in resolutions, tokens, completions, and generated outputs — not in access granted. Seat count becomes a poor proxy for value exchanged.
The shift from Era 2 to Era 3 is not uniform across product categories. Infrastructure tools (AWS EC2 moved to per-second billing in 2017), API-native developer platforms (Twilio has charged per SMS sent since its founding), and database platforms (Snowflake has used a credit-based compute model since launch) were already in Era 3 before AI became a mass-market concern. What AI has done is pull every other SaaS category — CRM, customer support, code tooling, content platforms — into the same structural question: when the AI does the work, what is the right unit of value to charge for?
02 — LLM Economics
LLMflation: the COGS curve that changes everything.
Guido Appenzeller of a16z documented what he called LLMflation in November 2024: for LLMs of equivalent performance, inference cost has been decreasing by roughly 10× every year. Over the three years from November 2021 to November 2024, the cost of running a model achieving an MMLU score of approximately 42 fell from $60 per million tokens (GPT-3) to $0.06 per million tokens (Llama 3.2 3B via Together.ai) — a 1,000× decline. At the frontier quality level (MMLU ~83), costs had fallen by approximately 62× since GPT-4's launch in March 2023, per the same research.
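The "roughly 10× every year" figure follows from the two price points quoted above. A minimal sketch of that arithmetic, assuming a constant yearly rate of decline; the function name is illustrative, and the 20-month window used for the MMLU ~83 tier is our own reading of the March 2023 and November 2024 dates:

```python
def annual_deflation_factor(start_price: float, end_price: float, years: float) -> float:
    """Return the constant yearly factor f where start_price / f**years == end_price."""
    return (start_price / end_price) ** (1.0 / years)


# Commodity tier: $60 -> $0.06 per million tokens, Nov 2021 to Nov 2024 (3 years).
commodity_factor = annual_deflation_factor(60.0, 0.06, 3.0)

# MMLU ~83 tier: a 62x total decline, expressed as start=62, end=1.
# Assumption: Mar 2023 to Nov 2024 is treated as 20 months.
frontier_quality_factor = annual_deflation_factor(62.0, 1.0, 20 / 12)

print(f"commodity tier: about {commodity_factor:.1f}x cheaper per year")
print(f"MMLU ~83 tier:  about {frontier_quality_factor:.1f}x cheaper per year")
```

Both numbers describe the cost of reaching a fixed quality level. They say nothing about the list price of whichever model is newest at any given time.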
What this means for SaaS vendors is a two-sided exposure map. On one side, vendors who priced AI add-ons at a fixed monthly fee in 2023 may now be selling at 10× to 100× above their actual inference COGS — healthy margin, but potentially vulnerable to commoditisation pressure from competitors who pass those savings through. On the other side, vendors who locked in fixed-price enterprise contracts with unlimited AI usage face a very different problem if frontier model quality is required: the cost of the best-performing models has not followed the commodity deflation curve. OpenAI's o1 model at launch (as documented in the a16z LLMflation article, November 2024) had the same cost per output token as GPT-3 at its launch — $60 per million output tokens — evidence that frontier providers intentionally hold premium tier pricing even as the commodity tier collapses.
LLM inference cost trajectory · commodity vs frontier tiers
Sources: a16z LLMflation (Nov 2024); fact-pack.md verified API anchors (May 2026)
The practical consequence for pricing design: if your AI product runs on commodity-tier inference, your COGS floor is falling faster than your pricing should. The question is whether to pass those savings to customers (competitive differentiation through lower prices), hold margin (profitable but vulnerable), or reinvest in capability and charge for outcomes rather than tokens. If your product runs on frontier models, the opposite risk applies: COGS may not deflate at the same rate as commodity, meaning an outcome-based pricing model that priced resolutions at $0.99 in 2024 could face margin compression if your underlying inference cost doesn't fall in line with the commodity curve.
This asymmetry between frontier and commodity inference cost trajectories is one of the least-discussed risks in AI product pricing. The decision matrix in Section 04 maps each archetype against COGS volatility to help identify which pricing model gives the most margin protection relative to the inference tier you actually operate on. For a deeper dive on current per-token costs across providers, see our AI agent deployment cost tracker.
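To make the asymmetry concrete, the sketch below projects gross margin on a fixed per-resolution price under three cost trajectories. Only the $0.99 price comes from this guide. The $0.30 starting cost per resolution and the three yearly cost factors are hypothetical inputs, chosen purely for illustration:

```python
def gross_margin(price: float, cogs: float) -> float:
    """Gross margin as a fraction of price; negative means selling below cost."""
    return (price - cogs) / price


def project_margin(price: float, cogs_today: float,
                   yearly_cost_factor: float, years: int) -> list[float]:
    """Margin per year if unit cost is multiplied by yearly_cost_factor each year."""
    return [gross_margin(price, cogs_today * yearly_cost_factor ** y)
            for y in range(years + 1)]


PRICE = 0.99        # per-resolution price cited in this guide
COGS_TODAY = 0.30   # hypothetical inference cost per resolution

scenarios = {
    "commodity (cost x0.1 per year)": 0.1,           # the 10x/year decline
    "flat frontier (cost x1.0 per year)": 1.0,       # hypothetical: no deflation
    "upgraded frontier (cost x2.0 per year)": 2.0,   # hypothetical: pricier model each year
}
for name, factor in scenarios.items():
    margins = project_margin(PRICE, COGS_TODAY, factor, years=2)
    print(name, [f"{m:.0%}" for m in margins])
```

Under these assumed inputs the commodity path widens margin every year, while the upgraded-frontier path turns margin negative by year two without any change to the customer-facing price.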
03 — Archetypes
Four pricing archetypes — what each actually protects.
The SaaS pricing landscape in 2026 has converged on four primary archetypes. Most discussions frame these as a spectrum from "predictable" to "value-aligned" — but that framing obscures the more operationally important question: which archetype protects or exposes your gross margin relative to variable LLM inference COGS?
Per-seat subscription
Fixed monthly or annual fee per human user. COGS is primarily infrastructure, not AI inference — because AI usage is either absent, capped, or absorbed as overhead. Gross margin is predictable, but the model breaks when AI usage scales independently of seat count.
Per-token / per-call PAYG
Direct metering of AI consumption — customers pay per token, per API call, or per model invocation. Revenue exactly tracks COGS. No margin risk on unused capacity, but revenue is maximally volatile and enterprise buyers resist the unpredictability.
Hybrid seat + usage credits
Seat fee covers platform access and a bundled usage allowance; additional usage is metered separately. Combines revenue predictability (the seat) with gross margin protection on high-usage customers (the overage). The dominant model at 46% of SaaS companies.
Outcome-based (per-resolution)
Customers pay per completed unit of AI work — per resolved ticket, per qualified lead, per generated document. All inference COGS is implicitly passed through. Pricing aligns with customer value maximally, but requires robust outcome definition, metering, and dispute resolution infrastructure.
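The four archetypes can be compared side by side by billing the same usage under each. The following is a rough sketch, not a pricing recommendation: every price, allowance, and cost figure is a hypothetical default except the $0.99 per-resolution figure cited in this guide, and the `Usage` fields are an assumed minimal shape for a customer's monthly consumption.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    seats: int
    tokens_millions: float   # millions of tokens consumed in the period
    resolutions: int         # completed outcomes in the period


def per_seat(u: Usage, seat_price: float = 50.0) -> float:
    return u.seats * seat_price


def payg(u: Usage, price_per_million: float = 8.0) -> float:
    return u.tokens_millions * price_per_million


def hybrid(u: Usage, seat_price: float = 50.0,
           included_per_seat: float = 2.0, overage_per_million: float = 8.0) -> float:
    # The seat fee bundles an allowance; only usage above it is metered.
    included = u.seats * included_per_seat
    overage = max(0.0, u.tokens_millions - included)
    return u.seats * seat_price + overage * overage_per_million


def outcome(u: Usage, price_per_resolution: float = 0.99) -> float:
    return u.resolutions * price_per_resolution


def inference_cogs(u: Usage, cost_per_million: float = 5.0) -> float:
    return u.tokens_millions * cost_per_million


light = Usage(seats=10, tokens_millions=5.0, resolutions=200)
heavy = Usage(seats=10, tokens_millions=200.0, resolutions=8000)

for label, u in (("light", light), ("heavy", heavy)):
    cogs = inference_cogs(u)
    for name, bill in (("per-seat", per_seat(u)), ("payg", payg(u)),
                       ("hybrid", hybrid(u)), ("outcome", outcome(u))):
        print(f"{label:5} {name:8} revenue={bill:9.2f} margin={bill - cogs:9.2f}")
```

With these made-up numbers, the per-seat plan is the only one whose margin goes negative for the heavy account, because its revenue does not move when token consumption does.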
04 — Decision Matrix
Gross margin exposure by archetype — mapped.
The matrix below maps each archetype across five operational dimensions: COGS volatility risk, revenue predictability, bill-shock risk for buyers, metering infrastructure complexity, and the AI-intensity profile it best fits. No public framework has previously combined these dimensions for the AI inference era — most analyst coverage either focuses on buyer cost or addresses UBP adoption as an aggregate trend. The goal here is to give founders and product leaders the breakpoint logic, not just the archetypes.
Per-seat subscription · COGS volatility
No metered inference COGS: AI is absent, capped, or absorbed. Gross margin is predictable, but the model structurally breaks when AI usage scales beyond what the seat fee can absorb. Highest risk: unlimited AI usage in a seat plan.
Per-token / per-call PAYG · Revenue volatility
Revenue exactly mirrors COGS variability, with no lag and no smoothing. Margin is structurally protected because pricing tracks cost, but MRR forecasting requires usage-based FP&A that most finance teams don't have. Enterprise buyers often resist.
Hybrid seat + usage credits · Balance point
The seat floor provides revenue predictability and covers base COGS. Overage credits capture high-usage expansion revenue while maintaining positive margin. The dominant real-world model at 46% of SaaS companies (OpenView, 2023).
Outcome-based (per-resolution) · Metering complexity
Outcome definition, tracking, and dispute resolution require significant infrastructure investment. Gross margin can be high if inference costs fall faster than outcome pricing. The frontier model, live at Intercom ($0.99/resolution) and Salesforce Agentforce (~$2/conversation).
Gross margin COGS protection by pricing archetype
Source: Digital Applied analysis, drawing on OpenView UBP 2nd Edition (Feb 2023), a16z LLMflation (Nov 2024), Metronome blog (2025)
05 — Case Studies
How real products have solved the inference-to-price translation.
Theory becomes tractable when mapped against specific implementations. The four examples below span the full archetype spectrum and are among the most-cited reference points in the current SaaS pricing literature. They are useful not as templates to copy, but as existence proofs that each archetype is operable at scale, and as illustrations of where each breaks down.
Intercom Fin · Outcome-based with seat floor
Charges $0.99 per resolved customer interaction, with all inference COGS passed through to the buyer. Seat-based platform fees ($29/$85/$132 per seat/month by tier) provide the revenue floor. The most widely cited live outcome-based AI pricing implementation as of May 2026. Source: Intercom pricing page, retrieved May 2026.
Salesforce Agentforce · Hybrid seat + per-action usage
Approximately $2 per conversation or lead (per the Metronome blog, March 2025). Existing Salesforce seat fees cover platform access; Agentforce consumption is metered on top. A textbook implementation of the hybrid archetype within an enterprise install base. Verify current pricing before planning against it, because Salesforce has iterated on Agentforce pricing multiple times.
OpenAI API · Per-token PAYG
Pure per-token pricing, with input and output tokens metered separately. GPT-5.5 is priced at $5 input / $30 output per million tokens (standard tier, under 272K input), with a long-context surcharge above that threshold. Token pricing is the purest form of PAYG for AI products because it directly mirrors inference COGS. Source: fact-pack.md verified anchors, May 2026.
Snowflake · Credit-based compute
Credits are charged per second of virtual warehouse activity. The credit abstraction shields customers from raw per-second pricing complexity while maintaining a direct relationship to compute consumption. The 'credit as currency' model has become a template for AI platform pricing. Source: Metronome usage-based billing explainer, July 2025.
The pattern across all four implementations is the same: the pricing unit is chosen to reflect the customer's experience of value (resolution, conversation, token, compute unit), not the vendor's internal infrastructure metric. Intercom customers understand a resolved ticket; they do not want to reason about underlying inference token counts. OpenAI's developer customers are the exception — they specifically want token-level visibility because tokens are their own COGS input. Knowing which type of customer you serve is prerequisite to choosing the right unit.
The infrastructure story behind these implementations is also converging. Stripe's acquisition of Metronome (announced 2026; Sacra Research estimates the price at approximately $1 billion, which is an analyst estimate, not a disclosed transaction figure) signals that metering infrastructure is now a core billing-stack requirement. Metronome was already handling metering for OpenAI and Anthropic at scale. For teams building on the CRM automation or AI transformation layer, this consolidation suggests that purpose-built billing infrastructure is increasingly available as a commodity, removing one of the traditional barriers to adopting UBP.
06 — Risk Management
Bill shock: a pricing design failure, not a customer problem.
In July 2025, one member of a team using Cursor exhausted 500 requests under an annually billed plan and generated a $7,225 invoice in a single day. The X/Twitter post documenting the incident reportedly reached 797,000 views within a week, per Aakash Gupta's February 2026 analysis. The incident became a canonical case study in AI SaaS pricing risk, not because the billing was technically wrong, but because the pricing architecture let a single user's behavior generate a four-figure invoice in one day without any warning system intervening.
This is a design failure, not a customer education failure. The operational lesson is that usage-based pricing requires four architectural components that seat-based pricing does not:
- Spend caps: hard limits on per-user, per-team, or per-billing-period usage that prevent runaway bills. Caps should be configurable at multiple levels, not just account-wide.
- Real-time usage dashboards: customers and administrators need live visibility into consumption against their allowance or budget, not retroactive invoice surprises.
- Threshold alerts: configurable notifications when usage crosses 50%, 75%, and 90% of allocated budget — analogous to AWS CloudWatch billing alarms.
- Grace period policies: a written, publicly-visible policy on what happens when a user hits their cap — does usage stop, does cost per unit change, or is there a manual approval required? The answer should be the same every time.
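The spend-cap and threshold-alert components in the list above can be sketched in a few lines. This is a simplified in-memory illustration, not a production design: `SpendGuard` and `CapExceeded` are hypothetical names, the grace policy is assumed to be "usage stops at the cap", and a real system would persist state and enforce the check atomically.

```python
from dataclasses import dataclass, field


class CapExceeded(Exception):
    """Raised when a charge would push spend past the hard cap."""


@dataclass
class SpendGuard:
    budget: float                                     # hard cap for the billing period
    thresholds: tuple[float, ...] = (0.5, 0.75, 0.9)  # alert levels from the list above
    spent: float = 0.0
    fired: set[float] = field(default_factory=set)    # thresholds already alerted

    def record(self, cost: float) -> list[str]:
        """Apply a charge and return any newly triggered alerts."""
        # Assumed grace policy: reject the charge outright once the cap is hit.
        if self.spent + cost > self.budget:
            raise CapExceeded(f"charge of {cost:.2f} would exceed cap of {self.budget:.2f}")
        self.spent += cost
        alerts = []
        for t in self.thresholds:
            if t not in self.fired and self.spent >= t * self.budget:
                self.fired.add(t)   # each threshold alerts once per period
                alerts.append(f"usage at {round(t * 100)}% of budget")
        return alerts


guard = SpendGuard(budget=100.0)
print(guard.record(40.0))   # below every threshold
print(guard.record(20.0))   # crosses 50%
print(guard.record(35.0))   # crosses 75% and 90% in one step
try:
    guard.record(10.0)      # would take spend to 105, past the cap
except CapExceeded as exc:
    print("blocked:", exc)
```

Creating one guard per user, per team, and per account is one way to get the multi-level caps described in the first bullet.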
07 — Infrastructure
Metering infrastructure: the hidden requirement for UBP at scale.
Usage-based pricing is not a product decision that implementation simply follows. It is simultaneously a product decision and an engineering commitment. The billing infrastructure required to run UBP reliably at scale — event ingestion, aggregation, deduplication, rate-limit enforcement, invoice reconciliation — is a non-trivial engineering investment that most SaaS companies underestimate when they first consider moving off flat-fee subscriptions.
AWS established the infrastructure template in 2017 by introducing per-second billing for EC2 instances, billing in 1-second intervals with a 60-second minimum. Handling that granularity at AWS's scale required an entirely separate billing infrastructure stack. Snowflake built a credit-based abstraction layer precisely to shield customers and internal billing systems from the complexity of per-second compute metering. Twilio has metered per SMS since its founding — building billing infrastructure that handles millions of small events per second became a core competency, not a peripheral concern.
Usage event pipeline
Every API call, token consumed, or resolved outcome must generate a timestamped event record. At scale, this means a dedicated event bus with guaranteed delivery — not a logging system. Missing events directly equal missing revenue.
Usage summarisation
Raw events must be aggregated into billable units — per hour, per day, per billing period. Aggregation logic must handle retries, deduplication, and late arrivals without double-counting. This is where most in-house implementations fail first.
Real-time usage dashboards
Buyers expect to see their consumption in real time. Retroactive invoice surprises are a churn risk. The investment in a usage dashboard is smaller than the cost of a single high-profile bill-shock incident. Stripe/Metronome provides this as a hosted component.
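The aggregation step is where retries and late arrivals cause double-counting, so it is worth illustrating. A minimal in-memory sketch, assuming each event carries a unique idempotency key and the time the usage actually occurred; the `UsageEvent` fields are hypothetical and not taken from any specific billing platform:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    event_id: str      # idempotency key, assumed unique per real-world action
    customer_id: str
    day: str           # ISO date when the usage occurred, not when it arrived
    quantity: int


def aggregate(events: list[UsageEvent]) -> dict[tuple[str, str], int]:
    """Sum quantity per (customer, day), counting each event_id at most once."""
    seen: set[str] = set()
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for e in events:
        if e.event_id in seen:
            continue                 # a retry of an event already counted
        seen.add(e.event_id)
        # Bucketing by event day means a late arrival lands in the right period.
        totals[(e.customer_id, e.day)] += e.quantity
    return dict(totals)


events = [
    UsageEvent("e1", "acme", "2026-05-01", 100),
    UsageEvent("e2", "acme", "2026-05-02", 50),
    UsageEvent("e1", "acme", "2026-05-01", 100),   # duplicate delivery (retry)
    UsageEvent("e3", "acme", "2026-05-01", 25),    # late arrival for an earlier day
]
print(aggregate(events))
```

At scale the `seen` set would need to live in a durable store with a retention window, which is part of why the build-vs-buy question matters.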
The consolidation of metering infrastructure into Stripe's stack (via the Metronome acquisition, 2026) means that most SaaS vendors no longer need to build this in-house. Purpose-built billing platforms now handle millions of usage events per second as a managed service, the same infrastructure that OpenAI and Anthropic have relied on for metering at scale. The build-vs-buy calculus for metering infrastructure has shifted decisively toward buy for all but the largest and most idiosyncratic platforms.
For founders building ecommerce or SaaS products with AI features, the practical decision is: choose a metering platform before choosing a pricing archetype. The archetype you can sustainably operate is constrained by the granularity of usage data you can actually capture and bill against. Outcome-based pricing at $0.99 per resolution requires your infrastructure to definitively detect, count, and attribute resolutions in real time — a harder problem than counting tokens.
08 — Migration Playbook
Moving an existing install base — without destroying NRR.
The hardest UBP challenge is not designing the new pricing model — it is migrating an existing install base that has been budgeting against flat fees. Kyle Poyar of OpenView Partners described this directly in Metronome's March 2025 webinar: "No one really wanted to stick their neck out to totally transform the business. And it was very hard to pivot within a business that already had an install base where customers were paying a certain amount where they might pay dramatically different amounts, say with usage-based pricing."
The migration risk is asymmetric. Customers who are heavy AI users will likely pay more under usage- or outcome-based pricing than under a seat plan, and they will object. Customers who are light AI users may pay dramatically less under the new model. They will be happy, but their lower spend will show up in your MRR before expansion from heavier and newer customers compensates. The practical playbook for avoiding NRR destruction during migration:
- Grandfather for 12–18 months: existing customers stay on the old model for a defined period. New customers go straight to UBP. This buys time to validate that UBP generates comparable or better revenue without risking churn on current ARR.
- Run a shadow billing period: meter usage on the new model for 60–90 days before switching billing. Show customers their projected bill under the new model before they receive it. This surfaces objections early and gives you data to set fair credit bundles.
- Anchor to an equivalent value story: "you used to pay $X per seat per month; under the new model, at your actual usage you would pay $Y". The comparison must be built from usage data in the customer's own account, not from a theoretical average.
- Build the dashboard before the migration: customers will only accept usage-based billing if they trust their ability to monitor and control it. Shipping the usage dashboard first, independently of billing, reduces resistance by separating "visibility" from "pricing risk" in the customer's mind.
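The shadow billing step in the playbook above amounts to pricing each account's metered usage under the new model and comparing it with what that account pays today. A small sketch under stated assumptions: the account data, the $50 seat price, and the dictionary shape are all made up for illustration, only the $0.99 unit price comes from this guide, and every account is assumed to have at least one seat.

```python
def shadow_bill(accounts: list[dict], seat_price: float, unit_price: float) -> list[dict]:
    """Compare each account's current seat bill with its projected usage bill."""
    report = []
    for acct in accounts:
        current = acct["seats"] * seat_price      # what the account pays today
        projected = acct["units"] * unit_price    # metered usage at the new price
        report.append({
            "account": acct["name"],
            "current": current,
            "projected": projected,
            "change_pct": (projected - current) / current * 100,
        })
    return report


def revenue_change_pct(report: list[dict]) -> float:
    """Net revenue change across all accounts if everyone switched today."""
    current = sum(r["current"] for r in report)
    projected = sum(r["projected"] for r in report)
    return (projected - current) / current * 100


# Hypothetical accounts; "units" is billable outcomes metered over the shadow period.
accounts = [
    {"name": "light-user-co", "seats": 20, "units": 300},
    {"name": "typical-co", "seats": 20, "units": 1000},
    {"name": "heavy-user-co", "seats": 20, "units": 4000},
]
report = shadow_bill(accounts, seat_price=50.0, unit_price=0.99)
for row in report:
    print(f"{row['account']:14} {row['current']:8.2f} -> {row['projected']:8.2f} "
          f"({row['change_pct']:+.0f}%)")
print(f"net revenue change: {revenue_change_pct(report):+.1f}%")
```

The per-account rows are what a customer should see before their first usage-based invoice. The net figure is what finance needs to judge whether the switch is revenue-neutral.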
The forward-looking picture for SaaS pricing is shaped by one observation: the 10×/year LLM inference deflation rate documented by a16z makes every fixed AI pricing decision a depreciating asset. Vendors who lock in outcome-based pricing today at $0.99 per resolution may find themselves with strong margins in 2026 and competitive pressure to cut prices in 2027 as commodity model alternatives become viable for their use case. The pricing architecture that best survives this dynamic is the one that maintains a flexible value metric, one that can be repriced without changing the fundamental contract with the customer. Hybrid seat + usage models have this property; pure outcome-based models do not. This is the underlying reason why hybrid dominates the current market at 46%, even though outcome-based is the theoretically superior alignment model.
For teams evaluating AI transformation strategy and its intersection with pricing design, our analytics practice and AI transformation engagements both address the metering data and pricing architecture questions together.
When AI does the work, the seat is no longer the unit of value.
The structural case for usage-based pricing in the AI era is not about ideology — it's about gross margin arithmetic. When your product's primary cost driver is variable LLM inference and your pricing model is a flat seat fee, you have an unhedged COGS position. If inference costs rise (frontier model quality requirements), your margin compresses. If inference costs fall (commodity deflation), you accrue margin that competitors can undercut. Neither outcome is stable.
The 61% UBP adoption figure from OpenView Partners (2023) and the 46% hybrid dominance tell the same story: the industry has already moved, but imperfectly. Hybrid seat + usage models have won the practical adoption contest because they preserve revenue predictability — the thing CFOs and sales teams are most reluctant to give up — while exposing enough usage data to protect margin on high-volume customers. Outcome-based models like Intercom's $0.99/resolution and Salesforce Agentforce's ~$2/conversation represent the frontier of alignment, but they require metering infrastructure and customer trust that most vendors are still building.
The decision matrix offered in this guide is a starting point, not a formula. Your inference-cost exposure, your customer's predictability tolerance, and your engineering capacity for metering infrastructure are all variables specific to your product. The right move is to run shadow billing against your actual usage data before committing to a new archetype — not to project from benchmarks that were set under different cost and competitive conditions.