Business · Decision Matrix · 14 min read · Published May 26, 2026

Seat vs usage vs outcome · COGS exposure mapped · four archetypes compared

SaaS Pricing in the AI Era: Seat, Usage, or Outcome?

LLM inference costs are now a variable COGS item, and traditional seat-based models can destroy gross margin when AI does the work. According to OpenView Partners, 61% of SaaS companies had adopted some form of usage-based pricing by end of 2023 — but hybrid models (46%) dominate over pure pay-as-you-go (15%). The right archetype depends on your inference-cost exposure and how much billing volatility your customers will tolerate.

Digital Applied Team
Senior strategists
Sources: OpenView, a16z, Metronome
  • SaaS companies using UBP: 61% of SaaS index (2023); a further 21% planning adoption
  • Hybrid UBP dominance: 46% (seat + usage overlay) vs 15% pure PAYG
  • LLM inference cost drop: 1,000× over 3 years (a16z); 10× per year trend
  • Intercom Fin AI price: $0.99 per resolved outcome; outcome-based live model

Usage-based pricing is no longer a niche experiment for developer tools. It is the dominant architectural response to a structural shift in SaaS economics: when AI does the work, seat count stops being a proxy for value delivered, and variable LLM inference cost becomes a line item that can destroy gross margin if the pricing model isn't designed around it.

According to OpenView Partners' 2023 survey, 61% of SaaS companies had already adopted some form of usage-based pricing, with a further 21% planning to test it. But pure pay-as-you-go remains the minority position — only 15% of companies run a largely usage-based or PAYG model. Three times as many (46%) take a hybrid approach that bundles usage within seat plans or layers usage fees on top. That gap between theory and practice reflects the real operational tension: UBP aligns pricing with value, but it also introduces revenue unpredictability and, for buyers, bill-shock risk.

This guide covers the three pricing eras in SaaS, the LLM inference cost dynamic that is forcing the transition, the four archetypes available today, a gross margin exposure matrix mapping them against COGS volatility, real-world vendor implementations, and a migration playbook for founders deciding which model fits their product. Current LLM API token costs feed directly into the COGS exposure math, so read them alongside this framework.

Key takeaways
  1. Seat-based pricing has a structural AI problem. When AI handles resolutions, completions, and generated output, usage scales independently of human headcount. Charging per seat while absorbing variable LLM inference COGS as a vendor means your gross margin is exposed to a cost you don't control.
  2. LLM inference cost has dropped 1,000× in three years, but frontier pricing hasn't. Per a16z's LLMflation research, commodity-tier LLM inference cost fell from $60 per million tokens (GPT-3, November 2021) to $0.06 (Llama 3.2 3B via Together.ai, November 2024). Frontier models intentionally hold premium pricing: OpenAI's o1 launched at $60 per million output tokens, matching GPT-3's 2021 price.
  3. Hybrid models dominate: 46% vs 15% pure PAYG. OpenView Partners (February 2023, the most recent published survey) found 46% of SaaS companies use hybrid UBP, bundling usage within seat plans, versus only 15% running pure pay-as-you-go. Hybrid reduces revenue volatility while still aligning pricing with AI consumption.
  4. Outcome-based pricing is the frontier model, but carries the highest billing complexity. Intercom Fin charges $0.99 per resolved customer interaction. Salesforce Agentforce charges approximately $2 per conversation or lead. These outcome-based models pass through all inference COGS to the buyer, but require robust metering infrastructure to track outcomes reliably at scale.
  5. Bill shock is a pricing design failure, not a customer education problem. When Cursor generated a $7,225 invoice for a single developer in July 2025, the root cause was uncapped usage in an annually billed plan, a metering architecture decision. Spend caps, usage dashboards, and alert thresholds are engineering requirements, not optional UX add-ons.

01 · Context · Three pricing eras: perpetual, seat, and usage.

Software pricing has moved through three distinct eras, each triggered by a shift in how customers access and consume software value. Understanding the arc matters because pricing transitions do not happen cleanly — most vendors today are navigating two eras simultaneously, with an existing seat-based install base and new AI-powered features that demand a different economic model.

Era 1
Perpetual license
On-premise · one-time fee

Software shipped as physical or digital media; customers paid once for a version and owned it. Upgrade revenue required a new sale cycle. Infrastructure was the customer's problem.

Pre-SaaS era
Era 2
Per-seat subscription
SaaS era · recurring monthly/annual

Seat count became the dominant value metric because access was the scarcity: each human user needed a login. Predictable MRR, simple billing, and straightforward NRR expansion via seat adds.

Late 1990s–2020s
Era 3
Usage and outcome
AI era · metered consumption

AI agents perform work that scales independently of human headcount. Value is delivered in resolutions, tokens, completions, and generated outputs — not in access granted. Seat count becomes a poor proxy for value exchanged.

2020s+

The shift from Era 2 to Era 3 is not uniform across product categories. Infrastructure tools (AWS EC2 moved to per-second billing in 2017), API-native developer platforms (Twilio has charged per SMS sent since its founding), and database platforms (Snowflake has used a credit-based compute model since launch) were already in Era 3 before AI became a mass-market concern. What AI has done is pull every other SaaS category — CRM, customer support, code tooling, content platforms — into the same structural question: when the AI does the work, what is the right unit of value to charge for?

02 · LLM Economics · LLMflation: the COGS curve that changes everything.

Guido Appenzeller of a16z documented what he called LLMflation in November 2024: for LLMs of equivalent performance, inference cost has been decreasing by roughly 10× every year. Over the three years from November 2021 to November 2024, the cost of running a model achieving an MMLU score of approximately 42 fell from $60 per million tokens (GPT-3) to $0.06 per million tokens (Llama 3.2 3B via Together.ai) — a 1,000× decline. At the frontier quality level (MMLU ~83), costs had fallen by approximately 62× since GPT-4's launch in March 2023, per the same research.
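The arithmetic behind the "10× per year" figure can be checked directly from the two price points quoted above. The sketch below is illustrative only: the function names are hypothetical, and the one-year projection assumes the same rate keeps holding, which the source does not guarantee.

```python
# Figures quoted above from a16z's LLMflation analysis.
START_PRICE = 60.0   # USD per million tokens, GPT-3, Nov 2021
END_PRICE = 0.06     # USD per million tokens, Llama 3.2 3B via Together.ai, Nov 2024
YEARS = 3


def annual_deflation_factor(start: float, end: float, years: float) -> float:
    """Constant yearly factor f such that start / f**years == end."""
    return (start / end) ** (1.0 / years)


def projected_price(start: float, factor: float, years: float) -> float:
    """Price after `years` IF the same yearly factor keeps holding (an assumption)."""
    return start / factor ** years


total_decline = START_PRICE / END_PRICE                          # about 1,000x
factor = annual_deflation_factor(START_PRICE, END_PRICE, YEARS)  # about 10x per year
next_year = projected_price(END_PRICE, factor, 1)                # hypothetical extrapolation

print(f"total decline: {total_decline:,.0f}x, yearly factor: {factor:.1f}x")
print(f"extrapolated commodity price one year on: ${next_year:.4f}/Mtok")
```

A 1,000× decline over three years is the same statement as a 10× decline per year compounded, which is why the two figures are used interchangeably in the rest of this guide.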

What this means for SaaS vendors is a two-sided exposure map. On one side, vendors who priced AI add-ons at a fixed monthly fee in 2023 may now be selling at 10× to 100× above their actual inference COGS — healthy margin, but potentially vulnerable to commoditisation pressure from competitors who pass those savings through. On the other side, vendors who locked in fixed-price enterprise contracts with unlimited AI usage face a very different problem if frontier model quality is required: the cost of the best-performing models has not followed the commodity deflation curve. OpenAI's o1 model at launch (as documented in the a16z LLMflation article, November 2024) had the same cost per output token as GPT-3 at its launch — $60 per million output tokens — evidence that frontier providers intentionally hold premium tier pricing even as the commodity tier collapses.

LLM inference cost trajectory · commodity vs frontier tiers

Sources: a16z LLMflation (Nov 2024); fact-pack.md verified API anchors (May 2026)
  • GPT-3 (Nov 2021): $60/Mtok. MMLU ~42 performance · commodity tier baseline
  • GPT-4 (Mar 2023): ~$30/Mtok. MMLU ~83 performance · frontier tier at launch
  • Claude Sonnet 4.6 (May 2026): $3–15/Mtok. Input $3 / Output $15 per million tokens
  • GPT-5.5 standard (May 2026): $5–30/Mtok. Input $5 / Output $30 per million tokens (under 272K context)
  • Llama 3.2 3B via Together.ai (Nov 2024): $0.06/Mtok. MMLU ~42 equivalent · commodity tier today
The cost curve · a16z LLMflation
Andreessen Horowitz partner Guido Appenzeller's LLMflation analysis (a16z Infra blog, November 2024) framed the trajectory bluntly: for an LLM of equivalent performance, the cost is decreasing by 10x every year. That deflation is the structural backdrop for every usage-based pricing decision that follows.

The practical consequence for pricing design: if your AI product runs on commodity-tier inference, your COGS floor is falling faster than your prices need to. The question is whether to pass those savings to customers (competitive differentiation through lower prices), hold margin (profitable but vulnerable), or reinvest in capability and charge for outcomes rather than tokens. If your product runs on frontier models, the opposite risk applies: COGS may not deflate at the commodity rate, so an outcome-based model that priced resolutions at $0.99 in 2024 could face margin compression if market prices follow the commodity curve down while your underlying inference cost does not.
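To make the asymmetry concrete, here is a minimal sketch of per-unit gross margin at a fixed outcome price under two cost trajectories. The $0.99 price comes from the Intercom example in this guide; the $0.40 starting inference cost per resolution and the $0.50 competitor price are hypothetical values chosen only to illustrate the shape of the risk.

```python
def margin_over_time(price, cost_year0, yearly_cost_factor, years):
    """Gross margin per unit for year 0..years, with the price held fixed.

    yearly_cost_factor: unit cost is divided by this every year
    (10.0 = the commodity trend described above, 1.0 = flat cost).
    """
    margins = []
    for year in range(years + 1):
        cost = cost_year0 / (yearly_cost_factor ** year)
        margins.append((price - cost) / price)
    return margins


PRICE_PER_RESOLUTION = 0.99   # from the Intercom Fin example
ASSUMED_COST_YEAR0 = 0.40     # hypothetical inference cost per resolution

# Commodity tier: cost falls 10x per year, margin widens at a fixed price.
commodity = margin_over_time(PRICE_PER_RESOLUTION, ASSUMED_COST_YEAR0, 10.0, 2)

# Frontier tier with flat cost: margin holds only while the price holds.
frontier_flat = margin_over_time(PRICE_PER_RESOLUTION, ASSUMED_COST_YEAR0, 1.0, 2)

# If competitors on commodity models push the market price to a hypothetical
# $0.50 while frontier cost stays flat, the margin is squeezed.
squeezed = margin_over_time(0.50, ASSUMED_COST_YEAR0, 1.0, 0)[0]

for label, series in (("commodity", commodity), ("frontier, flat cost", frontier_flat)):
    print(label, [f"{m:.0%}" for m in series])
print(f"frontier after price cut: {squeezed:.0%}")
```

The compression in the frontier case comes from the price side, not from costs rising: the vendor's cost is unchanged, but the price it can charge is set by competitors whose costs fell.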

This asymmetry between frontier and commodity inference cost trajectories is one of the least-discussed risks in AI product pricing. The decision matrix in Section 04 maps each archetype against COGS volatility to help identify which pricing model gives the most margin protection relative to the inference tier you actually operate on. For a deeper dive on current per-token costs across providers, see our AI agent deployment cost tracker.

03 · Archetypes · Four pricing archetypes: what each actually protects.

The SaaS pricing landscape in 2026 has converged on four primary archetypes. Most discussions frame these as a spectrum from "predictable" to "value-aligned" — but that framing obscures the more operationally important question: which archetype protects or exposes your gross margin relative to variable LLM inference COGS?

Archetype 1
Per-seat subscription

Fixed monthly or annual fee per human user. COGS is primarily infrastructure, not AI inference — because AI usage is either absent, capped, or absorbed as overhead. Gross margin is predictable, but the model breaks when AI usage scales independently of seat count.

Best for: low AI intensity
Archetype 2
Per-token / per-call PAYG

Direct metering of AI consumption — customers pay per token, per API call, or per model invocation. Revenue exactly tracks COGS. No margin risk on unused capacity, but revenue is maximally volatile and enterprise buyers resist the unpredictability.

Best for: API-native platforms
Archetype 3
Hybrid seat + usage credits

Seat fee covers platform access and a bundled usage allowance; additional usage is metered separately. Combines revenue predictability (the seat) with gross margin protection on high-usage customers (the overage). The dominant model at 46% of SaaS companies.

Best for: enterprise SaaS + AI overlay
Archetype 4
Outcome-based (per-resolution)

Customers pay per completed unit of AI work — per resolved ticket, per qualified lead, per generated document. All inference COGS is implicitly passed through. Pricing aligns with customer value maximally, but requires robust outcome definition, metering, and dispute resolution infrastructure.

Best for: high-volume AI agents
The value metric principle
The defining rule for any usage-based model: the value metric should reflect how customers extract value, not your internal infrastructure cost. A video transcription service should charge per completed transcription, not per CPU-second — even though CPU-seconds is what the vendor actually pays. Customers buy outcomes; vendors buy infrastructure. Pricing that exposes the infrastructure layer creates friction and erodes trust. Source: Metronome usage-based billing explainer, July 2025.
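The four archetypes can be compared by billing the same month of usage under each. The sketch below is a simplified illustration: the `Usage` fields, the function names, and every price and allowance in the example (an $85 seat, 2M included tokens per seat, $15 per million tokens) are assumptions for the sake of the comparison, not any vendor's actual plan. Only the $0.99 per resolution figure is taken from this guide.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    seats: int
    tokens: int        # tokens consumed in the billing period
    resolutions: int   # AI-resolved outcomes in the billing period


def per_seat(u: Usage, seat_price: float) -> float:
    """Archetype 1: flat fee per human user, usage ignored."""
    return u.seats * seat_price


def payg(u: Usage, price_per_mtok: float) -> float:
    """Archetype 2: pure metered consumption."""
    return u.tokens / 1_000_000 * price_per_mtok


def hybrid(u: Usage, seat_price: float, included_tokens_per_seat: int,
           overage_per_mtok: float) -> float:
    """Archetype 3: seat fee with a bundled allowance, overage metered."""
    included = u.seats * included_tokens_per_seat
    overage_tokens = max(0, u.tokens - included)
    return u.seats * seat_price + overage_tokens / 1_000_000 * overage_per_mtok


def outcome(u: Usage, price_per_resolution: float) -> float:
    """Archetype 4: pay per completed unit of AI work."""
    return u.resolutions * price_per_resolution


# Hypothetical month for one customer.
month = Usage(seats=10, tokens=50_000_000, resolutions=2_000)

invoices = {
    "per-seat": per_seat(month, 85.0),
    "payg": payg(month, 15.0),
    "hybrid": hybrid(month, 85.0, 2_000_000, 15.0),
    "outcome": outcome(month, 0.99),
}
for name, amount in invoices.items():
    print(f"{name:>9}: ${amount:,.2f}")
```

Running the same usage record through all four functions is a quick way to see which customers gain or lose under each model before any pricing page changes.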

04 · Decision Matrix · Gross margin exposure by archetype, mapped.

The matrix below maps each archetype across five operational dimensions: COGS volatility risk, revenue predictability, bill-shock risk for buyers, metering infrastructure complexity, and the AI-intensity profile it best fits. We are not aware of a public framework that combines these dimensions for the AI inference era; most analyst coverage either focuses on buyer cost or addresses UBP adoption as an aggregate trend. The goal here is to give founders and product leaders the breakpoint logic, not just the archetypes.

Pure per-seat
COGS volatility
Low

No metered inference COGS — AI is absent, capped, or absorbed. Gross margin is predictable but the model structurally breaks when AI usage scales beyond what the seat fee can absorb. Highest risk: unlimited AI usage in a seat plan.

Revenue: high predictability
Pure PAYG (per-token)
Revenue volatility
High

Revenue exactly mirrors COGS variability — no lag, no smoothing. Margin is structurally protected because pricing tracks cost, but MRR forecasting requires usage-based FP&A that most finance teams don't have. Enterprise buyers often resist.

COGS risk: fully hedged
Hybrid (seat + credits)
Balance point
Med

The seat floor provides revenue predictability and covers base COGS. Overage credits capture high-usage expansion revenue while maintaining positive margin. The dominant real-world model at 46% of SaaS companies (OpenView, 2023).

46% of SaaS index
Outcome-based
Metering complexity
High

Outcome definition, tracking, and dispute resolution require significant infrastructure investment. Gross margin can be high if inference costs fall faster than outcome pricing. The frontier model — live at Intercom ($0.99/resolution) and Salesforce Agentforce (~$2/conversation).

Highest alignment

Gross margin COGS protection by pricing archetype

Source: Digital Applied analysis, drawing on OpenView UBP 2nd Edition (Feb 2023), a16z LLMflation (Nov 2024), Metronome blog (2025)
  • Pure per-seat. COGS protection: low (absorbs unlimited AI usage as overhead). Chart rating: Low
  • Seat + AI credits bundle. COGS protection: medium (seat floor + overage captures expansion). Chart rating: Medium
  • Per-token / per-call PAYG. COGS protection: high (revenue tracks cost exactly). Chart rating: High
  • Tiered credits with overage. COGS protection: medium-high (credit blocks smooth variability). Chart rating: Med-High
  • Outcome-based (per-resolution). COGS protection: high (all inference COGS passed through). Chart rating: Highest
  • Outcome + seat floor. COGS protection: highest (floor guarantees minimum margin). Chart rating: Highest
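The breakpoint logic behind the matrix can be sketched as a small margin calculation: at what usage level does a flat seat fee stop covering inference COGS, and how does a hybrid overage change that? All inputs below (a $30 seat, a blended $10 per million tokens inference cost, a 1M-token allowance, a $20 per million tokens overage price) are hypothetical, and the functions ignore non-inference COGS for simplicity.

```python
def seat_margin(seat_price: float, tokens_per_seat: int, cost_per_mtok: float) -> float:
    """Gross margin of a flat seat plan that absorbs all inference cost."""
    cogs = tokens_per_seat / 1_000_000 * cost_per_mtok
    return (seat_price - cogs) / seat_price


def breakeven_tokens(seat_price: float, cost_per_mtok: float) -> float:
    """Tokens per seat per month at which inference COGS equals the seat fee."""
    return seat_price / cost_per_mtok * 1_000_000


def hybrid_margin(seat_price: float, included_tokens: int, overage_price_per_mtok: float,
                  tokens_per_seat: int, cost_per_mtok: float) -> float:
    """Gross margin when usage above the bundled allowance is billed as overage."""
    overage = max(0, tokens_per_seat - included_tokens)
    revenue = seat_price + overage / 1_000_000 * overage_price_per_mtok
    cogs = tokens_per_seat / 1_000_000 * cost_per_mtok
    return (revenue - cogs) / revenue


SEAT_PRICE = 30.0      # hypothetical
COST_PER_MTOK = 10.0   # hypothetical blended inference cost

print(f"break-even: {breakeven_tokens(SEAT_PRICE, COST_PER_MTOK):,.0f} tokens per seat")
for tokens in (1_000_000, 3_000_000, 6_000_000):
    flat = seat_margin(SEAT_PRICE, tokens, COST_PER_MTOK)
    hyb = hybrid_margin(SEAT_PRICE, 1_000_000, 20.0, tokens, COST_PER_MTOK)
    print(f"{tokens:>9,} tokens  seat-only: {flat:>6.0%}  hybrid: {hyb:>5.0%}")
```

Under these assumed numbers the seat-only plan turns negative once a user passes the break-even volume, while the hybrid plan stays positive because every token above the allowance is sold above cost.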

05 · Case Studies · How real products have solved the inference-to-price translation.

Theory becomes tractable when mapped against specific implementations. The four examples below span the full archetype spectrum and represent the most-cited reference points in the current SaaS pricing literature. They are useful not as templates to copy, but as existence proofs that each archetype is operable at scale, and as illustrations of where each breaks down.

Intercom Fin AI
Outcome-based with seat floor

Charges $0.99 per resolved customer interaction — all inference COGS passed through to buyer. Seat-based platform fees ($29/$85/$132 per seat/month by tier) provide the revenue floor. The most widely cited live outcome-based AI pricing implementation as of May 2026. Source: Intercom pricing page, retrieved May 2026.

Archetype: outcome + seat floor
Salesforce Agentforce
Hybrid seat + per-action usage

Approximately $2 per conversation or lead (from Metronome blog, March 2025 — verify at current Salesforce pricing). Existing Salesforce seat fees cover platform access; Agentforce consumption is metered on top. A textbook implementation of the hybrid archetype within an enterprise install base. Always verify current pricing before planning against it — Salesforce has iterated on Agentforce pricing multiple times.

Archetype: hybrid seat + usage
OpenAI API
Per-token PAYG

Pure per-token pricing — input and output tokens metered separately. GPT-5.5 is priced at $5 input / $30 output per million tokens (standard tier, under 272K input), with a long-context surcharge above that threshold. Token pricing is the purest form of PAYG for AI products because it directly mirrors inference COGS. Source: fact-pack.md verified anchors, May 2026.

Archetype: pure PAYG
Snowflake
Credit-based compute

Credits are charged per second of virtual warehouse activity. The credit abstraction shields customers from raw per-second pricing complexity while maintaining a direct relationship to compute consumption. The 'credit as currency' model has become a template for AI platform pricing. Source: Metronome usage-based billing explainer, July 2025.

Archetype: credit bundle with overage

The pattern across all four implementations is the same: the pricing unit is chosen to reflect the customer's experience of value (resolution, conversation, token, compute unit), not the vendor's internal infrastructure metric. Intercom customers understand a resolved ticket; they do not want to reason about underlying inference token counts. OpenAI's developer customers are the exception — they specifically want token-level visibility because tokens are their own COGS input. Knowing which type of customer you serve is prerequisite to choosing the right unit.
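The translation from token pricing to an outcome price can be sketched as follows. The $3 input / $15 output per million tokens rates are the Claude Sonnet 4.6 figures quoted earlier in this guide and $0.99 is the Intercom price; the tokens per conversation and the 50% resolution rate are hypothetical, as is the simplification that unresolved conversations are not billed but still incur inference cost.

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """USD cost of one request when input and output tokens are priced separately."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


def cost_per_billed_outcome(cost_per_attempt: float, resolution_rate: float) -> float:
    """Spread the cost of all attempts over the ones that end up billable."""
    if not 0 < resolution_rate <= 1:
        raise ValueError("resolution_rate must be in (0, 1]")
    return cost_per_attempt / resolution_rate


# Hypothetical workload: 20k input + 2k output tokens per conversation.
cost_per_conversation = inference_cost(20_000, 2_000, 3.0, 15.0)

# Hypothetical: half of all conversations end in a billable resolution.
cost_per_resolution = cost_per_billed_outcome(cost_per_conversation, 0.5)

PRICE = 0.99
margin = (PRICE - cost_per_resolution) / PRICE
print(f"cost per conversation: ${cost_per_conversation:.2f}")
print(f"cost per billed resolution: ${cost_per_resolution:.2f}, margin: {margin:.0%}")
```

The resolution rate matters as much as the token price: under outcome pricing the vendor pays for every attempt but is only paid for the successful ones.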

The infrastructure story behind these implementations is also converging. Stripe's acquisition of Metronome (announced 2026; Sacra Research estimates the price at approximately $1 billion, which is an analyst estimate rather than a disclosed transaction figure) signals that metering infrastructure is now a core billing-stack requirement. Metronome was already handling metering for OpenAI and Anthropic at scale. For teams building on the CRM automation or AI transformation layer, this consolidation suggests that purpose-built billing infrastructure is increasingly available as a commodity, removing one of the traditional barriers to adopting UBP.

06 · Risk Management · Bill shock: a pricing design failure, not a customer problem.

In July 2025, one developer on a Cursor team generated a $7,225 invoice in a single day after exhausting 500 requests under an annually billed plan. The X/Twitter post documenting the incident reportedly reached 797,000 views within a week, per Aakash Gupta's February 2026 analysis. The incident became a canonical case study in AI SaaS pricing risk, not because the billing was technically wrong, but because the pricing architecture created a situation where a single user's behavior could generate a four-figure invoice in a day without any warning system intervening.

This is a design failure, not a customer education failure. The operational lesson is that usage-based pricing requires four architectural components that seat-based pricing does not:

  • Spend caps: hard limits on per-user, per-team, or per-billing-period usage that prevent runaway bills. Caps should be configurable at multiple levels, not just account-wide.
  • Real-time usage dashboards: customers and administrators need live visibility into consumption against their allowance or budget, not retroactive invoice surprises.
  • Threshold alerts: configurable notifications when usage crosses 50%, 75%, and 90% of allocated budget — analogous to AWS CloudWatch billing alarms.
  • Grace period policies: a written, publicly-visible policy on what happens when a user hits their cap — does usage stop, does cost per unit change, or is there a manual approval required? The answer should be the same every time.
Kyle Poyar, OpenView Partners
“The bigger risk to me is that companies just don't make the changes that ultimately are necessary. So I think the inertia leads to avoidance, which ultimately hurts the business more than maybe moving early and making a change.” — Metronome Webinar, March 2025. The same logic applies to pricing design: avoiding the hard architecture decisions around caps and dashboards does not avoid the bill-shock risk — it just defers it until a high-profile incident.
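The spend cap and threshold alerts listed above can be sketched as a small guard object. This is a minimal in-memory illustration with hypothetical names (`SpendGuard`, `SpendCapExceeded`); it assumes the cap policy is "stop usage", which is only one of the grace-period options, and a production system would persist state and handle concurrent writes.

```python
from dataclasses import dataclass, field

ALERT_THRESHOLDS_PCT = (50, 75, 90)  # the thresholds suggested above


class SpendCapExceeded(Exception):
    """Raised when a charge would push spend past the hard cap."""


@dataclass
class SpendGuard:
    cap: float                      # hard limit for the billing period, in USD
    spent: float = 0.0
    fired: set = field(default_factory=set)

    def record(self, amount: float) -> list[str]:
        """Record a charge and return any newly crossed alert thresholds.

        Rejects the charge (and leaves `spent` unchanged) if it would exceed the cap.
        """
        if self.spent + amount > self.cap:
            raise SpendCapExceeded(
                f"charge of ${amount:.2f} would exceed cap of ${self.cap:.2f}"
            )
        self.spent += amount
        alerts = []
        for pct in ALERT_THRESHOLDS_PCT:
            # Each threshold fires once per period.
            if self.spent * 100 >= pct * self.cap and pct not in self.fired:
                self.fired.add(pct)
                alerts.append(f"{pct}% of budget used")
        return alerts


guard = SpendGuard(cap=100.0)
print(guard.record(50.0))   # crosses the 50% threshold
print(guard.record(30.0))   # crosses the 75% threshold
try:
    guard.record(25.0)      # would take spend to 105, above the cap
except SpendCapExceeded as exc:
    print("blocked:", exc)
```

Checking the cap before recording the charge is the design choice that matters: an alert-only system would have notified someone about the invoice, but not prevented it.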

07 · Infrastructure · Metering infrastructure: the hidden requirement for UBP at scale.

Usage-based pricing is not a product decision that implementation simply follows. It is simultaneously a product decision and an engineering commitment. The billing infrastructure required to run UBP reliably at scale — event ingestion, aggregation, deduplication, rate-limit enforcement, invoice reconciliation — is a non-trivial engineering investment that most SaaS companies underestimate when they first consider moving off flat-fee subscriptions.

AWS established the infrastructure template in 2017 by introducing per-second billing for EC2 instances, billing in 1-second intervals with a 60-second minimum. Handling that granularity at AWS's scale required an entirely separate billing infrastructure stack. Snowflake built a credit-based abstraction layer precisely to shield customers and internal billing systems from the complexity of per-second compute metering. Twilio has metered per SMS since its founding — building billing infrastructure that handles millions of small events per second became a core competency, not a peripheral concern.

Event ingestion
Usage event pipeline
Core

Every API call, token consumed, or resolved outcome must generate a timestamped event record. At scale, this means a dedicated event bus with guaranteed delivery — not a logging system. Missing events directly equal missing revenue.

Requirement: <100ms latency
Aggregation layer
Usage summarisation
Core

Raw events must be aggregated into billable units — per hour, per day, per billing period. Aggregation logic must handle retries, deduplication, and late arrivals without double-counting. This is where most in-house implementations fail first.

Deduplication required
Customer visibility
Real-time usage dashboards
Table stakes

Buyers expect to see their consumption in real time. Retroactive invoice surprises are a churn risk. The investment in a usage dashboard is smaller than the cost of a single high-profile bill-shock incident. Stripe/Metronome provides this as a hosted component.

Stakes: trust + retention
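The aggregation requirements described above (retries, deduplication, late arrivals) can be sketched with an idempotency key per event. This is a minimal in-memory illustration, not a production pipeline: the `UsageEvent` fields and `aggregate_daily` are hypothetical names, and it assumes producers attach a stable `event_id` and a timezone-aware timestamp to every event.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageEvent:
    event_id: str        # idempotency key assigned by the producer (assumed stable across retries)
    customer_id: str
    quantity: int        # billable units: tokens, calls, or resolutions
    timestamp: datetime  # when the usage happened, timezone-aware


def aggregate_daily(events):
    """Sum quantity per (customer, UTC day), counting each event_id once.

    Events are bucketed by their own timestamp rather than arrival order,
    so a late arrival still lands in the day the usage occurred.
    """
    seen = set()
    totals = defaultdict(int)
    for event in events:
        if event.event_id in seen:
            continue  # retry or duplicate delivery: do not double-count
        seen.add(event.event_id)
        day = event.timestamp.astimezone(timezone.utc).date()
        totals[(event.customer_id, day)] += event.quantity
    return dict(totals)


day1 = datetime(2026, 5, 1, 10, 0, tzinfo=timezone.utc)
day2 = datetime(2026, 5, 2, 9, 0, tzinfo=timezone.utc)

stream = [
    UsageEvent("e1", "acme", 100, day1),
    UsageEvent("e1", "acme", 100, day1),  # duplicate delivery of e1
    UsageEvent("e2", "acme", 40, day2),
    UsageEvent("e3", "acme", 25, day1),   # late arrival for day 1
]
totals = aggregate_daily(stream)
for (customer, day), quantity in sorted(totals.items()):
    print(customer, day, quantity)
```

At real volumes the `seen` set would live in a durable store with an expiry window, but the invariant is the same: a usage event is billable once, regardless of how many times it is delivered.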

The consolidation of metering infrastructure into Stripe's stack (via the Metronome acquisition, 2026) means that most SaaS vendors no longer need to build this in-house. Purpose-built billing platforms now handle millions of usage events per second as a managed service, including the infrastructure that was already handling metering for OpenAI and Anthropic at scale. The build-vs-buy calculus for metering infrastructure has shifted decisively toward buy for all but the largest and most idiosyncratic platforms.

For founders building ecommerce or SaaS products with AI features, the practical decision is: choose a metering platform before choosing a pricing archetype. The archetype you can sustainably operate is constrained by the granularity of usage data you can actually capture and bill against. Outcome-based pricing at $0.99 per resolution requires your infrastructure to definitively detect, count, and attribute resolutions in real time — a harder problem than counting tokens.

08 · Migration Playbook · Moving an existing install base without destroying NRR.

The hardest UBP challenge is not designing the new pricing model — it is migrating an existing install base that has been budgeting against flat fees. Kyle Poyar of OpenView Partners described this directly in Metronome's March 2025 webinar: "No one really wanted to stick their neck out to totally transform the business. And it was very hard to pivot within a business that already had an install base where customers were paying a certain amount where they might pay dramatically different amounts, say with usage-based pricing."

The migration risk is asymmetric. Customers who are heavy AI users will likely pay more under usage- or outcome-based pricing than under a seat plan, and they will object. Customers who are light AI users may pay dramatically less under the new model; they will be happy, but their lower spend will show up in your MRR before the expansion from new customers compensates. The practical playbook for avoiding NRR destruction during migration:

  • Grandfather for 12–18 months: existing customers stay on the old model for a defined period. New customers go straight to UBP. This buys time to validate that UBP generates comparable or better revenue without risking churn on current ARR.
  • Run a shadow billing period: meter usage on the new model for 60–90 days before switching billing. Show customers their projected bill under the new model before they receive it. This surfaces objections early and gives you data to set fair credit bundles.
  • Anchor to an equivalent value story: "you used to pay $X per seat per month; under the new model, your usage over the same period would have cost $Y". The comparison must be to usage data from their own account, not to a theoretical average.
  • Build the dashboard before the migration: customers will only accept usage-based billing if they trust their ability to monitor and control it. Shipping the usage dashboard first, independently of billing, reduces resistance by separating "visibility" from "pricing risk" in the customer's mind.
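The shadow billing step in the playbook above can be sketched as a per-account comparison of the current seat invoice against what metered usage would have cost. The account records, the $50 seat price, and the 20% flag threshold are hypothetical; the $0.99 unit price reuses the per-resolution figure from this guide. The sketch assumes every account has at least one seat.

```python
def shadow_bill(accounts, seat_price, price_per_unit):
    """Compare each account's seat invoice with its projected usage invoice."""
    report = []
    for acct in accounts:
        old = acct["seats"] * seat_price             # what they pay today
        new = acct["usage_units"] * price_per_unit   # what metered usage would cost
        report.append({
            "account": acct["name"],
            "old": old,
            "new": new,
            "change": (new - old) / old,             # assumes seats > 0
        })
    return report


def flag_large_changes(report, threshold=0.20):
    """Accounts whose bill would move by more than `threshold` in either direction."""
    return [row["account"] for row in report if abs(row["change"]) > threshold]


# Hypothetical metered usage collected during a 60 to 90 day shadow period.
accounts = [
    {"name": "light-user", "seats": 10, "usage_units": 200},
    {"name": "heavy-user", "seats": 5, "usage_units": 1_000},
    {"name": "typical", "seats": 10, "usage_units": 520},
]

report = shadow_bill(accounts, seat_price=50.0, price_per_unit=0.99)
flagged = flag_large_changes(report)
net_mrr_change = sum(r["new"] for r in report) - sum(r["old"] for r in report)

for row in report:
    print(f'{row["account"]:>10}: ${row["old"]:.2f} -> ${row["new"]:.2f} ({row["change"]:+.0%})')
print("needs a conversation before cutover:", flagged)
print(f"net MRR change: ${net_mrr_change:+.2f}")
```

The flagged list is the useful output: it identifies both the accounts likely to object and the accounts whose bills would shrink, before either shows up in churn or MRR.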

The forward-looking picture for SaaS pricing is shaped by one observation: the 10×/year LLM inference deflation rate documented by a16z makes every fixed AI pricing decision a depreciating asset. Vendors who lock in outcome-based pricing today at $0.99 per resolution may find themselves with strong margins in 2026 and competitive pressure to cut prices in 2027 as commodity model alternatives become viable for their use case. The pricing architecture that best survives this dynamic is the one that maintains a flexible value metric, one that can be repriced without changing the fundamental contract with the customer. Hybrid seat + usage models have this property; pure outcome-based models do not. This is a plausible underlying reason why hybrid dominated OpenView's 2023 survey at 46%, even though outcome-based is the theoretically superior alignment model.

For teams evaluating AI transformation strategy and its intersection with pricing design, our analytics practice and AI transformation engagements both address the metering data and pricing architecture questions together.

The shape of SaaS pricing, 2026

When AI does the work, the seat is no longer the unit of value.

The structural case for usage-based pricing in the AI era is not about ideology — it's about gross margin arithmetic. When your product's primary cost driver is variable LLM inference and your pricing model is a flat seat fee, you have an unhedged COGS position. If inference costs rise (frontier model quality requirements), your margin compresses. If inference costs fall (commodity deflation), you accrue margin that competitors can undercut. Neither outcome is stable.

The 61% UBP adoption figure from OpenView Partners (2023) and the 46% hybrid dominance tell the same story: the industry has already moved, but imperfectly. Hybrid seat + usage models have won the practical adoption contest because they preserve revenue predictability — the thing CFOs and sales teams are most reluctant to give up — while exposing enough usage data to protect margin on high-volume customers. Outcome-based models like Intercom's $0.99/resolution and Salesforce Agentforce's ~$2/conversation represent the frontier of alignment, but they require metering infrastructure and customer trust that most vendors are still building.

The decision matrix offered in this guide is a starting point, not a formula. Your inference-cost exposure, your customer's predictability tolerance, and your engineering capacity for metering infrastructure are all variables specific to your product. The right move is to run shadow billing against your actual usage data before committing to a new archetype — not to project from benchmarks that were set under different cost and competitive conditions.

FAQ · SaaS pricing decision matrix

The questions we get every week.

Usage-based pricing charges customers based on how much they consume — tokens processed, API calls made, outcomes resolved — rather than a flat fee per seat. It is growing because AI has changed the relationship between human headcount (the traditional seat-based proxy for value) and actual work done. When an AI agent resolves 500 customer tickets with no human involvement, the seat count stays flat while the value delivered grows. That disconnect makes seat-based pricing both a poor revenue-capture mechanism for vendors and an unfair cost structure for customers. According to OpenView Partners' 2023 survey (the most recent published data), 61% of SaaS companies had adopted some form of UBP, with a further 21% planning to test it.