AI Development · Pricing Tracker · 11 min read · Published May 21, 2026

This post anchors to the May 2026 AI agent pricing landscape — 14 vendors, 17 days of pricing shifts, snapshotted today (May 21, 2026).

AI Agent Pricing: May 2026 Full Matrix

14 vendors. 17 days of pricing shifts. Token rates from $0.25 to $30 per Mtok. This is the single-page reference matrix for AI agent costs as of May 21, 2026 — every surcharge, multiplier, and expiring promo in one place.

Digital Applied Team
Senior strategists · Published May 21, 2026
Sources: 18 primary
Vendors tracked: 14 (API + subscription tiers; May 21 snapshot)
Token rate range: $0.25–$30 per Mtok input (Gemini Flash-Lite → Opus Fast)
Opus 4.7 × Copilot: 15× premium-request multiplier (promo expired Apr 30)
Anthropic worked example: $0.705 (1h Opus 4.7 session, no cache; vendor-anchored baseline)

AI agent pricing fragmented across 14 vendors and 17 days of pricing events in May 2026, spanning a 120× range in input rates from Gemini 3.1 Flash-Lite at $0.25/Mtok to Opus 4.7 Fast mode at $30/Mtok. This reference matrix captures every surcharge, multiplier, expiring promo, and subscription breakeven point as of May 21, 2026.

Three cross-vendor patterns emerged simultaneously this month: a long-context surcharge now applies at Anthropic, OpenAI, and Google; four separate intro promos overlap, two of which expire within 10 days of this publication; and the Copilot premium-request multiplier for Opus 4.7 doubled on April 30 when the launch promo expired, making it 15× the cost of a Sonnet 4.6 request on the same subscription. Headline coverage treated none of these as a pattern.

This guide covers all five proprietary tables: the API rate matrix across 20 model SKUs, the subscription tier matrix for 12 vendors, the effective per-task cost with all four levers applied (tokenizer, surcharge, cache, fast mode), the subscription vs API breakeven by task volume, and the May 2026 pricing-event chronology. Use the 10-tool cost calculator companion for interactive per-loop math.

Key takeaways
  1. Composer 2.5 is 10× cheaper per token than Opus 4.7 standard. Cursor's Composer 2.5 standard tier launched May 18 at $0.50/$2.50 per Mtok input/output. Opus 4.7 standard is $5/$25, and its new tokenizer can add up to 35% more tokens for the same text, widening the effective gap further.
  2. The long-context surcharge is now a cross-vendor pattern. Anthropic Fast mode charges 6× standard rates, OpenAI GPT-5.5 applies 2× input / 1.5× output above 272K tokens for the full session, and Gemini 3.1 Pro Preview doubles input and raises output 1.5× above 200K. Headline '1M context' pricing is never the full story.
  3. Copilot's Opus 4.7 multiplier doubled when the launch promo expired. The April 30 expiry of Copilot's launch promo pushed the Opus 4.7 premium-request multiplier from 7.5× to 15×. A Pro plan's 300 monthly requests yield only ~20 effective Opus 4.7 prompts. Pro+ at $39 gives 1,500 requests, about 100 effective Opus prompts.
  4. Four intro promos overlap, and two expire within 10 days of this post. Composer 2.5 first-week 2× promo ends ~May 25 (4 days out). Codex Pro 2× promo expires May 31 (10 days out). SuperGrok Heavy stays at $99/mo intro for 6 months, then rises to ~$300/mo list. The Opus 4.7 Copilot 7.5× promo already expired April 30.
  5. OSS BYOK tools shift all spend to the API tier you choose. Cline and Aider carry $0 license cost but route every token to your BYOK API key. A typical developer using Opus 4.7 uncached via Cline may spend $50–200/month in pure API costs depending on task volume: the same economics as direct API access, with zero subscription discount.

01 · 14 Vendors at a Glance
The May 2026 frontier: 120× spread from cheapest to most expensive.

The table below is the master subscription tier matrix for May 21, 2026 — 12 vendors across free, mid, pro, and power tiers, with the key pricing event that changed the picture this month. Every cell is sourced from the vendor's live pricing page, retrieved May 24, 2026.

Cursor
Composer 2.5 — new standard

Free Hobby · $20 Individual · $40/user Teams · Enterprise custom. Composer 2.5 API: $0.50/$2.50 standard, $3/$15 Fast. First-week 2× promo ends ~May 25.

Best value for solo devs
Anthropic Claude Code
Pro / Max 5× / Max 20×

Pro $17–20/mo (5h windows, doubled May 6). Max 5× = $100/mo. Max 20× = $200/mo. API: Opus 4.7 $5/$25, Sonnet 4.6 $3/$15, Haiku 4.5 $1/$5.

Best for heavy API users
GitHub Copilot
Free / Pro / Pro+ — sign-ups paused

Free $0 (50 reqs). Pro $10/mo (300 reqs) — sign-ups paused since Apr 20. Pro+ $39/mo (1,500 reqs). June 1: usage-based billing. Per-credit pricing not yet published.

Best if already on GitHub Enterprise
OpenAI Codex
Plus / Pro — promo expires May 31

Codex Plus $20/mo (15–80 GPT-5.5 msgs/5h). Pro 5× $100/mo (80–400 msgs/5h). Pro 20× $200+/mo. 2× promo on $100 tier expires May 31 — 10 days from publication.

Best for GPT-5.5 power users
Amazon Kiro
Credit-based tiers

Free $0 (50 credits). Pro $20/mo (1K credits). Pro+ $40/mo (2K credits). Power $200/mo (10K credits). Overage $0.04/credit.

Best for AWS-integrated teams
Windsurf (Cognition)
Free / Pro / Max + Devin Cloud

Free $0. Pro $20/mo (standard allowance). Max $200/mo (heavy allowance + Devin Cloud access). Teams $40/user/mo.

Best for Devin Cloud access
xAI Grok Build
SuperGrok Heavy required

Grok Build API: $1/$2 per Mtok, 256K context. Requires SuperGrok Heavy subscription: $99/mo intro for 6 months, then ~$300/mo list. Up to 8 concurrent sub-agents.

Best for parallel sub-agent workloads
Antigravity (Google)
AI Pro / Ultra / Ultra Premium

Free $0 (rate-limited, secondary-source pricing — verify at antigravity.google/pricing). AI Pro $20/mo. AI Ultra $100/mo (5× Pro). AI Ultra Premium $200/mo (reduced from $250). Overage: $25 / 2,500 credits.

Best for Gemini-native IDE workflows
Source caveat — Antigravity pricing
Antigravity's own antigravity.google/pricing page is JS-rendered and could not be confirmed via direct fetch on May 24, 2026. The tier prices above are sourced from third-party aggregators (Vibecoding.app, Datastudios, ThinkPeak AI). Verify against antigravity.google/pricing before planning budget around these figures.

02 · Anthropic Claude
Opus 4.7, Sonnet 4.6, Haiku 4.5, plus Fast mode at 6×.

Anthropic's API pricing as of May 21, 2026 is deceptively simple on the surface — three models, flat per-Mtok rates — but four levers compound the effective cost: Fast mode (6× standard rates), tokenizer overhead (up to 35% more tokens on Opus 4.7), prompt caching (as low as 0.05× with batch stacking), and the Managed Agents session runtime surcharge ($0.08/session-hour, applicable only to Managed Agents — not Claude Code, not raw API).

Anthropic publishes the only vendor-anchored real-session cost in the industry: a one-hour Opus 4.7 coding session consuming 50,000 input tokens and 15,000 output tokens totals $0.705 uncached. The figure reconciles as $0.25 input + $0.375 output + $0.08 for one session-hour of runtime, so it includes the Managed Agents surcharge. With 40,000 of those input tokens as cache reads at 0.1×, the total drops to $0.525, a 25.5% reduction. This worked example is the methodology anchor for all per-task estimates in this post.
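The worked example can be reproduced in a few lines. This is a minimal sketch, not Anthropic's own calculator: it assumes the $0.705 total includes one session-hour of the $0.08 Managed Agents runtime fee and that Opus 4.7 cache reads are billed at 0.1× the input rate, because those two assumptions make the arithmetic match the published totals. The function name `session_cost` is hypothetical.

```python
# Opus 4.7 rates quoted in this post (USD per million tokens).
OPUS_INPUT_PER_MTOK = 5.00
OPUS_OUTPUT_PER_MTOK = 25.00
CACHE_READ_MULTIPLIER = 0.1   # assumption: cache reads billed at 0.1x input
SESSION_HOUR_FEE = 0.08       # assumption: Managed Agents runtime is included


def session_cost(input_tokens, output_tokens, cached_tokens=0, hours=1.0):
    """Return the USD cost of one session (hypothetical helper)."""
    fresh_tokens = input_tokens - cached_tokens
    input_cost = fresh_tokens * OPUS_INPUT_PER_MTOK / 1_000_000
    cache_cost = (cached_tokens * OPUS_INPUT_PER_MTOK
                  * CACHE_READ_MULTIPLIER / 1_000_000)
    output_cost = output_tokens * OPUS_OUTPUT_PER_MTOK / 1_000_000
    return input_cost + cache_cost + output_cost + hours * SESSION_HOUR_FEE


uncached = session_cost(50_000, 15_000)
cached = session_cost(50_000, 15_000, cached_tokens=40_000)
saving_pct = 100 * (uncached - cached) / uncached
print(f"uncached ${uncached:.3f}, cached ${cached:.3f}, saving {saving_pct:.1f}%")
```

Setting `hours=0` removes the runtime fee, which is the relevant case for Claude Code or raw API usage where the session-hour surcharge does not apply.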

Opus 4.7 standard
Input / output per Mtok
$5/$25

Full 1M context window at standard rate. New tokenizer may use up to 35% more tokens for equivalent text versus Opus 4.6 — same per-Mtok rate, higher effective $/task.

Fast mode: $30/$150 (6× standard)
Sonnet 4.6
Input / output per Mtok
$3/$15

1M context flat-priced. Prompt cache: 0.1× input on cache reads (5-min write at 1.25×, 1-hour write at 2×). Batch API: 50% off both input and output. Stacking yields 0.05× effective input.

Copilot multiplier: 1×
Haiku 4.5
Input / output per Mtok
$1/$5

200K context window. Most cost-effective Anthropic model for high-volume routine tasks. Prompt cache: 0.1× on reads. Copilot multiplier: 0.33× (the most economical premium-request spend on Copilot).

Copilot multiplier: 0.33×

On May 6, 2026, Anthropic doubled Claude Code's five-hour rate limits across Pro, Max, Team, and seat-based Enterprise plans. Claude Code on Pro ($17–20/mo) now provides 2× its prior window budget — the same subscription cost for meaningfully more throughput. For teams choosing between Pro and Max 5×, this change narrows the effective cost gap: run your own task-volume math before upgrading.

The data-residency surcharge is worth flagging for regulated industries: setting inference_geo: "us" on Claude 4.6+ models adds a 1.1× multiplier on top of standard rates. For an Opus 4.7 task at $1.00 uncached, that's an additional $0.10 per task, a cost that compounds rapidly at production scale. See the Opus 4.7 cost strategy guide for the full cache × batch × geo optimization framework.
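The cache, batch, and geo levers quoted in this section can be combined as a single multiplier. The sketch below assumes the levers compose multiplicatively; the post confirms this only for cache × batch (0.1 × 0.5 = 0.05), so treating the 1.1× geo surcharge the same way is an assumption. `input_multiplier` is a hypothetical helper.

```python
CACHE_READ = 0.1   # cache reads billed at 0.1x input
BATCH = 0.5        # Batch API: 50% off
US_GEO = 1.1       # inference_geo: "us" surcharge


def input_multiplier(cache_read=False, batch=False, us_geo=False):
    """Effective input-rate multiplier, assuming the levers multiply."""
    multiplier = 1.0
    if cache_read:
        multiplier *= CACHE_READ
    if batch:
        multiplier *= BATCH
    if us_geo:
        multiplier *= US_GEO
    return multiplier


stacked = input_multiplier(cache_read=True, batch=True)
geo_extra = 1.00 * input_multiplier(us_geo=True) - 1.00
print(f"cache + batch: {stacked:.3f}x")
print(f"geo surcharge on a $1.00 task: ${geo_extra:.2f}")
```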

03 · OpenAI GPT-5.x
GPT-5.5 at $5/$30, with a 272K long-context surcharge.

OpenAI's GPT-5.5 carries a headline rate of $5 input / $30 output per Mtok for standard use. The critical caveat — absent from most coverage — is in the verbatim model page language: prompts with more than 272K input tokens are "priced at 2× input and 1.5× output for the full session." This is not a marginal-token surcharge on the overflow — it is a retroactive surcharge applied to every token in that session once the 272K threshold is crossed. At 300K input, you pay $10 input / $45 output per Mtok for the entire exchange, not just the excess 28K tokens.

OpenAI GPT-5.x: input token rates per Mtok

Source: OpenAI pricing docs + GPT-5.5 model page, retrieved May 24, 2026
  • GPT-5.5 standard (≤272K input): input $5 / output $30 per Mtok · cached input $0.50
  • GPT-5.5 above 272K (full session): input $10 per Mtok · 2× input / 1.5× output, applied to the full session, not just overflow
  • GPT-5.5-pro standard (≤272K): input $30 / output $180 per Mtok · same 272K surcharge rule
  • GPT-5.4: input $2.50 / output $15 per Mtok · mini $0.75/$4.50 · nano $0.20/$1.25
  • GPT-5.3-Codex: input $1.75 / output $14 per Mtok · cached $0.175 · dedicated Codex SKU

For teams using GPT-5.5 with long codebase contexts (the primary use case driving the 1.05M context window), the practical ceiling for sub-surcharge sessions is 272K input tokens. Beyond that, effective input cost doubles. A 400K-input session pays $10/Mtok rather than $5 on every input token and $45/Mtok rather than $30 on output. At our modeled 400K input / 20K output, that is $4.00 + $0.90 = $4.90 per task, versus $2.00 + $0.60 = $2.60 at the standard rate.
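A short sketch makes the full-session rule concrete. It assumes the surcharge triggers on strictly more than 272K input tokens and reprices every input and output token in the session, as the quoted model-page language describes; `gpt55_cost` is a hypothetical helper, not an OpenAI API.

```python
GPT55_INPUT_PER_MTOK = 5.00
GPT55_OUTPUT_PER_MTOK = 30.00
SURCHARGE_THRESHOLD = 272_000   # input tokens; surcharge applies above this


def gpt55_cost(input_tokens, output_tokens):
    """USD cost of one session under the full-session surcharge rule."""
    over = input_tokens > SURCHARGE_THRESHOLD
    input_rate = GPT55_INPUT_PER_MTOK * (2.0 if over else 1.0)
    output_rate = GPT55_OUTPUT_PER_MTOK * (1.5 if over else 1.0)
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


for tokens_in in (272_000, 272_001, 400_000):
    print(f"{tokens_in:>7} in / 20K out: ${gpt55_cost(tokens_in, 20_000):.2f}")
```

The middle case shows the cliff: a single token past the threshold reprices the whole session, which is why trimming context to stay at or under 272K can matter more than any per-token optimization.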

The Codex subscription tiers (Plus $20/mo, Pro 5× $100/mo, Pro 20× $200+/mo) all use GPT-5.5 under the hood at 15–1,600 messages per 5-hour window. The Pro 2× promo, which doubles the $100/mo tier's message budget, runs until May 31, 10 days after this post's publication. On June 1, Pro 5× reverts to its baseline 80–400 message band. See the GPT-5.5 1M-context complete guide for deep context strategy.

04 · Cursor Composer 2.5
$0.50/$2.50 standard: 10× cheaper than Opus 4.7 per token.

Cursor shipped Composer 2.5 on May 18, 2026 (three days ago) with two API pricing tiers. The standard tier at $0.50/$2.50 per Mtok is the lowest input rate of any frontier-capable model in this matrix. The Fast variant at $3/$15 per Mtok matches Sonnet 4.6's rate while promising lower latency.

Standard
Composer 2.5 Standard
$0.50 in / $2.50 out per Mtok

The lowest per-token frontier rate in the May 2026 matrix. Context window not publicly specified. Cache multipliers not published — the only confirmed discount is the first-week double-usage promo, which ends ~May 25 (4 days from publication).

First-week 2× promo until ~May 25
Fast
Composer 2.5 Fast
$3.00 in / $15.00 out per Mtok

Same intelligence as standard at 6× higher input rate, targeting lower latency. Cursor describes this as 'lower cost than the fast tiers of other frontier models' — accurate versus Opus 4.7 Fast mode at $30 in.

Matches Sonnet 4.6 rate
Transparency gap
Cursor publishes per-token input/output rates for Composer 2.5 but has not disclosed cache multipliers, context window size, or effective per-task figures. The only published discount is the first-week double-usage promo (ends ~May 25). Every other vendor in this matrix publishes more granular cost data. Frame Composer 2.5 budgets using the raw per-token rates plus a 15–30% uncertainty buffer until Cursor publishes the full cost model.

At our modeled baseline of 100K input / 20K output tokens per task, Composer 2.5 standard lands at $0.10/task, versus $1.00/task for Opus 4.7 uncached and $1.35/task with the 35% tokenizer overhead. The 10× per-token gap holds at this modeled volume, though Cursor does not publish Composer 2.5's performance on SWE-Bench Verified, making capability comparison harder than price comparison. Cursor publishes CursorBench v3.1 (vendor-controlled) and SWE-Bench Multilingual results.
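The per-task figures above follow from one formula. The sketch below uses the rates quoted in this post; the dictionary keys are illustrative labels rather than real API model IDs, and applying the 35% tokenizer overhead uniformly to input and output tokens is an assumption that happens to reproduce the $1.35 figure.

```python
# (input, output) rates in USD per Mtok, as quoted in this post.
# Keys are illustrative labels, not API model identifiers.
RATES = {
    "composer-2.5-standard": (0.50, 2.50),
    "composer-2.5-fast": (3.00, 15.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.7": (5.00, 25.00),
}


def task_cost(model, input_tokens=100_000, output_tokens=20_000,
              token_overhead=0.0):
    """USD cost of one task; token_overhead=0.35 models +35% tokens."""
    input_rate, output_rate = RATES[model]
    base = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return base * (1.0 + token_overhead)


print(f"Composer 2.5 standard: ${task_cost('composer-2.5-standard'):.2f}")
print(f"Opus 4.7 uncached:     ${task_cost('opus-4.7'):.2f}")
print(f"Opus 4.7 +35% tokens:  ${task_cost('opus-4.7', token_overhead=0.35):.2f}")
```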

05 · Google Gemini API
Gemini 3.5 Flash GA at $1.50/$9; 3.1 Pro tiered above 200K.

Google launched Gemini 3.5 Flash to GA on May 19, 2026 at I/O (two days ago) at $1.50/$9.00 per Mtok input/output. The launch also confirmed a 1.05M token context window and a $0.15 cached-input rate with $1.00/hour storage. Gemini 3.5 Flash becomes the primary mid-tier option for cost-sensitive agentic workloads — see the Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7 head-to-head for quality comparison.

Gemini 3.1 Pro Preview carries the most complex pricing structure in this matrix: a two-tier model with a single breakpoint at 200K context tokens, above which input doubles ($2→$4) and output rises 1.5× ($12→$18). Like the GPT-5.5 272K surcharge, it reprices both legs; the differences are the lower threshold and a cached-input rate that doubles as well ($0.20→$0.40).

Google Gemini API: input rates per Mtok

Source: ai.google.dev/gemini-api/docs/pricing, live WebFetch May 24, 2026
  • Gemini 3.1 Flash-Lite: input $0.25 / output $1.50 per Mtok · lowest Google rate
  • Gemini 3 Flash Preview: input $0.50 / output $3.00 per Mtok · audio inputs $1.00
  • Gemini 3.5 Flash (GA, May 19): input $1.50 / output $9.00 per Mtok · cached $0.15 · $1/hr storage
  • Gemini 3.1 Pro Preview (≤200K): input $2.00 / output $12.00 · cached $0.20 · $4.50/hr storage
  • Gemini 3.1 Pro Preview (>200K input): input $4.00 / output $18.00 · cached $0.40 · both rates rise above 200K

The Gemini 3.1 Pro 200K threshold deserves special attention: Gemini's input-token counting includes all conversation history, tool descriptions, and system prompts — not just the user's latest message. An agentic session with a 50K system prompt, 80K tool registry, and a 100K codebase snapshot already sits at 230K tokens before any user input arrives, triggering the higher tier for the entire session. Budget at $4 input / $18 output for any Gemini 3.1 Pro agent running against full-project context.
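A pre-flight check along these lines can flag which tier a session will land in before the first user message. This is a sketch under stated assumptions: the threshold is treated as strictly above 200K total input tokens, the token counts are taken as given rather than measured with a tokenizer, and `gemini_31_pro_tier` is a hypothetical helper.

```python
TIER_THRESHOLD = 200_000            # input tokens
LOW_TIER = (2.00, 12.00)            # (input, output) USD per Mtok, <=200K
HIGH_TIER = (4.00, 18.00)           # (input, output) USD per Mtok, >200K


def gemini_31_pro_tier(context_parts):
    """Return (total_tokens, input_rate, output_rate) for a session."""
    total = sum(context_parts.values())
    input_rate, output_rate = HIGH_TIER if total > TIER_THRESHOLD else LOW_TIER
    return total, input_rate, output_rate


session = {
    "system_prompt": 50_000,
    "tool_registry": 80_000,
    "codebase_snapshot": 100_000,
}
total, input_rate, output_rate = gemini_31_pro_tier(session)
print(f"{total:,} tokens -> ${input_rate:.2f} in / ${output_rate:.2f} out per Mtok")
```

Dropping the tool registry from this example brings the session to 150K tokens and back under the threshold, which is the kind of trade-off the check is meant to surface.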

06 · xAI Grok 4.3 + Grok Build
Grok 4.3 at $1.25/$2.50; Grok Build requires SuperGrok Heavy.

xAI's API pricing was confirmed via live WebFetch of docs.x.ai/docs/models on May 24, 2026. Grok 4.3 ($1.25 input / $2.50 output per Mtok, 1M context window) is the general-purpose model available via standard xAI API access. Grok Build ($1.00/$2.00, 256K context, model ID grok-build-0.1, alias grok-code-fast-1) is the specialized coding agent with sub-agent parallelism — up to 8 concurrent AI agents running in parallel.

The critical caveat on Grok Build access: SuperGrok Heavy subscription is required. The current introductory price is $99/mo for the first 6 months, then the list price rises to approximately $300/mo (some secondary sources cite $299 — treat as ~$300). The $99 figure is the promo rate, not the steady-state cost. At the $300/mo list rate, Grok Build access costs more than Claude Code Max 20× ($200/mo) — the per-task math favors Grok Build only for high-parallelism workloads that can saturate all 8 concurrent agent slots.

Grok 4.3
Input / output per Mtok
$1.25/$2.50

1M context window. API IDs: grok-4.3 / grok-4.3-latest. Standard xAI API access — no SuperGrok subscription required. Confirmed via live docs.x.ai/docs/models fetch.

1M context · no sub required
Grok Build
Input / output per Mtok
$1.00/$2.00

256K context. Model ID: grok-build-0.1. Requires SuperGrok Heavy at $99/mo intro (6 months), then ~$300/mo list. Up to 8 concurrent sub-agents.

$99/mo intro → ~$300/mo list
SuperGrok Heavy breakeven
Tasks/mo at $0.14/task to break even at $99
707

At modeled 100K in / 20K out per task ($0.14/task via Grok Build), SuperGrok Heavy at $99/mo breaks even at ~707 tasks/month. At $300/mo list, breakeven rises to ~2,143 tasks/month.

~2,143 tasks at $300/mo list

07 · GitHub Copilot
Opus 4.7 = 15× multiplier; premium requests are not tasks.

GitHub Copilot's premium-request system is the most misunderstood pricing construct in the matrix. A "premium request" is not a task, not a completion, and not a fixed token budget — it is a unit that consumes 1–50 underlying model messages depending on the model, scaled by a per-model multiplier. The multipliers below are the live values as of May 24, 2026, reconciled exactly with Day 05's Copilot Gemini removal analysis.

GitHub Copilot premium-request multipliers, May 2026

Source: docs.github.com/en/copilot/concepts/billing/copilot-requests, retrieved May 24, 2026
  • Opus 4.6 Fast mode: 30× multiplier · Pro 300 reqs = 10 effective Opus Fast prompts
  • Opus 4.7: 15× multiplier (was 7.5× promo, expired Apr 30) · Pro 300 reqs = 20 effective
  • Opus 4.6: 3× multiplier · Pro 300 reqs = 100 effective Opus 4.6 prompts
  • GPT-5.5: 7.5× multiplier · Pro 300 reqs = 40 effective GPT-5.5 prompts
  • Sonnet 4.6 / GPT-5.4 / GPT-5.3-Codex: 1× multiplier · Pro 300 reqs = 300 effective prompts
  • Haiku 4.5 / GPT-5.4-mini: 0.33× multiplier · Pro 300 reqs = ~909 effective prompts

The practical unit conversion: Copilot Pro ($10/mo, 300 premium requests) allows approximately 20 effective Opus 4.7 prompts per month at the current 15× multiplier. Pro+ ($39/mo, 1,500 requests) yields about 100 effective Opus 4.7 prompts. By contrast, 300 Sonnet 4.6 prompts (at 1×) exhaust the same 300-request budget completely.
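The unit conversion is a single division. The sketch below uses the multipliers listed in this section; the model labels are plain strings for illustration and `effective_prompts` is a hypothetical helper, not part of any GitHub API.

```python
# Premium-request multipliers as listed in this post (May 2026).
MULTIPLIERS = {
    "Opus 4.6 Fast": 30,
    "Opus 4.7": 15,
    "GPT-5.5": 7.5,
    "Opus 4.6": 3,
    "Sonnet 4.6": 1,
    "Haiku 4.5": 0.33,
}


def effective_prompts(monthly_requests, model):
    """Prompts a premium-request budget buys for one model."""
    return monthly_requests / MULTIPLIERS[model]


for plan, budget in (("Pro", 300), ("Pro+", 1_500)):
    for model in ("Opus 4.7", "GPT-5.5", "Sonnet 4.6"):
        prompts = effective_prompts(budget, model)
        print(f"{plan:<4} {model:<10} ~{prompts:.0f} prompts/month")
```

For a mixed workload, summing `prompts_used * MULTIPLIERS[model]` across models gives the premium requests consumed, which is the number to compare against the plan budget.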

Three critical facts about the current Copilot situation: (1) Pro sign-ups have been paused since April 20, 2026 as GitHub rolls out a "flexible billing experience." (2) Usage-based billing transitions on June 1, 2026 (11 days from publication) — per-credit dollar pricing has not been published as of May 21. (3) Paid-plan subscribers using auto-model-selection receive a 10% multiplier discount. Yesterday (May 20), GitHub removed all Gemini models and GPT-5.2 Codex / GPT-5.4 nano from Copilot Chat on the web — scoped to the web surface only, not VS Code, JetBrains, or CLI.

08 · Breakeven Analysis
Subscription vs API: how many tasks per month to justify each tier?

The table below models breakeven task volumes at 100K input / 20K output tokens per task (uncached, base rate). These are modeling assumptions, not vendor-disclosed figures — Anthropic's $0.705 worked example anchors the methodology. The 10-tool cost calculator companion post provides the interactive version of this math. For the full per-task / per-user framework, see our agent cost metrics framework.

Claude Code Pro — $20/mo
20–33 tasks/month to break even

Anthropic Pro at $17–20/mo. Breakeven: ~20 tasks/mo via Opus 4.7 uncached ($1.00/task) or ~33 tasks/mo via Sonnet 4.6 ($0.60/task). 5h-window-capped. Limits doubled May 6.

Solo dev, light Opus use
Claude Code Max 5× — $100/mo
100–167 tasks/month to break even

Breakeven: ~100 tasks/mo (Opus 4.7) or ~167 tasks/mo (Sonnet 4.6). At a pace of 5 tasks per workday, Opus 4.7 breaks even at Max 5×. Heavy users should compare vs direct API + caching.

Team leads, 5+ tasks/day
Claude Code Max 20× — $200/mo
200–333 tasks/month to break even

Breakeven: ~200 tasks/mo (Opus 4.7) or ~333 tasks/mo (Sonnet 4.6). At ~10 tasks/workday, Opus 4.7 at Max 20× is cost-neutral versus direct API — plus subscription convenience.

Power users, 10+ tasks/day
Copilot Pro — $10/mo (300 premium reqs)
Multiplier-aware: ~20 effective Opus prompts

Opus 4.7 at 15×: 300 reqs / 15 = 20 effective Opus prompts. GPT-5.5 at 7.5×: ~40 prompts. Sonnet 4.6 at 1×: 300 prompts. Premium requests are not tasks — convert using multiplier before comparing to API spend.

Already on GitHub, light AI use
Cursor Individual — $20/mo
~200 tasks/mo at Composer 2.5 standard rate

At $0.10/task (100K in / 20K out via Composer 2.5 std), $20 subscription is cost-neutral at ~200 tasks/mo. Composer 2.5 Fast: ~33 tasks/mo breakeven at $0.60/task. First-week 2× promo ends ~May 25.

Best per-task ROI in the matrix
SuperGrok Heavy — $99/mo intro
~707 tasks/mo to break even (intro rate)

At $0.14/task (Grok Build, 100K in / 20K out), $99/mo breaks even at ~707 tasks/month. After the 6-month intro expires at ~$300/mo list, breakeven rises to ~2,143 tasks/month — justify only with high-parallelism use of all 8 sub-agent slots.

High-volume parallel agent workloads
At 100K input / 20K output per task, Composer 2.5 standard costs $0.10 and Opus 4.7 uncached costs $1.00. The 10× token-rate gap translates directly into subscription breakeven math: 200 Composer tasks equal the breakeven of 20 Opus tasks at the same monthly spend.
Digital Applied synthesis, May 21, 2026
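The breakeven figures in this section all come from dividing the monthly subscription price by the modeled API cost per task. A minimal sketch, using the per-task costs at 100K input / 20K output from this post; the tier labels and the `breakeven_tasks` helper are illustrative, and the calculation ignores rate-limit windows and caching.

```python
def breakeven_tasks(monthly_price, api_cost_per_task):
    """Tasks per month at which a subscription matches direct API spend."""
    return monthly_price / api_cost_per_task


# (tier label, monthly price USD, modeled API cost per task USD)
TIERS = [
    ("Claude Code Pro vs Opus 4.7", 20, 1.00),
    ("Claude Code Max 5x vs Sonnet 4.6", 100, 0.60),
    ("Claude Code Max 20x vs Sonnet 4.6", 200, 0.60),
    ("Cursor Individual vs Composer 2.5 std", 20, 0.10),
    ("SuperGrok Heavy intro vs Grok Build", 99, 0.14),
    ("SuperGrok Heavy list vs Grok Build", 300, 0.14),
]

for label, price, per_task in TIERS:
    print(f"{label:<40} ~{breakeven_tasks(price, per_task):,.0f} tasks/mo")
```

Below the breakeven volume, direct API access is cheaper on these assumptions; above it, the subscription wins as long as its usage window actually permits that many tasks.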

09 · Cross-Vendor Pattern
The long-context surcharge pattern: three vendors, same playbook.

The most analytically significant finding in this matrix is the convergence of long-context surcharges across three top vendors — each implemented differently, all charging materially more than the headline rate once a context threshold is crossed. The coverage gap is striking: every major publication reports "1M context" as a headline feature without surfacing the surcharge that applies to it.

Anthropic
Fast Mode — 6× standard rates
$30/$150 per Mtok (Opus 4.7/4.6)

Opt-in fast-output mode for Opus 4.6 and 4.7. Full 1M context window at 6× standard rate. Not triggered automatically by context length — must be explicitly selected. No cached-input rate published for Fast mode.

Opt-in · $5→$30 input
OpenAI
GPT-5.5 272K — 2× input / 1.5× output
Full-session surcharge above 272K input

Verbatim: 'prompts with >272K input tokens are priced at 2× input and 1.5× output for the full session.' Triggered automatically by context size. The surcharge applies retroactively to the entire session — not just the overflow tokens.

Auto-triggered · $5→$10 input
Google
Gemini 3.1 Pro — tiered at 200K
$2→$4 input / $12→$18 output above 200K

Input doubles and output rises 1.5× above the 200K threshold, so both legs are repriced, as they are under OpenAI's 272K rule. Cached input also doubles: $0.20→$0.40 per Mtok. Cache storage is listed at $4.50/hr.

Two-leg surcharge · both rates rise

The strategic implication: teams building long-context agents must model cost at their expected p90 context size, not the baseline rate. An Anthropic agent running Opus 4.7 Fast mode across 1M context costs $30/Mtok input — 6× what the pricing page suggests as the default. A Google agent running Gemini 3.1 Pro against 500K tokens of codebase context costs $4/Mtok input — double the sub-200K rate. Budget for the surcharge, not the headline.

Our Q2 2026 price vs performance efficient frontier charts each model's post-surcharge effective cost against benchmark quality — the picture shifts significantly once surcharge economics are applied.

10 · Industry Pattern
Four intro promos: one expired, two expiring in the next 10 days.

Introductory discounts are now the industry default launch mechanism for AI agent pricing. Four separate promos have overlapped in the May 2026 window: one expired on April 30, and two more expire within 10 days of this post. The pattern is consistent: vendors launch with a roughly 2×–3× discount to seed adoption, then revert to a materially higher rate that often doubles or triples effective cost per task.

Expired April 30
Opus 4.7 on Copilot
7.5×→15×

Copilot Opus 4.7 launch promo ran at 7.5× multiplier for ~10 days post-launch. On April 30, the promo expired and the multiplier doubled to 15×. Copilot Pro users went from ~40 effective Opus prompts to ~20 overnight.

Expired Apr 30 — now permanent at 15×
Expires ~May 25
Composer 2.5 first-week promo
2× usage

Cursor's first-week double-usage promo for Composer 2.5 ends approximately May 25 — 4 days from publication. After expiry, subscription-included usage reverts to the standard 1× allowance. API token rates remain at $0.50/$2.50.

4 days from publication
Expires May 31
Codex Pro $100/mo tier
2× messages

OpenAI's verbatim: 'Double your normal Codex usage on the $100/month tier until May 31, 2026.' Pro 5× reverts to baseline 80–400 GPT-5.5 msgs/5h on June 1. Teams at the $100 tier should plan for 50% message-budget reduction in 10 days.

10 days from publication — June 1 revert

The fourth promo, SuperGrok Heavy at $99/mo for the first 6 months, has a longer runway but the steepest cliff: the list price of approximately $300/mo (roughly 3× the intro rate) applies after month 6. Teams committing to Grok Build on the basis of the $99 entry price should model their post-intro budgets at ~$300/mo and ensure the per-task economics justify the investment at full price.

The broader projection: introductory pricing as a launch norm means any vendor-disclosed rate published in the first 90 days of a product launch may be materially lower than the steady-state cost. Budget planning for AI agent infrastructure should use post-promo rates as the planning baseline, with the intro discount treated as a temporary reduction — not the long-term price.
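For planning purposes, the countdown to each change can be computed from the dates given in this post. A small sketch, assuming the approximate May 25 date for the Composer promo; the event labels and the `days_out` helper are illustrative.

```python
from datetime import date

PUBLISHED = date(2026, 5, 21)

# Dates as stated in this post; the Composer date is approximate.
EVENTS = {
    "Composer 2.5 first-week promo ends (approx.)": date(2026, 5, 25),
    "Codex Pro 2x promo ends": date(2026, 5, 31),
    "Copilot usage-based billing starts": date(2026, 6, 1),
}


def days_out(when, today=PUBLISHED):
    """Whole days between `today` and the event date."""
    return (when - today).days


for label, when in sorted(EVENTS.items(), key=lambda item: item[1]):
    print(f"{when.isoformat()}  +{days_out(when):>2} days  {label}")
```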

11 · 17-Day Timeline
What changed in May 2026: eight pricing events, one matrix.

The chronology below covers eight distinct pricing events: six between April 30 and this post's publication on May 21, and two scheduled within the following 11 days. No single publication assembled this chronology before today. The sequence is sourced from primary vendor changelogs, documentation updates, and the batch research files that anchor this post's figures.

Apr 30
Opus 4.7 Copilot promo expired — 7.5× doubled to 15×

GitHub Copilot's launch-period Opus 4.7 multiplier discount ended. The multiplier rose from 7.5× to 15× permanently, cutting effective Opus 4.7 prompts per Pro plan in half overnight.

Copilot Pro users lost ~50% of Opus capacity
May 6
Anthropic doubled Claude Code 5-hour rate limits

Anthropic announced doubled 5-hour rate limits for Claude Code across Pro, Max, Team, and Enterprise plans — same subscription cost, 2× throughput. Announced alongside a SpaceX enterprise deal.

All Claude Code subscribers benefited
May 17
GitHub switched Business/Enterprise base model to GPT-5.3-Codex

GitHub Copilot for Business and Enterprise shifted the default base-completion model to GPT-5.3-Codex ($1.75/$14 per Mtok), replacing the prior GPT-5.4 default. Changes Copilot's cost structure for completions without premium-request consumption.

Enterprise completion costs shifted
May 18
Cursor shipped Composer 2.5 at $0.50/$2.50

Composer 2.5 launched at the lowest frontier-capable per-token rate in the matrix. First-week double-usage promo activated. Fast variant at $3/$15 also available.

New low end of the cost spectrum
May 19
Google launched Gemini 3.5 Flash GA + Antigravity 2.0 at I/O

Gemini 3.5 Flash reached GA at $1.50/$9 per Mtok with 1.05M context. Antigravity 2.0 (desktop IDE + agy CLI + SDK) announced at I/O. Managed Agents pricing ($0.08/session-hour) entered public preview.

New Google mid-tier competitor at $1.50 in
May 20
GitHub pulled Gemini + GPT-5.2 Codex / GPT-5.4 nano from Copilot Chat web

All Gemini models and several others removed from Copilot Chat on the web only. VS Code, JetBrains, and CLI surfaces were not affected by the same announcement.

Web-surface only — IDE Copilot unchanged
May 25 (4 days out)
Composer 2.5 first-week promo ends

Cursor's double-usage first-week promo for Composer 2.5 expires approximately May 25. Standard subscription allowance resumes; API token rates unchanged at $0.50/$2.50.

Plan on 1× allowance from May 26
May 31 / Jun 1 (10–11 days out)
Codex Pro promo expires + Copilot usage-based billing

Codex Pro $100/mo 2× promo ends May 31 — Pro 5× reverts to 80–400 GPT-5.5 msgs/5h. On June 1, Copilot transitions to usage-based billing; per-credit dollar pricing not published as of this post's date.

Two changes landing 24 hours apart
Copilot usage-based billing — unresolved
GitHub has not published per-credit dollar pricing for the June 1, 2026 usage-based billing transition as of May 21. The transition is confirmed; the price-per-credit is not. Monitor github.blog/changelog and docs.github.com/en/copilot/concepts/billing for the pricing disclosure before June 1.

12 · OSS BYOK
Cline and Aider: "free" in license, API-spend-bound in practice.

Open-source BYOK tools — Cline (Apache 2.0) and Aider (free and open source) — carry zero license cost and zero subscription overhead. Every dollar of AI spend flows directly to the API provider of your choice at standard published rates, with no markup and no bundled allowance. For teams with the operational maturity to manage API keys and cost attribution, BYOK is the most transparent pricing model in the matrix.

The practical cost profile of an OSS BYOK developer depends entirely on model choice. At our modeled 100K in / 20K out per task:

OSS BYOK effective cost per task at modeled 100K in / 20K out (no cache)

Per-task calculation: (100K × input rate + 20K × output rate) / 1,000,000. Modeling assumptions, not vendor-disclosed.
  • Cline / Aider → Gemini 3.1 Flash-Lite: $0.055/task · ~364 tasks for $20/mo budget
  • Cline / Aider → Grok 4.3: $0.175/task · ~114 tasks for $20/mo budget
  • Cline / Aider → GPT-5.3-Codex: $0.455/task · ~44 tasks for $20/mo budget
  • Cline / Aider → Sonnet 4.6: $0.60/task · ~33 tasks for $20/mo budget
  • Cline / Aider → Opus 4.7 (uncached): $1.00/task · ~20 tasks for $20/mo budget · +35% tokenizer overhead possible

A developer running Cline with Opus 4.7 at 20 substantive tasks per workday would spend approximately $20/day, or $400–440/month at a standard work schedule of 20–22 workdays, potentially rising to $540–594 with the 35% tokenizer overhead on representative code inputs. The Continue.dev Team plan ($20/seat/month with $10 in API credits) partially offsets this, but $10 in credits covers only 10 Opus 4.7 tasks at standard rates.
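The monthly BYOK estimate is a straightforward product of per-task cost, daily volume, and workdays. The sketch below assumes 20 to 22 workdays per month (chosen to match the $400–440 range) and applies the 35% tokenizer overhead as a flat multiplier; `monthly_byok_spend` is a hypothetical helper.

```python
def monthly_byok_spend(cost_per_task, tasks_per_workday, workdays,
                       token_overhead=0.0):
    """Estimated monthly API spend in USD for a BYOK tool."""
    return cost_per_task * tasks_per_workday * workdays * (1.0 + token_overhead)


OPUS_TASK_COST = 1.00   # modeled 100K in / 20K out, uncached

for workdays in (20, 22):
    base = monthly_byok_spend(OPUS_TASK_COST, 20, workdays)
    padded = monthly_byok_spend(OPUS_TASK_COST, 20, workdays, token_overhead=0.35)
    print(f"{workdays} workdays: ${base:.0f} base, ${padded:.0f} with +35% tokens")
```

Swapping in a cheaper per-task cost, such as $0.60 for Sonnet 4.6 or $0.055 for Gemini 3.1 Flash-Lite, shows how strongly model choice drives the BYOK bill at the same task volume.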

For teams tracking cost per successful task, the cost-per-successful-task metric framework provides the right unit for OSS BYOK vs subscription comparison — raw per-task cost ignores task success rates, which vary significantly by model and workload type. Our AI transformation practice runs cost-attribution benchmarks across BYOK and subscription models for specific codebases and agent patterns before recommending a tier commitment. The 20-platform agentic coding matrix covers OSS BYOK alongside the full subscription landscape.

The shape of AI agent pricing, May 2026

Headline token rates are the floor — not the price you actually pay.

The May 2026 AI agent pricing landscape has three structural characteristics that headline coverage consistently misses. First, long-context surcharges from Anthropic, OpenAI, and Google mean the advertised rate is the minimum: the effective cost at production context sizes is 1.5× to 6× the headline rate. Second, of four overlapping intro promos, one has already expired and two more end within 10 days of this post, with rate jumps of 2× to 3× baked in. Third, the Copilot premium-request multiplier system makes per-prompt cost comparison to direct API pricing non-trivial: the unit conversion requires knowing both your model mix and the per-model multiplier.

The 10× token-rate gap between Composer 2.5 standard ($0.50/Mtok) and Opus 4.7 ($5.00/Mtok) is the widest per-token spread between frontier-capable models of any quarter to date. Whether that gap reflects a genuine capability difference at task-success level, not just benchmark performance, is the question every team should answer by benchmarking against its own code and agent patterns rather than adopting vendor-published numbers wholesale.

The most durable conclusion from this matrix is methodological: the right unit for AI agent cost planning is effective $/successful-task, not $/Mtok. That unit requires knowing your task success rate by model, your cache hit rate, your typical context distribution, and whether your workload crosses the long-context surcharge threshold. This post supplies the vendor-side rates, multipliers, and thresholds; the four usage inputs are yours to measure and model against your actual usage pattern.

Know your actual per-task cost before you commit to a tier

Token rates are the starting point. Effective $/task is the number that matters.

We benchmark AI agent costs across your actual codebase and task patterns — comparing subscription, API, and BYOK economics before you commit to a tier. Delivered in days, not quarters.

What we benchmark

AI agent cost attribution

  • Per-task cost across Opus 4.7, Sonnet 4.6, GPT-5.5, Gemini 3.5 Flash
  • Subscription vs API breakeven at your task volume
  • Long-context surcharge modelling for your p90 context size
  • Copilot multiplier-aware effective prompt budget by tier
  • OSS BYOK vs subscription total cost of ownership
FAQ · AI Agent Pricing May 2026

Questions we get every week.

By raw input token rate, Gemini 3.1 Flash-Lite is the cheapest at $0.25/Mtok input and $1.50/Mtok output — confirmed via live fetch of ai.google.dev/gemini-api/docs/pricing on May 24, 2026. For frontier-capable models with strong coding performance, Cursor's Composer 2.5 standard at $0.50/$2.50 is the lowest rate. xAI Grok 4.3 at $1.25/$2.50 and Grok Build at $1.00/$2.00 occupy the next tier. All three are substantially cheaper than Opus 4.7 standard ($5/$25) or GPT-5.5 ($5/$30) per token. Whether cheaper tokens translate to cheaper per-task costs depends on task success rates, which vary by model and workload type — benchmark on your specific code patterns before committing to a tier.