AI agent pricing fragmented across 14 vendors and 17 days of pricing events in May 2026, spanning a 120× range in input rates from Gemini 3.1 Flash-Lite at $0.25/Mtok to Opus 4.7 Fast mode at $30/Mtok. This reference matrix captures every surcharge, multiplier, expiring promo, and subscription breakeven point as of May 21, 2026.

Three cross-vendor patterns emerged simultaneously this month. First, a surcharge above the headline rate now applies at Anthropic, OpenAI, and Google. Second, four separate intro promos overlap in the May window, and two of them expire within 10 days of this publication. Third, the Copilot premium-request multiplier for Opus 4.7 doubled on April 30 when the launch promo expired, so one Opus 4.7 request now costs 15× a Sonnet 4.6 request on the same subscription. None of these patterns appeared individually in headline coverage.

This guide covers all five proprietary tables: the API rate matrix across 20 model SKUs, the subscription tier matrix for 12 vendors, the effective per-task cost with all four levers applied (tokenizer, surcharge, cache, fast mode), the subscription vs API breakeven by task volume, and the May 2026 pricing-event chronology. Use the 10-tool cost calculator companion for interactive per-loop math.

Key takeaways

01
Composer 2.5 is 10× cheaper per token than Opus 4.7 standard.

Cursor's Composer 2.5 standard tier launched May 18 at $0.50/$2.50 per Mtok input/output. Opus 4.7 standard is $5/$25, and its new tokenizer can add up to 35% more tokens for the same text, widening the effective gap further.
02
The long-context surcharge is now a cross-vendor pattern.

Anthropic Fast mode charges 6× standard rates, OpenAI GPT-5.5 applies 2× input / 1.5× output above 272K tokens for the full session, and Gemini 3.1 Pro Preview doubles input and raises output above 200K. Headline '1M context' pricing is never the full story.
03
Copilot's Opus 4.7 multiplier doubled when the launch promo expired.

The April 30 expiry of Copilot's launch promo pushed the Opus 4.7 premium-request multiplier from 7.5× to 15×. A Pro plan's 300 monthly requests yield only ~20 effective Opus 4.7 prompts. Pro+ at $39 gives 1,500 requests, or about 100 effective Opus prompts.
04
Four intro promos overlap this month; two expire within 10 days of this post.

Composer 2.5 first-week 2× promo ends ~May 25 (4 days out). Codex Pro 2× promo expires May 31 (10 days out). SuperGrok Heavy stays at $99/mo intro for 6 months then rises to ~$300/mo list. The Opus 4.7 Copilot 7.5× promo already expired April 30.
05
OSS BYOK tools shift all spend to the API tier you choose.

Cline and Aider carry $0 license cost but route every token to your BYOK API key. A typical developer using Opus 4.7 uncached via Cline may spend $50–200/month in pure API costs depending on task volume: the same economics as direct API access, with zero subscription discount.

01 — 14 Vendors at a Glance

The May 2026 frontier: 120× spread from cheapest to most expensive.

The table below is the master subscription tier matrix for May 21, 2026 — 12 vendors across free, mid, pro, and power tiers, with the key pricing event that changed the picture this month. Every cell is sourced from the vendor's live pricing page, retrieved May 24, 2026.

Cursor

Composer 2.5 — new standard

Free Hobby · $20 Individual · $40/user Teams · Enterprise custom. Composer 2.5 API: $0.50/$2.50 standard, $3/$15 Fast. First-week 2× promo ends ~May 25.

Best value for solo devs

Anthropic Claude Code

Pro / Max 5× / Max 20×

Pro $17–20/mo (5h windows, doubled May 6). Max 5× = $100/mo. Max 20× = $200/mo. API: Opus 4.7 $5/$25, Sonnet 4.6 $3/$15, Haiku 4.5 $1/$5.

Best for heavy API users

GitHub Copilot

Free / Pro / Pro+ — sign-ups paused

Free $0 (50 reqs). Pro $10/mo (300 reqs) — sign-ups paused since Apr 20. Pro+ $39/mo (1,500 reqs). June 1: usage-based billing. Per-credit pricing not yet published.

Best if already on GitHub Enterprise

OpenAI Codex

Plus / Pro — promo expires May 31

Codex Plus $20/mo (15–80 GPT-5.5 msgs/5h). Pro 5× $100/mo (80–400 msgs/5h). Pro 20× $200+/mo. 2× promo on $100 tier expires May 31 — 10 days from publication.

Best for GPT-5.5 power users

Amazon Kiro

Credit-based tiers

Free $0 (50 credits). Pro $20/mo (1K credits). Pro+ $40/mo (2K credits). Power $200/mo (10K credits). Overage $0.04/credit.

Best for AWS-integrated teams

Windsurf (Cognition)

Free / Pro / Max + Devin Cloud

Free $0. Pro $20/mo (standard allowance). Max $200/mo (heavy allowance + Devin Cloud access). Teams $40/user/mo.

Best for Devin Cloud access

xAI Grok Build

SuperGrok Heavy required

Grok Build API: $1/$2 per Mtok, 256K context. Requires SuperGrok Heavy subscription: $99/mo intro for 6 months, then ~$300/mo list. Up to 8 concurrent sub-agents.

Best for parallel sub-agent workloads

Antigravity (Google)

AI Pro / Ultra / Ultra Premium

Free $0 (rate-limited, secondary-source pricing — verify at antigravity.google/pricing). AI Pro $20/mo. AI Ultra $100/mo (5× Pro). AI Ultra Premium $200/mo (reduced from $250). Overage: $25 / 2,500 credits.

Best for Gemini-native IDE workflows

Source caveat — Antigravity pricing

Antigravity's own antigravity.google/pricing page is JS-rendered and could not be confirmed via direct fetch on May 24, 2026. The tier prices above are sourced from third-party aggregators (Vibecoding.app, Datastudios, ThinkPeak AI). Verify against antigravity.google/pricing before planning budget around these figures.

02 — Anthropic Claude

Opus 4.7, Sonnet 4.6, Haiku 4.5, plus Fast mode at 6×.

Anthropic's API pricing as of May 21, 2026 is deceptively simple on the surface — three models, flat per-Mtok rates — but four levers compound the effective cost: Fast mode (6× standard rates), tokenizer overhead (up to 35% more tokens on Opus 4.7), prompt caching (as low as 0.05× with batch stacking), and the Managed Agents session runtime surcharge ($0.08/session-hour, applicable only to Managed Agents — not Claude Code, not raw API).

Anthropic publishes the only vendor-anchored real-session cost in the industry: a one-hour Opus 4.7 coding session consuming 50,000 input tokens and 15,000 output tokens totals $0.705 uncached. That figure is consistent with $0.625 in token charges plus one session-hour of Managed Agents runtime at $0.08. With 40,000 of those input tokens served as cache reads, the total drops to $0.525, a 25.5% reduction. This worked example is the methodology anchor for all per-task estimates in this post.
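The two totals can be reproduced with a short sketch. It assumes the published figures include one hour of the $0.08 Managed Agents runtime charge (token charges alone come to $0.625) and that cache reads bill at 0.1× the input rate. The function and parameter names are illustrative, not part of any Anthropic SDK.

```python
# Opus 4.7 rates from this post, in dollars per million tokens (Mtok).
INPUT_RATE = 5.00
OUTPUT_RATE = 25.00
CACHE_READ_MULTIPLIER = 0.1   # cache reads bill at 0.1x the input rate
SESSION_HOUR_RATE = 0.08      # assumption: one Managed Agents session-hour is included


def session_cost(input_tokens, output_tokens, cached_tokens=0, hours=1.0):
    """Dollar cost of one session; cached_tokens is the share of input read from cache."""
    fresh_tokens = input_tokens - cached_tokens
    token_cost = (
        fresh_tokens * INPUT_RATE
        + cached_tokens * INPUT_RATE * CACHE_READ_MULTIPLIER
        + output_tokens * OUTPUT_RATE
    ) / 1_000_000
    return token_cost + hours * SESSION_HOUR_RATE


uncached = session_cost(50_000, 15_000)
cached = session_cost(50_000, 15_000, cached_tokens=40_000)
saving_pct = 100 * (1 - cached / uncached)
print(f"uncached ${uncached:.3f}, cached ${cached:.3f}, saving {saving_pct:.1f}%")
```

Setting `hours=0` gives the token-only cost, which is the relevant figure for Claude Code or raw API use where the session-hour charge does not apply.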

Opus 4.7 standard

Input / output per Mtok

$5/$25

Full 1M context window at standard rate. New tokenizer may use up to 35% more tokens for equivalent text versus Opus 4.6 — same per-Mtok rate, higher effective $/task.

Fast mode: $30/$150 (6× standard)

Sonnet 4.6

Input / output per Mtok

$3/$15

1M context flat-priced. Prompt cache: 0.1× input on cache reads (5-min write at 1.25×, 1-hour write at 2×). Batch API: 50% off both input and output. Stacking yields 0.05× effective input.

Copilot multiplier: 1×

Haiku 4.5

Input / output per Mtok

$1/$5

200K context window. Most cost-effective Anthropic model for high-volume routine tasks. Prompt cache: 0.1× on reads. Copilot multiplier: 0.33× (the most economical premium-request spend on Copilot).

Copilot multiplier: 0.33×

On May 6, 2026, Anthropic doubled Claude Code's five-hour rate limits across Pro, Max, Team, and seat-based Enterprise plans. Claude Code on Pro ($17–20/mo) now provides 2× its prior window budget — the same subscription cost for meaningfully more throughput. For teams choosing between Pro and Max 5×, this change narrows the effective cost gap: run your own task-volume math before upgrading.

The data-residency surcharge is worth flagging for regulated industries: setting inference_geo: "us" on Claude 4.6+ models adds a 1.1× multiplier on top of standard rates. For an Opus 4.7 task at $1.00 uncached, that's an additional $0.10 per task — a cost that compounds rapidly at production scale. See the Opus 4.7 cost strategy guide for the full cache × batch × geo optimization framework.
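How the discounts and the residency surcharge combine can be sketched as a product of multipliers. The sketch assumes the levers stack multiplicatively, which the 0.05× cache-plus-batch figure suggests but which is not confirmed here for the geo surcharge. `effective_input_rate` is a hypothetical helper.

```python
def effective_input_rate(base_rate, cache_read=False, batch=False, us_geo=False):
    """Effective $/Mtok for input tokens after stacking the levers described above."""
    rate = base_rate
    if cache_read:
        rate *= 0.1   # cache reads bill at 0.1x input
    if batch:
        rate *= 0.5   # Batch API: 50% off
    if us_geo:
        rate *= 1.1   # inference_geo "us" surcharge (assumed to stack multiplicatively)
    return rate


SONNET_INPUT = 3.00  # Sonnet 4.6 input, $/Mtok

stacked = effective_input_rate(SONNET_INPUT, cache_read=True, batch=True)
geo_only = effective_input_rate(SONNET_INPUT, us_geo=True)
print(f"cache + batch: ${stacked:.2f}/Mtok ({stacked / SONNET_INPUT:.2f}x)")
print(f"us geo only:   ${geo_only:.2f}/Mtok ({geo_only / SONNET_INPUT:.2f}x)")
```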

03 — OpenAI GPT-5.x

GPT-5.5 at $5/$30, with a 272K long-context surcharge.

OpenAI's GPT-5.5 carries a headline rate of $5 input / $30 output per Mtok for standard use. The critical caveat — absent from most coverage — is in the verbatim model page language: prompts with more than 272K input tokens are "priced at 2× input and 1.5× output for the full session." This is not a marginal-token surcharge on the overflow — it is a retroactive surcharge applied to every token in that session once the 272K threshold is crossed. At 300K input, you pay $10 input / $45 output per Mtok for the entire exchange, not just the excess 28K tokens.
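The difference between a full-session surcharge and a marginal one is easiest to see in code. This is a minimal sketch that treats a session as a single prompt. The 20K output figure is an assumption chosen for illustration, and `gpt55_session_cost` is a hypothetical helper rather than anything OpenAI provides.

```python
THRESHOLD = 272_000                      # input tokens; above this the whole session reprices
INPUT_RATE, OUTPUT_RATE = 5.00, 30.00    # GPT-5.5 standard rates, $/Mtok


def gpt55_session_cost(input_tokens, output_tokens):
    """Dollar cost of one session, repricing every token once the threshold is crossed."""
    over = input_tokens > THRESHOLD
    in_rate = INPUT_RATE * (2.0 if over else 1.0)
    out_rate = OUTPUT_RATE * (1.5 if over else 1.0)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


for tokens_in in (272_000, 300_000):
    cost = gpt55_session_cost(tokens_in, 20_000)
    print(f"{tokens_in:,} input tokens: ${cost:.2f}")
```

Adding 28K input tokens roughly doubles the bill here, which is why trimming context to stay under the threshold can be worth more than any per-token discount.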

OpenAI GPT-5.x: input token rates per Mtok

Source: OpenAI pricing docs + GPT-5.5 model page, retrieved May 24, 2026

GPT-5.5 standard (≤272K input): input $5 / output $30 per Mtok · cached input $0.50

GPT-5.5 above 272K (full session): input $10 / output $45 per Mtok (2× input / 1.5× output) · applies to the full session, not just the overflow

GPT-5.5-pro standard (≤272K): input $30 / output $180 per Mtok · same 272K surcharge rule

GPT-5.4: input $2.50 / output $15 per Mtok · mini $0.75/$4.50 · nano $0.20/$1.25

GPT-5.3-Codex: input $1.75 / output $14 per Mtok · cached $0.175 · dedicated Codex SKU

For teams using GPT-5.5 with long codebase contexts (the primary use case driving the 1.05M context window), the practical ceiling for sub-surcharge sessions is 272K input tokens. Beyond that, the input rate doubles for every token in the session, not just the overflow. At our modeled 400K input / 20K output, a session costs $4.90 at the surcharged $10/$45 rates ($4.00 input plus $0.90 output), versus $2.60 if the standard $5/$30 rates had applied.

The Codex subscription tiers (Plus $20/mo, Pro 5× $100/mo, Pro 20× $200+/mo) all use GPT-5.5 under the hood, at 15–1,600 messages per 5-hour window depending on tier. The Pro 2× promo, which doubles the $100/mo tier's message budget, expires May 31, 10 days after this post's publication. On June 1, Pro 5× reverts to its baseline 80–400 message band. See the GPT-5.5 1M-context complete guide for deep context strategy.

04 — Cursor Composer 2.5

$0.50/$2.50 standard: 10× cheaper than Opus 4.7 per token.

Cursor shipped Composer 2.5 on May 18, 2026 (three days ago) with two API pricing tiers. The standard tier at $0.50/$2.50 per Mtok is the lowest input rate of any frontier-capable model in this matrix. The Fast variant at $3/$15 per Mtok matches Sonnet 4.6's rate while promising lower latency.

Standard

Composer 2.5 Standard

$0.50 in / $2.50 out per Mtok

The lowest per-token frontier rate in the May 2026 matrix. Context window not publicly specified. Cache multipliers not published — the only confirmed discount is the first-week double-usage promo, which ends ~May 25 (4 days from publication).

First-week 2× promo until ~May 25

Fast

Composer 2.5 Fast

$3.00 in / $15.00 out per Mtok

Same intelligence as standard at 6× the input rate, targeting lower latency. Cursor describes this as 'lower cost than the fast tiers of other frontier models', which is accurate versus Opus 4.7 Fast mode at $30 in.

Matches Sonnet 4.6 rate

Transparency gap

Cursor publishes per-token input/output rates for Composer 2.5 but has not disclosed cache multipliers, context window size, or effective per-task figures. The only published discount is the first-week double-usage promo (ends ~May 25). Every other vendor in this matrix publishes more granular cost data. Frame Composer 2.5 budgets using the raw per-token rates plus a 15–30% uncertainty buffer until Cursor publishes the full cost model.

At our modeled baseline of 100K input / 20K output tokens per task, Composer 2.5 standard lands at $0.10/task — versus $1.00/task for Opus 4.7 uncached and $1.35/task with the 35% tokenizer overhead. The 10× per-token gap is real at this model volume, though Cursor does not publish Composer 2.5's benchmark performance on SWE-Bench Verified, making capability comparison harder than price comparison. Cursor publishes CursorBench v3.1 (vendor-controlled) and SWE-Bench Multilingual results.
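The per-task figures above follow from one formula. The sketch below assumes the tokenizer overhead inflates input and output token counts equally, which is how the $1.35 figure works out; `task_cost` is an illustrative name.

```python
def task_cost(in_rate, out_rate, tokens_in=100_000, tokens_out=20_000, token_overhead=0.0):
    """Dollar cost per task; token_overhead inflates both token counts (0.35 means +35%)."""
    scale = 1.0 + token_overhead
    return scale * (tokens_in * in_rate + tokens_out * out_rate) / 1_000_000


composer = task_cost(0.50, 2.50)                          # Composer 2.5 standard
opus = task_cost(5.00, 25.00)                             # Opus 4.7, uncached
opus_worst = task_cost(5.00, 25.00, token_overhead=0.35)  # with the full 35% overhead

print(f"Composer 2.5 ${composer:.2f} | Opus 4.7 ${opus:.2f} | Opus 4.7 +35% ${opus_worst:.2f}")
print(f"effective gap: {opus / composer:.1f}x to {opus_worst / composer:.1f}x")
```

The overhead is an upper bound ("up to 35%"), so the effective gap for a given codebase falls somewhere between the two printed multiples.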

05 — Google Gemini API

Gemini 3.5 Flash GA at $1.50/$9; 3.1 Pro tiered above 200K.

Google launched Gemini 3.5 Flash to GA on May 19, 2026 at I/O (two days ago) at $1.50/$9.00 per Mtok input/output. The launch also confirmed a 1.05M token context window and a $0.15 cached-input rate with $1.00/hour storage. Gemini 3.5 Flash becomes the primary mid-tier option for cost-sensitive agentic workloads — see the Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7 head-to-head for quality comparison.

Gemini 3.1 Pro Preview carries the most complex pricing structure in this matrix: a two-tier model with a single breakpoint at 200K context tokens, above which both input and output rates rise. The multipliers are the same as the GPT-5.5 272K surcharge (2× input, 1.5× output), but the threshold is lower.

Google Gemini API: input rates per Mtok

Source: ai.google.dev/gemini-api/docs/pricing, live WebFetch May 24, 2026

Gemini 3.1 Flash-Lite: input $0.25 / output $1.50 per Mtok · lowest Google rate

Gemini 3 Flash Preview: input $0.50 / output $3.00 per Mtok · audio inputs $1.00

Gemini 3.5 Flash (GA, May 19): input $1.50 / output $9.00 per Mtok · cached $0.15 · $1/hr storage

Gemini 3.1 Pro Preview (≤200K): input $2.00 / output $12.00 per Mtok · cached $0.20 · $4.50/hr storage

Gemini 3.1 Pro Preview (>200K input): input $4.00 / output $18.00 per Mtok · cached $0.40 · both rates rise above 200K

The Gemini 3.1 Pro 200K threshold deserves special attention: Gemini's input-token counting includes all conversation history, tool descriptions, and system prompts — not just the user's latest message. An agentic session with a 50K system prompt, 80K tool registry, and a 100K codebase snapshot already sits at 230K tokens before any user input arrives, triggering the higher tier for the entire session. Budget at $4 input / $18 output for any Gemini 3.1 Pro agent running against full-project context.
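A pre-flight check along these lines can flag which tier a session will land in before any user input arrives. This is a sketch under the assumption that everything in the context counts toward the threshold, as described above. The dictionary keys and the `gemini_pro_rates` helper are hypothetical.

```python
TIER_THRESHOLD = 200_000  # context tokens

# (input, output) in $/Mtok for Gemini 3.1 Pro Preview, as listed in this post.
RATES_LOW = (2.00, 12.00)
RATES_HIGH = (4.00, 18.00)


def gemini_pro_rates(context_parts):
    """Return (total_tokens, (input_rate, output_rate)) for a dict of token counts."""
    total = sum(context_parts.values())
    return total, (RATES_HIGH if total > TIER_THRESHOLD else RATES_LOW)


session = {"system_prompt": 50_000, "tool_registry": 80_000, "codebase_snapshot": 100_000}
total, (in_rate, out_rate) = gemini_pro_rates(session)
print(f"{total:,} tokens -> ${in_rate:.2f} in / ${out_rate:.2f} out per Mtok")
```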

06 — xAI Grok 4.3 + Grok Build

Grok 4.3 at $1.25/$2.50; Grok Build requires SuperGrok Heavy.

xAI's API pricing was confirmed via live WebFetch of docs.x.ai/docs/models on May 24, 2026. Grok 4.3 ($1.25 input / $2.50 output per Mtok, 1M context window) is the general-purpose model available via standard xAI API access. Grok Build ($1.00/$2.00, 256K context, model ID grok-build-0.1, alias grok-code-fast-1) is the specialized coding agent with sub-agent parallelism — up to 8 concurrent AI agents running in parallel.

The critical caveat on Grok Build access: SuperGrok Heavy subscription is required. The current introductory price is $99/mo for the first 6 months, then the list price rises to approximately $300/mo (some secondary sources cite $299 — treat as ~$300). The $99 figure is the promo rate, not the steady-state cost. At the $300/mo list rate, Grok Build access costs more than Claude Code Max 20× ($200/mo) — the per-task math favors Grok Build only for high-parallelism workloads that can saturate all 8 concurrent agent slots.

Grok 4.3

Input / output per Mtok

$1.25/$2.50

1M context window. API IDs: grok-4.3 / grok-4.3-latest. Standard xAI API access — no SuperGrok subscription required. Confirmed via live docs.x.ai/docs/models fetch.

1M context · no sub required

Grok Build

Input / output per Mtok

$1.00/$2.00

256K context. Model ID: grok-build-0.1. Requires SuperGrok Heavy at $99/mo intro (6 months), then ~$300/mo list. Up to 8 concurrent sub-agents.

$99/mo intro → ~$300/mo list

SuperGrok Heavy breakeven

Tasks/mo at $0.14/task to break even at $99

707

At modeled 100K in / 20K out per task ($0.14/task via Grok Build), SuperGrok Heavy at $99/mo breaks even at ~707 tasks/month. At $300/mo list, breakeven rises to ~2,143 tasks/month.

~2,143 tasks at $300/mo list

07 — GitHub Copilot

Opus 4.7 = 15× multiplier; premium requests are not tasks.

GitHub Copilot's premium-request system is the most misunderstood pricing construct in the matrix. A "premium request" is not a task, not a completion, and not a fixed token budget — it is a unit that consumes 1–50 underlying model messages depending on the model, scaled by a per-model multiplier. The multipliers below are the live values as of May 24, 2026, reconciled exactly with Day 05's Copilot Gemini removal analysis.

GitHub Copilot premium-request multipliers — May 2026

Source: docs.github.com/en/copilot/concepts/billing/copilot-requests, retrieved May 24, 2026

Opus 4.6 Fast mode: 30× multiplier · Pro 300 reqs = 10 effective Opus Fast prompts

Opus 4.7: 15× multiplier (was 7.5× promo, expired Apr 30) · Pro 300 reqs = 20 effective prompts

GPT-5.5: 7.5× multiplier · Pro 300 reqs = 40 effective GPT-5.5 prompts

Opus 4.6: 3× multiplier · Pro 300 reqs = 100 effective Opus 4.6 prompts

Sonnet 4.6 / GPT-5.4 / GPT-5.3-Codex: 1× multiplier · Pro 300 reqs = 300 effective prompts

Haiku 4.5 / GPT-5.4-mini: 0.33× multiplier · Pro 300 reqs = ~909 effective prompts

The practical unit conversion: Copilot Pro ($10/mo, 300 premium requests) allows approximately 20 effective Opus 4.7 prompts per month at the current 15× multiplier. Pro+ ($39/mo, 1,500 requests) yields about 100 effective Opus 4.7 prompts. By contrast, 300 Sonnet 4.6 prompts (at 1×) exhaust the same 300-request budget completely.
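The unit conversion is a single division. The sketch below assumes the whole monthly allowance goes to one model and ignores the auto-model-selection discount; `effective_prompts` is an illustrative helper, not a GitHub API.

```python
# Premium-request multipliers and plan allowances as listed in this post.
MULTIPLIERS = {
    "Opus 4.6 Fast": 30,
    "Opus 4.7": 15,
    "GPT-5.5": 7.5,
    "Opus 4.6": 3,
    "Sonnet 4.6": 1,
    "Haiku 4.5": 0.33,
}
PLAN_REQUESTS = {"Pro": 300, "Pro+": 1_500}  # premium requests per month


def effective_prompts(plan, model):
    """Whole prompts one month's premium-request allowance covers on a single model."""
    return int(PLAN_REQUESTS[plan] / MULTIPLIERS[model])


for model in MULTIPLIERS:
    pro = effective_prompts("Pro", model)
    pro_plus = effective_prompts("Pro+", model)
    print(f"{model:14s} Pro: {pro:4d}   Pro+: {pro_plus:5d}")
```

For a mixed workload, sum `prompts × multiplier` per model and compare the total against the plan's allowance instead.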

Three critical facts about the current Copilot situation: (1) Pro sign-ups have been paused since April 20, 2026 as GitHub rolls out a "flexible billing experience." (2) Usage-based billing transitions on June 1, 2026 (11 days from publication) — per-credit dollar pricing has not been published as of May 21. (3) Paid-plan subscribers using auto-model-selection receive a 10% multiplier discount. Yesterday (May 20), GitHub removed all Gemini models and GPT-5.2 Codex / GPT-5.4 nano from Copilot Chat on the web — scoped to the web surface only, not VS Code, JetBrains, or CLI.

08 — Breakeven Analysis

Subscription vs API: how many tasks per month to justify each tier?

The table below models breakeven task volumes at 100K input / 20K output tokens per task (uncached, base rate). These are modeling assumptions, not vendor-disclosed figures — Anthropic's $0.705 worked example anchors the methodology. The 10-tool cost calculator companion post provides the interactive version of this math. For the full per-task / per-user framework, see our agent cost metrics framework.

Claude Code Pro — $20/mo

20–33 tasks/month to break even

Anthropic Pro at $17–20/mo. Breakeven: ~20 tasks/mo via Opus 4.7 uncached ($1.00/task) or ~33 tasks/mo via Sonnet 4.6 ($0.60/task). 5h-window-capped. Limits doubled May 6.

Solo dev, light Opus use

Claude Code Max 5× — $100/mo

100–166 tasks/month to break even

Breakeven: ~100 tasks/mo (Opus 4.7) or ~166 tasks/mo (Sonnet 4.6). At 5 tasks/day workday pace, Opus 4.7 breaks even at Max 5×. Heavy users should compare vs direct API + caching.

Team leads, 5+ tasks/day

Claude Code Max 20× — $200/mo

200–333 tasks/month to break even

Breakeven: ~200 tasks/mo (Opus 4.7) or ~333 tasks/mo (Sonnet 4.6). At ~10 tasks/workday, Opus 4.7 at Max 20× is cost-neutral versus direct API — plus subscription convenience.

Power users, 10+ tasks/day

Copilot Pro — $10/mo (300 premium reqs)

Multiplier-aware: ~20 effective Opus prompts

Opus 4.7 at 15×: 300 reqs / 15 = 20 effective Opus prompts. GPT-5.5 at 7.5×: ~40 prompts. Sonnet 4.6 at 1×: 300 prompts. Premium requests are not tasks — convert using multiplier before comparing to API spend.

Already on GitHub, light AI use

Cursor Individual — $20/mo

~200 tasks/mo at Composer 2.5 standard rate

At $0.10/task (100K in / 20K out via Composer 2.5 std), $20 subscription is cost-neutral at ~200 tasks/mo. Composer 2.5 Fast: ~33 tasks/mo breakeven at $0.60/task. First-week 2× promo ends ~May 25.

Best per-task ROI in the matrix

SuperGrok Heavy — $99/mo intro

~707 tasks/mo to break even (intro rate)

At $0.14/task (Grok Build, 100K in / 20K out), $99/mo breaks even at ~707 tasks/month. After the 6-month intro expires at ~$300/mo list, breakeven rises to ~2,143 tasks/month — justify only with high-parallelism use of all 8 sub-agent slots.

High-volume parallel agent workloads
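The breakeven figures in these cards come from dividing the monthly price by the modeled per-task API cost. A minimal sketch, using the post's 100K in / 20K out per-task costs and a hypothetical `breakeven_tasks` helper:

```python
# Modeled per-task API cost at 100K in / 20K out, uncached (from this post).
PER_TASK = {"Opus 4.7": 1.00, "Sonnet 4.6": 0.60}
TIERS = {"Claude Code Pro": 20, "Claude Code Max 5x": 100, "Claude Code Max 20x": 200}


def breakeven_tasks(monthly_price, cost_per_task):
    """Tasks per month at which direct API spend would equal the subscription price."""
    return monthly_price / cost_per_task


for tier, price in TIERS.items():
    low = breakeven_tasks(price, PER_TASK["Opus 4.7"])
    high = breakeven_tasks(price, PER_TASK["Sonnet 4.6"])
    print(f"{tier} (${price}/mo): {low:.1f} to {high:.1f} tasks/month")

# SuperGrok Heavy at the intro and list prices, with Grok Build at $0.14/task.
for price in (99, 300):
    print(f"SuperGrok Heavy ${price}/mo: {breakeven_tasks(price, 0.14):.1f} tasks/month")
```

Below the breakeven volume, direct API access is cheaper on paper. Above it the subscription wins, provided its rate-limit windows can actually absorb that many tasks.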

At 100K input / 20K output per task, Composer 2.5 standard costs $0.10; Opus 4.7 uncached costs $1.00. The 10× token-rate gap translates directly into subscription breakeven math: 200 Composer tasks equal the breakeven of 20 Opus tasks at the same monthly spend.

Digital Applied synthesis, May 21, 2026

09 — Cross-Vendor Pattern

The long-context surcharge pattern: three vendors, same playbook.

The most analytically significant finding in this matrix is the convergence of long-context surcharges across three top vendors — each implemented differently, all charging materially more than the headline rate once a context threshold is crossed. The coverage gap is striking: every major publication reports "1M context" as a headline feature without surfacing the surcharge that applies to it.

Anthropic

Fast Mode — 6× standard rates

$30/$150 per Mtok (Opus 4.7/4.6)

Opt-in fast-output mode for Opus 4.6 and 4.7. Full 1M context window at 6× standard rate. Not triggered automatically by context length — must be explicitly selected. No cached-input rate published for Fast mode.

Opt-in · $5→$30 input

OpenAI

GPT-5.5 272K — 2× input / 1.5× output

Full-session surcharge above 272K input

Verbatim: 'prompts with >272K input tokens are priced at 2× input and 1.5× output for the full session.' Triggered automatically by context size. The surcharge applies retroactively to the entire session — not just the overflow tokens.

Auto-triggered · $5→$10 input

Google

Gemini 3.1 Pro — tiered at 200K

$2→$4 input / $12→$18 output above 200K

Input doubles and output rises 1.5× above the 200K threshold: a two-leg surcharge with the same multipliers as OpenAI's, at a lower trigger point. Cached input also doubles: $0.20→$0.40 per Mtok. Cache storage is listed at $4.50/hr.

Two-leg surcharge · both rates rise

The strategic implication: teams building long-context agents must model cost at their expected p90 context size, not the baseline rate. An Anthropic agent running Opus 4.7 Fast mode across 1M context costs $30/Mtok input — 6× what the pricing page suggests as the default. A Google agent running Gemini 3.1 Pro against 500K tokens of codebase context costs $4/Mtok input — double the sub-200K rate. Budget for the surcharge, not the headline.
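Modeling at p90 rather than at the typical case can be sketched with the standard library. The context sizes below are hypothetical stand-ins for figures you would pull from your own session logs, and the rates are the Gemini 3.1 Pro input tiers from this post.

```python
import statistics

THRESHOLD = 200_000                            # Gemini 3.1 Pro tier breakpoint, context tokens
GEMINI_PRO_INPUT = {"low": 2.00, "high": 4.00}  # $/Mtok below / above the breakpoint


def input_rate_at(context_tokens):
    """Gemini 3.1 Pro input rate ($/Mtok) that applies at a given context size."""
    return GEMINI_PRO_INPUT["high" if context_tokens > THRESHOLD else "low"]


# Hypothetical per-session context sizes; replace with your own measurements.
context_sizes = [60_000, 75_000, 90_000, 110_000, 120_000,
                 140_000, 150_000, 170_000, 180_000, 260_000]

median = statistics.median(context_sizes)
p90 = statistics.quantiles(context_sizes, n=10)[-1]  # last decile cut point is p90

print(f"median {median:,.0f} tokens -> ${input_rate_at(median):.2f}/Mtok input")
print(f"p90    {p90:,.0f} tokens -> ${input_rate_at(p90):.2f}/Mtok input")
```

In this made-up sample the median session stays on the base tier while the p90 session pays the surcharged rate, which is the gap a median-based budget would miss.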

Our Q2 2026 price vs performance efficient frontier charts each model's post-surcharge effective cost against benchmark quality — the picture shifts significantly once surcharge economics are applied.

10 — Industry Pattern

Four intro promos, two expiring in the next 10 days.

Introductory discounts are now the industry default launch mechanism for AI agent pricing. Four separate promos have overlapped in the May 2026 window: one has already expired, two expire within 10 days of this post, and one runs for six months. The pattern is consistent: vendors launch at roughly half to one-third of the steady-state cost to seed adoption, then revert to a materially higher rate that often doubles or triples effective cost per task.

Expired April 30

Opus 4.7 on Copilot

7.5×→15×

Copilot Opus 4.7 launch promo ran at 7.5× multiplier for ~10 days post-launch. On April 30, the promo expired and the multiplier doubled to 15×. Copilot Pro users went from ~40 effective Opus prompts to ~20 overnight.

Expired Apr 30 — now permanent at 15×

Expires ~May 25

Composer 2.5 first-week promo

2× usage

Cursor's first-week double-usage promo for Composer 2.5 ends approximately May 25 — 4 days from publication. After expiry, subscription-included usage reverts to the standard 1× allowance. API token rates remain at $0.50/$2.50.

4 days from publication

Expires May 31

Codex Pro $100/mo tier

2× messages

OpenAI's verbatim: 'Double your normal Codex usage on the $100/month tier until May 31, 2026.' Pro 5× reverts to baseline 80–400 GPT-5.5 msgs/5h on June 1. Teams at the $100 tier should plan for 50% message-budget reduction in 10 days.

10 days from publication — June 1 revert

The fourth promo, SuperGrok Heavy at $99/mo for the first 6 months, has a longer runway but the steepest cliff: the list price of approximately $300/mo (roughly 3× the intro rate) applies after month 6. Teams committing to Grok Build on the basis of the $99 entry price should model their budgets from month 7 onward at ~$300/mo and ensure the per-task economics justify the investment at full price.

The broader projection: introductory pricing as a launch norm means any vendor-disclosed rate published in the first 90 days of a product launch may be materially lower than the steady-state cost. Budget planning for AI agent infrastructure should use post-promo rates as the planning baseline, with the intro discount treated as a temporary reduction — not the long-term price.
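One way to apply this is to lay out the month-by-month price over the planning horizon and budget against the last month rather than the first. A sketch using the SuperGrok Heavy figures from this post; the 12-month horizon and the `monthly_prices` helper are assumptions for illustration.

```python
def monthly_prices(intro_price, list_price, intro_months, horizon_months):
    """Price charged in each month of the planning horizon."""
    return [intro_price if month < intro_months else list_price
            for month in range(horizon_months)]


prices = monthly_prices(intro_price=99, list_price=300, intro_months=6, horizon_months=12)
total = sum(prices)
average = total / len(prices)

print(f"12-month total ${total:,}, average ${average:.2f}/mo")
print(f"planning baseline (post-promo): ${prices[-1]}/mo")
```

The blended average sits well above the intro price, and the post-promo figure is the one the per-task economics have to justify.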

11 — 17-Day Timeline

What changed in May 2026: eight pricing events, one matrix.

The May 2026 pricing landscape shifted on at least eight distinct events between April 30 and the publication of this post on May 21. No single publication assembled this chronology before today. The sequence below is sourced from primary vendor changelogs, documentation updates, and the batch research files that anchor this post's figures.

Apr 30

Opus 4.7 Copilot promo expired — 7.5× doubled to 15×

GitHub Copilot's launch-period Opus 4.7 multiplier discount ended. The multiplier rose from 7.5× to 15× permanently, cutting effective Opus 4.7 prompts per Pro plan in half overnight.

Copilot Pro users lost ~50% of Opus capacity

May 6

Anthropic doubled Claude Code 5-hour rate limits

Anthropic announced doubled 5-hour rate limits for Claude Code across Pro, Max, Team, and Enterprise plans — same subscription cost, 2× throughput. Announced alongside a SpaceX enterprise deal.

All Claude Code subscribers benefited

May 17

GitHub switched Business/Enterprise base model to GPT-5.3-Codex

GitHub Copilot for Business and Enterprise shifted the default base-completion model to GPT-5.3-Codex ($1.75/$14 per Mtok), replacing the prior GPT-5.4 default. Changes Copilot's cost structure for completions without premium-request consumption.

Enterprise completion costs shifted

May 18

Cursor shipped Composer 2.5 at $0.50/$2.50

Composer 2.5 launched at the lowest frontier-capable per-token rate in the matrix. First-week double-usage promo activated. Fast variant at $3/$15 also available.

New low end of the cost spectrum

May 19

Google launched Gemini 3.5 Flash GA + Antigravity 2.0 at I/O

Gemini 3.5 Flash reached GA at $1.50/$9 per Mtok with 1.05M context. Antigravity 2.0 (desktop IDE + agy CLI + SDK) announced at I/O. Managed Agents pricing ($0.08/session-hour) entered public preview.

New Google mid-tier competitor at $1.50 in

May 20

GitHub pulled Gemini + GPT-5.2 Codex / GPT-5.4 nano from Copilot Chat web

All Gemini models and several others removed from Copilot Chat on the web only. VS Code, JetBrains, and CLI surfaces were not affected by the same announcement.

Web-surface only — IDE Copilot unchanged

May 25 (4 days out)

Composer 2.5 first-week promo ends

Cursor's double-usage first-week promo for Composer 2.5 expires approximately May 25. Standard subscription allowance resumes; API token rates unchanged at $0.50/$2.50.

Plan on 1× allowance from May 26

May 31 / Jun 1 (10–11 days out)

Codex Pro promo expires + Copilot usage-based billing

Codex Pro $100/mo 2× promo ends May 31 — Pro 5× reverts to 80–400 GPT-5.5 msgs/5h. On June 1, Copilot transitions to usage-based billing; per-credit dollar pricing not published as of this post's date.

Two changes landing 24 hours apart

Copilot usage-based billing — unresolved

GitHub has not published per-credit dollar pricing for the June 1, 2026 usage-based billing transition as of May 21. The transition is confirmed; the price-per-credit is not. Monitor github.blog/changelog and docs.github.com/en/copilot/concepts/billing for the pricing disclosure before June 1.

12 — OSS BYOK

Cline and Aider: "free" in license, API-spend-bound in practice.

Open-source BYOK tools — Cline (Apache 2.0) and Aider (free and open source) — carry zero license cost and zero subscription overhead. Every dollar of AI spend flows directly to the API provider of your choice at standard published rates, with no markup and no bundled allowance. For teams with the operational maturity to manage API keys and cost attribution, BYOK is the most transparent pricing model in the matrix.

The practical cost profile of an OSS BYOK developer depends entirely on model choice. At our modeled 100K in / 20K out per task:

OSS BYOK effective cost per task at modeled 100K in / 20K out (no cache)

Per-task calculation: (100K × input rate + 20K × output rate) / 1,000,000. Modeling assumptions — not vendor-disclosed.

Cline / Aider → Gemini 3.1 Flash-Lite: $0.055/task · ~364 tasks for $20/mo budget

Cline / Aider → Grok 4.3: $0.175/task · ~114 tasks for $20/mo budget

Cline / Aider → GPT-5.3-Codex: $0.455/task · ~44 tasks for $20/mo budget

Cline / Aider → Sonnet 4.6: $0.60/task · ~33 tasks for $20/mo budget

Cline / Aider → Opus 4.7 (uncached): $1.00/task · ~20 tasks for $20/mo budget · +35% tokenizer overhead possible
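The per-task calculation and the budget conversion above can be reproduced as follows. The rates are the ones listed in this post; `per_task` and `tasks_within` are illustrative names, and the task counts are rounded to the nearest whole task.

```python
# (input, output) rates in $/Mtok, as listed in this post.
RATES = {
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Grok 4.3": (1.25, 2.50),
    "GPT-5.3-Codex": (1.75, 14.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.7": (5.00, 25.00),
}


def per_task(model, tokens_in=100_000, tokens_out=20_000):
    """Uncached dollar cost of one task at the modeled token volumes."""
    in_rate, out_rate = RATES[model]
    return (tokens_in * in_rate + tokens_out * out_rate) / 1_000_000


def tasks_within(budget, model):
    """Approximate number of tasks a monthly budget covers on one model."""
    return round(budget / per_task(model))


for model in RATES:
    print(f"{model:22s} ${per_task(model):.3f}/task  ~{tasks_within(20, model)} tasks per $20")
```

Changing `tokens_in` and `tokens_out` to match your own task profile is the quickest way to see how sensitive the ranking is to the 100K / 20K assumption.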

A developer running Cline with Opus 4.7 at 20 substantive tasks per workday would spend approximately $20/day, or $400–440/month at a standard work schedule of 20–22 workdays, potentially rising to $540–594 with the 35% tokenizer overhead on representative code inputs. The Continue.dev Team plan ($20/seat/month with $10 in API credits) partially offsets this, but $10 in credits covers only 10 Opus 4.7 tasks at standard rates.

For teams tracking cost per successful task, the cost-per-successful-task metric framework provides the right unit for OSS BYOK vs subscription comparison — raw per-task cost ignores task success rates, which vary significantly by model and workload type. Our AI transformation practice runs cost-attribution benchmarks across BYOK and subscription models for specific codebases and agent patterns before recommending a tier commitment. The 20-platform agentic coding matrix covers OSS BYOK alongside the full subscription landscape.

The shape of AI agent pricing, May 2026

Headline token rates are the floor — not the price you actually pay.

The May 2026 AI agent pricing landscape has three structural characteristics that headline coverage consistently misses. First, long-context surcharges from Anthropic, OpenAI, and Google mean the advertised rate is the minimum: the effective cost at production context sizes is 1.5× to 6× higher. Second, four overlapping intro promos are unwinding, two of them within 10 days of this post, with rate jumps of 2× to 3× baked in. Third, the Copilot premium-request multiplier system makes per-prompt cost comparison to direct API pricing non-trivial: the unit conversion requires knowing both your model mix and the per-model multiplier.

The 10× token-rate gap between Composer 2.5 standard ($0.50/Mtok input) and Opus 4.7 ($5.00/Mtok input) is a wider per-token spread between frontier-capable models than in any prior quarter. Whether that gap reflects a genuine capability difference at the task-success level, not just in benchmark performance, is a question each team should answer by benchmarking against its own code and agent patterns rather than adopting vendor-published numbers wholesale.

The most durable conclusion from this matrix is methodological: the right unit for AI agent cost planning is effective $/successful-task, not $/Mtok. That unit requires knowing your task success rate by model, your cache hit rate, your typical context distribution, and whether your workload crosses the long-context surcharge threshold. This post has given you all four inputs by vendor — the combination is yours to model against your actual usage pattern.
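As a minimal sketch of that unit: if a failed task is simply retried until it succeeds and attempts are independent, expected spend per success is the per-attempt cost divided by the success rate. The success rates below are hypothetical placeholders, not measurements of any model.

```python
def cost_per_successful_task(cost_per_attempt, success_rate):
    """Expected spend per success when failed attempts are retried until one succeeds."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Per-attempt costs are the modeled 100K in / 20K out figures from this post;
# the success rates are invented for illustration only.
scenarios = {
    "cheap model, 40% success": (0.10, 0.40),
    "premium model, 90% success": (1.00, 0.90),
}

for label, (cost, rate) in scenarios.items():
    print(f"{label}: ${cost_per_successful_task(cost, rate):.2f} per successful task")
```

The formula leaves out the engineer time spent reviewing failed attempts, which in practice can outweigh the token cost.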

AI Agent Pricing: May 2026 Full Matrix
