AI agent pricing fragmented across 14 vendors and 17 days of pricing events in May 2026 — spanning a 60× range from Gemini 3.1 Flash-Lite at $0.25/Mtok to Opus 4.7 Fast mode at $30/Mtok. This reference matrix captures every surcharge, multiplier, expiring promo, and subscription breakeven point as of May 21, 2026.
Three cross-vendor patterns emerged simultaneously this month: long-context surcharges now apply at Anthropic, OpenAI, and Google; four separate intro promos overlap, two of which expire within 10 days of this publication; and the Copilot premium-request multiplier for Opus 4.7 doubled on April 30 when the launch promo expired, making it 15× the cost of a Sonnet 4.6 request on the same subscription. None of these patterns appeared individually in headline coverage.
This guide covers all five proprietary tables: the API rate matrix across 20 model SKUs, the subscription tier matrix for 12 vendors, the effective per-task cost with all four levers applied (tokenizer, surcharge, cache, fast mode), the subscription vs API breakeven by task volume, and the May 2026 pricing-event chronology. Use the 10-tool cost calculator companion for interactive per-loop math.
- 01 Composer 2.5 is 10× cheaper per token than Opus 4.7 standard. Cursor's Composer 2.5 standard tier launched May 18 at $0.50/$2.50 per Mtok input/output. Opus 4.7 standard is $5/$25, and its new tokenizer can add up to 35% more tokens for the same text, widening the effective gap further.
- 02 The long-context surcharge is now a cross-vendor pattern. Anthropic Fast mode charges 6× standard rates, OpenAI GPT-5.5 applies 2× input / 1.5× output above 272K tokens for the full session, and Gemini 3.1 Pro Preview doubles input and raises output above 200K. Headline '1M context' pricing is never the full story.
- 03 Copilot's Opus 4.7 multiplier doubled when the launch promo expired. The April 30 expiry of Copilot's launch promo pushed the Opus 4.7 premium-request multiplier from 7.5× to 15×. A Pro plan's 300 monthly requests yield only ~20 effective Opus 4.7 prompts. Pro+ at $39 gives 1,500 requests, or about 100 effective Opus prompts.
- 04 Four intro promos overlap this month; two expire within 10 days of this post. Composer 2.5 first-week 2× promo ends ~May 25 (4 days out). Codex Pro 2× promo expires May 31 (10 days out). SuperGrok Heavy stays at $99/mo intro for 6 months then rises to ~$300/mo list. The Opus 4.7 Copilot 7.5× promo already expired April 30.
- 05 OSS BYOK tools shift all spend to the API tier you choose. Cline and Aider carry $0 license cost but route every token to your BYOK API key. A typical developer using Opus 4.7 uncached via Cline may spend $50–200/month in pure API costs depending on task volume: the same economics as direct API access, with zero subscription discount.
01 — 14 Vendors at a Glance
The May 2026 frontier: 60× spread from cheapest to most expensive.
The table below is the master subscription tier matrix for May 21, 2026: 12 vendors across free, mid, pro, and power tiers, with the key pricing event that changed the picture this month. Cells are sourced from each vendor's live pricing page, retrieved May 24, 2026, except where a secondary source is flagged.
Composer 2.5 — new standard
Free Hobby · $20 Individual · $40/user Teams · Enterprise custom. Composer 2.5 API: $0.50/$2.50 standard, $3/$15 Fast. First-week 2× promo ends ~May 25.
Pro / Max 5× / Max 20×
Pro $17–20/mo (5h windows, doubled May 6). Max 5× = $100/mo. Max 20× = $200/mo. API: Opus 4.7 $5/$25, Sonnet 4.6 $3/$15, Haiku 4.5 $1/$5.
Free / Pro / Pro+ — sign-ups paused
Free $0 (50 reqs). Pro $10/mo (300 reqs) — sign-ups paused since Apr 20. Pro+ $39/mo (1,500 reqs). June 1: usage-based billing. Per-credit pricing not yet published.
Plus / Pro — promo expires May 31
Codex Plus $20/mo (15–80 GPT-5.5 msgs/5h). Pro 5× $100/mo (80–400 msgs/5h). Pro 20× $200+/mo. 2× promo on $100 tier expires May 31 — 10 days from publication.
Credit-based tiers
Free $0 (50 credits). Pro $20/mo (1K credits). Pro+ $40/mo (2K credits). Power $200/mo (10K credits). Overage $0.04/credit.
Free / Pro / Max + Devin Cloud
Free $0. Pro $20/mo (standard allowance). Max $200/mo (heavy allowance + Devin Cloud access). Teams $40/user/mo.
SuperGrok Heavy required
Grok Build API: $1/$2 per Mtok, 256K context. Requires SuperGrok Heavy subscription: $99/mo intro for 6 months, then ~$300/mo list. Up to 8 concurrent sub-agents.
AI Pro / Ultra / Ultra Premium
Free $0 (rate-limited, secondary-source pricing — verify at antigravity.google/pricing). AI Pro $20/mo. AI Ultra $100/mo (5× Pro). AI Ultra Premium $200/mo (reduced from $250). Overage: $25 / 2,500 credits.
The antigravity.google/pricing page is JS-rendered and could not be confirmed via direct fetch on May 24, 2026. The tier prices above are sourced from third-party aggregators (Vibecoding.app, Datastudios, ThinkPeak AI). Verify against antigravity.google/pricing before planning budget around these figures.
02 — Anthropic Claude
Opus 4.7, Sonnet 4.6, Haiku 4.5, plus Fast mode at 6×.
Anthropic's API pricing as of May 21, 2026 is deceptively simple on the surface — three models, flat per-Mtok rates — but four levers compound the effective cost: Fast mode (6× standard rates), tokenizer overhead (up to 35% more tokens on Opus 4.7), prompt caching (as low as 0.05× with batch stacking), and the Managed Agents session runtime surcharge ($0.08/session-hour, applicable only to Managed Agents — not Claude Code, not raw API).
Anthropic publishes the only vendor-anchored real-session cost in the industry: a one-hour Opus 4.7 coding session consuming 50,000 input tokens and 15,000 output tokens totals $0.705 uncached. That figure reconciles as $0.25 input + $0.375 output + $0.08 for one Managed Agents session-hour. With 40,000 of those input tokens served as cache reads at 0.1×, the total drops to $0.525, a 25.5% reduction. This worked example is the methodology anchor for all per-task estimates in this post.
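The worked example can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it uses the Opus 4.7 rates and the 0.1× cache-read multiplier quoted in this section, and it assumes the published total includes one Managed Agents session-hour at $0.08, since that is the only combination of the listed rates that lands on $0.705. The function and constant names are hypothetical.

```python
# Rates quoted in this section (USD per million tokens).
OPUS_INPUT_PER_MTOK = 5.00
OPUS_OUTPUT_PER_MTOK = 25.00
CACHE_READ_MULTIPLIER = 0.1  # cache reads billed at 0.1x the input rate
SESSION_HOUR_FEE = 0.08      # assumption: one Managed Agents session-hour is included


def session_cost(input_tokens, output_tokens, cached_input_tokens=0, hours=1.0):
    """Return the modeled dollar cost of one session."""
    fresh_input = input_tokens - cached_input_tokens
    input_cost = fresh_input * OPUS_INPUT_PER_MTOK / 1_000_000
    cache_cost = (cached_input_tokens * OPUS_INPUT_PER_MTOK
                  * CACHE_READ_MULTIPLIER / 1_000_000)
    output_cost = output_tokens * OPUS_OUTPUT_PER_MTOK / 1_000_000
    return input_cost + cache_cost + output_cost + hours * SESSION_HOUR_FEE


uncached = session_cost(50_000, 15_000)
cached = session_cost(50_000, 15_000, cached_input_tokens=40_000)
saving = (uncached - cached) / uncached

print(f"uncached: ${uncached:.3f}")
print(f"cached:   ${cached:.3f}")
print(f"saving:   {saving:.1%}")
```

Setting `hours=0` gives the raw API cost without the session runtime fee, which is the relevant figure for Claude Code or direct API use.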
Opus 4.7 · $5 / $25 input / output per Mtok
Full 1M context window at standard rate. New tokenizer may use up to 35% more tokens for equivalent text versus Opus 4.6 — same per-Mtok rate, higher effective $/task.
Sonnet 4.6 · $3 / $15 input / output per Mtok
1M context flat-priced. Prompt cache: 0.1× input on cache reads (5-min write at 1.25×, 1-hour write at 2×). Batch API: 50% off both input and output. Stacking yields 0.05× effective input.
Haiku 4.5 · $1 / $5 input / output per Mtok
200K context window. Most cost-effective Anthropic model for high-volume routine tasks. Prompt cache: 0.1× on reads. Copilot multiplier: 0.33× (the most economical premium-request spend on Copilot).
On May 6, 2026, Anthropic doubled Claude Code's five-hour rate limits across Pro, Max, Team, and seat-based Enterprise plans. Claude Code on Pro ($17–20/mo) now provides 2× its prior window budget — the same subscription cost for meaningfully more throughput. For teams choosing between Pro and Max 5×, this change narrows the effective cost gap: run your own task-volume math before upgrading.
The data-residency surcharge is worth flagging for regulated industries: setting inference_geo: "us" on Claude 4.6+ models adds a 1.1× multiplier on top of standard rates. For an Opus 4.7 task at $1.00 uncached, that's an additional $0.10 per task, a cost that compounds rapidly at production scale. See the Opus 4.7 cost strategy guide for the full cache × batch × geo optimization framework.
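To see how the levers compound, the following sketch multiplies the published factors together. The 0.1× cache-read and 0.5× batch multipliers, and their 0.05× combined result, come from this section. Stacking the 1.1× geo multiplier on top of them multiplicatively is an assumption, and the function name is hypothetical.

```python
CACHE_READ = 0.1  # cache reads at 0.1x input
BATCH = 0.5       # Batch API at 50% off
US_GEO = 1.1      # inference_geo "us" surcharge


def effective_input_rate(base_rate, cache_read=False, batch=False, us_geo=False):
    """Apply the selected multipliers to a per-Mtok input rate.

    Assumes the multipliers compose multiplicatively.
    """
    rate = base_rate
    if cache_read:
        rate *= CACHE_READ
    if batch:
        rate *= BATCH
    if us_geo:
        rate *= US_GEO
    return rate


sonnet_stacked = effective_input_rate(3.00, cache_read=True, batch=True)
opus_geo = effective_input_rate(5.00, us_geo=True)
print(f"Sonnet 4.6 cached + batch: ${sonnet_stacked:.3f}/Mtok")
print(f"Opus 4.7 with US geo:      ${opus_geo:.2f}/Mtok")
```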
03 — OpenAI GPT-5.x
GPT-5.5 at $5/$30, with a 272K long-context surcharge.
OpenAI's GPT-5.5 carries a headline rate of $5 input / $30 output per Mtok for standard use. The critical caveat — absent from most coverage — is in the verbatim model page language: prompts with more than 272K input tokens are "priced at 2× input and 1.5× output for the full session." This is not a marginal-token surcharge on the overflow — it is a retroactive surcharge applied to every token in that session once the 272K threshold is crossed. At 300K input, you pay $10 input / $45 output per Mtok for the entire exchange, not just the excess 28K tokens.
OpenAI GPT-5.x — input token rates per Mtok (log-scaled to $30 = 100%)
Source: OpenAI pricing docs + GPT-5.5 model page, retrieved May 24, 2026
For teams using GPT-5.5 with long codebase contexts (the primary use case driving the 1.05M context window), the practical ceiling for sub-surcharge sessions is 272K input tokens. Beyond that, effective input cost doubles. A 400K-input session pays $10/Mtok rather than $5 on every input token and $45/Mtok rather than $30 on output, yielding an effective $4.90 per task at our modeled 400K input / 20K output ($4.00 input + $0.90 output), versus $2.60 at the standard rate.
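The whole-session nature of the surcharge is easiest to see in code. This is a minimal sketch using the headline rates and the 2× / 1.5× multipliers quoted above; the function name is hypothetical, and it models a single prompt rather than a multi-turn session.

```python
GPT55_INPUT_PER_MTOK = 5.00
GPT55_OUTPUT_PER_MTOK = 30.00
LONG_CONTEXT_THRESHOLD = 272_000  # surcharge applies above this many input tokens


def gpt55_cost(input_tokens, output_tokens):
    """Dollar cost of one exchange, repricing every token once the threshold is crossed."""
    in_rate = GPT55_INPUT_PER_MTOK
    out_rate = GPT55_OUTPUT_PER_MTOK
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate *= 2.0   # applies to all input tokens, not only the overflow
        out_rate *= 1.5
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


at_threshold = gpt55_cost(272_000, 20_000)
just_over = gpt55_cost(272_001, 20_000)
long_session = gpt55_cost(400_000, 20_000)
print(f"272,000 in: ${at_threshold:.2f}")
print(f"272,001 in: ${just_over:.2f}")
print(f"400,000 in: ${long_session:.2f}")
```

Comparing the first two results shows the cliff: a single extra input token reprices the entire exchange.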
The Codex subscription tiers (Plus $20/mo, Pro 5× $100/mo, Pro 20× $200+/mo) all use GPT-5.5 under the hood at 15–1,600 messages per 5-hour window. The Pro 2× promo, which doubles the $100/mo tier's message budget until May 31, expires 10 days after this post's publication. On June 1, Pro 5× reverts to its baseline 80–400 message band. See the GPT-5.5 1M-context complete guide for deep context strategy.
04 — Cursor Composer 2.5
$0.50/$2.50 standard, 10× cheaper than Opus 4.7 per token.
Cursor shipped Composer 2.5 on May 18, 2026 (three days ago) with two API pricing tiers. The standard tier at $0.50/$2.50 per Mtok is the lowest input rate of any frontier-capable model in this matrix. The Fast variant at $3/$15 per Mtok matches Sonnet 4.6's rate while promising lower latency.
Composer 2.5 Standard
The lowest per-token frontier rate in the May 2026 matrix. Context window not publicly specified. Cache multipliers not published — the only confirmed discount is the first-week double-usage promo, which ends ~May 25 (4 days from publication).
Composer 2.5 Fast
Same intelligence as standard at 6× the input rate ($3 versus $0.50), targeting lower latency. Cursor describes this as 'lower cost than the fast tiers of other frontier models', which is accurate versus Opus 4.7 Fast mode at $30 in.
At our modeled baseline of 100K input / 20K output tokens per task, Composer 2.5 standard lands at $0.10/task, versus $1.00/task for Opus 4.7 uncached and $1.35/task with the 35% tokenizer overhead. The 10× per-token gap is real at this modeled volume, though Cursor does not publish Composer 2.5's benchmark performance on SWE-Bench Verified, making capability comparison harder than price comparison. Cursor publishes CursorBench v3.1 (vendor-controlled) and SWE-Bench Multilingual results.
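The per-task figures follow from the modeled token volume and the published rates. The sketch below reproduces them; treating the tokenizer overhead as a uniform scaling of both input and output tokens is a simplifying assumption, and the function name is hypothetical.

```python
def task_cost(in_rate, out_rate, input_tokens=100_000, output_tokens=20_000,
              token_overhead=0.0):
    """Cost of one task in dollars, given per-Mtok rates.

    token_overhead scales the token counts, e.g. 0.35 for the worst-case
    Opus 4.7 tokenizer overhead (assumed to apply to input and output alike).
    """
    scale = 1.0 + token_overhead
    return scale * (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


composer_std = task_cost(0.50, 2.50)
opus = task_cost(5.00, 25.00)
opus_worst = task_cost(5.00, 25.00, token_overhead=0.35)

print(f"Composer 2.5 standard: ${composer_std:.2f}")
print(f"Opus 4.7 uncached:     ${opus:.2f}")
print(f"Opus 4.7 +35% tokens:  ${opus_worst:.2f}")
print(f"effective gap:         {opus_worst / composer_std:.1f}x")
```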
05 — Google Gemini API
Gemini 3.5 Flash GA at $1.50/$9; 3.1 Pro tiered above 200K.
Google launched Gemini 3.5 Flash to GA on May 19, 2026 at I/O (two days ago) at $1.50/$9.00 per Mtok input/output. The launch also confirmed a 1.05M token context window and a $0.15 cached-input rate with $1.00/hour storage. Gemini 3.5 Flash becomes the primary mid-tier option for cost-sensitive agentic workloads — see the Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7 head-to-head for quality comparison.
Gemini 3.1 Pro Preview carries the most complex pricing structure in this matrix: a two-tier model with a single breakpoint at 200K context tokens, above which both input and output rates rise. The GPT-5.5 surcharge likewise reprices both legs (2× input, 1.5× output); the main structural difference is the lower threshold, 200K versus 272K.
Google Gemini API — input rates per Mtok (log-scaled to Gemini 3.1 Pro >200K = 45%)
Source: ai.google.dev/gemini-api/docs/pricing, live WebFetch May 24, 2026
The Gemini 3.1 Pro 200K threshold deserves special attention: Gemini's input-token counting includes all conversation history, tool descriptions, and system prompts, not just the user's latest message. An agentic session with a 50K system prompt, 80K tool registry, and a 100K codebase snapshot already sits at 230K tokens before any user input arrives, triggering the higher tier for the entire session. Budget at $4 input / $18 output for any Gemini 3.1 Pro agent running against full-project context.
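A quick way to check whether an agent will land in the upper tier is to sum every component of the prompt before the user's message arrives. The sketch below uses the example session from the paragraph above and models input cost only. The $4 upper-tier input rate is quoted in this post; the $2 lower-tier rate is inferred from the statement that the upper tier doubles input. The dictionary keys and function name are hypothetical.

```python
TIER_THRESHOLD = 200_000
INPUT_RATE_LOW = 2.00   # inferred: upper tier is described as double the sub-200K rate
INPUT_RATE_HIGH = 4.00  # quoted above for contexts over 200K


def gemini_input_cost(context_parts):
    """Return (total_tokens, rate_per_mtok, input_cost) for one request."""
    total = sum(context_parts.values())
    rate = INPUT_RATE_HIGH if total > TIER_THRESHOLD else INPUT_RATE_LOW
    return total, rate, total * rate / 1_000_000


session = {
    "system_prompt": 50_000,
    "tool_registry": 80_000,
    "codebase_snapshot": 100_000,
}
total, rate, cost = gemini_input_cost(session)
print(f"{total:,} tokens at ${rate:.2f}/Mtok = ${cost:.2f} input")
```

Trimming the tool registry or the snapshot so the total stays at or below 200K halves the input rate for that request.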
06 — xAI Grok 4.3 + Grok Build
Grok 4.3 at $1.25/$2.50; Grok Build requires SuperGrok Heavy.
xAI's API pricing was confirmed via live WebFetch of docs.x.ai/docs/models on May 24, 2026. Grok 4.3 ($1.25 input / $2.50 output per Mtok, 1M context window) is the general-purpose model available via standard xAI API access. Grok Build ($1.00/$2.00, 256K context, model ID grok-build-0.1, alias grok-code-fast-1) is the specialized coding agent with sub-agent parallelism — up to 8 concurrent AI agents running in parallel.
The critical caveat on Grok Build access: SuperGrok Heavy subscription is required. The current introductory price is $99/mo for the first 6 months, then the list price rises to approximately $300/mo (some secondary sources cite $299 — treat as ~$300). The $99 figure is the promo rate, not the steady-state cost. At the $300/mo list rate, Grok Build access costs more than Claude Code Max 20× ($200/mo) — the per-task math favors Grok Build only for high-parallelism workloads that can saturate all 8 concurrent agent slots.
Grok 4.3 · $1.25 / $2.50 input / output per Mtok
1M context window. API IDs: grok-4.3 / grok-4.3-latest. Standard xAI API access — no SuperGrok subscription required. Confirmed via live docs.x.ai/docs/models fetch.
Grok Build · $1.00 / $2.00 input / output per Mtok
256K context. Model ID: grok-build-0.1. Requires SuperGrok Heavy at $99/mo intro (6 months), then ~$300/mo list. Up to 8 concurrent sub-agents.
~707 tasks/mo at $0.14/task to break even at $99
At modeled 100K in / 20K out per task ($0.14/task via Grok Build), SuperGrok Heavy at $99/mo breaks even at ~707 tasks/month. At $300/mo list, breakeven rises to ~2,143 tasks/month.
07 — GitHub Copilot
Opus 4.7 = 15× multiplier; premium requests are not tasks.
GitHub Copilot's premium-request system is the most misunderstood pricing construct in the matrix. A "premium request" is not a task, not a completion, and not a fixed token budget — it is a unit that consumes 1–50 underlying model messages depending on the model, scaled by a per-model multiplier. The multipliers below are the live values as of May 24, 2026, reconciled exactly with Day 05's Copilot Gemini removal analysis.
GitHub Copilot premium-request multipliers — May 2026
Source: docs.github.com/en/copilot/concepts/billing/copilot-requests, retrieved May 24, 2026
The practical unit conversion: Copilot Pro ($10/mo, 300 premium requests) allows approximately 20 effective Opus 4.7 prompts per month at the current 15× multiplier. Pro+ ($39/mo, 1,500 requests) yields about 100 effective Opus 4.7 prompts. By contrast, 300 Sonnet 4.6 prompts (at 1×) exhaust the same 300-request budget completely.
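The conversion from premium requests to usable prompts is a single division, sketched below with the multipliers cited in this post. The dictionary keys are hypothetical labels rather than official model IDs, and the per-prompt dollar figure assumes the whole plan price is spent on one model.

```python
import math

# Premium-request multipliers cited in this post (hypothetical key names).
MULTIPLIERS = {
    "opus-4.7": 15.0,
    "gpt-5.5": 7.5,
    "sonnet-4.6": 1.0,
    "haiku-4.5": 0.33,
}


def effective_prompts(monthly_requests, model):
    """Whole prompts available per month once the model multiplier is applied."""
    return math.floor(monthly_requests / MULTIPLIERS[model])


def dollars_per_prompt(plan_price, monthly_requests, model):
    """Plan price spread over the effective prompts for a single model."""
    return plan_price / effective_prompts(monthly_requests, model)


pro_opus = effective_prompts(300, "opus-4.7")
pro_plus_opus = effective_prompts(1_500, "opus-4.7")
print(f"Pro:  {pro_opus} Opus prompts, "
      f"${dollars_per_prompt(10, 300, 'opus-4.7'):.2f} each")
print(f"Pro+: {pro_plus_opus} Opus prompts, "
      f"${dollars_per_prompt(39, 1_500, 'opus-4.7'):.2f} each")
```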
Three critical facts about the current Copilot situation: (1) Pro sign-ups have been paused since April 20, 2026 as GitHub rolls out a "flexible billing experience." (2) Copilot transitions to usage-based billing on June 1, 2026 (11 days from publication); per-credit dollar pricing has not been published as of May 21. (3) Paid-plan subscribers using auto-model-selection receive a 10% multiplier discount. Yesterday (May 20), GitHub removed all Gemini models and GPT-5.2 Codex / GPT-5.4 nano from Copilot Chat on the web. The removal is scoped to the web surface only, not VS Code, JetBrains, or CLI.
08 — Breakeven Analysis
Subscription vs API: how many tasks per month to justify each tier?
The table below models breakeven task volumes at 100K input / 20K output tokens per task (uncached, base rate). These are modeling assumptions, not vendor-disclosed figures — Anthropic's $0.705 worked example anchors the methodology. The 10-tool cost calculator companion post provides the interactive version of this math. For the full per-task / per-user framework, see our agent cost metrics framework.
20–33 tasks/month to break even
Anthropic Pro at $17–20/mo. Breakeven: ~20 tasks/mo via Opus 4.7 uncached ($1.00/task) or ~33 tasks/mo via Sonnet 4.6 ($0.60/task). 5h-window-capped. Limits doubled May 6.
100–167 tasks/month to break even
Max 5× at $100/mo. Breakeven: ~100 tasks/mo (Opus 4.7) or ~167 tasks/mo (Sonnet 4.6). At 5 tasks/day workday pace, Opus 4.7 breaks even at Max 5×. Heavy users should compare vs direct API + caching.
200–333 tasks/month to break even
Max 20× at $200/mo. Breakeven: ~200 tasks/mo (Opus 4.7) or ~333 tasks/mo (Sonnet 4.6). At ~10 tasks/workday, Opus 4.7 at Max 20× is cost-neutral versus direct API, plus subscription convenience.
Multiplier-aware: ~20 effective Opus prompts
Opus 4.7 at 15×: 300 reqs / 15 = 20 effective Opus prompts. GPT-5.5 at 7.5×: ~40 prompts. Sonnet 4.6 at 1×: 300 prompts. Premium requests are not tasks — convert using multiplier before comparing to API spend.
~200 tasks/mo at Composer 2.5 standard rate
At $0.10/task (100K in / 20K out via Composer 2.5 std), $20 subscription is cost-neutral at ~200 tasks/mo. Composer 2.5 Fast: ~33 tasks/mo breakeven at $0.60/task. First-week 2× promo ends ~May 25.
~707 tasks/mo to break even (intro rate)
At $0.14/task (Grok Build, 100K in / 20K out), $99/mo breaks even at ~707 tasks/month. After the 6-month intro expires at ~$300/mo list, breakeven rises to ~2,143 tasks/month — justify only with high-parallelism use of all 8 sub-agent slots.
At 100K input / 20K output per task, Composer 2.5 standard costs $0.10; Opus 4.7 uncached costs $1.00. The 10× token-rate gap translates directly into subscription breakeven math: 200 Composer tasks equal the breakeven of 20 Opus tasks at the same monthly spend.
Digital Applied synthesis, May 21, 2026
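The breakeven figures in this section come from dividing each subscription price by the modeled API cost of one task. A minimal sketch, using the section's 100K input / 20K output assumption and the rates quoted earlier; the scenario labels and function names are hypothetical, and Anthropic Pro is priced at the $20 end of its range.

```python
def per_task(in_rate, out_rate, input_tokens=100_000, output_tokens=20_000):
    """Modeled API cost of one uncached task, in dollars."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def breakeven_tasks(monthly_fee, cost_per_task):
    """Tasks per month at which API spend equals the subscription fee."""
    return monthly_fee / cost_per_task


# (label, monthly fee, input $/Mtok, output $/Mtok)
scenarios = [
    ("Anthropic Pro / Opus 4.7", 20, 5.00, 25.00),
    ("Anthropic Pro / Sonnet 4.6", 20, 3.00, 15.00),
    ("Max 5x / Opus 4.7", 100, 5.00, 25.00),
    ("Max 20x / Opus 4.7", 200, 5.00, 25.00),
    ("Cursor $20 / Composer 2.5 std", 20, 0.50, 2.50),
    ("SuperGrok Heavy intro / Grok Build", 99, 1.00, 2.00),
    ("SuperGrok Heavy list / Grok Build", 300, 1.00, 2.00),
]

results = {}
for label, fee, in_rate, out_rate in scenarios:
    results[label] = breakeven_tasks(fee, per_task(in_rate, out_rate))
    print(f"{label:36s} ~{results[label]:,.0f} tasks/mo")
```

Swapping in your own token volumes per task changes every row, which is why the figures above are modeling assumptions rather than vendor-disclosed numbers.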
09 — Cross-Vendor Pattern
The long-context surcharge pattern: three vendors, same playbook.
The most analytically significant finding in this matrix is the convergence of long-context surcharges across three top vendors — each implemented differently, all charging materially more than the headline rate once a context threshold is crossed. The coverage gap is striking: every major publication reports "1M context" as a headline feature without surfacing the surcharge that applies to it.
Fast Mode — 6× standard rates
Opt-in fast-output mode for Opus 4.6 and 4.7. Full 1M context window at 6× standard rate. Not triggered automatically by context length — must be explicitly selected. No cached-input rate published for Fast mode.
GPT-5.5 272K — 2× input / 1.5× output
Verbatim: 'prompts with >272K input tokens are priced at 2× input and 1.5× output for the full session.' Triggered automatically by context size. The surcharge applies retroactively to the entire session — not just the overflow tokens.
Gemini 3.1 Pro — tiered at 200K
Input doubles and output rises above the 200K threshold ($4 input / $18 output per Mtok). Like OpenAI's surcharge, this reprices both legs, but at a lower threshold. Cached input also doubles: $0.20→$0.40 per Mtok. Cache storage rises to $4.50/hr above 200K.
The strategic implication: teams building long-context agents must model cost at their expected p90 context size, not the baseline rate. An Anthropic agent running Opus 4.7 Fast mode across 1M context costs $30/Mtok input — 6× what the pricing page suggests as the default. A Google agent running Gemini 3.1 Pro against 500K tokens of codebase context costs $4/Mtok input — double the sub-200K rate. Budget for the surcharge, not the headline.
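One way to act on the p90 advice is to log input sizes per session and compare a high percentile against each vendor's threshold. The sketch below uses a nearest-rank percentile. The session sizes are made-up illustrative values, not measurements, and the names are hypothetical.

```python
GPT55_THRESHOLD = 272_000
GEMINI_31_PRO_THRESHOLD = 200_000


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value covering pct% of the samples."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


# Hypothetical per-session input sizes in tokens (illustrative only).
observed_input_tokens = [
    40_000, 65_000, 90_000, 120_000, 150_000,
    180_000, 210_000, 260_000, 310_000, 480_000,
]

p50 = percentile_nearest_rank(observed_input_tokens, 50)
p90 = percentile_nearest_rank(observed_input_tokens, 90)

for label, size in (("p50", p50), ("p90", p90)):
    print(f"{label}: {size:,} tokens | "
          f"GPT-5.5 surcharge: {size > GPT55_THRESHOLD} | "
          f"Gemini 3.1 Pro upper tier: {size > GEMINI_31_PRO_THRESHOLD}")
```

With this sample, the median sits below both thresholds while the p90 crosses both, which is the gap that budgeting on a typical session would miss.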
Our Q2 2026 price vs performance efficient frontier charts each model's post-surcharge effective cost against benchmark quality — the picture shifts significantly once surcharge economics are applied.
10 — Industry Pattern
Four intro promos: one already expired, two expiring in the next 10 days.
Introductory discounts are now the industry default launch mechanism for AI agent pricing. Four separate promos have overlapped in the May 2026 window: one expired on April 30, two expire within 10 days of this post, and one runs for six months. The pattern is consistent: vendors launch with a 2×–3× discount to seed adoption, then revert to a materially higher rate that doubles or triples effective cost per task.
Opus 4.7 on Copilot
Copilot Opus 4.7 launch promo ran at 7.5× multiplier for ~10 days post-launch. On April 30, the promo expired and the multiplier doubled to 15×. Copilot Pro users went from ~40 effective Opus prompts to ~20 overnight.
Composer 2.5 first-week promo
Cursor's first-week double-usage promo for Composer 2.5 ends approximately May 25 — 4 days from publication. After expiry, subscription-included usage reverts to the standard 1× allowance. API token rates remain at $0.50/$2.50.
Codex Pro $100/mo tier
OpenAI's verbatim: 'Double your normal Codex usage on the $100/month tier until May 31, 2026.' Pro 5× reverts to baseline 80–400 GPT-5.5 msgs/5h on June 1. Teams at the $100 tier should plan for 50% message-budget reduction in 10 days.
The fourth promo, SuperGrok Heavy at $99/mo for the first 6 months, has a longer runway but the steepest cliff: the list price of approximately $300/mo (roughly 3× the intro rate) applies after month 6. Teams committing to Grok Build on the basis of the $99 entry price should model their budgets from month 7 onward at ~$300/mo and ensure the per-task economics justify the investment at full price.
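The size of the cliff is clearer as a blended first-year figure. A minimal sketch, assuming the $99 intro rate for six months followed by the approximate $300 list price; the names are hypothetical.

```python
INTRO_PRICE = 99.0
LIST_PRICE = 300.0  # approximate list price quoted above
INTRO_MONTHS = 6


def cumulative_cost(months):
    """Total subscription spend after the given number of months."""
    intro = min(months, INTRO_MONTHS)
    full = max(months - INTRO_MONTHS, 0)
    return intro * INTRO_PRICE + full * LIST_PRICE


first_year = cumulative_cost(12)
blended_monthly = first_year / 12
print(f"first-year total: ${first_year:,.0f}")
print(f"blended monthly:  ${blended_monthly:.2f}")
```

Under these assumptions the first-year blended rate lands close to the $200/mo charged for Max 20×, and every month after that is billed at the full list price.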
The broader projection: introductory pricing as a launch norm means any vendor-disclosed rate published in the first 90 days of a product launch may be materially lower than the steady-state cost. Budget planning for AI agent infrastructure should use post-promo rates as the planning baseline, with the intro discount treated as a temporary reduction — not the long-term price.
11 — 17-Day Timeline
What changed in May 2026: eight pricing events, one matrix.
The May 2026 pricing landscape shifted on at least eight distinct events between April 30 and the publication of this post on May 21. No single publication assembled this chronology before today. The sequence below is sourced from primary vendor changelogs, documentation updates, and the batch research files that anchor this post's figures.
Opus 4.7 Copilot promo expired — 7.5× doubled to 15×
GitHub Copilot's launch-period Opus 4.7 multiplier discount ended. The multiplier rose from 7.5× to 15× permanently, cutting effective Opus 4.7 prompts per Pro plan in half overnight.
Anthropic doubled Claude Code 5-hour rate limits
Anthropic announced doubled 5-hour rate limits for Claude Code across Pro, Max, Team, and Enterprise plans — same subscription cost, 2× throughput. Announced alongside a SpaceX enterprise deal.
GitHub switched Business/Enterprise base model to GPT-5.3-Codex
GitHub Copilot for Business and Enterprise shifted the default base-completion model to GPT-5.3-Codex ($1.75/$14 per Mtok), replacing the prior GPT-5.4 default. Changes Copilot's cost structure for completions without premium-request consumption.
Cursor shipped Composer 2.5 at $0.50/$2.50
Composer 2.5 launched at the lowest frontier-capable per-token rate in the matrix. First-week double-usage promo activated. Fast variant at $3/$15 also available.
Google launched Gemini 3.5 Flash GA + Antigravity 2.0 at I/O
Gemini 3.5 Flash reached GA at $1.50/$9 per Mtok with 1.05M context. Antigravity 2.0 (desktop IDE + agy CLI + SDK) announced at I/O. Managed Agents pricing ($0.08/session-hour) entered public preview.
GitHub pulled Gemini + GPT-5.2 Codex / GPT-5.4 nano from Copilot Chat web
All Gemini models and several others removed from Copilot Chat on the web only. VS Code, JetBrains, and CLI surfaces were not affected by the same announcement.
Composer 2.5 first-week promo ends
Cursor's double-usage first-week promo for Composer 2.5 expires approximately May 25. Standard subscription allowance resumes; API token rates unchanged at $0.50/$2.50.
Codex Pro promo expires + Copilot usage-based billing
Codex Pro $100/mo 2× promo ends May 31 — Pro 5× reverts to 80–400 GPT-5.5 msgs/5h. On June 1, Copilot transitions to usage-based billing; per-credit dollar pricing not published as of this post's date.
12 — OSS BYOK
Cline and Aider: "free" in license, API-spend-bound in practice.
Open-source BYOK tools — Cline (Apache 2.0) and Aider (free and open source) — carry zero license cost and zero subscription overhead. Every dollar of AI spend flows directly to the API provider of your choice at standard published rates, with no markup and no bundled allowance. For teams with the operational maturity to manage API keys and cost attribution, BYOK is the most transparent pricing model in the matrix.
The practical cost profile of an OSS BYOK developer depends entirely on model choice. At our modeled 100K in / 20K out per task:
OSS BYOK effective cost per task at modeled 100K in / 20K out (no cache)
Per-task calculation: (100K × input rate + 20K × output rate) / 1,000,000. Modeling assumptions, not vendor-disclosed.
A developer running Cline with Opus 4.7 at 20 substantive tasks per workday would spend approximately $20/day, or $400–440/month over 20–22 workdays, potentially rising to $540–594 with the 35% tokenizer overhead on representative code inputs. The Continue.dev Team plan ($20/seat/month with $10 in API credits) partially offsets this, but $10 in credits covers only 10 Opus 4.7 tasks at standard rates.
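The monthly BYOK estimate can be reproduced from the per-task formula. The sketch below assumes 20 to 22 workdays per month and applies the 35% tokenizer overhead as a flat multiplier on spend; both are modeling assumptions, and the names are hypothetical.

```python
# Opus 4.7 at $5 / $25 per Mtok, 100K input / 20K output per task.
OPUS_TASK_COST = (100_000 * 5.00 + 20_000 * 25.00) / 1_000_000


def monthly_spend(tasks_per_day, workdays, token_overhead=0.0):
    """Modeled monthly API spend for an uncached BYOK workflow."""
    return tasks_per_day * workdays * OPUS_TASK_COST * (1 + token_overhead)


low = monthly_spend(20, 20)
high = monthly_spend(20, 22)
low_worst = monthly_spend(20, 20, token_overhead=0.35)
high_worst = monthly_spend(20, 22, token_overhead=0.35)
tasks_covered_by_credits = 10 / OPUS_TASK_COST

print(f"baseline:     ${low:.0f} to ${high:.0f} per month")
print(f"+35% tokens:  ${low_worst:.0f} to ${high_worst:.0f} per month")
print(f"$10 credits:  {tasks_covered_by_credits:.0f} Opus 4.7 tasks")
```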
For teams tracking cost per successful task, the cost-per-successful-task metric framework provides the right unit for OSS BYOK vs subscription comparison — raw per-task cost ignores task success rates, which vary significantly by model and workload type. Our AI transformation practice runs cost-attribution benchmarks across BYOK and subscription models for specific codebases and agent patterns before recommending a tier commitment. The 20-platform agentic coding matrix covers OSS BYOK alongside the full subscription landscape.
Headline token rates are the floor — not the price you actually pay.
The May 2026 AI agent pricing landscape has three structural characteristics that headline coverage consistently misses. First, long-context surcharges from Anthropic, OpenAI, and Google mean the advertised rate is the minimum: the effective cost at production context sizes is 1.5× to 6× higher. Second, four overlapping intro promos are unwinding, two of them within 10 days of this post, with rate jumps of 2× to 3× baked in. Third, the Copilot premium-request multiplier system makes per-prompt cost comparison to direct API pricing non-trivial: the unit conversion requires knowing both your model mix and the per-model multiplier.
The 10× token-rate gap between Composer 2.5 standard ($0.50/Mtok) and Opus 4.7 ($5.00/Mtok) is wider than the per-token spread between frontier-capable models in any prior quarter. Whether that gap reflects a genuine capability difference at task-success level, not just benchmark performance, is the question every team should be benchmarking against their own code and agent patterns rather than adopting wholesale from vendor-published numbers.
The most durable conclusion from this matrix is methodological: the right unit for AI agent cost planning is effective $/successful-task, not $/Mtok. That unit requires knowing your task success rate by model, your cache hit rate, your typical context distribution, and whether your workload crosses the long-context surcharge threshold. This post supplies the vendor-side inputs (rates, multipliers, and surcharge thresholds); the usage-side inputs are yours to measure and model against your actual usage pattern.
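As a closing illustration of that unit, the sketch below divides per-attempt cost by success rate, which assumes failed attempts are simply retried at the same cost. The per-attempt costs are the modeled figures from this post. The success rates are placeholder values chosen only to show the mechanics; they are not measurements of any model.

```python
def cost_per_successful_task(cost_per_attempt, success_rate):
    """Expected spend per successful task, assuming failures are retried at equal cost."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# (modeled cost per attempt, PLACEHOLDER success rate: replace with your own data)
candidates = {
    "opus-4.7": (1.00, 0.80),
    "composer-2.5-standard": (0.10, 0.50),
}

effective = {
    name: cost_per_successful_task(cost, rate)
    for name, (cost, rate) in candidates.items()
}
for name, value in effective.items():
    print(f"{name:24s} ${value:.2f} per successful task")
```

The ranking between two models depends on the success rates you measure, so this calculation is only as good as the benchmark behind those inputs.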