Cursor Composer 2.5 launched today (May 18, 2026) at $0.50/$2.50 per million tokens standard — on a Kimi K2.5 base with post-training compute that reportedly exceeded the pretraining investment. At face value that makes it roughly one-tenth the per-token cost of Claude Opus 4.7 ($5/$25). But per-Mtok rates alone don't answer the question engineering teams actually need: at what retry count does the cheaper model stop winning on total task cost?

Three vendor-documented details move the real numbers away from the rate card. Anthropic publishes a 0.1x cache-read multiplier, so a team running Claude Code with an 80% cache-hit rate drops Opus 4.7's modeled per-task cost from $1.00 to roughly $0.64. Working in the other direction, the Opus 4.7 tokenizer note discloses “up to 35% more tokens for the same fixed text” versus prior models: same headline rate, higher effective cost. And GPT-5.5's 1M context window carries a 2x input / 1.5x output surcharge for the full session once input exceeds 272K tokens.

This guide derives every per-task figure from vendor-documented token rates. It walks the breakpoint math across scenarios whose breakeven ranges from about 6 to 66 retries, and it names the surcharges that make “cheap” more expensive than the rate card suggests. All per-task dollar amounts are modeled from a 100K-input / 20K-output task base. That base is a mid-point modeling assumption, not a vendor-disclosed figure.

Key takeaways

01
Composer 2.5 standard is roughly 10x cheaper per token than Opus 4.7. At $0.50/$2.50 vs $5/$25 per Mtok, Composer 2.5 standard works out to a modeled $0.10/task vs $1.00/task at 100K input / 20K output. The breakeven retry count is 10 with no caching on either side.
02
Anthropic's 0.1x cache-read multiplier shifts the breakpoint by roughly 30-45% at high hit rates. At an 80% cache hit, Opus 4.7's effective per-task cost drops to ~$0.64. That narrows the breakeven from 10 retries to roughly 6 before Opus 4.7 wins on total cost.
03
Opus 4.7's tokenizer overhead adds up to 35% effective cost vs 4.6. Anthropic discloses that the Opus 4.7 tokenizer 'may use up to 35% more tokens for the same fixed text' compared to prior models. The per-Mtok headline rate is identical to 4.6; the effective per-task cost is not.
04
GPT-5.5's 1M context is not flat-priced above 272K input tokens. OpenAI documents a 2x input / 1.5x output surcharge that applies to the full session for any prompt exceeding 272K input tokens. Opus 4.7's 1M context is flat-priced, which makes it a real and under-priced differentiator.
05
For most 5-10 iteration tasks, Composer 2.5 standard wins on total cost. Even against fully-cached Opus 4.7, Composer 2.5 standard wins at typical retry rates. The decision then shifts to lock-in tolerance, audit-trail needs, and whether flat-rate long context matters for your workload.

01 · Launch Math: Composer 2.5 shipped today. Here's the cost math no vendor publishes.

Cursor launched Composer 2.5 on May 18, 2026. Per the Cursor launch blog, the standard variant is priced at $0.50/M input and $2.50/M output tokens — verbatim from the post. A faster variant with “the same intelligence” runs at $3.00/$15.00 per Mtok, which Cursor explicitly positions as “a lower cost than the fast tiers of other frontier models.” The Cursor changelog adds that Composer 2.5 includes double usage for the first week as a launch promotion.

Composer 2.5 is built on a Kimi K2.5 base. Cursor's earlier Composer 1.5 disclosure says “the total compute invested in post-training even surpasses the amount used to pretrain the base model.” That points to the structural reason Cursor can price at one-tenth of frontier-API rates: the cost basis is RL on an open-weight base, not from-scratch pretraining. This is a fundamentally different COGS structure from Anthropic's or OpenAI's. Our earlier Composer 1.5 guide covers the RL-at-scale post-training mechanics in more detail.

The gap between $0.50 and $5 per Mtok input is obvious. What's less obvious is how far the effective gap moves once you account for cache discipline, tokenizer inflation, and context-window surcharges. The following sections derive each of those adjustments from vendor-published documentation.

02 · Pricing Row: Headline $/Mtok rates across all comparators.

The figures below rank each model by headline input rate as published by its vendor as of May 18, 2026, with the output rate listed alongside. All figures come from Anthropic's pricing documentation, OpenAI's pricing docs, and the Cursor launch post. Our Q2 2026 LLM API pricing index tracks full input/output rate pairs across the wider market.

Headline rates by model · $/Mtok (input / output) · sorted by input rate, lower is cheaper

Sources: Anthropic pricing docs, OpenAI pricing docs, Cursor blog, May 18, 2026

Composer 2.5 standard · Cursor · Kimi K2.5 base · launched May 18, 2026: $0.50 in / $2.50 out
GPT-5.3-codex · OpenAI · dedicated Codex SKU: $1.75 in / $14 out
GPT-5.4 · OpenAI · standard: $2.50 in / $15 out
Sonnet 4.6 · Anthropic · standard: $3.00 in / $15 out
Composer 2.5 Fast · Cursor · faster variant: $3.00 in / $15 out
Opus 4.7 standard · Anthropic · 1M context flat-priced: $5.00 in / $25 out
GPT-5.5 standard · OpenAI · 2x/1.5x surcharge above 272K input: $5.00 in / $30 out
Opus 4.7 Fast · Anthropic · 6x standard: $30.00 in / $150 out
GPT-5.5-pro · OpenAI · same 272K surcharge rule: $30.00 in / $180 out

A few patterns are worth noting before the per-task math. First, Composer 2.5 Fast and Sonnet 4.6 share identical headline rates ($3.00 in / $15 out). Sonnet 4.6 has a documented cache-read multiplier, while Cursor does not publish one for Composer 2.5 Fast. Second, Opus 4.7 Fast and GPT-5.5-pro share the same $30 input headline. GPT-5.5-pro charges more for output ($180 vs $150 per Mtok), and its long-context surcharge can make it materially more expensive on large contexts. Third, GPT-5.3-codex at $1.75 in / $14 out is the value SKU for Codex-specific workflows. Its 8:1 output-to-input price ratio is the highest in this field; every other model sits at 5:1 or 6:1.

03 · Base Model: The 100K-in / 20K-out modeling assumption.

Per-task cost requires a token volume assumption. This calculator uses 100,000 input tokens and 20,000 output tokens as the base case — a mid-point estimate for a 10-iteration agentic coding task on a single feature. It is explicitly a modeling assumption, not a vendor-disclosed figure. Real agent-loop consumption varies 5-10x across task types and loop structures depending on context window usage, tool-definition tokens, and reasoning verbosity.

For reference, Anthropic's own worked example uses 50K input / 15K output for a one-hour coding session — a smaller task than our base case. The per-loop modeling literature (5-15K input + 1-3K output per iteration) suggests a 10-iteration task maps to roughly 50-150K input and 10-30K output tokens. Our 100K/20K choice sits at the mid-range and keeps the math clean. Every per-task figure in this post scales linearly from this base; plug in your own observed volumes to adjust.
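Because every figure here scales linearly with token volume, the model reduces to one multiply-and-add per direction. The sketch below assumes headline rates only, with no caching, surcharges, or tokenizer adjustment. The dictionary keys are illustrative labels, not vendor API model identifiers.

```python
"""Per-task cost at headline rates: a linear model in token volume."""

# (input $/Mtok, output $/Mtok), headline rates quoted in this post.
# Keys are illustrative labels, not vendor API model IDs.
RATES_PER_MTOK = {
    "composer-2.5-standard": (0.50, 2.50),
    "opus-4.7-standard": (5.00, 25.00),
}


def per_task_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    """Dollar cost of one task; no caching, surcharges, or tokenizer inflation."""
    rate_in, rate_out = RATES_PER_MTOK[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000


if __name__ == "__main__":
    # Low end, base case, and high end of the 10-iteration range discussed above.
    for inp, out in [(50_000, 10_000), (100_000, 20_000), (150_000, 30_000)]:
        composer = per_task_cost(inp, out, "composer-2.5-standard")
        opus = per_task_cost(inp, out, "opus-4.7-standard")
        print(f"{inp:>7,} in / {out:>6,} out: Composer ${composer:.3f}, Opus ${opus:.3f}")
```

Replace the volume pairs with your own measured loop consumption. The 100K/20K base is only a default.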

Anthropic also discloses that tool use carries a per-request overhead. On Claude 4.x models, the tool-use system prompt adds 313-346 tokens per request (the exact figure depends on the tool_choice setting), plus the tokens for each tool definition schema. Across a 10-iteration loop, the system prompt alone adds roughly 3,100-3,500 input tokens before any task context is loaded, and the tool schemas resent on every request add more on top. This overhead is invisible in $/Mtok comparisons that don't adjust for loop count. Anthropic publishes these figures, but Cursor and OpenAI publish no equivalent disclosure.
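Because the overhead compounds per request, it is worth estimating explicitly. The sketch below uses the 313 and 346 token figures quoted above as low and high bounds. SCHEMA_TOKENS_PER_TOOL is a hypothetical placeholder, not an Anthropic figure. The code also assumes every request resends all schemas uncached.

```python
"""Rough tool-use overhead across an agent loop (Claude 4.x figures)."""

# Per-request tool-use system prompt, low/high bounds quoted in the text.
SYSTEM_PROMPT_TOKENS = {"low": 313, "high": 346}
# HYPOTHETICAL average schema size per tool definition; measure your own.
SCHEMA_TOKENS_PER_TOOL = 150


def tool_overhead_tokens(iterations: int, num_tools: int, system_prompt_tokens: int,
                         schema_tokens_per_tool: int = SCHEMA_TOKENS_PER_TOOL) -> int:
    """Input tokens spent on tool plumbing, assuming all schemas are resent each request."""
    per_request = system_prompt_tokens + num_tools * schema_tokens_per_tool
    return iterations * per_request


def overhead_cost(tokens: int, input_rate_per_mtok: float) -> float:
    """Dollar cost of the overhead tokens at an uncached input rate."""
    return tokens * input_rate_per_mtok / 1_000_000


if __name__ == "__main__":
    for label, prompt_tokens in SYSTEM_PROMPT_TOKENS.items():
        prompt_only = tool_overhead_tokens(10, 10, prompt_tokens, schema_tokens_per_tool=0)
        with_schemas = tool_overhead_tokens(10, 10, prompt_tokens)
        print(f"{label}: {prompt_only:,} tokens (system prompt only), "
              f"{with_schemas:,} with schemas, "
              f"${overhead_cost(with_schemas, 5.00):.4f} at $5/Mtok")
```

With zero schema tokens, the result reproduces the roughly 3,100-3,500 token system-prompt figure. Any realistic schema size pushes the total well past it.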

At the 100K-in / 20K-out base case, using headline rates with no caching or surcharges, per-task costs work out as follows: Composer 2.5 standard $0.10; GPT-5.3-codex $0.455; GPT-5.4 $0.55; Sonnet 4.6 $0.60; Composer 2.5 Fast $0.60; Opus 4.7 standard $1.00; GPT-5.5 standard $1.10; Opus 4.7 Fast $6.00; GPT-5.5-pro $6.60. Cursor does not publicly document cache pricing for Composer 2.5, so no cached figure is estimated for either Composer variant. Our agent token-budget calculator framework offers a methodology for modeling your own loop volumes before committing to a vendor.

04 · Anthropic Anchor: The $0.705 worked example as our calibration point.

Anthropic's pricing documentation contains the one vendor-anchored per-session cost figure in this comparison. It reads, verbatim: “A one-hour coding session using Claude Opus 4.7 that consumes 50,000 input tokens and 15,000 output tokens: Input tokens 50,000 × $5 / 1,000,000 = $0.25 + Output tokens 15,000 × $25 / 1,000,000 = $0.375 + Session runtime 1.0 hour × $0.08 = $0.08 = Total $0.705.”

Two important caveats apply to this worked example. First, the $0.08/session-hour runtime charge is a Managed Agents SKU. It applies only to Claude Managed Agents, not to Claude Code, raw API calls, or any other Anthropic surface. If you're running raw API or Claude Code, drop the $0.08 line from the calculation. Second, the 50K input / 15K output example is smaller than our modeled 100K/20K base: it has half the input and three-quarters of the output. At standard Opus 4.7 rates, our 100K/20K task maps to $1.00 in token costs, excluding any session-hour charge.

Applying the same math to Composer 2.5 standard at 100K/20K gives 100K × $0.50/Mtok + 20K × $2.50/Mtok = $0.05 + $0.05 = $0.10 per task. Against Opus 4.7's $1.00, that is exactly 10x. On the Anthropic 50K/15K session, the calculation is 50K × $0.50/Mtok + 15K × $2.50/Mtok = $0.025 + $0.0375 = $0.0625 per session. Opus 4.7 costs $0.625 for the same session in pure token costs, excluding the Managed Agents runtime. The 10x ratio holds at both task sizes because Composer 2.5 standard's input and output rates are each exactly one-tenth of Opus 4.7's.
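The calibration can be checked mechanically. The sketch below keeps the runtime charge optional and defaults it to zero, because it applies only to Claude Managed Agents. The function name is illustrative.

```python
"""Reproduce Anthropic's Opus 4.7 worked example and the Composer 2.5 equivalent."""

MANAGED_AGENTS_RUNTIME_PER_HOUR = 0.08  # Claude Managed Agents only


def session_cost(input_tokens: int, output_tokens: int, rate_in: float, rate_out: float,
                 runtime_hours: float = 0.0) -> float:
    """Token cost plus optional Managed Agents runtime (leave at 0 for raw API / Claude Code)."""
    token_cost = (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000
    return token_cost + runtime_hours * MANAGED_AGENTS_RUNTIME_PER_HOUR


if __name__ == "__main__":
    opus_managed = session_cost(50_000, 15_000, 5.00, 25.00, runtime_hours=1.0)
    opus_tokens = session_cost(50_000, 15_000, 5.00, 25.00)
    composer = session_cost(50_000, 15_000, 0.50, 2.50)
    print(f"Opus 4.7, Managed Agents: ${opus_managed:.3f}")
    print(f"Opus 4.7, tokens only:    ${opus_tokens:.3f}")
    print(f"Composer 2.5 standard:    ${composer:.4f}")
    print(f"Ratio: {opus_tokens / composer:.1f}x")
```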

Vendor-anchored calibration

Anthropic's documented worked example: 50K input / 15K output on Opus 4.7 standard = $0.625 in token costs (+$0.08 Managed Agents runtime = $0.705 total). This is the only vendor-anchored per-session total in this comparison. Source: Anthropic pricing documentation. The $0.08/session-hour runtime charge applies only to Managed Agents — not Claude Code or raw API.

05 · Cache Lever: Cache-hit rate as the hidden 30-45% lever.

Anthropic's pricing documentation sets three cache multipliers: a 5-minute cache write at 1.25x base input price, a 1-hour cache write at 2x base input price, and a cache read (hit) at 0.1x base input price. That 0.1x read multiplier is the lever most cost analyses ignore. When 80% of a session's input tokens are cache reads, the effective input cost drops to 28% of the headline rate (0.2 × 1.0 + 0.8 × 0.1 = 0.28), ignoring cache-write premiums. Caching discounts input only; output tokens are always billed at the full rate.

Anthropic's own worked example confirms the magnitude. The same one-hour Opus 4.7 session, with 40,000 of its 50,000 input tokens (80%) as cache reads, totals $0.525, a 25.5% reduction from the $0.705 uncached figure. We can extrapolate to our 100K/20K base with an 80% cache hit (80K cache reads + 20K new tokens). Effective input cost = (80K × $0.50/Mtok) + (20K × $5/Mtok) = $0.04 + $0.10 = $0.14. Output cost adds 20K × $25/Mtok = $0.50. The total is $0.64/task at an 80% cache hit, a 36% reduction from the uncached $1.00.
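The same formula generalizes to any hit rate. The sketch below applies the 0.1x read multiplier to the cached share of input and leaves output undiscounted. It ignores the 1.25x/2x cache-write premiums, so it slightly flatters cached Anthropic costs. The function name is illustrative.

```python
"""Cache-aware per-task cost using Anthropic's 0.1x cache-read multiplier."""

CACHE_READ_MULTIPLIER = 0.1  # cache read (hit) price relative to base input


def cached_task_cost(input_tokens: int, output_tokens: int, rate_in: float,
                     rate_out: float, hit_rate: float) -> float:
    """Cost when `hit_rate` of input tokens are cache reads.

    Ignores cache-write premiums (1.25x / 2x), so it slightly understates
    cost for sessions that must first populate the cache. Output is never
    discounted by caching.
    """
    cached = input_tokens * hit_rate
    fresh = input_tokens - cached
    input_cost = (cached * CACHE_READ_MULTIPLIER + fresh) * rate_in
    return (input_cost + output_tokens * rate_out) / 1_000_000


if __name__ == "__main__":
    # Anthropic's anchor: 40K of 50K input as cache reads, plus $0.08 runtime.
    anchor = cached_task_cost(50_000, 15_000, 5.00, 25.00, hit_rate=0.8) + 0.08
    print(f"Anthropic anchor: ${anchor:.3f}")
    # Cache-tier ladder at the 100K/20K base, versus Composer's $0.10/task.
    for hit in (0.0, 0.2, 0.4, 0.6, 0.8, 0.95):
        opus = cached_task_cost(100_000, 20_000, 5.00, 25.00, hit)
        sonnet = cached_task_cost(100_000, 20_000, 3.00, 15.00, hit)
        print(f"{hit:>4.0%} hit: Opus 4.7 ${opus:.2f} ({opus / 0.10:.1f}x), "
              f"Sonnet 4.6 ${sonnet:.2f} ({sonnet / 0.10:.1f}x)")
```

The ladder shows why caching alone cannot close the gap. At a 100% hit rate, Opus 4.7 still costs $0.55/task, because its $0.50 output line is untouched.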

Cursor publishes no cache pricing for Composer 2.5. The only “discount” Cursor advertises is the first-week double-usage promo — a free-credit promotion, not a per-token cache lever. This is a transparency gap that matters for long-running agent loops where context reuse would otherwise be a significant cost reducer.

Anthropic also discloses that Batch API and prompt caching discounts can be combined. A cached batch call costs 0.1x × 0.5x = 0.05x standard input on cache reads — effectively 95% cheaper than uncached synchronous input for high-hit-rate workloads.

0% cache hit

No caching (baseline)

Opus 4.7: $1.00/task. Sonnet 4.6: $0.60/task. Multiplier vs Composer 2.5 standard: 10x (Opus), 6x (Sonnet). The headline rate gap at its widest.

Composer 2.5 wins

20% cache hit

Light context reuse

Opus 4.7: ~$0.91/task. Sonnet 4.6: ~$0.55/task. Multiplier vs Composer 2.5 standard: ~9.1x (Opus), ~5.5x (Sonnet). Modest improvement from caching.

Composer 2.5 wins

40% cache hit

Moderate reuse

Opus 4.7: ~$0.82/task. Sonnet 4.6: ~$0.49/task. Multiplier vs Composer 2.5 standard: ~8.2x (Opus), ~4.9x (Sonnet).

Composer 2.5 wins

60% cache hit

Strong reuse

Opus 4.7: ~$0.73/task. Sonnet 4.6: ~$0.44/task. Multiplier vs Composer 2.5 standard: ~7.3x (Opus), ~4.4x (Sonnet). Cache discipline narrows the gap noticeably.

Composer 2.5 wins

80% cache hit

High discipline

Opus 4.7: ~$0.64/task. Sonnet 4.6: ~$0.38/task. Multiplier vs Composer 2.5 standard: ~6.4x (Opus), ~3.8x (Sonnet). Breakeven at ~6.4 retries for Opus 4.7. Anthropic-anchored: the 50K/15K example with 40K cache reads (80%) totals $0.525.

Composer 2.5 still wins

95% cache hit

Near-optimal reuse

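Opus 4.7: ~$0.57/task. Sonnet 4.6: ~$0.34/task. Multiplier vs Composer 2.5 standard: ~5.7x (Opus), ~3.4x (Sonnet). Because caching discounts only input, output cost sets a floor: $0.50 for Opus 4.7 and $0.30 for Sonnet 4.6. Best-case caching still leaves Composer 2.5 cheaper per single task.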

Composer 2.5 wins single task

The cache-hit rates above are illustrative modeling assumptions. Anthropic publishes the multiplier, but the hit rate is entirely workload-dependent. The tiers also ignore cache-write premiums (1.25x or 2x base input on first write), so they somewhat understate Anthropic's effective cost. A team running Claude Code on a large, stable codebase may see 60-80% cache hits; a team running fresh agentic tasks on novel inputs may see 10-20%. Make the cache-hit assumption explicit in any cost model you build.

06 · Tokenizer Trap: Opus 4.7 uses up to 35% more tokens than 4.6. Same rate, different effective cost.

Buried in Anthropic's Opus 4.7 announcement and repeated in the pricing documentation is this disclosure: “Opus 4.7 uses a new tokenizer compared to previous models, contributing to its improved performance on a wide range of tasks. This new tokenizer may use up to 35% more tokens for the same fixed text.” The per-Mtok rate is identical to Opus 4.6 at $5/$25. The effective per-task cost is not.

What this means in practice: if you have historical Opus 4.6 cost data and are migrating a workload to Opus 4.7, your cost may increase by up to 35% before any rate difference — because the same system prompt, tool definitions, and context window will tokenize into more tokens on the new model. On our modeled 100K/20K base case, a 35% tokenizer inflation produces an effective volume of 135K input / 27K output, for a per-task cost of 135K × $5/Mtok + 27K × $25/Mtok = $0.675 + $0.675 = $1.35/task — 35% above the $1.00 headline figure.
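Since Anthropic gives only an upper bound, it helps to sweep the inflation factor rather than assume the worst case. The sketch below assumes inflation applies uniformly to input and output tokens; in practice the overhead varies by content type. The function name is illustrative.

```python
"""Effective Opus 4.7 cost under tokenizer inflation (same $/Mtok, more tokens)."""


def inflated_task_cost(input_tokens: int, output_tokens: int, rate_in: float,
                       rate_out: float, inflation: float) -> float:
    """Scale both token counts by (1 + inflation) before pricing.

    Assumes inflation applies uniformly to input and output; real overhead
    varies by content type, and 0.35 is Anthropic's stated upper bound.
    """
    scale = 1.0 + inflation
    return (input_tokens * scale * rate_in + output_tokens * scale * rate_out) / 1_000_000


if __name__ == "__main__":
    for inflation in (0.0, 0.10, 0.20, 0.35):
        cost = inflated_task_cost(100_000, 20_000, 5.00, 25.00, inflation)
        print(f"+{inflation:.0%} tokens: ${cost:.2f}/task")
```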

This is a worst-case scenario and the actual overhead will vary by content type (code and structured data tend to tokenize differently from natural language). But it is vendor-disclosed, not speculative. Any cost comparison between Opus 4.6 and Opus 4.7 that uses the same token volume for both models is underestimating Opus 4.7's true cost by up to 35%.

Headline $/Mtok rates undersell the true cost shift when the tokenizer itself inflates volume. The up-to-35% overhead is vendor-disclosed, not a modeling assumption. (Digital Applied synthesis, May 18, 2026)

07 · Long Context: GPT-5.5 above 272K input is priced at 2x/1.5x for the full session.

The OpenAI GPT-5.5 model page documents a 1,050,000-token context window with 128,000 max output tokens. The same page includes this verbatim rule: “prompts with >272K input tokens are priced at 2x input and 1.5x output for the full session.” The phrase “for the full session” is load-bearing. Once you cross the 272K threshold, every token in that session — including the tokens you added before crossing it — is billed at the surcharge rate.

Consider a session with 600K input tokens. At standard rates, that's 600K × $5/Mtok = $3.00 in input costs. With the 2x surcharge triggered, input costs $6.00. Output takes a smaller multiplier on a higher base: 1.5x on $30/Mtok output = $45/Mtok for sessions that cross the threshold. GPT-5.5-pro follows the same rule: $30/Mtok standard input becomes $60/Mtok above 272K, and $180/Mtok output becomes $270/Mtok. On a 600K-input session at the pro tier, input costs jump from $18.00 to $36.00.

Contrast this with Anthropic's positioning. Anthropic's documentation states explicitly: “Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing. A 900k-token request is billed at the same per-token rate as a 9k-token request.” That flat-rate 1M context is an under-priced differentiator — one the “GPT-5.5 has 1M context” headlines typically omit. For agentic coding workflows that routinely carry large context windows, the effective cost comparison between GPT-5.5 and Opus 4.7 changes materially above 272K input tokens.
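The full-session rule creates a cliff rather than a marginal surcharge. The sketch below assumes that a single request's input count decides whether the surcharge applies; how OpenAI delimits a billing "session" is not modeled. Output volume is set to zero so the figures line up with the input-only examples above.

```python
"""GPT-5.5 full-session long-context surcharge vs Opus 4.7 flat pricing."""

SURCHARGE_THRESHOLD = 272_000  # input tokens, per OpenAI's GPT-5.5 model page
INPUT_SURCHARGE, OUTPUT_SURCHARGE = 2.0, 1.5


def gpt55_session_cost(input_tokens: int, output_tokens: int,
                       rate_in: float, rate_out: float) -> float:
    """Above the threshold, EVERY token in the session pays the surcharge."""
    if input_tokens > SURCHARGE_THRESHOLD:
        rate_in *= INPUT_SURCHARGE
        rate_out *= OUTPUT_SURCHARGE
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000


def flat_session_cost(input_tokens: int, output_tokens: int,
                      rate_in: float, rate_out: float) -> float:
    """Opus 4.7 style: the same per-token rate across the full 1M window."""
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000


if __name__ == "__main__":
    print(f"GPT-5.5, 600K in:      ${gpt55_session_cost(600_000, 0, 5.00, 30.00):.2f}")
    print(f"GPT-5.5-pro, 600K in:  ${gpt55_session_cost(600_000, 0, 30.00, 180.00):.2f}")
    print(f"Opus 4.7, 600K in:     ${flat_session_cost(600_000, 0, 5.00, 25.00):.2f}")
    # The cliff: one extra input token roughly doubles the input bill.
    print(f"GPT-5.5 at 272,000 in: ${gpt55_session_cost(272_000, 0, 5.00, 30.00):.2f}")
    print(f"GPT-5.5 at 272,001 in: ${gpt55_session_cost(272_001, 0, 5.00, 30.00):.2f}")
```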

The 272K trap in numbers

GPT-5.5 at 600K input tokens: standard rate would be $3.00 input. With 2x surcharge triggered: $6.00 input. GPT-5.5-pro at the same volume: $18.00 standard → $36.00 with surcharge. Opus 4.7 at 600K input: $3.00 — flat-priced, no surcharge. Source: OpenAI GPT-5.5 model page and Anthropic pricing docs.

The GPT-5.5 vs Opus 4.7 head-to-head covers the quality side of the comparison, including a deeper look at GPT-5.5's context architecture. This post is the cost side.

08 · Breakpoint: At what retry count does cheap stop winning?

The breakeven formula is simple. Suppose Composer 2.5 standard needs N retries to complete what Opus 4.7 does in one, counting every attempt including the first. The per-task costs equalize at N = (Opus 4.7 per-task cost) / (Composer 2.5 per-attempt cost). At the modeled 100K/20K base with no caching on either side, that is $1.00 / $0.10 = 10 retries. Below 10 retries, Composer 2.5 wins on total cost; above 10, Opus 4.7 wins.

Published anecdotes on production coding agents suggest cheaper models need 1.2-2x more retries on hard tasks, not 5-10x more. Under that level of retry inflation, Composer 2.5 standard wins on cost in every scenario modeled here. The closest case is heavily cached Opus 4.7. Even there, the breakeven (about 6.4 retries at an 80% cache hit) sits well above the 1.2-2x inflation those anecdotes describe.

With a 40% cache hit on Opus 4.7 (no cache on Composer 2.5): N = $0.82 / $0.10 ≈ 8 retries. With an 80% cache hit: N = $0.64 / $0.10 ≈ 6.4 retries. For workloads where Opus 4.7 Fast mode is in play (no cache, $6.00/task): N = $6.00 / $0.10 = 60 retries. Against GPT-5.5-pro (no cache, $6.60/task): N = 66 retries.

The cost-per-successful-task metric is the right frame for this calculation — not per-token rates in isolation. A model that completes a task on the first try at $1.00 may be cheaper than a model that costs $0.10 per attempt but needs 12 tries.
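The breakpoint arithmetic fits in a few lines. The sketch below hard-codes the premium per-task costs derived at the 100K/20K base; the cached figures use the 0.1x read multiplier and ignore write premiums. It treats every attempt as costing the same, which is an assumption: real retries may carry longer context than the first run.

```python
"""Breakeven retry count: cheap-model attempts that equal one premium run."""


def breakeven_attempts(premium_cost_per_task: float, cheap_cost_per_attempt: float) -> float:
    """N at which N cheap attempts cost the same as one premium task."""
    return premium_cost_per_task / cheap_cost_per_attempt


def total_cost(cost_per_attempt: float, attempts: int) -> float:
    """Total spend when a task takes `attempts` runs, counting the first."""
    return cost_per_attempt * attempts


COMPOSER_PER_ATTEMPT = 0.10  # modeled 100K in / 20K out, no cache

# Premium per-task costs at the 100K/20K base (cache tiers use the 0.1x read rate).
SCENARIOS = {
    "Opus 4.7, no cache": 1.00,
    "Opus 4.7, 40% cache hit": 0.82,
    "Opus 4.7, 80% cache hit": 0.64,
    "Opus 4.7 Fast, no cache": 6.00,
    "GPT-5.5-pro, no cache": 6.60,
}

if __name__ == "__main__":
    for name, cost in SCENARIOS.items():
        n = breakeven_attempts(cost, COMPOSER_PER_ATTEMPT)
        print(f"{name:<26} breakeven at {n:.1f} attempts")
    # The 12-try example: $1.20 total, more than a $1.00 first-try success.
    print(f"12 cheap tries: ${total_cost(COMPOSER_PER_ATTEMPT, 12):.2f}")
```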

No-cache baseline

Opus 4.7 vs Composer 2.5 std

10 retries

Breakeven at 10 retries with no caching on either model. At 100K in / 20K out: Composer $0.10/task vs Opus $1.00/task. Modeled basis, not vendor-disclosed.

No cache on either side

80% cache hit

Opus 4.7 at 80% cache

~6.4 retries

An 80% cache hit drops Opus 4.7 to ~$0.64/task, putting breakeven at ~6.4 retries. Even disciplined Anthropic caching leaves Composer 2.5 winning on single-task cost.

Anthropic 0.1x cache-read

Fast mode barrier

Opus 4.7 Fast, no cache

60 retries

Opus 4.7 Fast at $30/$150 per Mtok = $6.00/task at our modeled volume. Breakeven vs Composer 2.5 standard: 60 retries. Fast mode is rarely the cost-optimal choice for high-volume loops.

Opus 4.7 Fast vs Composer std

Pro tier ceiling

GPT-5.5-pro, no cache

66 retries

GPT-5.5-pro at $30/$180 per Mtok = $6.60/task. Breakeven vs Composer 2.5 standard: 66 retries. Same surcharge rule applies above 272K input — effective cost rises further in long-context loops.

GPT-5.5-pro vs Composer std

09 · Verdict: When each model wins the cost argument.

The breakpoint math above gives the retry-count conditions. The routing decision also depends on workload type, lock-in tolerance, audit-trail requirements, and whether flat-rate long-context matters — factors that don't appear in $/Mtok comparisons. The matrix below summarizes when each model cluster earns its cost.

For teams beginning an AI transformation engagement, the most common mistake is selecting a model family on headline rates before auditing actual loop structures. Token volume per task, cache-hit rate, and retry distribution should be measured on representative prompts before any cost model is finalized.

Cost-sensitive scaffolding

Route to Composer 2.5 standard

$0.50 in / $2.50 out · no cache pricing published

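This route fits high-volume agentic loops where cost per task dominates and retry inflation stays moderate. The threshold is well under the ~6.4-retry breakeven against 80%-cached Opus 4.7, or 10 retries against uncached Opus. Under those conditions, Composer 2.5 standard wins on total cost even against cached Opus 4.7. The first-week double-usage promo extends the runway further.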

Best cost per task at typical retry rates

1M-context architecture

Route to Opus 4.7 standard

$5 in / $25 out · flat-priced to 1M tokens

Along with Sonnet 4.6, Opus 4.7 is one of only two models in this comparison that combine flat-priced 1M context with a documented 0.1x cache-read multiplier. It matches GPT-5.5 on input ($5) and undercuts it on output ($25 vs $30). For tasks that routinely exceed 272K input tokens, the gap widens further because GPT-5.5's full-session surcharge kicks in. Add 80% cache discipline and effective cost drops to ~$0.64/task.

Best for long-context + cache-heavy workloads

Codex-specific SKU work

Route to GPT-5.3-codex ($1.75/$14)

$1.75 in / $14 out · $0.175 cached input

The dedicated Codex SKU at $1.75 in / $14 out sits between Composer 2.5 standard and Sonnet 4.6 on input cost. Its 8:1 output-to-input price ratio makes output-heavy code generation relatively costly compared with its input price. The best fit is code-understanding or analysis workflows with lower output volumes.

Best for Codex-native workflows at mid-tier cost

Terminal / shell tasks

Consider GPT-5.5 (TBench wins despite cost)

$5 in / $30 out · 2x/1.5x above 272K input

GPT-5.5 benchmarks strongly on terminal-command and shell-script tasks. If your workload stays under 272K input tokens per session, the standard rate matches Opus 4.7 on input but outputs are 20% more expensive per token. Above 272K, the full-session surcharge makes it significantly more expensive.

Only viable under 272K input per session

One cross-reference worth reviewing before finalizing any routing decision: the Composer 2.5 launch guide covers the intelligence and benchmark side of today's release. This post covers the cost side. They are companion reads.

Conclusion

The 10x gap shrinks fast — but Composer 2.5 still wins most typical workloads.

The headline 10x cost gap between Composer 2.5 standard ($0.10/task) and Opus 4.7 standard ($1.00/task) shrinks when you apply cache discipline, though less than the input discount alone suggests. Anthropic's documented 0.1x cache-read multiplier, combined with an 80% cache-hit rate, drops Opus 4.7's effective per-task cost to roughly $0.64: a 6.4x gap, not 10x. Caching cannot touch the $0.50 output line, so even a perfect hit rate leaves Opus 4.7 at about $0.55/task. The Opus 4.7 tokenizer overhead (up to 35% more tokens for the same fixed text) pushes the other way, adding up to 35% back on top. Neither adjustment appears in the headline $/Mtok comparison most teams run.

The hidden surcharges matter more than the rate card. GPT-5.5's 1M context claim is real, but above 272K input tokens the 2x input / 1.5x output surcharge applies to the full session, not just the tokens above the threshold. Opus 4.7's 1M context is genuinely flat-priced across the full window. For agentic coding loops that carry large context windows (codebases, extended reasoning chains, full repository scans), that distinction changes the effective cost comparison significantly. The per-token headline makes GPT-5.5 look close to Opus 4.7, with identical input and 20% higher output. The surcharge rule makes it materially more expensive above 272K.

For most agentic coding workloads — five to ten iterations per task, moderate context windows, mixed cache-hit rates — Composer 2.5 standard wins on total cost even against fully-cached Opus 4.7. The decision narrows to three non-cost factors: lock-in tolerance to a Cursor-hosted model, audit-trail and compliance needs that favor direct API access, and whether flat-rate long-context is an architectural requirement. If none of those apply, the per-task math is clear. For workloads where they do apply, the Opus 4.7 cost strategy guide covers the full cache and context optimization playbook.

Agent Coding Cost: Composer 2.5 vs Opus vs GPT-5.5
