AI coding agent pricing comparisons almost universally anchor on one number: dollars per million tokens. That framing is practically useless for engineering teams because the dominant cost driver isn't token rate — it's loop count, the number of plan-edit-verify iterations an agent runs before the task passes. This guide builds the missing math: per-task cost across 10 tools, modeled at light (1–3 loops), moderate (8–12 loops), and heavy (20–30 loops) workloads.
The stakes are real. Industry-reported figures suggest Claude Code costs roughly $6 per developer per day on average, with 90% of users staying under $12, but those averages mask a roughly 100x spread across tool choice and workload type. A heavy scaffolding task on Codex can exceed $5 per run; the same task routed to Aider on Haiku 4.5 costs under $1. Teams that don't run this math before committing to subscriptions routinely overpay by a factor of three to ten.
What follows covers the pricing models for all 10 tools, the per-loop token math, Anthropic's 0.1× cache-hit multiplier and its dramatic effect on real costs, the subscription-vs-API breakeven for each tier, and a decision framework by team size and workload mix. All token assumptions are modeling choices — disclosed as such — not universal benchmarks.
- 01. $/Mtok hides the dominant cost driver. Loop count, not token rate, determines per-task cost. On Opus 4.7, a moderate 10-loop refactor costs roughly 12× more than a 2-loop bug fix ($1.56 vs $0.13, uncached), and a 25-loop heavy task roughly 40× more. Every published $/Mtok table obscures this.
- 02. Per-task cost range is roughly 100x across the 10 tools. From ~$0.02 (Aider on Haiku 4.5, light task) to ~$5+ (Codex or Opus 4.7 on a 25-loop heavy refactor). The spread narrows dramatically once you add Anthropic's cache multiplier and account for loop-count efficiency differences.
- 03. Anthropic's 0.1x cache-hit multiplier is the biggest lever most teams ignore. At a 60% cache-hit rate, the input cost of a Claude Code session on Sonnet 4.6 drops to roughly 46% of the uncached figure. High-cache-hit workloads on Sonnet 4.6 can approach Cursor Composer 2.5 on effective input rate despite Sonnet's nominally higher token rate.
- 04. Subscription vs API breakeven sits at 4–8M tokens/month for individuals. Claude Code Pro ($20/mo) breaks even against API-direct at roughly 4–6M input tokens/month (uncached). Add caching and the breakeven shifts toward API-direct. For teams, the crossover sits around 50–100M tokens/month per seat.
- 05. Route by workload, not by brand. Smarter models can complete heavy tasks in roughly 40% fewer loops, which offsets part of their per-token premium. For light tasks (1–3 loops), Cursor Composer 2.5 or Haiku 4.5 wins on cost. For heavy parallel-agent workloads, Opus 4.7 or GPT-5.3-Codex may deliver better cost-per-successful-task despite higher token rates.
01 — The Problem
Every comparison shows $/Mtok. That's the wrong unit.
Token rate tells you the cost of a single API call in isolation. Agentic coding tools don't make single API calls; they run planning loops. Each loop reads the relevant codebase context (input tokens), produces edits or shell commands (output tokens), observes the result, and decides whether to iterate. A model that charges $5/M input but completes the task in 4 loops can be cheaper than a model at $0.50/M that requires 12, but only when cache discounts and the loop-count advantage together outweigh the 10× rate premium; at list rates alone, 4 loops at $5/M still cost more than 12 loops at $0.50/M.
Three variables determine actual cost per task: (1) the token rate for input and output, (2) the number of loops the model runs before the task passes, and (3) the cache-hit rate, which for Anthropic models reduces the effective input rate to 0.1× on cached tokens. Published comparisons report the first and control for neither of the others. This calculator fixes that by anchoring on three representative workloads and modeling all three variables explicitly.
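As a rough sketch of how these three variables combine, the function below multiplies per-loop token counts by loop count and token rate, with an optional multiplier standing in for the cache effect. The function name, parameter names, and the flat per-loop token counts are illustrative assumptions, not any vendor's billing formula.

```python
def per_task_cost(rate_in, rate_out, loops, in_per_loop, out_per_loop,
                  input_multiplier=1.0):
    """Per-task cost in USD.

    rate_in / rate_out: published price in $ per million tokens.
    input_multiplier: effective discount on input from caching (1.0 = uncached).
    """
    input_cost = loops * in_per_loop * rate_in * input_multiplier / 1_000_000
    output_cost = loops * out_per_loop * rate_out / 1_000_000
    return input_cost + output_cost


# GPT-5.3-Codex at $1.75/$14, 10 loops of 20K input + 4K output
# (the per-loop figures this guide uses for its moderate tier).
codex_moderate = per_task_cost(1.75, 14.00, loops=10,
                               in_per_loop=20_000, out_per_loop=4_000)
print(f"${codex_moderate:.2f}")  # $0.91
```

Doubling `loops` doubles the whole result, while halving `rate_in` only shrinks the input half of it.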
02 — Tool Survey
The 10 tools — pricing models and access tiers.
The ten tools modeled in this calculator span three billing architectures: subscription-gated usage (Copilot, Kiro, Codex subscription tiers), token-metered billing (Cursor, Claude Code via API, Grok Build), and open-source with bring-your-own-key (Aider, Cline, Continue). Each architecture creates a different cost profile under heavy use.
Cursor Composer 2.5 ships at $0.50/M input and $2.50/M output, the lowest frontier-quality input rate in this survey. A Fast variant at $3.00/$15.00 per M tokens offers the same intelligence with lower latency, at 6× the cost. Claude Code can route to three models: Opus 4.7 ($5/$25), Sonnet 4.6 ($3/$15), or Haiku 4.5 ($1/$5 list; this calculator uses the $0.80/$4 per M batch-mode figure). GPT-5.3-Codex via API sits at $1.75/$14 per M. Copilot Pro ($10/mo, 300 premium requests) and Pro+ ($39/mo, 1,500 premium requests) are fixed-subscription models where per-task cost depends entirely on utilization. Amazon Kiro Pro charges $20/mo for 1,000 credits, an effective $0.02 per included credit, with overage at $0.04/credit. xAI Grok Build launched at $1.00/$2.00 per M tokens via API plus a SuperHeavy subscription at $99/mo (intro, then $299).
Fixed monthly, request quota
A flat monthly fee buys a defined number of premium requests or credits. The marginal cost of a task is effectively zero while you stay within the included quota; once you exceed it, overage pricing applies. Best when utilization is predictable and moderate.
Pay per token, no ceiling
Every loop costs money at the published input/output rate. Cache discounts (Anthropic only) reduce effective input costs dramatically at high hit rates. Best for variable or unpredictable usage where you want to pay only for what you run.
$0 license, API spend
Zero licensing cost — you pay only for the underlying API you wire in. Aider and Cline with Haiku 4.5 deliver the lowest possible per-task cost in this survey. Continue.dev Starter charges $3/Mtok blended for managed routing.
03 — The Math
What a loop actually costs.
A coding agent loop consists of reading the task context and relevant code (input tokens), generating a plan, edits, or shell commands (output tokens), and optionally consuming the tool-call result or error as additional input on the next pass. Output tokens are priced 2–8× higher than input tokens across the token-metered tools modeled here (2× for Grok Build, 5× for Cursor and Claude, 8× for GPT-5.3-Codex), which means a verbose model that generates long explanations alongside its edits can cost far more than a terse model even at the same input rate.
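To make the output-price effect concrete, the sketch below compares a terse and a verbose loop at identical rates. The token counts (including the doubled output for the verbose case) are illustrative assumptions; the rates are the Sonnet 4.6 figures quoted in this guide.

```python
def loop_cost(rate_in, rate_out, in_tokens, out_tokens):
    """USD cost of one loop at $/Mtok rates."""
    return (in_tokens * rate_in + out_tokens * rate_out) / 1_000_000


def output_share(rate_in, rate_out, in_tokens, out_tokens):
    """Fraction of one loop's cost that comes from output tokens."""
    output_cost = out_tokens * rate_out / 1_000_000
    return output_cost / loop_cost(rate_in, rate_out, in_tokens, out_tokens)


SONNET_IN, SONNET_OUT = 3.00, 15.00  # $/Mtok

terse = loop_cost(SONNET_IN, SONNET_OUT, in_tokens=20_000, out_tokens=4_000)
# Assumed: the verbose model emits twice the output for the same edit.
verbose = loop_cost(SONNET_IN, SONNET_OUT, in_tokens=20_000, out_tokens=8_000)

print(f"terse loop:   ${terse:.2f}")
print(f"verbose loop: ${verbose:.2f} ({verbose / terse:.1f}x)")
share = output_share(SONNET_IN, SONNET_OUT, 20_000, 4_000)
print(f"output share of the terse loop: {share:.0%}")
```

Even at a 5:1 input-to-output token ratio, output accounts for half of the loop cost at these rates, so trimming explanations is a direct cost lever.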
For this calculator, each loop is modeled with a fixed token assumption per workload tier. Light tasks (bug fixes, small refactors): 5K input tokens + 1.5K output per loop. Moderate tasks (multi-file refactors): 20K input + 4K output per loop. Heavy tasks (project scaffolding or large refactors): 30K input + 6K output per loop. These are conservative estimates — real context windows can grow larger, particularly for Anthropic models that retain full conversation history across tool calls.
The second multiplier is Anthropic's prompt caching. Cache writes cost 1.25× the base input rate but are a one-time charge per context block. Cache reads cost only 0.1× the base rate. A session with 60% cache-hit rate effectively pays 0.1× on 60% of its input and 1.0× on 40% — producing an effective input multiplier of approximately 0.46×. At 90% cache hits, the effective multiplier drops to 0.19×. This is the math that makes high-loop Sonnet 4.6 sessions surprisingly cost-competitive with Cursor Composer 2.5.
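The blended multiplier can be checked with a few lines. This sketch uses the simplified blend from the paragraph above (misses at 1.0×, hits at 0.1×) and leaves out the one-time 1.25× write premium; the function name and the choice of hit rates are illustrative.

```python
CACHE_READ_MULTIPLIER = 0.1  # cached input tokens, per the text above


def effective_input_multiplier(hit_rate):
    """Blend of full-price and cached input for a cache-hit rate in [0, 1]."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * CACHE_READ_MULTIPLIER


# Base input rates in $/Mtok, as listed in the tool survey.
BASE_INPUT_RATES = {"Opus 4.7": 5.00, "Sonnet 4.6": 3.00, "Haiku 4.5": 0.80}

for model, base_rate in BASE_INPUT_RATES.items():
    for hit_rate in (0.0, 0.3, 0.6, 0.9):
        effective = base_rate * effective_input_multiplier(hit_rate)
        print(f"{model:<10} {hit_rate:>4.0%} hits: ${effective:.2f}/Mtok input")
```

At a 90% hit rate this gives about $0.57/Mtok for Sonnet 4.6 and $0.95/Mtok for Opus 4.7.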
"Token costs scale with context size: the more context Claude processes, the more tokens you use. Claude Code automatically optimizes costs through prompt caching."— Anthropic, Claude Code costs documentation
04 — Reference Workloads
Three reference workloads — light, moderate, heavy.
Rather than model a single generic task, the calculator below uses three workloads that cover the realistic range of agentic coding use. Each is defined by loop count and per-loop token consumption — the two variables that drive total cost once you know the token rate.
Bug fix / small patch
Locating a specific bug, writing a targeted fix, running tests. Modeled at 5K input + 1.5K output per loop. 2-loop midpoint = 10K input / 3K output total. Low retry risk: well-scoped tasks complete in 1–2 loops for capable models.
Multi-file refactor
Refactoring a feature across 3–8 files, updating tests, resolving import chains. Modeled at 20K input + 4K output per loop. 10-loop midpoint = 200K input / 40K output total. Claude's cache is most valuable here — 60% hit rate cuts input cost by ~54%.
Project scaffold / large refactor
Scaffolding a new service or performing a large architectural refactor. Modeled at 30K input + 6K output per loop. 25-loop midpoint = 750K input / 150K output total. At this scale, tool selection has a 100x cost impact. Parallel sub-agents (Grok Build) can reduce wall-clock time but don't change total token cost.
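The three tiers can be written down as data, which makes it easy to see the cost band across each tier's loop range rather than only the midpoint. The sketch assumes a flat token count per loop, as the tier definitions above do; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    min_loops: int
    max_loops: int
    in_per_loop: int   # input tokens per loop
    out_per_loop: int  # output tokens per loop

    def cost(self, loops, rate_in, rate_out):
        """USD cost for a given loop count at $/Mtok rates."""
        per_loop = self.in_per_loop * rate_in + self.out_per_loop * rate_out
        return loops * per_loop / 1_000_000

    def cost_band(self, rate_in, rate_out):
        """(low, high) cost across the tier's loop range."""
        return (self.cost(self.min_loops, rate_in, rate_out),
                self.cost(self.max_loops, rate_in, rate_out))


WORKLOADS = [
    Workload("light", 1, 3, 5_000, 1_500),
    Workload("moderate", 8, 12, 20_000, 4_000),
    Workload("heavy", 20, 30, 30_000, 6_000),
]

for w in WORKLOADS:
    low, high = w.cost_band(rate_in=1.75, rate_out=14.00)  # GPT-5.3-Codex API rates
    print(f"{w.name:<8} ${low:.2f} to ${high:.2f}")
```

Within a single tier the band spans 1.5× to 3×, before any difference in token rate is considered.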
05 — The Calculator
Per-task cost for 10 tools × 3 workloads.
The table below calculates total per-task cost for each tool at the midpoint loop count of each workload. Moderate- and heavy-task figures for Claude Code models show both uncached and cached costs, at a 60% cache-hit rate for moderate tasks (the realistic operating point for most multi-session codebases) and 70% for heavy tasks. Subscription tools show effective per-task cost at 100% utilization.
Reading the table: Light = 2 loops × (5K in + 1.5K out); Moderate = 10 loops × (20K in + 4K out) with 60% cache for Anthropic models; Heavy = 25 loops × (30K in + 6K out) with 70% cache for Anthropic models. All costs in USD.
| Tool / Model | Input / Output ($/Mtok) | Light (1–3 loops) | Moderate (8–12 loops) | Heavy (20–30 loops) | Cache note |
|---|---|---|---|---|---|
| Cursor Composer 2.5 | $0.50 / $2.50 | $0.08 | $1.10 | $6.00 | No published cache discount |
| Cursor Composer 2.5 Fast | $3.00 / $15.00 | $0.51 | $6.60 | $36.00 | No published cache discount |
| Claude Code Opus 4.7 | $5.00 / $25.00 | $0.13 | $1.56 uncached / $0.84 cached | $5.06 uncached / $2.38 cached | 0.1× on cache reads; 60% (moderate) and 70% (heavy) hit rates assumed |
| Claude Code Sonnet 4.6 | $3.00 / $15.00 | $0.08 | $0.94 uncached / $0.50 cached | $3.04 uncached / $1.43 cached | 0.1× on cache reads; competitive with Cursor at 60%+ hits |
| Claude Code Haiku 4.5 | $0.80 / $4.00 | $0.02 | $0.25 uncached / $0.14 cached | $0.81 uncached / $0.38 cached | Lowest cost in this survey; quality tradeoff on heavy tasks |
| Codex (GPT-5.3-Codex API) | $1.75 / $14.00 | $0.06 | $0.91 | $5.25 | No published cache multiplier; high output rate hurts heavy tasks |
| GitHub Copilot Pro | $10/mo, 300 requests | $0.03 (at 100% utilization) | $0.20 | $1.00 | Effective cost depends on utilization; sunk cost at low use |
| GitHub Copilot Pro+ | $39/mo, 1,500 requests | $0.03 | $0.16 | $0.78 | Includes Opus 4.7 access; 5× the requests of Pro |
| Amazon Kiro Pro | $20/mo, 1K credits | $0.04 (~1–2 credits) | $0.40 | $2.00 | Overage at $0.04/credit; credit-to-token conversion not public |
| xAI Grok Build API | $1.00 / $2.00 | $0.04 | $0.28 | $1.80 | SuperHeavy $99/mo intro adds parallel sub-agent capacity |
| Aider / Continue on Haiku 4.5 | $0.80 / $4.00 (BYOK) | $0.02 | $0.25 | $0.81 | Lowest total cost; quality ceiling is Haiku 4.5 capability |
Assumes: light = 2 loops × (5K in + 1.5K out); moderate = 10 loops × (20K in + 4K out); heavy = 25 loops × (30K in + 6K out). Anthropic cache-hit at 60% (moderate) and 70% (heavy). Subscription effective costs at 100% utilization. Prices sourced May 2026 — verify before budgeting.
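For the subscription rows, effective cost depends on how much of the quota is actually used. The sketch below shows that relationship for the two Copilot plans. `REQUESTS_PER_TASK` is an assumption: six premium requests per moderate task, chosen only because it reproduces the table's moderate figures at full utilization, not a published conversion.

```python
def cost_per_request(monthly_fee, included_requests, utilization):
    """Effective USD per premium request when only part of the quota is used."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return monthly_fee / (included_requests * utilization)


# (monthly fee in USD, included premium requests), from the tool survey.
PLANS = {"Copilot Pro": (10.0, 300), "Copilot Pro+": (39.0, 1500)}

REQUESTS_PER_TASK = 6  # hypothetical: premium requests one moderate task consumes

for plan, (fee, quota) in PLANS.items():
    for utilization in (1.0, 0.5, 0.25):
        per_task = REQUESTS_PER_TASK * cost_per_request(fee, quota, utilization)
        print(f"{plan:<12} {utilization:>4.0%} of quota used: ${per_task:.2f} per moderate task")
```

Halving utilization doubles the effective per-task cost, which is why a lightly used subscription behaves like a sunk cost.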
The most striking number in the table: Cursor Composer 2.5 Fast at $36.00 per heavy task. The Fast tier's 6× premium on both input and output turns a reasonable moderate-task tool into an expensive option for heavy workloads. Unless latency is critical and the task is moderate, the standard Composer 2.5 tier is the correct choice for cost-sensitive teams. See our Cursor 3 deep dive for guidance on when Fast mode actually pays off.
On the other end, Haiku 4.5 via Aider or Continue.dev delivers the lowest per-task cost in this survey: $0.02 light, $0.25 moderate, $0.81 heavy. The caveat is Haiku 4.5's quality ceiling: for tasks requiring complex multi-file reasoning or deep architectural understanding, Haiku may require 30–50% more loops than Sonnet or Opus, partially offsetting the token-rate advantage.
06 — The Cache Multiplier
Anthropic's 0.1× cache discount and what it changes.
Anthropic's prompt caching is documented but routinely ignored in cost comparisons. The mechanics: tokens stored in the cache on first use cost 1.25× the base input rate (a cache write premium). On subsequent reads, those same tokens cost only 0.1×, a 90% discount on input. Cache blocks persist for five minutes by default in Claude Code, and the five-minute timer resets each time a new message reads the cached content.
The effective input multiplier for a session depends on the cache-hit rate. At 0% hits (cold start every loop), you pay full rate. At 30% hits, your effective multiplier is (0.7 × 1.0) + (0.3 × 0.1) = 0.73×. At 60% hits it drops to 0.46×. At 90% hits, you effectively pay only 0.19× the base input rate. For a Sonnet 4.6 session at 90% cache-hit rate, the effective input cost falls to roughly $0.57/Mtok, within about 15% of Cursor Composer 2.5's uncached $0.50/Mtok.
[Chart: Effective input rate by Claude model × cache-hit rate]
Source: Anthropic prompt-caching docs, May 2026. Cache-hit multiplier = 0.1×. Effective rate = (1 − hit%) × base + hit% × 0.1 × base.
The practical implication: a codebase where Claude Code has already processed the repo map (the dominant input cost in multi-loop sessions) will see 50–80% cache hits on input tokens within the same five-minute session. This is why industry-reported daily costs for Claude Code are lower than a naive token-rate calculation would suggest. The average $6/developer/day figure from independent research already reflects substantial caching; an uncached equivalent would be approximately $12–18/day. For a deeper breakdown, see our Claude Code feature deep dive.
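The write premium and read discount interact over the length of a session. The sketch below prices a stable context prefix (such as a repo map) that is written to the cache once and re-read on every later loop. It assumes every loop lands inside the five-minute window and the prefix never changes; the 15K-token prefix size is a hypothetical value, and the function names are illustrative.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # first use of the prefix
CACHE_READ_MULTIPLIER = 0.10   # each later read


def cached_prefix_cost(prefix_tokens, base_rate, loops):
    """USD cost of a prefix written once, then read on each remaining loop."""
    if loops < 1:
        raise ValueError("loops must be at least 1")
    write = prefix_tokens * base_rate * CACHE_WRITE_MULTIPLIER
    reads = prefix_tokens * base_rate * CACHE_READ_MULTIPLIER * (loops - 1)
    return (write + reads) / 1_000_000


def uncached_prefix_cost(prefix_tokens, base_rate, loops):
    """USD cost of resending the same prefix at full price on every loop."""
    return prefix_tokens * base_rate * loops / 1_000_000


PREFIX_TOKENS = 15_000     # hypothetical repo-map size
SONNET_INPUT_RATE = 3.00   # $/Mtok

for loops in (1, 2, 10, 25):
    cached = cached_prefix_cost(PREFIX_TOKENS, SONNET_INPUT_RATE, loops)
    uncached = uncached_prefix_cost(PREFIX_TOKENS, SONNET_INPUT_RATE, loops)
    print(f"{loops:>2} loops: cached ${cached:.4f} vs uncached ${uncached:.4f} "
          f"({cached / uncached:.0%} of uncached)")
```

With a single loop the cached path costs more, because the 1.25× write is never recouped; from the second loop on it is cheaper, and the gap widens with every additional loop.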
07 — Subscription vs API
When to pay $20 vs go API-direct.
The subscription-vs-API question has a precise answer for each tool once you know your monthly token volume. For Claude Code Pro ($20/mo), the included usage is roughly equivalent to $20–$25 of API compute. The breakeven against API-direct sits at the point where your monthly API bill would equal the subscription fee.
With caching at 60%, Sonnet 4.6 input costs roughly 46% of the uncached rate. For a developer running 10 moderate tasks per week (10 × 4 weeks × $0.50 cached = $20/mo), Claude Code Pro is roughly at breakeven. Go above that frequency and Pro pays; go below and API-direct is cheaper. The credit overhaul announced for June 15 may shift these numbers; see our Anthropic credit overhaul post for the latest changes.
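The breakeven arithmetic generalizes to any fee and per-task cost. The sketch assumes the subscription's included usage actually covers the workload, which depends on plan limits not modeled here; the function names are illustrative.

```python
def breakeven_tasks_per_month(subscription_fee, api_cost_per_task):
    """Number of tasks per month at which the API bill equals the fee."""
    if api_cost_per_task <= 0:
        raise ValueError("api_cost_per_task must be positive")
    return subscription_fee / api_cost_per_task


def cheaper_option(tasks_per_month, subscription_fee, api_cost_per_task):
    """Compare a flat fee against pay-per-task API spend."""
    api_bill = tasks_per_month * api_cost_per_task
    if api_bill > subscription_fee:
        return "subscription"
    if api_bill < subscription_fee:
        return "api-direct"
    return "breakeven"


PRO_FEE = 20.00                # Claude Code Pro, $/month
SONNET_MODERATE_CACHED = 0.50  # $/task, cached moderate figure from the table

print(breakeven_tasks_per_month(PRO_FEE, SONNET_MODERATE_CACHED))  # 40.0
print(cheaper_option(20, PRO_FEE, SONNET_MODERATE_CACHED))         # api-direct
print(cheaper_option(60, PRO_FEE, SONNET_MODERATE_CACHED))         # subscription
```

Forty tasks a month is the 10-per-week figure from the paragraph above; swapping in a different per-task cost moves the threshold proportionally.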
API-direct on Haiku 4.5 or Cursor Composer 2.5
If you run fewer than 5 moderate tasks per week, a subscription is a sunk cost. Aider or Cline on Haiku 4.5 via API costs under $5/month at this frequency. Cursor's $20/mo Individual plan makes sense only if you use it daily.
Claude Code Pro ($20) or Cursor Individual ($20)
10+ moderate tasks per week pushes monthly API cost to $20–40 on Sonnet 4.6. Pro subscription becomes cost-neutral to advantageous. Claude Code Max 5× ($100) makes sense above ~50 moderate sessions/month.
Copilot Pro+ ($39/seat) or Cursor Teams ($40/seat)
For teams with varied workload mix — some light completions, some agent tasks — a per-seat subscription normalizes cost. Copilot Pro+ includes Opus 4.7 access and 1,500 premium requests. Route heavy agent tasks to API-direct on Sonnet 4.6 with caching for cost control.
Claude Code Max 20× ($200) or Team Premium ($100/seat)
High-frequency parallel agent runs burn through Pro limits quickly. Max 20× at $200/mo or Team Premium at $100/seat cover 20× the Pro usage. At enterprise scale, the subscription vs API breakeven shifts toward API-direct with a negotiated volume discount — worth modeling at your actual monthly token volume.
One factor that flips the equation: Grok Build's SuperHeavy subscription ($99/mo intro) includes up to 8 parallel sub-agents. Parallelism does not reduce the tokens a task consumes, but under a flat fee it raises the number of tasks a team can finish per month. For teams that need concurrent agent execution (running tests, writing docs, and refactoring simultaneously), the effective per-task cost of the subscription can be 3–5× lower than with sequential agent runs. See our Grok Build parallel agents guide for a full breakdown of the concurrency model.
08 — Cost vs Quality
What you actually pay for at each tier.
The lowest-cost option in this survey, Aider on Haiku 4.5 at ~$0.02 per light task, is not the best value for every team. Quality at the frontier matters because smarter models complete tasks in fewer loops. A model that closes a moderate refactor in 6 loops instead of 10 uses 40% fewer tokens, which can offset part or all of a higher per-token rate. The per-task cost calculator above assumes fixed loop counts; real-world loop counts vary by model capability.
Industry benchmarks suggest frontier models (Opus 4.7, GPT-5.3-Codex) complete complex multi-file tasks in roughly 60% of the loops required by smaller models. Applied to the heavy workload: Opus 4.7 at 15 loops (instead of 25) costs approximately $3.04 uncached or $1.43 cached, down from $5.06 and $2.38 at the full 25 loops, while Haiku 4.5 at 25 loops costs $0.81 uncached or $0.38 cached. On token cost alone the quality premium narrows rather than disappears; it closes further once failed runs and manual fixes are counted. For teams with a high proportion of complex architectural tasks, routing to Opus 4.7 with caching may be the highest-value combination, not the highest-cost one. Our AI analytics engagements routinely surface this pattern in client token-spend audits.
This framing — cost per successful task, not cost per token — is the right evaluation metric for agentic tools. A cheaper model that requires manual intervention to fix its output three times costs more in developer time than a frontier model that gets it right in one pass.
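Cost under different loop counts can be sketched by scaling a reference figure linearly. This assumes tokens per loop stay fixed as the loop count changes, which is a simplification; the reference costs are the cached heavy-task figures from the table, and the function names are illustrative.

```python
def cost_at_loops(cost_at_reference, reference_loops, actual_loops):
    """Scale a per-task cost linearly with loop count."""
    return cost_at_reference * actual_loops / reference_loops


def breakeven_loops(pricey_cost_at_ref, cheap_cost_at_ref,
                    reference_loops, cheap_loops):
    """Loop count at which the pricier model's total matches the cheaper one's."""
    cheap_total = cost_at_loops(cheap_cost_at_ref, reference_loops, cheap_loops)
    return reference_loops * cheap_total / pricey_cost_at_ref


REFERENCE_LOOPS = 25       # heavy-tier midpoint
OPUS_HEAVY_CACHED = 2.38   # $/task at 25 loops, from the table
HAIKU_HEAVY_CACHED = 0.38  # $/task at 25 loops, from the table

opus_at_15 = cost_at_loops(OPUS_HEAVY_CACHED, REFERENCE_LOOPS, 15)
print(f"Opus 4.7 at 15 loops: ${opus_at_15:.2f}")

needed = breakeven_loops(OPUS_HEAVY_CACHED, HAIKU_HEAVY_CACHED,
                         REFERENCE_LOOPS, cheap_loops=25)
print(f"Loops at which Opus matches Haiku at 25 loops: {needed:.1f}")
```

Developer time spent on retries or manual fixes is not modeled here and can outweigh the token figures.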
[Chart: Heavy-task cost when loop efficiency is factored in]
Cached Claude figures at 70% hit rate. Loop-count reduction for frontier models is an estimate, not a benchmark. Source: Anthropic pricing docs + modeling assumptions.
09 — Recommendations
Strategic choices by team size and workload mix.
The calculator shows that no single tool wins across all workload types. The optimal configuration depends on three factors: your workload mix (what fraction of tasks are light vs heavy), your loop-count efficiency expectations (how often does the model complete tasks without retries), and your sensitivity to latency vs cost (Fast mode costs 6× more but runs faster). The grid below summarizes the recommended configuration for each team archetype. For Kiro-specific credit modeling, see our Kiro migration playbook.
Aider or Continue on Haiku 4.5
Sub-$1 for 90% of everyday coding tasks. Route only truly complex architectural tasks to Sonnet or Opus — the incremental quality justifies the cost only when task failure is expensive. Start with Haiku; upgrade when you hit its quality ceiling.
Cursor Composer 2.5 (standard tier)
The lowest frontier-quality input rate in this survey at $0.50/Mtok. Best cost-quality ratio for developers who want a managed IDE experience. Avoid Fast mode unless latency is critical — 6× input premium is rarely worth it for standard workflows.
Copilot Pro+ + Sonnet 4.6 API for heavy tasks
Pro+ at $39/seat covers the majority of moderate agent tasks within the 1,500 premium request quota. For heavy architectural work, route to Claude Code on Sonnet 4.6 with caching — $1.43 per cached heavy task beats running Pro+ overages. Our AI transformation team can model the breakeven for your specific usage pattern.
Claude Code Max 20× + Opus 4.7 routing
At enterprise scale, Opus 4.7's higher loop efficiency closes the cost gap with cheaper models. Max 20× at $200/seat/mo covers 20× Pro usage. Supplement with negotiated API volume where subscriptions cap out. Grok Build's parallel sub-agent architecture is worth evaluating for pipeline workloads that benefit from concurrency.
The critical habit for any team running AI coding agents at scale: track per-task cost, not per-session spend. A session that runs one heavy task can cost as much as a session that runs dozens of light tasks, but the business value is categorically different. Integrating token spend into your engineering analytics dashboard is the foundation for optimizing the model-routing decisions above. Our AI transformation service includes cost observability as a first-class deliverable on every agent deployment engagement.
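A minimal sketch of per-task tracking, assuming each agent call is logged with a task identifier and token counts. The record fields, task IDs, and model keys are hypothetical; real tools expose usage data in their own formats.

```python
from collections import defaultdict

# $/Mtok (input, output), taken from the pricing survey in this guide.
RATES = {
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (0.80, 4.00),
}


def cost_by_task(usage_records):
    """Sum token spend per task_id from per-call usage records."""
    totals = defaultdict(float)
    for record in usage_records:
        rate_in, rate_out = RATES[record["model"]]
        totals[record["task_id"]] += (
            record["input_tokens"] * rate_in
            + record["output_tokens"] * rate_out
        ) / 1_000_000
    return dict(totals)


# Hypothetical log: two loops of a light task, one loop of a moderate task.
records = [
    {"task_id": "fix-login-bug", "model": "haiku-4.5",
     "input_tokens": 5_000, "output_tokens": 1_500},
    {"task_id": "fix-login-bug", "model": "haiku-4.5",
     "input_tokens": 5_000, "output_tokens": 1_500},
    {"task_id": "refactor-billing", "model": "sonnet-4.6",
     "input_tokens": 20_000, "output_tokens": 4_000},
]

per_task = cost_by_task(records)
for task_id, cost in per_task.items():
    print(f"{task_id}: ${cost:.2f}")
```

Grouping by task rather than by session keeps a cheap bug fix and an expensive refactor from blending into one daily total.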
Per-task cost varies 100x — but the range collapses with two adjustments.
Per-task agent cost varies by roughly 100× across the ten tools modeled here, but most of that range collapses once you account for two factors. First, Anthropic's 0.1× cache multiplier makes high-cache-hit workloads on Sonnet 4.6 surprisingly competitive with Cursor Composer 2.5: at 90% hit rates, Sonnet's effective input rate (about $0.57/Mtok) comes close to Composer's $0.50/Mtok. Second, loop-count efficiency: smarter models complete complex tasks in roughly 40% fewer loops and offset part of their per-token premium on heavy workloads. The published $/Mtok comparisons are misleading because they ignore both.
For solo developers and indies, Cursor Composer 2.5 ($0.50/M input) or Claude Code on Haiku 4.5 ($0.80/M input) wins on cost while delivering 80% of frontier-model quality for everyday tasks. For enterprise teams running heavy parallel agents, the math flips toward Opus 4.7 or GPT-5.3-Codex — fewer retries multiply through at scale. The subscription-vs-API breakeven sits around 4–8M tokens/month for individuals and 50–100M for teams.
Update this calculator quarterly. Pricing changes weekly in this market — Cursor Composer 2.5 launched in May 2026, Grok Build is still in early beta, and Anthropic's credit model is being overhauled in June 2026. The tool that wins today may not win in three months.