Claude Sonnet 5 shipped on June 30, 2026 as the most agentic model in the Sonnet line — a model Anthropic describes as able to make plans, use tools like browsers and terminals, and run autonomously at a level that, only a few months ago, took a larger and more expensive model. The headline is not a single benchmark. It is that a Sonnet-tier model now does most agentic work within a few points of the Opus 4.8 flagship, at a fraction of the price.
That matters because the constraint on putting agents into production is rarely raw capability — it is cost at volume. An agentic loop that drives a screen or a terminal burns tokens on every step, so the model you can afford to run thousands of times a month is usually the one that ships. Sonnet 5 is Anthropic’s bet that near-flagship quality at Sonnet prices is what moves agentic automation from demo to deployment.
This guide covers exactly what launched and where to use it: an honest read of Anthropic’s benchmark chart (every figure is first-party), the specific gaps where Opus 4.8 still leads, the pricing against both Sonnet 4.6 and Opus 4.8, the API changes that affect how you build, and a routing framework for deciding which model each step of your pipeline should call.
- 01It is the most agentic Sonnet, by design.Anthropic built Sonnet 5 to plan, use browsers and terminals, and run autonomously. Its testers note it finishes complex tasks where previous Sonnets stopped short and checks its own output without being asked — the behaviors that separate a usable agent from a chatbot.
- 02With tools in the loop, it is near-Opus.On Anthropic's own numbers, Sonnet 5 sits within ~2 points of Opus 4.8 on Terminal-Bench (80.4 vs 82.7) and OSWorld-Verified computer use (81.2 vs 83.4), and effectively ties it on knowledge work (GDPval-AA v2: 1618 vs 1615).
- 03Opus 4.8 still leads the hardest pure reasoning.On SWE-bench Pro (63.2 vs 69.2) and Humanity's Last Exam with no tools (43.2 vs 49.8), Opus 4.8 keeps a ~6-point edge. The flagship is still the right call for the top of the difficulty range, especially without tools to lean on.
- 04The price is the durable advantage.Sonnet 5 lists at $3 / $15 per million tokens — $2 / $10 introductory through August 31, 2026. That is 40-60% of Opus 4.8's $5 / $25, and the same list price as Sonnet 4.6 for a large jump in capability.
- 05All the benchmarks are Anthropic-reported.The figures here come from Anthropic's release chart, not an independent evaluator. Treat them as directional first-party claims on a common axis, and run your own task-level evaluation before committing a workload.
01 — What ShippedA Sonnet built to act, not just answer.
Sonnet 5 is available immediately and broadly. It is the default model on Anthropic’s Free and Pro plans, available to Max, Team, and Enterprise users, and live across the Claude apps, Claude Code, and the Claude Platform. On the API you call it with the model ID claude-sonnet-5 — a clean string with no date suffix — and it carries a 1M-token context window with up to 128K output tokens.
The framing Anthropic chose is “most agentic Sonnet yet,” and the substance behind that phrase is what the model does between the question and the answer. It makes a plan, calls tools — a browser, a terminal, a search — and runs the loop autonomously instead of handing control back after one step. Two behaviors its early-access testers singled out are the tell: it finishes complex tasks where previous Sonnet models would stop short, and it checks its own output without being explicitly asked. Self-verification is the difference between an agent you can leave running and one you have to babysit.
Read against the rest of the Sonnet line, this is a step change rather than a point release. Our Claude Sonnet 4.6 guide covered a capable but clearly mid-tier model; Sonnet 5 moves the tier’s ceiling up to brush against the flagship. And for anyone who followed the speculation, this is the model behind the “Fennec” leak we analyzed in January — now shipped, named, and benchmarked.
02 — The BenchmarksThe numbers, read honestly.
Anthropic published Sonnet 5 against Sonnet 4.6, with Opus 4.8 in the chart as a reference. One caveat governs every figure below, so state it first: these are first-party numbers from Anthropic’s release, not results from an independent evaluator. Read the table as vendor claims arranged on a common axis — useful for direction, not a substitute for evaluating the model on your own tasks.
| Capability · Suite | Sonnet 5 | Sonnet 4.6 | Opus 4.8 (ref) |
|---|---|---|---|
| Agentic codingSWE-bench Pro | 63.2% | 58.1% | 69.2% |
| Agentic codingTerminal-Bench 2.1 | 80.4% | 67.0% | 82.7% |
| Multidisciplinary reasoningHumanity's Last Exam (no tools) | 43.2% | 34.6% | 49.8% |
| Multidisciplinary reasoningHumanity's Last Exam (with tools) | 57.4% | 46.8% | 57.9% |
| Computer useOSWorld-Verified | 81.2% | 78.5% | 83.4% |
| Knowledge workGDPval-AA v2 (Elo) | 1618 | 1395 | 1615 |
The shape of the table is the story. The jump over Sonnet 4.6 is large everywhere — and largest where it counts for agents. Terminal-Bench, which measures real terminal work, climbs from 67.0% to 80.4%. Humanity’s Last Exam with tools rises from 46.8% to 57.4%. And on knowledge work — GDPval-AA v2, Anthropic’s economic-value benchmark — Sonnet 5 leaps 223 Elo points, from 1395 to 1618, which is where it crosses into Opus territory outright.
GDPval-AA v2 (Elo)
Sonnet 5 effectively ties Opus 4.8 (1615) and adds 223 Elo over Sonnet 4.6's 1395. On Anthropic's economic-value benchmark, this is Opus-tier output at Sonnet cost.
Terminal-Bench 2.1
Up from 67.0% on Sonnet 4.6 and within 2.3 points of Opus 4.8 (82.7%). The biggest gains land exactly where an agent does real terminal work.
OSWorld-Verified
Driving a real screen, Sonnet 5 sits 2.2 points behind Opus 4.8 (83.4%) and ahead of Sonnet 4.6 (78.5%) — close enough to anchor browser agents.
03 — Near Opus, And NotWithin a whisker with tools, behind without them.
“Close to Opus 4.8” is the honest summary, but the gap is not uniform — and where it opens up tells you exactly when to reach for the flagship. The pattern is clean: when tools are in the loop, Sonnet 5 is within a point or two of Opus 4.8; when the task is pure reasoning with nothing to lean on, Opus pulls ahead by roughly six points. The bars below put Sonnet 5’s five percentage benchmarks against the Opus 4.8 reference.
Sonnet 5 vs the Opus 4.8 reference · Anthropic-reported
Source: Anthropic Claude Sonnet 5 release chart, June 30, 2026 — first-party figures, no independent third-party verification as of June 2026The most striking pair is Humanity’s Last Exam with and without tools. With tools, Sonnet 5 (57.4%) is half a point behind Opus 4.8 (57.9%) — a statistical tie. Without tools, the gap widens to 6.6 points (43.2% vs 49.8%). That is the whole thesis of the release in one comparison: hand Sonnet 5 the ability to search, run code, and check itself, and it reasons like the flagship; take the tools away and the flagship’s deeper unaided reasoning shows.
SWE-bench Pro tells the same story from the coding side. The 6-point gap (63.2% vs 69.2%) is real, and on the hardest end-to-end software tasks Opus 4.8 is still the model that clears bars Sonnet 5 does not. But Terminal-Bench — closer to how an agentic coding tool actually operates, step by step in a real shell — is a near tie. For most production agents, which work with tools rather than in a single unaided pass, Sonnet 5 captures the part of the flagship that the workload actually uses. Opus 4.8 earns its premium on the tasks at the very top of the difficulty range, not on the average one.
"Sonnet 5 is a substantial improvement over Sonnet 4.6 on reasoning, tool use, coding, and knowledge work. Its performance is close to Opus 4.8, at lower prices."— Anthropic, Introducing Claude Sonnet 5, June 30, 2026
04 — The Price StoryThe number that actually moves a decision.
With the benchmarks close, price is the lever — and it is where Sonnet 5 makes its case. The list rate is $3 per million input tokens and $15 per million output, with introductory pricing of $2 / $10 through August 31, 2026. The standard rate matches Sonnet 4.6 exactly, so once the intro window closes you are paying the same list price as the predecessor for a large step up in capability. The comparison that matters, though, is against Opus 4.8.
| Model | Input ($/Mtok) | Output ($/Mtok) | Note |
|---|---|---|---|
| Claude Sonnet 5 — introductory | $2.00 | $10.00 | Through August 31, 2026 |
| Claude Sonnet 5 — standard | $3.00 | $15.00 | From September 1, 2026 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Predecessor — same list price |
| Claude Opus 4.8 | $5.00 | $25.00 | Reference flagship |
Per million tokens (intro)
Introductory rate through August 31, 2026, then $3. Against Opus 4.8's $5 input, that is 40% during the intro window and 60% afterward.
Per million tokens (intro)
Introductory rate, then $15. Against Opus 4.8's $25 output, that is 40-60% — and output is where agentic loops, with their constant tool calls and self-checks, spend most.
Standard input, unchanged
Standard Sonnet 5 list pricing is identical to Sonnet 4.6's $3 / $15. You get the full capability jump at the same list price once the introductory rate ends.
One honest hedge keeps the cost comparison clean. Sonnet 5 uses a new tokenizer that turns the same text into roughly 30% more tokens than Sonnet 4.6 did, so a per-token price that looks identical does not translate to an identical bill on the same workload — re-measure with real token counts before you forecast. Even after that adjustment, the structural point holds: a model that performs within a couple of points of the flagship at 40-60% of its price changes which automations clear the bar of being worth building. That arithmetic, not a hero benchmark, is what tends to decide a model-routing decision.
05 — What Changed For BuildersThree API changes worth a second look.
If you are integrating Sonnet 5 rather than chatting with it, three things behave differently from Sonnet 4.6. None is hard, but each can quietly change cost or output if you carry a 4.6 configuration over unchanged.
Adaptive by default
On claude-sonnet-5, leaving the thinking parameter unset now runs adaptive thinking — Sonnet 4.6 ran without it. Control depth with effort, and note Sonnet 5 is the first Sonnet to support the xhigh level.
A new tokenizer
The same text tokenizes to roughly 30% more tokens than on Sonnet 4.6. Per-token pricing is unchanged, but token-budgeted limits, max_tokens, and cost baselines all shift — re-baseline with a real token count.
1M context, sharper vision
A 1M-token context window with 128K max output, plus the first Sonnet-tier high-resolution image support (up to 2576px). Larger workflows and denser screenshots fit in a single pass.
The effort dial is the lever most worth tuning. A rough cross-model mapping from Anthropic’s own guidance: Sonnet 5 at medium is comparable in intelligence to Sonnet 4.6 at high, and Sonnet 5 at high is comparable to Sonnet 4.6 at max. In practice that means you can often hold quality steady and step effort down a notch to cut latency and tokens, or hold effort and bank the capability gain. For the hardest agentic coding, xhigh is the new ceiling. Set it deliberately rather than inheriting the 4.6 default.
06 — Sonnet 5 Or Opus 4.8?Which model each step should call.
The benchmark gaps above translate into a simple routing rule: default to Sonnet 5 for the volume of agentic work, and escalate to Opus 4.8 only at the top of the difficulty range. The cleanest pipelines run a Sonnet 5 floor with an Opus 4.8 ceiling, switching models per step rather than per project — the shared tool surface and 1M context make those hand-offs cheap.
High-volume agentic work
Multi-step coding agents, browser and terminal automation, content and ops pipelines, RAG and tool-calling. With tools in the loop, Sonnet 5 is within ~2 points of Opus 4.8 at 40-60% of the price.
Hardest reasoning, no tools
SWE-bench Pro-class refactors and unaided reasoning, where Opus 4.8 keeps a ~6-point lead. When one hard decision is worth the premium, the flagship earns it.
Tiered agent pipelines
Run cheap steps — planning, retrieval, routine edits — on Sonnet 5 and escalate only the hardest sub-task to Opus 4.8. The shared API and 1M context keep the routing clean.
Before you switch a workload
Sonnet 5 uses a new tokenizer (~30% more tokens than 4.6) and adaptive thinking is on by default. Re-measure token counts and max_tokens, set effort deliberately, then evaluate on your own tasks.
07 — Putting It To WorkWhere a team starts today.
For a marketing or operations team, the practical opening is the work that was previously too token-expensive to automate at a frontier model’s price. Agent-written reporting, content pipelines that draft and self-check, lead-routing and CRM hygiene agents, browser automations that read dashboards and flag anomalies — these are high-volume, repetitive tasks where a near-flagship model at 40-60% of the cost changes the build-versus-skip calculation.
The sequencing we use with clients is the same regardless of model: prove value on a single well-scoped, low-risk workflow — usually read-only at first — add human-confirmed writes once the audit trail earns trust, and only then widen autonomy. Sonnet 5’s self-checking and tool fluency make that ramp faster, but the discipline is unchanged: scope tight, log everything, and route the occasional hard step to Opus 4.8 rather than over-paying for the whole pipeline. That scoping — which workflows, which guardrails, which model per step — is exactly where our agentic AI transformation engagements begin, before any model commitment. For teams rolling agents out at scale, our enterprise deployment playbook covers the governance layer.
08 — ConclusionThe tier ceiling moves up.
A Sonnet that does most agentic work like a flagship, at a fraction of the price.
Claude Sonnet 5 is best read as a pricing event wearing a benchmark headline. On Anthropic’s own numbers it lands within a couple of points of Opus 4.8 wherever tools are in the loop — Terminal-Bench, computer use, knowledge work — and effectively ties the flagship on economic-value tasks. The honest qualifier is that those figures are first-party, and that Opus 4.8 keeps a real ~6-point lead on the hardest pure reasoning and end-to-end software tasks.
Keep the framing precise. This is not “Sonnet matches Opus.” It is “Sonnet now captures the part of the flagship that most production agents actually use, at 40-60% of the cost.” The hardest workloads still belong to Opus 4.8; the volume belongs to Sonnet 5. The teams that win will route between them deliberately rather than paying flagship prices for an average task.
The forward read is straightforward. When a Sonnet-tier model runs agents this close to the flagship, near-flagship agentic capability stops being a premium line item and starts being the default floor. That, not a fraction of a benchmark point, is what this release actually changes.