Claude Sonnet 5 shipped on June 30, 2026 as the most agentic model in the Sonnet line — a model Anthropic describes as able to make plans, use tools like browsers and terminals, and run autonomously at a level that, only a few months ago, took a larger and more expensive model. The headline is not a single benchmark. It is that a Sonnet-tier model now does most agentic work within a few points of the Opus 4.8 flagship, at a fraction of the price.

That matters because the constraint on putting agents into production is rarely raw capability — it is cost at volume. An agentic loop that drives a screen or a terminal burns tokens on every step, so the model you can afford to run thousands of times a month is usually the one that ships. Sonnet 5 is Anthropic’s bet that near-flagship quality at Sonnet prices is what moves agentic automation from demo to deployment.

This guide covers exactly what launched and where to use it: an honest read of Anthropic’s benchmark chart (every figure is first-party), the specific gaps where Opus 4.8 still leads, the pricing against both Sonnet 4.6 and Opus 4.8, the API changes that affect how you build, and a routing framework for deciding which model each step of your pipeline should call.

Key takeaways

01
It is the most agentic Sonnet, by design.Anthropic built Sonnet 5 to plan, use browsers and terminals, and run autonomously. Its testers note it finishes complex tasks where previous Sonnets stopped short and checks its own output without being asked — the behaviors that separate a usable agent from a chatbot.
02
With tools in the loop, it is near-Opus.On Anthropic's own numbers, Sonnet 5 sits within ~2 points of Opus 4.8 on Terminal-Bench (80.4 vs 82.7) and OSWorld-Verified computer use (81.2 vs 83.4), and effectively ties it on knowledge work (GDPval-AA v2: 1618 vs 1615).
03
Opus 4.8 still leads the hardest pure reasoning.On SWE-bench Pro (63.2 vs 69.2) and Humanity's Last Exam with no tools (43.2 vs 49.8), Opus 4.8 keeps a ~6-point edge. The flagship is still the right call for the top of the difficulty range, especially without tools to lean on.
04
The price is the durable advantage.Sonnet 5 lists at $3 / $15 per million tokens — $2 / $10 introductory through August 31, 2026. That is 40-60% of Opus 4.8's $5 / $25, and the same list price as Sonnet 4.6 for a large jump in capability.
05
All the benchmarks are Anthropic-reported.The figures here come from Anthropic's release chart, not an independent evaluator. Treat them as directional first-party claims on a common axis, and run your own task-level evaluation before committing a workload.

01 — What ShippedA Sonnet built to act, not just answer.

Sonnet 5 is available immediately and broadly. It is the default model on Anthropic’s Free and Pro plans, available to Max, Team, and Enterprise users, and live across the Claude apps, Claude Code, and the Claude Platform. On the API you call it with the model ID claude-sonnet-5 — a clean string with no date suffix — and it carries a 1M-token context window with up to 128K output tokens.

The framing Anthropic chose is “most agentic Sonnet yet,” and the substance behind that phrase is what the model does between the question and the answer. It makes a plan, calls tools — a browser, a terminal, a search — and runs the loop autonomously instead of handing control back after one step. Two behaviors its early-access testers singled out are the tell: it finishes complex tasks where previous Sonnet models would stop short, and it checks its own output without being explicitly asked. Self-verification is the difference between an agent you can leave running and one you have to babysit.

Read against the rest of the Sonnet line, this is a step change rather than a point release. Our Claude Sonnet 4.6 guide covered a capable but clearly mid-tier model; Sonnet 5 moves the tier’s ceiling up to brush against the flagship. And for anyone who followed the speculation, this is the model behind the “Fennec” leak we analyzed in January — now shipped, named, and benchmarked.

Anthropic, in its own words

Anthropic frames the release plainly: Sonnet 5 “makes plans, uses tools like browsers and terminals, and runs autonomously at a level that just a few months ago required larger and more expensive models.” Its early-access partners found it finishes complex tasks where previous Sonnets stopped short and checks its own work unprompted — and that its performance is close to Opus 4.8, at lower prices.

02 — The BenchmarksThe numbers, read honestly.

Anthropic published Sonnet 5 against Sonnet 4.6, with Opus 4.8 in the chart as a reference. One caveat governs every figure below, so state it first: these are first-party numbers from Anthropic’s release, not results from an independent evaluator. Read the table as vendor claims arranged on a common axis — useful for direction, not a substitute for evaluating the model on your own tasks.

Claude Sonnet 5 versus Claude Sonnet 4.6 across five benchmark suites, with Claude Opus 4.8 shown for reference. Figures are from Anthropic's June 30, 2026 release chart and are first-party (Anthropic-reported) with no independent third-party verification as of June 2026. GDPval-AA v2 is an Elo score, not a percentage.
Capability · Suite	Sonnet 5	Sonnet 4.6	Opus 4.8 (ref)
Agentic codingSWE-bench Pro	63.2%	58.1%	69.2%
Agentic codingTerminal-Bench 2.1	80.4%	67.0%	82.7%
Multidisciplinary reasoningHumanity's Last Exam (no tools)	43.2%	34.6%	49.8%
Multidisciplinary reasoningHumanity's Last Exam (with tools)	57.4%	46.8%	57.9%
Computer useOSWorld-Verified	81.2%	78.5%	83.4%
Knowledge workGDPval-AA v2 (Elo)	1618	1395	1615

The shape of the table is the story. The jump over Sonnet 4.6 is large everywhere — and largest where it counts for agents. Terminal-Bench, which measures real terminal work, climbs from 67.0% to 80.4%. Humanity’s Last Exam with tools rises from 46.8% to 57.4%. And on knowledge work — GDPval-AA v2, Anthropic’s economic-value benchmark — Sonnet 5 leaps 223 Elo points, from 1395 to 1618, which is where it crosses into Opus territory outright.

Knowledge work

GDPval-AA v2 (Elo)

1618

Sonnet 5 effectively ties Opus 4.8 (1615) and adds 223 Elo over Sonnet 4.6's 1395. On Anthropic's economic-value benchmark, this is Opus-tier output at Sonnet cost.

Ties Opus 4.8

Agentic coding

Terminal-Bench 2.1

80.4%

Up from 67.0% on Sonnet 4.6 and within 2.3 points of Opus 4.8 (82.7%). The biggest gains land exactly where an agent does real terminal work.

+13.4 vs 4.6

Computer use

OSWorld-Verified

81.2%

Driving a real screen, Sonnet 5 sits 2.2 points behind Opus 4.8 (83.4%) and ahead of Sonnet 4.6 (78.5%) — close enough to anchor browser agents.

2.2 pt gap

The caveat that governs the whole chart

Every figure here is Anthropic-reported. There is no independent third-party verification of these scores as of June 2026, and a model’s own release chart is exactly where you expect it to look strongest. The right use of these numbers is to decide what to test, not to decide what to deploy — for a cross-vendor view of how the labs measure agentic work, see our agentic coding tools matrix.

03 — Near Opus, And NotWithin a whisker with tools, behind without them.

“Close to Opus 4.8” is the honest summary, but the gap is not uniform — and where it opens up tells you exactly when to reach for the flagship. The pattern is clean: when tools are in the loop, Sonnet 5 is within a point or two of Opus 4.8; when the task is pure reasoning with nothing to lean on, Opus pulls ahead by roughly six points. The bars below put Sonnet 5’s five percentage benchmarks against the Opus 4.8 reference.

Sonnet 5 vs the Opus 4.8 reference · Anthropic-reported

Source: Anthropic Claude Sonnet 5 release chart, June 30, 2026 — first-party figures, no independent third-party verification as of June 2026

Computer use · OSWorld-VerifiedOpus 4.8 at 83.4 · 2.2 pt gap

81.2

Near tie

Agentic coding · Terminal-Bench 2.1Opus 4.8 at 82.7 · 2.3 pt gap

80.4

Near tie

Reasoning · HLE (with tools)Opus 4.8 at 57.9 · 0.5 pt gap

57.4

Statistical tie

Agentic coding · SWE-bench ProOpus 4.8 at 69.2 · 6.0 pt gap

63.2

Reasoning · HLE (no tools)Opus 4.8 at 49.8 · 6.6 pt gap

43.2

Within ~2 pts of Opus 4.8Opus 4.8 clearly ahead

The most striking pair is Humanity’s Last Exam with and without tools. With tools, Sonnet 5 (57.4%) is half a point behind Opus 4.8 (57.9%) — a statistical tie. Without tools, the gap widens to 6.6 points (43.2% vs 49.8%). That is the whole thesis of the release in one comparison: hand Sonnet 5 the ability to search, run code, and check itself, and it reasons like the flagship; take the tools away and the flagship’s deeper unaided reasoning shows.

SWE-bench Pro tells the same story from the coding side. The 6-point gap (63.2% vs 69.2%) is real, and on the hardest end-to-end software tasks Opus 4.8 is still the model that clears bars Sonnet 5 does not. But Terminal-Bench — closer to how an agentic coding tool actually operates, step by step in a real shell — is a near tie. For most production agents, which work with tools rather than in a single unaided pass, Sonnet 5 captures the part of the flagship that the workload actually uses. Opus 4.8 earns its premium on the tasks at the very top of the difficulty range, not on the average one.

"Sonnet 5 is a substantial improvement over Sonnet 4.6 on reasoning, tool use, coding, and knowledge work. Its performance is close to Opus 4.8, at lower prices."— Anthropic, Introducing Claude Sonnet 5, June 30, 2026

04 — The Price StoryThe number that actually moves a decision.

With the benchmarks close, price is the lever — and it is where Sonnet 5 makes its case. The list rate is $3 per million input tokens and $15 per million output, with introductory pricing of $2 / $10 through August 31, 2026. The standard rate matches Sonnet 4.6 exactly, so once the intro window closes you are paying the same list price as the predecessor for a large step up in capability. The comparison that matters, though, is against Opus 4.8.

Published list pricing per million tokens for Claude Sonnet 5 (introductory and standard), Claude Sonnet 4.6, and Claude Opus 4.8. Sonnet 5 rates are from Anthropic's June 30, 2026 announcement; the others are current published list prices. All figures are exact.
Model	Input ($/Mtok)	Output ($/Mtok)	Note
Claude Sonnet 5 — introductory	$2.00	$10.00	Through August 31, 2026
Claude Sonnet 5 — standard	$3.00	$15.00	From September 1, 2026
Claude Sonnet 4.6	$3.00	$15.00	Predecessor — same list price
Claude Opus 4.8	$5.00	$25.00	Reference flagship

Input

Per million tokens (intro)

Introductory rate through August 31, 2026, then $3. Against Opus 4.8's $5 input, that is 40% during the intro window and 60% afterward.

40% of Opus, intro

Output

Per million tokens (intro)

$10

Introductory rate, then $15. Against Opus 4.8's $25 output, that is 40-60% — and output is where agentic loops, with their constant tool calls and self-checks, spend most.

vs $25 Opus 4.8

vs Sonnet 4.6

Standard input, unchanged

Standard Sonnet 5 list pricing is identical to Sonnet 4.6's $3 / $15. You get the full capability jump at the same list price once the introductory rate ends.

Same list as 4.6

One honest hedge keeps the cost comparison clean. Sonnet 5 uses a new tokenizer that turns the same text into roughly 30% more tokens than Sonnet 4.6 did, so a per-token price that looks identical does not translate to an identical bill on the same workload — re-measure with real token counts before you forecast. Even after that adjustment, the structural point holds: a model that performs within a couple of points of the flagship at 40-60% of its price changes which automations clear the bar of being worth building. That arithmetic, not a hero benchmark, is what tends to decide a model-routing decision.

05 — What Changed For BuildersThree API changes worth a second look.

If you are integrating Sonnet 5 rather than chatting with it, three things behave differently from Sonnet 4.6. None is hard, but each can quietly change cost or output if you carry a 4.6 configuration over unchanged.

Thinking

Adaptive by default

omit the param

On claude-sonnet-5, leaving the thinking parameter unset now runs adaptive thinking — Sonnet 4.6 ran without it. Control depth with effort, and note Sonnet 5 is the first Sonnet to support the xhigh level.

effort: low → xhigh

Tokens

A new tokenizer

~30% more tokens

The same text tokenizes to roughly 30% more tokens than on Sonnet 4.6. Per-token pricing is unchanged, but token-budgeted limits, max_tokens, and cost baselines all shift — re-baseline with a real token count.

re-measure budgets

Inputs

1M context, sharper vision

2576px long edge

A 1M-token context window with 128K max output, plus the first Sonnet-tier high-resolution image support (up to 2576px). Larger workflows and denser screenshots fit in a single pass.

1M ctx · 128K out

The effort dial is the lever most worth tuning. A rough cross-model mapping from Anthropic’s own guidance: Sonnet 5 at medium is comparable in intelligence to Sonnet 4.6 at high, and Sonnet 5 at high is comparable to Sonnet 4.6 at max. In practice that means you can often hold quality steady and step effort down a notch to cut latency and tokens, or hold effort and bank the capability gain. For the hardest agentic coding, xhigh is the new ceiling. Set it deliberately rather than inheriting the 4.6 default.

06 — Sonnet 5 Or Opus 4.8?Which model each step should call.

The benchmark gaps above translate into a simple routing rule: default to Sonnet 5 for the volume of agentic work, and escalate to Opus 4.8 only at the top of the difficulty range. The cleanest pipelines run a Sonnet 5 floor with an Opus 4.8 ceiling, switching models per step rather than per project — the shared tool surface and 1M context make those hand-offs cheap.

Default to Sonnet 5

High-volume agentic work

Multi-step coding agents, browser and terminal automation, content and ops pipelines, RAG and tool-calling. With tools in the loop, Sonnet 5 is within ~2 points of Opus 4.8 at 40-60% of the price.

Most production agents

Reach for Opus 4.8

Hardest reasoning, no tools

SWE-bench Pro-class refactors and unaided reasoning, where Opus 4.8 keeps a ~6-point lead. When one hard decision is worth the premium, the flagship earns it.

Top of the difficulty range

Mix per step

Tiered agent pipelines

Run cheap steps — planning, retrieval, routine edits — on Sonnet 5 and escalate only the hardest sub-task to Opus 4.8. The shared API and 1M context keep the routing clean.

Sonnet 5 floor, Opus ceiling

Re-baseline first

Before you switch a workload

Sonnet 5 uses a new tokenizer (~30% more tokens than 4.6) and adaptive thinking is on by default. Re-measure token counts and max_tokens, set effort deliberately, then evaluate on your own tasks.

Test, don't assume

07 — Putting It To WorkWhere a team starts today.

For a marketing or operations team, the practical opening is the work that was previously too token-expensive to automate at a frontier model’s price. Agent-written reporting, content pipelines that draft and self-check, lead-routing and CRM hygiene agents, browser automations that read dashboards and flag anomalies — these are high-volume, repetitive tasks where a near-flagship model at 40-60% of the cost changes the build-versus-skip calculation.

The sequencing we use with clients is the same regardless of model: prove value on a single well-scoped, low-risk workflow — usually read-only at first — add human-confirmed writes once the audit trail earns trust, and only then widen autonomy. Sonnet 5’s self-checking and tool fluency make that ramp faster, but the discipline is unchanged: scope tight, log everything, and route the occasional hard step to Opus 4.8 rather than over-paying for the whole pipeline. That scoping — which workflows, which guardrails, which model per step — is exactly where our agentic AI transformation engagements begin, before any model commitment. For teams rolling agents out at scale, our enterprise deployment playbook covers the governance layer.

08 — ConclusionThe tier ceiling moves up.

The shape of Sonnet 5, June 2026

A Sonnet that does most agentic work like a flagship, at a fraction of the price.

Claude Sonnet 5 is best read as a pricing event wearing a benchmark headline. On Anthropic’s own numbers it lands within a couple of points of Opus 4.8 wherever tools are in the loop — Terminal-Bench, computer use, knowledge work — and effectively ties the flagship on economic-value tasks. The honest qualifier is that those figures are first-party, and that Opus 4.8 keeps a real ~6-point lead on the hardest pure reasoning and end-to-end software tasks.

Keep the framing precise. This is not “Sonnet matches Opus.” It is “Sonnet now captures the part of the flagship that most production agents actually use, at 40-60% of the cost.” The hardest workloads still belong to Opus 4.8; the volume belongs to Sonnet 5. The teams that win will route between them deliberately rather than paying flagship prices for an average task.

The forward read is straightforward. When a Sonnet-tier model runs agents this close to the flagship, near-flagship agentic capability stops being a premium line item and starts being the default floor. That, not a fraction of a benchmark point, is what this release actually changes.

Claude Sonnet 5: Near-Opus Agentic Coding

01 — What ShippedA Sonnet built to act, not just answer.

02 — The BenchmarksThe numbers, read honestly.

GDPval-AA v2 (Elo)

Terminal-Bench 2.1

OSWorld-Verified

03 — Near Opus, And NotWithin a whisker with tools, behind without them.

Sonnet 5 vs the Opus 4.8 reference · Anthropic-reported

04 — The Price StoryThe number that actually moves a decision.

Per million tokens (intro)

Per million tokens (intro)

Standard input, unchanged

05 — What Changed For BuildersThree API changes worth a second look.

Adaptive by default

A new tokenizer

1M context, sharper vision

06 — Sonnet 5 Or Opus 4.8?Which model each step should call.

High-volume agentic work

Hardest reasoning, no tools

Tiered agent pipelines

Before you switch a workload

07 — Putting It To WorkWhere a team starts today.

08 — ConclusionThe tier ceiling moves up.

A Sonnet that does most agentic work like a flagship, at a fraction of the price.

Near-flagship agents make governed automation genuinely affordable.

Agentic automation engagements

The questions we get every week.

Continue exploring frontier releases.

Claude Sonnet 4.6: Benchmarks, Pricing & Complete Guide

Claude Sonnet 5 Fennec Leak: Complete Analysis Guide

Anthropic Computer Use API: Desktop Automation Guide

Anthropic Accuses Alibaba of Record Model Distillation

Codex Record & Replay: Show It Once, Skip the Script

Google AI Plans: Free vs Plus vs Pro vs Ultra 2026