AI DevelopmentNew Release9 min readPublished June 30, 2026

Most agentic Sonnet yet · default on Free and Pro · 80.4 Terminal-Bench · near-Opus, lower price

Claude Sonnet 5: Near-Opus Agentic Coding

Anthropic released Claude Sonnet 5 on June 30, 2026 — its most agentic Sonnet yet, built to make plans, drive browsers and terminals, and run autonomously while checking its own work. On Anthropic’s own benchmarks it lands within a few points of Opus 4.8 on agentic coding, computer use, and knowledge work, and it ships at $3 / $15 per million tokens — $2 / $10 through August 31. The benchmarks are first-party; the price cut is the durable story.

DA
Digital Applied Team
Senior AI engineers · Published Jun 30, 2026
PublishedJun 30, 2026
Read time9 min
SourceAnthropic release + chart
Terminal-Bench 2.1
80.4%
Anthropic-reported · Opus 4.8 at 82.7
Input price
$3/1M
$2 intro to Aug 31 · Opus 4.8 $5
GDPval-AA v2
1618
knowledge work · Opus 4.8 at 1615
Context window
1M
128K max output

Claude Sonnet 5 shipped on June 30, 2026 as the most agentic model in the Sonnet line — a model Anthropic describes as able to make plans, use tools like browsers and terminals, and run autonomously at a level that, only a few months ago, took a larger and more expensive model. The headline is not a single benchmark. It is that a Sonnet-tier model now does most agentic work within a few points of the Opus 4.8 flagship, at a fraction of the price.

That matters because the constraint on putting agents into production is rarely raw capability — it is cost at volume. An agentic loop that drives a screen or a terminal burns tokens on every step, so the model you can afford to run thousands of times a month is usually the one that ships. Sonnet 5 is Anthropic’s bet that near-flagship quality at Sonnet prices is what moves agentic automation from demo to deployment.

This guide covers exactly what launched and where to use it: an honest read of Anthropic’s benchmark chart (every figure is first-party), the specific gaps where Opus 4.8 still leads, the pricing against both Sonnet 4.6 and Opus 4.8, the API changes that affect how you build, and a routing framework for deciding which model each step of your pipeline should call.

Key takeaways
  1. 01
    It is the most agentic Sonnet, by design.Anthropic built Sonnet 5 to plan, use browsers and terminals, and run autonomously. Its testers note it finishes complex tasks where previous Sonnets stopped short and checks its own output without being asked — the behaviors that separate a usable agent from a chatbot.
  2. 02
    With tools in the loop, it is near-Opus.On Anthropic's own numbers, Sonnet 5 sits within ~2 points of Opus 4.8 on Terminal-Bench (80.4 vs 82.7) and OSWorld-Verified computer use (81.2 vs 83.4), and effectively ties it on knowledge work (GDPval-AA v2: 1618 vs 1615).
  3. 03
    Opus 4.8 still leads the hardest pure reasoning.On SWE-bench Pro (63.2 vs 69.2) and Humanity's Last Exam with no tools (43.2 vs 49.8), Opus 4.8 keeps a ~6-point edge. The flagship is still the right call for the top of the difficulty range, especially without tools to lean on.
  4. 04
    The price is the durable advantage.Sonnet 5 lists at $3 / $15 per million tokens — $2 / $10 introductory through August 31, 2026. That is 40-60% of Opus 4.8's $5 / $25, and the same list price as Sonnet 4.6 for a large jump in capability.
  5. 05
    All the benchmarks are Anthropic-reported.The figures here come from Anthropic's release chart, not an independent evaluator. Treat them as directional first-party claims on a common axis, and run your own task-level evaluation before committing a workload.

01What ShippedA Sonnet built to act, not just answer.

Sonnet 5 is available immediately and broadly. It is the default model on Anthropic’s Free and Pro plans, available to Max, Team, and Enterprise users, and live across the Claude apps, Claude Code, and the Claude Platform. On the API you call it with the model ID claude-sonnet-5 — a clean string with no date suffix — and it carries a 1M-token context window with up to 128K output tokens.

The framing Anthropic chose is “most agentic Sonnet yet,” and the substance behind that phrase is what the model does between the question and the answer. It makes a plan, calls tools — a browser, a terminal, a search — and runs the loop autonomously instead of handing control back after one step. Two behaviors its early-access testers singled out are the tell: it finishes complex tasks where previous Sonnet models would stop short, and it checks its own output without being explicitly asked. Self-verification is the difference between an agent you can leave running and one you have to babysit.

Read against the rest of the Sonnet line, this is a step change rather than a point release. Our Claude Sonnet 4.6 guide covered a capable but clearly mid-tier model; Sonnet 5 moves the tier’s ceiling up to brush against the flagship. And for anyone who followed the speculation, this is the model behind the “Fennec” leak we analyzed in January — now shipped, named, and benchmarked.

Anthropic, in its own words
Anthropic frames the release plainly: Sonnet 5 “makes plans, uses tools like browsers and terminals, and runs autonomously at a level that just a few months ago required larger and more expensive models.” Its early-access partners found it finishes complex tasks where previous Sonnets stopped short and checks its own work unprompted — and that its performance is close to Opus 4.8, at lower prices.

02The BenchmarksThe numbers, read honestly.

Anthropic published Sonnet 5 against Sonnet 4.6, with Opus 4.8 in the chart as a reference. One caveat governs every figure below, so state it first: these are first-party numbers from Anthropic’s release, not results from an independent evaluator. Read the table as vendor claims arranged on a common axis — useful for direction, not a substitute for evaluating the model on your own tasks.

Claude Sonnet 5 versus Claude Sonnet 4.6 across five benchmark suites, with Claude Opus 4.8 shown for reference. Figures are from Anthropic's June 30, 2026 release chart and are first-party (Anthropic-reported) with no independent third-party verification as of June 2026. GDPval-AA v2 is an Elo score, not a percentage.
Capability · SuiteSonnet 5Sonnet 4.6Opus 4.8 (ref)
Agentic codingSWE-bench Pro63.2%58.1%69.2%
Agentic codingTerminal-Bench 2.180.4%67.0%82.7%
Multidisciplinary reasoningHumanity's Last Exam (no tools)43.2%34.6%49.8%
Multidisciplinary reasoningHumanity's Last Exam (with tools)57.4%46.8%57.9%
Computer useOSWorld-Verified81.2%78.5%83.4%
Knowledge workGDPval-AA v2 (Elo)161813951615

The shape of the table is the story. The jump over Sonnet 4.6 is large everywhere — and largest where it counts for agents. Terminal-Bench, which measures real terminal work, climbs from 67.0% to 80.4%. Humanity’s Last Exam with tools rises from 46.8% to 57.4%. And on knowledge work — GDPval-AA v2, Anthropic’s economic-value benchmark — Sonnet 5 leaps 223 Elo points, from 1395 to 1618, which is where it crosses into Opus territory outright.

Knowledge work
GDPval-AA v2 (Elo)
1618

Sonnet 5 effectively ties Opus 4.8 (1615) and adds 223 Elo over Sonnet 4.6's 1395. On Anthropic's economic-value benchmark, this is Opus-tier output at Sonnet cost.

Ties Opus 4.8
Agentic coding
Terminal-Bench 2.1
80.4%

Up from 67.0% on Sonnet 4.6 and within 2.3 points of Opus 4.8 (82.7%). The biggest gains land exactly where an agent does real terminal work.

+13.4 vs 4.6
Computer use
OSWorld-Verified
81.2%

Driving a real screen, Sonnet 5 sits 2.2 points behind Opus 4.8 (83.4%) and ahead of Sonnet 4.6 (78.5%) — close enough to anchor browser agents.

2.2 pt gap
The caveat that governs the whole chart
Every figure here is Anthropic-reported. There is no independent third-party verification of these scores as of June 2026, and a model’s own release chart is exactly where you expect it to look strongest. The right use of these numbers is to decide what to test, not to decide what to deploy — for a cross-vendor view of how the labs measure agentic work, see our agentic coding tools matrix.

03Near Opus, And NotWithin a whisker with tools, behind without them.

“Close to Opus 4.8” is the honest summary, but the gap is not uniform — and where it opens up tells you exactly when to reach for the flagship. The pattern is clean: when tools are in the loop, Sonnet 5 is within a point or two of Opus 4.8; when the task is pure reasoning with nothing to lean on, Opus pulls ahead by roughly six points. The bars below put Sonnet 5’s five percentage benchmarks against the Opus 4.8 reference.

Sonnet 5 vs the Opus 4.8 reference · Anthropic-reported

Source: Anthropic Claude Sonnet 5 release chart, June 30, 2026 — first-party figures, no independent third-party verification as of June 2026
Computer use · OSWorld-VerifiedOpus 4.8 at 83.4 · 2.2 pt gap
81.2
Near tie
Agentic coding · Terminal-Bench 2.1Opus 4.8 at 82.7 · 2.3 pt gap
80.4
Near tie
Reasoning · HLE (with tools)Opus 4.8 at 57.9 · 0.5 pt gap
57.4
Statistical tie
Agentic coding · SWE-bench ProOpus 4.8 at 69.2 · 6.0 pt gap
63.2
Reasoning · HLE (no tools)Opus 4.8 at 49.8 · 6.6 pt gap
43.2
Within ~2 pts of Opus 4.8Opus 4.8 clearly ahead

The most striking pair is Humanity’s Last Exam with and without tools. With tools, Sonnet 5 (57.4%) is half a point behind Opus 4.8 (57.9%) — a statistical tie. Without tools, the gap widens to 6.6 points (43.2% vs 49.8%). That is the whole thesis of the release in one comparison: hand Sonnet 5 the ability to search, run code, and check itself, and it reasons like the flagship; take the tools away and the flagship’s deeper unaided reasoning shows.

SWE-bench Pro tells the same story from the coding side. The 6-point gap (63.2% vs 69.2%) is real, and on the hardest end-to-end software tasks Opus 4.8 is still the model that clears bars Sonnet 5 does not. But Terminal-Bench — closer to how an agentic coding tool actually operates, step by step in a real shell — is a near tie. For most production agents, which work with tools rather than in a single unaided pass, Sonnet 5 captures the part of the flagship that the workload actually uses. Opus 4.8 earns its premium on the tasks at the very top of the difficulty range, not on the average one.

"Sonnet 5 is a substantial improvement over Sonnet 4.6 on reasoning, tool use, coding, and knowledge work. Its performance is close to Opus 4.8, at lower prices."— Anthropic, Introducing Claude Sonnet 5, June 30, 2026

04The Price StoryThe number that actually moves a decision.

With the benchmarks close, price is the lever — and it is where Sonnet 5 makes its case. The list rate is $3 per million input tokens and $15 per million output, with introductory pricing of $2 / $10 through August 31, 2026. The standard rate matches Sonnet 4.6 exactly, so once the intro window closes you are paying the same list price as the predecessor for a large step up in capability. The comparison that matters, though, is against Opus 4.8.

Published list pricing per million tokens for Claude Sonnet 5 (introductory and standard), Claude Sonnet 4.6, and Claude Opus 4.8. Sonnet 5 rates are from Anthropic's June 30, 2026 announcement; the others are current published list prices. All figures are exact.
ModelInput ($/Mtok)Output ($/Mtok)Note
Claude Sonnet 5 — introductory$2.00$10.00Through August 31, 2026
Claude Sonnet 5 — standard$3.00$15.00From September 1, 2026
Claude Sonnet 4.6$3.00$15.00Predecessor — same list price
Claude Opus 4.8$5.00$25.00Reference flagship
Input
Per million tokens (intro)
$2

Introductory rate through August 31, 2026, then $3. Against Opus 4.8's $5 input, that is 40% during the intro window and 60% afterward.

40% of Opus, intro
Output
Per million tokens (intro)
$10

Introductory rate, then $15. Against Opus 4.8's $25 output, that is 40-60% — and output is where agentic loops, with their constant tool calls and self-checks, spend most.

vs $25 Opus 4.8
vs Sonnet 4.6
Standard input, unchanged
$3

Standard Sonnet 5 list pricing is identical to Sonnet 4.6's $3 / $15. You get the full capability jump at the same list price once the introductory rate ends.

Same list as 4.6

One honest hedge keeps the cost comparison clean. Sonnet 5 uses a new tokenizer that turns the same text into roughly 30% more tokens than Sonnet 4.6 did, so a per-token price that looks identical does not translate to an identical bill on the same workload — re-measure with real token counts before you forecast. Even after that adjustment, the structural point holds: a model that performs within a couple of points of the flagship at 40-60% of its price changes which automations clear the bar of being worth building. That arithmetic, not a hero benchmark, is what tends to decide a model-routing decision.

05What Changed For BuildersThree API changes worth a second look.

If you are integrating Sonnet 5 rather than chatting with it, three things behave differently from Sonnet 4.6. None is hard, but each can quietly change cost or output if you carry a 4.6 configuration over unchanged.

Thinking
Adaptive by default
omit the param

On claude-sonnet-5, leaving the thinking parameter unset now runs adaptive thinking — Sonnet 4.6 ran without it. Control depth with effort, and note Sonnet 5 is the first Sonnet to support the xhigh level.

effort: low → xhigh
Tokens
A new tokenizer
~30% more tokens

The same text tokenizes to roughly 30% more tokens than on Sonnet 4.6. Per-token pricing is unchanged, but token-budgeted limits, max_tokens, and cost baselines all shift — re-baseline with a real token count.

re-measure budgets
Inputs
1M context, sharper vision
2576px long edge

A 1M-token context window with 128K max output, plus the first Sonnet-tier high-resolution image support (up to 2576px). Larger workflows and denser screenshots fit in a single pass.

1M ctx · 128K out

The effort dial is the lever most worth tuning. A rough cross-model mapping from Anthropic’s own guidance: Sonnet 5 at medium is comparable in intelligence to Sonnet 4.6 at high, and Sonnet 5 at high is comparable to Sonnet 4.6 at max. In practice that means you can often hold quality steady and step effort down a notch to cut latency and tokens, or hold effort and bank the capability gain. For the hardest agentic coding, xhigh is the new ceiling. Set it deliberately rather than inheriting the 4.6 default.

06Sonnet 5 Or Opus 4.8?Which model each step should call.

The benchmark gaps above translate into a simple routing rule: default to Sonnet 5 for the volume of agentic work, and escalate to Opus 4.8 only at the top of the difficulty range. The cleanest pipelines run a Sonnet 5 floor with an Opus 4.8 ceiling, switching models per step rather than per project — the shared tool surface and 1M context make those hand-offs cheap.

Default to Sonnet 5
High-volume agentic work

Multi-step coding agents, browser and terminal automation, content and ops pipelines, RAG and tool-calling. With tools in the loop, Sonnet 5 is within ~2 points of Opus 4.8 at 40-60% of the price.

Most production agents
Reach for Opus 4.8
Hardest reasoning, no tools

SWE-bench Pro-class refactors and unaided reasoning, where Opus 4.8 keeps a ~6-point lead. When one hard decision is worth the premium, the flagship earns it.

Top of the difficulty range
Mix per step
Tiered agent pipelines

Run cheap steps — planning, retrieval, routine edits — on Sonnet 5 and escalate only the hardest sub-task to Opus 4.8. The shared API and 1M context keep the routing clean.

Sonnet 5 floor, Opus ceiling
Re-baseline first
Before you switch a workload

Sonnet 5 uses a new tokenizer (~30% more tokens than 4.6) and adaptive thinking is on by default. Re-measure token counts and max_tokens, set effort deliberately, then evaluate on your own tasks.

Test, don't assume

07Putting It To WorkWhere a team starts today.

For a marketing or operations team, the practical opening is the work that was previously too token-expensive to automate at a frontier model’s price. Agent-written reporting, content pipelines that draft and self-check, lead-routing and CRM hygiene agents, browser automations that read dashboards and flag anomalies — these are high-volume, repetitive tasks where a near-flagship model at 40-60% of the cost changes the build-versus-skip calculation.

The sequencing we use with clients is the same regardless of model: prove value on a single well-scoped, low-risk workflow — usually read-only at first — add human-confirmed writes once the audit trail earns trust, and only then widen autonomy. Sonnet 5’s self-checking and tool fluency make that ramp faster, but the discipline is unchanged: scope tight, log everything, and route the occasional hard step to Opus 4.8 rather than over-paying for the whole pipeline. That scoping — which workflows, which guardrails, which model per step — is exactly where our agentic AI transformation engagements begin, before any model commitment. For teams rolling agents out at scale, our enterprise deployment playbook covers the governance layer.

08ConclusionThe tier ceiling moves up.

The shape of Sonnet 5, June 2026

A Sonnet that does most agentic work like a flagship, at a fraction of the price.

Claude Sonnet 5 is best read as a pricing event wearing a benchmark headline. On Anthropic’s own numbers it lands within a couple of points of Opus 4.8 wherever tools are in the loop — Terminal-Bench, computer use, knowledge work — and effectively ties the flagship on economic-value tasks. The honest qualifier is that those figures are first-party, and that Opus 4.8 keeps a real ~6-point lead on the hardest pure reasoning and end-to-end software tasks.

Keep the framing precise. This is not “Sonnet matches Opus.” It is “Sonnet now captures the part of the flagship that most production agents actually use, at 40-60% of the cost.” The hardest workloads still belong to Opus 4.8; the volume belongs to Sonnet 5. The teams that win will route between them deliberately rather than paying flagship prices for an average task.

The forward read is straightforward. When a Sonnet-tier model runs agents this close to the flagship, near-flagship agentic capability stops being a premium line item and starts being the default floor. That, not a fraction of a benchmark point, is what this release actually changes.

Turn near-flagship agents into governed workflows

Near-flagship agents make governed automation genuinely affordable.

We help marketing and operations teams put agentic models like Claude Sonnet 5 to work — scoping the right workflow, routing the hard steps to Opus 4.8, and locking the guardrails down before any autonomy, delivered in days not quarters.

Free consultationExpert guidanceTailored solutions
What we work on

Agentic automation engagements

  • Model routing — Sonnet 5 floor, Opus 4.8 ceiling, per step
  • Agent scoping — which workflows, which guardrails
  • Cost modelling against your real token mix
  • Self-checking and audit logging for compliance
  • Read-only to human-in-loop to autonomous rollout
FAQ · Claude Sonnet 5

The questions we get every week.

Claude Sonnet 5 is Anthropic's most agentic Sonnet-tier model, released on June 30, 2026. Anthropic describes it as able to make plans, use tools like browsers and terminals, and run autonomously at a level that previously required larger, more expensive models. It is the default model on the Free and Pro plans, available to Max, Team, and Enterprise users, and live across the Claude apps, Claude Code, and the Claude Platform. On the API you call it with the model ID claude-sonnet-5, and it carries a 1M-token context window with up to 128K output tokens.
Related dispatches

Continue exploring frontier releases.