AI Development · Migration · 11 min read · Published May 15, 2026

Typed message streaming, tool-result schema changes, compaction API surface — the SDK changes that decide whether your migration is a refactor or a rewrite.

Anthropic SDK v2 to v3 Migration Playbook: TypeScript

Anthropic's TypeScript SDK v3 reshapes four surfaces at once — typed delta streaming, structured tool-result schema, a built-in compaction API, and a wave of parameter renames. Migrating cleanly is mostly a question of whether you wrap-then-cut or rip out v2 in one go. This playbook covers the safe path and the failure modes.

Digital Applied Team · AI engineering
Published: May 15, 2026 · Read time: 11 min · Sources: Anthropic SDK changelog
Breaking-change axes: 4 (streaming · tools · compaction · types)
Codemods provided: Yes (covers most renames)
Typical migration duration: 1–2 weeks (wrap-then-cut pattern)
Recommended pattern: Wrap-then-cut (incremental & reversible)

An Anthropic SDK v2 to v3 migration looks small from the outside — a major-version bump on a single dependency — and unusually large from the inside. The TypeScript SDK v3 release reshapes four surfaces at once: typed streaming delta events, a structured tool-result schema with explicit error discrimination, a first-party compaction API that replaces hand-rolled summarisation, and a wave of parameter renames cleaned up under the hood.

None of those changes is conceptually exotic; each one collapses a class of fragile custom code that production teams have been carrying since v1. The migration risk sits in the combination — a codebase that has hand-typed stream parsers, custom tool-result error handling, and a bespoke summarisation layer will touch every one of the four breaking-change axes in the same week.

This playbook covers what changed, where the breakage actually lands, the wrap-then-cut compatibility pattern that keeps the migration reversible, and the four pitfalls we have watched teams hit during the first weeks after the release. Everything below is grounded in the public Anthropic TypeScript SDK changelog and the patterns we ship for client repos.

Key takeaways
  1. Typed streaming removes the most fragile custom code in v2 codebases — adopt it fully, not partially. Hand-typed stream parsers were the single largest source of production bugs in v2; the v3 delta-event union types eliminate the entire class.
  2. Tool-result schema strictness lifts production reliability, but requires per-tool migration. v3 introduces structured inputs and an is_error discriminator, so silent failures stop masquerading as successful outputs in your traces.
  3. The compaction API replaces hand-rolled summarisation — trust it and remove your custom code. The built-in conversation compaction is calibrated to Anthropic's own context-management research and produces better recall than most bespoke layers.
  4. Parameter renames are mechanical, and codemods cover most of them. The official codemod sweep handles the bulk of camelCase / snake_case alignment and renamed option keys; manual cleanup is a focused review pass, not a rewrite.
  5. A wrap-compatibility layer makes incremental migration safe — the adapter pattern wins. A thin shim that exposes v2 call signatures backed by v3 internals lets you cut over one call-site at a time and roll back without a re-deploy.

01 · What's New — v3 ships across four axes: typed streaming, tools, compaction, types.

The v3 release sits on four breaking-change axes. Knowing which ones you touch determines the size of the migration. A single-call-site script that streams completions and ignores tools is a one-hour job; a multi-agent platform with custom tool-result handling and a bespoke summarisation layer is a multi-day rebuild.

The four axes, in roughly increasing order of surface area:

  • Typed streaming. The streaming surface returns a discriminated union of delta events with explicit TypeScript types — message_start, content_block_delta, message_delta, message_stop, and the new thinking_delta for extended-thinking traces. v2 returned untyped chunks that every team parsed by hand.
  • Tool-result schema. Tool inputs are structured with named fields, and tool results carry an is_error discriminator so failures stop being plain strings. The schema is enforced at the SDK boundary, which means malformed tool calls fail loudly rather than silently passing through to the model.
  • Compaction API. A first-party messages.compact() method that trims conversation history while preserving the signal Anthropic's own context-management research has flagged as important. Replaces the "summarise the last N turns" helpers most teams maintained in-house.
  • Parameter renames and types. A general cleanup pass — option keys aligned, deprecated aliases removed, response types refined. The official codemods cover the bulk of the mechanical work.
  • Axis 1 — Streaming (typed delta events). Discriminated union over message_start, content_block_delta, message_delta, message_stop, plus the new thinking_delta. Eliminates hand-typed parsers — the most fragile code in v2 codebases. Verdict: high value, high blast radius.
  • Axis 2 — Tools (structured tool I/O). Tool input is a typed object; the tool result carries an is_error discriminator. Failures stop being strings that the model interprets — they surface as discrete error states the runtime can branch on. Verdict: per-tool migration.
  • Axis 3 — Compaction (built-in summarisation). messages.compact() trims history with Anthropic's recommended retention policy. Replaces bespoke "summarise last N turns" helpers. Better recall than most in-house implementations. Verdict: delete your custom code.
  • Axis 4 — Renames (option keys & types). Codemod-covered for the bulk: camelCase / snake_case alignment, deprecated-alias removal, response-type refinement. Manual cleanup is a focused review pass, not a rewrite. Verdict: mechanical.
Sizing rule of thumb
One-axis migrations finish in a day. Two-axis migrations take a week. Four-axis migrations — a platform that hits every breaking change at once — should be wrap-then-cut over one to two weeks, not a single PR. The worst outcomes we have watched come from teams treating a four-axis migration like a one-day refactor.

02 · Typed Streaming — delta events become first-class types.

The streaming surface is where v3 pays off most visibly. In v2, every team that consumed streaming responses wrote its own delta parser — usually a switch statement over event-type strings, with string-typed payloads that the compiler could not check. Those parsers were the single largest source of production bugs we saw in v2 codebases. A new event type would ship, the parser would fall through to its default branch, and a fraction of streamed tokens would silently disappear from the rendered output.

In v3, the stream returns a discriminated union. Each event carries an explicit type literal and TypeScript narrows the payload accordingly. The compiler now catches the exhaustiveness holes that v2 left to runtime. The recommended shape is a switch on event.type with an unreachable default branch — anything missing is a compile error.

The event types you handle in practice:

  • message_start — opens the stream with the message envelope. Use it to initialise per-message state in your renderer.
  • content_block_start / content_block_delta / content_block_stop — the body of the response. Text blocks deliver token-by-token deltas; tool-use blocks deliver input JSON deltas; thinking blocks deliver thinking_delta events that you may choose to render or hide.
  • message_delta — terminal metadata (stop reason, output token count). Use it to finalise your message state before message_stop.
  • message_stop — closes the stream. Commit the rendered message, emit telemetry, return.
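A minimal renderer over these events can be sketched as an exhaustive switch. The union below is an illustrative reconstruction — the payload field names (delta.text, stopReason, and so on) are assumptions for the sketch, not the SDK's published types:

```typescript
// Hypothetical delta-event union — field names are assumptions for illustration.
type StreamEvent =
  | { type: "message_start"; message: { id: string } }
  | { type: "content_block_start"; index: number }
  | { type: "content_block_delta"; index: number; delta:
      | { type: "text_delta"; text: string }
      | { type: "thinking_delta"; thinking: string } }
  | { type: "content_block_stop"; index: number }
  | { type: "message_delta"; stopReason: string | null }
  | { type: "message_stop" };

// Exhaustive switch: a new event type added to the union turns the
// default branch into a compile error instead of silently dropped tokens.
function renderEvent(event: StreamEvent, out: { text: string; thinking: string }): void {
  switch (event.type) {
    case "message_start":
    case "content_block_start":
    case "content_block_stop":
      break; // initialise / finalise per-block renderer state here
    case "content_block_delta":
      if (event.delta.type === "text_delta") out.text += event.delta.text;
      else out.thinking += event.delta.thinking; // render or hide on demand
      break;
    case "message_delta":
      break; // terminal metadata: stop reason, usage
    case "message_stop":
      break; // commit the rendered message, emit telemetry
    default: {
      const unreachable: never = event; // exhaustiveness guard
      throw new Error(`unhandled event: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch is the compile-time guard the prose above describes: the v2 failure mode — a new event type falling through to a default branch at runtime — becomes a type error at build time.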

The migration discipline here is to adopt fully, not partially. Wrapping the v3 stream behind a v2-shaped emitter (untyped chunk events) defeats the purpose — you keep paying the parsing-bug tax. Spend the extra day rewriting your renderer against the typed events; it will be the last time you touch the file.

"Hand-typed stream parsers were the single largest source of production bugs we saw in v2 codebases. v3 eliminates the entire class — adopt the typed events fully, not partially."— Production lesson · Digital Applied SDK migration kit

For teams already running extended thinking on Claude Opus or Sonnet, the new thinking_delta stream is the cleanest way to surface reasoning traces. v2 required parsing the thinking block out of the final message; v3 streams it alongside the answer with its own discriminated event. Most production UIs hide the trace by default and reveal it on demand — the typed stream makes that toggle a one-line conditional rather than a buffer-and-parse dance.

03 · Tool-Result Schema — structured inputs and error discrimination.

Tool use is the second axis where v3 changes the shape of the data you handle. In v2, tool results were strings — whether the tool succeeded, partially succeeded, or threw a hard error, you handed the model a string and let the prompt do the work of distinguishing outcomes. That worked, more or less, until the day a long-running tool returned an empty string on timeout and the model confidently reported the operation as successful.

v3 introduces an is_error discriminator on every tool result and structures the tool input as a typed object. The two changes work together — typed inputs let the SDK reject malformed calls at the boundary, and the error discriminator lets the runtime branch on failure without consulting the model. Silent failures stop masquerading as successful outputs in your traces.

The choice matrix below maps the v2 shape onto the v3 shape for each common tool-use pattern:

  • Tool input — string blob → typed object. v2 accepted arbitrary input JSON and validated downstream. v3 validates against the declared tool schema at the SDK boundary — malformed inputs fail fast with a typed error instead of reaching your handler. (Migrate per-tool.)
  • Tool result — string → structured. v2 tool results were strings. v3 results carry a content array plus an is_error boolean. The runtime can branch on is_error without parsing the result string; the model still sees a usable representation. (Branch on is_error.)
  • Error handling — silent → explicit. v2 errors were either thrown (caught by the caller, lost to the model) or stringified (passed to the model, sometimes misinterpreted). v3's is_error path delivers both — runtime branches and model awareness. (Explicit error path.)
  • Migration order — per-tool, not bulk. Migrate one tool at a time, behind a thin adapter that re-shapes v2-style results into v3 envelopes. Cut the adapter once every tool is on the v3 schema. Bulk migration creates a long-lived broken state. (Per-tool sequence.)

The migration discipline mirrors the streaming axis: do it per-tool and do it fully. A half-migrated tool that still returns strings but pretends to be v3-shaped is the worst of both worlds. The adapter pattern is the safer move — for each tool, wrap the v2 handler with a thin function that constructs a v3 result envelope, then cut the wrapper once the handler itself is rewritten.
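That per-tool wrapper fits in a few lines. The v3 envelope shape below — a content array plus an isError flag, camelCase at the call-site — is assumed from the schema described above, not copied from the SDK, and the handler is shown synchronous for brevity (real tool handlers are typically async):

```typescript
// v2-era handler: returns a string, signals failure by throwing.
type V2Handler = (input: unknown) => string;

// Assumed v3-style result envelope (isError maps to is_error on the wire).
interface V3ToolResult {
  content: Array<{ type: "text"; text: string }>;
  isError: boolean;
}

// Thin per-tool adapter: re-shapes a v2 handler into a v3 envelope.
// Cut this wrapper once the handler itself is rewritten against v3.
function wrapV2Tool(handler: V2Handler): (input: unknown) => V3ToolResult {
  return (input) => {
    try {
      return { content: [{ type: "text", text: handler(input) }], isError: false };
    } catch (err) {
      // Thrown errors become explicit error results the runtime can branch on,
      // instead of being lost to the caller or stringified into the prompt.
      const message = err instanceof Error ? err.message : String(err);
      return { content: [{ type: "text", text: message }], isError: true };
    }
  };
}
```

One wrapper per tool keeps the cutover granular: rewrite a handler natively, delete its wrapper, ship, and the remaining tools stay untouched.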

One operational note worth highlighting: the new tool schema interacts with model-side tool choice. If you previously coaxed tool selection by prompt-engineering hints into the tool name or description, the more strictly typed tool definitions in v3 make those hints redundant — and occasionally counter-productive. Re-run your eval suite on the cleaned-up tool definitions before assuming equivalent selection behaviour.

Per-tool migration order
Pick the lowest-traffic tool first, migrate it end-to-end, ship, watch your traces for a day. Then the next. Bulk-migrating ten tools in one PR is the path to a long-lived broken state where nothing is on v2 and nothing is fully on v3. We have seen that pattern cost a team a full sprint that should have been two days.

04 · Compaction API — built-in conversation compaction.

The compaction API is the change most teams underestimate. Every production v2 deployment we audited carried a hand-rolled summarisation layer — a helper that grabbed the last N turns, asked the model for a digest, and replaced the trimmed history with the digest. Those helpers worked, but they leaked context in characteristic ways: tool-call rationales got summarised away, extended-thinking traces were ignored, and the recall on multi-step plans degraded steadily across long sessions.

v3 ships messages.compact() as a first-party method. It accepts a conversation, a target token budget, and an optional retention policy, and returns a compacted history that preserves the signal Anthropic's own context-management research has flagged as important — tool calls, decisions, plan state — while collapsing the verbose intermediate content.

The discipline is to trust it. The single most common mistake we have seen in early v3 migrations is keeping the bespoke summarisation layer running alongside the new API, on the theory that the team's hand-tuned prompts are smarter than the SDK default. They usually are not — and the moment you keep both, you carry the maintenance cost of both indefinitely.

Approximate recall · bespoke summarisation vs v3 compaction (illustrative — exact numbers depend on conversation length, tool taxonomy, and the eval prompts you use):

  • Bespoke summarisation (v2 era) — hand-rolled, variable recall, tool-call rationale often lost: ~baseline
  • messages.compact() default — v3 first-party, retention policy preserves plan state: higher recall
  • messages.compact() tuned — retention policy customised for your tool taxonomy: best recall

For most teams the practical migration is a three-line change in the orchestrator: replace the call to summariseHistory() with messages.compact(), delete the bespoke helper, and re-run the eval suite. If recall holds, ship. If it regresses on a specific class of conversation — a long multi-tool agent, a multi-document RAG flow — the retention-policy hook is where you customise.
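To make the retention idea concrete, here is a local stand-in that mimics the behaviour described above — keep pinned turns (tool calls, decisions, plan state) and the newest turns that fit the budget, dropping verbose intermediates first. This is an illustrative sketch of the policy, not the SDK's implementation; compactSketch and the Turn shape are hypothetical names:

```typescript
interface Turn {
  role: "user" | "assistant" | "tool";
  text: string;
  pinned?: boolean; // tool calls, decisions, plan state — always retained
}

// Sketch of a retention policy: pinned turns always survive; unpinned
// turns are admitted newest-first until the character budget is spent.
function compactSketch(history: Turn[], budgetChars: number): Turn[] {
  const kept = new Set<Turn>(history.filter((t) => t.pinned));
  let used = [...kept].reduce((n, t) => n + t.text.length, 0);
  // Walk backwards so the newest unpinned turns survive first.
  for (let i = history.length - 1; i >= 0; i--) {
    const t = history[i];
    if (t.pinned) continue;
    if (used + t.text.length > budgetChars) continue; // verbose middle dropped
    used += t.text.length;
    kept.add(t);
  }
  return history.filter((t) => kept.has(t)); // preserve original order
}
```

The real API works on tokens and richer message structures, but the shape of the swap is the same: the orchestrator calls one compaction function with a budget and an optional policy, and the bespoke summariser is deleted.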

05 · Renamed Params — the compatibility matrix.

The fourth axis is the most mechanical and the least scary. v3 cleans up a backlog of parameter naming inconsistencies — option keys that drifted between camelCase and snake_case, response fields whose names did not match their semantics, deprecated aliases that had been carried forward since v1. The official codemods cover most of it; what remains is a focused review pass.

Four broad categories cover the work, summarised in the grid below:

  • Category 1 — Option key casing (camelCase / snake_case alignment). The TypeScript SDK now uses camelCase consistently at the call-site and snake_case on the wire. Codemods rename the call-sites; manual cleanup verifies no string-typed config maps slipped through. (Codemod-covered.)
  • Category 2 — Deprecated aliases removed (max_tokens_to_sample → max_tokens). Aliases carried since v1 are gone. The codemod renames the call-sites; the review pass catches any options blob constructed dynamically (config files, env-driven overrides, test fixtures). (Codemod + review.)
  • Category 3 — Response field refinement (narrowed union types). Response types are narrowed — stop_reason, content-block kinds, usage breakdown. The compiler catches most call-sites that assumed the wider v2 shape; the manual fix is straightforward. (Compile-caught.)
  • Category 4 — Error type hierarchy (discriminated error classes). v3 surfaces a proper error class hierarchy — RateLimitError, OverloadedError, BadRequestError, AuthenticationError. Replaces v2's loosely-typed catch-and-inspect pattern with branchable instanceof checks. (Refactor opportunity.)

The codemod sweep is the right first step on this axis. Run it on a clean working tree, review the diff, fix the spots the codemod could not reach — typically dynamic config builders and test fixtures with string-typed option keys — and the rest is a compile-then-test cycle.

Where the renames meet the type refinements, you may discover a handful of call-sites that were working in v2 only because the wider response type let through cases the model never actually produced. Those are usually latent bugs, not migration costs. Treat them as a free find.
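The error-hierarchy refactor from Category 4 is worth sketching, because it is the one rename that changes control flow rather than spelling. The class names below are taken from the list above, but their constructors and fields are assumptions for the sketch:

```typescript
// Hypothetical error classes mirroring the hierarchy named above —
// constructors and fields are assumptions, not the SDK's definitions.
class ApiError extends Error {}
class RateLimitError extends ApiError {
  constructor(public retryAfterMs: number) { super("rate limited"); }
}
class OverloadedError extends ApiError {}
class BadRequestError extends ApiError {}

// v3-style branchable handling replaces v2's catch-and-inspect:
// return a retry decision instead of string-matching error messages.
function retryDelayMs(err: unknown): number | null {
  if (err instanceof RateLimitError) return err.retryAfterMs; // honour the server's hint
  if (err instanceof OverloadedError) return 2_000;           // back off and retry
  if (err instanceof BadRequestError) return null;            // caller bug: never retry
  throw err; // unknown errors propagate to the caller
}
```

The payoff is that retry policy becomes a pure function of the error class — no regexes over error messages, and the compiler flags any branch that falls out of date when the hierarchy changes.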

06 · Phased Rollout — audit → wrap → cut over → retire.

The wrap-then-cut pattern is the safest path through a four-axis migration. It treats v3 adoption as an incremental, reversible operation rather than a single switch flip — and the cost of the adapter shim is repaid the first time you need to roll a call-site back without re-deploying the whole service.

Four phases, in order. Each phase is independently shippable; you should be able to pause at the boundary of any phase and stay in production.

  • Phase 1 — Audit. Run the codemod sweep on a throwaway branch. Inventory every call-site touched and every tool definition. Identify which of the four axes each touches. Score the migration size; if it is four-axis, plan for two weeks, not two days.
  • Phase 2 — Wrap. Introduce the v3 dependency alongside v2 (alias the package or vendor it under a different name). Build an adapter that exposes v2-shaped call signatures backed by v3 internals — typed streams unwrapped to chunks for consumers not yet migrated, v3 tool results normalised back to strings for handlers not yet migrated.
  • Phase 3 — Cut over. Per call-site, switch from the adapter to direct v3 calls. Migrate streaming consumers first (highest payoff), then tool handlers (per-tool, lowest traffic first), then compaction (one-line swap), then the renames (codemod-covered).
  • Phase 4 — Retire. Once every call-site is on v3, delete the adapter and the v2 dependency. Run the eval suite one more time. Tag a release.
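The Phase 2 adapter reduces to a per-call-site dispatch. A minimal sketch, with illustrative names throughout and the two implementations stubbed as plain functions:

```typescript
// v2-shaped signature, selectable backend. In a real codebase v3Complete
// would wrap the v3 client back to the v2 call shape; names are illustrative.
type CompleteFn = (prompt: string) => string;

function makeAdapter(opts: {
  v2Complete: CompleteFn;               // legacy path, still in production
  v3Complete: CompleteFn;               // new path, wrapped to the v2 shape
  useV3: (callSite: string) => boolean; // e.g. env var or flag service
}) {
  return (callSite: string, prompt: string): string =>
    opts.useV3(callSite) ? opts.v3Complete(prompt) : opts.v2Complete(prompt);
}
```

Rolling a cutover back is then a flag flip in useV3 rather than a re-deploy — the property a big-bang migration PR can never give you.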
Why wrap-then-cut wins
The alternative — a single big-bang migration PR — looks faster on paper. In practice, big-bang PRs sit unmerged for weeks because every reviewer wants to verify a different surface, every CI failure touches three axes at once, and any rollback requires reverting everything. Wrap-then-cut ships a series of small, independent PRs, each of which is reviewable in an hour and revertable in a minute.

For teams that prefer outside execution on the migration itself — including the audit, the adapter, the per-tool migrations, and the rollback playbook — our AI transformation engagements run this end-to-end with a published rollback procedure for the first 24 hours after each cutover. The same approach applies to model migrations on the API surface — we keep an active playbook for the Claude Opus 4.6 to 4.7 migration, which sits one layer above the SDK migration covered here.

07 · Common Pitfalls — four ways the migration bites.

Four failure modes show up reliably in the first weeks after a major SDK bump. None is exotic; each is preventable with the wrap-then-cut pattern above. Naming them up front saves a Friday evening.

  1. Partial streaming migration. Team adopts the typed delta events on the new code path, leaves the old code path consuming untyped chunks via the adapter, and forgets to cut the adapter. Six months later the codebase carries both parsers and the bug surface has doubled. Fix: cut the adapter the same week you finish the streaming consumers — it is a ten-line PR.
  2. Tool migration drift. A team migrates the high-traffic tools first, ships, declares victory, and never comes back for the long-tail. Six weeks later a low-traffic tool that still returns a v2-shaped string masquerades as an error because the runtime is now branching on is_error. Fix: track tool-migration progress as a numbered list in the migration doc and close it out.
  3. Custom compaction shadow-running. Team adopts messages.compact() on the agent path but leaves the bespoke summariser running in the chat path on the theory that chat is different. Now you maintain both implementations indefinitely. Fix: rip out the bespoke summariser the same sprint you adopt the API. If the eval suite shows a regression, tune the retention policy, not the alternative.
  4. Codemod drift. Team runs the codemods early in the migration, then continues making changes against v2-shaped code for weeks before cutting over. When the cutover lands, another codemod pass is needed — and the second pass surfaces merge conflicts in every PR. Fix: gate new code against v3 from the moment Phase 2 lands, even while old code is on the adapter.
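One way to enforce the fix for pitfall 4 is a lint gate. The sketch below assumes ESLint flat config, assumes the v2 package was aliased as @anthropic-ai/sdk-v2 in Phase 2 (an illustrative alias, not an official package name), and assumes the adapter lives under src/compat:

```typescript
// eslint.config.ts — gate new code against v3 from the moment Phase 2 lands.
// Only the compat shim may import the aliased v2 package; everything else
// that tries gets a CI failure instead of quietly extending the v2 surface.
export default [
  {
    files: ["src/**/*.ts"],
    ignores: ["src/compat/**"], // the adapter shim is the only sanctioned consumer
    rules: {
      "no-restricted-imports": [
        "error",
        {
          paths: [
            {
              name: "@anthropic-ai/sdk-v2",
              message:
                "New code must target SDK v3; only src/compat may import the v2 alias.",
            },
          ],
        },
      ],
    },
  },
];
```

With the gate in place, codemod drift cannot accumulate: every PR written against v2 idioms fails lint the day it is opened, not at cutover.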

The unifying theme is that v3 rewards full adoption and punishes half measures. The SDK is internally consistent — typed streaming assumes typed tool I/O assumes structured compaction assumes cleaned-up parameter naming. A codebase that runs half on v2 and half on v3 carries both consistency models and the seams between them quietly accumulate bugs. The wrap-then-cut phasing buys you time; it does not buy you permission to skip the cut.

If the migration unlocks a deeper refactor — building custom subagents on top of the cleaned-up tool schema, for example — the Claude Code custom subagent walkthrough picks up where this playbook leaves off. The structured tool I/O in v3 is the substrate that subagent tool allowlists actually depend on.

Conclusion

SDK migrations compound — the wrap-then-cut pattern wins every time.

An Anthropic SDK v2 to v3 migration is unusually consequential for a major-version bump because it touches four surfaces at once. Typed streaming retires the most fragile custom code in v2 codebases. Tool-result schema strictness lifts production reliability. The compaction API replaces hand-rolled summarisation. The parameter renames are the easy part. Done in sequence with an adapter shim in the middle, each phase ships independently and rolls back cleanly.

The pattern we have watched teams regret is the opposite — a single big-bang PR that touches every axis, sits unmerged for weeks, and ships with the seam between v2 idioms and v3 idioms quietly carried forward. v3 rewards full adoption and punishes half measures, and the wrap-then-cut phasing is what makes full adoption reachable without freezing the rest of the roadmap.

Practical next step: run the codemods on a throwaway branch this week. Score the migration size against the four axes. If it is one or two axes, plan a focused PR. If it is three or four, schedule the wrap-then-cut over two weeks with an explicit rollback playbook for each cutover. Either way, the SDK release is the right occasion to delete custom code you have been carrying since v1.

Migrate the Anthropic SDK cleanly

SDK migrations are low-risk when phased correctly — and high-risk when rushed.

Our team executes Anthropic SDK migrations — typed-streaming adoption, tool-result schema mapping, compaction API adoption — with the wrap-then-cut compatibility pattern.

Free consultation · Expert guidance · Tailored solutions
What we ship

SDK migration engagements

  • Compatibility audit and codemod sweep
  • Wrap-compatibility shim implementation
  • Tool-result schema migration
  • Compaction API adoption playbook
  • Rollback procedures for the first 24 hours
FAQ · Anthropic SDK v3

The questions developers ask before bumping the SDK.

Can I run SDK v2 and v3 side by side during the migration?

Yes, and that is the recommended pattern for any migration that touches more than one breaking-change axis. Install v3 alongside v2 by aliasing the package name or vendoring one of the two under a different module path, then build a thin adapter that exposes v2-shaped call signatures backed by v3 internals. The adapter lets you cut over one call-site at a time, ship each cutover as an independent PR, and roll back without a service-wide re-deploy. The trade-off is that the adapter is dead weight once the migration finishes — schedule its retirement explicitly in Phase 4 of the rollout, otherwise you carry both SDKs and both idioms indefinitely.