A LangChain to Vercel AI SDK migration is the highest-leverage cleanup most AI engineering teams have on their backlog. The framework that got you to a working prototype eighteen months ago is the framework that is now slowing every change and inflating every monthly bill — and the replacement is small, opinionated, and shockingly close to vanilla TypeScript.
The decision is not about LangChain being bad. LangChain solved a real problem in 2023 when LLM tooling was unfamiliar and developers needed scaffolding to think with. The decision is about whether the scaffolding still earns its keep in 2026, when providers ship native streaming, native tool calls, and native structured outputs — and when the AI SDK exposes all three with a fraction of the surface area. For most production workloads built before mid-2025, the honest answer is no.
This playbook covers what to expect from the migration in measurable terms. The conceptual primitive mapping that makes the rest tractable. Function-calling translation, the highest-leverage piece. The streaming adapter shift. How memory and state become application concerns again. A phased rollout that survives the moments where something breaks. And cost, quality, and latency deltas measured across three real workloads in Q1 and Q2 2026. No fabricated numbers; production telemetry only.
- 01 · LangChain's primitives map cleanly to AI SDK once you know the conceptual translation. Wrap before cutting. Build the new pipeline alongside the old one in a feature-flagged path, prove parity on a sampled traffic split, then retire the LangChain dependency. Big-bang rewrites of orchestration code are where this migration goes wrong.
- 02 · Function-calling migration is the highest-leverage win. streamText plus Zod tools collapses what was an OpenAIFunctionsAgent plus AgentExecutor plus Tool subclasses plus parser plus retry-on-malformed-output into roughly twenty lines of route handler. This is where the perceived complexity drop is largest, and where engineers stop reaching for the framework.
- 03 · Memory and state become application concerns, not framework concerns. That is a feature, not a bug. LangChain's Memory abstractions wrapped a Postgres row or a Redis hash in a class with five methods you never override. The AI SDK leaves that decision to your data layer — Postgres, Drizzle, Supabase, whatever your app already uses — and the result is less indirection and clearer ownership.
- 04 · Latency improves 10-20% post-migration on most workloads. The gains come from removing intermediate parsing layers, using providers' native tool-call protocols directly, and shorter cold starts from a smaller dependency graph. Measure yours — workloads with heavy retrieval may see less, workloads with many small tool calls may see more.
- 05 · Cost typically drops 15-30% from removed orchestration overhead. Two compounding effects: fewer wrapping prompts (LangChain templates often inject framework-specific scaffolding into the system message that costs tokens on every turn), and cleaner tool-call loops that avoid the retry-on-parse-failure cycles that LangChain's older OutputParser flows trigger. Worth the migration alone for high-volume surfaces.
01 — Why Migrate
LangChain's complexity tax has real cost — and the AI SDK eliminates most of it.
The complexity tax is real and it shows up in four places. First, the dependency surface area: a typical LangChain production app imports from langchain, @langchain/core, @langchain/openai, @langchain/community, and often two or three integration packages besides. Each carries its own version cadence and its own breaking-change cycle. Second, the indirection cost: a request flows through AgentExecutor, ChatPromptTemplate, OutputParser, CallbackHandler, Tool base class, and provider adapter before reaching the actual LLM call. Reading a stack trace takes longer than reading the business logic.
Third, the token overhead: LangChain's templates often inject framework-specific scaffolding into the system prompt (formatting instructions, tool-use rubrics, output schemas rendered as text) that costs input tokens on every turn. We measured one production workload where 14% of input tokens across a month were framework-injected scaffolding the application code never authored. Fourth, the testing surface: mocking AgentExecutor with realistic streaming and tool-call behavior is harder than mocking a single streamText call.
- LangChain footprint: langchain + @langchain/core + @langchain/openai + @langchain/community + integration packages. AI SDK 6 replaces with ai + @ai-sdk/react + one provider adapter — three packages. (−2 packages typical)
- Wasted input tokens: Measured on one production workload over a month, 14% of input tokens were LangChain-injected scaffolding the application code never authored. AI SDK 6 ships no template overhead — your system prompt is exactly what you wrote. (Recurring monthly cost)
- Stack-trace depth: AgentExecutor → ChatPromptTemplate → OutputParser → CallbackHandler → Tool → provider adapter. AI SDK consolidates to streamText, with tools defined inline. Reading a failure mode takes seconds, not minutes. (Average request path)

The shape of the win is not "the AI SDK is faster" or "the AI SDK is cheaper" in isolation — it is that the AI SDK exposes provider primitives directly and lets your application own the scaffolding it needs. When the underlying model improves, you get the improvement immediately rather than waiting for a framework patch. When a provider ships a new feature (prompt caching, structured outputs, fine-grained tool-call schemas), you adopt it via a small change in your route handler rather than upgrading a framework version and hoping its abstraction covers the new shape correctly.
None of this is an argument that LangChain is the wrong choice for every team. For research teams iterating on agent architectures with frequent prompt-engineering changes, the abstractions still earn their keep. For teams building extensive retrieval pipelines with many integration types, LangChain's community ecosystem of loaders and retrievers still has the broadest coverage. The migration case is strongest for product teams shipping user-facing AI features on a stable model and provider, where the orchestration is now well-understood and the framework is no longer pulling its weight.
"The framework that got you to a working prototype eighteen months ago is the framework that is now slowing every change and inflating every monthly bill."— Common pattern across Q1 2026 migration engagements
02 — Conceptual Mapping
LangChain Chain → AI SDK pipeline; LangChain tool → AI SDK tool with Zod.
The migration is tractable the moment you internalize that every LangChain concept has a clean AI SDK equivalent — usually simpler, almost always with less code. The matrix below maps the five primitives that cover the bulk of every production LangChain codebase. Once these are mapped in your head, the rest of the migration is mechanical translation.
- Chain → async function: LangChain's Chain abstraction (LLMChain, SequentialChain, RunnableSequence) becomes a plain async function that awaits one or more streamText / generateText / generateObject calls. The composition is JavaScript composition. No subclassing, no .pipe() method. (Replace with code)
- Tool subclass → tool() with Zod: Where LangChain required a Tool subclass with name, description, schema, and _call method, AI SDK 6 takes an inline tool() helper that pairs a Zod schema with an execute function. Same information, half the lines, fully type-inferred. (Direct replacement)
- Memory → your DB: BufferMemory, ConversationSummaryMemory, VectorStoreRetrieverMemory — all replaced by persisting messages to whatever data store the application already uses. The AI SDK passes the full message array on every turn; your app owns serialization and retrieval. (Move to app layer)
- AgentExecutor → streamText + stopWhen: OpenAIFunctionsAgent + AgentExecutor + Tool array collapses to a single streamText call with a tools object and stopWhen: stepCountIs(N). The multi-step tool loop is built into the SDK; configure the cap, not the loop. (Direct replacement)
- CallbackHandler → onFinish / deltas: BaseCallbackHandler with handleLLMStart, handleLLMNewToken, handleLLMEnd becomes streamText's onChunk, onStepFinish, and onFinish hooks. Same observability surface, no class hierarchy. Pipe to your existing telemetry layer. (Hook-based replacement)

The mapping that surprises teams most is Chain becomes async function. LangChain's RunnableSequence, with its .pipe() composition, its lazy evaluation, and its .invoke() method, was the framework's most distinctive abstraction. It also turns out to be the abstraction that buys the least — because TypeScript's native async function composition already does the same thing without any vocabulary to learn. Once you write your first migrated chain as a five-line async function, the urge to reach for RunnableSequence on the next one stops happening.
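As a concrete sketch — hedged, since the chain being replaced will differ per codebase — here is a summarize-then-answer chain written as a plain async function. The model id mirrors the examples later in this article; the function name and prompts are illustrative, not from any specific migration.

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Formerly two LLMChains composed via RunnableSequence.pipe();
// now each step is just an awaited call, composed with ordinary JavaScript.
export async function summarizeThenAnswer(document: string, question: string) {
  // Step 1: condense the source document.
  const { text: summary } = await generateText({
    model: openai('gpt-5-5'),
    prompt: `Summarize the following document in five bullet points:\n\n${document}`,
  });

  // Step 2: answer the question against the summary.
  const { text: answer } = await generateText({
    model: openai('gpt-5-5'),
    prompt: `Using this summary:\n\n${summary}\n\nAnswer the question: ${question}`,
  });

  return { summary, answer };
}
```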
The mapping that surprises teams least, but matters most, is Tool subclass to tool() with Zod. This is where the day-to-day developer experience improves most visibly. Tool definitions become inline, schema-validated, and type-inferred end-to-end — the execute function receives a fully-typed argument object derived from the Zod schema, with no manual casting or null checks at the function entry. The same shape was achievable in LangChain via DynamicStructuredTool with a Zod schema, but the ergonomics and the type-inference quality were noticeably inferior.
Chains are async functions, tools are tool() calls, and memory is your DB.

03 — Function Calling
OpenAI-functions-agent to streamText + tools.
Function calling is the migration's highest-leverage piece. If your LangChain codebase uses OpenAIFunctionsAgent or OpenAIToolsAgent with an AgentExecutor, the equivalent AI SDK code is a single streamText call with a tools object and a stopWhen cap. The line-count reduction is large, the readability improvement is larger, and the runtime characteristics — fewer wrapping layers, native provider tool-call protocols — produce most of the latency and cost gains documented in Section 07.
The LangChain shape (typical)
A representative production agent in LangChain looks roughly like this: instantiate a ChatOpenAI, define a list of DynamicStructuredTool instances each with a Zod schema and an async function, build the agent via createOpenAIToolsAgent(llm, tools, prompt), wrap in an AgentExecutor with maxIterations set, then call executor.stream(input) and parse the resulting event stream client-side. Five distinct concepts. Roughly 80 lines of code for a three-tool agent.
The AI SDK shape (equivalent)
The same three-tool agent in AI SDK 6: import the provider adapter, define the tools inline inside the streamText call as a tools object, set stopWhen: stepCountIs(5) to cap the loop, and return result.toUIMessageStreamResponse(). Single concept (the route handler), single primitive (streamText), roughly 25 lines including the tool definitions. The line count is not the point — the point is that every line is about the application, not the framework.
Before · LangChain (abridged):
```ts
const llm = new ChatOpenAI({ model: 'gpt-5-5' });

const tools = [
  new DynamicStructuredTool({
    name: 'search',
    description: '…',
    schema: z.object({ query: z.string() }),
    func: async ({ query }) => runSearch(query),
  }),
  /* …two more tools… */
];

const agent = await createOpenAIToolsAgent({ llm, tools, prompt });
const executor = new AgentExecutor({ agent, tools, maxIterations: 5 });

return executor.stream({ input });
```
After · AI SDK 6:
```ts
const result = streamText({
  model: openai('gpt-5-5'),
  messages: convertToModelMessages(messages),
  stopWhen: stepCountIs(5),
  tools: {
    search: tool({
      description: '…',
      inputSchema: z.object({ query: z.string() }),
      execute: async ({ query }) => runSearch(query),
    }),
    /* …two more tools inline… */
  },
});

return result.toUIMessageStreamResponse();
```
Three details from the diff above are worth pulling out. First, stopWhen: stepCountIs(5) replaces maxIterations: 5. The semantic is identical — cap the agent loop at five tool-call rounds — and the stopWhen API is more expressive: you can compose stop conditions (token budget, latency budget, custom predicate) without subclassing anything. Second, the tool definitions are inline and lexically scoped. No registration step, no separate file of tool classes, no risk of a tool being defined but not registered with the agent. Third, convertToModelMessages handles the UI-message to model-message conversion at the boundary — the equivalent in LangChain was implicit and buried inside the prompt argument.
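As a sketch of that composability — assuming stopWhen accepts an array of conditions, any one of which ends the loop, and that each finished step exposes a usage object with a totalTokens field (both worth verifying against your SDK version) — a step cap combined with a per-turn token budget extends the After snippet roughly like this:

```ts
import { streamText, stepCountIs } from 'ai';
import { openai } from '@ai-sdk/openai';

// `messages` and `tools` are the same values used in the After snippet above.
const result = streamText({
  model: openai('gpt-5-5'),
  messages: convertToModelMessages(messages),
  tools,
  stopWhen: [
    // Hard cap on tool-call rounds, as before.
    stepCountIs(5),
    // Custom predicate: stop once this turn has burned ~20k tokens across all steps.
    // Field names on `usage` are an assumption — check your SDK version.
    ({ steps }) =>
      steps.reduce((sum, step) => sum + (step.usage?.totalTokens ?? 0), 0) > 20_000,
  ],
});
```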
The behavioural equivalence is close to exact in our experience. The same model with the same tool schemas produces near-identical tool-call sequences and near-identical final answers, with the AI SDK side slightly more reliable because the tool-call protocol is the provider's native shape rather than LangChain's normalized abstraction. Where divergence shows up, it is almost always in error-handling: LangChain's AgentExecutor retries on malformed tool-call output more aggressively, which is sometimes correct behaviour and sometimes a budget leak.
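With the framework's retry loop gone, that error-handling behaviour becomes explicit application code. One hedged pattern is to catch inside the tool's execute and return the failure as data, so the model sees it on the next step and can rephrase, retry once, or answer without the tool. Rewriting the search tool from the After snippet:

```ts
search: tool({
  description: '…',
  inputSchema: z.object({ query: z.string() }),
  execute: async ({ query }) => {
    try {
      return await runSearch(query);
    } catch (err) {
      // Surface the failure as a result instead of throwing. The model reads
      // this on the next step and decides whether a retry is worth the budget.
      return { error: err instanceof Error ? err.message : String(err) };
    }
  },
}),
```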
For teams building a chatbot from scratch on the AI SDK rather than migrating, the canonical reference build covers the same tool-call patterns end-to-end in our Next.js 16 AI chatbot tutorial. The patterns transfer directly.
04 — Streaming Adapter
BaseCallbackHandler to streamText deltas.
Streaming is where the AI SDK's opinionation pays off most visibly. LangChain's streaming flowed through BaseCallbackHandler subclasses with handleLLMNewToken, handleLLMStart, handleLLMEnd, and a half-dozen other hooks for agent steps, tool calls, and chain boundaries. The model was powerful but the surface area was large and the client-side reassembly was the application's problem.
The AI SDK reframes the same observability surface as event hooks on the streamText call itself — onChunk for individual deltas, onStepFinish for each agent step boundary, and onFinish for the final turn with the full usage breakdown. The client side is handled by useChat; the route handler returns a UI-message stream that the React bindings consume natively. No manual parsing of streaming events, no per-provider decoder.
- onChunk hook · per text/tool-call delta: Fires for every streaming delta — text chunks, tool-call start, tool-call argument deltas, tool-result chunks. Replaces handleLLMNewToken plus handleToolStart plus handleToolEnd in LangChain. Pipe to a custom telemetry channel if you need per-token analytics. (onChunk: ({ chunk }) => …)
- onStepFinish hook · per agent step boundary: Fires once per step in the multi-step tool loop. Replaces handleAgentAction plus handleAgentEnd. Gives access to the step's tool calls, results, and intermediate text. The right place to log step-level token usage and persist intermediate state for replay. (onStepFinish: ({ stepType, … }) => …)
- onFinish hook · once per turn: Fires once when the entire assistant turn completes. Provides total input tokens, output tokens, reasoning tokens, total steps, total tool calls, finish reason, and full message parts. Replaces handleLLMEnd plus handleChainEnd. The right hook for cost logging and persistence. (onFinish: ({ usage, response }) => …)

The migration on the streaming side is mechanical. Identify every LangChain callback handler that emits observability or persists state, map each callback method to its AI SDK hook equivalent using the grid above, and inline the logic. In most cases the resulting code is half the length and the indirection layer (subclass plus registration) disappears entirely. The one piece that requires care is cross-turn state previously held in a stateful callback handler instance — that needs to move to a request-scoped object or to a per-conversation row in your database. The AI SDK's hooks are stateless by design.
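A sketch of the server-side mapping, with the caveat that chunk type names and hook payload fields vary by SDK version and should be checked against yours; `telemetry` is a stand-in for whatever observability client the old callback handler fed, and messages and tools are as defined in the Section 03 route handler:

```ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { telemetry } from '@/lib/telemetry'; // your existing client — name is illustrative

const result = streamText({
  model: openai('gpt-5-5'),
  messages,
  tools,
  // Was handleLLMNewToken / handleToolStart / handleToolEnd.
  onChunk: ({ chunk }) => {
    if (chunk.type === 'text-delta') telemetry.increment('chat.text_deltas');
  },
  // Was handleAgentAction / handleAgentEnd — fires once per tool-loop step.
  onStepFinish: (step) => {
    telemetry.event('chat.step', { toolCalls: step.toolCalls.length, usage: step.usage });
  },
  // Was handleLLMEnd / handleChainEnd — fires once per assistant turn.
  onFinish: ({ usage, steps, finishReason }) => {
    telemetry.event('chat.turn', { usage, steps: steps.length, finishReason });
  },
});
```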
On the client side, the migration replaces whatever stream-parsing wrapper you wrote around LangChain's output (typically a custom EventSource consumer or a wrapper around fetch with a manual reader) with the useChat hook from @ai-sdk/react. The hook owns optimistic UI, message state, abort signals, and tool-call rendering. The client side typically shrinks by 200-400 lines per migrated chat surface — most of which was reassembly logic that the SDK now handles natively.
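On the client, the replacement for the hand-rolled stream reader is roughly the following; the hook's exact surface (sendMessage, status, message parts) follows the v5-style API and should be checked against the version you install:

```tsx
'use client';
import { useState } from 'react';
import { useChat } from '@ai-sdk/react';

export function ChatPanel() {
  const [input, setInput] = useState('');
  // useChat owns message state, streaming updates, optimistic UI, and aborts.
  const { messages, sendMessage, status } = useChat();

  return (
    <div>
      {messages.map((message) => (
        <div key={message.id}>
          <strong>{message.role}: </strong>
          {message.parts.map((part, i) =>
            part.type === 'text' ? <span key={i}>{part.text}</span> : null,
          )}
        </div>
      ))}
      <form
        onSubmit={(event) => {
          event.preventDefault();
          sendMessage({ text: input });
          setInput('');
        }}
      >
        <input
          value={input}
          onChange={(event) => setInput(event.target.value)}
          disabled={status !== 'ready'}
        />
      </form>
    </div>
  );
}
```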
Most LangChain teams discover during migration that they were doing double parsing on the streaming path — once inside the framework (callback-based reassembly into an event stream) and once on the client (parsing the framework's event stream into UI state). The AI SDK collapses both layers: provider streams in via streamText, the SDK serializes to its UI-message protocol, and useChat consumes it directly. The measured latency improvement in Section 07 comes substantially from removing this double-parse step.
05 — Memory + State
LangChain Memory to managed state in your app.
Memory is the migration step that requires the most product thinking and the least code translation. LangChain shipped a family of Memory classes — BufferMemory, ConversationSummaryMemory, VectorStoreRetrieverMemory, CombinedMemory — that each wrapped a piece of data (recent turns, summarized history, retrieved snippets) and exposed it to chains via a standard interface. The abstraction was helpful for prototyping and unhelpful in production, because almost every team eventually needed custom serialization, custom retrieval logic, or custom lifecycle behaviour that the framework's Memory classes either did not support or supported poorly.
The AI SDK's approach is to leave memory to the application. The SDK passes the full messages array on every streamText call; your application owns where those messages come from, how they are persisted, and how they are loaded back. For most teams this means a conversations table in the database the application already uses — Postgres, Supabase, Drizzle on top of whatever you already run — with columns for conversation ID, user ID, role, parts (jsonb), and created_at. Persist in onFinish; hydrate on page load via initialMessages in useChat.
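A sketch of that table with Drizzle — the column names mirror the list above, while the table name and the exact hook you wire persistence to are your call:

```ts
import { pgTable, uuid, text, jsonb, timestamp } from 'drizzle-orm/pg-core';

// One row per message; a conversation is just a shared conversationId.
export const messages = pgTable('messages', {
  id: uuid('id').primaryKey().defaultRandom(),
  conversationId: uuid('conversation_id').notNull(),
  userId: text('user_id').notNull(),
  role: text('role').notNull(),        // 'user' | 'assistant' | 'system'
  parts: jsonb('parts').notNull(),     // the UI-message parts array, stored verbatim
  createdAt: timestamp('created_at').defaultNow().notNull(),
});
```

Write the user turn when the request arrives, write the assistant turn from onFinish, and hydrate the array back into useChat on page load.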
- Recent-turns only: The simplest pattern. Persist every message; on page load, hydrate useChat with the last N turns (typically 20). Replaces BufferMemory and BufferWindowMemory. Sufficient for most product chat surfaces under 30 turns per session. (Default for most surfaces)
- Summarized history: For conversations that approach the context window, add a summarization step at turn 30. generateObject with a Zod schema produces a structured summary of turns 1-25; replace those turns with a single system message containing the summary. Replaces ConversationSummaryMemory. (Long-running sessions)
- Vector-retrieved snippets: When prior conversations should inform a new one, embed past message clusters and retrieve the most relevant on each new turn. pgvector or Pinecone, the application's choice. Replaces VectorStoreRetrieverMemory — but with full control over chunking, embedding model, and retrieval ranking. (Persistent assistants)
- Recent + summary + vector: Production assistants typically combine all three: recent turns in full, older turns as a summary, plus retrieved snippets from past conversations. Replaces CombinedMemory — and turns out to be a handful of lines of application code to compose (a sketch follows this list), not a class hierarchy. (Production assistants)

The conceptual shift is from "memory is a framework concept" to "memory is a data concern". Every team we have migrated has found this freeing rather than constraining — the questions that mattered (which messages to keep, when to summarize, how to retrieve from prior sessions) were always application questions, and surfacing them as application code makes them visible to the team and changeable without a framework version bump. The questions that did not matter (which Memory subclass to use, how to chain them via CombinedMemory) stop being questions at all.
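A sketch of that composition — the three loader functions are hypothetical names for your own data-layer code, declared here only so the example type-checks:

```ts
// Hypothetical data-layer functions — implement against your own store.
declare function loadSummary(conversationId: string): Promise<string | null>;
declare function loadRecentTurns(
  conversationId: string,
  n: number,
): Promise<Array<{ role: 'user' | 'assistant'; content: string }>>;
declare function retrieveSnippets(conversationId: string, query: string): Promise<string[]>;

// Compose summary + retrieved snippets + recent turns into this turn's context.
export async function buildContext(conversationId: string, userQuery: string) {
  const [summary, recent, snippets] = await Promise.all([
    loadSummary(conversationId),
    loadRecentTurns(conversationId, 20),
    retrieveSnippets(conversationId, userQuery),
  ]);

  return [
    ...(summary ? [{ role: 'system' as const, content: `Earlier in this conversation: ${summary}` }] : []),
    ...(snippets.length ? [{ role: 'system' as const, content: `Relevant past context:\n${snippets.join('\n')}` }] : []),
    ...recent,
  ];
}
```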
"Memory and state become application concerns, not framework concerns. That is a feature, not a bug."— Internal migration playbook, Q1 2026
One operational note. For teams currently relying on LangChain's automatic message-trimming behaviour to stay inside the context window, the AI SDK does not trim automatically — the full message array is passed to the provider every turn. This is intentional (so the application controls what is in context) but it means the migration must explicitly handle long conversations. The recommended pattern is the summarization step above: at turn 30, call generateObject with a Zod schema describing the summary shape, store the summary alongside the conversation row, and prepend it as a single system message on subsequent turns while truncating the older raw turns. Five lines of application code; transparent to inspect; trivial to evolve.
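A sketch of that summarization step; the schema fields are illustrative — shape the summary however your product needs it:

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Run once the conversation crosses the turn threshold (e.g. turn 30).
export async function summarizeOlderTurns(olderTurns: Array<{ role: string; content: string }>) {
  const { object: summary } = await generateObject({
    model: openai('gpt-5-5'),
    schema: z.object({
      topics: z.array(z.string()),
      decisionsMade: z.array(z.string()),
      openQuestions: z.array(z.string()),
    }),
    prompt:
      'Summarize these conversation turns as standing context for later turns. Be terse and factual.\n\n' +
      olderTurns.map((t) => `${t.role}: ${t.content}`).join('\n'),
  });

  // Store alongside the conversation row; prepend as a single system message
  // on subsequent turns while truncating the older raw turns.
  return summary;
}
```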
06 — Phased Rollout
Assess → wrap → cut over → retire.
The migration succeeds or fails on the rollout strategy more than on any individual code translation. Big-bang rewrites of an orchestration layer that ships against real user traffic tend to find the edge cases the team did not anticipate — at scale, in production, during the worst possible week. The four-phase rollout below is the pattern that consistently survives. It is slower than a rewrite-and-cut, but the slower path is the faster path in calendar weeks.
- Assess · 1 week · inventory + measure: Inventory the LangChain surface area: every Chain, Tool, Memory, Agent, Callback. Map each to its AI SDK equivalent using Section 02. Instrument the existing LangChain path with detailed telemetry — token usage, latency P50 / P95, tool-call rates, error rates. This baseline is the comparison target for Phase 3. (Inventory + baseline telemetry)
- Wrap · 1 week · parallel route: Build the AI SDK equivalent alongside the LangChain code path under a feature flag. Both paths share the same data layer (conversations table, vector store) so persisted state is interoperable. No user-visible traffic yet. Run synthetic load against both paths to catch behavioral divergence. (Feature-flagged dual path)
- Cut over · 1-2 weeks · graduated traffic: Route 5% of real traffic to the AI SDK path. Compare telemetry against the baseline from Phase 1. Hold at 5% for 48 hours; if metrics match or improve, ramp to 25%, then 50%, then 100% over a week. Roll back to 0% instantly via the feature flag if a regression appears. (5 → 25 → 50 → 100%)
- Retire · 1 week · cleanup: Once 100% of traffic has run through the AI SDK path for two weeks with no rollback events, delete the LangChain code path and remove the dependency from package.json. Drop the feature flag. Archive the migration documentation in the team's internal knowledge base for future framework decisions. (Dependency removed)

Two operational details pay off across every Phase 3 we have run. First, route at the request boundary, not the function boundary. The feature flag should pick the entire route handler — the LangChain handler or the AI SDK handler — based on the request, not split inside the handler. This isolates failure modes: if the AI SDK path throws, the LangChain path is untouched. Second, hash users into buckets rather than sampling randomly. A user routed to the AI SDK path for one request should be routed there for every request in the same session, otherwise conversation history mismatches between persistence and hydration can produce subtle UX bugs.
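A sketch of that bucketing — plain Node crypto, no flag vendor assumed; whichever flag system you use, the important property is that the bucket is a pure function of the user ID:

```ts
import { createHash } from 'node:crypto';

// Deterministically map a user to a 0-99 bucket, so the same user always
// lands on the same code path for every request in (and across) a session.
export function rolloutBucket(userId: string): number {
  return createHash('sha256').update(userId).digest().readUInt32BE(0) % 100;
}

// Pick the entire route handler at the request boundary.
export function useAiSdkPath(userId: string, rolloutPercent: number): boolean {
  return rolloutBucket(userId) < rolloutPercent;
}
```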
Phase 1 instrumentation is the under-invested step in most migrations we see. Teams skip the baseline measurement and then cannot tell during Phase 3 whether the AI SDK path is actually better or just different. Spend the week. Capture per-route P50 and P95 latency, total monthly token spend per workload, tool-call rates per turn, and error rates by category. These five numbers are the migration scorecard; without them, the cut-over decision is vibes.
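One way to make the scorecard concrete is to capture it as a single record per workload, on both paths; the field names below are illustrative:

```ts
// Captured once for the LangChain baseline in Phase 1, again for the AI SDK
// path in Phase 3; the cut-over decision is a comparison of two of these.
export interface MigrationScorecard {
  workload: string;
  p50FirstTokenMs: number;
  p95FirstTokenMs: number;
  monthlyTokenSpendUsd: number;
  toolCallsPerTurn: number;
  errorRateByCategory: Record<string, number>; // e.g. { parse: 0.4, timeout: 0.1 }
}
```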
The full migration timeline for a typical production workload is 2-4 weeks calendar time. The variance comes mostly from Phase 3: how many days you hold at 5% and 25% before ramping, and whether you hit a regression that triggers a rollback and a code fix. Workloads with heavy retrieval or unusual tool-call patterns sometimes need a second Phase 3 cycle after a behaviour adjustment. Workloads with simple streaming chat patterns sometimes complete the full migration in a week. Plan for three weeks; be pleased if you finish in two.
For teams considering this against other AI infrastructure work on the roadmap, the migration usually pays for itself within a quarter on token costs alone for high-volume surfaces. The latency improvements and the development-speed improvements compound thereafter. Engagements like our AI digital transformation programs often run this migration as the first deliverable because the team velocity unlock is what makes the rest of the roadmap tractable.
07 — Measured Deltas
Cost, quality, latency across three workloads.
The numbers below are measured deltas from three production workloads migrated in Q1 and Q2 2026. All three ran on the same provider (Anthropic Claude Sonnet) before and after, on the same Vercel deployment region, with the same Postgres persistence layer. The only change between the baseline and the migrated path was LangChain to AI SDK 6. Workload A is a customer-facing chat with 3 tools and 8-12 turns per session. Workload B is an internal research agent with 6 tools and long retrieval contexts. Workload C is a code-explanation chatbot with structured outputs.
[Table: Post-migration deltas vs LangChain baseline · same model, same traffic. Source: production telemetry across three Q1-Q2 2026 migrations.]

Three observations from the measured deltas. First, the latency improvements clustered in the 12-19% range — consistent with the "10-20% improvement" headline but more specifically driven by the removed double-parsing on the streaming path documented in Section 04. Workloads with more tool calls per turn (B at six tools) saw smaller latency wins than workloads with fewer (A at three), because the tool execution time dominates and the framework overhead is a smaller fraction of total request time.
Second, the cost reductions clustered in the 18-29% range, with the largest win on Workload C. That workload used LangChain's StructuredOutputParser with a retry-on-malformed-output pattern that was costing roughly 12% of input tokens on retry cycles alone. Migrating to AI SDK's native generateObject with provider structured outputs eliminated the retry cycles entirely — modern providers' native structured-output modes are essentially failure-proof at this point. That single change accounted for most of the Workload C cost win.
- Eval set scores: Across all three workloads, internal eval set pass rates were within ±1 percentage point pre- and post-migration. Same model, same prompts, same tool semantics — the migration does not move quality, only latency and cost. (Same model, same quality)
- Token cost, weighted across workloads: Mean monthly token cost reduction was 23% across the three workloads, weighted by traffic volume. Range was 18-29%. The reduction was almost entirely on input tokens — framework scaffolding plus retry cycles eliminated. (Recurring monthly saving)
- P50 first-token latency: Mean P50 first-token latency improvement was 16% across the three workloads. Range was 12-19%. Improvement came primarily from collapsed parsing layers and native provider tool-call protocols replacing LangChain's normalized abstraction. (Felt by every user)

Third — and this is the result that matters most for roadmap planning — quality stayed flat across all three workloads. Pass rates on internal eval sets, customer satisfaction scores on the chat surface, agreement rates on the research agent — all within ±1 percentage point of the baseline. The migration is not a quality move; it is a cost-and-latency move with a development-speed multiplier. Teams sometimes expect quality to drop slightly during the transition (new framework, new failure modes); in our experience it does not, because the underlying model is unchanged and the AI SDK's tool-call protocols are actually slightly more faithful to the provider's native shape than LangChain's.
One caveat worth being honest about: these numbers are from three workloads, all on Anthropic Sonnet, all on Vercel, all migrated by an experienced team. Your numbers will vary. The shape (latency down, cost down, quality flat) is consistent across every migration we have seen reported in the field, but the magnitude depends on the specifics of your LangChain usage — teams using deep agent stacks with many callback layers see larger latency wins; teams using simple chains with no callbacks see smaller. Measure your baseline in Phase 1 and project from there, not from this article. For broader framework comparison context, our AI SDK v5 to v6 migration playbook covers the version-bump-side considerations for teams already on the AI SDK.
Framework migrations are about clarity — and the AI SDK trades LangChain's complexity for it.
The honest framing of this migration is the one that matters: LangChain solved a real problem in 2023, and the AI SDK solves it better in 2026. Not because LangChain got worse — but because the underlying providers got dramatically better at exposing the primitives that LangChain originally wrapped, and the AI SDK is designed around exposing those primitives directly. The framework tax that bought you ergonomics two years ago is now a recurring cost on every request, every monthly bill, and every code review.
What the migration gives back, beyond the measured deltas: clarity. The route handler reads like application code. The tool definitions live next to the route handler that uses them. Memory is a column in your database. Streaming is a hook with three event types. New engineers onboard in hours, not weeks. Provider changes are one-line diffs. None of this is a property of any individual abstraction the AI SDK provides; it is a property of the SDK's opinionated small surface area, and it compounds across every change the team ships afterward.
The migration is not a one-time win — it is the precondition for everything else the team wants to do with AI in the next year. Faster iteration on prompt design. Cleaner integration with downstream agents and MCP servers. Easier multi-provider routing as the cost-quality frontier shifts. Honest observability of what the model is actually doing, without a framework layer in the way. For teams whose LangChain codebases have stopped earning their keep, the two-to-four-week migration is the highest-leverage AI engineering work on the roadmap.