Vercel AI SDK 6 is the de facto standard for shipping AI features on Next.js and the broader React ecosystem — yet most teams adopt roughly thirty percent of its surface area, leave the rest unused, and reach for third-party libraries to fill gaps the SDK already covers. This deep dive walks the full feature set so the next decision you make is informed by what exists, not what you remember from the quickstart.
The reason this matters in mid-2026: the cost of choosing wrong inside the AI stack compounds quickly. Provider lock-in, ad-hoc streaming protocols, hand-rolled tool-call shapes, custom agent loops — each of these is a maintenance liability that the SDK has quietly solved over its 6.x line. Teams that adopt the full surface area ship faster and pay less in technical debt twelve months later.
What this guide covers: the useChat message-parts model and its optimistic-UI behavior; streamText as the right server primitive and how it routes reasoning effort; tool calls with Zod input schemas and the full run-call lifecycle; the provider-adapter pattern across Anthropic, OpenAI, Google, Mistral, and xAI; multi-modal generation (image, audio, video); the new agent-loop primitive that closes the framework gap; and a measured comparison against LangChain on performance and cost.
- 01 — useChat message-parts is the killer feature. The part-oriented message shape — text, tool-call, tool-result, reasoning, all interleaved — is what enables real agentic UX. Render every part type or you lose tool-call visibility entirely.
- 02 — streamText is the right server primitive. One API for streaming text, tool execution, reasoning-mode routing, and provider switching. Reach for generateText only when you actively don't want streaming — which is almost never on a user-facing surface.
- 03 — Tool calls with Zod simplify everything. Define a name, a description, a Zod input schema, and an execute function. The SDK handles routing the model's call to your function and serializing the result back as a message part. The error-discrimination shape is what production code needs.
- 04 — Provider adapter keeps you portable. Five-plus mainstream providers behind the same model interface. One environment variable flips the underlying vendor. Provider lock-in is the silent killer of long-lived AI products; the SDK eliminates it cleanly.
- 05 — The agent loop primitive closes the framework gap. AI SDK 6 adds first-class autonomous-loop support — multi-step planning, stop conditions, tool sequencing — without forcing you to adopt LangChain's broader abstraction. That removes the strongest historical reason to layer another framework on top.
01 — Why Deep Dive
AI SDK 6 is the standard — most teams use 30%.
Adoption of the Vercel AI SDK has crossed the line from "interesting option" to "default expectation" over the last two release cycles. Every major Next.js starter template now ships with it pre-wired; the React ecosystem's chat and agent tutorials assume it; downstream agentic frameworks (Mastra, Inngest's agent kit, even LangChain's TypeScript port) integrate against it rather than competing with it. The 6.x line consolidated the patterns that 4.x and 5.x experimented with — message-parts, provider adapters, the tool-call lifecycle — into a stable surface that production teams can rely on.
And yet the typical adoption pattern is shallow. Most teams ship with useChat on the client and streamText on the server, call it done, and then reach for bespoke libraries the moment they need a tool, multi-modal generation, or an autonomous loop. That pattern costs twice — first in the integration of the second library, then again when the SDK's own implementation of the same feature catches up and the team has to re-platform mid-project. The cure is a structured tour of the full surface area.
Core chat surface
useChat + streamText
The first two features every team adopts. Streaming chat with optimistic UI, message state, and one provider wired up. Covers roughly 30 percent of the SDK and roughly 80 percent of the first month of usage.
~30% of SDK · ~80% of usage

Tools, structured output
tool() + generateObject + Zod
The features that turn a chatbot into an agent. Most teams hand-roll JSON schemas and string-parsing instead of using the SDK's Zod-typed tool and generateObject primitives. Costs reliability and time.
~40% of SDK · production-critical

Agent loop, multi-modal
stepCountIs() + image/audio/video gen
The frontier features that arrived with 6.x. Agent loop replaces hand-rolled multi-step orchestration; multi-modal generation removes a separate provider layer for media. Both are still under-adopted.
~30% of SDK · 2026 differentiators

The deep-dive structure below moves bottom-up: the message-parts model first because it underpins every other client-side feature; then the server primitive that produces those parts; then tools and the provider abstraction; then the newer multi-modal and agent-loop surface areas. Each section is dense by design — the goal is a reference you keep open in a tab during real implementation work, not a tutorial you skim once.
02 — useChat
Message-parts model, streaming, optimistic UI.
useChat is the React hook that owns the entire client-side chat surface. It wires up streaming reassembly, optimistic user-message insertion, abort signals on re-submission, stable message IDs for React reconciliation, a status state machine, and tool-call part rendering. The shape it exposes is small enough to memorize and stable enough to depend on.
The message-parts model
The single most important concept in AI SDK 6 is that every message — user and assistant — is an array of parts rather than a single content string. A typical assistant turn now contains text fragments, tool-call requests, tool-call results, and reasoning traces, interleaved in the order the model emitted them. Rendering a message means iterating the parts array and dispatching on part.type. Collapsing this to a string loses tool-call UX, breaks reasoning display, and invalidates the entire upgrade path to agentic interfaces.
Every render loop on a useChat message array should switch on these five part.type values: text (the model's natural-language output), tool-call (a structured tool invocation request, with args), tool-result (the result your execute function returned, paired by toolCallId), reasoning (the model's chain of thought, when reasoning mode is enabled), and source (citation pointers from grounded providers). Anything you don't render is invisible to your users — tool calls fail silently when the render loop only matches text.
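A minimal sketch of that render loop, following the five part types listed above. The exact discriminator strings and part fields vary slightly across SDK versions, so treat the property access as illustrative rather than copy-paste:

```tsx
'use client';
import { useChat } from '@ai-sdk/react';

// Illustrative part-aware render loop. Part type names follow the five types
// described above; verify the exact discriminators against your SDK version.
export function ChatMessages() {
  const { messages, status } = useChat();

  return (
    <div>
      {messages.map((message) => (
        <div key={message.id}>
          {message.parts.map((part, i) => {
            switch (part.type) {
              case 'text':
                return <p key={i}>{part.text}</p>;
              case 'reasoning':
                return <details key={i}><summary>Reasoning</summary>{part.text}</details>;
              case 'tool-call':
                return <pre key={i}>Calling tool… {JSON.stringify(part)}</pre>;
              case 'tool-result':
                return <pre key={i}>Tool result: {JSON.stringify(part)}</pre>;
              case 'source':
                return <pre key={i}>Source: {JSON.stringify(part)}</pre>;
              default:
                return null; // anything unrendered here is invisible to the user
            }
          })}
        </div>
      ))}
      {status === 'streaming' && <span>…</span>}
    </div>
  );
}
```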
The hook also exposes status, which is the four-state machine that drives perceived quality. ready means idle and accepting input. submitted means the user's message is sent but the model has not yet produced a first token. streaming means tokens are arriving — disable the input, show a pulsing cursor, and prepare for tool-call parts to interleave with text. error means the route returned non-200 or the stream broke — surface a toast and offer regenerate(). Most quality regressions on chatbots ship from a render loop that ignores status.
Optimistic UI for free
The hook inserts the user's message into the messages array the moment sendMessage is called, before the network round trip resolves. This is what eliminates the "did my message send?" perception gap that naive fetch-and-render chat implementations always have. The optimistic insert is rolled back automatically if the request fails. Build on this — never manage your own optimistic insertion logic alongside the hook.
Initial-history hydration is the one place you may legitimately interact with the underlying state. Pass initialMessages to useChat on mount with historical conversation rows from your database, and the hook treats them as already-completed turns. setMessages is exposed for advanced cases like server-side persistence with optimistic ID reconciliation, but reach for it sparingly.
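A minimal hydration sketch, assuming the history rows have already been mapped to the UIMessage shape; the prop names and the initialMessages option name follow this article's 6.x surface and should be checked against your installed version:

```tsx
'use client';
import { useChat } from '@ai-sdk/react';
import type { UIMessage } from 'ai';

// Sketch: hydrate the hook with persisted turns. `history` is assumed to be
// conversation rows already converted to UIMessage; the hook treats them as
// completed turns and appends new streamed turns after them.
export function ResumedChat({ chatId, history }: { chatId: string; history: UIMessage[] }) {
  const { messages, sendMessage } = useChat({
    id: chatId,               // stable id so remounts resume the same conversation
    initialMessages: history, // option name per this article; verify per SDK version
  });

  return (
    <button onClick={() => sendMessage({ text: 'Continue where we left off.' })}>
      Continue ({messages.length} messages loaded)
    </button>
  );
}
```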
"Render every part type — text, tool-call, tool-result, reasoning, source — or your users see a magic black box where tools are supposed to be."— Internal playbook for production AI SDK rollouts
One implementation discipline worth internalizing: the messages array returned by useChat is owned by the hook; treat it as read-only inside render. Mutations during render produce stale UI and break streaming reassembly. Use regenerate() for retries, stop() for cancellation, setMessages only for edit-and-resubmit flows or hydration, and sendMessage for new user turns. The discipline pays back the moment you add tool calls in Section 04.
03 — streamText
Reasoning effort routing and response stream.
streamText is the server-side counterpart to useChat and the right primitive for any user-facing AI surface. It accepts a model object, an array of messages, an optional system prompt, tools, stop conditions, and provider-specific options; it returns a streaming response object with methods to turn the stream into a UI-message Server-Sent-Events response, a plain text stream, or a JSON-RPC stream. The default shape — result.toUIMessageStreamResponse() — pairs with useChat on the client and requires no manual stream parsing.
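A minimal route-handler sketch of that default shape, assuming a Next.js App Router project and the Anthropic provider; the route path and model id are placeholders:

```ts
// app/api/chat/route.ts
import { streamText, convertToModelMessages, type UIMessage } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: anthropic('claude-sonnet-4-7'),          // placeholder model id
    system: 'You are a concise product assistant.', // optional system prompt
    messages: convertToModelMessages(messages),     // UIMessage[] -> model messages
  });

  // SSE response in the protocol useChat consumes natively
  return result.toUIMessageStreamResponse();
}
```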
Reasoning effort routing
AI SDK 6 added first-class support for routing reasoning effort across providers that expose it. Pass providerOptions with the provider-specific reasoning controls and the SDK normalizes the runtime behavior. Anthropic uses thinking with a token budget; OpenAI's reasoning models use reasoning.effort as a low / medium / high enum; Google exposes a thinkingBudget with a token cap. The SDK does not unify these into a single abstraction — each provider's surface is too different — but it gives you a single place to set them.
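A sketch of where those controls attach at the call site. The per-provider shapes are summarized in the cards below; the exact field names differ across provider package versions, so verify them against your installed types:

```ts
import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

// Illustrative reasoning-effort routing. Only the block matching the selected
// model's provider is read; field names are version-dependent assumptions.
const result = streamText({
  model: anthropic('claude-sonnet-4-7'),
  prompt: 'Walk through the proof step by step.',
  providerOptions: {
    anthropic: { thinking: { type: 'enabled', budgetTokens: 8000 } },
    // openai: { reasoningEffort: 'high' },                  // reasoning-capable OpenAI models
    // google: { thinkingConfig: { thinkingBudget: 4096 } }, // Gemini thinking budget
  },
});
```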
thinking budget
Pass providerOptions.anthropic.thinking with type extended-thinking and a budget tokens count. Sonnet 4.7 and Opus 4.7 both support extended reasoning; the model decides how much budget to actually consume per turn.
providerOptions.anthropic

reasoning.effort
Low, medium, or high. Higher effort produces longer reasoning traces and more accurate answers on complex problems at the cost of latency and tokens. Default to medium for chat; reach for high on math, code review, or proof workloads.
providerOptions.openai

thinkingBudget
Gemini 3.1 Pro exposes a thinkingBudget token cap. Setting it to 0 disables reasoning; setting it high enables the equivalent of OpenAI's high effort. Streaming behavior preserves the reasoning trace as a separate part type on the message.
providerOptions.google

The response-stream shape itself is where the SDK earns its keep. toUIMessageStreamResponse() returns an SSE stream in the protocol that useChat consumes natively — text chunks, tool-call requests, tool-call results, and reasoning fragments all flow in the same stream with discriminated event types. No manual chunk parsing, no per-provider decoder, no wrestling with stream-parsing edge cases on connection close.
Lifecycle hooks
streamText exposes lifecycle callbacks that are the right place to put observability and per-turn business logic. onChunk fires for every incoming token or part — useful for token-level metrics if you need them, expensive if you don't. onStepFinish fires once per tool-call step in multi-step runs — useful for logging tool invocations. The highest-value hook is onFinish, which fires once at stream completion with the full usage breakdown — input tokens, output tokens, reasoning tokens, total cost estimate, finish reason. Pipe onFinish data to your observability layer and most production debugging questions resolve themselves.
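A sketch of the onFinish wiring, assuming a hypothetical logUsage sink standing in for a real metrics client; the usage field names follow the breakdown above and may differ slightly by version:

```ts
import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

// Hypothetical metrics sink — replace with your observability client.
function logUsage(event: Record<string, unknown>) {
  console.log('[ai-usage]', JSON.stringify(event));
}

const result = streamText({
  model: anthropic('claude-sonnet-4-7'),
  prompt: 'Summarize the latest deploy logs.',
  onFinish: ({ usage, finishReason }) => {
    // fires once at stream completion with the full usage breakdown
    logUsage({ ...usage, finishReason });
  },
});
```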
For non-streaming workloads — a background summarization job, a batch reranker, a tool that needs a structured response with no user visibility — reach for generateText or generateObject instead. They share the same provider adapter and message shape but return a single result rather than a stream. The split is real and consequential: stream for user-facing surfaces, generate for backend pipelines.
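For the backend-pipeline case, a short generateObject sketch; the schema, prompt, and model id are illustrative:

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Sketch: structured output for a non-streaming backend job.
const { object } = await generateObject({
  model: openai('gpt-5-5'),
  schema: z.object({
    sentiment: z.enum(['positive', 'neutral', 'negative']),
    topics: z.array(z.string()).max(5),
  }),
  prompt: 'Classify this support ticket: "The export button has been broken since Tuesday."',
});
// `object` is typed against the schema: { sentiment, topics }
```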
04 — Tool Calls
Zod schemas, run-call lifecycle, error discrimination.
Tools are where AI SDK 6's design choices pay off most visibly. Every tool is defined with a name, a description for the model, a Zod input schema, and an execute function. The SDK handles routing the model's structured tool-call output to your function, validating the arguments against the schema, awaiting the result, serializing it back into the stream as a part, and looping if the model wants to invoke another tool.
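A sketch of a single tool wired into a streaming route, with a public weather endpoint standing in for the real execute body; the tool name, schema, and stopWhen value are illustrative, and stop conditions are covered later in this section:

```ts
import { streamText, tool, stepCountIs, convertToModelMessages, type UIMessage } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { z } from 'zod';

// The description is written as a trigger condition for the model, not a
// mechanism description — see the ergonomics note at the end of this section.
const getWeather = tool({
  description: 'Look up current weather when the user asks about conditions in a specific city.',
  inputSchema: z.object({
    city: z.string().describe('City name, e.g. "Berlin"'),
  }),
  execute: async ({ city }) => {
    const res = await fetch(`https://wttr.in/${encodeURIComponent(city)}?format=j1`);
    return res.json(); // serialized back into the stream as a tool-result part
  },
});

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();
  const result = streamText({
    model: anthropic('claude-sonnet-4-7'),
    messages: convertToModelMessages(messages),
    tools: { getWeather },
    stopWhen: stepCountIs(5), // cap tool-call rounds per assistant turn
  });
  return result.toUIMessageStreamResponse();
}
```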
The four tool-call patterns
The grid below maps the four canonical tool-call shapes you will actually ship. Most production chatbots use all four. The differences come down to where state lives (client vs server), whether the user must approve the invocation, and how long the execution can take.
In-process execute
tool({ inputSchema, execute })
The default. Tool defined and executed in the route handler. Best for fast, deterministic actions — search, database lookups, internal API calls. Sub-second latency, no client coordination, full type safety end-to-end.
Default · ~80% of tools

Browser-side execute
addToolResult on useChat
Tool schema declared on the server, execution happens in the browser via useChat's addToolResult helper. Use for actions that need browser context — reading clipboard, opening a tab, calling a wallet — or that benefit from running near the user.
Browser context required

Confirm before execute
two-step UI · interrupt + resume
Tool call paused awaiting user confirmation before execution. Render a confirm card on the tool-call part; on approval, send the user's decision back via addToolResult and let the agent loop continue. Required for spend, delete, or write actions.
High-stakes actions

Async background tool
queue + poll · webhook resume
Tool kicks off a background job (image generation, batch query, long compute), returns a job ID, and the agent stops. A webhook or polling step resumes the conversation with the result later. Pattern for anything beyond ~30 seconds.
Beyond function timeout

The run-call lifecycle
Each tool invocation passes through a four-stage lifecycle that the SDK exposes via discriminated message parts. Stage one — the model emits a tool-call part with the tool name, arguments, and a stable toolCallId. The SDK validates the arguments against your Zod schema; on validation failure, the call is aborted with a typed error and the model is re-prompted. Stage two — the SDK invokes your execute function with the validated arguments. Stage three — your function returns a value (or throws), which the SDK serializes into a tool-result part, paired to the original call by toolCallId. Stage four — the model receives the result and either emits a final text response or another tool call, looping until a stop condition fires.
Error discrimination
Production tool calls fail in four distinct ways and the SDK surfaces each as a different error type. NoSuchToolError — the model hallucinated a tool name that does not exist; usually means the system prompt under-specified the available tools. InvalidToolInputError — the model's structured output failed Zod validation; usually means the schema description is ambiguous or the model is too small for the task. A thrown error from your execute function — surfaced as a tool-result with an error field, which the model can read and decide how to recover from. ToolExecutionError — wraps the underlying throw with tool-call context (name, args, callId) for logging. Switch on the error type rather than catching everything as a generic Error; production reliability depends on the discrimination.
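A sketch of that discrimination using the SDK's typed-error helpers; the class names follow the taxonomy above, and the isInstance checks should be verified against your installed version:

```ts
import { NoSuchToolError, InvalidToolInputError } from 'ai';

// Illustrative classifier for tool-call failures; route each class to a
// different remediation rather than catching everything as a generic Error.
function classifyToolError(error: unknown): string {
  if (NoSuchToolError.isInstance(error)) {
    return 'hallucinated-tool';        // tighten the system prompt's tool list
  }
  if (InvalidToolInputError.isInstance(error)) {
    return 'schema-validation-failed'; // sharpen the Zod schema descriptions
  }
  return 'execution-failed';           // thrown from execute; the model can read the error result
}
```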
Tool-augmented chats need an explicit stop condition or they will loop. Use stopWhen: stepCountIs(N) where N is the maximum number of tool-call rounds per assistant turn — 5 is a sensible default for chat, 10–20 for autonomous agents, 1 for "one tool, then answer" interfaces. Without a stop condition, a misbehaving model can rack up unbounded provider cost on a single user message. Set this on every streamText call that has tools.
One ergonomics rule that compounds across a tool catalog: write the description for the model, not for human reviewers. The model uses the description to decide when to invoke each tool, so phrase it as a trigger condition rather than a mechanism — "Search the public web for current information when the question cannot be answered from training data alone" rather than "Calls the Bing Search API." The difference in tool-call accuracy is measurable and shows up immediately in production evals. Keep argument names descriptive (searchQuery not q), keep required arguments minimal, and avoid optional freeform fields that the model can fill with anything.
05 — Provider Adapter
Anthropic, OpenAI, Google, Mistral, xAI.
The provider-adapter pattern is the AI SDK's quietest superpower and the single architectural decision that most determines a product's long-term flexibility. Every provider package exports a factory that returns a model object implementing the same internal interface, which means swapping anthropic('claude-sonnet-4-7') for openai('gpt-5-5') is a one-line change. No re-shaping of messages, no per-provider stream parsing, no per-provider tool-call format, no per-provider error taxonomy.
The five-plus mainstream providers covered by first-party packages today: Anthropic, OpenAI, Google, Mistral, and xAI. Community-maintained adapters cover Cohere, Groq, Fireworks, Together, OpenRouter, Bedrock, Vertex, and Azure OpenAI. Any OpenAI-compatible endpoint can be wired in via the createOpenAICompatible factory, which covers most inference-on-demand providers without a dedicated package.
Claude Sonnet 4.7
Best default for chat. Strongest tool-call reliability, cleanest extended-thinking integration, conservative refusal posture. Prompt caching with a 90 percent discount on cached prefixes makes long system prompts cheap.
Default for chat

GPT-5.5
Strong on agentic coding, structured outputs, and the most reliable JSON-mode in the field. Reasoning models (o-series) expose a clean low/medium/high effort knob. Slightly higher list price; generous batch discounts.
Pick for coding agents

Gemini 3.1 Pro
Price-leading on long context. 1M-token window at materially lower per-token cost than peers. Native multi-modal support in the same model. Tool-call shape is maturing; verify behavior on your own evals.
Pick for long-context

Mistral Large + open weights
European hosting, GDPR-friendly defaults, competitive on price for chat-style turns. Open-weight Mixtral and Codestral variants are the cleanest open-source fallback for sovereignty workloads. Smaller eval ecosystem.
Pick for EU sovereignty

Grok 5
Lowest-latency option for chat-style turns and best real-time web grounding via native X integration. Useful fallback when Anthropic or OpenAI are rate-limiting. Smaller ecosystem of independent evals; measure on your own prompts.
Pick for latency or fallback

The pattern in practice is a thin getModel() helper that reads an environment variable and returns the appropriate model factory. Cache the result on first call. Drive per-environment defaults from vercel.json environment settings or your platform equivalent; drive per-request routing from a small policy layer inside the route handler — "default to Sonnet, route code questions to GPT-5.5, route long-document Q&A to Gemini." The Vercel AI Gateway is a managed alternative that implements the same pattern with built-in caching, retries, and observability; it scales with usage and is worth evaluating once traffic justifies the cost.
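A sketch of that helper under an assumed AI_PROVIDER environment variable; the model ids mirror the cards above and are placeholders:

```ts
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { google } from '@ai-sdk/google';
import type { LanguageModel } from 'ai';

let cached: LanguageModel | undefined;

// Env-driven provider selection, cached on first call. Add cases as you
// actually install provider packages; ids here are placeholders.
export function getModel(): LanguageModel {
  if (cached) return cached;
  switch (process.env.AI_PROVIDER) {
    case 'openai':
      cached = openai('gpt-5-5');
      break;
    case 'google':
      cached = google('gemini-3-1-pro');
      break;
    default:
      cached = anthropic('claude-sonnet-4-7');
  }
  return cached;
}
```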
One feature worth using deliberately: provider-specific options via providerOptions. The SDK exposes a typed object per provider that surfaces vendor-specific knobs — Anthropic prompt caching, OpenAI's reasoning effort, Google's safety settings — without polluting the cross-provider interface. Set these at the call site rather than wrapping them in a custom abstraction; the SDK already maintains the abstraction for you.
06 — Multi-Modal
Image, audio, video generation.
AI SDK 6 added first-class generation primitives for images, audio (speech and transcription), and an experimental video primitive — alongside the existing input-side support for vision and audio on chat models. The same provider-adapter pattern applies: each media type has a dedicated SDK function (generateImage, generateSpeech, transcribe, experimental generateVideo) and each compatible provider exports a factory that conforms to that function's interface.
Input vs output multi-modal
Two distinct surface areas matter here. Input-side multi-modal — sending images, audio, or video to a chat model as part of a user message — is supported via image, audio, and file parts on the UIMessage shape. All major chat providers accept images today; audio input is supported by GPT-5.5 and Gemini natively; video input is supported only by Gemini at production quality. Output-side generation is handled by dedicated functions, not by streamText — do not try to make a chat model generate images by asking nicely.
Multi-modal provider coverage by media type
Source: AI SDK 6 provider matrix · May 2026

The image-generation surface is the most production-ready: pass a prompt, optional size and quality parameters, and an optional set of input images for reference. Returns base64-encoded image data or a stored URL depending on the provider. The function call is synchronous from the SDK's perspective — generation itself takes 5–30 seconds, so wire it through your long-running tool pattern from Section 04 if you call it inside a chat agent. For stand-alone generation flows (a "create image" button in a UI), call generateImage directly from a route handler with a longer maxDuration.
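A sketch of the stand-alone flow in a Next.js route handler. The import alias reflects that some SDK lines ship this as an experimental_ export, and the provider and model ids are placeholders:

```ts
// app/api/generate-image/route.ts
import { experimental_generateImage as generateImage } from 'ai';
import { openai } from '@ai-sdk/openai';

export const maxDuration = 60; // generation itself can take 5-30 seconds

export async function POST(req: Request) {
  const { prompt }: { prompt: string } = await req.json();

  const { image } = await generateImage({
    model: openai.image('gpt-image-1'), // placeholder image model id
    prompt,
    size: '1024x1024',
  });

  return Response.json({ imageBase64: image.base64 });
}
```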
Speech and transcription are the next most-used surfaces. generateSpeech turns text into an audio buffer using the configured voice; pair with a streaming endpoint to play the audio progressively as it generates. transcribe accepts an audio buffer and returns text plus optional word-level timestamps. Both are useful for accessibility (read-aloud, voice-input chat) and for content workflows (podcast transcription, video voiceover). The provider matrix is wide — OpenAI, ElevenLabs, Deepgram, AssemblyAI, Groq, and Google all ship adapters.
generateVideo shipped in 6.x as an experimental primitive. Google Veo, Runway, and OpenAI Sora are the production providers; quality and latency vary substantially. Treat as a preview surface — pin to specific provider versions, expect breaking changes, and measure independently before shipping into a critical workflow. Output durations remain bounded (typically 6–10 seconds today) and per-second cost is significant; budget accordingly.
07 — Agent Loop
New primitive for autonomous loops.
The agent-loop primitive is the headline feature of the AI SDK 6 line and the one that closes the most important historical gap against LangChain-style frameworks. Before 6.x, building an autonomous agent on top of the SDK meant hand-rolling the multi-step loop — prompt the model, parse tool calls, execute, re-prompt with results, check a stop condition, repeat. The new primitive does this for you with typed stop conditions, step tracking, and clean cancellation semantics.
Stop conditions, not max-steps
The mental model that the SDK encourages is "the agent runs until a stop condition fires," not "the agent runs for at most N steps." Stop conditions are composable: stepCountIs(10) caps total rounds, hasToolCall('finalize') stops the moment a sentinel tool is invoked, tokenUsageExceeds(5000) stops on cost, and you can write custom predicates that inspect the running message array. Compose multiple stop conditions with stopWhen: [...] and the loop terminates on the first match.
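A composition sketch, assuming a hypothetical finalize sentinel tool; the rest of the tool catalog is elided and the cap values are illustrative:

```ts
import { streamText, tool, stepCountIs, hasToolCall } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { z } from 'zod';

// Hypothetical sentinel tool: the agent calls it to signal completion.
const finalize = tool({
  description: 'Call exactly once when the task is complete, with the final answer.',
  inputSchema: z.object({ answer: z.string() }),
  execute: async ({ answer }) => ({ answer }),
});

const result = streamText({
  model: anthropic('claude-sonnet-4-7'),
  prompt: 'Investigate the three slowest API routes and finalize a summary.',
  tools: { finalize /* ...plus the actual work tools */ },
  stopWhen: [
    stepCountIs(15),         // hard cap on tool-call rounds
    hasToolCall('finalize'), // stop the moment the sentinel tool fires
  ],
});
```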
stepCountIs(N)
Hard cap on tool-call rounds per run. The simplest stop condition and the one every agent should have. Use 5 for chat agents, 10–20 for autonomous task agents, 1 for single-tool-then-answer interfaces. Prevents runaway loops.
Required default

hasToolCall('finalize')
Stop when the model invokes a designated finalize tool. Pattern for goal-directed agents — the agent works through any number of intermediate tools and then explicitly signals completion via the sentinel call. Cleaner than relying on text heuristics.
Goal-directed agents

tokenUsageExceeds(N)
Stop when cumulative token usage crosses a threshold. Useful as a defense-in-depth pairing with stepCountIs — a single step can consume thousands of tokens if the tool returns large payloads, so step count alone undercounts cost.
Defense-in-depth

Step tracking and observability are the second half of the primitive. Each step is surfaced via the onStepFinish callback with the full set of parts emitted that step, the tool calls made, the tool results received, and the usage stats. Stream this to your observability layer and you get a per-step audit trail of agent behavior — essential for debugging the long tail of "why did the agent do that?" questions.
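A sketch of that per-step audit trail, with an in-memory array standing in for a real observability sink; the step fields shown follow the description above and may vary by version:

```ts
import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const auditTrail: Array<Record<string, unknown>> = []; // stand-in for a real sink

const result = streamText({
  model: anthropic('claude-sonnet-4-7'),
  prompt: 'Triage the failing checkout tests and propose a fix.',
  tools: { /* agent tool catalog */ },
  onStepFinish: ({ toolCalls, toolResults, usage, finishReason }) => {
    // one audit entry per tool-call round
    auditTrail.push({
      tools: toolCalls.map((call) => call.toolName),
      resultCount: toolResults.length,
      usage,
      finishReason,
    });
  },
});
```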
Where the agent loop replaces LangChain
The strongest historical reason teams layered LangChain on top of the AI SDK was the agent abstraction — LangChain's AgentExecutor and the various agent types it shipped with. The SDK's agent loop now covers the same surface area for most use cases: multi-step tool sequencing, stop conditions, structured intermediate results, and step-level observability. What LangChain still offers on top is the orchestration ecosystem — chains, retrievers, memory abstractions, the broader LangSmith tracing platform — which matters for some workloads and is overkill for most chatbots and product agents.
"The agent loop is the feature that removes the strongest historical reason to wrap the AI SDK in something else."— Our reading of the AI SDK 6 changelog, May 2026
08 — vs LangChain
Performance and cost deltas measured.
The most common architectural question in mid-2026 is "AI SDK or LangChain?" — and the honest answer is "depends on the workload, and most teams need both for different things." That said, measurable deltas exist on the workloads where they overlap, and the deltas favor the AI SDK on most production chat and agent surfaces.
Performance
On equivalent streaming chat workloads with the same provider and model, AI SDK 6 is roughly 15–25 percent faster end-to-end than LangChain TypeScript on cold-start route invocations and roughly 5–10 percent faster on warm requests. The delta comes from the SDK's smaller client and server surface — fewer abstractions on the path between the request body and the streaming response, less object allocation per token, and a streaming protocol that doesn't re-serialize across internal layers. The numbers narrow on longer requests where provider latency dominates, and widen on short, high-frequency calls.
AI SDK 6 vs LangChain TypeScript · selected metrics
Source: Internal benchmarks · OpenAI gpt-4o-mini · 1k requests each · May 2026

Cost
Provider cost is identical — both libraries call the same endpoints with the same tokens. Where cost actually diverges is infrastructure: AI SDK routes run faster, so per-request compute time on Vercel Functions is lower, which compounds at scale into a 10–20 percent monthly platform-cost reduction at the same request volume. The smaller client bundle (~25 KB gzipped for @ai-sdk/react vs ~80 KB for the equivalent LangChain React bindings) also reduces edge-network egress and improves initial page load on chat-heavy products.
When each wins
Use the AI SDK as the default for chatbots, in-product AI features, streaming agent interfaces, and any workload where client-side React rendering is the surface. Use LangChain (or its TypeScript siblings — Mastra, Inngest's agent kit) when you need heavy orchestration on the backend: parallel tool execution graphs, complex memory abstractions across sessions, retrieval pipelines with multiple rerankers, or the LangSmith tracing platform specifically. Many production builds use both — LangChain in a server-side orchestration service that produces results, AI SDK in the user-facing chat layer that streams them.
For teams sizing this decision, our LangChain 1 deep dive covers the orchestration side of the picture in symmetric detail, and our Next.js 16 AI chatbot tutorial ships a production-shaped chat surface end-to-end on the AI SDK. For larger transformation work — picking a default AI stack across multiple products and teams — that's exactly the kind of comparative eval our AI transformation engagements start with.
AI SDK 6 is the standard — most teams under-use it by 70%.
The Vercel AI SDK is the most adopted, fastest, and most consistently improved AI surface in the React ecosystem — and the typical adoption is a fraction of what's available. Most teams ship with useChat and streamText, hand-roll the rest, and pay the cost twice when the SDK's implementation of the same feature catches up six months later. The full 6.x surface area — message-parts, tools with Zod, the provider adapter, multi-modal, the agent loop — is the foundation modern AI products are built on.
The practical path for teams already running the SDK: audit your current chat surface against the five part types in Section 02 and make sure your render loop handles all of them; replace any hand-rolled tool-call shapes with tool() plus Zod; extract provider selection into a getModel() helper even if you have only one provider today; pick one stop condition for every streamText call that has tools. None of these changes require a re-architecture; together they convert a shallow adoption into a stack you can keep building on.
The broader signal is that the framework question for AI on React is settled. AI SDK 6 is the foundation; LangChain and the broader orchestration ecosystem layer on top when the workload justifies them. Every quarter the SDK ships, the surface widens and the reasons to wrap it in something else narrow. The question for engineering teams in mid-2026 isn't whether to adopt the AI SDK — it's how much of its current surface area is already in your stack, and how much value you're leaving on the table by not using the rest.