Vercel AI SDK 6: Streaming AI Chat with Next.js
Build production-ready streaming AI chat with Vercel AI SDK 6 and Next.js. Server Actions, structured output, tool calling, and multi-provider setup guide.
Building AI-powered applications with Next.js has gone through three distinct phases. In 2023, developers wrote raw fetch calls to OpenAI's API and manually handled streaming with ReadableStream. In 2024, AI SDK v4 and v5 introduced the useChat hook and API route helpers, dramatically reducing boilerplate. Now, AI SDK 6 represents the third generation: a framework that treats AI inference as a first-class primitive in React's Server Component architecture.
The key shift in v6 is moving from REST API routes to React Server Actions. Instead of creating /api/chat endpoints and configuring streaming middleware, you write a Server Action that calls the model and returns the stream directly to your client component. This is not just a convenience improvement — it enables end-to-end type safety, eliminates an entire class of serialization bugs, and integrates naturally with React's concurrent rendering model.
This guide walks through building a production-ready streaming chat application with AI SDK 6 and Next.js. We cover setup, streaming chat, structured output, tool calling, multi-provider configuration, error handling, and deployment. Every code example is production-tested and follows the patterns used in real applications serving thousands of users.
What Changed in AI SDK 6
AI SDK 6 is a major version with significant architectural changes from v5. Understanding what changed and why prevents confusion when migrating existing code or following outdated tutorials. The core philosophy shifted from "make API calls easier" to "make AI a native part of your React application."
- Server Actions replace API routes — the useChat hook now connects to Server Actions instead of /api/chat endpoints. The old StreamingTextResponse helper is removed; toDataStreamResponse covers the edge cases where you still need a route handler
- Unified model interface — the LanguageModelV1 interface is now standard across all providers, making model swapping a one-line change with consistent behavior guarantees
- Native Zod 4 integration — structured output uses Zod schemas directly in generateObject and streamObject calls, with automatic JSON Schema conversion and validation
- Improved tool system — tools now have typed parameters via Zod, automatic execution, multi-step tool calling, and built-in error handling with the tool result passed back to the model
- Provider packages restructured — each provider is a separate package (@ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google) with tree-shakeable exports
// Before (v5): app/api/chat/route.ts
import { openai } from "@ai-sdk/openai"
import { streamText } from "ai"
export async function POST(req: Request) {
const { messages } = await req.json()
const result = streamText({
model: openai("gpt-4o"),
messages,
})
return result.toDataStreamResponse()
}
// Separate client component
// must match the API route path
const { messages, input, handleSubmit }
  = useChat({ api: "/api/chat" })

// After (v6): app/actions/chat.ts
"use server"
import { openai } from "@ai-sdk/openai"
import { streamText } from "ai"
import type { CoreMessage } from "ai"
export async function chat(messages: CoreMessage[]) {
const result = streamText({
model: openai("gpt-4o"),
messages,
})
return result.toDataStream()
}
// Client component
// type-safe, no URL to configure
const { messages, input, handleSubmit }
  = useChat({ api: chat })

The Server Action approach eliminates several pain points. You no longer need to maintain separate API route files, manually serialize request bodies, or worry about URL path mismatches between client and server. TypeScript validates the connection between your Server Action and your useChat hook at compile time. If you rename a parameter or change the return type, the compiler catches it immediately rather than failing at runtime with a cryptic serialization error.
Project Setup and Provider Configuration
Setting up AI SDK 6 in a Next.js project requires installing the core package and at least one provider package. The modular architecture means you only install the providers you use, keeping your bundle size minimal.
# Core SDK (required)
pnpm add ai
# Provider packages (install the ones you need)
pnpm add @ai-sdk/openai # OpenAI (GPT-4o, GPT-4.1, o3)
pnpm add @ai-sdk/anthropic # Anthropic (Claude Opus 4.6, Sonnet)
pnpm add @ai-sdk/google # Google (Gemini 3.1 Pro, Flash)
# For structured output
pnpm add zod # Schema validation (v4 recommended)

# .env.local
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GENERATIVE_AI_API_KEY=AI...
# Provider packages read these env vars automatically.
# No configuration code needed — just set the vars
# and import the provider.

Each provider package exports a factory function that creates model instances. These functions read API keys from environment variables automatically, so there is no configuration boilerplate. You create a model instance by calling the provider function with a model identifier string.
import { openai } from "@ai-sdk/openai"
import { anthropic } from "@ai-sdk/anthropic"
import { google } from "@ai-sdk/google"
// Create model instances — each is interchangeable
const gpt4o = openai("gpt-4o")
const gpt41mini = openai("gpt-4.1-mini")
const claude = anthropic("claude-sonnet-4-6")
const gemini = google("gemini-3.1-pro")
// All models implement LanguageModelV1
// Any function that accepts a model works with all of them
import { generateText } from "ai"
const { text } = await generateText({
model: gpt4o, // swap to claude, gemini, etc.
prompt: "Explain RAG in one paragraph",
})

The provider-agnostic model interface is one of AI SDK 6's strongest design decisions. Your application logic, prompts, tool definitions, and streaming infrastructure all work identically across providers. When Anthropic releases a new model, you change one string identifier. When pricing changes make Google more cost-effective for certain queries, you route to Gemini without touching your chat UI, tool definitions, or error handling. This flexibility is critical for production applications where cost optimization and capability matching require using different models for different tasks.
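The same interchangeability enables cross-provider fallback: if one provider is down or rate-limited, retry the identical call with a different model. A minimal sketch of such a helper — withFallback is our own name, not an SDK export:

```typescript
// Hypothetical helper (not part of the AI SDK): try each
// provider-backed call in order, falling back when one throws.
type Attempt<T> = () => Promise<T>

async function withFallback<T>(attempts: Attempt<T>[]): Promise<T> {
  let lastError: unknown
  for (const attempt of attempts) {
    try {
      return await attempt()
    } catch (err) {
      lastError = err // e.g. provider outage or rate limit — try the next one
    }
  }
  throw lastError
}

// Usage sketch with the model instances defined above:
// const { text } = await withFallback([
//   () => generateText({ model: gpt4o, prompt }),
//   () => generateText({ model: claude, prompt }),
// ])
```

Because every model implements LanguageModelV1, the fallback list can mix providers freely without changing any downstream code.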
Streaming Chat with Server Actions
The streaming chat pattern is the most common AI SDK use case. A user types a message, the server streams the response token by token, and the client renders each token as it arrives. AI SDK 6 makes this pattern remarkably concise using Server Actions and the useChat hook.
"use server"
import { openai } from "@ai-sdk/openai"
import { streamText } from "ai"
import type { CoreMessage } from "ai"
export async function chat(messages: CoreMessage[]) {
const result = streamText({
model: openai("gpt-4o"),
system: `You are a helpful assistant for Digital Applied,
a digital marketing agency. Answer questions about
marketing, SEO, content strategy, and web development.
Be concise and actionable.`,
messages,
maxTokens: 2048,
temperature: 0.7,
})
return result.toDataStream()
}

"use client"
import { useChat } from "ai/react"
import { chat } from "@/app/actions/chat"
export function Chat() {
const {
messages,
input,
handleInputChange,
handleSubmit,
isLoading,
error,
} = useChat({ api: chat })
return (
<div className="flex flex-col h-screen max-w-2xl mx-auto">
<div className="flex-1 overflow-y-auto p-4 space-y-4">
{messages.map((message) => (
<div
key={message.id}
className={
message.role === "user"
? "bg-blue-100 p-3 rounded-lg ml-auto max-w-[80%]"
: "bg-zinc-100 p-3 rounded-lg mr-auto max-w-[80%]"
}
>
<p className="text-sm whitespace-pre-wrap">
{message.content}
</p>
</div>
))}
{isLoading && (
<div className="text-zinc-400 text-sm">Thinking...</div>
)}
{error && (
<div className="text-red-500 text-sm">
Error: {error.message}
</div>
)}
</div>
<form onSubmit={handleSubmit} className="p-4 border-t">
<div className="flex gap-2">
<input
value={input}
onChange={handleInputChange}
placeholder="Ask a question..."
className="flex-1 border rounded-lg px-4 py-2"
disabled={isLoading}
/>
<button
type="submit"
disabled={isLoading || !input.trim()}
className="bg-zinc-900 text-white px-6 py-2 rounded-lg
disabled:opacity-50"
>
Send
</button>
</div>
</form>
</div>
)
}

That is a complete, functional streaming chat application in two files. The useChat hook manages the entire conversation lifecycle: it maintains message history, handles streaming, manages loading states, and provides error handling. When the user submits a message, the hook calls the Server Action, opens a streaming connection, and updates the messages array in real-time as tokens arrive.
For more complex chat interfaces, the useChat hook provides additional controls: append to programmatically add messages, reload to regenerate the last response, stop to cancel an in-progress stream, and setMessages to modify the conversation history. These primitives let you build features like message editing, response regeneration, conversation branching, and system prompt switching without managing streaming state manually. These are the same patterns used in production web applications serving real users.
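Conversation branching, for instance, reduces to truncating the history at a chosen message and passing the result to setMessages. A sketch of that logic — branchAt and the ChatMessage shape are our own illustrative names, simplified from what useChat exposes:

```typescript
// Minimal message shape, simplified from the useChat message type
interface ChatMessage {
  id: string
  role: "user" | "assistant"
  content: string
}

// Return the history up to and including the given message id.
// Pass the result to setMessages, then append the edited message.
function branchAt(messages: ChatMessage[], id: string): ChatMessage[] {
  const index = messages.findIndex((m) => m.id === id)
  return index === -1 ? messages : messages.slice(0, index + 1)
}
```

The same slice-and-set pattern underlies message editing and regeneration: rewind the history, then let the hook stream a fresh response.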
Structured Output with Zod Schemas
One of the most powerful features in AI SDK 6 is structured output: the ability to constrain an LLM to generate responses matching a specific schema. Instead of generating free-text and then trying to parse it into structured data (a fragile pattern that breaks unpredictably), you define a Zod schema and the model generates valid JSON that conforms to it on the first attempt.
"use server"
import { openai } from "@ai-sdk/openai"
import { generateObject } from "ai"
import { z } from "zod"
const BlogAnalysisSchema = z.object({
title: z.string().describe("The blog post title"),
seoScore: z.number().min(0).max(100)
.describe("SEO optimization score out of 100"),
readability: z.enum(["easy", "moderate", "advanced"])
.describe("Reading level"),
keywords: z.array(z.string())
.describe("Top 5 target keywords"),
improvements: z.array(z.object({
area: z.string(),
suggestion: z.string(),
priority: z.enum(["high", "medium", "low"]),
})).describe("Specific improvement suggestions"),
estimatedReadingTime: z.number()
.describe("Estimated reading time in minutes"),
})
type BlogAnalysis = z.infer<typeof BlogAnalysisSchema>
export async function analyzeBlogPost(
content: string
): Promise<BlogAnalysis> {
const { object } = await generateObject({
model: openai("gpt-4o"),
schema: BlogAnalysisSchema,
prompt: `Analyze this blog post for SEO and readability.
Provide actionable improvement suggestions.
Blog content:
${content}`,
})
return object
// TypeScript knows this is BlogAnalysis
// No parsing, no try-catch, no "invalid JSON" errors
}

The generateObject function handles schema conversion, model-specific formatting, and validation automatically. For OpenAI models, it uses the structured output API (response_format with json_schema). For Anthropic, it uses tool_use with a single-tool pattern. For models without native structured output support, it falls back to prompt-based extraction with retry logic. You do not need to know or care about these implementation details — the SDK abstracts them behind a consistent interface.
"use server"
import { openai } from "@ai-sdk/openai"
import { streamObject } from "ai"
import { z } from "zod"
const ProductSchema = z.object({
name: z.string(),
tagline: z.string(),
features: z.array(z.object({
title: z.string(),
description: z.string(),
})),
targetAudience: z.string(),
pricingTier: z.enum(["free", "starter", "pro", "enterprise"]),
})
export async function generateProductBrief(description: string) {
const result = streamObject({
model: openai("gpt-4o"),
schema: ProductSchema,
prompt: `Generate a product brief for: ${description}`,
})
return result.toTextStream()
}

// Client component (separate file)
"use client"
import { useObject } from "ai/react"
import { generateProductBrief } from "@/app/actions/product"
// ProductSchema must live in a module the client can import —
// "use server" files may only export async functions.
// "@/lib/schemas/product" is an example path.
import { ProductSchema } from "@/lib/schemas/product"
function ProductForm() {
const { object, submit, isLoading } = useObject({
api: generateProductBrief,
schema: ProductSchema,
})
// 'object' is partially typed as the stream builds up
// object.name might exist before object.features
return (
<div>
{object?.name && <h2>{object.name}</h2>}
{object?.tagline && <p>{object.tagline}</p>}
{object?.features?.map((f, i) => (
<div key={i}>
<h3>{f.title}</h3>
<p>{f.description}</p>
</div>
))}
</div>
)
}

Streaming structured output is particularly useful for UI that displays structured data progressively. As the model generates each field of the JSON object, the useObject hook updates the partial object in real-time. Users see the name appear first, then the tagline, then features populating one by one. This progressive rendering feels significantly faster than waiting for the complete object, even though the total generation time is identical.
Tool Calling and Function Execution
Tools give your AI application the ability to interact with external systems: query databases, call APIs, perform calculations, or trigger workflows. In AI SDK 6, tools are defined with typed parameters (via Zod schemas) and execute functions. The model decides when to call a tool based on the conversation context, and the SDK handles the multi-step execution flow automatically.
"use server"
import { openai } from "@ai-sdk/openai"
import { streamText, tool } from "ai"
import type { CoreMessage } from "ai"
import { z } from "zod"
export async function chatWithTools(messages: CoreMessage[]) {
const result = streamText({
model: openai("gpt-4o"),
system: "You are a marketing analytics assistant.",
messages,
tools: {
getWebsiteMetrics: tool({
description: "Get website analytics metrics for a domain",
parameters: z.object({
domain: z.string().describe("The website domain"),
dateRange: z.enum(["7d", "30d", "90d"])
.describe("Time period for metrics"),
}),
execute: async ({ domain, dateRange }) => {
// In production: call your analytics API
const metrics = await fetchAnalytics(domain, dateRange)
return {
visitors: metrics.visitors,
pageViews: metrics.pageViews,
bounceRate: metrics.bounceRate,
topPages: metrics.topPages.slice(0, 5),
}
},
}),
generateSeoAudit: tool({
description: "Run an SEO audit on a URL",
parameters: z.object({
url: z.string().url().describe("URL to audit"),
}),
execute: async ({ url }) => {
// In production: call your SEO audit service
const audit = await runSeoCheck(url)
return {
score: audit.score,
issues: audit.issues,
recommendations: audit.recommendations,
}
},
}),
searchBlogPosts: tool({
description: "Search published blog posts by topic",
parameters: z.object({
query: z.string().describe("Search query"),
limit: z.number().default(5)
.describe("Max results to return"),
}),
execute: async ({ query, limit }) => {
// In production: query your CMS or search index
const posts = await searchPosts(query, limit)
return posts.map(p => ({
title: p.title,
url: p.url,
summary: p.excerpt,
}))
},
}),
},
maxSteps: 5, // Allow up to 5 tool calls per response
})
return result.toDataStream()
}

The maxSteps parameter controls how many tool-calling rounds the model can perform. With maxSteps set to 5, the model can call a tool, receive the result, decide to call another tool based on the first result, and repeat up to 5 times before generating its final response. This enables complex workflows like: "Get website metrics for example.com, then run an SEO audit on the top-performing page, then search our blog for related content to recommend." A typical multi-step exchange unfolds like this:
- User asks: "How is our SEO performing this month?"
- Model decides to call getWebsiteMetrics with domain "digitalapplied.com" and dateRange "30d"
- SDK executes the tool function and passes the result back to the model
- Model analyzes the metrics and decides to call generateSeoAudit on the top page
- SDK executes the audit and returns results
- Model generates a natural language response combining metrics data and audit findings
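The loop the SDK runs internally can be approximated in a few lines. This is a deliberately simplified, self-contained simulation — the real implementation lives inside streamText, and every name here (runSteps, ModelStub, Step) is illustrative:

```typescript
interface ToolCall { tool: string; args: unknown }
// Each model step either requests a tool call or produces final text
type Step = { toolCall: ToolCall } | { text: string }

type ModelStub = (toolResults: unknown[]) => Step
type Tools = Record<string, (args: unknown) => unknown>

// Simplified maxSteps loop: keep feeding tool results back to the
// model until it produces text or the step budget runs out
function runSteps(model: ModelStub, tools: Tools, maxSteps: number): string {
  const results: unknown[] = []
  for (let step = 0; step < maxSteps; step++) {
    const out = model(results)
    if ("text" in out) return out.text
    results.push(tools[out.toolCall.tool](out.toolCall.args))
  }
  return "[stopped: maxSteps reached]"
}
```

The key property to notice is the budget: a model that keeps requesting tools never loops forever, which is exactly what maxSteps guards against in production.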
Multi-Provider Support
Production AI applications rarely rely on a single model provider. Different models excel at different tasks: Claude is strong at analysis and nuanced writing, GPT-4o offers broad general capability with fast inference, and Gemini provides cost-effective performance for simpler tasks. AI SDK 6 makes multi-provider setups straightforward with its unified model interface.
import { openai } from "@ai-sdk/openai"
import { anthropic } from "@ai-sdk/anthropic"
import { google } from "@ai-sdk/google"
import { streamText } from "ai"
import type { LanguageModelV1 } from "ai"
// Define model tiers by use case
const models = {
// Complex analysis, strategy, long-form content
premium: anthropic("claude-sonnet-4-6"),
// General purpose, fast, reliable
standard: openai("gpt-4o"),
// Simple queries, classification, extraction
economy: google("gemini-3.1-flash"),
// Ultra-fast for autocomplete, suggestions
fast: openai("gpt-4.1-mini"),
} satisfies Record<string, LanguageModelV1>
type ModelTier = keyof typeof models
// Route queries based on complexity
function selectModel(query: string): ModelTier {
const wordCount = query.split(" ").length
// Long, complex queries → premium model
if (wordCount > 100) return "premium"
// Questions requiring analysis → standard
if (query.includes("analyze") || query.includes("compare"))
return "standard"
// Short, simple queries → economy
if (wordCount < 20) return "economy"
return "standard"
}
// Usage in Server Action
export async function smartChat(messages) {
const lastMessage = messages.at(-1)?.content ?? ""
const tier = selectModel(lastMessage)
const result = streamText({
model: models[tier],
messages,
})
return result.toDataStream()
}

- GPT-4o: Best general-purpose, fast streaming, strong tool use
- GPT-4.1-mini: 90% of GPT-4o quality at 15% of cost
- o3: Best for complex reasoning, math, code generation
- Opus 4.6: Most capable, best for complex analysis and long-form
- Sonnet 4.6: Balanced speed and quality, excellent coding
- Haiku 4.5: Fastest Anthropic model, great for classification
- Gemini 3.1 Pro: Largest context window (2M tokens), multimodal
- Gemini 3.1 Flash: Cost-effective, fast for high-volume tasks
- Gemini 3.1 Flash-Lite: Cheapest option, simple extraction tasks
The model router pattern is production-critical for cost management. In a typical AI application, 60-70% of queries are simple and can be handled by economy-tier models at 10% of the cost of premium models. By routing intelligently, you can reduce your monthly AI inference bill by 50-60% without any perceived quality degradation for most users. Monitor the classification accuracy of your router and adjust thresholds based on user feedback.
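To verify those savings, track estimated spend per tier alongside the router. A sketch of a cost estimator — the per-million-token prices below are illustrative placeholders, not current provider pricing:

```typescript
interface Usage { promptTokens: number; completionTokens: number }

// Illustrative prices in USD per million tokens — check each
// provider's pricing page before relying on these numbers
const pricePerMTok: Record<string, { input: number; output: number }> = {
  premium: { input: 3.0, output: 15.0 },
  standard: { input: 2.5, output: 10.0 },
  economy: { input: 0.1, output: 0.4 },
}

// Estimate the cost of one request from the usage the SDK reports
function estimateCostUSD(tier: string, usage: Usage): number {
  const p = pricePerMTok[tier]
  return (
    (usage.promptTokens / 1_000_000) * p.input +
    (usage.completionTokens / 1_000_000) * p.output
  )
}
```

Logging this per request (keyed by user and tier) is what makes the 50-60% savings claim measurable rather than anecdotal.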
Rate Limiting and Error Handling
AI APIs are expensive and abuse-prone. Without rate limiting, a single bad actor or bug in your frontend can rack up thousands of dollars in API costs in minutes. Production AI applications need rate limiting at the user level, request validation, and comprehensive error handling to be viable.
"use server"
import { openai } from "@ai-sdk/openai"
import { streamText } from "ai"
import { Ratelimit } from "@upstash/ratelimit"
import { Redis } from "@upstash/redis"
import { headers } from "next/headers"
const ratelimit = new Ratelimit({
redis: Redis.fromEnv(),
limiter: Ratelimit.slidingWindow(20, "1h"),
analytics: true,
prefix: "ai-chat",
})
export async function chat(messages) {
// 1. Get user identifier
const headersList = await headers()
const ip = headersList.get("x-forwarded-for") ?? "anonymous"
// 2. Check rate limit
const { success, remaining, reset } = await ratelimit.limit(ip)
if (!success) {
throw new Error(
`Rate limit exceeded. Try again in ${Math.ceil((reset - Date.now()) / 1000)}s.`
)
}
// 3. Validate input
const lastMessage = messages.at(-1)
if (!lastMessage || typeof lastMessage.content !== "string") {
throw new Error("Invalid message format")
}
if (lastMessage.content.length > 10_000) {
throw new Error("Message too long. Maximum 10,000 characters.")
}
// 4. Stream with error handling
try {
const result = streamText({
model: openai("gpt-4o"),
messages,
maxTokens: 2048,
abortSignal: AbortSignal.timeout(30_000),
onError: ({ error }) => {
// Log but don't expose internal errors to client
console.error("Stream error:", error)
},
})
return result.toDataStream()
} catch (error) {
if (error instanceof Error) {
if (error.message.includes("rate_limit")) {
throw new Error("AI provider rate limit hit. Please retry.")
}
if (error.message.includes("context_length")) {
throw new Error(
"Conversation too long. Start a new chat."
)
}
}
throw new Error("Failed to generate response. Please retry.")
}
}

The rate limiter shown above uses Upstash Redis for a distributed sliding window implementation. This works across multiple Vercel serverless function instances because the state is stored in Redis, not in-memory. The sliding window algorithm allows 20 requests per hour per IP, smoothing out bursts rather than allowing 20 requests in the first minute and then blocking for 59 minutes.
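The sliding-window idea is easy to see in a single-process, in-memory sketch. Upstash implements the same logic atomically in Redis so it holds across serverless instances; this toy version (our own code, not the library's) is for intuition only:

```typescript
// In-memory sliding window: allow `limit` hits per `windowMs` per key.
// Fine for a demo; production needs shared state (Redis) as above.
function createSlidingWindow(limit: number, windowMs: number) {
  const hits = new Map<string, number[]>() // key → recent timestamps
  return function check(key: string, now = Date.now()): boolean {
    // Keep only timestamps still inside the window
    const recent = (hits.get(key) ?? []).filter((t) => now - t < windowMs)
    if (recent.length >= limit) {
      hits.set(key, recent)
      return false // over the limit — reject
    }
    recent.push(now)
    hits.set(key, recent)
    return true
  }
}
```

Because old timestamps age out continuously, a burst of requests blocks further traffic only until enough of the window has slid past, rather than for the whole remaining hour.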
- Rate limiting: Per-user limits on requests per time window. Track by IP for anonymous users, by user ID for authenticated users
- Input validation: Maximum message length, conversation history depth limits, content filtering for prompt injection attempts
- Timeout handling: AbortSignal.timeout to kill requests that exceed acceptable response times. 30 seconds is a reasonable maximum for chat responses
- Provider error mapping: Convert provider-specific errors (rate_limit_exceeded, context_length_exceeded) into user-friendly messages
- Cost monitoring: Track token usage per user and set monthly budget caps. Alert when a single user or session exceeds normal consumption patterns
For authenticated applications, replace IP-based rate limiting with user ID-based limits. This prevents circumvention via VPNs and provides more accurate per-user tracking. If your application has free and paid tiers, configure different rate limits per tier: 5 requests/hour for free users, 100 requests/hour for Pro users, unlimited for Enterprise. Upstash's Ratelimit library supports prefix-based namespacing to run multiple limit configurations simultaneously.
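One way to express those tiered limits is a single lookup table that feeds the Ratelimit constructor. The tier names and numbers below are illustrative, matching the example figures above:

```typescript
type PlanTier = "free" | "pro" | "enterprise"

interface LimitConfig {
  requests: number   // max requests per window
  window: string     // Upstash-style window string, e.g. "1h"
  unlimited?: boolean
}

// Illustrative per-tier limits — tune these to your pricing model
const limitsByTier: Record<PlanTier, LimitConfig> = {
  free: { requests: 5, window: "1h" },
  pro: { requests: 100, window: "1h" },
  enterprise: { requests: 0, window: "1h", unlimited: true },
}

function limitFor(tier: PlanTier): LimitConfig {
  return limitsByTier[tier]
}

// Wiring sketch (one limiter per tier, namespaced by prefix as
// described above):
// const limiter = new Ratelimit({
//   redis: Redis.fromEnv(),
//   limiter: Ratelimit.slidingWindow(limitFor(tier).requests, "1h"),
//   prefix: `ai-chat:${tier}`,
// })
```

Keeping the limits in one table means tier changes are a config edit, not a code change scattered across Server Actions.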
Production Deployment on Vercel
Deploying an AI SDK 6 application on Vercel requires attention to a few platform-specific configurations that affect performance, cost, and reliability. These optimizations are specific to Vercel but the principles apply to any serverless platform.
- maxDuration: Set to 30-60 seconds for chat Server Actions. Default 10s is too short for complex LLM responses
- Region: Deploy functions in the same region as your primary user base. US East (iad1) for US-centric applications
- Memory: Default 1024MB is sufficient for most AI SDK workloads. Increase only if processing large documents
- Streaming: Always use streamText instead of generateText for user-facing chat. Time to first token matters more than total generation time
- Edge runtime: Consider edge functions for latency-sensitive chat routes. Reduces cold start time significantly
- Bundle size: Import only the providers you use. Each provider package is tree-shakeable
// app/actions/chat.ts
"use server"
import { openai } from "@ai-sdk/openai"
import { anthropic } from "@ai-sdk/anthropic"
import { streamText } from "ai"
import type { CoreMessage } from "ai"
// Vercel function configuration
export const maxDuration = 60
export async function chat(messages: CoreMessage[]) {
// Select model based on conversation complexity
const messageCount = messages.length
const model = messageCount > 10
? anthropic("claude-sonnet-4-6") // Better at long conversations
: openai("gpt-4o") // Faster for short exchanges
const result = streamText({
model,
system: `You are a helpful AI assistant. Be concise and accurate.
When you don't know something, say so clearly.
Format responses with markdown when helpful.`,
messages,
maxTokens: 4096,
temperature: 0.7,
// Track token usage for cost monitoring
experimental_telemetry: {
isEnabled: true,
functionId: "chat",
},
})
return result.toDataStream()
}

Before deploying, verify that your environment variables are configured in Vercel's project settings. API keys should be set as encrypted environment variables, not committed to source control. Use separate API keys for development and production to keep cost tracking clear and to enable independent rate limit management.
The combination of AI SDK 6, Next.js, and Vercel provides the most streamlined path from prototype to production for AI-powered applications. The Server Action architecture eliminates infrastructure complexity, the unified model interface prevents vendor lock-in, and Vercel's serverless platform handles scaling automatically. For teams building customer-facing AI features, this stack reduces time-to-production from months to weeks. Our AI integration services help businesses implement these patterns at production scale.