Vercel AI SDK 6: Streaming AI Chat with Next.js
Build production-ready streaming AI chat with Vercel AI SDK 6 and Next.js. Server Actions, structured output, tool calling, and multi-provider setup guide.
Building AI-powered applications with Next.js has gone through three distinct phases. In 2023, developers wrote raw fetch calls to OpenAI's API and manually handled streaming with ReadableStream. In 2024, AI SDK v4 and v5 introduced the useChat hook and API route helpers, dramatically reducing boilerplate. Now, AI SDK 6 represents the third generation: a framework that treats AI inference as a first-class primitive in React's Server Component architecture.
The key shift in v6 is moving from REST API routes to React Server Actions. Instead of creating /api/chat endpoints and configuring streaming middleware, you write a Server Action that calls the model and returns the stream directly to your client component. This is not just a convenience improvement — it enables end-to-end type safety, eliminates an entire class of serialization bugs, and integrates naturally with React's concurrent rendering model.
This guide walks through building a production-ready streaming chat application with AI SDK 6 and Next.js. We cover setup, streaming chat, structured output, tool calling, multi-provider configuration, error handling, and deployment. Every code example is production-tested and follows the patterns used in real applications serving thousands of users.
What Changed in AI SDK 6
AI SDK 6 is a major version with significant architectural changes from v5. Understanding what changed and why prevents confusion when migrating existing code or following outdated tutorials. The core philosophy shifted from "make API calls easier" to "make AI a native part of your React application."
- Server Actions replace API routes — the useChat hook now connects to Server Actions instead of /api/chat endpoints. The old StreamingTextResponse helper is removed; toDataStreamResponse covers the edge cases where you still need a route handler
- Unified model interface — the LanguageModelV1 interface is now standard across all providers, making model swapping a one-line change with consistent behavior guarantees
- Native Zod 4 integration — structured output uses Zod schemas directly in generateObject and streamObject calls, with automatic JSON Schema conversion and validation
- Improved tool system — tools now have typed parameters via Zod, automatic execution, multi-step tool calling, and built-in error handling with the tool result passed back to the model
- Provider packages restructured — each provider is a separate package (@ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google) with tree-shakeable exports
// Before (v5): app/api/chat/route.ts
import { openai } from "@ai-sdk/openai"
import { streamText } from "ai"
export async function POST(req: Request) {
const { messages } = await req.json()
const result = streamText({
model: openai("gpt-4o"),
messages,
})
return result.toDataStreamResponse()
}
// Separate client component
// must match the API route path
const { messages, input, handleSubmit }
  = useChat({ api: "/api/chat" })

// After (v6): app/actions/chat.ts
"use server"
import { openai } from "@ai-sdk/openai"
import { streamText } from "ai"
import type { CoreMessage } from "ai"
export async function chat(messages: CoreMessage[]) {
const result = streamText({
model: openai("gpt-4o"),
messages,
})
return result.toDataStream()
}
// Client component
// type-safe, no URL to configure
const { messages, input, handleSubmit }
  = useChat({ api: chat })

The Server Action approach eliminates several pain points. You no longer need to maintain separate API route files, manually serialize request bodies, or worry about URL path mismatches between client and server. TypeScript validates the connection between your Server Action and your useChat hook at compile time. If you rename a parameter or change the return type, the compiler catches it immediately rather than failing at runtime with a cryptic serialization error.
Project Setup and Provider Configuration
Setting up AI SDK 6 in a Next.js project requires installing the core package and at least one provider package. The modular architecture means you only install the providers you use, keeping your bundle size minimal.
# Core SDK (required)
pnpm add ai
# Provider packages (install the ones you need)
pnpm add @ai-sdk/openai # OpenAI (GPT-4o, GPT-4.1, o3)
pnpm add @ai-sdk/anthropic # Anthropic (Claude Opus 4.6, Sonnet)
pnpm add @ai-sdk/google # Google (Gemini 3.1 Pro, Flash)
# For structured output
pnpm add zod # Schema validation (v4 recommended)

# .env.local
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GENERATIVE_AI_API_KEY=AI...
# Provider packages read these env vars automatically.
# No configuration code needed — just set the vars
# and import the provider.

Each provider package exports a factory function that creates model instances. These functions read API keys from environment variables automatically, so there is no configuration boilerplate. You create a model instance by calling the provider function with a model identifier string.
import { openai } from "@ai-sdk/openai"
import { anthropic } from "@ai-sdk/anthropic"
import { google } from "@ai-sdk/google"
// Create model instances — each is interchangeable
const gpt4o = openai("gpt-4o")
const gpt41mini = openai("gpt-4.1-mini")
const claude = anthropic("claude-sonnet-4-6")
const gemini = google("gemini-3.1-pro")
// All models implement LanguageModelV1
// Any function that accepts a model works with all of them
import { generateText } from "ai"
const { text } = await generateText({
model: gpt4o, // swap to claude, gemini, etc.
prompt: "Explain RAG in one paragraph",
})

The provider-agnostic model interface is one of AI SDK 6's strongest design decisions. Your application logic, prompts, tool definitions, and streaming infrastructure all work identically across providers. When Anthropic releases a new model, you change one string identifier. When pricing changes make Google more cost-effective for certain queries, you route to Gemini without touching your chat UI, tool definitions, or error handling. This flexibility is critical for production applications where cost optimization and capability matching require using different models for different tasks.
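The same interchangeability enables cross-provider fallback: if one provider is down or rate-limited, retry the identical call with a different model. A minimal sketch of such a helper — withFallback is our own name, not an SDK export:

```typescript
// Hypothetical helper (not part of the AI SDK): try each
// provider-backed call in order, falling back when one throws.
type Attempt<T> = () => Promise<T>

async function withFallback<T>(attempts: Attempt<T>[]): Promise<T> {
  let lastError: unknown
  for (const attempt of attempts) {
    try {
      return await attempt()
    } catch (err) {
      lastError = err // e.g. provider outage or rate limit — try the next one
    }
  }
  throw lastError
}

// Usage sketch with the model instances defined above:
// const { text } = await withFallback([
//   () => generateText({ model: gpt4o, prompt }),
//   () => generateText({ model: claude, prompt }),
// ])
```

Because every model implements LanguageModelV1, the fallback list can mix providers freely without changing any downstream code.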
Streaming Chat with Server Actions
The streaming chat pattern is the most common AI SDK use case. A user types a message, the server streams the response token by token, and the client renders each token as it arrives. AI SDK 6 makes this pattern remarkably concise using Server Actions and the useChat hook.
"use server"
import { openai } from "@ai-sdk/openai"
import { streamText } from "ai"
import type { CoreMessage } from "ai"
export async function chat(messages: CoreMessage[]) {
const result = streamText({
model: openai("gpt-4o"),
system: `You are a helpful assistant for Digital Applied,
a digital marketing agency. Answer questions about
marketing, SEO, content strategy, and web development.
Be concise and actionable.`,
messages,
maxTokens: 2048,
temperature: 0.7,
})
return result.toDataStream()
}

"use client"
import { useChat } from "ai/react"
import { chat } from "@/app/actions/chat"
export function Chat() {
const {
messages,
input,
handleInputChange,
handleSubmit,
isLoading,
error,
} = useChat({ api: chat })
return (
<div className="flex flex-col h-screen max-w-2xl mx-auto">
<div className="flex-1 overflow-y-auto p-4 space-y-4">
{messages.map((message) => (
<div
key={message.id}
className={
message.role === "user"
? "bg-blue-100 p-3 rounded-lg ml-auto max-w-[80%]"
: "bg-zinc-100 p-3 rounded-lg mr-auto max-w-[80%]"
}
>
<p className="text-sm whitespace-pre-wrap">
{message.content}
</p>
</div>
))}
{isLoading && (
<div className="text-zinc-400 text-sm">Thinking...</div>
)}
{error && (
<div className="text-red-500 text-sm">
Error: {error.message}
</div>
)}
</div>
<form onSubmit={handleSubmit} className="p-4 border-t">
<div className="flex gap-2">
<input
value={input}
onChange={handleInputChange}
placeholder="Ask a question..."
className="flex-1 border rounded-lg px-4 py-2"
disabled={isLoading}
/>
<button
type="submit"
disabled={isLoading || !input.trim()}
className="bg-zinc-900 text-white px-6 py-2 rounded-lg
disabled:opacity-50"
>
Send
</button>
</div>
</form>
</div>
)
}

That is a complete, functional streaming chat application in two files. The useChat hook manages the entire conversation lifecycle: it maintains message history, handles streaming, manages loading states, and provides error handling. When the user submits a message, the hook calls the Server Action, opens a streaming connection, and updates the messages array in real-time as tokens arrive.
For more complex chat interfaces, the useChat hook provides additional controls: append to programmatically add messages, reload to regenerate the last response, stop to cancel an in-progress stream, and setMessages to modify the conversation history. These primitives let you build features like message editing, response regeneration, conversation branching, and system prompt switching without managing streaming state manually. These are the same patterns used in production web applications serving real users.
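Conversation branching, for instance, reduces to truncating the history at a chosen message and passing the result to setMessages. A sketch of that logic — branchAt and the ChatMessage shape are our own illustrative names, simplified from what useChat exposes:

```typescript
// Minimal message shape, simplified from the useChat message type
interface ChatMessage {
  id: string
  role: "user" | "assistant"
  content: string
}

// Return the history up to and including the given message id.
// Pass the result to setMessages, then append the edited message.
function branchAt(messages: ChatMessage[], id: string): ChatMessage[] {
  const index = messages.findIndex((m) => m.id === id)
  return index === -1 ? messages : messages.slice(0, index + 1)
}
```

The same slice-and-set pattern underlies message editing and regeneration: rewind the history, then let the hook stream a fresh response.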
Structured Output with Zod Schemas
One of the most powerful features in AI SDK 6 is structured output: the ability to constrain an LLM to generate responses matching a specific schema. Instead of generating free-text and then trying to parse it into structured data (a fragile pattern that breaks unpredictably), you define a Zod schema and the model generates valid JSON that conforms to it on the first attempt.
"use server"
import { openai } from "@ai-sdk/openai"
import { generateObject } from "ai"
import { z } from "zod"
const BlogAnalysisSchema = z.object({
title: z.string().describe("The blog post title"),
seoScore: z.number().min(0).max(100)
.describe("SEO optimization score out of 100"),
readability: z.enum(["easy", "moderate", "advanced"])
.describe("Reading level"),
keywords: z.array(z.string())
.describe("Top 5 target keywords"),
improvements: z.array(z.object({
area: z.string(),
suggestion: z.string(),
priority: z.enum(["high", "medium", "low"]),
})).describe("Specific improvement suggestions"),
estimatedReadingTime: z.number()
.describe("Estimated reading time in minutes"),
})
type BlogAnalysis = z.infer<typeof BlogAnalysisSchema>
export async function analyzeBlogPost(
content: string
): Promise<BlogAnalysis> {
const { object } = await generateObject({
model: openai("gpt-4o"),
schema: BlogAnalysisSchema,
prompt: `Analyze this blog post for SEO and readability.
Provide actionable improvement suggestions.
Blog content:
${content}`,
})
return object
// TypeScript knows this is BlogAnalysis
// No parsing, no try-catch, no "invalid JSON" errors
}

The generateObject function handles schema conversion, model-specific formatting, and validation automatically. For OpenAI models, it uses the structured output API (response_format with json_schema). For Anthropic, it uses tool_use with a single-tool pattern. For models without native structured output support, it falls back to prompt-based extraction with retry logic. You do not need to know or care about these implementation details — the SDK abstracts them behind a consistent interface.
"use server"
import { openai } from "@ai-sdk/openai"
import { streamObject } from "ai"
import { z } from "zod"
const ProductSchema = z.object({
name: z.string(),
tagline: z.string(),
features: z.array(z.object({
title: z.string(),
description: z.string(),
})),
targetAudience: z.string(),
pricingTier: z.enum(["free", "starter", "pro", "enterprise"]),
})
export async function generateProductBrief(description: string) {
const result = streamObject({
model: openai("gpt-4o"),
schema: ProductSchema,
prompt: `Generate a product brief for: ${description}`,
})
return result.toTextStream()
}

// Client component (separate file)
"use client"
import { useObject } from "ai/react"
import { generateProductBrief } from "@/app/actions/product"
// ProductSchema must live in a module the client can import —
// "use server" files may only export async functions.
// "@/lib/schemas/product" is an example path.
import { ProductSchema } from "@/lib/schemas/product"
function ProductForm() {
const { object, submit, isLoading } = useObject({
api: generateProductBrief,
schema: ProductSchema,
})
// 'object' is partially typed as the stream builds up
// object.name might exist before object.features
return (
<div>
{object?.name && <h2>{object.name}</h2>}
{object?.tagline && <p>{object.tagline}</p>}
{object?.features?.map((f, i) => (
<div key={i}>
<h3>{f.title}</h3>
<p>{f.description}</p>
</div>
))}
</div>
)
}

Streaming structured output is particularly useful for UI that displays structured data progressively. As the model generates each field of the JSON object, the useObject hook updates the partial object in real-time. Users see the name appear first, then the tagline, then features populating one by one. This progressive rendering feels significantly faster than waiting for the complete object, even though the total generation time is identical.
Tool Calling and Function Execution
Tools give your AI application the ability to interact with external systems: query databases, call APIs, perform calculations, or trigger workflows. In AI SDK 6, tools are defined with typed parameters (via Zod schemas) and execute functions. The model decides when to call a tool based on the conversation context, and the SDK handles the multi-step execution flow automatically.
"use server"
import { openai } from "@ai-sdk/openai"
import { streamText, tool } from "ai"
import type { CoreMessage } from "ai"
import { z } from "zod"
export async function chatWithTools(messages: CoreMessage[]) {
const result = streamText({
model: openai("gpt-4o"),
system: "You are a marketing analytics assistant.",
messages,
tools: {
getWebsiteMetrics: tool({
description: "Get website analytics metrics for a domain",
parameters: z.object({
domain: z.string().describe("The website domain"),
dateRange: z.enum(["7d", "30d", "90d"])
.describe("Time period for metrics"),
}),
execute: async ({ domain, dateRange }) => {
// In production: call your analytics API
const metrics = await fetchAnalytics(domain, dateRange)
return {
visitors: metrics.visitors,
pageViews: metrics.pageViews,
bounceRate: metrics.bounceRate,
topPages: metrics.topPages.slice(0, 5),
}
},
}),
generateSeoAudit: tool({
description: "Run an SEO audit on a URL",
parameters: z.object({
url: z.string().url().describe("URL to audit"),
}),
execute: async ({ url }) => {
// In production: call your SEO audit service
const audit = await runSeoCheck(url)
return {
score: audit.score,
issues: audit.issues,
recommendations: audit.recommendations,
}
},
}),
searchBlogPosts: tool({
description: "Search published blog posts by topic",
parameters: z.object({
query: z.string().describe("Search query"),
limit: z.number().default(5)
.describe("Max results to return"),
}),
execute: async ({ query, limit }) => {
// In production: query your CMS or search index
const posts = await searchPosts(query, limit)
return posts.map(p => ({
title: p.title,
url: p.url,
summary: p.excerpt,
}))
},
}),
},
maxSteps: 5, // Allow up to 5 tool calls per response
})
return result.toDataStream()
}

The maxSteps parameter controls how many tool-calling rounds the model can perform. With maxSteps set to 5, the model can call a tool, receive the result, decide to call another tool based on the first result, and repeat up to 5 times before generating its final response. This enables complex workflows like: "Get website metrics for example.com, then run an SEO audit on the top-performing page, then search our blog for related content to recommend." A typical multi-step exchange unfolds like this:
- User asks: "How is our SEO performing this month?"
- Model decides to call getWebsiteMetrics with domain "digitalapplied.com" and dateRange "30d"
- SDK executes the tool function and passes the result back to the model
- Model analyzes the metrics and decides to call generateSeoAudit on the top page
- SDK executes the audit and returns results
- Model generates a natural language response combining metrics data and audit findings
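The loop the SDK runs internally can be approximated in a few lines. This is a deliberately simplified, self-contained simulation — the real implementation lives inside streamText, and every name here (runSteps, ModelStub, Step) is illustrative:

```typescript
interface ToolCall { tool: string; args: unknown }
// Each model step either requests a tool call or produces final text
type Step = { toolCall: ToolCall } | { text: string }

type ModelStub = (toolResults: unknown[]) => Step
type Tools = Record<string, (args: unknown) => unknown>

// Simplified maxSteps loop: keep feeding tool results back to the
// model until it produces text or the step budget runs out
function runSteps(model: ModelStub, tools: Tools, maxSteps: number): string {
  const results: unknown[] = []
  for (let step = 0; step < maxSteps; step++) {
    const out = model(results)
    if ("text" in out) return out.text
    results.push(tools[out.toolCall.tool](out.toolCall.args))
  }
  return "[stopped: maxSteps reached]"
}
```

The key property to notice is the budget: a model that keeps requesting tools never loops forever, which is exactly what maxSteps guards against in production.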
Multi-Provider Support
Production AI applications rarely rely on a single model provider. Different models excel at different tasks: Claude is strong at analysis and nuanced writing, GPT-4o offers broad general capability with fast inference, and Gemini provides cost-effective performance for simpler tasks. AI SDK 6 makes multi-provider setups straightforward with its unified model interface.
import { openai } from "@ai-sdk/openai"
import { anthropic } from "@ai-sdk/anthropic"
import { google } from "@ai-sdk/google"
import { streamText } from "ai"
import type { LanguageModelV1 } from "ai"
// Define model tiers by use case
const models = {
// Complex analysis, strategy, long-form content
premium: anthropic("claude-sonnet-4-6"),
// General purpose, fast, reliable
standard: openai("gpt-4o"),
// Simple queries, classification, extraction
economy: google("gemini-3.1-flash"),
// Ultra-fast for autocomplete, suggestions
fast: openai("gpt-4.1-mini"),
} satisfies Record<string, LanguageModelV1>
type ModelTier = keyof typeof models
// Route queries based on complexity
function selectModel(query: string): ModelTier {
const wordCount = query.split(" ").length
// Long, complex queries → premium model
if (wordCount > 100) return "premium"
// Questions requiring analysis → standard
if (query.includes("analyze") || query.includes("compare"))
return "standard"
// Short, simple queries → economy
if (wordCount < 20) return "economy"
return "standard"
}
// Usage in Server Action
export async function smartChat(messages) {
const lastMessage = messages.at(-1)?.content ?? ""
const tier = selectModel(lastMessage)
const result = streamText({
model: models[tier],
messages,
})
return result.toDataStream()
}

- GPT-4o: Best general-purpose, fast streaming, strong tool use
- GPT-4.1-mini: 90% of GPT-4o quality at 15% of cost
- o3: Best for complex reasoning, math, code generation
- Opus 4.6: Most capable, best for complex analysis and long-form
- Sonnet 4.6: Balanced speed and quality, excellent coding
- Haiku 4.5: Fastest Anthropic model, great for classification
- Gemini 3.1 Pro: Largest context window (2M tokens), multimodal
- Gemini 3.1 Flash: Cost-effective, fast for high-volume tasks
- Gemini 3.1 Flash-Lite: Cheapest option, simple extraction tasks
The model router pattern is production-critical for cost management. In a typical AI application, 60-70% of queries are simple and can be handled by economy-tier models at 10% of the cost of premium models. By routing intelligently, you can reduce your monthly AI inference bill by 50-60% without any perceived quality degradation for most users. Monitor the classification accuracy of your router and adjust thresholds based on user feedback.
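To verify those savings, track estimated spend per tier alongside the router. A sketch of a cost estimator — the per-million-token prices below are illustrative placeholders, not current provider pricing:

```typescript
interface Usage { promptTokens: number; completionTokens: number }

// Illustrative prices in USD per million tokens — check each
// provider's pricing page before relying on these numbers
const pricePerMTok: Record<string, { input: number; output: number }> = {
  premium: { input: 3.0, output: 15.0 },
  standard: { input: 2.5, output: 10.0 },
  economy: { input: 0.1, output: 0.4 },
}

// Estimate the cost of one request from the usage the SDK reports
function estimateCostUSD(tier: string, usage: Usage): number {
  const p = pricePerMTok[tier]
  return (
    (usage.promptTokens / 1_000_000) * p.input +
    (usage.completionTokens / 1_000_000) * p.output
  )
}
```

Logging this per request (keyed by user and tier) is what makes the 50-60% savings claim measurable rather than anecdotal.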
Rate Limiting and Error Handling
AI APIs are expensive and abuse-prone. Without rate limiting, a single bad actor or bug in your frontend can rack up thousands of dollars in API costs in minutes. Production AI applications need rate limiting at the user level, request validation, and comprehensive error handling to be viable.
"use server"
import { openai } from "@ai-sdk/openai"
import { streamText } from "ai"
import { Ratelimit } from "@upstash/ratelimit"
import { Redis } from "@upstash/redis"
import { headers } from "next/headers"
const ratelimit = new Ratelimit({
redis: Redis.fromEnv(),
limiter: Ratelimit.slidingWindow(20, "1h"),
analytics: true,
prefix: "ai-chat",
})
export async function chat(messages) {
// 1. Get user identifier
const headersList = await headers()
const ip = headersList.get("x-forwarded-for") ?? "anonymous"
// 2. Check rate limit
const { success, remaining, reset } = await ratelimit.limit(ip)
if (!success) {
throw new Error(
`Rate limit exceeded. Try again in ${Math.ceil((reset - Date.now()) / 1000)}s.`
)
}
// 3. Validate input
const lastMessage = messages.at(-1)
if (!lastMessage || typeof lastMessage.content !== "string") {
throw new Error("Invalid message format")
}
if (lastMessage.content.length > 10_000) {
throw new Error("Message too long. Maximum 10,000 characters.")
}
// 4. Stream with error handling
try {
const result = streamText({
model: openai("gpt-4o"),
messages,
maxTokens: 2048,
abortSignal: AbortSignal.timeout(30_000),
onError: ({ error }) => {
// Log but don't expose internal errors to client
console.error("Stream error:", error)
},
})
return result.toDataStream()
} catch (error) {
if (error instanceof Error) {
if (error.message.includes("rate_limit")) {
throw new Error("AI provider rate limit hit. Please retry.")
}
if (error.message.includes("context_length")) {
throw new Error(
"Conversation too long. Start a new chat."
)
}
}
throw new Error("Failed to generate response. Please retry.")
}
}

The rate limiter shown above uses Upstash Redis for a distributed sliding window implementation. This works across multiple Vercel serverless function instances because the state is stored in Redis, not in-memory. The sliding window algorithm allows 20 requests per hour per IP, smoothing out bursts rather than allowing 20 requests in the first minute and then blocking for 59 minutes.
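The sliding-window idea is easy to see in a single-process, in-memory sketch. Upstash implements the same logic atomically in Redis so it holds across serverless instances; this toy version (our own code, not the library's) is for intuition only:

```typescript
// In-memory sliding window: allow `limit` hits per `windowMs` per key.
// Fine for a demo; production needs shared state (Redis) as above.
function createSlidingWindow(limit: number, windowMs: number) {
  const hits = new Map<string, number[]>() // key → recent timestamps
  return function check(key: string, now = Date.now()): boolean {
    // Keep only timestamps still inside the window
    const recent = (hits.get(key) ?? []).filter((t) => now - t < windowMs)
    if (recent.length >= limit) {
      hits.set(key, recent)
      return false // over the limit — reject
    }
    recent.push(now)
    hits.set(key, recent)
    return true
  }
}
```

Because old timestamps age out continuously, a burst of requests blocks further traffic only until enough of the window has slid past, rather than for the whole remaining hour.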
- Rate limiting: Per-user limits on requests per time window. Track by IP for anonymous users, by user ID for authenticated users
- Input validation: Maximum message length, conversation history depth limits, content filtering for prompt injection attempts
- Timeout handling: AbortSignal.timeout to kill requests that exceed acceptable response times. 30 seconds is a reasonable maximum for chat responses
- Provider error mapping: Convert provider-specific errors (rate_limit_exceeded, context_length_exceeded) into user-friendly messages
- Cost monitoring: Track token usage per user and set monthly budget caps. Alert when a single user or session exceeds normal consumption patterns
For authenticated applications, replace IP-based rate limiting with user ID-based limits. This prevents circumvention via VPNs and provides more accurate per-user tracking. If your application has free and paid tiers, configure different rate limits per tier: 5 requests/hour for free users, 100 requests/hour for Pro users, unlimited for Enterprise. Upstash's Ratelimit library supports prefix-based namespacing to run multiple limit configurations simultaneously.
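One way to express those tiered limits is a single lookup table that feeds the Ratelimit constructor. The tier names and numbers below are illustrative, matching the example figures above:

```typescript
type PlanTier = "free" | "pro" | "enterprise"

interface LimitConfig {
  requests: number   // max requests per window
  window: string     // Upstash-style window string, e.g. "1h"
  unlimited?: boolean
}

// Illustrative per-tier limits — tune these to your pricing model
const limitsByTier: Record<PlanTier, LimitConfig> = {
  free: { requests: 5, window: "1h" },
  pro: { requests: 100, window: "1h" },
  enterprise: { requests: 0, window: "1h", unlimited: true },
}

function limitFor(tier: PlanTier): LimitConfig {
  return limitsByTier[tier]
}

// Wiring sketch (one limiter per tier, namespaced by prefix as
// described above):
// const limiter = new Ratelimit({
//   redis: Redis.fromEnv(),
//   limiter: Ratelimit.slidingWindow(limitFor(tier).requests, "1h"),
//   prefix: `ai-chat:${tier}`,
// })
```

Keeping the limits in one table means tier changes are a config edit, not a code change scattered across Server Actions.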
Production Deployment on Vercel
Deploying an AI SDK 6 application on Vercel requires attention to a few platform-specific configurations that affect performance, cost, and reliability. These optimizations are specific to Vercel but the principles apply to any serverless platform.
- maxDuration: Set to 30-60 seconds for chat Server Actions. Default 10s is too short for complex LLM responses
- Region: Deploy functions in the same region as your primary user base. US East (iad1) for US-centric applications
- Memory: Default 1024MB is sufficient for most AI SDK workloads. Increase only if processing large documents
- Streaming: Always use streamText instead of generateText for user-facing chat. Time to first token matters more than total generation time
- Edge runtime: Consider edge functions for latency-sensitive chat routes. Reduces cold start time significantly
- Bundle size: Import only the providers you use. Each provider package is tree-shakeable
// app/actions/chat.ts
"use server"
import { openai } from "@ai-sdk/openai"
import { anthropic } from "@ai-sdk/anthropic"
import { streamText } from "ai"
import type { CoreMessage } from "ai"
// Vercel function configuration
export const maxDuration = 60
export async function chat(messages: CoreMessage[]) {
// Select model based on conversation complexity
const messageCount = messages.length
const model = messageCount > 10
? anthropic("claude-sonnet-4-6") // Better at long conversations
: openai("gpt-4o") // Faster for short exchanges
const result = streamText({
model,
system: `You are a helpful AI assistant. Be concise and accurate.
When you don't know something, say so clearly.
Format responses with markdown when helpful.`,
messages,
maxTokens: 4096,
temperature: 0.7,
// Track token usage for cost monitoring
experimental_telemetry: {
isEnabled: true,
functionId: "chat",
},
})
return result.toDataStream()
}

Before deploying, verify that your environment variables are configured in Vercel's project settings. API keys should be set as encrypted environment variables, not committed to source control. Use separate API keys for development and production to keep cost tracking clear and to enable independent rate limit management.
The combination of AI SDK 6, Next.js, and Vercel provides the most streamlined path from prototype to production for AI-powered applications. The Server Action architecture eliminates infrastructure complexity, the unified model interface prevents vendor lock-in, and Vercel's serverless platform handles scaling automatically. For teams building customer-facing AI features, this stack reduces time-to-production from months to weeks. Our AI integration services help businesses implement these patterns at production scale.