Build a Next.js 16 AI Chatbot: AI SDK Tutorial from Scratch

From npx create-next-app to a deployed Vercel chatbot in eight steps — streaming, tool calls, multi-provider routing, shadcn UI, all production-shaped.

An end-to-end build of a production-shaped chatbot on Next.js 16 App Router using Vercel AI SDK 6. Streaming, tool calls with Zod, provider switching, shadcn/ui message scaffold, Upstash rate limits, and a one-command Vercel deploy.

Digital Applied Team · Senior engineers
Published May 2, 2026 · 17 min read · Sources: Vercel AI SDK 6 docs
Lines of TSX (full app): ~600 (scaffold + UI + route)
Time to first deploy: ≈45 min (from clone to prod URL)
Cost per 1k chats: ~$3–5 (Sonnet, ~800-token turns)
P50 first-token latency: ~280 ms (edge runtime, US-East)

A Next.js 16 AI chatbot built on the Vercel AI SDK is the shortest path from a blank repo to a production-shaped agentic interface. Eight steps, roughly 600 lines of TSX, and a deploy that lands on Vercel with streaming, tool calls, and provider switching wired from day one. This tutorial is the canonical reference build.

The reason this stack matters right now: chatbots are the visible tip of the agentic iceberg. The same primitives that power a customer-facing chat surface — streaming responses, tool invocations, structured outputs, provider abstraction — power every higher-order agent built on top. Getting the chat layer right is what compounds when you add retrieval, memory, and multi-turn planning later.

What this guide covers: scaffolding with create-next-app on the App Router; an /api/chat route handler running streamText; the useChat hook on the client; a shadcn/ui-styled message list with auto-scroll; Zod tool schemas executed server-side and rendered client-side; single-env-var provider switching across Anthropic, OpenAI, Google, and xAI; rate limiting, error boundaries, and abuse protection; and a one-command Vercel deploy. Code is copy-pasteable. No prior AI SDK experience assumed.

Key takeaways
  1. useChat is the hook that does what you'd build yourself — but better. Optimistic UI, streaming parts, message state, abort handling, and tool-call rendering are all wired in. Roll your own only if you have a reason the SDK can't satisfy.
  2. streamText with Zod tool schemas is the right server-side primitive. Single API surface for streaming, tool execution, structured output, and provider switching. The model-agnostic shape is what keeps the route handler portable across vendors.
  3. Provider-adapter pattern keeps you portable. One environment variable flips the underlying model between Anthropic, OpenAI, Google, and xAI. Don't hard-code a provider in the route — the cost-quality matrix shifts every quarter.
  4. Rate-limit at the edge before you hit Vercel function quotas. Upstash Redis plus an Edge middleware costs cents at typical SaaS scale and prevents the abuse cases that drain a budget overnight. Implement on day one, not after the first incident.
  5. Tool-call UX is half the perceived quality. Show what the agent is doing while it's doing it. Skeleton states during tool execution, structured tool-result cards, and citation chips on retrieval calls are non-negotiable for production polish.

01 · Scaffold: create-next-app, AI SDK, shadcn.

The starting point is a fresh Next.js 16 app on the App Router with TypeScript and Tailwind enabled by default. The choices worth confirming during the prompts: keep the App Router (the default in 16), keep Turbopack as the bundler, and decline the src/ directory unless your team's convention requires it. The chatbot lives entirely under app/ and components/.

The three install commands

Three terminal steps get the dependencies in place. The AI SDK 6 line includes the React bindings, the core, and one or more provider packages. We install all four mainstream providers up front so the provider-switch section is a one-line change later, not a re-installation.

Commands

npx create-next-app@latest chatbot --typescript --tailwind --app --turbopack
cd chatbot && pnpm add ai @ai-sdk/react @ai-sdk/anthropic @ai-sdk/openai @ai-sdk/google @ai-sdk/xai zod
pnpm dlx shadcn@latest init && pnpm dlx shadcn@latest add button card input textarea scroll-area

That is the entire dependency surface. The ai package is the core SDK; @ai-sdk/react exposes useChat; each provider package is a thin adapter; zod drives tool-call validation. Pin the AI SDK to ^6 in package.json so minor bumps land but breaking 7.0 changes are explicit.

One tsconfig adjustment is worth making early: ensure moduleResolution is "bundler" and target is at least "ES2022". The AI SDK uses native ReadableStream APIs and top-level await patterns that older module resolutions stumble on. The create-next-app default already lands here in Next.js 16, but verify after install — a one-line mistake here produces opaque "cannot resolve" errors three sections from now.

Add a single environment variable to .env.local before going further: ANTHROPIC_API_KEY=sk-ant-…. The tutorial defaults to Claude Sonnet 4.7 because the streaming and tool-calling APIs are the most stable across providers right now; swap providers in Section 06.

The three install commands above produce a project tree with roughly forty files — most of them shadcn-generated UI source. The chatbot itself will live in two new files we add over the next two sections: app/api/chat/route.ts (the server-side route handler) and components/chat.tsx (the client component). The tutorial's full surface area is small on purpose; the rest of the repo is scaffolding that ships with create-next-app and shadcn.

Core · the ai package (ai@^6, ~80 KB gzipped)

The model-agnostic core. Exposes streamText, generateText, generateObject, embed, and the streaming primitives that the React bindings wrap. Everything route-handler-side imports from here.

import { streamText } from 'ai'

Client · @ai-sdk/react (@ai-sdk/react@^6)

React bindings. The useChat hook owns message state, streaming reassembly, optimistic updates, abort signals, and tool-call rendering. Client-side surface area is small by design.

import { useChat } from '@ai-sdk/react'

Providers · thin adapters (@ai-sdk/{anthropic,openai,google,xai})

One package per vendor. Each exports a factory that returns an SDK-compatible model object. Switching providers is a one-line change in the route handler — no rewrites of message shape, no per-vendor stream parsing.

anthropic('claude-sonnet-4-7')

One scaffold-time decision worth making consciously: the App Router defaults to colocating components inside app/ rather than under a separate components/ directory. For a chatbot that may grow into a multi-surface product, keep components/chat/ as a sibling directory — the chat will eventually be embedded in marketing pages, dashboards, and possibly a public widget, and a top-level component directory keeps the import paths clean. The shadcn primitives go in components/ui/ by default and should stay there.

02 · API Route: streamText in a route handler.

The server-side surface is a single App Router route handler at app/api/chat/route.ts. It accepts POST requests carrying a JSON body with the conversation history, invokes streamText against the chosen model, and returns the stream as a UI-Message response that useChat knows how to consume on the client. Roughly 30 lines for the minimum viable shape.

The full minimum route

app/api/chat/route.ts

import { anthropic } from '@ai-sdk/anthropic';
import { streamText, convertToModelMessages, type UIMessage } from 'ai';

export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();
  const result = streamText({
    model: anthropic('claude-sonnet-4-7'),
    system: 'You are a concise, helpful assistant. Cite sources when retrieval tools are used.',
    messages: convertToModelMessages(messages),
  });
  return result.toUIMessageStreamResponse();
}

Two patterns to internalize here. First, convertToModelMessages is the bridge between the client's UIMessage shape (which carries tool-call parts, attachments, and ID metadata for React's reconciliation) and the provider-neutral model message shape that streamText expects. Always convert at the boundary — never pass UI messages directly to the model. Second, result.toUIMessageStreamResponse() returns the Server-Sent-Events stream in the protocol that useChat consumes natively. No manual JSON chunking, no per-provider decoder.

maxDuration = 30 caps the route at 30 seconds, which is comfortable for Sonnet-class responses on the Vercel Hobby and Pro plans. Bump to 60 if you expect long reasoning traces with Think Max mode; bump to 300 only on the Enterprise plan where the function timeout ceiling is raised. The route is a Node.js runtime by default — that is the right choice when calling third-party SDKs, which often rely on Node-only APIs. Switch to the Edge runtime only when you have measured a latency benefit and confirmed no Node dependencies.

"The shortest path from blank repo to streaming chatbot is a 30-line route handler plus a 50-line client component. Everything else is polish."— Our reading of the AI SDK 6 quickstart, May 2026

A note on the system prompt. The AI SDK passes whatever string you put in system as a top-level system message to the provider, ahead of the converted message history. Keep this string tight and behavior-shaping rather than knowledge-loading — the model is already smart, what it needs from you is constraints, tone, and tool-use policy. A useful template is three sentences: (1) the assistant's role and audience, (2) the output style (length, format, citation policy), (3) the tool-use policy (when to invoke search, how to handle ambiguous queries, when to ask a clarifying question). Anything longer tends to drift into knowledge that belongs in the retrieved context, not the static prompt.
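A concrete example of that template, with each sentence mapped to one of the three slots. The wording and the lib/prompt.ts file name are illustrative, not a canonical prompt; adapt role, style, and tool policy to your product.

lib/prompt.ts

// An illustrative system prompt following the three-sentence template above.
// (1) role and audience, (2) output style, (3) tool-use policy. Keep it short.
export const SYSTEM_PROMPT = [
  'You are a concise technical assistant for developers evaluating our product.',
  'Answer in short paragraphs, format code as markdown, and cite sources whenever a retrieval tool was used.',
  'Invoke the search tool only when the answer requires current information, and ask one clarifying question when the request is ambiguous.',
].join(' ');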

For multi-turn conversations, the AI SDK automatically passes the entire messages array on every turn, which gives the model perfect context within its window. For extremely long conversations that approach the context limit, implement a summarization step at, say, turn 30 — generate a structured summary of turns 1–25, replace those turns with a single system message containing the summary, and continue. The AI SDK's generateObject with a Zod schema is the right tool for the summarization step; reserve streamText for the user-facing assistant turns.
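A sketch of that summarization step, assuming a compactHistory helper called before convertToModelMessages in the route handler. The turn thresholds, schema fields, and the synthetic summary message are illustrative choices rather than a prescribed shape; verify the UIMessage typing against the SDK version you have installed.

lib/summarize.ts

import { generateObject, convertToModelMessages, type UIMessage } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { z } from 'zod';

// Structured summary shape; generateObject guarantees the output matches it.
const summarySchema = z.object({
  topics: z.array(z.string()).describe('Main topics discussed so far'),
  decisions: z.array(z.string()).describe('Conclusions the user has reached'),
  openQuestions: z.array(z.string()).describe('Unresolved questions to carry forward'),
});

// Compact long conversations: summarize the oldest turns, keep the recent ones verbatim.
export async function compactHistory(messages: UIMessage[]): Promise<UIMessage[]> {
  if (messages.length < 30) return messages;
  const head = messages.slice(0, 25);
  const tail = messages.slice(25);
  const { object } = await generateObject({
    model: anthropic('claude-sonnet-4-7'),
    schema: summarySchema,
    system: 'Summarize the conversation so far into the requested structure.',
    messages: convertToModelMessages(head),
  });
  // Replace the summarized turns with a single synthetic system message.
  const summary: UIMessage = {
    id: 'summary-1',
    role: 'system',
    parts: [{ type: 'text', text: `Conversation summary: ${JSON.stringify(object)}` }],
  };
  return [summary, ...tail];
}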

03 · useChat: optimistic UI, streaming, message state.

The client side is a single React component marked 'use client' that calls useChat() and renders the returned message array. The hook owns everything you would otherwise hand-build: streaming reassembly, optimistic user-message insertion before the server responds, abort signals on form re-submit, message ID generation, and tool-call part rendering.

What the hook returns

The shape is small enough to memorize. messages is the ordered array of UI messages, each with a stable id, a role of user or assistant, and a parts array carrying text, tool calls, and tool results. sendMessage({ text }) appends a user message and triggers the server call. status is one of ready / submitted / streaming / error. stop() aborts an in-flight stream. regenerate() retries the last assistant turn.

components/chat.tsx (minimal)

'use client';
import { useChat } from '@ai-sdk/react';
import { useState } from 'react';

export function Chat() {
  const { messages, sendMessage, status, stop } = useChat();
  const [input, setInput] = useState('');
  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>{m.role}: {m.parts.map(p => p.type === 'text' ? p.text : null)}</div>
      ))}
      <form onSubmit={(e) => { e.preventDefault(); sendMessage({ text: input }); setInput(''); }}>
        <input value={input} onChange={(e) => setInput(e.target.value)} disabled={status === 'streaming'} />
      </form>
    </div>
  );
}

Two implementation rules to lock in early. First, always render from the parts array, never from a presumed content string. The AI SDK 6 message shape is part-oriented because a single assistant turn can interleave text, tool calls, tool results, and reasoning traces — flattening to a string loses tool-call UX and breaks the moment you add a tool in Section 05. Second, never mutate messages directly; the hook owns the array and exposes setMessages only for advanced cases (initial-history hydration, server-side persistence). Treat it as read-only in everyday rendering.

status is the prop that drives the perceived-quality half of this UX. While the value is streaming, the input should be disabled and a streaming indicator should render on the assistant placeholder. When it flips to ready, re-enable the input and focus it. The reference useChat hook in @ai-sdk/react handles the optimistic user-message insertion before the network round trip completes — that is what eliminates the "did my message send?" perception gap you get from a naive fetch-and-render implementation.

04 · UI Scaffold: shadcn/ui chat — message list, input, scroll-anchor.

The minimum chat from Section 03 is ugly on purpose — clean primitives are easier to reason about. The production-shaped UI adds five pieces on top: a Card-based message list with role badges, an auto-scroll anchor pinned to the latest message, a multiline Textarea with Cmd+Enter submission, role-aware bubble styling, and a streaming skeleton on the assistant placeholder while status === 'streaming'.

shadcn/ui is the right component library for this surface because it generates source files into your repo rather than installing a black-box dependency — every variant of every bubble is a file you own and can edit. The five components installed in Section 01 cover the entire chat scaffold: Card for message wrappers, Button for the send action, Textarea for input, ScrollArea for the message column, and Input as a fallback.

The three message visual states

The grid below shows the three rendering states the message list cycles through. Get all three right and the chat reads as production polish. Skip the streaming state and the chat feels broken; skip the role badges and the conversation gets confusing past the third turn.

User message (role === 'user')
right-aligned · zinc-900 bg · white text
Rendered immediately on sendMessage — optimistic insert from useChat. Pin to the right column, no avatar, role badge optional. Markdown rendering off; this is user-authored text.

Assistant streaming (role === 'assistant' · pending)
left-aligned · pulsing cursor · skeleton
Renders while status === 'streaming'. Text appears token-by-token as parts flush; show a thin pulsing cursor at the tail. Tool-call parts render as inline skeleton cards mid-stream.

Assistant complete (role === 'assistant' · complete)
left-aligned · markdown · syntax highlight
Final assistant turn. Render markdown with a server-safe renderer (react-markdown + rehype-sanitize). Code blocks pass through shiki for syntax highlight. Tool-result cards land here as structured chips with citation links.
"Tool-call UX is half the perceived quality. Show what the agent is doing while it's doing it."— Internal playbook for production chat surfaces

Auto-scroll is the one piece teams get wrong most often. The wrong implementation re-scrolls on every token, which feels janky and breaks user-initiated up-scroll. The right implementation checks whether the user is currently within ~120 pixels of the bottom on each render; if so, snap to bottom on the next frame, if not, leave the scroll position alone. Use a ref on a sentinel <div /> at the bottom of the list and an IntersectionObserver; do not call scrollIntoView on every message append.
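A minimal sketch of that behavior as a reusable hook. The hook name, the 120px rootMargin, and the dep parameter are illustrative; attach the returned ref to an empty sentinel div rendered after the last message.

components/use-auto-scroll.ts

'use client';
import { useEffect, useRef } from 'react';

export function useAutoScroll<T>(dep: T) {
  const sentinelRef = useRef<HTMLDivElement>(null);
  const nearBottomRef = useRef(true);

  // Track whether the bottom sentinel is within ~120px of the viewport.
  useEffect(() => {
    const el = sentinelRef.current;
    if (!el) return;
    const observer = new IntersectionObserver(
      ([entry]) => { nearBottomRef.current = entry.isIntersecting; },
      { rootMargin: '120px' },
    );
    observer.observe(el);
    return () => observer.disconnect();
  }, []);

  // On new messages or parts, snap only if the user was already near the bottom.
  useEffect(() => {
    if (nearBottomRef.current) {
      sentinelRef.current?.scrollIntoView({ block: 'end' });
    }
  }, [dep]);

  return sentinelRef;
}

Usage in the chat component: const bottomRef = useAutoScroll(messages); then render <div ref={bottomRef} /> as the last child of the scroll container.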

Markdown rendering should be done with a sanitizing pipeline. react-markdown plus rehype-sanitize plus remark-gfm covers tables, task lists, and inline code. For syntax-highlighted code blocks, shiki renders server-side at request time — slower than a raw <pre> but produces the best-looking output and keeps the bundle small. Lazy-load shiki only when an assistant message contains a code fence.
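A sketch of that pipeline as a small wrapper component, assuming react-markdown, remark-gfm, and rehype-sanitize are added as dependencies (they are not part of the Section 01 install). The shiki code-block pass is omitted here.

components/assistant-markdown.tsx

'use client';
import ReactMarkdown from 'react-markdown';
import remarkGfm from 'remark-gfm';           // tables, task lists, inline code
import rehypeSanitize from 'rehype-sanitize'; // strip dangerous HTML from model output

export function AssistantMarkdown({ text }: { text: string }) {
  return (
    <ReactMarkdown remarkPlugins={[remarkGfm]} rehypePlugins={[rehypeSanitize]}>
      {text}
    </ReactMarkdown>
  );
}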

05 · Tool Calls: Zod schemas, server-side tools, client-rendered results.

Tools are where the chatbot stops being a Q&A surface and becomes an agentic interface. The AI SDK 6 implementation is the cleanest of any major framework: define each tool with a name, a Zod input schema, a description, and an execute function that runs server-side. The SDK handles routing the model's structured tool-call output to your function and streaming the result back as a part on the assistant message.

Two example tools

A web search tool and a URL fetch tool are the canonical pair — together they make the chatbot research-capable without any RAG infrastructure. Add them to the streamText call as a tools object. The model decides when to invoke either; the SDK serializes the calls and waits for results before continuing generation.

app/api/chat/route.ts — tools added

import { z } from 'zod';
import { tool, stepCountIs } from 'ai';

const result = streamText({
  model: anthropic('claude-sonnet-4-7'),
  system: '…',
  messages: convertToModelMessages(messages),
  stopWhen: stepCountIs(5), // cap the agent loop at five tool-call rounds per turn
  tools: {
    search: tool({
      description: 'Search the public web. Returns top 5 results.',
      inputSchema: z.object({ query: z.string().min(3) }),
      // searchProvider is a placeholder for your search backend, not part of the SDK.
      execute: async ({ query }) => await searchProvider(query),
    }),
    fetch_url: tool({
      description: 'Fetch a URL returned by search and return its text content for citation.',
      inputSchema: z.object({ url: z.string().url() }),
      execute: async ({ url }) => {
        const res = await fetch(url);
        return { url, text: (await res.text()).slice(0, 8000) }; // truncate to bound tokens
      },
    }),
  },
});

The pitfall to avoid

The most common bug shipping a tool-augmented chat for the first time is forgetting to render the tool parts on the client. The assistant message's parts array now contains tool-call and tool-result entries interleaved with text. If your render loop only matches p.type === 'text', the model appears to silently invoke tools and produce a final answer with no visible intermediate work. Always render every part type — text, tool-call (with a skeleton while pending), tool-result (with a structured card), and reasoning (folded by default).

stopWhen: stepCountIs(5) caps the agent loop at five tool-call rounds per assistant turn. The AI SDK's multi-step tool loop will keep invoking execute functions and feeding results back to the model until the model emits a final text response or the cap is hit. Five is a reasonable default for chat — high enough to handle "search, fetch one result, answer" chains, low enough to bound cost on a runaway loop.

For client-side tool rendering, the part shape is { type: 'tool-call', toolName, args, toolCallId } on the request side and { type: 'tool-result', toolName, result, toolCallId } on the response side. Render the request as a skeleton card ("Searching for <query>…"); render the result as a structured card with click-through. Cite the tool result in any follow-up text rendered by the model. This citation UX is what differentiates a real research agent from a chatbot that hallucinates URLs.
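A sketch of that render loop as a standalone component, using the part shapes described above. The plain markup stands in for the shadcn cards and the Section 04 markdown renderer, and the loose typing is an assumption: the exact part union depends on the tools you define and the SDK version installed.

components/message-parts.tsx

'use client';
import type { UIMessage } from 'ai';

export function MessageParts({ message }: { message: UIMessage }) {
  return (
    <>
      {(message.parts as any[]).map((part, i) => {
        switch (part.type) {
          case 'text':
            return <p key={i}>{part.text}</p>;
          case 'tool-call':
            // Pending invocation: skeleton card showing what the agent is doing.
            return (
              <div key={part.toolCallId} className="animate-pulse rounded border p-2 text-sm">
                Running {part.toolName}…
              </div>
            );
          case 'tool-result':
            // Completed invocation: structured card; add click-through citations here.
            return (
              <pre key={part.toolCallId} className="rounded border p-2 text-sm">
                {JSON.stringify(part.result, null, 2)}
              </pre>
            );
          case 'reasoning':
            // Reasoning traces stay folded by default.
            return (
              <details key={i}>
                <summary>Reasoning</summary>
                {part.text}
              </details>
            );
          default:
            return null;
        }
      })}
    </>
  );
}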

06 · Provider Switch: one env flip — Anthropic, OpenAI, Google, xAI.

The provider-adapter pattern is the AI SDK's quiet superpower. Every adapter exports a function that returns a model object implementing the same internal interface, which means swapping anthropic('claude-sonnet-4-7') for openai('gpt-5-5') is a one-line change. No re-shaping of messages, no per-provider stream parsing, no per-provider tool-call format.

The adapter pattern in the route handler

Drive provider selection from an environment variable so production deploys can flip providers without a redeploy. Implement a thin getModel() helper that reads process.env.AI_PROVIDER and returns the appropriate model. Cache the result on first call.

lib/model.ts

import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { google } from '@ai-sdk/google';
import { xai } from '@ai-sdk/xai';

// Cache the model object on first call so every request reuses the same instance.
let cached:
  | ReturnType<typeof anthropic>
  | ReturnType<typeof openai>
  | ReturnType<typeof google>
  | ReturnType<typeof xai>
  | undefined;

export function getModel() {
  if (cached) return cached;
  switch (process.env.AI_PROVIDER) {
    case 'openai': cached = openai('gpt-5-5'); break;
    case 'google': cached = google('gemini-3.1-pro'); break;
    case 'xai': cached = xai('grok-5'); break;
    default: cached = anthropic('claude-sonnet-4-7');
  }
  return cached;
}

The matrix below summarizes the four mainstream providers as of May 2026. Pick a default based on the workload's quality-cost target, then keep the switch in place so you can re-route when prices shift or a new generation lands. Provider lock-in is the silent killer of long-lived AI products; the SDK eliminates it.

Anthropic · Claude Sonnet 4.7 (pick as default)
Best general-purpose default for chat. Strongest tool-call reliability, cleanest streaming behavior, conservative refusal posture. Mid-priced ($3 input / $15 output per 1M). Default for this tutorial.

OpenAI · GPT-5.5 (pick for coding agents)
Strong on agentic coding and structured outputs. JSON mode is the most reliable in the field. Slightly higher list price than Sonnet, but generous batch discounts. Route here for code-heavy tools.

Google · Gemini 3.1 Pro (pick for long-context)
Price-leading on long-context. 1M-token window at meaningfully lower per-token cost than peers. Best when ingesting large documents or long chat histories. Tool-call shape is still maturing — test thoroughly.

xAI · Grok 5 (pick for latency or fallback)
Lowest-latency option for chat-style turns and best real-time web grounding. Good fallback when Anthropic or OpenAI are rate-limiting. Smaller ecosystem of evals — your own measurement matters more here.

One operational note on Anthropic specifically: the prompt caching feature is materially valuable for chatbots with a stable system prompt. Mark the system message as cacheable in the provider options, and subsequent requests within the five-minute cache window pay only 10% of the input-token cost for the cached prefix. For a chatbot that handles 1,000 sessions a day with a 500-token system prompt, this is the difference between a meaningful and a trivial monthly bill. The other providers have similar but less mature implementations; the pattern is worth implementing the moment you settle on a default provider.
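One way to mark the system prompt as cacheable, sketched against the cacheControl provider option exposed by the Anthropic adapter; treat the exact option shape as an assumption and verify it against the @ai-sdk/anthropic docs for your installed version. SYSTEM_PROMPT is the illustrative constant from the Section 02 sketch.

app/api/chat/route.ts (cached system prompt)

import { anthropic } from '@ai-sdk/anthropic';
import { streamText, convertToModelMessages, type UIMessage } from 'ai';
import { SYSTEM_PROMPT } from '@/lib/prompt';

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();
  const result = streamText({
    model: anthropic('claude-sonnet-4-7'),
    messages: [
      {
        role: 'system',
        content: SYSTEM_PROMPT, // the stable prompt prefix worth caching
        providerOptions: { anthropic: { cacheControl: { type: 'ephemeral' } } },
      },
      ...convertToModelMessages(messages),
    ],
  });
  return result.toUIMessageStreamResponse();
}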

For production deployments, do not hard-code AI_PROVIDER as a single value. Instead, configure a small routing layer that picks the provider per request based on the workload signature — a "default to Sonnet, route code questions to GPT-5.5, route long-document Q&A to Gemini" policy can run inside the route handler with no extra infrastructure, as sketched below. The Vercel AI Gateway is a managed alternative covered in Section 08; both work with the same adapter pattern.
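A sketch of that per-request policy, assuming a routeModel() helper the POST handler calls instead of getModel(). The heuristics (a code-fence regex and a rough character threshold standing in for token counting) are illustrative only.

lib/route-model.ts

import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { google } from '@ai-sdk/google';
import type { UIMessage } from 'ai';

export function routeModel(messages: UIMessage[]) {
  // Text of the latest user turn drives the code-vs-chat decision.
  const lastUserText =
    messages
      .filter((m) => m.role === 'user')
      .at(-1)
      ?.parts.map((p) => (p.type === 'text' ? p.text : ''))
      .join(' ') ?? '';

  // Total visible text is a rough proxy for context size.
  const totalChars = messages
    .flatMap((m) => m.parts)
    .reduce((n, p) => n + (p.type === 'text' ? p.text.length : 0), 0);

  // Long-document Q&A goes to the price-leading long-context model.
  if (totalChars > 200_000) return google('gemini-3.1-pro');
  // Code-heavy turns go to the strongest coding / structured-output model.
  if (/```|stack trace|refactor/i.test(lastUserText)) return openai('gpt-5-5');
  // Everything else stays on the general-purpose default.
  return anthropic('claude-sonnet-4-7');
}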

A pragmatic note on tool ergonomics. Keep tool schemas tight — three or four required arguments, no unbounded freeform fields, descriptions written for the model not for the human reader. The model uses the description text to decide when to invoke each tool, so phrase descriptions in terms of the trigger condition ("Search the public web for current information when the question cannot be answered from training data alone") rather than the mechanism ("Calls the Bing Search API"). The difference in tool-call accuracy is large and shows up immediately in production evals.

07 · Production Shape: rate limits, error boundaries, abuse protection.

The four pieces between "demo running on localhost" and "chatbot deployed in front of real users" are rate limiting, error boundaries on the client, abuse protection on the route, and an auth gate when the chatbot is anything other than fully public. Each is a 20-to-50-line change and the order below is the order to ship them in.

Upstash rate limit at the edge

Add Upstash Redis as a Vercel Marketplace integration (free tier covers small-app traffic), then wrap the route in a per-IP and per-user rate limit using @upstash/ratelimit. A sliding-window limit of 10 requests per 10 seconds and 100 per hour catches both burst-abuse and sustained-abuse vectors without inconveniencing legitimate users. The check runs before the expensive streamText call, so a blocked request costs effectively nothing.
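A sketch of those two windows as a helper the route calls before streamText, after adding @upstash/ratelimit and @upstash/redis as dependencies. The x-user-id header is a placeholder for whatever identifier your auth layer provides; anonymous traffic falls back to the forwarded IP.

lib/rate-limit.ts

import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

// Redis.fromEnv() reads UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN.
const burst = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '10 s'),
  prefix: 'chat:burst',
});

const sustained = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(100, '1 h'),
  prefix: 'chat:hour',
});

// Returns a 429 Response when either window is exhausted, otherwise null.
export async function enforceRateLimit(req: Request): Promise<Response | null> {
  const id =
    req.headers.get('x-user-id') ??
    req.headers.get('x-forwarded-for') ??
    'anonymous';
  const [short, long] = await Promise.all([burst.limit(id), sustained.limit(id)]);
  if (short.success && long.success) return null;
  return new Response('Too many requests', {
    status: 429,
    headers: { 'Retry-After': '10' },
  });
}

In the route handler, the check is two lines before the streamText call: const blocked = await enforceRateLimit(req); if (blocked) return blocked;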

The order in which these four pieces ship matters more than the specific implementations. Rate limiting first because it protects every other piece — without it, the first abuse incident is a multi-thousand-dollar provider bill. Error boundaries second because they prevent a single malformed message from breaking the entire chat surface for every other user in that session. Abuse protection third because by this point you have logging and rate limits to inform the policy. Auth last because every chat that needs persistent identity or per-user policy needs auth, but the chatbot can ship to a limited beta before that is in place. Skipping any of the four is a recoverable mistake; shipping them in the wrong order usually means re-shipping them under deadline pressure during an incident.

Rate limit · sliding window (10 req / 10 s burst, 100 req / hour sustained)
@upstash/ratelimit with sliding-window strategy. Per IP and per authenticated user ID, whichever is stricter. Block at 10 req / 10 s burst and 100 req / hour sustained. Return 429 with Retry-After.
Edge runtime · ~5 ms overhead

Error boundary · React ErrorBoundary
Wrap <Chat /> in a React ErrorBoundary that catches render-time crashes from malformed parts arrays or markdown failures. Show a 'reload conversation' button. Log to Sentry or your observability layer.
Client-side recovery

Auth gate · middleware for per-user identification
Even for public chats, identify users via a signed cookie. Required for per-user rate limiting, conversation history, and abuse forensics. Supabase Auth, Clerk, or a homemade cookie all work — pick once and standardize.
Required for per-user limits

Client-side error boundaries are the second piece. The AI SDK's client is robust, but the wider chat surface — markdown rendering, syntax highlight, custom tool-result cards — can crash on malformed input. Wrap <Chat /> in a React ErrorBoundary that catches render-time crashes and offers a "reload conversation" affordance. Pair with an onError callback on useChat to surface server-side failures (rate-limit blocks, provider errors, timeouts) as inline toast messages, not as silent dead-end states.
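A minimal version of that boundary, sketched as a class component (error boundaries still require one). The fallback copy and logging target are placeholders; pair it with the onError callback on useChat inside the Chat component, as described above.

components/chat-error-boundary.tsx

'use client';
import { Component, type ReactNode } from 'react';

export class ChatErrorBoundary extends Component<
  { children: ReactNode },
  { hasError: boolean }
> {
  state = { hasError: false };

  static getDerivedStateFromError() {
    return { hasError: true };
  }

  componentDidCatch(error: unknown) {
    console.error('chat render crash', error); // swap for Sentry or your observability layer
  }

  render() {
    if (this.state.hasError) {
      return (
        <button onClick={() => this.setState({ hasError: false })}>
          Something went wrong. Reload conversation
        </button>
      );
    }
    return this.props.children;
  }
}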

Abuse protection beyond rate limiting is mostly about input filtering. The simplest effective layer is a filter that rejects requests matching prompt-injection patterns or known jailbreaks before they reach the model. Keep the filter narrow — overly aggressive filtering frustrates legitimate users — and log every block so you can tune the rule set. For higher-stakes deployments, layer a moderation API call (OpenAI's free moderation endpoint or Anthropic's classifier) before streamText on every inbound user message.
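A sketch of that moderation layer as a plain fetch against OpenAI's moderation endpoint, so no extra SDK is required. Failing open on errors and the truncated log line are design assumptions to tune for your risk tolerance; call it on the latest user message before streamText.

lib/moderate.ts

// Returns true when the moderation endpoint flags the text.
export async function isFlagged(text: string): Promise<boolean> {
  try {
    const res = await fetch('https://api.openai.com/v1/moderations', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ input: text }),
    });
    if (!res.ok) return false; // fail open: a moderation outage should not block every user
    const data = await res.json();
    const flagged = Boolean(data.results?.[0]?.flagged);
    if (flagged) console.warn('moderation block', { snippet: text.slice(0, 80) });
    return flagged;
  } catch {
    return false; // fail open on network errors as well
  }
}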

For teams interested in the cost-per-token side of the picture before committing to a provider default, our LLM API pricing index for Q2 2026 tracks the input and output rates across the major providers, normalizes batch and caching discounts, and projects monthly spend at realistic chat volumes. Re-check the index quarterly; the cost-quality frontier is the single most volatile variable in this stack and the provider-adapter pattern only pays off if you are actually willing to switch when the math changes.

Authentication is the fourth piece and the one most often deferred. Even for fully public chats, identify the user via a signed cookie so per-user rate limits are meaningful and so conversation history can be persisted later. Supabase Auth (our preferred default) and Clerk both integrate cleanly with the App Router. The session is read inside the route handler and threaded into the rate-limit key.

08 · Deploy: Vercel — one command, env vars, function regions.

The deploy step is the easiest of the eight. Run vercel deploy --prod from the repo root after linking the project. Vercel detects Next.js 16, builds with Turbopack, and ships the route handler as a Node.js Serverless Function. The first prod URL is live in under three minutes on an unprimed cache.

The env-var checklist

Before the first prod deploy, set the provider keys and any integration tokens in the Vercel project's Environment Variables settings — or better, pull them locally with vercel env pull after configuring the production environment. The minimum set is ANTHROPIC_API_KEY (or the equivalent for your chosen default), UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN for rate limiting, and AI_PROVIDER if you want to override the default.

Region selection matters for latency. By default, the route handler runs in the deployment's primary region. For chat that calls Anthropic or OpenAI endpoints, US-East (iad1) is the lowest-latency choice today — both providers route from US-East datacenters. Pin the function region in vercel.json if your overall deployment lives elsewhere. The P50 first-token latency in the stats card above reflects this configuration.
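A minimal vercel.json for that pin, assuming the top-level regions key applies to the project's Serverless Functions; multi-region function deployment is gated to paid plans, so a single region is the common case.

vercel.json

{
  "regions": ["iad1"]
}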

Optional: Vercel AI Gateway

For production deployments running multiple providers, the Vercel AI Gateway is worth considering as a managed replacement for the routing layer in Section 06. It exposes a single endpoint that routes to underlying providers based on configured policies, with built-in caching, retries, and observability. The adapter pattern still works; you point the SDK at the Gateway base URL instead of the vendor's API and keep the rest of the code unchanged. Costs scale with usage; evaluate against rolling your own.

One-command deploy

vercel link
vercel env pull .env.production.local
vercel deploy --prod

Three commands. The first links the local repo to a Vercel project (one-time). The second pulls the production env vars locally — useful for parity testing. The third deploys to production and prints the URL. Subsequent deploys land via git push on the configured branch.

A second deploy-time consideration: streaming responses behave differently on different hosting platforms. Vercel's Serverless Function runtime supports streaming natively via the platform's edge-aware response transport. On other platforms — particularly self-hosted Node containers behind older reverse proxies — you may need to disable response buffering at the proxy layer (Nginx proxy_buffering off;, Cloudflare's "Buffering" set to off) to see token-by-token output reach the client. Test the streaming behavior end-to-end on the actual deployment target before assuming it works.

Observability is the under-discussed part of running this in production. The minimum useful signal is structured logging of every assistant turn — provider, model, total input tokens, total output tokens, total tool-call count, total latency, cost estimate. Pipe to your existing observability layer (Datadog, Honeycomb, Vercel's built-in analytics, or a custom Postgres table) and alert on latency percentile shifts and on unusual token-consumption patterns. The AI SDK's onFinish callback on streamText is the right hook for this — it fires once per turn with the full usage breakdown.
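A sketch of that hook inside the existing POST handler. Field names on the usage object differ across SDK versions (inputTokens/outputTokens shown here), so check the installed types; the cost constants are the Sonnet list prices from the Section 06 matrix and are illustrative.

app/api/chat/route.ts (inside POST, wrapping the existing streamText call)

const started = Date.now();
const result = streamText({
  model: getModel(),
  messages: convertToModelMessages(messages),
  onFinish: ({ usage, finishReason }) => {
    const inputTokens = usage.inputTokens ?? 0;
    const outputTokens = usage.outputTokens ?? 0;
    console.log(
      JSON.stringify({
        event: 'chat_turn',
        provider: process.env.AI_PROVIDER ?? 'anthropic',
        inputTokens,
        outputTokens,
        finishReason,
        latencyMs: Date.now() - started,
        // $3 input / $15 output per 1M tokens, per the provider matrix above.
        estCostUsd: (inputTokens * 3 + outputTokens * 15) / 1_000_000,
      }),
    );
  },
});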

What this build becomes when you keep going: add retrieval-augmented generation against a pgvector store for grounded answers (we cover that in our self-hosted RAG tutorial); add persistent multi-turn memory in a conversations table; expose the same agent as an MCP server for desktop assistants per our MCP server tutorial; or wire it to a workplace chat surface like Slack via the event subscriptions pattern. The chat surface in this tutorial is the foundation; every subsequent agentic capability composes on top of it.

One more deploy-time note worth internalizing: production chat surfaces should ship behind a feature flag for the first week of real traffic. Vercel Edge Config or a simple Postgres row both work — the goal is to be able to disable the chat without a redeploy if a provider has an outage, a cost-runaway bug ships, or a tool integration breaks. The flag check goes at the top of the route handler and returns a friendly 503 with a "temporarily unavailable" message if disabled. Costs nothing; saves the one bad afternoon every production chatbot eventually has.
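A sketch of that kill switch using Vercel Edge Config, assuming the @vercel/edge-config package and a chat_enabled flag key; a Postgres row read slots into the same two lines.

app/api/chat/route.ts (top of POST)

import { get } from '@vercel/edge-config';

export async function POST(req: Request) {
  // Feature flag check: disable the chat without a redeploy during an incident.
  const enabled = await get<boolean>('chat_enabled');
  if (enabled === false) {
    return new Response(
      JSON.stringify({ error: 'Chat is temporarily unavailable.' }),
      { status: 503, headers: { 'Content-Type': 'application/json', 'Retry-After': '300' } },
    );
  }
  // ...rate limit, moderation, and streamText continue as before.
}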

Conclusion

Chatbots are the visible tip of the agentic stack — and the cheapest place to learn it.

Eight steps, roughly 600 lines of TSX, and a chatbot is live on Vercel with streaming, tool calls, multi-provider routing, shadcn UI, rate limits, and error boundaries. The full app costs cents per day to run during development and a few dollars per thousand chats at production scale on Sonnet. The shape is the canonical reference for any team adopting the AI SDK as their default agentic-interface stack.

What this becomes when you keep building: add retrieval against a vector store for grounded answers; persist conversations in a conversations table for cross-session memory; add authenticated accounts so the chatbot can act on a user's behalf; layer in agent-style multi-step planning via stopWhen increases and explicit tool sequencing. None of these changes require revisiting the foundations laid in this tutorial — the route handler stays a route handler, the useChat hook stays the client surface, and the adapter pattern keeps the provider question open.

The next milestones for a production rollout are observability, cost controls, and product-side polish. Wire OpenTelemetry into the route handler for token-level metrics; cap monthly spend per user via a Redis counter; add streaming-aware retry logic for transient provider errors; ship a conversation-export feature so users can save what matters. None of that is exotic; all of it compounds. The chatbot you ship today is the scaffold for the agent you ship in three months.

Ship chat into your product

Modern chat is table stakes — get it right early and the rest of the agentic UI compounds on top.

Our agentic engineering team designs and ships chat interfaces — provider-agnostic, multi-modal, tool-augmented — for product teams replacing search or building net-new AI features.

Free consultation · Expert guidance · Tailored solutions
What we ship

Chat engagements

  • Production chat interfaces on Next.js / Remix / Astro
  • Provider-agnostic stack with cost-quality routing
  • RAG-grounded answers with citation UX
  • Tool-augmented agents with skeleton states and replay
  • Auth, rate limits, abuse protection, observability
FAQ · AI chatbot build

The questions teams ask before shipping their first production chatbot.

Does this tutorial require Next.js 16, or will it run on Next.js 15?

Next.js 16 ships Turbopack as the default bundler with materially faster HMR on chat-style UIs, native support for the Cache Components model (PPR plus the use-cache directive), and improved streaming Server Component behavior that benefits the message list rendering in this tutorial. Functionally the chatbot will run on App Router 15 with minor adjustments, but the dev-loop is noticeably tighter on 16. If you are starting fresh in mid-2026, start on 16 — the migration cost from 15 is small but non-zero, and you will spend the difference in the first afternoon of incremental builds saved.