Development · Tutorial

App manifest · Bolt JS receiver · lazy-listener pattern · Block Kit · idempotent retries on Vercel

Build an AI Slack Bot with Event Subscriptions: 2026

From app manifest to a deployed Vercel Function — a workspace-scoped Slack bot that streams Claude responses into threads, with Block Kit rendering and the lazy-listener pattern that beats Slack's three-second timeout cleanly.

Digital Applied Team
Senior strategists · Published May 2, 2026
Read time: 13 min · Sources: Slack API + Bolt JS docs
  • Lines of TS (full bot): ~400 (manifest + handler + Block Kit)
  • P50 ack-to-first-token: 1.4 s (Claude streaming via Vercel)
  • Update rate (batched): 1 update / 700 ms (chat.update cadence)
  • Cost per 1k messages: $2–4 (Anthropic + Vercel function time)

Building an AI Slack bot in 2026 is a production engineering exercise, not a weekend hack. The Slack platform is mature, the constraints are sharp, and the gap between a demo bot and one your team actually relies on comes down to four design choices: how you acknowledge events, how you stream model output, how you scope context, and how you handle retries.

This tutorial walks the complete production path. You will start with an app manifest, wire OAuth and event subscriptions, set up Slack Bolt JS on a Vercel Function, implement the lazy-listener pattern that defeats Slack's three-second timeout, stream Anthropic Claude responses into threads via chat.update, render rich Block Kit messages, and ship idempotent retry handling. The architecture is the same one we use for client engagements — opinionated, but production-honest.

What this guide covers, in order: manifest and scopes, install flow, the Bolt receiver, the ack-fast / work-slow split, streaming mechanics, Block Kit composition, per-thread session context, and the Vercel deploy with signature verification and idempotency. Every section is paired with the trade-off you should know before you ship it.

Key takeaways
  1. Acknowledge in under 3 seconds. Always. Slack's 3-second timeout is non-negotiable. The lazy-listener pattern — return 200 immediately, defer the model call via Vercel's waitUntil — is the only durable answer for AI workloads where the first token can take seconds.
  2. Streaming to Slack is post-then-update, not real streaming. Slack has no native streaming. You chat.postMessage once, then chat.update on a batched cadence (we land on 600 to 800 ms). Faster looks janky, slower feels lifeless, and the rate-limit ceiling caps the upper bound either way.
  3. thread_ts is your session id — use it. Per-thread memory is the right unit of context. Per-channel is too broad, per-user is too narrow. Keying conversation history off thread_ts in Redis or Postgres gives every reply the right context without leaking across unrelated discussions.
  4. Idempotency on retries is mandatory. Slack retries any event that doesn't get a 200 within three seconds, and again on internal failures. Without an idempotency key (event_id is the natural choice), one slow response becomes three duplicate replies.
  5. Block Kit lifts perceived intelligence. Tool-result cards, action buttons, expandable sections — plain markdown reads like a script; Block Kit reads like a product. The composition cost is small and the UX delta is large, especially for bots that do anything beyond echo Q&A.

01 · Manifest · App manifest — one YAML file for the whole config.

Every modern Slack app starts with a manifest. It is a single YAML document that declares display info, OAuth scopes, event subscriptions, the bot user, and every interactive surface the app will expose. Authoring the manifest first — before any code — is the right discipline because Slack uses the manifest to provision the app, and a tight manifest means a small, auditable permission surface.

The four moving parts you will tune the most are display_information (the name, description, and background color visible to admins), oauth_config.scopes.bot (the capabilities the bot token will hold), settings.event_subscriptions (the events Slack will POST to your endpoint), and features.bot_user (the bot identity). The minimum viable AI bot needs four scopes: chat:write to send messages, channels:history to read public channel messages, app_mentions:read to receive mentions, and im:history to read direct messages.

Paste this into App Manifest at api.slack.com/apps when creating the app:

display_information:
  name: Studio Assistant
  description: AI assistant for our team — mention me anywhere.
  background_color: "#0a0a0b"
features:
  bot_user:
    display_name: Studio
    always_online: true
oauth_config:
  scopes:
    bot:
      - chat:write
      - chat:write.public
      - channels:history
      - groups:history
      - im:history
      - im:read
      - im:write
      - mpim:history
      - app_mentions:read
      - users:read
settings:
  event_subscriptions:
    request_url: https://your-app.vercel.app/api/slack/events
    bot_events:
      - app_mention
      - message.im
      - message.channels
  interactivity:
    is_enabled: true
    request_url: https://your-app.vercel.app/api/slack/interactive
  org_deploy_enabled: false
  socket_mode_enabled: false
  token_rotation_enabled: false

Two notes worth pinning. First, chat:write.public lets the bot post in public channels it has not been invited to — useful for app-mention flows, but a real permission expansion; omit it if you want strict allowlisting. Second, you can subscribe to either message.channels (every message in joined channels) or just app_mention (only direct mentions). Start with mentions; broaden later only if your workflow genuinely needs it. Every additional event is more cost and more attack surface.

Manifest workflow
Keep the manifest in version control under slack-manifest.yaml. When you change scopes — and you will — copy the updated manifest into Slack's App Manifest tab, then reinstall the app to refresh the bot token. Without reinstalling, the new scopes are not granted on the existing token.

02 · OAuth · Install flow, workspace scopes, token rotation.

Slack authentication uses standard OAuth 2.0 — the user grants permissions, Slack redirects to your callback with a code, and you exchange that code at oauth.v2.access for two possible tokens. The bot token (xoxb-…) is scoped to the bot user and is what your handler uses for almost every API call. The user token (xoxp-…) acts as the installing user and is only required for specific actions on their behalf — most AI bots never need it.

The storage decision is the one that bites teams later. A bot deployed for one team can hard-code a single bot token in an environment variable. A bot distributed to many workspaces needs a keyed lookup — typically a row per workspace in your database, keyed on team.id, holding the bot token, the bot user id, and (if you opted in) the refresh token plus the token expiry. Get this shape right on day one; migrating from one model to the other later is painful.
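To make the keyed-store shape concrete, here is a minimal sketch of the callback route and the code exchange, assuming @slack/web-api. The saveInstallation helper, the environment variable names, and the redirect URI are placeholders for your own persistence layer and deployment URL.

// app/api/slack/oauth/callback/route.ts — sketch only; saveInstallation is a
// hypothetical helper standing in for your Postgres or Redis write.
import { WebClient } from "@slack/web-api";

export async function GET(req: Request) {
  const code = new URL(req.url).searchParams.get("code");
  if (!code) return new Response("Missing code", { status: 400 });

  // Exchange the temporary code for tokens; no token is needed on this client.
  const result = await new WebClient().oauth.v2.access({
    client_id: process.env.SLACK_CLIENT_ID!,
    client_secret: process.env.SLACK_CLIENT_SECRET!,
    code,
    redirect_uri: "https://your-app.vercel.app/api/slack/oauth/callback",
  });

  // One row per workspace, keyed on team.id.
  await saveInstallation({
    teamId: result.team?.id ?? "",
    botToken: result.access_token ?? "",   // xoxb-…
    botUserId: result.bot_user_id ?? "",
    refreshToken: result.refresh_token,    // present only with token rotation enabled
    expiresAt: result.expires_in ? Date.now() + result.expires_in * 1000 : null,
  });

  return new Response("Installed. You can close this tab.");
}

// Hypothetical persistence helper; implement against your own store.
declare function saveInstallation(row: {
  teamId: string; botToken: string; botUserId: string;
  refreshToken?: string; expiresAt: number | null;
}): Promise<void>;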

Single workspace · env-var token (fit: internal-only bots)
One workspace, one bot token in SLACK_BOT_TOKEN. No database for auth state. Token rotation off. Right for internal-only deployments and the first iteration of any project — you can move later without breaking anything because the manifest stays identical.

Many workspaces · keyed store on team.id (fit: Marketplace distribution)
Distribution-ready storage — a table holds team_id, bot_token, bot_user_id, refresh_token, expires_at. Look up on every event using the team_id in the payload. Bolt's InstallationStore interface formalizes this; back it with Postgres or Redis.

Token rotation is opt-in but recommended for distributed apps. With rotation enabled, bot tokens expire every twelve hours and you refresh them via oauth.v2.access using the stored refresh token. The payoff is a much smaller blast radius if a token leaks; the cost is a strict refresh loop you cannot skip. For single-workspace deployments the simpler non-rotating mode is fine — the token still has narrow scopes and rotates manually whenever you reinstall.
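If you do enable rotation, the refresh call itself is small. A hedged sketch, again assuming @slack/web-api; persisting the returned pair next to the workspace row is left to your store:

// Refresh loop body: exchange the stored refresh token for a new bot token.
import { WebClient } from "@slack/web-api";

async function refreshBotToken(refreshToken: string) {
  const res = await new WebClient().oauth.v2.access({
    client_id: process.env.SLACK_CLIENT_ID!,
    client_secret: process.env.SLACK_CLIENT_SECRET!,
    grant_type: "refresh_token",
    refresh_token: refreshToken,
  });
  // Persist the new access token, the new refresh token, and the new expiry.
  return {
    botToken: res.access_token ?? "",
    refreshToken: res.refresh_token ?? "",
    expiresAt: Date.now() + (res.expires_in ?? 0) * 1000,
  };
}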

Whichever path you pick, never log the token, never put it in a client-side bundle, and verify on every Slack request that the originating workspace matches a row in your store. The cleanest implementation puts the OAuth callback on its own route — /api/slack/oauth/callback — separate from the event handler.

03 · Bolt Setup · Slack Bolt JS — receiver, listener, ack.

Slack Bolt JS is the official Slack SDK for Node.js. It hides the awkward bits — request signing, payload parsing, event routing, retry interpretation — and gives you a tight surface: app.event(...), app.command(...), app.action(...), app.shortcut(...). On a Vercel Function, the right integration uses Bolt's receiver abstraction. The AwsLambdaReceiver is the closest match because both Lambda and a Vercel serverless function expose a request/response handler rather than a long-running Express app.

The handler boilerplate is small. The critical behaviour is the acknowledgement: for event callbacks Bolt handles the ack itself, and on a serverless receiver the 200 only goes back once your listener returns — so the listener's job is to return fast. Anything you do before returning runs against the three-second clock. Anything after happens on borrowed time and only completes if the function execution lives long enough.

// app/api/slack/events/route.ts
import { App, AwsLambdaReceiver } from "@slack/bolt";
import { waitUntil } from "@vercel/functions";

const receiver = new AwsLambdaReceiver({
  signingSecret: process.env.SLACK_SIGNING_SECRET!,
});

const app = new App({
  token: process.env.SLACK_BOT_TOKEN,
  receiver,
});

app.event("app_mention", async ({ event, client }) => {
  // Bolt acknowledges event callbacks itself; the 200 goes out once this
  // listener returns, so schedule the slow work and return immediately.
  waitUntil(handleMention(event, client));
});

async function handleMention(event: any, client: any) {
  // Claude call, chat.postMessage + chat.update streaming, etc.
}

export async function POST(req: Request) {
  const handler = await receiver.start();
  const body = await req.text();
  const headers = Object.fromEntries(req.headers);
  const result = await handler(
    { body, headers, isBase64Encoded: false, httpMethod: "POST" } as any,
    {} as any
  );
  return new Response(result.body, {
    status: result.statusCode,
    headers: result.headers as any,
  });
}
The ack-timeout failure
The most common production bug we see in client Slack bots is running the model call before the acknowledgement. The handler awaits the Anthropic SDK inside the listener, the response takes 4 seconds, Slack times out at 3 seconds, retries, and the bot replies twice. The symptom looks like a duplication bug; the cause is the ack contract. Always ack first, defer second.

04 · Lazy Listener · Ack fast, work slow.

The lazy-listener pattern is the Slack-recommended shape for any handler that does meaningful work after ack. It is three lines of code conceptually: ack, defer, work. On Vercel, the defer mechanism is waitUntil from @vercel/functions, which extends the function's execution lifetime past the response so background work can complete without blocking the HTTP return.

The reason this matters for AI bots specifically is the latency shape. Anthropic Claude's time-to-first-token on streaming requests typically sits in the 800–2,000 ms range. A model that occasionally takes longer — tool use, long-context, busy region — will blow the 3-second budget. Ack-first guarantees Slack stops counting before the model even returns its first chunk.

For higher-volume bots, the next step beyond waitUntil is queue-based dispatch. The ack handler enqueues the event into Inngest, QStash, or a Vercel Queue; a separate consumer processes the job at its own pace. The trade-off is one more piece of infrastructure for unbounded reliability — if the function dies mid-stream, the queue retries the job without a duplicate ack firing. For most use-cases waitUntil is enough; switch to a queue once your traffic exceeds a few thousand messages a day or your model calls regularly exceed 60 seconds.
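As a sketch of the queue variant, here is what the ack handler looks like with QStash as the transport, reusing the Bolt app from section 03. Inngest or any other queue follows the same shape; the QSTASH_TOKEN variable and the worker URL are assumptions for illustration.

// Ack handler enqueues; a separate /api/slack/worker route does the model call.
import { Client } from "@upstash/qstash";

const qstash = new Client({ token: process.env.QSTASH_TOKEN! });

app.event("app_mention", async ({ event }) => {
  // Returning quickly here is the ack; the queue owns delivery and retries from now on.
  await qstash.publishJSON({
    url: "https://your-app.vercel.app/api/slack/worker", // hypothetical consumer route
    body: { event },
  });
});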

Minimal · direct ack + handler · ack() → await handler() (fit: echo bots only)
Synchronous handler completes before the HTTP return. Only viable if the entire work — model call, message posts, DB writes — fits inside 3 seconds. Not realistic for streaming AI.

Default · lazy listener via waitUntil · ack() → waitUntil(work) (fit: production default)
Function returns 200 instantly; the slow work continues on extended execution time. Right shape for the vast majority of AI bots — beats the 3-second timeout cleanly without extra infrastructure.

High volume · queue dispatch · ack() → enqueue(job) (fit: thousands of messages a day)
Ack pushes a job into Inngest / QStash / Vercel Queue; a separate consumer handles the model call and message posts. Unbounded reliability, retries handled by the queue, costs one extra hop.
"Ack-first is not a performance trick — it is a contract. Slack assumes you have heard them within three seconds; everything else is your problem to solve."— A Slack platform engineer, paraphrased from the Bolt docs

05 · Streaming to Slack · chat.postMessage + chat.update — the bot's streaming hack.

Slack does not have a native streaming API. You cannot open a persistent connection from your function to a Slack message and push token deltas. The workaround the industry has converged on is post-then-update: send the initial message with chat.postMessage, capture the returned ts (timestamp, which doubles as message id), and then call chat.update on a batched cadence as new tokens arrive. The user perceives streaming; the API call pattern is actually a series of edits.

The cadence is the design knob. Update too fast and Slack rate limits will throttle you (the rate limit for chat.update sits at about one call per second per channel on Tier 3, with bursts permitted). Update too slowly and the perceived intelligence drops — text appears in lumpy chunks that feel less alive than a polished UI. The sweet spot in our production bots sits at 600–800 ms per update, which streams faster than reading speed without colliding with the rate limiter.

Update cadence vs perceived streaming smoothness (source: Digital Applied internal benchmarks across client Slack deployments):
  • 200 ms cadence (5 updates/s): hits the rate limit, throttled silently
  • 400 ms cadence (2.5 updates/s): right at the rate-limit ceiling — fragile
  • 700 ms cadence (1.4 updates/s): production sweet spot — smooth & safe
  • 1,500 ms cadence (0.7 updates/s): feels lumpy; users notice the pause
  • 3,000 ms cadence (0.3 updates/s): reads as "thinking" but loses the streaming feel

The implementation shape is: buffer Anthropic streaming chunks locally, flush via chat.update on a timer or when the buffer exceeds N characters, and call a final update on stream completion with the full Block Kit payload (the prior updates can be plain text; the last one upgrades the message to rich blocks). Track the time of the last update so you can defer a flush that would otherwise fall inside the rate-limit window.

One more practical note: chat.update takes either text or a blocks array. During streaming send plain text — it's cheaper to construct and reads naturally as a progressive reveal. Only on the final update do you replace the message with the polished Block Kit version. We cover the Block Kit composition in the next section.
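Pulling those pieces together, here is a condensed sketch of the loop, assuming the Anthropic SDK's streaming events and Bolt's WebClient. The model id is a placeholder, and the 700 ms cadence is the knob you would tune.

// Post once, then edit the same message on a batched cadence as tokens arrive.
import Anthropic from "@anthropic-ai/sdk";
import type { WebClient } from "@slack/web-api";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function streamReply(client: WebClient, channel: string, threadTs: string, prompt: string) {
  const posted = await client.chat.postMessage({ channel, thread_ts: threadTs, text: "…" });
  const ts = posted.ts!; // the message id we keep editing

  let buffer = "";
  let lastFlush = 0;
  const CADENCE_MS = 700;

  const stream = await anthropic.messages.create({
    model: "claude-sonnet-4-5", // placeholder; use whichever Claude model you target
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  for await (const event of stream) {
    if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
      buffer += event.delta.text;
      if (Date.now() - lastFlush >= CADENCE_MS) {
        lastFlush = Date.now();
        await client.chat.update({ channel, ts, text: buffer }); // plain text while streaming
      }
    }
  }

  // Final update: full text; in production this is where the Block Kit upgrade goes.
  await client.chat.update({ channel, ts, text: buffer });
}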

06 · Block Kit · Markdown out, rich blocks in.

Slack's default rendering — markdown with Slack's dialect of asterisks, backticks, and angle-bracket links — is fine for casual text. The moment your bot does anything structured — surfaces a tool result, asks the user to confirm, renders a row of options — plain markdown reads like a script. Block Kit is Slack's structured-message framework: a JSON array of blocks describing sections, dividers, fields, buttons, images, and context lines.

The right authoring workflow is to compose blocks in the Block Kit Builder at api.slack.com/block-kit-builder, copy the JSON, and translate it into a typed factory function in your code. The builder gives you a live preview and the ability to share design iterations with non-engineers. The factory function in code is where you plug in dynamic values from your Claude response — tool name, tool result summary, action buttons, expandable detail sections.

Plain text · streaming reveal · text only, no blocks (fit: during the stream)
What you post during the chat.update streaming loop. Renders Slack's markdown dialect — *bold*, _italic_, `code`. Cheap to build, fast to update, perfect for the in-flight token stream.

Tool result · Block Kit card · section + fields + context (fit: tool calls)
Final message replaces the streaming text with a structured card — title, key/value fields, source attribution, optional thumbnail. Used for tool-result rendering where the bot called a function and is presenting structured output.

Interactive · action buttons & modals · actions block + button elements (fit: the user chooses a path)
Buttons trigger block_actions events posted to your /interactive endpoint. Use sparingly — they only earn their keep when the user genuinely needs to pick a path. Modals (views.open) handle multi-field input gracefully when text input doesn't.

A few composition rules that consistently improve perceived quality. Keep each section block under three short lines — Block Kit collapses long sections awkwardly. Use context blocks for source attribution; the smaller type signals provenance without competing with primary content. Prefer actions blocks with two buttons over modals for binary choices; reserve modals for genuine multi-field input. Use the image block sparingly — it loads externally and slows the message render.

For multi-step tool calls, the right pattern is one section block per step, dividers between them, and a final context block summarizing total tokens, duration, and any external citations. That layout reads as a transparent agent log rather than a wall of model output — important for any bot that does multi-step work on behalf of the user.
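A minimal factory for that layout might look like the following sketch; the step and meta shapes are illustrative, not a fixed schema.

// One section per step, a divider between steps, and a closing context line.
import type { KnownBlock } from "@slack/types";

type ToolStep = { title: string; summary: string };

function toolResultBlocks(steps: ToolStep[], meta: { tokens: number; durationMs: number }): KnownBlock[] {
  const blocks: KnownBlock[] = [];
  steps.forEach((step, i) => {
    blocks.push({ type: "section", text: { type: "mrkdwn", text: `*${step.title}*\n${step.summary}` } });
    if (i < steps.length - 1) blocks.push({ type: "divider" });
  });
  blocks.push({
    type: "context",
    elements: [{ type: "mrkdwn", text: `${meta.tokens} tokens · ${(meta.durationMs / 1000).toFixed(1)}s` }],
  });
  return blocks;
}

// Attach on the final chat.update; keep `text` as the notification fallback:
// await client.chat.update({ channel, ts, text: summary, blocks: toolResultBlocks(steps, meta) });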

07 · Threads · Per-thread context, per-channel scoping.

Slack's thread model maps unusually cleanly onto LLM conversation memory. Every thread is identified by thread_ts — the timestamp of the parent message — and every reply in that thread carries the same thread_ts on its payload. Use thread_ts as your session id: every thread is one conversation, with its own short-term context window, independent of every other thread in the channel.

Per-thread is the right granularity. Per-channel mixes unrelated discussions into a single context — your bot ends up confusing yesterday's deploy thread with today's marketing campaign. Per-user is too narrow — the same user has many simultaneous conversations across threads, and treating them as one history degrades quality. Thread-keyed memory is what your users actually expect from the bot.

// Per-thread session in Redis with a 7-day TTL.
import { Redis } from "@upstash/redis";
const redis = Redis.fromEnv();

type Turn = { role: "user" | "assistant"; content: string };

async function loadThread(threadTs: string): Promise<Turn[]> {
  return (await redis.get<Turn[]>('thread:' + threadTs)) ?? [];
}

async function appendTurn(threadTs: string, turn: Turn) {
  const history = await loadThread(threadTs);
  history.push(turn);
  // Cap context at 40 turns; let downstream summarize if needed.
  const trimmed = history.slice(-40);
  await redis.set('thread:' + threadTs, trimmed, { ex: 60 * 60 * 24 * 7 });
}

// Channel allowlist — refuse to engage outside approved spaces.
const ALLOWED_CHANNELS = new Set(process.env.ALLOWED_CHANNELS?.split(",") ?? []);

function inScope(channelId: string) {
  return ALLOWED_CHANNELS.size === 0 || ALLOWED_CHANNELS.has(channelId);
}

Channel allowlisting is the other half of the scoping story. Most internal bots should not respond in every channel they happen to be invited to. Keep an ALLOWED_CHANNELS environment variable (or a database table for richer rules), check it before invoking the model, and return early if the channel is out of scope. The cost is one map lookup; the benefit is a hard stop on accidental responses in channels where the bot has not been welcomed.

For DMs and group DMs the rule inverts — a user who messages the bot directly is opted in by definition. Skip the channel allowlist for message.im and message.mpim events. Mentions in public channels (app_mention) should always pass the allowlist check, even though Slack will deliver the event regardless.
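Put together, the scope check before invoking the model reduces to a few lines. The channel_type field comes off the message payload, and inScope is the helper from the snippet above; the event shape here is trimmed for illustration.

// DMs and group DMs skip the allowlist; everything else must pass it.
type IncomingEvent = { channel: string; channel_type?: string };

function shouldRespond(event: IncomingEvent): boolean {
  if (event.channel_type === "im" || event.channel_type === "mpim") return true; // opted in
  return inScope(event.channel); // public-channel mentions and messages
}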

08 · Deploy · Vercel Function, verification, retry idempotency.

Deployment on Vercel is a regular Next.js route handler at app/api/slack/events/route.ts. The function runs on the Node.js runtime (Bolt JS uses Node-only APIs; edge is not an option here). Set three environment variables in the Vercel project — SLACK_BOT_TOKEN, SLACK_SIGNING_SECRET, and ANTHROPIC_API_KEY — and point the manifest's request_url at the production deployment URL.

Signature verification is non-negotiable. Slack signs every event POST with HMAC SHA-256 using your signing secret and includes the signature in the X-Slack-Signature header and the timestamp in X-Slack-Request-Timestamp. Bolt's receiver handles this verification automatically — that is one of the larger reasons to use Bolt rather than hand-rolling the handler. Reject requests where the signature does not match (a forged event from any random source would otherwise hit your handler).

Signature verification · HMAC SHA-256 with the signing secret (mandatory)
Slack signs every request with your signing secret. Bolt's AwsLambdaReceiver verifies automatically. Reject mismatches — never trust a payload that has not passed the signature check.

Retry budget · up to 3 tries · X-Slack-Retry-Num header
On any non-200 within 3 s, Slack retries up to 3 times with backoff. Inspect X-Slack-Retry-Num and X-Slack-Retry-Reason to detect retries and short-circuit if you already handled the event.

Idempotency · event_id as the dedupe key in Redis (required)
Use the event_id from the payload as the idempotency key. SETNX with a 5-minute TTL. If the key already exists, ack and return without doing the work — Slack already got a response from an earlier attempt.

Retry idempotency is the single most-skipped production concern in Slack bot tutorials. Slack will retry an event up to three times when it does not get a 200 within three seconds, and the retry can land on a different replica of your function. Without an idempotency key, a single slow first attempt produces three duplicate responses. The fix is small: at the start of every event handler, attempt to SETNX slack:event:<event_id> in Redis with a short TTL. If the set fails, the event has already been processed — ack, return, do nothing.
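A sketch of that guard with Upstash Redis, where the nx option plays the role of SETNX; the key prefix and TTL match the convention described above.

// Claim the event id before doing any work; a second attempt finds the key and bails.
import { Redis } from "@upstash/redis";
const redis = Redis.fromEnv();

async function claimEvent(eventId: string): Promise<boolean> {
  // nx: set only if absent; ex: expire after 5 minutes.
  const result = await redis.set(`slack:event:${eventId}`, "1", { nx: true, ex: 300 });
  return result === "OK"; // null means another attempt already claimed it
}

// At the top of the deferred handler:
// if (!(await claimEvent(body.event_id))) return; // duplicate delivery, do nothing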

The X-Slack-Retry-Num header is the secondary signal. Slack sets it on every retry and includes X-Slack-Retry-Reason (most commonly http_timeout). Log retries on the way in so you can correlate them with slow original requests. If you see retry rates above one percent, your original ack path is too slow and the lazy-listener pattern needs auditing.
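Logging those signals is a small helper in the route handler before the payload reaches Bolt; a sketch against the web-standard Request used in section 03, with the log field names chosen for illustration.

// Surface Slack's retry signals so slow acks show up in your logs.
function logRetrySignals(headers: Headers) {
  const retryNum = headers.get("x-slack-retry-num");
  if (retryNum) {
    console.warn("slack_retry", {
      retryNum,
      reason: headers.get("x-slack-retry-reason"), // typically "http_timeout"
    });
  }
}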

Once these pieces are in place, the deploy itself is one push. The bot will appear in Slack as soon as Vercel reports the deployment ready. Send the bot a test message, confirm the ack-to-first-token latency is under two seconds, then walk through the launch checklist. We run the same checklist on every client engagement because the gap between a working demo and a bot the team trusts is mostly these operational details.

If you want to extend the bot beyond what this guide covers, the adjacent patterns we have written up are worth a read. The same streaming and tool-use mechanics applied to a web chat surface sit in our Next.js 16 AI chatbot tutorial, which uses the Vercel AI SDK rather than the Anthropic SDK directly. To expose your own tools to Claude in a portable shape, our MCP server tutorial walks through the protocol from scratch. For agentic coding workflows that the bot can offload to, see the Claude Code custom subagent guide. And for the broader stack picture — where Slack-native AI fits into a marketing and ops architecture — the agent-first marketing stack audit is the strategic counterpart to this tactical guide.

Conclusion

Slack is where work conversations happen — an AI bot in Slack is closer to the user than any web UI.

The full Slack-bot recipe lands on a small number of design decisions repeated across every section. Acknowledge in under three seconds and defer the slow work. Stream via post-then-update on a 600 to 800 ms cadence. Scope context to thread_ts and the channel allowlist. Verify signatures, dedupe retries by event_id, render in Block Kit. The whole production bot fits in about 400 lines of TypeScript once each of those is settled — most of the engineering effort is the testing and observability around them, not the surface code itself.

The broader pattern is portable. Microsoft Teams has the same shape — bot framework, event-driven webhooks, adaptive cards instead of Block Kit. Discord has its own variant with slash commands and interaction tokens. Telegram bots run on long-poll or webhooks with a simpler message model. The four design decisions — ack contract, streaming shape, session-key, retry idempotency — are the durable parts. Every chat platform asks the same questions in different vocabulary.

From here, the natural extensions are tool augmentation — letting the bot call your internal APIs, your CRM, your ticketing system — RAG grounded in your documents and your team's historical Slack threads, and multi-workspace distribution if you're building a product rather than an internal tool. Each of those is its own engineering exercise, but they all sit on top of the foundation in this guide. Get the four design decisions right first; everything else stacks cleanly on top.

Ship AI into Slack

AI bots in Slack are closer to work than any standalone chat interface.

Our agentic engineering team designs and ships Slack-native AI agents — answer assistants, ticket triage bots, ops automators — calibrated to your workflows and shipped in days.

Free consultation · Expert guidance · Tailored solutions
What we ship

Slack-bot engagements

  • Custom Slack AI assistants calibrated to your workflows
  • Multi-workspace distribution and OAuth at scale
  • Block Kit tool-result rendering and interactive components
  • RAG grounded in your team's documents and Slack history
  • Auth, rate limits, abuse protection, observability
FAQ · Slack AI bot

The questions teams ask before shipping their first Slack AI bot.

Should I use Socket Mode or event subscriptions?
Socket Mode is Slack's WebSocket alternative — your app holds a persistent connection to Slack rather than receiving HTTP POSTs. It is excellent for local development (no public URL required) and for internal tools running on private infrastructure that cannot accept inbound traffic. For anything deployed on a serverless platform like Vercel, event subscriptions are the right model because they fit the request-response shape natively and scale to zero between events. Socket Mode also requires a long-running process, which serverless functions are not. Use Socket Mode in dev, event subscriptions in production. The handler code is nearly identical in Bolt JS either way — only the receiver changes.