Multi-agent orchestration has matured past theory in 2026 — five patterns now dominate production systems, each with a different control-flow topology, coordination overhead, and failure mode. The choice between fan-out, pipeline, debate, supervisor, and swarm is not a style preference; it determines your cost structure, your failure surface, and which framework will actually support what you need.

Most published "multi-agent patterns" pieces list three or four, typically collapsing fan-out into pipeline and debate into supervisor. That conflation obscures the operational differences that matter in production. Microsoft's Copilot Council is a debate pattern running at ~2.5× single-model cost — not a supervisor pattern. Kimi K2.6's 300-agent swarms are a genuinely different topology from a supervisor with three subagents, even though both involve a coordinator.

This guide covers all five patterns with the specificity that production deployments require: code sketches using current APIs (the renamed @anthropic-ai/claude-agent-sdk, LangGraph v1.0's create_agent, OpenAI Agents SDK handoffs), a 9-framework compatibility matrix, the failure modes nobody publishes, and a decision tree for picking by use case. Internal links to our Claude Agent SDK production patterns guide and agentic orchestration frameworks comparison cover the framework-specific depth.

Key takeaways

01
Five distinct patterns, not three — each with its own control-flow topology.Fan-out (parallel scatter-gather), pipeline (sequential chain), debate (multi-perspective critique), supervisor (hierarchical delegation), and swarm (dynamic peer agents) are operationally distinct. Collapsing them obscures cost, failure mode, and framework-support differences that matter in production.
02
Supervisor is the 2026 production default.Claude Code subagents (one level deep), LangGraph Supervisor, and OpenAI Agents SDK handoffs all converge on the supervisor topology. For most cross-domain agent tasks — coder + researcher + reviewer — this is the right starting point before considering fan-out extensions or swarm scale.
03
Debate costs ~2.5× single-model and that cost is real.Microsoft Copilot Council runs GPT-5.4 and Claude in parallel, then uses a judge model to arbitrate — adding approximately 2.5× the cost of a single-model call. The two-stage Critique variant adds ~20%. Use debate when the stakes justify the premium, not as a default quality booster.
04
Swarm is the frontier pattern — Kimi K2.6 scales to 300 agents.Kimi K2.5 (released) coordinates up to 100 specialized sub-agents executing 1,500 tool calls in parallel via Parallel-Agent Reinforcement Learning. K2.6 (Apr 20, 2026) advances this to 300-agent swarms with 12-hour autonomous coding sessions. No other framework ships swarm as a first-class native primitive at this scale.
05
LangGraph spans all five patterns natively; Claude Agent SDK excels at supervisor and fan-out.The 9-framework matrix reveals that LangGraph v1.0 is the most broadly capable — native or patternable across all five. The Claude Agent SDK shines on supervisor and fan-out (subagents, one level deep) but requires custom code for debate and swarm. Pick framework by dominant pattern, not by ecosystem familiarity alone.

01 — The FrameworkWhy five patterns? Most pieces list three.

The canonical "multi-agent patterns" literature typically presents three: pipeline, supervisor, and swarm. Fan-out is usually folded into pipeline ("parallel pipeline") and debate is folded into supervisor ("multi-supervisor"). That conflation is expedient for a blog post but misleading for a production decision.

Fan-out and pipeline have the same coordination overhead (low) but completely different failure modes. A pipeline failure is a cascade — a bad mid-stage contaminates everything after it. A fan-out failure is partial — one branch fails, and you must decide how to aggregate the remaining results. Those are different engineering problems.

Debate and supervisor are equally confusable. Both involve a "judge" or coordinator model. But a supervisor delegates non-overlapping tasks to specialist sub-agents and synthesizes their independent results. A debate system sends the same question to multiple agents, waits for disagreement, and then adjudicates between conflicting answers. The cost structure is completely different — debate is inherently at least 2× the single-model cost before you add the judge; supervisor cost scales with the number of distinct subtasks, not the number of perspectives.

Naming five patterns is not pedantry. It is the difference between deploying a debug-ready architecture and inheriting someone else's imprecise metaphor when things break in production.

API names matter

Code sketches in this post use current APIs: @anthropic-ai/claude-agent-sdk (renamed Sept 29, 2025 from @anthropic-ai/claude-code), LangGraph v1.0's create_agent (not the legacy create_react_agent), and OpenAI Agents SDK handoff primitives (successor to archived OpenAI Swarm). Using the old package names in production will hit deprecation warnings before end of 2026.

02 — Pattern 1Fan-Out: parallel branches, aggregated results.

Control flow. A coordinator dispatches the same task — or N specialized subtasks — to multiple agents simultaneously, then aggregates the results when all branches return. The defining characteristic is parallel execution: all branches run concurrently, so wall-clock latency is bounded by the slowest branch, not the sum of all branches.

Best-fit use cases. Parallel research across multiple sources (each agent queries a different database or URL), parallel code review across multiple files (each agent reviews one file independently), and parallel document summarization (each agent processes one chunk of a long document). Any task that can be cleanly partitioned into independent sub-tasks that do not need to see each other's intermediate results is a fan-out candidate.

Claude Agent SDK — subagent fan-out sketch:

# Python — claude-agent-sdk (renamed Sept 29, 2025)
import asyncio
from claude_agent_sdk import ClaudeAgent, ClaudeAgentOptions

async def fan_out(sources: list[str]) -> list[str]:
    tasks = [
        ClaudeAgent.run(
            prompt=f"Summarize key AI developments from: {src}",
            options=ClaudeAgentOptions(max_turns=5),
        )
        for src in sources
    ]
    results = await asyncio.gather(*tasks)
    return [r.last_message for r in results]

LangGraph parallel branches sketch:

# Python — LangGraph v1.0
from langgraph.graph import StateGraph
from langchain.agents import create_agent  # LangChain 1.0 API

def build_fan_out_graph(sources: list[str]):
    graph = StateGraph(dict)
    for i, src in enumerate(sources):
        graph.add_node(f"branch_{i}", create_agent(source=src))
    graph.add_node("aggregator", lambda results: {"summary": "
".join(results)})
    for i in range(len(sources)):
        graph.add_edge(f"branch_{i}", "aggregator")
    return graph.compile()

Failure mode. Partial-failure aggregation — if one branch errors, you must decide whether to fail the whole request, return partial results, or retry only the failed branch. Most implementations fail to specify this at design time, leading to silent partial results that look like complete answers.

Coordination overhead: Low. Branches are independent. The only synchronization point is the aggregator waiting for all branches to return.

03 — Pattern 2Pipeline: sequential stages, each feeding the next.

Control flow. Each agent's output becomes the next agent's input. Execution is strictly sequential. This is the simplest orchestration topology — it is essentially a chain — and also the most widely misapplied. Teams reach for pipeline when tasks are partially parallel; the result is unnecessary latency.

Best-fit use cases. Research → draft → critique → revise → fact-check content workflows. ETL pipelines where each transform requires the prior stage's output. Legal or compliance workflows where each stage must see and respond to the prior stage's conclusions. For worked reference architectures, see our agency workflows as multi-agent graphs mapping research, brief, draft, audit, review, and deploy stages.

LangChain 1.0 sequential chain sketch:

# Python — LangChain 1.0 (create_agent replaces create_react_agent)
from langchain.agents import create_agent

researcher = create_agent(role="researcher", model="claude-sonnet-4-6")
drafter    = create_agent(role="drafter",   model="claude-sonnet-4-6")
critic     = create_agent(role="critic",    model="claude-sonnet-4-6")

async def pipeline(topic: str) -> str:
    research = await researcher.ainvoke({"input": f"Research: {topic}"})
    draft    = await drafter.ainvoke({"input": research["output"]})
    critique = await critic.ainvoke({"input": draft["output"]})
    return critique["output"]

CrewAI sequential Process sketch:

# Python — CrewAI
from crewai import Agent, Task, Crew, Process

researcher = Agent(role="Researcher", goal="Gather key facts", ...)
drafter    = Agent(role="Drafter",    goal="Produce first draft", ...)
critic     = Agent(role="Critic",     goal="Identify weaknesses", ...)

crew = Crew(
    agents=[researcher, drafter, critic],
    tasks=[research_task, draft_task, critique_task],
    process=Process.sequential,
)
result = crew.kickoff()

Failure mode. Cascade failure — a bad mid-stage output poisons every downstream stage. Unlike fan-out, there is no natural aggregation checkpoint. A pipeline with no per-stage validation will happily produce a polished final answer built on a hallucinated first-stage result. Build per-stage output validation into the pipeline harness, not as an afterthought.

Coordination overhead: Low. Linear by definition. The cost of coordination is bounded by stage count.

04 — Pattern 3Debate: multi-perspective critique, ~2.5× single-model cost.

Control flow. Two or more agents are given the same prompt, produce independent answers, and then either critique each other's outputs (the debate variant) or are evaluated by a separate judge model that selects the strongest answer and summarizes divergences (the Council variant). The Microsoft Copilot Council is the canonical 2026 production instantiation — it runs GPT-5.4 and Claude Sonnet 4.6 in parallel, then uses a judge model to summarize agreements, highlight divergences, and surface unique insights from each perspective.

A lighter variant — Copilot Critique — is a two-stage pipeline: one model drafts, a second reviews. According to our analysis of published Microsoft Copilot Cowork enterprise agent workflows, Critique reportedly adds approximately 20% cost over a single model call. The full Council debate pattern reportedly runs at ~2.5× single-model cost.

LangGraph debate sketch:

# Python — LangGraph v1.0 debate pattern
from langgraph.graph import StateGraph
from langchain.agents import create_agent

agent_a = create_agent(model="claude-sonnet-4-6")
agent_b = create_agent(model="gpt-5.4")       # use the released base model, not a fabricated coding variant
judge   = create_agent(model="claude-opus-4-7")

def build_debate_graph():
    graph = StateGraph(dict)
    graph.add_node("agent_a",  agent_a)
    graph.add_node("agent_b",  agent_b)
    graph.add_node("judge",    judge)
    graph.add_edge("agent_a",  "judge")
    graph.add_edge("agent_b",  "judge")
    return graph.compile()

Failure mode. Judge-model bias — if the judge consistently favors the style or confidence of one model over the accuracy of the other, the debate produces higher-confidence wrong answers. Arbitration loops that do not converge are the second failure mode: when both agents disagree and the judge cannot resolve the tie, some implementations loop indefinitely. Set a hard maximum round count in the harness.

Coordination overhead: High. At minimum, 2× single-model cost before adding the judge model. Budget for this explicitly — debate is not a free quality upgrade.

Microsoft’s multi-model approach validates what many enterprise AI teams have concluded: no single model excels at every task. The future of enterprise AI is model orchestration, not model selection.Digital Applied synthesis, May 17, 2026

05 — Pattern 4Supervisor: the 2026 default for production agents.

Control flow. A top-level supervisor agent receives the user request, decomposes it into subtasks, delegates each subtask to a specialized sub-agent, and aggregates the results into a final answer. Sub-agents work on non-overlapping subtasks — they do not see each other's outputs during execution. This is the defining difference from debate.

The supervisor pattern has the most native framework support in 2026. Claude Code subagents (stored in .claude/agents/<name>.md) are the Claude Agent SDK's native supervisor primitive — one level deep, no nesting. LangGraph ships a first-class Supervisor pattern. OpenAI Agents SDK handoffs are a supervisor primitive by design.

Claude Agent SDK supervisor sketch:

# .claude/agents/researcher.md
---
name: researcher
description: "Searches and summarizes research on a given topic"
tools: [WebSearch, WebFetch]
---

# .claude/agents/coder.md
---
name: coder
description: "Writes and reviews production Python or TypeScript code"
tools: [Read, Write, Bash]
---

# Supervisor prompt (in the main agent or CLAUDE.md)
# "Use the researcher subagent to gather context, then
#  the coder subagent to implement the solution."
# Subagents are ONE level deep — they cannot spawn subagents.

OpenAI Agents SDK handoffs sketch:

# Python — openai-agents (successor to archived OpenAI Swarm)
from agents import Agent, handoff, Runner

researcher = Agent(name="Researcher", instructions="Research the topic thoroughly.")
coder      = Agent(name="Coder",      instructions="Write production-ready code.")

supervisor = Agent(
    name="Supervisor",
    instructions="Delegate research to Researcher, coding to Coder.",
    handoffs=[handoff(researcher), handoff(coder)],
)

result = await Runner.run(supervisor, "Build a web scraper for product prices.")

Critical constraint (Claude Agent SDK). Claude Code subagents are one level deep — they cannot spawn subagents. This is a hard architectural boundary in the current implementation (fact-pack §1.3). Do not design supervisor topologies that assume recursive delegation in this environment.

Failure mode. Over-delegation — the supervisor sends a subtask too narrow for the sub-agent to complete meaningfully, the sub-agent returns a partial answer, and the supervisor tries to re-delegate rather than synthesize. Set iteration ceilings (the Claude Agent SDK defaults to approximately 25 turns per sub-agent; respect that boundary in your harness). See our Claude Agent SDK production patterns guide for the full token-budget discipline.

Coordination overhead: Medium. One round-trip per sub-agent. Cost scales with the number of distinct subtasks, not the number of perspectives.

06 — Pattern 5Swarm: dynamic spawning, emergent coordination.

Control flow. An open-ended set of peer agents is spawned dynamically based on workload. Agents coordinate through shared memory or a message bus rather than through a fixed supervisor. The population size is not fixed at design time — the swarm grows and shrinks as the task demands.

The swarm pattern is the frontier of multi-agent systems in 2026. The canonical open implementation is Kimi K2.5 Agent Swarm — trained via Parallel-Agent Reinforcement Learning (PARL) to coordinate up to 100 specialized sub-agents executing 1,500 tool calls in parallel without predefined workflows. K2.6 (released April 20, 2026) extends this to 300-agent swarms and 12-hour autonomous coding sessions.

Important provenance note. OpenAI Swarm — the experimental Python library that popularized the term in late 2024 — was archived in favor of the OpenAI Agents SDK. References to "OpenAI Swarm" in production guides are describing archived code. The Agents SDK's handoffs are the successor multi-agent primitive, and they implement the supervisor pattern, not a peer swarm.

For teams building swarm-adjacent systems without Kimi's native PARL infrastructure, the closest approximation is the Claude Agent SDK combined with MCP servers as the shared tool bus. AutoGen v0.4+'s event-driven actor model is the most mature swarm approximation in a traditional framework.

Kimi K2.5 (released)

Max sub-agents

100+

Kimi K2.5 Agent Swarm coordinates up to 100 specialized sub-agents executing 1,500 tool calls in parallel. Trained via Parallel-Agent Reinforcement Learning (PARL) — no predefined workflows.

1,500 parallel tool calls

Kimi K2.6 (Apr 20, 2026)

Agent swarm scale

300

K2.6 advances K2.5's swarm to 300-agent coordination. Supports 12-hour autonomous coding sessions. Released in Code Preview beta on Apr 13, 2026; GA April 20, 2026 — the current Moonshot flagship.

12-hour coding sessions

AutoGen v0.4+

Open-ended actor pool

AutoGen v0.4 rebuilt around an event-driven actor model. The GroupChat primitive supports open-ended message-passing between N agents with a manager or fixed turn order — the most mature swarm approximation in a traditional framework.

microsoft.github.io/autogen

07 — Compatibility Matrix9 frameworks × 5 patterns: native, patternable, or build-your-own.

The matrix below rates each framework's support for each pattern across three tiers: Native (first-class primitive, ships out of the box), Patternable (achievable with framework primitives but not the default shape), and Build-your-own (requires custom code outside the framework's model). Cells are based on current public documentation and our own agentic orchestration frameworks comparison.

LangGraph v1.0

Most broadly capable

Native: fan-out (parallel branches), pipeline (linear sub-graphs), supervisor (Supervisor pattern). Patternable: debate (multi-agent + judge node), swarm (dynamic node creation — not the default shape). Ships LTS since October 2025 with Checkpointers for time-travel debugging.

5 of 5 patterns supported

Claude Agent SDK

Supervisor + fan-out native

Native: supervisor (subagents, one level deep), fan-out (parallel subagent dispatch), pipeline (sequential tool calls + Skills). Build-your-own: debate (two subagents + supervisor as judge), swarm (requires MCP + workers). The 2026 default for Claude Code deployments.

3 native, 2 build-your-own

OpenAI Agents SDK

Handoffs = supervisor primitive

Native: supervisor (handoffs), fan-out (parallel handoffs), pipeline (sequential handoffs + tool loops). Build-your-own: debate. Patternable: swarm (OpenAI Swarm, the predecessor, was swarm-native but is archived). Agents SDK is the production successor.

3 native, 2 build-your-own

CrewAI

Role-based, sequential-first

Native: pipeline (sequential Process), supervisor (hierarchical Process with manager LLM). Patternable: fan-out (parallel tasks in Process). Build-your-own: debate (define critic role), swarm (not the framework's home pattern). ~10 min from pip install to first multi-agent flow.

2 native, 3 patternable or BYO

AutoGen v0.4+

Event-driven actor model

Native: debate (GroupChat with judge manager), supervisor (manager mode), swarm (open-ended GroupChat actor model). Patternable: fan-out (async actors), pipeline (turn-based GroupChat). v0.4 was a major rewrite — verify current API on microsoft.github.io/autogen.

3 native, 2 patternable

Pydantic AI / Vercel AI SDK / raw SDK

Simple and type-safe

Pydantic AI: pipeline native (type-safe chain), others patternable or build-your-own. Vercel AI SDK 6: pipeline native (stepCountIs + tools), fan-out patternable (parallel generateText calls). Stop conditions: stepCountIs(N) and hasToolCall('name'); token-budget gating is built via a custom predicate, not a built-in helper.

Best for simple type-safe agents

Two patterns you will almost always build yourself regardless of framework: debate and swarm. The investment is proportional to your need — debate requires two agent invocations plus a judge, which you can wire up in an afternoon. True swarm infrastructure (dynamic agent spawning, shared memory, population management) requires significant custom engineering unless you are using Kimi K2.5 or K2.6 as your model layer. See our analysis of enterprise AI agent build vs buy in 2026 for the TCO framework that governs this decision.

08 — Anti-PatternsWhen not to use each pattern — the map nobody else publishes.

Every pattern has a canonical anti-use-case. Deploying a pattern outside its fit range is more expensive and less reliable than a simpler alternative. The failure modes below are derived from production experience and the coordination-overhead analysis in the research base — not from theoretical concerns.

Fan-Out — Avoid when:

Tasks have dependencies

Anti-pattern: parallel tasks that need each other

Fan-out requires branches to be genuinely independent. If branch B needs branch A's output, you need a pipeline or a supervisor with sequential delegation — not fan-out. Teams that use fan-out for dependent tasks get race conditions and silent partial results.

Use pipeline instead

Pipeline — Avoid when:

Stages are parallelizable

Anti-pattern: unnecessary sequential bottleneck

Pipeline latency is the sum of all stages. If research, code review, and documentation can run independently, making them sequential adds latency for no quality gain. Benchmark the actual dependency graph before defaulting to sequential.

Use fan-out or supervisor instead

Debate — Avoid when:

Cost outweighs the stakes

Anti-pattern: ~2.5× cost for low-stakes output

The ~2.5× cost of the debate pattern is real and constant. For routine Q&A, content drafts, or internal summaries, the quality lift from multi-perspective debate does not justify the spend. Reserve debate for high-stakes, externally visible decisions where divergent expert perspectives are genuinely valuable.

Use single-model or pipeline instead

Supervisor — Avoid when:

Sub-tasks need to see each other

Anti-pattern: sub-agents with inter-dependencies

Supervisor delegates non-overlapping subtasks. If sub-agents need to share intermediate state or react to each other's outputs during execution, you need a different topology — a sequential pipeline for ordered sharing, a shared-memory swarm for continuous coordination. Forcing inter-agent dependency into a supervisor creates over-delegation loops.

Use pipeline or swarm instead

Swarm — Avoid when:

You have fewer than 50 parallel tasks

Anti-pattern: swarm overhead for small workloads

Swarm infrastructure — dynamic spawning, shared memory management, population lifecycle, race-condition prevention — carries significant engineering overhead. For 3-10 concurrent agents, a supervisor with fan-out branches is simpler, cheaper, and more debuggable. Swarm earns its overhead only at genuine scale: Kimi K2.5's 100+ agent model or K2.6's 300-agent coding sessions.

Use supervisor + fan-out instead

09 — Cost vs QualityCost vs quality across patterns — single-model baseline at 1.0×.

The bars below represent relative cost multipliers, not absolute figures — absolute cost depends on model choice, prompt length, and output verbosity. The single-model baseline (1.0×) is a single Claude Sonnet 4.6 call at your representative prompt length. These multipliers are derived from the coordination-overhead analysis in this guide and the Microsoft Copilot Council documentation (~2.5× debate cost, ~20% critique premium). Swarm cost is labeled as variable because it scales with the dynamic agent population, which is workload-dependent.

Quality ratings are qualitative signals — they reflect the expected output quality per use case, not a universal ranking. Pipeline quality can exceed supervisor quality for linear workflows; swarm quality can trail supervisor quality when the coordination overhead introduces thrashing.

Relative cost multipliers vs single-model baseline · indicative, not absolute

Source: Digital Applied synthesis — Copilot Council (~2.5×) and Critique (~20%) from Microsoft documentation

Single-model baselineOne model call — no orchestration overhead

1.0×

PipelineSum of stage costs — low coordination overhead

~N×

Fan-outSum of branch costs — parallel, low coordination

~N×

SupervisorSum of subagent costs + coordinator — medium overhead

~(N+1)×

Debate (Critique variant)Two models + judge — ~20% over single-model per Copilot Critique

~1.2×

Debate (Council variant)Two models in parallel + judge — ~2.5× per Copilot Council

~2.5×

SwarmVariable — agent population scales with workload; unbounded by default

Variable

Cost discipline

For every pattern beyond single-model, implement a hard cost cap in the agent harness — not a post-hoc billing alert. In the Claude Agent SDK, the default iteration ceiling is approximately 25 turns per sub-agent. In Vercel AI SDK 6, use stopWhen: stepCountIs(N) or a custom predicate built from the usage object — there is no built-in token-budget stop helper, so it must be assembled from the available primitives. In LangGraph, use Checkpointers with explicit step budgets. Swarm patterns additionally require population-size caps to prevent runaway spawn.

10 — Decision FrameworkDecision tree: pick your pattern by use case.

The matrix below maps use-case characteristics to pattern recommendations. The primary axes are: whether tasks can run in parallel (fan-out vs pipeline), whether the same prompt goes to multiple agents (debate vs supervisor), and whether agent population is fixed or dynamic (supervisor vs swarm). When in doubt, start with supervisor — it is the pattern with the most native framework support and the best-understood failure mode.

For teams evaluating which framework to adopt for a given pattern, we recommend the Q3 2026 agentic AI quarterly outlook as the broader context and our AI transformation service for hands-on architecture support.

Tasks: independent + parallel

Fan-out is the right pick

Tasks can be cleanly partitioned, do not share intermediate state, and can run concurrently. Parallel research across N sources, parallel code review across N files, parallel document summarization. Use LangGraph parallel branches or Claude Agent SDK subagent fan-out. Aggregate with a coordinator that handles partial-failure explicitly.

Fan-out

Tasks: sequential, each feeds next

Pipeline is the right pick

Each stage requires the prior stage's output. Research → draft → critique → revise. ETL-style data workflows. Legal or compliance workflows. Use LangChain 1.0 sequential chains, CrewAI sequential Process, or LangGraph linear sub-graphs. Add per-stage output validation — cascade failure is the failure mode.

Pipeline

High-stakes: multi-perspective needed

Debate is the right pick

The same question benefits from genuinely different model perspectives. Externally visible strategic decisions, research outputs with real-world consequences, or assessments where model diversity catches errors that homogeneous ensembles miss. Budget ~2.5× single-model cost for the Council variant. Build-your-own on LangGraph or use Copilot Council if on Microsoft 365.

Debate

Cross-domain: specialist delegation

Supervisor is the right pick — start here

Different subtasks require different specialist capabilities: coder + researcher + reviewer, or content + fact-checker + editor. The 2026 default. Native in Claude Code subagents, LangGraph Supervisor, and OpenAI Agents SDK handoffs. Remember the one-level-deep limit in Claude Agent SDK — subagents cannot spawn subagents.

Supervisor

50+ parallel independent tasks

Swarm if at genuine scale

The task space is large enough to justify dynamic agent spawning overhead: long autonomous coding sessions, large-scale exploratory research, or workloads that grow unpredictably. Use Kimi K2.5 (100 agents) or K2.6 (300 agents) if native swarm is required. Build with AutoGen v0.4+ actor model otherwise. Implement population-size caps and shared-memory race-condition safeguards.

Swarm

Pattern Selection

Pick supervisor by default. Add swarm when you genuinely have 50+ parallel tasks.

The five-pattern framework is more honest than the three-pattern framework most blogs publish because the fourth pattern — supervisor — is what production looks like in May 2026, and the fifth — swarm — is where the frontier is moving. Collapsing them into a generic "multi-agent" category obscures the cost and failure-mode differences that determine whether an orchestration system survives its first production incident.

The practical recommendation is straightforward: start with supervisor. It has the widest native framework support (Claude Agent SDK, LangGraph, OpenAI Agents SDK, CrewAI hierarchical Process), the best-understood failure mode (over-delegation, bounded by iteration ceilings), and the most production references to learn from. Add fan-out branches when specific subtasks are genuinely independent. Use debate when the stakes justify ~2.5× cost and you need multi-perspective validation. Reach for swarm only when the task population genuinely exceeds 50 concurrent agents and you have the infrastructure to manage it — or when you are running on Kimi K2.6 natively.

The broader signal from 2026's production patterns is that orchestration sophistication should follow workload complexity, not precede it. The teams that deploy swarm-style systems on tasks that a three-subagent supervisor could handle are spending engineering budget on infrastructure that their use case does not require. Measure the task shape first, match the pattern second, and choose the framework third.

Multi-Agent Orchestration: 5 Patterns That Work