By 2026, the agencies that have moved past prompt engineering are doing orchestration. The interesting work — research-and-brief, full content drafts, technical audits with actionable findings — is no longer a single agent with a clever prompt. It is a directed graph of specialised agents, each with one job, handing structured outputs to the next agent in the graph, with human review gates placed where they actually catch mistakes.
This playbook is the reference we use across our agency book. It specifies the seven-role agent taxonomy, maps twelve typical agency workflows onto multi-agent graphs, defines the handoff protocols between roles, and recommends the orchestration framework (LangGraph, CrewAI, or Mastra) per workflow shape.
It is not aspirational. Every workflow in the playbook ships in production for at least one agency client at the time of writing.
- 01 — Multi-agent graphs beat single-agent prompts when the workflow has more than three distinct phases. One agent doing everything degrades quality on each phase. Decomposing into specialised agents with one job each lifts quality consistently. The break-even is around three phases; below that, the orchestration overhead costs more than it adds.
- 02 — The seven-role agent taxonomy keeps the graph readable. Researcher, drafter, auditor, reviewer, deployer, router, escalator. Every agent in every workflow falls into one of these. The shared taxonomy is what makes the playbook shareable across pods and what keeps engineering reviews tractable.
- 03 — Handoffs need structured-output schemas, not prose blobs. Agent A's output becomes Agent B's input. If A outputs prose, B has to parse; parsing is unreliable; the graph becomes brittle. Structured outputs (JSON schemas with required fields) are the boring engineering choice that makes the whole pattern work.
- 04 — Human-in-the-loop gates go after the auditor, not after the drafter. Reviewers add value when there is structured feedback to give. Reviewing a raw draft is exhausting; reviewing an audited draft with surfaced issues is fast. Gate placement is the lever that determines whether HITL becomes a bottleneck.
- 05 — Pick the framework per workflow shape: LangGraph for graph-heavy, CrewAI for role-heavy, Mastra for TS stacks. There is no single 'best' framework. LangGraph wins on graph-structured durable workflows; CrewAI wins on speed-to-scaffold for role-based ones; Mastra wins on TypeScript stacks. Most agencies standardise on two — primary plus secondary — and pick per project.
01 — Premise: Why graphs, not chains.
Linear chains (prompt → output → prompt → output) are the natural first move. They scale until the workflow has any one of: a branch, a retry, a long-running step, a step that needs human input, or a step where the output of two earlier steps must merge. Most agency workflows have all five.
Graphs handle all five natively. Nodes are agents; edges are conditional routing decisions; state is persistent and checkpointed. The graph model carries more conceptual overhead than the chain model, but that overhead pays back the moment the workflow has to handle a real-world failure mode.
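To make the model concrete, here is a framework-agnostic sketch of a graph executor: nodes as plain functions, edges as a routing table keyed on (node, outcome), and state checkpointed at every transition. The node names and state fields are illustrative, not taken from any framework.

```python
import json

# Nodes are plain functions: state in, (state, outcome) out.
def research(state):
    state["sources"] = ["https://example.com/a"]  # stub source fetch
    return state, "success"

def draft(state):
    state["draft"] = f"draft from {len(state['sources'])} sources"
    return state, "success"

def audit(state):
    # Send the draft back for one revision pass, then accept (stub rubric).
    state["revisions"] = state.get("revisions", 0) + 1
    return state, ("retry" if state["revisions"] < 2 else "success")

NODES = {"research": research, "draft": draft, "audit": audit}
EDGES = {  # (node, outcome) -> next node; None terminates the run
    ("research", "success"): "draft",
    ("draft", "success"): "audit",
    ("audit", "retry"): "draft",    # a loop no linear chain can express
    ("audit", "success"): None,
}

def run(entry, state, checkpoint_path="run-checkpoint.json"):
    node = entry
    while node is not None:
        state, outcome = NODES[node](state)
        # Checkpoint at every transition: a crash resumes from here
        # instead of replaying the whole chain.
        with open(checkpoint_path, "w") as fh:
            json.dump({"node": node, "outcome": outcome, "state": state}, fh)
        node = EDGES[(node, outcome)]
    return state

print(run("research", {}))
```

The retry edge from audit back to draft is exactly the failure mode that breaks a chain; in the graph it is one more entry in the routing table.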
"We rebuilt our research-and-brief workflow from a 6-step chain into a 4-node graph. Same agents. Half the prompts. The reliability under load was the difference between flaky and shippable."— Lead engineer, agency platform team, March 2026
02 — Taxonomy: The seven-role agent taxonomy.
Every agent in every workflow falls into one of seven roles. The taxonomy is what makes the playbook portable: a researcher in the content workflow looks like a researcher in the support workflow looks like a researcher in the lead-enrichment workflow. Engineering reviews focus on whether the role is implemented correctly, not whether the role is well-defined. A minimal code encoding of the taxonomy follows the role list.
Researcher (source of truth)
Gathers raw inputs from external or internal sources. Output is always structured (citations, JSON facts, source URLs). Prompt skill: searching, reading, distinguishing primary from secondary sources.
Drafter (artifact producer)
Composes prose, code, or structured outputs from researcher inputs. Output is the artifact under review. Prompt skill: voice, structure, claim-fluency.
Auditor (quality gate)
Scores the drafter's output against a rubric, surfaces issues, suggests revisions. Output is a structured findings list. Prompt skill: rubric application, tight scoring, false-positive avoidance.
Reviewer, human or model (decision authority)
Approves, rejects, or annotates the auditor's findings. Often human-in-the-loop in regulated workflows; model-based for high-volume low-stakes flows. Output is a publish/hold/redraft decision.
Deployer (side effects)
Pushes the reviewed artifact to its destination — CMS, email tool, CRM, file store, downstream agent. Output is a deployment receipt or error.
Router (branching)
Decides which downstream branch the workflow takes. Common in triage/support workflows. Output is a routing decision (always one of N enum values).
Escalator (safety net)
Surfaces edge cases that the workflow shouldn't try to handle automatically. Output is an escalation ticket with structured context. The escape hatch that keeps multi-agent graphs from making bad calls under uncertainty.
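One way to encode the taxonomy so that every node in every workflow declares its role, sketched in Python with names of our own choosing:

```python
from enum import Enum
from typing import Any, TypedDict

class Role(str, Enum):
    RESEARCHER = "researcher"
    DRAFTER = "drafter"
    AUDITOR = "auditor"
    REVIEWER = "reviewer"
    DEPLOYER = "deployer"
    ROUTER = "router"
    ESCALATOR = "escalator"

class Handoff(TypedDict):
    # The envelope every agent emits; the payload schema varies by role.
    producer: Role            # which role produced this output
    payload: dict[str, Any]   # role-specific structured output
    sources: list[str]        # provenance, populated by researchers
```

Tagging every node with its role is what keeps engineering review focused on whether the researcher is implemented correctly, not on what the agent is supposed to be.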
03 — Twelve workflows: Twelve agency workflows mapped.
The twelve workflows below are the ones that recur across the agency book. Each entry shows the workflow, the agent roles involved, and the framework pick. Use them as a starting point and adapt the role mix to the specific engagement; workflow 1 is sketched as code after the list.
1. Research-and-brief
Researcher (multi-source) → Drafter → Auditor → Human Reviewer → Deployer (CMS). Long-running, branchy, retries on flaky sources. Durable execution required. Framework: LangGraph.
2. Content draft + revision
Researcher (light, internal) → Drafter → Auditor (rubric) → Drafter (revision) → Human Reviewer → Deployer. Loop on auditor findings until rubric ≥ 11. Framework: CrewAI for prototypes, LangGraph for production.
3. Technical SEO audit
Researcher (crawler) → Auditor (checklist) → Drafter (findings narrative) → Human Reviewer → Deployer (PDF + CMS). Output is a structured audit report. Framework: LangGraph or Mastra.
4. GEO scoring (rubric)
Researcher (multi-engine sample) → Auditor (rubric per page) → Drafter (priority list) → Deployer (dashboard). High-volume, periodic. Framework: LangGraph for state persistence.
5. Competitive intel
Researcher (competitor watch) → Auditor (signal/noise filter) → Drafter (digest) → Human Reviewer (weekly) → Deployer (Slack + email). Periodic. Framework: CrewAI or Mastra.
6. Lead enrichment
Researcher (firmographic + technographic) → Auditor (data quality) → Router (tier assignment) → Deployer (CRM). High-volume, structured. Framework: Mastra (TS, low cost).
7. Paid-ad creative generation
Researcher (audience + brand voice) → Drafter (variants) → Auditor (brand-safety + voice) → Human Reviewer → Deployer (ad platforms). Heavy multimodal use. Framework: LangGraph or CrewAI.
8. Lifecycle email composition
Researcher (segment + behaviour) → Drafter → Auditor (compliance + voice) → Reviewer (model or human) → Deployer (ESP). Mass-personalisation. Framework: Mastra (TS-native, Vercel deploy).
9. Support triage
Router (intent classification) → Researcher (knowledge base) → Drafter (response) → Reviewer (model or human, by severity) → Deployer (helpdesk). High-volume, low-latency. Framework: Mastra or CrewAI.
10. Reporting digest
Researcher (multi-source data pull) → Drafter (narrative) → Auditor (numbers vs source) → Deployer (PDF + Notion). Periodic. Framework: LangGraph.
11. Social listening
Researcher (stream listener) → Auditor (relevance filter) → Drafter (insight summary) → Router (escalate or queue) → Deployer (CRM + Slack). Continuous. Framework: LangGraph or Mastra.
12. RFP response
Researcher (past RFPs + current ask) → Drafter (sectional) → Auditor (compliance + voice) → Human Reviewer → Deployer (PDF + portal). Long-running, high-stakes. Framework: LangGraph.
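As a concrete shape, here is workflow 1 (research-and-brief) as a minimal LangGraph sketch, assuming stub node functions and an in-memory checkpointer; the human-review gate is omitted here and shown in section 05.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

class BriefState(TypedDict, total=False):
    sources: list[str]
    draft: str
    findings: list[str]

# Stub nodes; in production each wraps a model call plus tools.
def research(state: BriefState): return {"sources": ["https://example.com"]}
def draft(state: BriefState):    return {"draft": "brief built from sources"}
def audit(state: BriefState):    return {"findings": []}  # empty = rubric pass
def deploy(state: BriefState):   return {}                # stub CMS push

def route_after_audit(state: BriefState) -> str:
    # Conditional edge: a clean audit deploys, findings loop back to draft.
    return "deploy" if not state["findings"] else "draft"

builder = StateGraph(BriefState)
builder.add_node("research", research)
builder.add_node("draft", draft)
builder.add_node("audit", audit)
builder.add_node("deploy", deploy)
builder.add_edge(START, "research")
builder.add_edge("research", "draft")
builder.add_edge("draft", "audit")
builder.add_conditional_edges("audit", route_after_audit,
                              {"deploy": "deploy", "draft": "draft"})
builder.add_edge("deploy", END)

# MemorySaver suits the sketch; durable execution wants a persistent
# checkpointer so long-running runs survive restarts.
graph = builder.compile(checkpointer=MemorySaver())
graph.invoke({}, config={"configurable": {"thread_id": "brief-1"}})
```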
04 — Handoffs: Handoff protocols.
A multi-agent graph is only as reliable as the handoffs between agents. Four rules consistently separate fragile graphs from reliable ones.
Structured output, not prose (hardest-won lesson)
JSON Schema · validated at the edge. Every agent's output that becomes another agent's input is a JSON object with a defined schema. Validate at the edge; reject and retry on schema violation. Prose handoffs feel natural for ~3 weeks, until the first parsing failure breaks the workflow.
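A minimal validate-at-the-edge sketch using Pydantic; the schema fields and the re-prompt helper are hypothetical.

```python
from pydantic import BaseModel, ValidationError

class ResearchOutput(BaseModel):  # hypothetical researcher schema
    facts: list[str]
    source_urls: list[str]
    confidence: float

def rerun_researcher(feedback: str) -> str:
    # Stub: in production this re-invokes the researcher with the
    # validation errors appended to its prompt.
    return '{"facts": [], "source_urls": [], "confidence": 0.0}'

def validated_handoff(raw: str, retries: int = 2) -> ResearchOutput:
    # Validate at the edge: reject and retry on schema violation,
    # so downstream agents never see malformed input.
    for attempt in range(retries + 1):
        try:
            return ResearchOutput.model_validate_json(raw)
        except ValidationError as err:
            if attempt == retries:
                raise
            raw = rerun_researcher(feedback=str(err))
```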
Idempotent retries by default (reliability default)
Deterministic IDs · checkpointed state. Every node should be safe to retry. Use deterministic task IDs so duplicate runs are detected; checkpoint state at each transition so retries resume from the last success. Idempotency is what lets the graph survive transient failures without manual intervention.
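A sketch of the deterministic-ID half of the rule; a real system would back the completed-task map with a durable store.

```python
import hashlib
import json

def task_id(node: str, inputs: dict) -> str:
    # Deterministic: same node + same inputs -> same ID, so a duplicate
    # run is detected instead of re-executed.
    digest = hashlib.sha256(
        (node + json.dumps(inputs, sort_keys=True)).encode()
    ).hexdigest()
    return f"{node}-{digest[:12]}"

completed: dict[str, dict] = {}  # stands in for a durable store

def run_idempotent(node: str, inputs: dict, fn) -> dict:
    tid = task_id(node, inputs)
    if tid in completed:       # retry after a crash: skip, don't redo
        return completed[tid]
    result = fn(inputs)
    completed[tid] = result    # checkpoint before the next transition
    return result

# Running the same task twice executes fn only once.
out1 = run_idempotent("draft", {"brief": "q3 report"}, lambda i: {"draft": "v1"})
out2 = run_idempotent("draft", {"brief": "q3 report"}, lambda i: {"draft": "v1"})
assert out1 is out2
```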
Explicit failure modes (routing clarity)
Outcome enum · always one of N. Each node returns one of a small set of outcomes (success, partial, retry, escalate), and downstream routing keys off the same enum every time. No 'unknown' outcomes; they push the graph back into manual hand-holding.
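The outcome enum in code; the route targets are illustrative.

```python
from enum import Enum

class Outcome(str, Enum):  # always one of N; there is no "unknown"
    SUCCESS = "success"
    PARTIAL = "partial"
    RETRY = "retry"
    ESCALATE = "escalate"

# Downstream routing keys off the same enum for every node.
ROUTES = {
    Outcome.SUCCESS: "reviewer",
    Outcome.PARTIAL: "drafter",     # partial output goes back for redraft
    Outcome.RETRY: "auditor",       # transient failure, re-run the node
    Outcome.ESCALATE: "escalator",  # out of the graph's depth
}

def next_node(outcome: Outcome) -> str:
    return ROUTES[outcome]  # total over the enum: no unhandled case
```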
Tool calls as side effects, not data (safety architecture)
Deployer-only · everywhere else read-only. Side-effecting tool calls (sending email, writing to a CMS, charging a card) belong in the deployer node and only the deployer node. Researchers, drafters, auditors, and reviewers should be read-only. This rule single-handedly prevents the most common class of production incident.
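One way to enforce the rule structurally rather than by convention: build each agent's tool registry from its role, so only the deployer can ever reach a side-effecting tool. Tool names and stubs are illustrative.

```python
from enum import Enum
from typing import Callable

class Role(str, Enum):
    RESEARCHER = "researcher"
    DRAFTER = "drafter"
    AUDITOR = "auditor"
    REVIEWER = "reviewer"
    DEPLOYER = "deployer"

# Stub tools; real ones would hit search APIs, the CMS, billing, etc.
def search_web(q: str) -> list[str]: return []
def read_doc(url: str) -> str: return ""
def write_cms(doc: str) -> str: return "receipt-001"

READ_ONLY: dict[str, Callable] = {"search_web": search_web, "read_doc": read_doc}
SIDE_EFFECTING: dict[str, Callable] = {"write_cms": write_cms}

def tools_for(role: Role) -> dict[str, Callable]:
    # Read-only everywhere by construction; side effects exist only
    # in the deployer's registry, so a drafter cannot even call them.
    if role is Role.DEPLOYER:
        return {**READ_ONLY, **SIDE_EFFECTING}
    return dict(READ_ONLY)
```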
05 — HITL gates: Where to place gates.
Gate placement is the single biggest determinant of whether human-in-the-loop review works as quality control or becomes a bottleneck. Two rules.
Gate after the auditor, not after the drafter
Reviewing a raw draft is exhausting (the reviewer has to identify both what to fix and how to fix it). Reviewing an audited draft with surfaced issues is fast (the reviewer makes accept/reject calls on flagged items). The same human reviewer is 4-6× more productive on audited input than raw input.
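In LangGraph terms, one way to express the gate is an interrupt placed before the node that consumes the auditor's findings; a compact sketch with stub nodes.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

class S(TypedDict, total=False):
    draft: str
    findings: list[str]

b = StateGraph(S)
b.add_node("draft", lambda s: {"draft": "v1"})
b.add_node("audit", lambda s: {"findings": ["claim 2 unsourced"]})
b.add_node("deploy", lambda s: {})
b.add_edge(START, "draft")
b.add_edge("draft", "audit")
b.add_edge("audit", "deploy")
b.add_edge("deploy", END)

# Gate AFTER the auditor: the run pauses once findings are surfaced,
# so the reviewer makes accept/reject calls instead of hunting issues.
# Interrupting after the drafter would hand over a raw, unaudited draft.
gated = b.compile(checkpointer=MemorySaver(), interrupt_before=["deploy"])

cfg = {"configurable": {"thread_id": "t1"}}
gated.invoke({}, config=cfg)                    # draft -> audit, then pause
print(gated.get_state(cfg).values["findings"])  # reviewer sees audited state
gated.invoke(None, config=cfg)                  # approved: resume into deploy
```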
Two gates, not one or three
Most agency workflows benefit from exactly two gates: one before deployment (reviewer approves auditor findings), one before escalation (reviewer triages escalator output). One gate misses production safety; three gates produce reviewer fatigue and stop the workflow.
06 — Framework: Framework pick per workflow.
The framework matrix below summarises the picks across the twelve workflows. Use it as a starting point; standardise on two frameworks across the agency book to keep depth high.
LangGraph — graph + durable (production default)
Python · LangSmith default. Workflows 1, 3, 4, 7, 10, 11, 12. Anything graph-heavy, anything long-running, anything that needs durable execution and the deepest observability. Right default for 6-7 of the 12 workflows.
CrewAI — role-based (fast scaffold)
Python · fastest scaffold. Workflows 2, 5, 7, 9. Role-based delegation maps cleanly when the workflow reads as 'a crew of specialists collaborating'. Fastest from scratch; lighter on durable execution.
Mastra — TypeScript
TS · Vercel-native. Workflows 3, 5, 6, 8, 9, 11. Right default for any workflow that lives in a Next.js / Vercel-native deployment. TS type-safety on tool inputs is invaluable for high-volume structured workflows (lead enrichment, lifecycle email, support triage).
Two frameworks, not four (standard stack)
Agency-wide pick. Most agencies converge on LangGraph + Mastra or LangGraph + CrewAI as their two-framework standard. Picking 3+ frameworks spreads the team's depth too thin; picking 1 forces some workflows into the wrong shape. Two is the sweet spot.
07 — Rollout: Rolling out the playbook.
Foundation — pick two workflows, pick two frameworks
Don't try to roll out all 12 at once. Pick two workflows that are already painful (research-and-brief and content drafting are typical first picks). Pick two frameworks. Build both workflows on the chosen frameworks. The first workflow is the real cost; the second is mostly framework-template reuse.
Scale phase — add four more workflows
Once two workflows are in production, the next four come fast — most of the cost is the role taxonomy, the handoff schemas, and the deployment pipeline, all of which are now reusable. Six workflows in production at day 90 is a typical milestone.
Maturity — reach 10-12 workflows + retro
By day 120 most agencies have 10-12 workflows in production. The phase-3 retro should focus on which workflows underperformed expectations (usually because the role mix was wrong, not because the framework was wrong) and which workflows surprised on the upside.
Sustain — quarterly playbook review
Each quarter, retro the playbook: which roles need expansion, which workflows have been deprecated, which frameworks have shifted competitively. The playbook is a living document; without quarterly review it drifts within 6 months.
08 — Conclusion: Twelve workflows, seven roles.
Multi-agent graphs replace single-agent prompts the moment a workflow has more than three phases. The playbook is what makes that transition operable.
The interesting agency work in 2026 is not built on cleverer prompts. It is built on graphs of specialised agents passing structured outputs to one another, with HITL gates placed where they catch mistakes, on a framework chosen for the workflow shape rather than the brand name.
Adopt the seven-role taxonomy. Map your workflows to it. Use the handoff rules — structured output, idempotent retries, explicit outcomes, side-effects only at the deployer. Place HITL gates after the auditor, not after the drafter. Standardise on two frameworks; pick per workflow shape.
The playbook is the artifact that keeps multi-agent work shippable instead of brittle. The cost is conceptual overhead; the payoff is reliability under load. By day 120 of a rollout, most agencies have 10-12 workflows in production and have stopped writing single-agent prompts for anything non-trivial.