AI Development · Playbook · 3 min read · Published Apr 27, 2026

12 workflows · 7-role agent taxonomy · LangGraph · CrewAI · Mastra picks

Multi-Agent Orchestration Playbook

Single-agent prompts hit a ceiling. The interesting agency work in 2026 is multi-agent: a graph of specialised agents handing work to one another, with structured outputs and human gates in the right places. This playbook maps twelve agency workflows to that pattern and ships the handoff protocols and the framework pick for each.

Digital Applied Team · Senior strategists
Published Apr 27, 2026 · Read time 3 min
Sources: LangGraph · CrewAI · Mastra · Anthropic essays · DA fieldwork
Workflows: 12 agency workflows mapped to multi-agent graphs
Roles: 7 (researcher, drafter, auditor, reviewer, deployer…)
Frameworks: 3 (LangGraph · CrewAI · Mastra)
HITL gates: 2-3 per workflow on average (field default)

By 2026, the agencies that have moved past prompt-engineering have moved into orchestration. The interesting work — research-and-brief, full content drafts, technical audits with actionable findings — is no longer a single agent with a clever prompt. It is a directed graph of specialised agents, each with one job, handing structured outputs to the next agent in the graph, with human review gates placed where they actually catch mistakes.

This playbook is the reference we use across our agency book. It specifies the seven-role agent taxonomy, maps twelve typical agency workflows onto multi-agent graphs, defines the handoff protocols between roles, and recommends the orchestration framework (LangGraph, CrewAI, or Mastra) per workflow shape.

It is not aspirational. Every workflow in the playbook ships in production for at least one agency client at the time of writing.

Key takeaways
  1. Multi-agent graphs beat single-agent prompts when the workflow has more than three distinct phases. One agent doing everything degrades quality on each phase; decomposing into specialised agents with one job each lifts quality consistently. The break-even is around three phases — below that, the orchestration overhead costs more than it adds.
  2. The seven-role agent taxonomy keeps the graph readable. Researcher, drafter, auditor, reviewer, deployer, router, escalator: every agent in every workflow falls into one of these. The shared taxonomy is what makes the playbook shareable across pods and keeps engineering reviews tractable.
  3. Handoffs need structured-output schemas, not prose blobs. Agent A's output becomes Agent B's input. If A outputs prose, B has to parse; parsing is unreliable; the graph becomes brittle. Structured outputs (JSON schemas with required fields) are the boring engineering choice that makes the whole pattern work.
  4. Human-in-the-loop gates go after the auditor, not after the drafter. Reviewers add value when there is structured feedback to give: reviewing a raw draft is exhausting, reviewing an audited draft with surfaced issues is fast. Gate placement is the lever that determines whether HITL becomes a bottleneck.
  5. Pick the framework per workflow shape: LangGraph for graph-heavy, CrewAI for role-heavy, Mastra for TS stacks. There is no single 'best' framework. LangGraph wins on graph-structured durable workflows; CrewAI wins on speed-to-scaffold for role-based work; Mastra wins on TypeScript stacks. Most agencies standardise on two — primary plus secondary — and pick per project.

01 · Premise · Why graphs, not chains.

Linear chains (prompt → output → prompt → output) are the natural first move. They scale until the workflow has any one of: a branch, a retry, a long-running step, a step that needs human input, or a step where the output of two earlier steps must merge. Most agency workflows have all five.

Graphs handle all five natively. Nodes are agents; edges are conditional routing decisions; state is persistent and checkpointed. The graph model carries more conceptual overhead than the chain model, but that overhead pays back the moment the workflow has to handle a real-world failure mode.
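The premise in code: a minimal sketch of a research-and-brief graph, assuming LangGraph's StateGraph API. The node bodies are hypothetical placeholders; the shape is the point. Nodes are agents, edges are routing decisions, and the checkpointer makes state resumable.

```python
# Minimal sketch, assuming LangGraph's StateGraph API.
# Node bodies are placeholders; real agents would call a model.
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver


class BriefState(TypedDict):
    sources: list[str]   # researcher output
    draft: str           # drafter output
    outcome: str         # auditor verdict: "pass" | "revise"


def researcher(state: BriefState) -> dict:
    return {"sources": ["https://example.com/primary"]}  # placeholder


def drafter(state: BriefState) -> dict:
    return {"draft": f"brief from {len(state['sources'])} sources"}


def auditor(state: BriefState) -> dict:
    return {"outcome": "pass"}  # placeholder rubric check


graph = StateGraph(BriefState)
graph.add_node("researcher", researcher)
graph.add_node("drafter", drafter)
graph.add_node("auditor", auditor)
graph.set_entry_point("researcher")
graph.add_edge("researcher", "drafter")
graph.add_edge("drafter", "auditor")

# The edge a linear chain cannot express: loop back on "revise".
graph.add_conditional_edges(
    "auditor",
    lambda s: s["outcome"],
    {"pass": END, "revise": "drafter"},
)

# Checkpointing makes every transition resumable after a failure.
app = graph.compile(checkpointer=MemorySaver())
app.invoke(
    {"sources": [], "draft": "", "outcome": ""},
    config={"configurable": {"thread_id": "brief-001"}},
)
```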

"We rebuilt our research-and-brief workflow from a 6-step chain into a 4-node graph. Same agents. Half the prompts. The reliability under load was the difference between flaky and shippable."— Lead engineer, agency platform team, March 2026

02 · Taxonomy · The 7-role agent taxonomy.

Every agent in every workflow falls into one of seven roles. The taxonomy is what makes the playbook portable: a researcher in the content workflow looks like a researcher in the support workflow looks like a researcher in the lead-enrichment workflow. Engineering reviews focus on whether the role is implemented correctly, not whether the role is well-defined.

Role 1 · RES · Researcher (source-of-truth)

Gathers raw inputs from external or internal sources. Output is always structured (citations, JSON facts, source URLs). Prompt skill: searching, reading, distinguishing primary from secondary sources.

Role 2 · DRA · Drafter (artifact producer)

Composes prose, code, or structured outputs from researcher inputs. Output is the artifact under review. Prompt skill: voice, structure, claim-fluency.

Role 3 · AUD · Auditor (quality gate)

Scores the drafter's output against a rubric, surfaces issues, suggests revisions. Output is a structured findings list. Prompt skill: rubric application, tight scoring, false-positive avoidance.

Role 4 · REV · Reviewer, human or model (decision authority)

Approves, rejects, or annotates the auditor's findings. Often human-in-the-loop in regulated workflows; model-based for high-volume, low-stakes flows. Output is a publish/hold/redraft decision.

Role 5 · DEP · Deployer (side-effects)

Pushes the reviewed artifact to its destination — CMS, email tool, CRM, file store, downstream agent. Output is a deployment receipt or error.

Role 6 · ROU · Router (branching)

Decides which downstream branch the workflow takes. Common in triage/support workflows. Output is a routing decision (always one of N enum values).

Role 7 · ESC · Escalator (safety net)

Surfaces edge cases that the workflow shouldn't try to handle automatically. Output is an escalation ticket with structured context. The escape hatch that keeps multi-agent graphs from making bad calls under uncertainty.
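One way to make the taxonomy concrete is a shared enum that every workflow imports. A sketch in Python; the names are ours for illustration, not a library API:

```python
from enum import Enum


class AgentRole(str, Enum):
    RESEARCHER = "researcher"  # source-of-truth: structured facts + citations
    DRAFTER = "drafter"        # artifact producer: the thing under review
    AUDITOR = "auditor"        # quality gate: rubric findings list
    REVIEWER = "reviewer"      # decision authority: publish/hold/redraft
    DEPLOYER = "deployer"      # side-effects: the only role that writes
    ROUTER = "router"          # branching: one of N enum values
    ESCALATOR = "escalator"    # safety net: structured escalation ticket
```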

03 · Workflows · Twelve agency workflows mapped.

The twelve workflows below are the ones that recur across the agency book. Each entry shows the workflow, the agent roles involved, and the framework pick. Use it as a starting point; adapt the role mix to the specific engagement.

Workflow 1 · Research-and-brief · LangGraph

Researcher (multi-source) → Drafter → Auditor → Human Reviewer → Deployer (CMS). Long-running, branchy, retries on flaky sources. Durable execution required.

Workflow 2 · Content draft + revision · CrewAI / LangGraph

Researcher (light, internal) → Drafter → Auditor (rubric) → Drafter (revision) → Human Reviewer → Deployer. Loop on auditor findings until rubric ≥ 11 (sketched after this list). CrewAI for prototypes, LangGraph for production.

Workflow 3 · Technical SEO audit · LangGraph / Mastra

Researcher (crawler) → Auditor (checklist) → Drafter (findings narrative) → Human Reviewer → Deployer (PDF + CMS). Output is a structured audit report.

Workflow 4 · GEO scoring (rubric) · LangGraph

Researcher (multi-engine sample) → Auditor (rubric per page) → Drafter (priority list) → Deployer (dashboard). High-volume, periodic. LangGraph for state persistence.

Workflow 5 · Competitive intel · CrewAI / Mastra

Researcher (competitor watch) → Auditor (signal/noise filter) → Drafter (digest) → Human Reviewer (weekly) → Deployer (Slack + email). Periodic.

Workflow 6 · Lead enrichment · Mastra

Researcher (firmographic + technographic) → Auditor (data quality) → Router (tier assignment) → Deployer (CRM). High-volume, structured. Mastra for TS and low cost.

Workflow 7 · Paid-ad creative generation · LangGraph / CrewAI

Researcher (audience + brand voice) → Drafter (variants) → Auditor (brand-safety + voice) → Human Reviewer → Deployer (ad platforms). Heavy multimodal use.

Workflow 8 · Lifecycle email composition · Mastra

Researcher (segment + behaviour) → Drafter → Auditor (compliance + voice) → Reviewer (model or human) → Deployer (ESP). Mass-personalisation. Mastra for TS-native Vercel deploys.

Workflow 9 · Support triage · Mastra / CrewAI

Router (intent classification) → Researcher (knowledge base) → Drafter (response) → Reviewer (model or human, by severity) → Deployer (helpdesk). High-volume, low-latency.

Workflow 10 · Reporting digest · LangGraph

Researcher (multi-source data pull) → Drafter (narrative) → Auditor (numbers vs source) → Deployer (PDF + Notion). Periodic.

Workflow 11 · Social listening · LangGraph / Mastra

Researcher (stream listener) → Auditor (relevance filter) → Drafter (insight summary) → Router (escalate or queue) → Deployer (CRM + Slack). Continuous.

Workflow 12 · RFP response · LangGraph

Researcher (past RFPs + current ask) → Drafter (sectional) → Auditor (compliance + voice) → Human Reviewer → Deployer (PDF + portal). Long-running, high-stakes.
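The revision loop in Workflow 2 is worth sketching, since it is the pattern teams most often get wrong. Continuing the LangGraph sketch from the premise section, with hypothetical node names: the auditor routes back to the drafter until the rubric clears 11, with a revision cap so the loop cannot run away.

```python
# Sketch of Workflow 2's revision loop; node names are illustrative
# and assume "human_reviewer" and "escalator" nodes exist in the graph.
RUBRIC_THRESHOLD = 11
MAX_REVISIONS = 3


def route_after_audit(state: dict) -> str:
    if state["rubric_score"] >= RUBRIC_THRESHOLD:
        return "review"    # clean enough for the human gate
    if state["revisions"] >= MAX_REVISIONS:
        return "escalate"  # stop looping; surface the edge case
    return "revise"        # send findings back to the drafter


graph.add_conditional_edges(
    "auditor",
    route_after_audit,
    {
        "review": "human_reviewer",
        "revise": "drafter",
        "escalate": "escalator",
    },
)
```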

04 · Handoffs · Handoff protocols.

A multi-agent graph is only as reliable as the handoffs between agents. Three rules consistently separate fragile graphs from reliable ones.

Rule 1 · Structured output, not prose (JSON Schema · validated at edge)

Every agent's output that becomes another agent's input is a JSON object with a defined schema. Validate at the edge; reject and retry on schema violation. Prose handoffs feel natural for ~3 weeks, until the first parsing failure breaks the workflow.

Hardest-won lesson.
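A sketch of what "validated at the edge" can look like, assuming Pydantic; the schema and field names are illustrative, not a prescribed format:

```python
# Illustrative handoff schema: the researcher's output is the drafter's
# input, and required fields fail loudly at the edge instead of silently.
from pydantic import BaseModel, Field, ValidationError


class ResearchHandoff(BaseModel):
    facts: list[str] = Field(min_length=1)        # at least one finding
    source_urls: list[str] = Field(min_length=1)  # every fact needs a source
    confidence: float = Field(ge=0.0, le=1.0)


def accept_handoff(raw: str) -> ResearchHandoff | None:
    """Validate at the edge; None signals reject-and-retry upstream."""
    try:
        return ResearchHandoff.model_validate_json(raw)
    except ValidationError:
        return None
```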
Rule 2 · Idempotent retries by default (deterministic IDs · checkpointed state)

Every node should be safe to retry. Use deterministic task IDs so duplicate runs are detected; checkpoint state at each transition so retries resume from the last success. Idempotency is what lets the graph survive transient failures without manual intervention.

Reliability default.
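A sketch of deterministic task IDs, using nothing beyond the Python standard library; the in-memory dict stands in for whatever durable checkpoint store the orchestrator provides:

```python
import hashlib
import json

_checkpoints: dict[str, dict] = {}  # stand-in for a durable checkpoint store


def task_id(node: str, payload: dict) -> str:
    # Same node + same input => same ID, so duplicate runs are detectable.
    blob = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(f"{node}:{blob}".encode()).hexdigest()


def run_idempotent(node: str, payload: dict, fn) -> dict:
    tid = task_id(node, payload)
    if tid in _checkpoints:
        return _checkpoints[tid]  # retry resumes from the last success
    result = fn(payload)          # execute exactly once per ID
    _checkpoints[tid] = result    # checkpoint at the transition
    return result
```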
Rule 3 · Explicit failure modes (outcome enum · always one of N)

Each node returns one of a small set of outcomes (success, partial, retry, escalate). Downstream routing is the same enum every time. No 'unknown' outcomes — they collapse automated routing into manual hand-holding.

Routing clarity.
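The outcome enum is small enough to show in full. A sketch; the target-node names in the routing map are illustrative:

```python
from enum import Enum


class Outcome(str, Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    RETRY = "retry"
    ESCALATE = "escalate"


# Downstream routing is the same mapping every time. There is no
# "unknown" branch, so the graph can never receive one.
ROUTES: dict[Outcome, str] = {
    Outcome.SUCCESS: "next_node",
    Outcome.PARTIAL: "auditor",
    Outcome.RETRY: "retry_queue",
    Outcome.ESCALATE: "escalator",
}
```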
Rule 4 · Tool calls as side-effects, not data (deployer-only · everywhere else read-only)

Side-effecting tool calls (sending email, writing to CMS, charging a card) belong in the deployer node and only the deployer node. Researchers, drafters, auditors, and reviewers should be read-only. This rule single-handedly prevents the most common class of production incident.

Safety architecture.
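One way to enforce the rule mechanically, sketched with the AgentRole enum from the taxonomy section; the tool names are illustrative:

```python
# Illustrative capability gate: write tools reach the deployer only.
WRITE_TOOLS = {"send_email", "write_cms", "charge_card"}


def tools_for(role: AgentRole, requested: set[str]) -> set[str]:
    if role is AgentRole.DEPLOYER:
        return requested              # deployer may side-effect
    return requested - WRITE_TOOLS    # everyone else stays read-only
```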

05 · HITL gates · Where to place gates.

Gate placement is the single biggest determinant of whether a human-in-the-loop workflow becomes quality control or a bottleneck. Two rules.

Rule 1 · Gate after the auditor, not after the drafter

Reviewing a raw draft is exhausting: the reviewer has to identify both what to fix and how to fix it. Reviewing an audited draft with surfaced issues is fast: the reviewer makes accept/reject calls on flagged items. The same human reviewer is 4-6× more productive on audited input than on raw input.

Rule 2 · Two gates, not one or three

Most agency workflows benefit from exactly two gates: one before deployment (reviewer approves auditor findings), one before escalation (reviewer triages escalator output). One gate misses production safety; three gates produce reviewer fatigue and stop the workflow.
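In LangGraph, gate placement is literally a compile-time argument. A sketch continuing the premise-section graph, assuming a human_reviewer node has been added downstream of the auditor:

```python
# Pause before the reviewer node, after the auditor has run, so the
# human sees an audited draft with surfaced findings, not a raw draft.
app = graph.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["human_reviewer"],  # the gate, placed after the auditor
)

config = {"configurable": {"thread_id": "brief-001"}}
initial_state = {"sources": [], "draft": "", "outcome": ""}

app.invoke(initial_state, config=config)  # runs up to the gate, then pauses
# ...the human reviews the audited draft out-of-band, then the run
# resumes from the checkpoint:
app.invoke(None, config=config)
```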

06 · Frameworks · Framework pick per workflow.

The framework matrix below summarises the picks across the twelve workflows. Use it as a starting point; standardise on two frameworks across the agency book to keep depth high.

Framework 1 · LangGraph — graph + durable (Python · LangSmith default)

Workflows 1, 3, 4, 7, 10, 11, 12. Anything graph-heavy, anything long-running, anything that needs durable execution and the deepest observability. The right default for 6-7 of the 12 workflows.

Production default.

Framework 2 · CrewAI — role-based (Python · fastest scaffold)

Workflows 2, 5, 7, 9. Role-based delegation maps cleanly when the workflow reads as 'a crew of specialists collaborating'. Fastest from scratch; lighter on durable execution.

Fast scaffold.

Framework 3 · Mastra — TypeScript (TS · Vercel-native)

Workflows 3, 5, 6, 8, 9, 11. The right default for any workflow that lives in a Next.js / Vercel-native deployment. TS type-safety on tool inputs is invaluable for high-volume structured workflows (lead enrichment, lifecycle email, support triage).

TS-native.

Standardise · Two frameworks, not four (agency-wide pick)

Most agencies converge on LangGraph + Mastra or LangGraph + CrewAI as their two-framework standard. Picking 3+ frameworks spreads the team's depth too thin; picking one forces some workflows into the wrong shape. Two is the sweet spot.

Standard stack.

07 · Rollout · Rolling out the playbook.

Phase 1 · 30 days · Pick two workflows, pick two frameworks

Don't try to roll out all 12 at once. Pick two workflows that are already painful (research-and-brief and content drafting are typical first picks). Pick two frameworks. Build both workflows on the chosen frameworks. The first workflow is the real cost; the second is mostly framework-template reuse.

Foundation.

Phase 2 · 60 days · Add four more workflows

Once two workflows are in production, the next four come fast — most of the cost is the role taxonomy, the handoff schemas, and the deployment pipeline, all of which are now reusable. Six workflows in production at day 90 is a typical milestone.

Scale phase.

Phase 3 · 120 days · Reach 10-12 workflows, then retro

By day 120 most agencies have 10-12 workflows in production. The phase-3 retro should focus on which workflows underperformed expectations (usually because the role mix was wrong, not because the framework was wrong) and which workflows surprised on the upside.

Maturity.

Ongoing · Quarterly · Playbook review

Each quarter, retro the playbook: which roles need expansion, which workflows have been deprecated, which frameworks have shifted competitively. The playbook is a living document; without quarterly review it drifts within six months.

Sustain.

08 · Conclusion · Twelve workflows, seven roles.

Multi-agent orchestration playbook, April 2026

Multi-agent graphs replace single-agent prompts the moment a workflow has more than three phases. The playbook is what makes that transition operable.

The interesting agency work in 2026 is not built on cleverer prompts. It is built on graphs of specialised agents handing structured outputs between each other, with HITL gates placed where they catch mistakes, on a framework chosen for the workflow shape rather than the brand.

Adopt the seven-role taxonomy. Map your workflows to it. Use the handoff rules — structured output, idempotent retries, explicit outcomes, side-effects only at the deployer. Place HITL gates after the auditor, not after the drafter. Standardise on two frameworks; pick per workflow shape.

The playbook is the artifact that keeps multi-agent work shippable instead of brittle. The cost is conceptual overhead; the payoff is reliability under load. By day 120 of a rollout, most agencies have 10-12 workflows in production and have stopped writing single-agent prompts for anything non-trivial.

Multi-agent design

Stop chaining prompts. Run a graph.

We design and ship multi-agent agency workflows end-to-end — role taxonomy, handoff schemas, HITL gate placement, framework selection (LangGraph, CrewAI, Mastra), and observability. Most engagements ship the first two production workflows within 30 days.

Free consultation · Expert guidance · Tailored solutions
What we work on

Multi-agent engagements

  • Role taxonomy + handoff schema design
  • Framework selection per workflow shape
  • HITL gate placement for quality control
  • Reference workflows: research, drafting, audit, triage
  • Production observability + variance detection
FAQ · Multi-agent orchestration

The questions we get every week.

When is a multi-agent graph the wrong choice?

When the workflow has fewer than three distinct phases, the orchestration overhead costs more than it adds. Simple workflows — one agent doing one thing with one tool call — should stay simple. The break-even is around three phases (researcher → drafter → reviewer is the smallest workflow we recommend graph-structuring); below that, a single well-prompted agent with structured output and a downstream reviewer is the cheaper, more maintainable choice.