AI Development · Playbook

Ninety days from workflow inventory to three production agents — scoring, prototype, deploy, observability, governance.

Agentic Workflow Automation: 30/60/90-Day Plan 2026

Workflow automation compounds — every shipped agent makes the next one cheaper to build. The trap is picking too many at once. This 90-day plan walks the inventory, the scoring rubric, the first three production deploys, the observability layer, and the governance hand-off that prevents sprawl.

Digital Applied Team · Agentic engineering
Published May 15, 2026 · Read time: 13 min · Sources: client engagements
Plan horizon: 90 days (inventory → governance)
Target workflows: 3 (ship in 90 days)
Scoring axes: 3 (volume · cadence · value)
Recommended cadence: weekly (review and adjust)

Agentic workflow automation compounds — every workflow you ship makes the next one cheaper to build, because the scaffolding, observability, governance, and operating muscle are already in place. The mistake teams make in their first 90 days is picking too many candidates at once and ending up with five half-shipped prototypes instead of three production agents.

The 30/60/90-day plan below is the shape we've seen work across a dozen engagements. Days 1-30 are inventory and scoring, ending with the top three candidates in active prototype. Days 31-60 are the first production deploy plus the observability and cost-telemetry layers that make the next two cheaper. Days 61-90 are the second and third deploys, the governance hand-off to operations, and the scale-out plan for quarter two.

This guide is a playbook, not a sales pitch. It includes the scoring rubric we use, the inventory and prototype-brief templates, the four failure modes we see most often, and answers to the questions ops leads ask before approving an automation program. Skip to the templates if you want to start a working document today.

Key takeaways
  1. Workflow automation compounds. Every shipped agent makes the next one cheaper to build because the scaffolding — orchestration, traces, cost telemetry, governance — is already in place. The first workflow is the expensive one.
  2. Pick three workflows to start. Three is the right cardinality for a 90-day plan. Fewer doesn't prove the pattern; more spreads engineering thin and leaves you with half-shipped prototypes instead of production agents.
  3. A scoring rubric prevents shiny-object syndrome. Volume times change cadence times value, scored 1-5 on each axis. The top three on the rubric beat the three the loudest stakeholder names every time — and protect against the appeal of the demo-friendly long tail.
  4. Observability before scale. Per-workflow traces and per-call cost telemetry land in Sprint 4, before the second workflow ships. Without them, scale-out is a leap of faith and the first cost surprise lands in finance instead of the dashboard.
  5. Governance hand-off prevents sprawl. By day 90, operations owns the runbook, the alerting, and the change-review process. Engineering owns the platform. Without that split, every new workflow re-opens the engineering backlog.

01 · Why 90 Days: Workflow automation compounds — pick three to start.

Ninety days is the right horizon for two reasons. It's long enough to take a workflow from inventory entry to production with observability and a governance hand-off — anything shorter short-changes the operating muscle that makes the second workflow cheap. And it's short enough that the program has a forcing function: the calendar drives prioritization, not the loudest stakeholder.

Three workflows is the right cardinality. One workflow doesn't prove the pattern — a single deploy is a project, not a program. Five workflows spreads engineering across too many concurrent prototypes and you end up shipping none. Three lets you sequence the deploys (one in month two, two in month three), reuse the scaffolding aggressively, and still have time to absorb the inevitable surprises in production.

The compounding effect is the part most teams underestimate. Workflow one costs the most: you're standing up orchestration, tracing, cost telemetry, the governance runbook, and the deploy pipeline alongside the actual automation. Workflow two reuses every piece of that scaffolding and ships at roughly 40-50% of workflow one's effort. Workflow three drops to 25-30%. By quarter two, the marginal cost of workflow four through ten is mostly requirements-gathering and prompt engineering.

The compounding bet
The reason to invest in a 90-day plan rather than a one-off automation is scaffolding leverage. The first workflow pays for the platform; every workflow after that pays back in weeks instead of months. Teams that treat each automation as a standalone project never get the compounding curve.

The cardinal rule of the program is to keep the candidate pipeline longer than the active build pipeline. You'll inventory 30-50 candidates in week one, score them in week two, pick three for the quarter — and put the rest in a backlog you revisit each quarter with fresh scores. Workflows shift in value as the business shifts; the backlog earns its keep by surfacing the next-best three for quarter two.

"The first workflow pays for the platform; every workflow after that pays back in weeks instead of months. Teams that treat each automation as a standalone project never get the compounding curve."— Digital Applied agentic engineering, Q1 2026 retrospectives

02 · Days 1-30: Workflow inventory, candidate scoring, top-3 prototype kick-off.

The first thirty days is mostly research and de-risking — not building. The output of month one is a scored inventory, three chosen candidates, and three running prototypes that prove the automation is feasible before you commit production engineering time. Resist the temptation to start with the most exciting candidate; start with the inventory.

The five milestones below are the rhythm we run for month one. The order matters — scoring before prototyping protects against the sunk-cost problem where a half-built prototype keeps a low-value workflow alive past its decision point.

Week 1
Workflow inventory
30-50 candidates · shadow ops · interviews

Shadow operations for two days, interview team leads, and capture every recurring workflow that touches an LLM-friendly task — classification, summarization, drafting, routing, extraction. Output: a single document with one line per candidate.

Foundation
Week 2
Candidate scoring
Volume × cadence × value · 1-5 per axis

Apply the three-axis rubric to every candidate. Volume is throughput; cadence is how often the workflow's rules change; value is the time-or-revenue impact of automation. The top three on the rubric become the quarter's focus.

Decision gate
Week 3
Prototype briefs
Inputs · outputs · success criteria · failure modes

Write a one-page brief per chosen workflow. Inputs, expected outputs, success criteria, known failure modes, the human checkpoint design, and the rollback story. Briefs become the contract between engineering and operations.

Three docs
Week 3-4
Three prototypes
Notebook-grade · single-tenant · happy path

Build three throwaway prototypes — one engineer-week each. Prove the model can do the work, capture the unexpected edge cases, validate the prompts. These prototypes will be rewritten in month two; their value is de-risking.

De-risk
End of week 4
Go / no-go
Each prototype reviewed against its brief

Each prototype gets a go/no-go decision against its brief. Prototypes that hit success criteria proceed to production build in month two. Prototypes that don't get dropped — and the next-highest-ranked candidate from the backlog takes the slot.

Decision

The biggest mistake in month one is skipping the inventory in favor of a candidate someone is already excited about. Excitement is a poor predictor of ROI. Workflows with high volume and stable rules almost always beat workflows with low volume and shiny visibility — but only the inventory surfaces the boring high-volume ones in the first place. Run the rubric; trust the rubric.

The prototypes in weeks three and four are throwaway code. Resist the urge to make them production-grade — that's month two's job. A prototype that takes two engineer-weeks instead of one delays month two by a week and risks the whole plan. The prototype proves the model can do the work; the production build proves the system can survive the work happening in front of customers.
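
To make the end-of-week-four go/no-go gate concrete, here is a minimal, notebook-grade harness in Python for scoring a prototype against the brief's quantitative success criteria. It is a sketch under stated assumptions: call_prototype, the JSONL sample file, and the thresholds are placeholders for whatever your prototype and brief actually specify. The point is that the decision is made against recorded examples and explicit thresholds, not a live demo.

# Sketch: notebook-grade go/no-go harness. call_prototype and the JSONL sample file
# are stand-ins for your own prototype and captured examples.
import json

ACCURACY_THRESHOLD = 0.90      # from the brief's quantitative success criteria
COST_CAP_PER_RUN = 0.05        # dollars per run, also from the brief

def call_prototype(sample_input: dict) -> tuple:
    """Run the prototype on one input; return (output, cost in dollars)."""
    raise NotImplementedError("wire the throwaway prototype in here")

def evaluate(samples_path: str) -> dict:
    with open(samples_path) as f:
        samples = [json.loads(line) for line in f]          # one captured example per line
    correct, total_cost = 0, 0.0
    for sample in samples:
        output, cost = call_prototype(sample["input"])
        total_cost += cost
        if output.strip() == sample["expected"].strip():    # or a workflow-specific check
            correct += 1
    accuracy = correct / len(samples)
    avg_cost = total_cost / len(samples)
    return {"accuracy": accuracy, "avg_cost_per_run": avg_cost,
            "go": accuracy >= ACCURACY_THRESHOLD and avg_cost <= COST_CAP_PER_RUN}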

03 · Days 31-60: First workflow deploy, observability layer, cost telemetry.

Month two is where the platform investment lives. You're shipping the first production workflow alongside the scaffolding — orchestration, tracing, cost telemetry — that the second and third workflows will reuse for free. The deploy and the platform compete for engineering time; the discipline is to insist on both rather than ship the workflow first and the platform never.

The five milestones below sequence the platform build alongside the workflow build. Sprint 4 — observability and cost telemetry — is the load-bearing one. Teams that skip it pay for the next year in debugging time and finance surprises.

Sprint 3
Production rewrite
Workflow #1 · resilience baseline

Rewrite the chosen prototype with production scaffolding: per-stage timeouts, idempotent retries, compensating actions for irreversible steps, surgical human checkpoints. Aim for at least the resilience checklist's defensive-scaffolding tier.

Two weeks
Sprint 4
Observability + cost telemetry
Traces · per-workflow $/run · alerts

Wire trace coverage on every tool call and LLM invocation. Capture per-workflow cost (token spend plus tool-cost markers) aggregable per tenant. Alerts on retry rate, timeout rate, and cost per run. This is the platform layer that scales.

Platform
Sprint 4-5
Workflow #1 deploy
Shadow mode · canary · cutover

Run workflow #1 in shadow mode for a week (it runs alongside the existing manual process, outputs compared but not shipped). Canary 10% of real traffic for a week. Full cutover at end of sprint if metrics hold.

Ship
Sprint 5
Runbook + on-call
Incident playbook · escalation paths

Write the runbook: what to check first when the workflow misbehaves, who to escalate to, how to force-fail or force-approve a stuck instance. Operations co-authors. The runbook becomes the template for workflows two and three.

Operating
End of week 8
Workflow #2 kick-off
Production build starts · scaffolding reused

Workflow #2 begins its production rewrite using the scaffolding from workflow #1 — traces, cost telemetry, resilience patterns, runbook template. Should ship in ~40-50% of the engineering time of workflow #1.

Compounding
The observability decision
The temptation in month two is to defer observability to make room for the workflow deploy. Don't. The first production weekend without traces costs more in debugging time than building the trace layer would have cost in engineer-days. And the first cost surprise without per-workflow telemetry lands in finance, not on your dashboard.
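
As a sketch of how small that layer can start, the Python below records per-call cost tagged by workflow and tenant and flags runs that blow past a per-run budget. The prices, thresholds, and helper names are illustrative assumptions, not a specific vendor's API; in production the print calls would be trace spans and metrics.

# Sketch: per-workflow cost telemetry, vendor-agnostic. The caller passes token counts
# taken from whatever usage object its LLM client returns; prices are illustrative.
from collections import defaultdict
from dataclasses import dataclass, field

PRICE_PER_1K_INPUT = 0.003      # dollars; substitute your model's real rates
PRICE_PER_1K_OUTPUT = 0.015
COST_ALERT_PER_RUN = 0.25       # dollars; tune per workflow

@dataclass
class CostLedger:
    # (workflow, tenant) -> accumulated dollars, so spend stays aggregable per tenant
    totals: dict = field(default_factory=lambda: defaultdict(float))

    def record_llm_call(self, workflow: str, tenant: str, run_id: str,
                        input_tokens: int, output_tokens: int) -> float:
        cost = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
             + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
        self.totals[(workflow, tenant)] += cost
        # In production this would emit a trace span keyed by run_id, not a print.
        print(f"trace run={run_id} workflow={workflow} tenant={tenant} cost=${cost:.4f}")
        return cost

    def check_run(self, workflow: str, run_cost: float) -> None:
        if run_cost > COST_ALERT_PER_RUN:
            print(f"ALERT {workflow}: run cost ${run_cost:.2f} over ${COST_ALERT_PER_RUN:.2f} budget")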

The shadow-mode and canary phases are non-negotiable. A workflow that demos cleanly on synthetic inputs will surface unexpected cases the first day it sees real traffic; shadow mode lets you catch those without customer impact, canary lets you limit blast radius while the bug-fix loop is still warm. The week of shadow plus the week of canary feels slow; it's the cheapest insurance you can buy.
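
One way to keep the shadow week honest is to gate the canary on a measured agreement rate rather than a gut feel. The sketch below assumes a simple exact-match comparison between the agent's output and the manual process's output; most workflows need a fuzzier, workflow-specific check, but the shape is the same.

# Sketch: shadow-mode agreement tracking. The agent's output is logged, never shipped;
# the manual process stays the system of record for the whole shadow week.
AGREEMENT_THRESHOLD = 0.95      # illustrative gate for promoting to a 10% canary

shadow_log: list[dict] = []

def record_shadow_run(item_id: str, agent_output: str, manual_output: str) -> None:
    shadow_log.append({
        "item_id": item_id,
        "match": agent_output.strip() == manual_output.strip(),   # swap in a fuzzier check
    })

def ready_for_canary() -> bool:
    if not shadow_log:
        return False
    agreement = sum(e["match"] for e in shadow_log) / len(shadow_log)
    return agreement >= AGREEMENT_THRESHOLD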

The operations co-authorship of the runbook is the load-bearing handoff signal. If operations isn't in the room writing the runbook, the workflow will get a runbook that engineering thinks is sufficient and operations finds useless at 2am. Co-authoring the runbook is how engineering and operations build the shared mental model that month three will rely on.

04 · Days 61-90: Second + third deploy, governance hand-off, scale-out plan.

Month three is where compounding kicks in. Workflow two ships at roughly half the cost of workflow one; workflow three at roughly a third. The platform stops being a cost center and starts being leverage. The other big shift in month three is the governance hand-off — operations takes ownership of the live workflows, and engineering moves to platform-and-next-quarter mode.

The five milestones below complete the program. By day 90 you should have three production workflows, an operating muscle for running them, and a scored backlog for quarter two.

Sprint 6
Workflow #2 ships
Shadow · canary · cutover · runbook

Workflow #2 follows the same shadow-canary-cutover discipline as workflow #1, with the scaffolding reused. Operations writes the runbook this time, engineering reviews. Total engineering effort: 40-50% of workflow #1.

Compounding
Sprint 6-7
Workflow #3 production
Build in parallel · ship sprint 7

Workflow #3 builds in parallel with #2's deploy and ships a sprint later. The shared scaffolding makes parallel work tractable for the first time. Total engineering effort: 25-30% of workflow #1.

Parallel
Sprint 7
Governance hand-off
Ops owns runbooks · change-review process

Operations formally owns the runbooks, the alerting, and the change-review process for live workflows. Engineering retains the platform — orchestration, traces, cost telemetry — but operations gates day-to-day changes.

Critical
Sprint 8
Workflow #3 deploys
Final shadow + canary · full cutover

Workflow #3 reaches full production. Three workflows are now live; the operating muscle is real. Take a beat to retro the program before kicking off Q2 candidates from the backlog.

Ship
End of week 12
Scale-out plan
Backlog re-scored · Q2 top-3 picked

Re-score the backlog with the lessons learned from the first three workflows. Pick the Q2 top-3 candidates. Document the platform improvements that came out of the program and feed them into the platform roadmap.

Q2 prep

The governance hand-off is the most underestimated milestone in the plan. Without it, every new workflow re-opens the engineering backlog, every alert routes to the engineering on-call rotation, and operations never builds the muscle to own the system. With it, operations runs the workflows day-to-day and engineering focuses on the platform and the next quarter's candidates — which is exactly the split that lets the program scale beyond three workflows.

The scale-out plan at the end of week 12 isn't a victory lap; it's the start of quarter two's 90-day cycle. The re-scored backlog and the lessons-learned doc are the inputs to quarter two's top-3 selection. Some teams find that quarter two's targets are the boring high-volume candidates the inventory surfaced in week one; some find that the business has shifted and new candidates take the lead. Either way, the cadence holds: inventory, score, pick three, ship.

For a deeper view on the resilience layer that the platform scaffolding has to deliver, the agentic workflow resilience audit covers the seventy specific checks that distinguish a resilient workflow from a happy-path script. Sprint 3's production rewrite should aim for at least the defensive-scaffolding tier on that checklist.
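
For a sense of what that tier looks like in code, here is a minimal sketch of the stage wrapper Sprint 3's rewrite is aiming for: a per-stage timeout, bounded idempotent retries, and compensating actions that unwind earlier irreversible steps when a stage fails for good. The helper is illustrative, not a library API; any orchestration framework that gives you these three guarantees is fine.

# Sketch: defensive scaffolding for one workflow stage. run_stage is a hypothetical
# helper. Retries assume the stage function is idempotent.
import concurrent.futures
import time
from typing import Callable, Optional

_pool = concurrent.futures.ThreadPoolExecutor()

def run_stage(fn: Callable[[], object], *, timeout_s: float = 30.0,
              max_attempts: int = 3,
              compensations: Optional[list] = None):
    """Run one stage with a timeout and bounded retries; compensate on final failure."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        future = _pool.submit(fn)
        try:
            return future.result(timeout=timeout_s)
        except Exception as exc:       # includes the TimeoutError raised by result()
            # Note: the timeout stops us waiting; it does not kill the worker thread.
            last_error = exc
            time.sleep(min(2 ** attempt, 30))    # backoff before the idempotent retry
    # Stage failed for good: unwind earlier irreversible steps in reverse order,
    # e.g. void a draft invoice or release a reserved slot.
    for undo in reversed(compensations or []):
        undo()
    raise RuntimeError(f"stage failed after {max_attempts} attempts") from last_error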

05 · Scoring Rubric: Volume × change cadence × value.

The rubric is three axes, each scored 1-5. Volume is throughput — how many times per week or month does the workflow run. Change cadence is the inverse of stability — how often do the rules of the workflow change. Value is the time-or-revenue impact of automating it. Multiply the three for a total out of 125; rank candidates by total.

The non-obvious axis is change cadence. Workflows whose rules change weekly are expensive to automate — every rule change is a prompt update and a test pass. Workflows whose rules are stable for months at a time are cheap to maintain after the initial build. High volume plus stable rules plus high value is the sweet spot; the top of the inventory.

High volume · stable rules · high value
The sweet spot

Score 4-5 on all three axes. Total 64-125. These are the workflows the rubric exists to surface — boring high-volume tasks where the rules are stable enough that the maintenance cost is low. Almost always wins the quarter.

Pick these first
High volume · stable rules · low value
Cheap but underwhelming

Score 4-5 on volume and cadence, 1-2 on value. Total 16-50. Tempting because they're cheap to ship; usually not worth the slot. Run them in quarter three or four once the program is mature and the marginal cost is near-zero.

Defer to Q3+
High volume · volatile rules · high value
Expensive but worthy

Score 4-5 on volume and value, 1-2 on cadence. Total 16-50 (the low cadence score drags the total down). These workflows are worth automating but the maintenance cost is real — budget for ongoing prompt and rule updates as part of the program.

Pick with eyes open
Low volume · any cadence · any value
Almost never the right pick

Score 1-2 on volume. Even with high value per run, the total tops out at 50 and usually lands far lower — well short of the sweet spot. The compounding bet doesn't pay off for low-volume workflows — the scaffolding cost is amortized too thinly. Manual processes with documentation are usually the right answer.

Skip

One refinement we've added on recent engagements: weight the cadence axis 1.5× when scoring for the first quarter. The rationale is that workflow one is paying for the platform, so anything that introduces ongoing rule volatility eats into the compounding gains for workflows two and three. Once the platform is established, the cadence weight goes back to 1× and volatile high-value workflows become viable picks for quarter two onward.
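
The whole rubric, including the Q1 weighting, fits in a few lines of Python. The breakpoints below mirror the scoring-rubric template in the templates section; the example candidate is illustrative.

# Sketch of the three-axis rubric with the optional Q1 cadence weighting.
CADENCE_SCORES = {"weekly": 1, "monthly": 2, "quarterly": 3, "6-monthly": 4, "stable": 5}
VOLUME_BREAKS = [(10, 1), (50, 2), (200, 3), (1000, 4)]    # runs/week upper bound -> score

def volume_score(runs_per_week: float) -> int:
    for upper, score in VOLUME_BREAKS:
        if runs_per_week < upper:
            return score
    return 5                                                # 1000+ runs/week

def candidate_score(runs_per_week: float, cadence: str, value_score: int,
                    q1_weighting: bool = False) -> float:
    cadence_score = CADENCE_SCORES[cadence]
    if q1_weighting:
        cadence_score *= 1.5      # favor stable workflows while the platform is built
    return volume_score(runs_per_week) * cadence_score * value_score

# Example: inbound lead triage, 250 runs/week, stable rules, high value (4).
print(candidate_score(250, "stable", 4))                     # 80
print(candidate_score(250, "stable", 4, q1_weighting=True))  # 120.0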

The rubric is deliberately simple. We've seen teams build ten-axis scoring frameworks that produce false precision and obscure the obvious. Three axes, multiplied, ranked. The discussion that surrounds the scoring — debating where to draw the lines, surfacing tacit knowledge about the workflows — is where the value actually lives. The number is the artifact; the conversation is the work.

06 · Templates: Inventory template, scoring rubric, prototype brief.

The three templates below are the working documents we hand to clients at the start of an engagement. Copy them into your own doc tool; the templates are intentionally minimal so you can adapt them to your team's vocabulary. The shape matters more than the formatting.

1. Workflow inventory (one row per candidate)

# Workflow inventory · <team> · <date>

| ID | Workflow              | Owner    | Volume/wk | Cadence | Value     | Score | Notes                       |
|----|-----------------------|----------|-----------|---------|-----------|-------|-----------------------------|
| 01 | Inbound lead triage   | Sales    | 250       | Stable  | High      | 80    | Forms + email source        |
| 02 | Renewal note drafting | CS       | 80        | Stable  | High      | 60    | Customer-facing, irrev.     |
| 03 | Ticket classification | Support  | 1200      | Quart.  | Med       | 45    | High volume, low value/run  |
| 04 | Invoice extraction    | Finance  | 400       | Stable  | High      | 80    | OCR + LLM hybrid            |
| 05 | RFP first-pass        | Sales    | 12        | Weekly  | High      | 8     | Volume too low for slot     |
| .. | ...                   | ...      | ...       | ...     | ...       | ...   | ...                         |

Notes:
- Volume = runs per week, observed not estimated
- Cadence = how often workflow rules change (Stable / Quarterly / Monthly / Weekly)
- Value = qualitative time/$ impact of automation (Low / Med / High)
- Score = volume_score × cadence_score × value_score (each 1-5)

2. Scoring rubric (the three axes)

# Scoring rubric

Volume score:    1 = <10/wk    2 = 10-50    3 = 50-200    4 = 200-1000   5 = 1000+
Cadence score:   1 = weekly    2 = monthly  3 = quarterly 4 = 6-monthly  5 = stable
Value score:     1 = trivial   2 = low      3 = medium    4 = high       5 = critical

Total = Volume × Cadence × Value (max 125)

Q1 weighting: Cadence weighted 1.5× to favor stable workflows
              while the platform is being established.

Decision rule: Rank candidates by total. Top 3 are quarter's targets.
               Backlog is everything else, re-scored each quarter.

3. Prototype brief (one page per chosen workflow)

# Prototype brief · <workflow name>

## Inputs
- What does the workflow receive? Source system, format, volume.

## Expected outputs
- What does the workflow produce? Format, destination, downstream consumer.

## Success criteria
- Quantitative: accuracy threshold, latency budget, cost per run cap.
- Qualitative: tone, format compliance, completeness rules.

## Known failure modes
- What goes wrong in the manual version today?
- What's the worst case if the agent gets it wrong?

## Human checkpoint design
- Which steps require human approval?
- What's the approval timeout and default action?
- Who is the approver, and what's their SLA?

## Rollback / compensation
- Which steps are irreversible?
- What's the compensating action for each mutating step?

## Observability requirements
- Per-run trace ID, captured inputs/outputs, cost per run.
- Alerts on: timeout rate, retry rate, compensation rate, cost anomalies.

## Hand-off plan
- Who owns the runbook?
- Who owns the alerting?
- Who gates changes after launch?

Templates are starting points
The shape of the templates is what matters — the columns in the inventory, the axes in the rubric, the sections in the brief. Adapt the language to your team's vocabulary. Resist the urge to add columns and axes; the value of the templates is forced minimalism that surfaces the decisions without burying them.

07 · Pitfalls: Four workflow-automation failure modes.

Across engagements, the same four failure modes appear in roughly this order of frequency. Knowing them in advance is the cheapest insurance against them.

1. Shiny-object syndrome (picking the exciting candidate)

The first failure mode is picking the most exciting workflow instead of the highest-scoring one. The exciting candidate is usually customer-facing, demo-able, and politically visible — which is exactly what makes it dangerous. Customer-facing workflows have higher blast radius, demo-able ones are usually low-volume long-tail, and politically visible ones get cut at the first incident. Run the rubric; trust the rubric. If the rubric's top three disappoint the loudest stakeholder, that's a stakeholder-management conversation, not a scoring problem.

2. Skipping observability for speed

The second failure mode is treating observability as a phase-two concern. It isn't. Without per-workflow traces and per-run cost telemetry, the first production weekend is a guessing exercise and the first cost surprise lands in finance. The 1-2 engineer-weeks observability costs in sprint 4 saves multiples of that in debugging time across the program. Don't defer it.

3. No governance hand-off (engineering owns everything forever)

The third failure mode is engineering retaining ownership of live workflows past quarter one. The signal is that every new workflow re-opens the engineering backlog and every production alert routes to engineering on-call. The fix is the explicit hand-off in sprint 7: operations owns runbooks and day-to-day changes, engineering retains the platform. Without it, the program never scales past three workflows.

4. Building five at once (the cardinality trap)

The fourth failure mode is widening the slate from three to five because "we have the bandwidth." You don't. The bandwidth math assumes workflows ship sequentially with scaffolding reuse; five concurrent prototypes means none of them gets enough engineering attention to clear the production bar. The discipline is to stay at three for quarter one, no matter how confident the team feels — and add the fourth slot in quarter two only after three are shipped.

"The exciting candidate is usually customer-facing, demo-able, and politically visible — which is exactly what makes it dangerous. Run the rubric; trust the rubric."— Common failure mode #1, across engagements

If you want the same plan applied to your team's workflows with the scoring done in the room and the prototype briefs written collaboratively, our AI transformation engagements run this exact 90-day cadence as a standard line item. The companion resilience audit grades the production builds against seventy checks before cut-over, and the agent vs Zapier TCO calculator helps frame the build-vs-buy decision per workflow.

Conclusion

Workflow automation compounds — pick three to start and the rest follow.

Ninety days is the right horizon to take a workflow automation program from inventory to three production agents with an operating muscle to run them. The compounding effect is what makes the program work: the first workflow pays for the platform; the second ships at half the cost; the third at a third. By quarter two, the marginal cost of new workflows is mostly requirements-gathering and prompt engineering.

The discipline that holds the program together is the scoring rubric and the three-workflow cap. Volume times change cadence times value, ranked, top three picked. The rubric protects against the loudest stakeholder picking the slate; the cap protects against widening the slate past what the team can actually ship. Both feel restrictive at the start of quarter one; both are what make quarter two and quarter three possible.

Practical next step: this week, run the inventory exercise. Shadow operations for two days, interview three team leads, write down every recurring workflow that touches an LLM-friendly task. Score the top fifteen against the rubric. The list that comes out of that exercise is the start of the program — and the conversation that surrounds the scoring is usually the most valuable artifact of the first month.

Ship agentic workflows

Workflow automation compounds — pick three to start, then scale.

Our team designs and ships agentic workflows — inventory, scoring, prototype, deploy, observability, governance — with measurable cost reductions.

Free consultation · Expert guidance · Tailored solutions
What we deliver

90-day workflow automation engagements

  • Workflow inventory and candidate scoring
  • Top-3 prototype design and prioritization
  • Production deploy with observability layer
  • Per-workflow cost telemetry
  • Governance hand-off and scale-out plan
FAQ · Workflow 90-day plan

The questions ops teams ask before the first workflow.

How does the scoring rubric work?
Three axes, each scored 1-5, multiplied for a total out of 125. Volume is throughput — runs per week, observed not estimated. Change cadence is the inverse of stability — how often the workflow's rules change (weekly through stable-for-months, scored 1 to 5 respectively). Value is the qualitative time-or-revenue impact of automating the workflow. Multiply the three. Rank by total. The non-obvious axis is cadence, which captures the maintenance cost — high-volume workflows whose rules change weekly are expensive to keep current and often score worse than they look. We add a 1.5× cadence weighting in quarter one to favor stable workflows while the platform is being established, then drop back to 1× from quarter two onward. The rubric is deliberately simple; the discussion that surrounds the scoring is where the real value lives.