Business · Playbook · 10 min read · Published May 7, 2026

Stage 1 of 10 — readiness assessment templates. The cheapest stage, the highest-leverage one, and the one most programs skip.

Agentic AI Readiness Assessment: Stage 1 Pipeline Kit

Stage 1 of the agentic AI implementation pipeline — readiness assessment. A repeatable 100-point maturity audit, a gap-analysis matrix, a four-stage maturity model, decision frameworks, and a stakeholder interview guide. The cheapest stage to run, and the costliest one to skip.

Digital Applied Team · Senior strategists
Published May 7, 2026 · Read time 10 min · Sources: field engagements 2024-2026
Audit points: 100 (across five domains)
Domains: 5 (infra · governance · data · ops · skills)
Maturity tiers: 4 (ad-hoc → optimised)
Typical duration: ≈ 6h (internal pass · single auditor)

An agentic AI readiness assessment is Stage 1 of a ten-stage implementation pipeline — a structured, low-cost, repeatable audit that measures whether an organisation can operate agentic AI in production. It produces five artefacts: a 100-point maturity audit, a gap-analysis matrix, a four-stage maturity score per domain, a set of decision frameworks, and a stakeholder interview guide. It is the cheapest stage of the pipeline, and the one programs most often regret skipping.

Why now matters: agentic AI programs are crossing from pilot to production across mid-market and enterprise, capital is being committed at pace, and most steering committees still cannot answer the question "are we ready to operate this?" with anything other than a qualitative sentence. Stage 1 turns that sentence into a severity-ranked gap report — six hours of focused effort against a binary-pass rubric, with a remediation roadmap attached.

This guide covers why Stage 1 is the highest-leverage stage in the pipeline, the ten stages and how Stage 1 hands off to Stage 2, the 100-point audit template across five domains, the gap-analysis matrix, the four-stage maturity model, three decision frameworks (build-vs-buy, now-vs-later, scope-vs-depth), and a 25-question stakeholder interview guide. Each template is reproduced below in the format you would actually use it.

The 10-Stage Pipeline

You are reading Stage 1 — Readiness Assessment. Continue the series: Stage 2 — Strategy Roadmap · Stage 3 — Data Foundation · Stage 4 — Vendor Selection · Stage 5 — Prototype · Stage 6 — Production Deploy · Stage 7 — Team Enablement · Stage 8 — Governance · Stage 9 — Scale · Stage 10 — Continuous Improvement.

Key takeaways
  1. Readiness is observable, not aspirational. Each of the 100 points has a binary pass criterion. Either the artefact exists, the control runs, the metric is tracked — or it does not. Stage 1 scores aspirational programs as zero and removes the qualitative-sentence problem.
  2. Infrastructure is the cheapest domain to fix. Infra gaps map to procurement and configuration — fixable in weeks. Governance and skills gaps require cultural change and typically dominate the multi-quarter remediation timeline. Plan the budget shape accordingly.
  3. Governance is where Stage 1 surfaces the highest-severity findings. Policy holes, missing risk registers, and untested incident-response runbooks cluster as critical-severity. A program with strong infrastructure and weak governance is one incident away from a board-level event.
  4. Skills lag procurement by 6-9 months. Tools arrive faster than enablement. Plan training and operating-model changes alongside — not after — infrastructure investment, or expect a sustained capability gap that hollows out the next two stages of the pipeline.
  5. Quarterly re-audit beats annual deep-dive. Drift in tools, models, vendors, and team composition is faster than the annual procurement cycle. A lighter quarterly cadence catches regressions before they compound into a remediation project of their own.

01 · Why Stage 1: Readiness assessment is cheap — and the lift compounds.

Stage 1 costs roughly one engineering-leader week — six hours of focused audit, a half-day of report write-up, and a handful of 45-minute stakeholder interviews. That is the cheapest stage in the entire ten-stage pipeline. Stage 5 prototype builds run weeks; Stage 6 production deploys run quarters; Stage 8 governance retrofits, when done under regulatory pressure, can run a year. A properly run Stage 1 either prevents those costs or correctly scopes them up front.

The compounding effect is the part most teams miss. Every subsequent stage in the pipeline assumes the artefacts Stage 1 produces. The strategy roadmap (Stage 2) needs the gap report to decide where to invest. The data foundation work (Stage 3) needs the source inventory the audit forces. The vendor selection (Stage 4) needs the decision-framework criteria. Production deploy (Stage 6) needs the named ownership the audit demands. Skip Stage 1 and the later stages quietly reinvent its outputs at ten times the cost — usually after an incident makes it unavoidable.

The most common pattern we see on engagements where Stage 1 was skipped: the program has committed capital to vendors and infrastructure that the business does not yet have the operating model to consume. Tools sit unused. Pilots ship but never reach production. Governance is bolted on after a near-miss. The fix is always the same — pause, run the readiness audit, replan from the gap report. Running Stage 1 first removes that detour.

The skip-Stage-1 tax
Across roughly thirty audit engagements, programs that skipped Stage 1 spent the equivalent of 2-4 quarters unwinding governance gaps, retrofitting evaluation harnesses, or rebuilding team enablement. Stage 1 costs one engineering-leader week. The arithmetic is straightforward.

One framing worth borrowing from architecture practice: Stage 1 is to an agentic AI program what a structural survey is to a building renovation. You do not begin renovation without one, you do not regret commissioning one, and the cost of skipping it is never visible until something fails. The readiness audit is a survey. Run it before you commit capital to the next nine stages.

02 · Pipeline Overview: The ten-stage agentic AI implementation pipeline.

The ten-stage pipeline below is the operating shape we run on agentic AI engagements. Each stage has a primary deliverable, a named owner, a duration band, and a hand-off contract to the next stage. The whole pipeline typically runs twelve to eighteen months end-to-end for a mid-market organisation; an enterprise with multiple lines of business runs the stages per business unit, with shared Stage 3, Stage 8, and Stage 9 work.

Stages are sequenced for a reason. Earlier stages produce the artefacts later stages depend on. You can compress the timeline, but you cannot reorder it without paying for the reinvention later. The capsule grid below shows all ten stages, with Stage 1 highlighted as your current position.

Stage 01 · Readiness assessment (you are here)
100-point audit, gap analysis, maturity scoring, decision frameworks, stakeholder interviews. Output: severity-ranked gap report and roadmap brief. Owner: engineering or platform lead. Duration: ≈ 1 week.

Stage 02 · Strategy roadmap (next)
12-month rolling roadmap, quarterly OKRs, executive memo, capability-prioritisation matrix, investment phasing. Output: board-ready roadmap. Owner: program lead. Duration: ≈ 2 weeks.

Stage 03 · Data foundation (substrate)
Source inventory, lineage, classification, retention, ground-truth datasets, drift monitoring. Output: data-readiness artefacts. Owner: CDO or data-engineering lead. Duration: 4-8 weeks.

Stage 04 · Vendor selection (procurement)
RFP shape, decision-framework scoring, due diligence, contract negotiation, exit clauses. Output: signed vendor stack. Owner: procurement plus program lead. Duration: 4-6 weeks.

Stage 05 · Prototype (evidence-gathering)
Two or three candidate workloads, evaluation harness, internal demos, go/no-go decisions. Output: prototype evidence pack. Owner: engineering. Duration: 4-8 weeks per workload.

Stage 06 · Production deploy (production floor)
Canary rollouts, rollback runbooks, on-call setup, SLOs, observability stack live. Output: production-grade agent in one surface. Owner: SRE plus engineering. Duration: 6-12 weeks.

Stage 07 · Team enablement (capacity-building)
Engineering training, governance literacy, stakeholder fluency, knowledge management. Output: trained team and documented playbooks. Owner: people plus engineering management. Duration: 8-12 weeks.

Stage 08 · Governance (control surface)
AI use policy, risk register, model-approval workflow, incident-response runbooks, drills. Output: defensible control surface. Owner: legal plus risk plus AI governance. Duration: 8-16 weeks.

Stage 09 · Scale (roll-out)
Multi-surface rollout, cost discipline, charge-back model, capacity planning, reliability hardening. Output: program at scale across business units. Owner: program lead. Duration: 2-4 quarters.

Stage 10 · Continuous improvement (steady state)
Quarterly re-audit, evaluation cadence, drift detection, regression suites, external audit annually. Output: sustained Optimised-stage program. Owner: program lead. Duration: ongoing.

Each stage's hand-off contract is explicit. Stage 1 produces the artefacts Stage 2 needs to build the roadmap: the gap report, the maturity score per domain, the decision-framework outputs, and the stakeholder-interview synthesis. The Stage 2 strategy roadmap turns those into a sequenced investment plan and quarterly OKRs — which is then the input to Stage 3 data foundation work and Stage 4 vendor selection. Skip the hand-off and Stage 2 starts on a blank page.

03 · 100-Point Audit: Five domains, twenty checks each.

The audit template is reproduced below as a working rubric. Five domains of twenty binary-pass checks, with severity weights (critical / high / medium). Each check should map to an observable artefact — a config file, a metric, a runbook, a signed policy. If the artefact does not exist, the point fails. Aspirational programs score zero on the points they are aspiring towards.

The full per-domain decomposition of every check is covered in the agent-stack 100-point readiness checklist companion post. The shape below is the working template you copy into your audit document.

AGENTIC AI READINESS AUDIT · 100-point rubric
Severity: C = critical (weight 3), H = high (weight 2), M = medium (weight 1)
Score each line as PASS (1) or FAIL (0). Sum and weight separately.

DOMAIN 01 — INFRASTRUCTURE (20 points)
  Sub-area: LLM access & routing (5)
    [ ] 01  Multi-provider LLM access (≥ 2 vendors)              [H]
    [ ] 02  Explicit model-version pinning, no 'latest' aliases   [C]
    [ ] 03  Automatic fallback routing on error/timeout           [H]
    [ ] 04  Cost-aware and latency-aware routing policy           [M]
    [ ] 05  API key rotation + per-environment isolation          [C]
  Sub-area: Retrieval & vectors (5)
    [ ] 06  Production-grade vector store provisioned             [H]
    [ ] 07  Embedding pipeline owned, versioned, re-runnable      [H]
    [ ] 08  Chunking strategy documented and tunable              [M]
    [ ] 09  Hybrid retrieval (vector + keyword/BM25)              [M]
    [ ] 10  Retrieval-quality eval harness with baselines         [H]
  Sub-area: Agent runtime & tooling (5)
    [ ] 11  Centralised tool registry with versioning             [H]
    [ ] 12  Sandboxed execution for code/shell-running tools      [C]
    [ ] 13  Function-calling schema lint across tools             [M]
    [ ] 14  MCP or equivalent transport for tool composition      [M]
    [ ] 15  Per-tool allow-list + least-privilege permissions     [H]
  Sub-area: Observability & cost (5)
    [ ] 16  Per-call tracing (prompts, outputs, latency, tokens)  [H]
    [ ] 17  Token-spend dashboards by team/feature/model          [H]
    [ ] 18  Budget-burn alerts tied to monthly targets            [M]
    [ ] 19  Structured eval logs queryable by prompt version      [M]
    [ ] 20  p95 latency tracked per surface with SLOs             [M]

DOMAIN 02 — GOVERNANCE (20 points)
  Sub-area: Policy & standards (5)
    [ ] 21  Written AI use policy, signed and current             [H]
    [ ] 22  Model approval workflow with sign-off authority       [C]
    [ ] 23  Prohibited-use list and acceptable-data register      [C]
    [ ] 24  Third-party vendor due-diligence template             [H]
    [ ] 25  Policy refresh cadence (≥ annual)                     [M]
  Sub-area: Risk & compliance (5)
    [ ] 26  AI risk register with named owners                    [C]
    [ ] 27  Regulatory mapping (EU AI Act + sector-specific)      [H]
    [ ] 28  Data-protection impact assessments per surface        [H]
    [ ] 29  Model-card discipline on production models            [M]
    [ ] 30  Fairness/bias evaluation cadence                      [H]
  Sub-area: Incident response (5)
    [ ] 31  Prompt-injection incident runbook                     [C]
    [ ] 32  Data-leak-via-model-output runbook                    [C]
    [ ] 33  Escalation paths to legal and communications          [H]
    [ ] 34  Postmortem template + retention                       [M]
    [ ] 35  Incident-response drill ≥ biannual                    [C]
  Sub-area: Audit & reporting (5)
    [ ] 36  Decision log for production model changes             [H]
    [ ] 37  Sign-off authority documented per risk tier           [H]
    [ ] 38  Board / steering-committee reporting cadence          [M]
    [ ] 39  Quarterly internal audit                              [M]
    [ ] 40  External review every 12-18 months                    [M]

DOMAIN 03 — DATA (20 points)
  Sub-area: Source inventory & classification (5)
    [ ] 41  Every in-scope data source registered with owner      [H]
    [ ] 42  Classification (public / internal / restricted)       [H]
    [ ] 43  Retention policy documented per source                [M]
    [ ] 44  Consent / provenance recorded                         [C]
    [ ] 45  Deprecated sources removed from live indexes          [H]
  Sub-area: Lineage & provenance (5)
    [ ] 46  Training-data lineage traceable to source             [C]
    [ ] 47  Synthetic-generation provenance documented            [H]
    [ ] 48  License compliance per source                         [C]
    [ ] 49  Retrieval-index lineage to embeddings + sources       [H]
    [ ] 50  Index rebuild reproducible from source                [M]
  Sub-area: Quality & evaluation (5)
    [ ] 51  Ground-truth datasets versioned per surface           [H]
    [ ] 52  Evaluation cadence on each dataset                    [H]
    [ ] 53  Accepted error thresholds documented                  [M]
    [ ] 54  Drift monitoring + regression alerts                  [H]
    [ ] 55  Eval harness output queryable by version              [M]
  Sub-area: Privacy & minimisation (5)
    [ ] 56  PII handling policy enforced in pipelines             [C]
    [ ] 57  Redaction at retrieval boundary                       [H]
    [ ] 58  Encryption in transit and at rest                     [C]
    [ ] 59  Deletion paths tested per source                      [H]
    [ ] 60  Data-subject-rights workflow                          [H]

DOMAIN 04 — OPERATIONS (20 points)
  Sub-area: Deployment & rollback (5)
    [ ] 61  Canary or equivalent rollout for model changes        [H]
    [ ] 62  Tested rollback within 5 minutes                      [C]
    [ ] 63  Blast-radius controls (rate / scope)                  [H]
    [ ] 64  Feature flags on every agent surface                  [M]
    [ ] 65  Deploy log retention                                  [M]
  Sub-area: On-call & SLOs (5)
    [ ] 66  Named on-call rotation                                [H]
    [ ] 67  Paging thresholds documented                          [M]
    [ ] 68  Error-budget policy                                   [M]
    [ ] 69  p95 latency SLOs per surface                          [M]
    [ ] 70  Weekly operational review                             [M]
  Sub-area: Evaluation & regression (5)
    [ ] 71  Regression suite on every model/prompt change         [H]
    [ ] 72  Eval gating in CI                                     [H]
    [ ] 73  Golden-prompt set maintained                          [M]
    [ ] 74  Periodic full-suite re-runs                           [M]
    [ ] 75  Red-team cadence (≥ quarterly)                        [H]
  Sub-area: Cost & capacity (5)
    [ ] 76  Per-team budget dashboards                            [M]
    [ ] 77  Per-feature cost attribution                          [M]
    [ ] 78  Quarterly capacity review                             [M]
    [ ] 79  Escalation when projected spend > budget              [H]
    [ ] 80  Charge-back model where applicable                    [M]

DOMAIN 05 — SKILLS (20 points)
  Sub-area: Engineering enablement (5)
    [ ] 81  Prompt engineering training completed                 [M]
    [ ] 82  Evaluation training completed                         [H]
    [ ] 83  Retrieval design training completed                   [H]
    [ ] 84  Tool authoring training completed                     [M]
    [ ] 85  Observability literacy training                       [M]
  Sub-area: Governance literacy (5)
    [ ] 86  Risk-awareness briefing completed                     [H]
    [ ] 87  Prohibited-use familiarity                            [H]
    [ ] 88  Incident-response role understanding                  [C]
    [ ] 89  Policy refresh attended                               [M]
    [ ] 90  Leadership briefing cadence                           [M]
  Sub-area: Business stakeholder fluency (5)
    [ ] 91  Product leaders can describe capability + limits      [H]
    [ ] 92  Finance leaders understand cost shape                 [H]
    [ ] 93  Operating-unit leaders briefed on constraints         [M]
    [ ] 94  Customer-facing teams trained on disclosure           [M]
    [ ] 95  Executive sponsor active and current                  [C]
  Sub-area: Continuity & depth (5)
    [ ] 96  Documented succession on key roles                    [H]
    [ ] 97  Contractor-to-employee ratio inside targets           [M]
    [ ] 98  Internal wiki current                                 [M]
    [ ] 99  Knowledge-management discipline                       [M]
    [ ] 100 Exit-interview review for AI-program roles            [M]

SCORING
  Raw score    : sum of PASS / 100
  Weighted     : Σ (pass × severity_weight) / Σ severity_weight
                 (this rubric totals 17 C + 42 H + 41 M = 176 weighted points)
  Domain split : score per 20-point domain
  Maturity     : assign per domain (Ad-hoc / Reactive / Proactive / Optimised)
How to use the rubric
Score each point against an observable artefact — a config file, a runbook, a dashboard, a signed policy. If the artefact cannot be produced in 30 seconds, the point fails. The discipline is the value: it removes the "we're working on it" answer from the audit conversation.
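The scoring arithmetic is simple enough to run in a spreadsheet; the minimal sketch below is illustrative only (the function and field names are ours, not part of the audit kit). Note it normalises the weighted score by the rubric's own total weight rather than a fixed constant, so a full-pass audit scores exactly 1.0.

```python
# Illustrative sketch of the rubric's scoring arithmetic.
# Each audit point is recorded as a (severity, passed) pair.
SEVERITY_WEIGHT = {"C": 3, "H": 2, "M": 1}

def score(checks):
    """checks: list of (severity, passed) tuples, one per audit point."""
    raw = sum(1 for _, passed in checks if passed)
    weighted = sum(SEVERITY_WEIGHT[sev] for sev, passed in checks if passed)
    max_weighted = sum(SEVERITY_WEIGHT[sev] for sev, _ in checks)
    return {
        "raw": f"{raw}/{len(checks)}",
        "weighted": round(weighted / max_weighted, 2),
    }

# Example: a four-point slice of the rubric, one high-severity fail.
example = [("C", True), ("H", True), ("H", False), ("M", True)]
print(score(example))  # raw 3/4; weighted (3+2+1)/(3+2+2+1) = 0.75
```

Because a failed critical costs three weighted points against one for a medium, the weighted score separates programs that cleared the criticals from programs that padded the raw count with easy passes.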

04 · Gap Analysis: Score, gap, target — the matrix.

The gap-analysis matrix turns the raw audit output into a decision artefact. For each of the five domains, record the current score (out of 20), the 12-month target score, the gap (target minus current), and the priority severity profile of the failed points. The matrix is what the steering committee actually reads — the 100-point rubric is the working evidence behind it.

The domain-by-domain view below shows a representative gap profile from a Q1 2026 audit of a mid-market B2B SaaS company. Infrastructure is in good shape; governance and data are under-invested; ops and skills sit in the middle band. The shape is typical — engineering moves faster than control functions, and the audit makes the asymmetry visible.

Gap analysis · domain-by-domain score vs target

Source: Digital Applied Stage 1 readiness audit · representative composite from Q1 2026 engagement.

Infrastructure: current 18/20 · target 19/20 · gap 1 · 0 critical fails
Governance: current 9/20 · target 16/20 · gap 7 · 3 critical fails
Data: current 11/20 · target 17/20 · gap 6 · 2 critical fails
Operations: current 14/20 · target 17/20 · gap 3 · 1 critical fail
Skills: current 12/20 · target 16/20 · gap 4 · 1 critical fail

Reading the matrix: the highest gap is the priority, modulated by the count of critical-severity fails in that domain. Governance with three criticals and a seven-point gap is the obvious first investment; data with two criticals and a six-point gap is the second. Operations and skills sit behind. Infrastructure, despite being the most expensive line item in the budget, is already in good shape and does not need more money — it needs the artefacts that prove the existing capability is operable.

The matrix also forces an explicit conversation about targets. Twenty out of twenty in every domain is rarely the right target — diminishing returns kick in past sixteen or seventeen, and the marginal cost of the last few points is usually best deferred to Stage 10 continuous improvement. The audit lets the steering committee pick the target on evidence rather than ambition.
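The prioritisation rule (largest gap first, modulated by critical-severity fails) can be sketched against the representative composite figures. The data structure and field names below are illustrative, not part of the kit:

```python
# Sketch of the gap-matrix prioritisation rule: rank domains by critical
# fails first, then by gap size. Figures are the article's composite example.
domains = [
    {"name": "Infrastructure", "current": 18, "target": 19, "criticals": 0},
    {"name": "Governance",     "current": 9,  "target": 16, "criticals": 3},
    {"name": "Data",           "current": 11, "target": 17, "criticals": 2},
    {"name": "Operations",     "current": 14, "target": 17, "criticals": 1},
    {"name": "Skills",         "current": 12, "target": 16, "criticals": 1},
]

for d in domains:
    d["gap"] = d["target"] - d["current"]  # target minus current, per the matrix

# Highest-severity, highest-gap domains first.
priority = sorted(domains, key=lambda d: (d["criticals"], d["gap"]), reverse=True)
print([d["name"] for d in priority])
# → ['Governance', 'Data', 'Skills', 'Operations', 'Infrastructure']
```

Note that when criticals tie (operations and skills both have one), the larger gap breaks the tie; governance leads on both counts, matching the reading above.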

"The gap matrix is what removes the order-of-operations argument from the steering committee. Once the criticals and the gaps are on one page, the conversation moves to resourcing and timeline."— Digital Applied audit kit · field engagement note

05 · Maturity Model: Ad-hoc → reactive → proactive → optimised.

Each domain is also assigned a maturity stage. Four stages, deliberately simple — more stages produce more granular labels and less actual consensus on what stage a program is in. The point of a maturity model is not precision; it is shared language between engineering, governance, and the executive layer. Use the score bands below as the assignment rule and keep the stage names verbatim across audits so the longitudinal view stays consistent.

Stage 1 · Ad-hoc (0-7 per domain)
Pilots exist, no consistent policy, no central inventory, no production observability. Capability lives in 2-3 engineers. Typical raw score per domain: under 8/20. Action: pause scaling; build the foundation.

Stage 2 · Reactive (8-12 per domain)
Most infrastructure in place, governance is partial, incidents drive policy updates rather than the other way around. Capability is spreading beyond the founders. Typical raw score: 8-12/20. Action: shift to proactive on the criticals.

Stage 3 · Proactive (13-16 per domain)
All domains covered by named owners, regular cadences for evaluation and review, governance artefacts in place, drills exercised. Capability scaled across the engineering org. Typical raw score: 13-16/20. Action: invest in optimisation and cost discipline.

Stage 4 · Optimised (17-20 per domain)
Continuous evaluation, automated regression and cost guardrails, governance pre-staged for regulation, skills depth across multiple teams, audit cadence internal-then-external. Typical raw score: 17+/20. Action: maintain, re-audit quarterly.

Stage assignment is per-domain, not global. A program can be Proactive on infrastructure and Reactive on governance, and the remediation roadmap should reflect that asymmetry. The global maturity stage, when reported, should be the minimum across domains — a program is only as mature as its weakest control surface, because that is where the next incident will find it.
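The band-assignment rule and the minimum-across-domains convention can be sketched directly; band edges follow the score bands above, and the example scores reuse the gap-matrix composite:

```python
# Sketch of per-domain maturity assignment and "global = weakest domain"
# reporting. Band edges follow the article's score bands.
BANDS = [(0, 7, "Ad-hoc"), (8, 12, "Reactive"),
         (13, 16, "Proactive"), (17, 20, "Optimised")]

def maturity(score_out_of_20):
    for lo, hi, stage in BANDS:
        if lo <= score_out_of_20 <= hi:
            return stage
    raise ValueError("score must be 0-20")

domain_scores = {"infra": 18, "governance": 9, "data": 11, "ops": 14, "skills": 12}
per_domain = {d: maturity(s) for d, s in domain_scores.items()}

# Global stage = minimum across domains, ordered by maturity.
ORDER = ["Ad-hoc", "Reactive", "Proactive", "Optimised"]
global_stage = min(per_domain.values(), key=ORDER.index)
print(per_domain, global_stage)
```

With these scores the program is Optimised on infrastructure but Reactive on governance, data, and skills, so it reports Reactive globally — exactly the asymmetry the remediation roadmap should reflect.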

One forward projection. Regulatory pressure on agentic AI is increasing across jurisdictions, and the artefacts a Proactive-or-Optimised governance domain produces (model cards, risk register, decision logs, drill records) are almost exactly what emerging regulations expect to see. Reaching Proactive on governance now is not just risk mitigation — it is regulatory pre-staging that buys runway.

06 · Decision Frameworks: Build-vs-buy, now-vs-later, scope-vs-depth.

Three decision frameworks come out of Stage 1. They are not meant to make the decision for you — they make the trade-offs explicit so the steering committee can decide on evidence. Each framework maps to a set of questions in the stakeholder interview (next section) and to specific points in the audit rubric.

Framework 01 · Build vs buy (per-capability decision)
Score each candidate capability on five axes: differentiation, defensibility, vendor lock-in cost, internal skill depth, and time-to-value. Build wins when ≥ 3 axes favour custom and the skill depth exists. Buy wins when 2 or fewer axes favour custom or skill depth is shallow.

Framework 02 · Now vs later (quarterly horizon, workload-level)
For each candidate workload, score: business value (1-5), readiness gap to ship (1-5), and reversibility (1-5). Now = high value, low gap, high reversibility. Later = high value, high gap. Never = low value regardless of gap. Reversibility matters because agentic surfaces are easy to launch and hard to retire gracefully.

Framework 03 · Scope vs depth (portfolio shape)
Decide whether the program covers many surfaces shallowly or one or two surfaces deeply. Shallow-and-wide is appropriate when learning is the goal and the cost of low-quality surfaces is bounded. Deep-and-narrow is appropriate when reputation cost is high or when a single surface drives the bulk of business value.
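The first two frameworks reduce to mechanical decision rules. The sketch below is a hedged reading: the build threshold (≥ 3 of 5 axes) is stated above, but the numeric cut-offs for "high" and "low" on the 1-5 scales are our illustrative assumptions, not fixed by the kit.

```python
# Illustrative decision rules for frameworks 01 and 02. The 1-5 cut-offs
# for "high"/"low" (<= 2 low, >= 4 high) are assumed, not prescribed.

def build_vs_buy(axes_favour_custom: int, skill_depth_ok: bool) -> str:
    """Build when >= 3 of the 5 axes favour custom AND skill depth exists."""
    return "build" if axes_favour_custom >= 3 and skill_depth_ok else "buy"

def now_vs_later(value: int, gap: int, reversibility: int) -> str:
    """Now = high value, low gap, high reversibility; Later = high value,
    high gap; Never = low value regardless of gap."""
    if value <= 2:
        return "never"
    if gap <= 2 and reversibility >= 4:
        return "now"
    return "later"

print(build_vs_buy(axes_favour_custom=4, skill_depth_ok=True))  # build
print(now_vs_later(value=5, gap=4, reversibility=2))            # later
```

A public-facing, low-reversibility surface therefore lands in "later" even at maximum business value, which is the behaviour the reversibility axis exists to force.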

Worked examples from a recent engagement: the build-vs-buy decision on a customer-facing copilot surfaced that vendor lock-in cost was the deciding axis — the company chose to build a thin orchestration layer over multiple model providers rather than commit to a single vendor's agent framework. The now-vs-later decision deferred two low-reversibility surfaces (a public-facing chat and an embedded sales-call summariser) until governance reached Proactive; both were high value but high reputation cost on failure. The scope-vs-depth decision concentrated the first two quarters on a single internal copilot to build operational muscle before going wide.

For a deeper view on the customer-experience tradeoffs that feed the now-vs-later framework, the AI transformation service page covers how we run these frameworks as workshops with cross-functional leadership. The frameworks work as self-service if the program has a strong internal facilitator; they work better as facilitated workshops if the room contains strong opinions and no agreed referee.

07 · Interviews: The interview guide with 25 questions.

The audit rubric covers the artefacts. The interview guide covers the operating model around those artefacts — who owns what, what is rewarded, where the program is politically supported and where it is contested. Twenty-five questions across five stakeholder groups, 30-45 minutes per interview, conducted by the auditor with no other audience. The synthesis becomes the qualitative half of the gap report.

The template below is the working interview guide. Five questions per group: executive sponsor, engineering lead, governance / legal / risk lead, business-unit operator, and front-line user. Each question is open-ended; the auditor records direct quotes verbatim where possible. The closing question in every interview is the same: "what is the one thing the program would do differently if it were starting over?"

STAKEHOLDER INTERVIEW GUIDE · 25 questions across 5 groups
Conducted 30-45 min, auditor + interviewee only, notes verbatim where possible.

GROUP 01 — EXECUTIVE SPONSOR (5 questions)
  E1  In one sentence, what business outcome should the agentic AI program
      deliver in 12 months?
  E2  What would convince you to pause or kill the program?
  E3  Where does this program sit in the company's top three priorities,
      and how is that communicated to the org?
  E4  Which two leaders most need to be aligned for this to succeed, and
      are they?
  E5  What is the one thing the program would do differently if it were
      starting over?

GROUP 02 — ENGINEERING LEAD (5 questions)
  T1  Walk me through the deployment path for a model or prompt change
      from commit to production. Where are the manual steps?
  T2  How do you know when a production agent is misbehaving?
  T3  Which infrastructure gap costs the team the most time week-to-week?
  T4  What evaluation runs on a model change, and what runs only at
      release? Where is the gap?
  T5  What is the one thing the program would do differently if it were
      starting over?

GROUP 03 — GOVERNANCE / LEGAL / RISK LEAD (5 questions)
  G1  Walk me through how a new agentic workload gets approved for
      production. Who signs off, on what evidence?
  G2  When was the last incident-response drill, and what did it find?
  G3  Which regulatory regime worries you most in the next 12 months,
      and what artefacts are missing for it?
  G4  Where is policy ahead of practice, and where is practice ahead of
      policy?
  G5  What is the one thing the program would do differently if it were
      starting over?

GROUP 04 — BUSINESS-UNIT OPERATOR (5 questions)
  B1  Describe the agentic capability your team consumes today, in your
      own words. What does it do well, what does it fail at?
  B2  Where does the capability replace work, and where does it create
      new work?
  B3  How do you escalate when the capability gets something wrong, and
      who fixes it?
  B4  What capability would change your team's work most if it shipped
      next quarter?
  B5  What is the one thing the program would do differently if it were
      starting over?

GROUP 05 — FRONT-LINE USER (5 questions)
  U1  In a typical day, when do you actually use the agentic capability?
  U2  When do you not use it, and why?
  U3  What is the most useful thing it has done for you this month?
  U4  What is the most frustrating thing it has done for you this month?
  U5  What is the one thing the program would do differently if it were
      starting over?

SYNTHESIS
  Cluster answers by recurring theme across groups.
  Cross-reference themes against the audit rubric — note where qualitative
  signal contradicts the artefact-based score (this is high-value evidence).
  Quote sparingly in the report; anonymise unless the interviewee approves.
Why the same closing question for every group
The fifth question is identical across all five groups by design. Recurring answers across groups are the strongest signal Stage 1 produces — when the executive sponsor, the engineering lead, and the front-line user all name the same regret, the gap report has its headline finding.
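The cross-group clustering in the synthesis step is mechanical enough to sketch. The theme labels and coded answers below are invented for illustration; in practice the auditor codes each verbatim answer to a theme by hand:

```python
# Toy sketch of interview synthesis: cluster coded answers by theme and
# flag themes named by three or more stakeholder groups as headline
# candidates. All data here is invented for illustration.
from collections import defaultdict

answers = [
    ("executive",     "eval-harness-too-late"),
    ("engineering",   "eval-harness-too-late"),
    ("front-line",    "eval-harness-too-late"),
    ("governance",    "policy-behind-practice"),
    ("business-unit", "new-work-created"),
]

groups_by_theme = defaultdict(set)
for group, theme in answers:
    groups_by_theme[theme].add(group)

# A theme recurring across 3+ groups is the strongest Stage 1 signal.
headline = [t for t, groups in groups_by_theme.items() if len(groups) >= 3]
print(headline)  # → ['eval-harness-too-late']
```

Counting distinct groups rather than raw mentions is the point: three engineers naming the same regret is one group's view, while one executive, one engineer, and one front-line user naming it is the gap report's headline finding.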

08 · Next Stage: Hand-off to strategy roadmap (Stage 2).

The hand-off contract from Stage 1 to Stage 2 is explicit. Stage 2 picks up five artefacts and turns them into a board-ready 12-month roadmap, quarterly OKRs, an executive memo, a capability-prioritisation matrix, and an investment phasing template. None of those Stage 2 deliverables can be built on a blank page — they all assume the Stage 1 outputs below.

Hand-off 01 · Severity-ranked gap report (roadmap input)
List of all failed audit points, sorted by severity weight then by domain. Stage 2 uses this to prioritise the 90-day, 6-month, and 12-month horizons in the roadmap. Without it, Stage 2 prioritises on opinion.

Hand-off 02 · Maturity score per domain (target-setting input)
Four-stage assignment per domain (ad-hoc / reactive / proactive / optimised). Stage 2 uses this to set the 12-month target stage per domain and to communicate program shape to the board in language they recognise.

Hand-off 03 · Decision-framework outputs (prioritisation input)
Build-vs-buy, now-vs-later, scope-vs-depth conclusions per candidate workload. Stage 2 uses these to populate the capability-prioritisation matrix and the investment phasing template.

Hand-off 04 · Interview synthesis (narrative input)
Cross-group themes, recurring regrets, qualitative contradictions to the rubric score. Stage 2 uses this to brief the executive sponsor and to surface the operating-model risks that the artefact-based score cannot detect.

Hand-off 05 · Roadmap brief (Stage 2 starter)
One-page summary: current maturity per domain, target maturity per domain, top 5 criticals to close inside 90 days, recommended investment shape. The brief is the input the Stage 2 facilitator works from on day one.

For the full Stage 2 playbook — the roadmap template, the OKR framework, the executive memo, the capability matrix, the investment phasing — continue to Stage 2 — Strategy Roadmap Templates. Stage 2 typically runs over two weeks of focused work and produces the artefacts the board signs off on before any Stage 3 data-foundation capital is committed.

The cadence we recommend across the pipeline: Stage 1 in week 1, Stage 2 in weeks 2-3, Stage 3 starting week 4 with board approval secured. That compresses the time from program-decision to capital-commitment to roughly a month, which is fast enough to maintain executive momentum and slow enough to produce the artefacts that make the program defensible. Programs that try to compress further usually skip Stage 1, and we have already covered the cost of that.

Conclusion

Stage 1 is the cheapest stage of the pipeline — and the one programs regret skipping.

A readiness assessment costs one engineering-leader week. Skipping it costs 2-4 quarters of retrofitting governance, rebuilding team enablement, and unwinding vendor commitments that the operating model could not consume. Across every engagement we have run, that arithmetic has held — and the programs that ran Stage 1 first are the ones still on plan twelve months later.

The practical next step is to run Stage 1 internally first. Block six hours, work through the 100-point rubric against the artefacts in your repo, your wiki, and your control plane, score each domain, draft the gap matrix, run the five stakeholder interviews, and write a 1-page roadmap brief. Then hand the brief to Stage 2 and start the strategy roadmap. The pipeline begins here; everything downstream is easier when it does.

Start the pipeline right

The ten-stage pipeline starts with readiness — most programs skip it and regret it.

Our team runs Stage 1 readiness assessments — 100-point audit, gap analysis, maturity scoring, decision frameworks — and hands off the strategy roadmap brief to Stage 2.

Free consultation · Expert guidance · Tailored solutions
What we deliver

Stage 1 readiness engagements

  • 100-point readiness audit
  • Stakeholder interview program
  • Gap-analysis matrix and remediation roadmap
  • Decision-framework workshops
  • Hand-off package to Stage 2 (strategy roadmap)
FAQ · Stage 1 readiness

The questions teams ask before committing capital.

Should the first audit be run internally or by an external auditor?
Start internal. A single platform or engineering leader can run the full 100-point rubric in roughly six focused hours, plus the five stakeholder interviews across two weeks of calendar time. The internal pass is usually enough to surface the four to six gaps that drive the next quarter's roadmap.
An external pass becomes worthwhile once the internal gaps are closed and the program is targeting the Proactive band — an external auditor brings benchmark data across peer engagements, more rigorous evidence-gathering for governance, and the independence that some boards or regulators expect. The most common pattern is internal quarterly, external annually, with the external pass timed to the budget cycle so findings can be funded the same quarter they surface.