A Fortune 500 industrial-and-services group stood up a cross-functional AI program across six functions in eighteen months, anchored by a shared platform that owned model routing, observability, governance, and the sandbox, and operated through function-specific implementation teams that delivered use cases inside marketing, operations, finance, HR, customer service, and risk. The program cleared four stage gates, sustained a quarterly executive review, and produced outcomes that aggregated across functions into a single board narrative.
The board mandate that started the program was specific. Pick six functions, fund a shared platform, run the work through stage gates that bound spend before scale, and produce a quarterly review that an audit committee can defend. The mandate was deliberately operational rather than aspirational — no five-year transformation strategy, no slide deck of moon-shot use cases, just a phased plan with named owners and a cadence that the executive layer could hold without bespoke air cover.
This case study walks the shape that produced the outcomes — the shared-platform composition, the function-team RACI, the four stage gates, the quarterly review structure, the measured outcomes across the six functions, and the lessons that replicate down-market into mid-market and large-mid-market programs. The company is unnamed at its request; the numbers and shape have been reviewed for accuracy by the program lead and the executive sponsor. The audience is the executive sponsor or program lead designing a comparable cross-functional rollout, not the analyst evaluating a vendor pitch.
- 01 · Shared platform beats per-function stacks. A single platform team owning model routing, observability, governance, and the sandbox replaces six parallel function stacks that would each have re-bought the same capabilities and produced six different audit surfaces. The shared platform is the lever that makes the program defensible at the board layer.
- 02 · Function-specific teams own implementation. The shared platform does not deliver use cases — the function teams do. Each function names a lead, an engineering pair, and a business owner; the platform provides shared infrastructure and the function team provides domain context. Centralising delivery under the platform team is the failure mode the case study most warns against.
- 03 · Stage gates prevent runaway spend. Four gates — discovery, pilot, scale, embed — each with explicit pass criteria, named approvers, and a budget envelope, prevent the standard anti-pattern where a promising pilot absorbs ten times its original budget before anyone re-evaluates. The gates also stop failing pilots before they consume resources better spent elsewhere.
- 04 · Quarterly reviews secure board confidence. A half-day quarterly executive cadence with a standing agenda — outcome aggregation, gate-progress audit, platform health, governance fitness, next-quarter plan — converts the program from a project the board has to chase into a rhythm the board can rely on. The review is what sustains funding past the first leadership transition.
- 05 · Outcomes aggregate across functions. Function-level wins are real but not board-legible on their own. The program design aggregates outcomes through a shared framework — cycle-time reduction, deflection rate, cost-to-serve, capacity reclaimed — that lets the executive layer compare a marketing use case against a finance use case against an HR use case using comparable metrics, even when the raw activity looks different.
01 — Situation
A board mandate, six functions, and the failure modes already on the table.
The company entered the program with the shape most large enterprises share by 2026 — pockets of AI work inside individual functions, no shared infrastructure across them, three different vendor relationships that the procurement team could enumerate and two more it could not, an internal audit finding from the prior quarter that surfaced a data-handling exposure inside a marketing pilot, and an executive team that had agreed in principle that AI mattered but disagreed in practice about who owned what. The board mandate that started the program was the response to that finding — fund a real program, six functions, eighteen months, gated.
Three failure modes were already visible from the prior state. The first was tool sprawl — each function had selected its own vendor, sometimes its own model, often its own evaluation approach. The procurement team counted seventeen distinct AI spend lines across the six functions; the platform team that stood up later was able to consolidate that to a managed roster of four routed models plus two specialty vendors, with the rest either retired or routed through the platform sandbox.
The second was governance theatre. The company had a written AI policy that nobody referenced in real decisions, an ethics statement that lived on the intranet, and a privacy review process that operated only for projects that someone proactively routed to it. The audit finding that started the program had surfaced precisely because the governance documents existed but did not bite — a marketing pilot had used customer data in a model evaluation without routing the use case through privacy review, and the postmortem cited the policy as documented but unenforced.
The third was outcome incoherence. Each function reported its AI outcomes in the units that function happened to favour — marketing in lift, operations in cycle time, finance in cost recovery, HR in time-to-fill. The board could not aggregate the outcomes into a single picture; the executive layer could not tell whether the spend was producing value commensurate with the risk. The case study's outcome framework — five comparable metrics across all six functions — exists because of this specific failure mode.
The program shape that emerged from the mandate had two halves that the case study returns to repeatedly. The shared platform owned the capabilities every function needed once — model routing, observability, governance, the sandbox — so that no function had to re-buy them. The function-specific implementation teams owned the use cases that depended on domain context — the marketing knowledge of which campaigns matter, the operations knowledge of which workflows are bottlenecks, the finance knowledge of which controls cannot be bypassed. The shape generalises down-market; the specifics in the rest of this case study explain why.
02 — Approach · Shared Platform
One platform team, four shared capabilities.
The shared platform was the program's most consequential structural decision. Rather than letting each function build its own routing layer, its own evaluation harness, its own data governance, its own development sandbox, the program funded a single platform team that owned all four capabilities and published them as services the function teams consumed. The platform team was small — six engineers, one product lead, one security partner — and remained small throughout the eighteen months. Scaling the platform team was an explicit non-goal; the platform was infrastructure, not headcount.
The four capabilities below were the platform's scope from the charter. Each had a named lead inside the platform team, a published service definition, and a consumption pattern that the function teams could rely on without bespoke negotiation. The platform did not deliver use cases; it delivered the capabilities that made use case delivery cheaper, safer, and faster for the function teams.
Model routing
Owner: Platform engineering lead
Managed roster of four routed models (one premium reasoning, one general-purpose, one cost-efficient, one specialty) plus two narrow specialty vendors. Function teams call a single platform endpoint; the platform team owns model selection, version pinning, fallback routing, and cost allocation back to the calling function. Replaces seventeen distinct spend lines with one routed surface.
Foundation capability
Observability
Owner: Platform engineering
Unified tracing, eval pass-rate tracking, latency p95, cost-per-request, error-class distribution, and prompt-injection detection across every function-team call. The dashboard is the standing artefact the weekly engineering health cadence walks; the same data feeds the quarterly executive review without re-instrumentation.
Operations capability
Governance
Owner: Platform security partner
Embedded governance — data-classification routing, automated PII detection on prompt and response paths, audit log capture, retention policy enforcement, vendor data-processing agreements managed centrally. Function teams inherit governance by calling the platform endpoint; routing around the platform is the only way to break governance, which the audit log surfaces.
Compliance capability
Development sandbox
Owner: Platform product lead
Self-serve environment for function teams to prototype use cases against synthetic or de-identified production data, with eval harnesses pre-wired, deployment templates pre-built, and stage-gate evidence captured automatically. Reduces the path from idea to gate-one evidence from quarters to weeks.
Velocity capability
The platform's economic case rested on a simple counterfactual. If each of the six functions had built its own routing layer, its own observability, its own governance, and its own sandbox, the company would have funded six parallel platform efforts at a total cost the program estimated at roughly five times the shared platform's actual cost — and produced six different audit surfaces, six different vendor relationships per capability, and six different data-governance interpretations. The shared platform was cheaper, safer, and faster than any plausible decentralised alternative.
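The routed-surface pattern — one endpoint, a curated roster, version pinning, fallback — can be sketched minimally. The tier names, model identifiers, fallback order, and health-check mechanism below are hypothetical illustrations; the case study does not publish the company's actual roster or code.

```python
from dataclasses import dataclass

# Hypothetical roster: tier names, model identifiers, and fallback order are
# illustrative, not the company's actual configuration.
ROSTER = {
    "premium-reasoning": {"model": "vendor-a/reasoner-v2", "fallback": "general"},
    "general":           {"model": "vendor-b/general-v5",  "fallback": "cost-efficient"},
    "cost-efficient":    {"model": "vendor-b/small-v5",    "fallback": None},
}

@dataclass(frozen=True)
class RoutedCall:
    model: str     # pinned model version the platform selected
    function: str  # calling function, recorded for cost allocation
    tier: str      # tier actually served (may differ from the request after fallback)

def route(tier, function, healthy):
    """Resolve a requested tier to a pinned model, walking the fallback
    chain past unhealthy models; fail loudly if the whole chain is down."""
    while tier is not None:
        entry = ROSTER[tier]
        if entry["model"] in healthy:
            return RoutedCall(model=entry["model"], function=function, tier=tier)
        tier = entry["fallback"]
    raise RuntimeError("no healthy model on the fallback chain")
```

A function team calls the endpoint with a tier and never learns which vendor served it; the platform team swaps roster entries, pins versions, and allocates cost back to the caller without touching function-team code.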
The platform team's discipline on what it did not do was as important as its scope. The platform did not write the marketing team's segmentation prompts. It did not own the operations team's ticket-routing workflow. It did not adjudicate the finance team's control choices. Centralising delivery inside the platform team is the failure mode the program lead names most often when other companies ask for replication advice — once the platform starts delivering use cases for the function teams, the platform becomes the bottleneck, the function teams stop owning their outcomes, and the program loses the distributed-ownership property that makes it scale.
"The platform is infrastructure, not headcount. The moment the platform starts delivering use cases, the platform becomes the bottleneck and the function teams stop owning their outcomes."— Program lead, month-12 review
03 — Approach · Function Teams
Six function teams, local ownership, shared backbone.
The function-specific implementation teams were the program's second structural half — the half that produced the use cases that produced the outcomes that the quarterly review aggregated. Each of the six functions named a function-AI lead, an engineering pair seconded from the function's existing technical capacity, and a business owner who reported into the function's leadership. The function team owned the use case end to end — discovery, evaluation, pilot operation, scale decision, embed into business-as-usual.
The shared backbone made the function teams smaller than they would otherwise have been. A typical function team across the six was four to six people total — the lead, the engineering pair, the business owner, and one or two domain partners on partial allocation. Without the shared platform, each function team would have been ten to fifteen people because each would have absorbed its own routing, observability, governance, and sandbox responsibilities. The platform's leverage came from reducing the per-function headcount cost, not from removing the function teams themselves.
The choice matrix below summarises how the program drew the line between the platform and the function teams across the four decision categories where the line was most contested. Each row states the contested decision, the chosen owner, and the reasoning that the program lead defended in monthly committee meetings when functions periodically pushed for the alternative.
Use-case selection
The function team selects which use cases to pursue. The platform team does not adjudicate marketing's priorities versus operations'. The committee receives the function-team backlog at the quarterly review and reviews it against the company's strategic priorities, but does not select use cases on the function's behalf.
Function team decides
Model selection
The platform team decides which models live on the managed roster. The function team picks from the roster. Function-team requests to add a model to the roster route through the platform team's evaluation harness, which produces evidence the committee can review at the next monthly meeting.
Platform team curates roster
Evaluation design
The function team designs the use-case-specific evaluation — task pass-rate, customer-impact metric, business-outcome attribution. The platform team provides the eval harness, the data infrastructure, and the safety, bias, and prompt-injection evals that run on every use case regardless of function.
Shared design, function ownership
Production deployment
The function team owns the deployment decision once the stage gates clear. The platform team owns the rollback authority and the production-traffic ramp. Splitting these two responsibilities prevents the function team from rolling out faster than the eval evidence supports while keeping the deployment decision close to the business owner.
Function deploys, platform ramps
The most contested decision in the first six months was use-case selection. Two of the six functions initially pushed for a centralised use-case selection process where the platform team would adjudicate which use cases met the bar for funded engineering support. The program lead refused on the same reasoning as the platform's no-use-case-delivery rule — centralising selection would make the platform team responsible for outcomes it could not own, and would convert the platform from infrastructure into a gatekeeper. The function teams retain selection; the committee reviews backlogs quarterly; the program has held the line.
The function-team composition also evolved. By month nine, two of the six functions had absorbed a dedicated AI product manager into the function team because the function's use-case volume justified the role. Two others remained with the original four-person team because their use-case volume did not. The program design did not prescribe a standard function-team size; it prescribed a standard function-team shape, and let the size scale with the function's actual demand.
04 — Approach · Stage Gates
Four gates that bound spend before they bound work.
The four stage gates were the program's spend-control mechanism. Every use case across the six functions cleared the same four gates in sequence — discovery, pilot, scale, embed — with explicit pass criteria, named approvers, and a budget envelope per gate that capped spend before scale. The gates also stopped failing pilots before they consumed resources better spent elsewhere; the program retired roughly one-third of pilots at gate two, which the program lead consistently named as one of the program's most important features.
The four gates below name the milestone, the owner, the budget envelope as a percentage of total program spend, and the pass criteria the committee evaluated. The percentages are the company's actual averages across the eighteen-month program; individual use cases varied around the means within ranges the committee documented at each gate.
Discovery → Pilot
Budget: ~5% of total · Approver: Function lead
Use case is named, scoped, and evaluated against the outcome framework. Discovery deliverable is a one-page brief — business problem, outcome metric, baseline measurement, hypothesis, evaluation design. Approval to enter pilot does not commit production resources; it commits sandbox time and platform-team support.
Cheap to clear
Pilot → Scale
Budget: ~20% · Approver: Platform + function lead
Pilot ran against the eval harness, produced a measurable outcome against baseline, cleared safety, bias, and prompt-injection evals, and the business owner signed off on the customer-impact assessment. About one-third of pilots are retired at this gate. The gate's purpose is to kill promising-but-flat pilots before they absorb scale-stage resources.
Highest-leverage gate
Scale → Embed
Budget: ~50% · Approver: Committee
Use case scaled to production traffic — typically a 5/25/100 percent canary ramp over four to six weeks, with explicit rollback triggers tied to eval pass rate, latency p95, cost per request, and incident attribution. The committee reviews scale evidence at the next monthly meeting; embed approval is the committee's decision rather than the function's.
Committee gate
Embed → Business-as-usual
Budget: ~25% · Approver: Function business owner
Use case becomes part of the function's standing operations. Engineering responsibility transfers from the function-AI lead to the function's standing engineering capacity. The platform team retains observability and governance; the function-AI lead is free to pick up the next use case in the backlog. The transition is what makes the program scalable.
Closing gate
Gate two — pilot to scale — was the program's most consequential spend-control point. The roughly one-third of pilots retired at this gate represented work that would have absorbed scale-stage resources without producing outcomes proportional to the spend. The gate's pass criteria were strict by design — measurable outcome against a documented baseline, eval evidence across safety and bias dimensions, and an explicit customer-impact assessment from the business owner. The program lead defended the strictness in every monthly review against pressure to soften the criteria for promising-but-flat pilots.
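The gate ladder — the envelopes and approvers stated above — can be held as data, which is roughly what a gate-progress audit walks. A minimal sketch under assumed field names; the case study does not publish the program's actual tracking schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gate:
    name: str
    budget_share: float  # share of total program spend, per the case study's averages
    approver: str

# Envelopes and approvers as reported; the shares sum to 1.0 of program spend.
GATES = [
    Gate("discovery", 0.05, "function lead"),
    Gate("pilot",     0.20, "platform + function lead"),
    Gate("scale",     0.50, "committee"),
    Gate("embed",     0.25, "function business owner"),
]

def remaining_envelope(total_budget, cleared):
    """Spend a use case may still draw: the envelopes of gates not yet cleared.
    A pilot retired at gate two never touches the spend held behind it."""
    return total_budget * sum(g.budget_share for g in GATES if g.name not in cleared)
```

The point the structure makes explicit: a use case that has cleared only discovery and pilot still has three-quarters of its lifetime envelope gated behind committee and business-owner approval.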
The gate four transition to business-as-usual was the program's scalability lever. Use cases that completed embed transferred engineering responsibility from the function-AI lead — a finite resource — to the function's standing engineering capacity, which freed the function-AI lead to pick up the next use case. Without the embed transition, the function-AI lead would have accumulated maintenance load across every use case the function shipped, and the function's capacity to deliver new use cases would have collapsed by month twelve.
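The scale gate's 5/25/100 canary ramp reduces to a per-window threshold check. The monitored signals below are the ones the case study names (eval pass rate, latency p95, cost per request, incident attribution); the trigger values are illustrative assumptions, not the program's actual thresholds:

```python
# Trigger thresholds are assumptions for illustration only.
ROLLBACK_TRIGGERS = {
    "eval_pass_rate":       lambda v: v < 0.95,  # pass rate below floor
    "latency_p95_ms":       lambda v: v > 1200,  # p95 latency over budget
    "cost_per_request":     lambda v: v > 0.04,  # unit cost over envelope
    "incidents_attributed": lambda v: v > 0,     # any incident attributed to the use case
}
RAMP_STEPS = [5, 25, 100]  # percent of production traffic

def next_ramp_step(current_pct, window_metrics):
    """Advance the canary one step, or roll back to 0% of traffic and
    report which triggers fired over the observation window."""
    fired = [name for name, trip in ROLLBACK_TRIGGERS.items()
             if trip(window_metrics[name])]
    if fired:
        return 0, fired
    idx = RAMP_STEPS.index(current_pct)
    return RAMP_STEPS[min(idx + 1, len(RAMP_STEPS) - 1)], []
```

Making the rollback triggers explicit data, rather than judgment at ramp time, is what lets the platform team hold rollback authority while the function team holds the deployment decision.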
05 — Approach · Quarterly Reviews
A half-day, six items, board-legible output.
The quarterly executive review was the cadence that converted the program from a project the board had to chase into a rhythm the board could rely on. The review was a half-day standing meeting, chaired by the executive sponsor, attended by the program lead, the platform product lead, all six function-AI leads, and the governance partner. The output was a quarterly governance review document that the audit committee received without re-instrumentation, with a standing structure that made quarter-over-quarter comparison straightforward.
The standing agenda had six items in a fixed order. The order mattered — the review opened with outcome aggregation so the executive layer entered the rest of the agenda with the value-side context, then moved through gate audit, platform health, governance fitness, function backlogs, and next-quarter commitments. The committee resisted re-ordering the agenda even when individual quarters had specific issues that would have been faster to address out of order; the order was the artefact, not just the content.
Outcome aggregation
Owner: Program lead · 60 minutes
All six functions report against the shared outcome framework — cycle-time reduction, deflection rate, cost-to-serve, capacity reclaimed, customer-impact score. Aggregated into a single dashboard that the audit committee receives the following month. The outcome layer opens the review because it is the value-side context the rest of the agenda needs.
Lead item
Gate progress audit
Owner: Program lead · 45 minutes
Every use case in flight is reviewed against its current gate — what cleared this quarter, what is queued for next quarter, what was retired at gate two and why. The audit produces the data the committee uses to evaluate the function-team backlogs and the program's overall pace.
Audit item
Platform health
Owner: Platform product lead · 30 minutes
The platform's observability dashboard is walked at committee level — eval pass rates across functions, latency p95, cost-per-request trends, incident attribution. The walk is shorter at the quarterly than at the weekly engineering cadence because the quarterly focuses on trends and structural issues rather than week-to-week noise.
Operations item
Governance fitness
Owner: Governance partner · 30 minutes
Register walk, model-update queue review, incident runbook rehearsal verdict (rehearsals happen quarterly), ethics-forum decisions in window, audit-finding remediation status. The governance item is short because the platform's embedded governance does most of the work — the review checks fitness rather than reconstructing state.
Compliance item
Function backlogs
Owner: Six function leads · 45 minutes
Each function lead presents the function's next-quarter backlog — use cases queued for discovery, pilots ready for gate two, scaling work in progress, embed transitions planned. The committee reviews against the strategic priorities but does not adjudicate the function's priorities.
Backlog item
Next-quarter plan
Owner: Program lead · 30 minutes
Synthesises the prior items into a single one-page next-quarter plan — gate commitments, platform investments, governance work, board-narrative themes. The plan becomes the standing document the audit committee references between quarterly reviews; the next quarterly review opens by walking the prior quarter's plan against actual.
Closing item
The quarterly review's most under-rated feature was its durability. The program survived a CFO transition in month fourteen and a CIO transition in month seventeen, in both cases because the quarterly review had already established a rhythm and an audience that did not depend on either executive's personal attention. Programs that depend on bespoke executive sponsorship rather than a standing cadence consistently collapse at the first leadership transition; the quarterly review was the cadence that prevented that failure mode in this program.
The next-quarter plan also served as the program's commitment-tracking artefact. Each quarterly review opened by walking the prior quarter's plan against actual — what committed, what cleared, what slipped, what was retired. The walk was honest about misses; programs that hide misses lose board confidence faster than programs that name them and assign owners to address them. The program lead defended the honest-misses convention in every quarterly review.
06 — Outcomes
Five comparable metrics, six functions, one board narrative.
The outcomes framework was the layer that converted function-level wins into board-legible aggregate value. Each of the six functions reported its outcomes against the same five metrics — cycle-time reduction, deflection rate, cost-to-serve, capacity reclaimed, customer-impact score — even when the raw activity inside the metric looked different. Customer service's deflection rate meant something different from finance's deflection rate, but both were the same shape of metric, comparable at the framework level if not at the unit level.
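The five-metric framework amounts to a small aggregation contract: every function reports the same keys, and the quarterly review rolls them up. A sketch with assumed metric keys and schema — the case study names the metrics but not how they were stored:

```python
# Metric keys are assumed names for the five metrics the case study reports.
METRICS = ["cycle_time_reduction", "deflection_rate", "cost_to_serve_delta",
           "capacity_reclaimed", "customer_impact_score"]

def aggregate(function_reports):
    """Average each shared metric across the functions that reported it this
    quarter, and note the leading function per metric (assuming, for
    illustration, that each metric is normalised so higher is better)."""
    out = {}
    for m in METRICS:
        reported = {f: r[m] for f, r in function_reports.items() if m in r}
        if not reported:
            continue  # metric not reported this quarter
        out[m] = {
            "aggregate": sum(reported.values()) / len(reported),
            "leader": max(reported, key=reported.get),
        }
    return out
```

Customer service's deflection rate and finance's deflection rate carry different raw meanings, but the shared key is what lets the roll-up place them side by side in one board dashboard.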
The chart below is the aggregate eighteen-month outcome across the six functions, reported in the units the program's quarterly review used. The numbers are illustrative averages drawn from the case study's final quarterly review; individual functions varied around the aggregate within ranges the program documented at the function level. The chart is the shape the audit committee received, redacted for this case study.
[Chart: Aggregate outcomes · six functions · 18 months · illustrative averages. Source: program quarterly governance review, final quarter.]
The aggregate numbers above are the shape that mattered for the audit committee, but the function-level decomposition mattered more for the program itself. Customer service led on deflection rate; operations led on cycle-time reduction; finance led on cost-to-serve; HR led on capacity reclaimed; marketing led on customer-impact score. Risk did not lead on any single metric but contributed across all five through the embedded governance and the audit-trail capabilities that the platform delivered. No function led on every metric; the framework was designed to surface comparative strength rather than rank winners.
The honest framing the program lead defended at every quarterly review was that the aggregate outcomes were not attributable entirely to the AI program. The functions had concurrent transformation efforts — process redesign, sourcing changes, tooling investments — and isolating AI's contribution from those concurrent efforts was not possible at the precision the audit committee might have wanted. The program reported AI as a contributing factor against documented baselines rather than as the sole driver of the outcomes; the audit committee accepted the framing, in part because it was honest about its limits.
"The aggregate numbers are not attributable entirely to the AI program. AI contributed against documented baselines; concurrent transformation efforts contributed alongside. Honesty about the limits is how the audit committee stayed bought in."— Program lead, final quarterly review
07 — Lessons
What replicates, what does not, and the move down-market.
The shape this case study describes — shared platform plus function-specific teams plus stage gates plus quarterly review — replicates down-market with adjustments. The platform's scope shrinks at smaller scale; the function teams collapse into fewer functions; the stage gates remain but operate with lighter artefacts; the quarterly review compresses into a monthly executive cadence. The structural property that holds across all scales is the separation between shared infrastructure and local implementation; the specifics scale up or down with the company's actual demand.
Four lessons hold across replications. Each names what the Fortune 500 program lead and the executive sponsor identified as the highest-leverage learning, and how it generalises to programs at different scales. The lessons are the shape that other companies asking for replication advice most often need; companies that try to replicate without these specific adjustments tend to reproduce the failure modes the program originally avoided.
Platform discipline is non-negotiable
The platform team does not deliver use cases. The platform team does not adjudicate function priorities. The platform team owns shared infrastructure and stops there. Companies that replicate the shape but let the platform team absorb use-case delivery consistently reproduce the bottleneck failure mode by month twelve. Hold the line on platform discipline even when the function teams ask for help that crosses it.
Platform = infrastructure, not delivery
Gate two strictness saves the program
Roughly one-third of pilots should be retired at gate two. Softening the criteria for promising-but-flat pilots is the standard anti-pattern — the program looks more productive at the quarterly review and is materially less productive at the eighteen-month mark. The gate's strictness is the program's most consequential spend-control feature; defend it explicitly.
~1/3 retirement at gate two
Outcome framework precedes outcomes
The shared five-metric framework was designed in month two, before the first pilot cleared gate one. Companies that try to design the framework after outcomes start landing produce incoherent aggregations and lose board confidence in quarter three. Design the framework first, accept that the framework will evolve, and use the framework to discipline what counts as a measurable outcome at gate two.
Design framework month two
Mid-market replication
Mid-market companies (roughly $500M–$5B revenue) replicate with two adjustments — the platform team collapses to two or three engineers plus an external partner for routing and observability, and the function teams collapse to three functions rather than six. The gates and the quarterly review stay intact; the rhythm is what makes the program defensible at the board layer regardless of scale.
2 platform engineers + 3 functions
The replication work is where the case study's real value to other companies sits. The Fortune 500's specific numbers matter less than the program shape and the four lessons that held across the eighteen-month run. Companies that try to replicate the numbers without the shape consistently under-deliver; companies that replicate the shape with their own numbers consistently produce comparable outcomes at their own scale. The shape is the replicable artefact, not the numbers.
For teams considering a comparable rollout, our AI transformation engagements include the program design, the platform-team stand-up, the function-team RACI, and the first two stage gates' worth of cadence operation — so the team inherits a working program shape rather than a slide deck it has to convert into operating artefacts while also delivering the use cases. The companion 90-day governance plan and the 100-point agent-stack readiness checklist are the artefacts the program teams in this case study referenced most often when standing up the governance and platform layers.
Cross-functional AI programs work when the platform is shared and the implementation is local.
The Fortune 500 case study describes a shape — not a vendor stack, not a strategy deck, not an organisational chart. The shape has two halves that depend on each other. The shared platform owns the capabilities every function needs once so the functions do not re-buy them; the function-specific implementation teams own the use cases that depend on domain context the platform team cannot have. Neither half works without the other. Centralised platforms that also deliver use cases produce bottlenecks; decentralised function teams without a shared platform produce tool sprawl, audit-surface fragmentation, and the seventeen-spend-line procurement problem the program began by retiring.
The four stage gates and the quarterly review are the program rhythm that converts the shape into board-defensible outcomes. Gate two's strictness — retiring roughly one-third of pilots — is the program's most consequential spend-control feature; the quarterly review's durability is what carried the program through two C-suite transitions without losing cadence. Both depend on the program lead holding the line explicitly against the standard pressures to soften criteria and re-order the agenda. The rhythm is the artefact, not the content of any single meeting.
The replication move down-market does not require Fortune 500 scale. Mid-market companies collapse the six functions to three and the platform team to two or three engineers plus an external partner; the gates and the quarterly review remain intact and continue to do the spend-control and board-confidence work they did at the larger scale. The shape generalises; the specific numbers do not. Companies that replicate the shape and design their own numbers consistently produce comparable outcomes at their own scale.