Building an enterprise AI agent business case means navigating a landscape where vendor-commissioned ROI studies claim anywhere from 106% to 396% three-year returns, a nearly 4× spread that tells finance teams nothing actionable. This template separates what you can defensibly model (independent RCT benchmarks, BLS wage data, a standard 10% discount rate) from what belongs in an input-range discussion (Forrester TEI studies commissioned by Microsoft and Salesforce), and outputs a 3-year NPV that your CFO can audit.

The stakes are measurable. McKinsey's State of AI 2025 finds that 88% of organizations regularly use AI but only 6% report enterprise-wide impact of 5%+ EBIT contribution. The gap is not technology — it is scope and measurement. The same report estimates roughly two-thirds of organizations are stuck in “pilot purgatory”: experiments that never scale. An ROI framework that discloses its assumptions is the difference between a budget that gets approved and a pilot that gets buried.

This guide walks through the complete calculator architecture: 11 input fields with sourced defaults, 7 computed outputs with transparent formulas, a three-scenario sample table (Small / Mid / Enterprise), a sensitivity analysis identifying which inputs dominate the output, the pilot-purgatory scope-multiplier problem, and an action plan for presenting the business case to a finance committee. For context on how individual agent tools price their tokens, see our May 2026 AI agent pricing landscape comparison.

Key takeaways

01
Vendor TEI studies diverge by nearly 4×: use them as input ranges, not defaults.
Forrester TEI studies commissioned by Microsoft show 106%–314% three-year ROI for Copilot Studio and 116% (enterprise) to 353% (SMB high case) for M365 Copilot. Salesforce's commissioned study shows 396% ROI and <6-month payback for Agentforce. All of these are vendor-commissioned, best-case composite organizations. They belong in the 'input range' column of your spreadsheet, not the default column. Using them as defaults produces fantasy math that finance teams will immediately reject.
02
Three independent benchmarks are the only defensible defaults.
The Kalliamvakou et al. GitHub Copilot RCT (55% faster on a coding task, n=95 professional developers, peer-reviewed 2022) is the most rigorous independent productivity benchmark available. The BLS Employment Cost Index (+42% fully-loaded burden over wages) is government data. The 10% discount rate matches Forrester's own TEI methodology. These three inputs should anchor your model; everything else is a sensitivity dial.
03
The scope multiplier matters more than per-task math.
McKinsey 2025 shows 88% adoption but only 6% enterprise-wide impact. The math is simple: if only 5% of employees are in the rollout, your NPV is roughly 5% of the full-deployment number (slightly less, because one-time ramp-up costs do not shrink with scope). The 'scope multiplier' cell in the calculator is more decisive than whether the task speedup is 55% or 84%. Most AI ROI templates ignore scope, which is one reason so many AI budgets fail their first-year review.
04
Anthropic's 84% time-savings figure is self-measured with disclosed limitations.
Anthropic's November 2025 research paper reports 84% median time savings across 100,000 Claude.ai conversations using Claude's own pre/post time estimates. The methodology is transparent (Hulten's theorem, BLS O*NET integration), but it is Anthropic measuring Anthropic's product on narrow tasks. It is not an independent study and should not be used as a calculator default. It is a useful upper-bound sensitivity dial.
05
The three-scenario sample table is illustrative: disclose that explicitly.
The Small / Mid / Enterprise sample outputs (payback 0.6/0.4/0.3 months; NPV $2.43M/$18.4M/$147.8M) use the calculator's conservative defaults at 100% rollout scope. A 25% scope multiplier, typical for organizations in pilot purgatory, divides every benefit and NPV figure by roughly four and roughly quadruples payback. Finance committees will ask about scope. Have the answer ready before the presentation, not during it.

01 — Vendor ROI Landscape
What the Forrester TEI studies actually say, and what the ⚠️ flags mean.

Every major AI platform vendor has commissioned a Forrester Total Economic Impact study. These studies share a consistent methodology: a 10% annual discount rate, a three-year model horizon, a composite organization built from customer interviews, and a benefits calculation that includes productivity gains, error reduction, and risk-mitigation value. The problem is not the methodology — it is the commissioning structure.

Forrester TEI studies are commissioned and paid for by the vendor whose product is being evaluated. Forrester publishes them on its TEI microsite under the vendor's branding. The composite organization is constructed from interviews with the vendor's own customers — who are self-selected success cases. Forrester discloses this in the fine print; most coverage does not repeat it. The ROI bands below are accurate representations of what the studies say. They are also all ⚠️ vendor-commissioned.

The table below draws on Forrester TEI: Microsoft Copilot Studio (Sept 2025), Forrester TEI: Salesforce Agentforce for Customer Service (Nov 2025), and Microsoft blog: M365 Copilot drove up to 353% ROI for SMB (Oct 2024). All use 10% discount rate; all are vendor-commissioned.

Copilot Studio TEI

106%–314% ROI · $25.7M–$76.4M NPV ⚠️

⚠️ Vendor-commissioned by Microsoft (Sept 2025). Composite org: $6.25B revenue, 25,000 employees, 7.6% net margin. Low scenario: 106% ROI / $25.7M NPV. Mid: 216% / $52.6M. High: 314% / $76.4M. Forrester explicitly uses a 10% annual discount rate. Use it as an input-range example, not as a calculator default. Real deployments at smaller orgs, different industries, or partial rollouts will not hit the composite's figures.

Upper-bound enterprise example

Agentforce CS TEI

396% ROI · $2.2M NPV · <6 mo ⚠️

⚠️ Vendor-commissioned by Salesforce (Nov 2025). Composite org: multimillion-dollar company, global operations, 50 customer-service reps. 396% ROI, $2.2M NPV, payback under 6 months — the highest ROI claim in this landscape. Customer-service deflection is a high-value, narrow use case: fewer reps needed for the same contact volume. Quote from study: 'With Agentforce, we can scale to meet our growing customer needs without having to increase our base human resources requirements.' Do not transfer this to a multi-function enterprise rollout.

CS-deflection narrow use case

M365 Copilot SMB TEI

132%–353% ROI · $358K–$955K NPV ⚠️

⚠️ Vendor-commissioned by Microsoft (Oct 2024). Based on 200+ company interviews, organizations up to 300 employees. Low: 132% / $358K NPV. Mid: 243% / $658K. High: 353% / $955K. This is the SMB equivalent — same methodology, smaller composite org. More applicable as an input-range example for the 'Small' scenario in your model than the enterprise Copilot Studio TEI.

SMB reference scenario

M365 Copilot Enterprise TEI

116% ROI · $19.7M NPV ⚠️

⚠️ Vendor-commissioned by Microsoft. $36.8M benefits vs $17.1M costs over three years, $19.7M NPV, 116% ROI: notably lower than Copilot Studio's 314% high end because M365 Copilot serves general knowledge-worker tasks, not specialized autonomous-agent workflows. It shows that not all Microsoft Copilot products produce the same ROI, which matters for presentations that cite 'Microsoft Copilot shows 314% ROI' without product specificity.

Baseline M365 knowledge-worker scenario

Vendor attribution discipline — ⚠️ flags explained

Every ROI figure in the table above carries a ⚠️ tag because every Forrester TEI study cited was commissioned and paid for by the vendor whose product it evaluates. Forrester discloses this in the study fine print. This does not mean the numbers are fabricated — Forrester is a reputable research firm. It means the composite organizations are constructed from self-selected customer interviews, the benefit calculations assume successful implementations, and the studies are not independent peer review. When presenting to a finance committee, label these as “vendor-reported benchmark ranges” — not as industry standards. The independent benchmarks in §02 are the defensible defaults.

02 — Independent Benchmarks
The three independent benchmarks that belong as calculator defaults.

There are exactly three data points in the AI productivity landscape that combine independence, methodological rigor, and direct applicability to an enterprise ROI model. Everything else — Anthropic customer stories, Salesforce self-reports, vendor-curated case studies — is illustrative context, not a default value.

1. Kalliamvakou et al. GitHub Copilot RCT: 55% faster (n=95). The September 2022 GitHub Next study (updated May 2024) assigned 95 professional developers to write an HTTP server in JavaScript, either with GitHub Copilot or without. The verbatim finding: “developers who used GitHub Copilot completed the task significantly faster — 55% faster than the developers who didn't use GitHub Copilot.” Wall-clock times: 1 hour 11 minutes with Copilot versus 2 hours 41 minutes without. This is a randomized controlled trial published as a primary source with full methodology disclosure. Critical caveat: the task was narrow (HTTP server, JavaScript, single session). It does not generalize to “enterprise rollout produces 55% time savings across all workflows.” Use it as the conservative productivity default for coding tasks, tunable upward for knowledge-worker tasks with explicit justification.

2. BLS Employment Cost Index: +42% fully-loaded burden. US Bureau of Labor Statistics data show that, for private-industry workers, benefits add 42% on top of base salary and wages. Q1 2026 data confirm total compensation rose 3.4% year-over-year, which feeds the calculator's wage-inflation default for year-2 and year-3 projections. These are government statistics, the most defensible source in any finance committee room.

3. Standard 10% enterprise NPV discount rate. Forrester explicitly discloses this in every TEI study: “Forrester assumes a yearly discount rate of 10% for this analysis.” The 10% rate is consistent with standard enterprise IT business-case practice for low-to-medium-risk technology investments. Startups with higher cost of capital should use 15–20%. Public-sector entities typically use 3–7%. The calculator defaults to 10% with the rate exposed as a user-editable input.
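To see what the rate choice does, it helps to compute the 3-year annuity factor: the present value of $1 of net benefit received at the end of each year. The Python sketch below assumes end-of-year cash flows, which is how the NPV formula in §04 discounts; `annuity_factor` is an illustrative name, not part of any published template.

```python
def annuity_factor(rate: float, years: int = 3) -> float:
    """Present value of $1 received at the end of each year for `years` years."""
    return sum(1 / (1 + rate) ** t for t in range(1, years + 1))


# Compare the 10% default with the public-sector and startup ranges above.
for label, rate in [("public sector", 0.05), ("enterprise default", 0.10), ("startup", 0.20)]:
    print(f"{label:>18}: {annuity_factor(rate):.3f}")
```

At 10% the factor is about 2.49, so each dollar of steady annual net benefit is worth roughly $2.49 of 3-year NPV before the initial outlay; at 20% it drops to about 2.11, roughly 15% less.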

Productivity benchmark comparison: independent vs vendor-reported

Sources: Kalliamvakou et al. / GitHub Next (github.blog); BLS ECI (bls.gov); McKinsey QuantumBlack (mckinsey.com). ⚠️ Anthropic figure is vendor self-measured, not a calculator default.

Kalliamvakou RCT task speedup (peer-reviewed): 55% · GitHub Copilot, n=95, coding task, Sept 2022, independent

Anthropic median time savings (self-measured) ⚠️: 84% · 100K Claude.ai conversations, Nov 2025, vendor self-report

BLS loaded-rate burden over base wages (gov. data): +42% · Bureau of Labor Statistics ECI, private industry, 2026

McKinsey, organizations regularly using AI: 88% · McKinsey State of AI 2025, November 2025

McKinsey, organizations with significant enterprise impact: 6% · 5%+ EBIT contribution, McKinsey State of AI 2025

03 — Calculator Input Schema
11 input fields: what each drives and where the defaults come from.

The template uses 11 user-editable input cells. Each has a sourced default (where one exists) or a placeholder range that prompts the user to enter workflow-specific data. The key discipline: no cell defaults to a vendor case-study number. If you want to model the Agentforce 396% scenario, enter those inputs manually — the template won't assume them.

For the license-cost inputs (field 10), see our AI agent pricing landscape comparison for May 2026 — that post tracks per-seat and per-task costs across the major platforms and is the companion to this calculator for the cost side of the model. For the build-vs-buy decision that affects ramp-up cost (field 9), see our enterprise AI agent build vs buy decision matrix.

Field 1–2

Employees impacted · Loaded hourly rate

50→5K

Field 1: number of employees in the rollout scope (not total headcount — apply the scope multiplier separately). Field 2: hourly fully-loaded rate. Default: $71/hr ($50 base × 1.42 BLS burden). Tune per role — engineers run $100–$150/hr loaded; data-entry operators $35–$55/hr loaded. Source: BLS ECI (bls.gov).

BLS government data default

Field 3–5

Tasks/week · Current task time · Agent task time

30min

Field 3: tasks per employee per week (5–40 is typical range). Field 4: current time per task in minutes. Field 5: agent-assisted time per task in minutes — default derived from Kalliamvakou 55% speedup applied to the current-task-time. For coding tasks, the RCT supports this. For knowledge-worker tasks, 50–70% reduction is reasonable as a conservative estimate; tune per workflow. Do NOT default to Anthropic's 84% without explicit justification.

Kalliamvakou RCT default for coding

Field 6–8

Error rate manual · Error rate agent · Cost per error

Field 6: manual data-entry error rate — industry benchmark is 1–4%; default 2%. Field 7: agent error rate — default 1% (conservative; verify per workflow). Field 8: cost per error in dollars — the 1-10-100 rule (fixing at entry: $1, at processing: $10, at customer/compliance: $100) informs the range; default $100 for customer-facing processes, $10 for internal. These three fields compound: a high error rate + high cost per error can rival the labor-savings line.

Industry-standard 1-10-100 rule

Field 9–11

Ramp-up cost · Monthly license · Sunk setup cost

$50K–$1.5M

Field 9: training + integration ramp-up — $50K (SMB) to $1.5M (enterprise) per vendor and SI quotes. Field 10: monthly agent license — varies by platform and tier; use May 2026 pricing data. Field 11: initial setup/sunk cost — $50K (SMB) to $500K+ (enterprise). Fields 9 and 11 together determine the numerator of the payback-period formula. Low ramp-up cost is the strongest lever for short payback periods — one reason Agentforce's <6-month claim uses a 50-rep CS team rather than a 5,000-person enterprise.

Vendor + SI quote ranges
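One way to keep the "no vendor defaults" discipline enforceable is to encode the 11 cells as a typed record whose only defaults are the sourced ones. The sketch below is a hypothetical Python layout, not the template's actual spreadsheet: field names are illustrative, Field 10 is assumed to be the total monthly license for the rollout (seats × per-seat price), and Field 11 defaults to zero only so that scenarios listing a single ramp-up figure can be modeled.

```python
from dataclasses import dataclass
from typing import Optional

BLS_BURDEN = 1.42   # BLS: benefits add ~42% on top of wages
RCT_SPEEDUP = 0.55  # Kalliamvakou et al.: 55% faster on the coding task


@dataclass
class CalculatorInputs:
    """Hypothetical layout of the 11 input cells (names are illustrative)."""
    employees: int                          # Field 1: employees in rollout scope
    tasks_per_week: float                   # Field 3: tasks per employee per week
    current_task_min: float                 # Field 4: current minutes per task
    ramp_up_cost: float                     # Field 9: training + integration ($)
    monthly_license: float                  # Field 10: assumed total monthly license ($)
    loaded_rate: float = 50 * BLS_BURDEN    # Field 2: $50 base x 1.42 = $71/hr
    agent_task_min: Optional[float] = None  # Field 5: None means derive from the RCT
    err_manual: float = 0.02                # Field 6: manual error rate
    err_agent: float = 0.01                 # Field 7: agent error rate
    cost_per_error: float = 100.0           # Field 8: $100 customer-facing ($10 internal)
    sunk_setup_cost: float = 0.0            # Field 11: setup ($); replace with quotes

    def __post_init__(self) -> None:
        # Field 5 never falls back to a vendor figure: it uses the 55% RCT speedup.
        if self.agent_task_min is None:
            self.agent_task_min = self.current_task_min * (1 - RCT_SPEEDUP)
```

Usage: `CalculatorInputs(employees=50, tasks_per_week=15, current_task_min=30, monthly_license=1_500, ramp_up_cost=50_000)` yields a $71/hr loaded rate and a 13.5-minute agent time. Passing `agent_task_min` explicitly overrides the RCT-derived default, which is where workflow-specific measurements belong.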

04 — Calculator Outputs
Seven computed results, with the formulas made explicit.

Finance teams approve business cases that they can audit. Hiding formulas in a spreadsheet — or citing a Forrester figure without exposing the formula — will produce a follow-up request for the model. The seven outputs below include the full formula so any analyst can replicate the calculation independently. For related methodology on measuring agent ROI beyond task completion rate, see our post on per-task vs per-user agent cost framework.

Output 1

Annual time savings (hours)

employees × tasks/week × 52 × (cur_time − agent_time) ÷ 60

The raw productivity gain in person-hours per year, and the foundation of the cost-savings line. A 50-person team doing 15 tasks/week with 25 minutes saved per task releases 50 × 15 × 52 × 25 ÷ 60 = 16,250 hours/year. At a $60/hr loaded rate, that is $975K in annual labor value released.

Foundation metric
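A minimal Python sketch of Output 1, evaluated for the 50-person example. The function name is illustrative, and 52 working weeks per year follows the formula above; the last step applies Output 2's monetization at the example's $60/hr rate.

```python
def annual_hours_saved(employees: int, tasks_per_week: float,
                       current_min: float, agent_min: float) -> float:
    """Output 1: person-hours released per year (52 working weeks)."""
    return employees * tasks_per_week * 52 * (current_min - agent_min) / 60


hours = annual_hours_saved(employees=50, tasks_per_week=15, current_min=30, agent_min=5)
labor_value = hours * 60  # Output 2 at the example's $60/hr loaded rate
print(f"{hours:,.0f} hours/year, ${labor_value:,.0f} labor value")
# prints: 16,250 hours/year, $975,000 labor value
```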

Output 2–3

Annual cost savings + error reduction

hours × hourly_rate + employees × tasks × 52 × (err_manual − err_agent) × cost_per_error

Output 2: time savings monetized at the fully-loaded rate. Output 3: error-cost reduction, computed as employees × tasks/week × 52 × difference in error rates × cost per error. For a 500-person mid-market deployment at 12 tasks/week, a 2% vs 1% error rate, and $100/error, this adds $312K/year: meaningful, but typically secondary to the labor-savings line.

Two-component benefit total
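The error-reduction term can be checked the same way. This Python sketch uses the mid-market example's inputs, including the 12 tasks/week from the Mid-Market scenario in §05; the function name is illustrative.

```python
def annual_error_savings(employees: int, tasks_per_week: float,
                         err_manual: float, err_agent: float,
                         cost_per_error: float) -> float:
    """Output 3: avoided error cost per year from the error-rate differential."""
    tasks_per_year = employees * tasks_per_week * 52
    return tasks_per_year * (err_manual - err_agent) * cost_per_error


# Mid-market example: 500 people, 12 tasks/week, 2% -> 1% errors, $100 per error.
savings = annual_error_savings(500, 12, 0.02, 0.01, 100)
print(f"${savings:,.0f} per year")
# prints: $312,000 per year
```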

Output 4

Net annual benefit (post-license)

annual_cost_savings + error_reduction − (monthly_license × 12)

Total annual benefit net of ongoing license cost. This is the recurring annual cash flow that feeds the NPV and payback calculations. Per-seat licensing scales linearly with headcount: a $30/seat/month platform costs $18K/year for 50 seats but $1.8M/year for 5,000 seats. License cost weighs most where per-seat savings are thin, so always track it as a percentage of gross benefit, not as an afterthought.

Key cash-flow line

Output 5

Payback period (months)

(ramp_up + sunk_cost) ÷ (net_annual_benefit ÷ 12)

The metric finance cares most about in year 1. Note: vendors who report <6-month payback typically use small, high-value, narrow deployments (50 CS reps on a deflection use case). Applied to the sample table's inputs, the formula gives about 0.6 months for the 50-person SMB, 0.4 months for the 500-person mid-market, and 0.3 months for the 5,000-person enterprise. That is the paradox of scale: larger deployments pay back faster because the one-time outlay grows more slowly than annual net benefit (ramp-up equals about 5% of a year's net benefit in the SMB row versus 2.5% in the enterprise row).

Finance committee priority metric

Output 6–7

3-year NPV @ 10% + break-even volume

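NPV = Σ(net_annual_benefit_t ÷ 1.10^t) for t=1..3 − initial_outlay | break-even tasks/week = (monthly_license_per_seat × 12 ÷ 52) ÷ ((cur_time − agent_time) ÷ 60 × rate + (err_manual − err_agent) × cost_per_error)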

Output 6: 3-year NPV at 10% discount rate — the standard enterprise IT business-case metric, matching Forrester TEI methodology for apples-to-apples comparison. Output 7: break-even task volume — the minimum weekly task volume per employee for the license cost to be justified by savings alone. Useful for identifying which teams are below break-even and should not be in the rollout's first phase.

Capital-budgeting and sizing metrics
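The Python sketch below chains Outputs 4 to 7 for the Small SMB inputs from §05. It assumes a flat net benefit in each year, end-of-year discounting at 10%, the listed ramp-up as the whole initial outlay, and a total (not per-seat) monthly license figure; the per-task value passed to the break-even step combines labor and avoided error cost. All names are illustrative.

```python
def annuity_pv(annual_amount: float, rate: float = 0.10, years: int = 3) -> float:
    """Present value of a flat end-of-year cash flow."""
    return sum(annual_amount / (1 + rate) ** t for t in range(1, years + 1))


def outputs_4_to_7(gross_annual_benefit: float, monthly_license: float,
                   initial_outlay: float, employees: int,
                   value_per_task: float) -> dict:
    net_annual = gross_annual_benefit - monthly_license * 12        # Output 4
    payback_months = initial_outlay / (net_annual / 12)             # Output 5
    npv = annuity_pv(net_annual) - initial_outlay                   # Output 6
    seat_cost_per_week = monthly_license * 12 / 52 / employees
    breakeven_tasks_per_week = seat_cost_per_week / value_per_task  # Output 7
    return {"net_annual": net_annual, "payback_months": payback_months,
            "npv": npv, "breakeven_tasks_per_week": breakeven_tasks_per_week}


# Small SMB: $975K labor + $39K error savings; 50 seats x $30/month; $50K ramp-up.
# One task is worth $25 of labor (25 min at $60/hr) + $1 of avoided errors (2 pts x $50).
result = outputs_4_to_7(gross_annual_benefit=1_014_000, monthly_license=1_500,
                        initial_outlay=50_000, employees=50, value_per_task=26.0)
for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```

With these inputs the break-even volume is about a quarter of a task per employee per week; teams with much lower per-task value or much pricier seats are the ones Output 7 flags.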

05 — Three-Scenario Sample Table
Small / Mid / Enterprise: the sample outputs at conservative defaults.

The table below applies the calculator's independent-benchmark defaults to three representative deployment sizes. These are illustrative scenarios, not vendor case studies and not guaranteed outcomes. All figures assume 100% rollout scope (every employee in the team uses the agent for the modeled tasks). A 25% scope multiplier, typical for organizations in pilot purgatory, divides benefit and NPV figures by roughly four and roughly quadruples payback, since the one-time ramp-up cost does not shrink with scope. The scope-multiplier problem is covered in §07.

Note on vendor-range comparison: Salesforce's Agentforce TEI ⚠️ reports 396% ROI and $2.2M NPV for a 50-rep CS team, a headcount similar to the Small SMB row below. The two are not directly comparable because (a) the TEI uses a customer-service deflection use case with very high per-task value, and (b) the TEI's composite org was constructed from self-selected successful deployments, while the table below uses general-purpose inputs. Both numbers belong in a well-constructed business case: one as the conservative baseline, one as the vendor-reported upper bound with ⚠️ attribution.

Small SMB

Payback · $2.43M 3-yr NPV

0.6mo

50 employees impacted · $60/hr loaded · 15 tasks/week · 30 min current / 5 min agent · 3%/1% error rates · $50/error · $30/seat ($18K/yr) · $50K ramp-up (treated as the full initial outlay). Annual time savings: 16,250 hrs ($975K). Error reduction: $39K. Net annual benefit: $996K. Payback: ~0.6 months. 3-yr NPV @ 10%: $2.43M.

Conservative general-purpose defaults

Mid-Market

Payback · $18.4M 3-yr NPV

0.4mo

500 employees impacted · $71/hr loaded (BLS default) · 12 tasks/week · 25 min current / 5 min agent · 2%/1% error rates · $100/error · $30/seat ($180K/yr) · $250K ramp-up (treated as the full initial outlay). Annual time savings: 104,000 hrs ($7.38M). Error reduction: $312K. Net annual benefit: $7.52M. Payback: ~0.4 months. 3-yr NPV @ 10%: $18.4M.

BLS default loaded rate

Enterprise

Payback · $147.8M 3-yr NPV

0.3mo

5,000 employees impacted · $85/hr loaded (senior knowledge workers) · 10 tasks/week · 20 min current / 5 min agent · 1.5%/0.5% error rates · $300/error (customer-facing) · $50/seat ($3M/yr) · $1.5M ramp-up (treated as the full initial outlay). Annual time savings: 650,000 hrs ($55.25M). Error reduction: $7.8M. Net annual benefit: $60.05M. Payback: ~0.3 months. 3-yr NPV @ 10%: $147.8M.

Senior knowledge-worker loaded rate
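Because every card lists its inputs, the whole table can be regenerated in a few lines. The Python sketch below applies the §04 formulas to the three cards' inputs under the sample table's assumptions (100% scope, flat benefits across years, 10% discount rate) plus one of ours: the listed ramp-up is treated as the entire initial outlay, with no separate sunk setup cost. The `Scenario` class is illustrative.

```python
from dataclasses import dataclass

DISCOUNT = 0.10
PV_FACTOR = sum(1 / (1 + DISCOUNT) ** t for t in (1, 2, 3))  # about 2.4869


@dataclass
class Scenario:
    name: str
    employees: int
    rate: float            # $/hr fully loaded
    tasks_per_week: float
    cur_min: float
    agent_min: float
    err_manual: float
    err_agent: float
    cost_per_error: float
    seat_price: float      # $/seat/month
    ramp_up: float         # treated as the whole initial outlay

    def results(self) -> dict:
        tasks = self.employees * self.tasks_per_week * 52
        hours = tasks * (self.cur_min - self.agent_min) / 60
        errors = tasks * (self.err_manual - self.err_agent) * self.cost_per_error
        net = hours * self.rate + errors - self.employees * self.seat_price * 12
        return {"hours": hours, "errors": errors, "net": net,
                "payback_mo": self.ramp_up / (net / 12),
                "npv": net * PV_FACTOR - self.ramp_up}


SCENARIOS = [
    Scenario("Small SMB", 50, 60, 15, 30, 5, 0.03, 0.01, 50, 30, 50_000),
    Scenario("Mid-Market", 500, 71, 12, 25, 5, 0.02, 0.01, 100, 30, 250_000),
    Scenario("Enterprise", 5_000, 85, 10, 20, 5, 0.015, 0.005, 300, 50, 1_500_000),
]

for s in SCENARIOS:
    r = s.results()
    print(f"{s.name:<11} hours={r['hours']:,.0f}  net=${r['net']:,.0f}  "
          f"payback={r['payback_mo']:.1f} mo  NPV=${r['npv']:,.0f}")
```

It prints hours, net annual benefit, payback, and NPV for each row, so any card value can be traced back to its inputs.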

Scope at 25%

Pilot-purgatory adjustment

÷4

Apply a 25% scope multiplier (25% of employees in rollout, typical for 'pilot purgatory' orgs per McKinsey 2025). Holding each scenario's ramp-up outlay fixed: Small ≈ $0.57M NPV (payback ~2.4 months), Mid ≈ $4.42M (~1.6 months), Enterprise ≈ $35.8M (~1.2 months). This is the most important sensitivity check, and the one most ROI templates skip.

McKinsey 67% stuck in pilot purgatory
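The scope adjustment can be reproduced from the same inputs. The Python sketch below recomputes the Mid-Market card's full-scope net annual benefit from its listed inputs and prints the 25%, 50%, and 100% versions recommended in §07. It assumes every benefit and license line scales linearly with the number of employees in scope, while the one-time ramp-up does not shrink; `at_scope` is an illustrative name.

```python
def pv_factor(rate: float = 0.10, years: int = 3) -> float:
    """Present value of $1 per year, received at each year end."""
    return sum(1 / (1 + rate) ** t for t in range(1, years + 1))


def at_scope(net_annual_full: float, outlay: float, scope: float) -> tuple[float, float]:
    """Return (3-year NPV, payback months) when only `scope` of employees roll out.

    Recurring net benefit (labor, errors, per-seat license) scales with scope;
    the one-time outlay is assumed to stay fixed.
    """
    net = net_annual_full * scope
    return net * pv_factor() - outlay, outlay / (net / 12)


# Mid-Market card: 500 people x 12 tasks x 52 weeks, 20 min saved at $71/hr,
# 1-point error reduction at $100/error, minus 500 seats x $30 x 12 months.
NET_FULL = 500 * 12 * 52 * ((25 - 5) / 60 * 71 + (0.02 - 0.01) * 100) - 500 * 30 * 12

for scope in (0.25, 0.50, 1.00):
    npv, payback = at_scope(NET_FULL, outlay=250_000, scope=scope)
    print(f"{scope:>4.0%} scope: NPV ${npv:,.0f}, payback {payback:.1f} months")
```

Holding the outlay fixed is why NPV falls slightly more than fourfold at 25% scope while payback roughly quadruples.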

The scope multiplier is more decisive than whether the task speedup is 55% or 84%. If only 5% of employees are in the rollout, your NPV is 5% of the full-deployment number. Most AI ROI templates don't surface this — which is why most AI budgets fail their first-year review.

Digital Applied analysis, May 23, 2026

06 — Sensitivity Analysis
Which inputs move the needle, and which are noise.

Sensitivity analysis ranks inputs by their effect on the output when varied ±20%. For most enterprise AI agent deployments, the ranking is consistent regardless of scenario size. The insights below reflect the calculator's structural behavior — not empirical data from specific deployments.

Dominant inputs (high elasticity): Scope multiplier (employees in rollout vs total headcount) and hourly loaded rate are the two inputs that dominate the output at every deployment size. A 20% increase in scope produces roughly a 20% increase in recurring net benefit, and slightly more than 20% in NPV because the one-time outlay stays fixed. A 20% increase in loaded rate (e.g., $71/hr to $85/hr) raises the labor-savings line by the same 20%. These are the first two cells to stress-test in a sensitivity table.

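High-impact inputs (moderate elasticity): Agent time per task (field 5) and tasks per employee per week (field 3) are multiplicative in the annual-hours formula. Tasks per week scales the savings directly; agent time works through the time saved per task (current minus agent time), so cutting agent time from 5 to 2.5 minutes against a 25-minute baseline raises the time saved from 20 to 22.5 minutes, a 12.5% gain. Both inputs require workflow-level evidence: finance teams will challenge generic claims about a 55% speedup applying to every task in a heterogeneous workforce.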

Lower-impact inputs (in most scenarios): Error rate differential and cost per error are secondary unless the use case is specifically high-volume, high-error-cost (compliance, customer-facing finance processes). For a general knowledge-worker deployment, the error-reduction line typically contributes 2–8% of total annual benefit — meaningful but not a lead argument.

Ramp-up cost and setup cost affect payback period but not the steady-state NPV (except through the initial-outlay deduction). Because payback is outlay divided by monthly net benefit, a 2× increase in ramp-up cost doubles the payback period at any scale. What differs is the absolute change: it matters for SMB deployments where ramp-up cost is a large fraction of annual benefit, and barely registers for enterprise deployments where annual net benefit is many times the ramp-up cost.

For a complete view of how per-task pricing interacts with these sensitivity levers, see our coding-agent cost calculator for 10 tools — that post handles the token-cost input side that this template abstracts into a monthly-license field.
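A ±20% sweep like the one behind the ranking below can be sketched in Python. This version perturbs one input at a time around the Mid-Market card and reports the NPV spread (value at +20% minus value at −20%) as a share of base NPV. It assumes the same flat-benefit, 10%-discount, ramp-up-as-outlay model used for the sample table, with rollout headcount (`employees`) standing in for the scope multiplier; the dictionary keys are illustrative.

```python
BASE = {  # Mid-Market card inputs
    "employees": 500, "rate": 71.0, "tasks_per_week": 12, "cur_min": 25,
    "agent_min": 5, "err_manual": 0.02, "err_agent": 0.01,
    "cost_per_error": 100.0, "seat_price": 30.0, "ramp_up": 250_000.0,
}


def npv(p: dict, discount: float = 0.10) -> float:
    """3-year NPV with flat end-of-year net benefit, minus the ramp-up outlay."""
    tasks = p["employees"] * p["tasks_per_week"] * 52
    labor = tasks * (p["cur_min"] - p["agent_min"]) / 60 * p["rate"]
    errors = tasks * (p["err_manual"] - p["err_agent"]) * p["cost_per_error"]
    net = labor + errors - p["employees"] * p["seat_price"] * 12
    factor = sum(1 / (1 + discount) ** t for t in (1, 2, 3))
    return net * factor - p["ramp_up"]


def swing(key: str, pct: float = 0.20) -> float:
    """NPV at +pct minus NPV at -pct for one input, all others held at BASE."""
    lo, hi = dict(BASE), dict(BASE)
    lo[key] *= 1 - pct
    hi[key] *= 1 + pct
    return npv(hi) - npv(lo)


base_npv = npv(BASE)
for key in ("employees", "rate", "tasks_per_week", "agent_min", "err_manual", "ramp_up"):
    print(f"{key:<15} swing = {swing(key) / base_npv:+.0%} of base NPV")
```

At these inputs, headcount, loaded rate, and tasks per week each swing NPV by roughly 40% of its base value. Agent task time moves it about 10% (in the opposite direction), because ±20% of a 5-minute agent time is only ±1 minute of the 20 minutes saved; error rates and ramp-up cost move it by a few percent or less.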

Input sensitivity ranking: relative effect on 3-year NPV

Digital Applied calculator sensitivity analysis: illustrative rankings based on output elasticity to ±20% input variation.

Scope multiplier (% of employees in rollout): Highest. Dominant lever; benefit and NPV scale with it directly.

Hourly fully-loaded rate: Very high. Directly multiplies annual time savings; tune per role and region.

Agent task time per task (vs current task time): High. Multiplicative with tasks/week and scope; key productivity input.

Tasks per employee per week: High. Multiplicative with time delta; requires workflow measurement.

Error rate differential (manual vs agent): Moderate. Secondary unless high-volume / high-error-cost use case.

Ramp-up + sunk cost: Low–moderate. Affects payback period; minimal effect on steady-state NPV at enterprise scale.

07 — Pilot Purgatory Warning
88% adoption, 6% impact: why the scope multiplier is the real business case.

McKinsey's State of AI 2025 contains the most important number in enterprise AI: only 6% of organizations attribute 5% or more of EBIT impact to AI use. Meanwhile, 88% of organizations report using AI regularly. The gap between adoption and impact is not a technology failure — it is a scope failure. Only about one-third of organizations report scaling AI across the enterprise; the remaining two-thirds are in pilot purgatory, running experiments that never graduate to production at scale.

The calculator's scope multiplier makes this visible. A 5,000-person enterprise rolling out an agent to 10% of the workforce (500 employees) captures roughly a tenth of the full-deployment NPV, and slightly less than that, because the full ramp-up outlay is still paid. The business case was built for 5,000 employees and approved for that deployment. The actual deployment reached 500. The organization reports that its AI investment “underperformed expectations.” This is not a math problem; it is a governance and change-management problem that the business case must anticipate.

The recommendation: build three versions of the business case explicitly: 25% scope (pilot), 50% scope (departmental), 100% scope (enterprise). Show the finance committee all three. Tie budget approval to milestone-gated scope expansion: initial funding for the 25% pilot, with additional budget released when pilot metrics hit defined thresholds. This structure is easier for a finance committee to approve than a single full-deployment projection because it de-risks the initial outlay. For the governance framework that enables this milestone-gating, see our AI agent governance and compliance framework.

The week of May 19, the Code with Claude London conference disclosed a concrete real-world data point: enterprise teams at the event reported a 4:1 ROI, with cost per incremental pull request at $37.50 versus $150 of developer time saved. ⚠️ This figure came from Anthropic-curated conference presentations — it is a vendor-disclosed benchmark, not an independent study. Its value is as a directional anchor for coding-specific use cases, not as a universal default. For broader AI agent adoption context, our agentic AI week in review for May 19–23 covers the full landscape of announcements that week.

Earlier this month, Microsoft Copilot Studio computer-use agents reached general availability (May 13, 2026), per the Microsoft Tech Community blog. Computer-use GA changes the ramp-up cost profile for Microsoft customers — agents can now interact with legacy applications through screen-reading rather than API integration, potentially reducing the $250K–$1.5M integration cost range for enterprise deployments. For the full Copilot Studio computer-use analysis, see our Copilot Studio computer-use agents GA deep-dive.

08 — Methodology Discipline
How to present the business case to a finance committee.

A finance committee reviewing an AI agent business case will ask three questions that most technology teams are not prepared for. The preparation below turns a “pilot project” budget request into a capital allocation decision.

Question 1: What are the assumptions? The committee will want to see the input table with sourced defaults. Present the 11-cell input table with three columns: the value you used, the source for that value, and the sensitivity range. Use the Kalliamvakou RCT 55% as your productivity default, the BLS 42% as your loaded-rate multiplier, and 10% as your discount rate — explicitly labeled as “consistent with Forrester TEI methodology.” For vendor TEI comparisons, include a footnote: “Vendor-commissioned studies (⚠️) represent vendor-selected best-case deployments and are provided as upper-bound benchmarks. Our model uses independent data as defaults.”

Question 2: What if the rollout takes longer? Ramp-up delays are the most common reason AI projects miss their business-case projections. Model year 1 at 25% of steady-state benefit to account for training, change management, and integration delays, and apply full benefits from year 2 onward. If payback falls within year 1, it roughly quadruples; the 3-year discounted benefit falls by about 27%, because year 1 carries roughly 37% of it at a 10% discount rate. The assumption is more defensible than a flat projection, and it demonstrates that you have thought through implementation risk.
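The year-1 haircut is easy to show the committee directly. This Python sketch compares flat and ramped versions of the Mid-Market case (net annual benefit and ramp-up computed from that card's listed inputs), assuming the haircut applies only to benefit, the outlay is unchanged, and cash flows are discounted at year end. Function names are illustrative.

```python
def npv_with_ramp(net_annual: float, outlay: float,
                  year1_share: float = 0.25, rate: float = 0.10) -> float:
    """3-year NPV when year 1 delivers only `year1_share` of steady-state benefit."""
    flows = [net_annual * year1_share, net_annual, net_annual]
    pv = sum(cf / (1 + rate) ** t for t, cf in enumerate(flows, start=1))
    return pv - outlay


def payback_with_ramp(net_annual: float, outlay: float, year1_share: float = 0.25) -> float:
    """Months to recover the outlay, with year-1 benefit at the reduced rate."""
    year1_total = net_annual * year1_share
    if outlay <= year1_total:
        return outlay / (year1_total / 12)
    # Anything not recovered in year 1 is recovered at the steady-state monthly rate.
    return 12 + (outlay - year1_total) / (net_annual / 12)


NET, OUTLAY = 7_516_000, 250_000  # Mid-Market card inputs run through the §04 formulas
flat = npv_with_ramp(NET, OUTLAY, year1_share=1.0)
ramped = npv_with_ramp(NET, OUTLAY)
print(f"NPV flat ${flat:,.0f} vs ramped ${ramped:,.0f} ({ramped / flat - 1:+.0%})")
print(f"payback {payback_with_ramp(NET, OUTLAY, 1.0):.1f} -> "
      f"{payback_with_ramp(NET, OUTLAY):.1f} months")
```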

Question 3: How do you measure success? Define three KPIs that can be measured at the 90-day pilot mark: (1) actual time per task (vs modeled), (2) actual adoption rate as a fraction of scope, and (3) user satisfaction (proxy for sustained adoption). If the 90-day pilot shows actual task time at or below model and adoption above 70%, the scope-expansion budget request becomes a data-backed ask, not a projection.

For the methodology behind Anthropic's own time-savings research, the Anthropic productivity research paper (Nov 2025) explains Hulten's theorem and the O*NET occupational-category integration. The median $54 in professional labor value per Claude.ai task is a useful benchmark for quantifying the individual-task benefit, not the enterprise rollout benefit — the two measurements operate at different levels of abstraction.

09 — Action Plan
From spreadsheet to approved budget: a 6-step sequence.

The ROI calculator is a finance tool, not a technology tool. Its purpose is to translate an agent deployment decision into the capital-allocation language that CFOs and budget committees use. The six steps below move from data collection to board approval.

Step 1: Select a pilot use case with measurable task volume. The model requires task count and time-per-task data. Choose a use case where these are already tracked or easily sampled — ticket resolution, invoice processing, code review, customer email responses. Avoid broad “knowledge work” framing that cannot produce input data for fields 3 and 4.

Step 2: Run a 2-week task timing sample. Before building the business case, time 20–50 task completions manually. Record current task time, error rate, and cost-per-error for your specific workflow. Measured inputs are far more defensible than benchmark defaults and will withstand finance-committee scrutiny. The Kalliamvakou 55% is a reference point, not a target: your workflow may produce a different speedup depending on task structure.
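Turning the timing sample into model inputs takes only Python's standard `statistics` module. The times and error flags below are hypothetical placeholders, not measured data; the median is reported alongside the mean because a few unusually slow tasks can pull the mean up.

```python
from statistics import mean, median


def summarize_sample(minutes: list[float], errors: list[bool]) -> dict:
    """Summarize a manual timing sample into candidate values for Fields 4 and 6."""
    if not minutes or len(minutes) != len(errors):
        raise ValueError("need a non-empty sample with one error flag per timed task")
    return {
        "n": len(minutes),
        "mean_min": mean(minutes),                # candidate value for Field 4
        "median_min": median(minutes),            # robustness check on Field 4
        "error_rate": sum(errors) / len(errors),  # candidate value for Field 6
    }


# Hypothetical sample of 20 timed completions (replace with your own log).
times = [28, 31, 25, 40, 33, 29, 27, 35, 30, 26, 45, 32, 29, 31, 28, 34, 30, 27, 36, 24]
flags = [False] * 19 + [True]
print(summarize_sample(times, flags))
```

Feed the mean (or the median, if the distribution is skewed) into Field 4 and the error rate into Field 6; cost per error still has to be estimated separately for your workflow.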

Step 3: Build the model with conservative defaults. Enter your sampled inputs. Use the BLS loaded rate for your region and role mix. Apply the 10% discount rate. Set agent time per task at 50% of current (a 50% reduction, slightly below Kalliamvakou's 55%, to be conservative). Set scope at 25% for the pilot row.

Step 4: Build the vendor-range comparison section. Add a second tab or appendix with the relevant Forrester TEI figures ⚠️ labeled as vendor-commissioned upper bounds. Show the gap between your conservative model and the vendor range — and explain it. Finance committees distrust presentations that show only upside; showing the gap and explaining it builds credibility.

Step 5: Present three scenarios, not one. 25% scope (pilot), 50% scope (departmental), 100% scope (enterprise). Include a milestone-gate structure: “We request pilot funding for scenario 1. If 90-day KPIs are met, we return for scenario 2 funding.” This de-risks the initial outlay and matches how finance committees think about technology investment.

Step 6: Monitor and update the model quarterly. The business case is a living document. Actual task-time data, actual error rates, and actual adoption percentages should update the model inputs at 30, 60, and 90 days. The model's credibility with finance grows as actual results track (or improve on) the conservative projections. Our AI transformation services include business-case development and quarterly ROI reviews for enterprise agent deployments — contact us if you need a structured engagement to run this process.

For the broader context on how the agent-first marketing operations layer connects to these ROI projections, see our agent-first marketing ops post-I/O 2026 playbook.

Conclusion

The business case that wins budget is the one that discloses its assumptions.

AI agent ROI frameworks fail in finance committees not because the numbers are wrong but because the assumptions are invisible. Presenting a Forrester TEI 396% ROI figure without ⚠️ labeling it as vendor-commissioned, or defaulting to Anthropic's 84% median time savings without disclosing the narrow-task-scope methodology, produces a business case that looks like a vendor sales deck. Finance teams have seen vendor sales decks. They do not approve them.

The framework in this post does the opposite: independent benchmarks as defaults (Kalliamvakou 55% RCT, BLS +42% burden, 10% discount rate), vendor ranges as labeled upper bounds, a scope multiplier that makes the pilot-purgatory risk visible, and milestone-gated budget asks that de-risk the initial approval. The three-scenario sample outputs (0.6/0.4/0.3 months payback and $2.43M/$18.4M/$147.8M 3-year NPV) are illustrative at 100% scope. At the more realistic 25% pilot scope, benefit and NPV fall by roughly four and payback roughly quadruples. Presenting both is how you build the credibility to expand from pilot to enterprise.

McKinsey's finding that 88% of organizations use AI but only 6% see meaningful enterprise impact is the most important number in this post. The gap is not technology — it is measurement, governance, and scope discipline. A business case built on the framework above is the first step toward moving your organization from the 88% to the 6%.

AI Agent ROI: Build the Business Case with Defensible Numbers.