AI Development · Cost Playbook · 3 min read · Published Apr 27, 2026

4 steps · 3 client tiers · 100-row forecast template

Token Budget Planning Framework

By 2026, the gross margin of an agency's AI-services line is decided in the spreadsheet, not the pitch deck. Most agencies are running tokens through retainers without a forecast and finding the margin gone. This framework is the four-step plan we run for our own book and ship to client agencies.

Digital Applied Team
Senior strategists · Published Apr 27, 2026
Sources: Provider pricing · OpenAI/Anthropic dashboards · DA fieldwork
Steps: 4 (tier · decompose · forecast · reconcile)
Reconciliation: weekly (variance flag at >15%)
Worked example: $24.8K (12-client agency monthly forecast)
Target margin: 71% (gross margin on AI services, field median)

Two years into the agentic-AI services category, the agencies making money on AI work are not the ones with the cleverest prompts. They are the ones with a token budget. Without a budget, an account manager who promised "weekly AI-generated SEO briefs" on a $4,000 retainer can quietly burn $1,800 of that in tokens before anyone notices. The retainer is technically still profitable; the gross margin on AI services is gone.

The framework below is the four-step plan we run for our own agency book and ship to client agencies that have asked us to fix their margin. It is not a prompt-engineering trick or a model swap. It is the spreadsheet, the cadence, and the variance flags that turn AI-services from a margin risk into a tracked line item with a target.

Key takeaways
  1. Token budgets are spreadsheet work, not engineering work, and that's why agencies skip them.
    Engineering teams build clever cost optimisations; agency operations teams build forecasts. The framework lives in the operations org, not the engineering org. Most agencies fail at this because they assume the engineering team will solve it.
  2. Three client tiers (light, standard, intensive) capture 90% of agency mix.
    Light: ≤ 50K tokens/client/month, mostly retrieval and audit. Standard: 50-300K tokens, mixed workflows. Intensive: 300K-2M tokens, heavy drafting and agentic workflows. Tier each client; price the tier; budget the tier.
  3. Decompose tokens by workflow type AND model class, not just by client.
    Research, drafting, audit, ops are the four workflow categories. Frontier reasoning, mid-tier, cheap workhorse are the three model classes. The 4×3 matrix is what makes forecasts actionable: moving a workflow from frontier to mid-tier is where the margin lives.
  4. Reconcile weekly, with a 15% variance flag.
    Monthly reconciliation is too late: by the time the variance shows up, two weeks of margin are gone. Weekly reconciliation with a 15% variance flag (per client, per workflow) catches drift in time to course-correct on the next sprint.
  5. Target 71% gross margin on AI services; structurally below 65% means the model mix is wrong.
    71% is the field median across our agency book and 18 client engagements. Margin below 65% almost always traces to using a frontier model where a mid-tier model would do; above 78% usually means the agency is under-serving the workflow.

01 · Context · Why agencies need a token budget.

For the first 18 months of the agentic-AI services category, tokens were a rounding error. A typical retainer ran $20-40 of tokens against $4,000 of fee. Margin was 95%+. Nobody needed a forecast.

That stopped being true in late 2025. The shift came from three directions: clients started asking for higher-fidelity outputs (drafts instead of outlines, full audits instead of spot-checks), agentic workflows became multi-step (each step billable), and frontier reasoning models became the default for quality-led work (10-30× more expensive per token than the workhorses).

By Q1 2026, the median AI-services line on agency P&Ls is 6% of revenue with margins running 60-75%. That is a real line item, and a real risk. Without a forecast, the cost on that line drifts upward silently, and the missing margin is the first thing leadership notices.

"We were profitable on the retainer, sure. We were unprofitable on the AI-services portion of the retainer, and nobody had noticed because the line was unmarked."
Managing partner, mid-market agency, Feb 2026

02 · Framework · The four-step framework.

Step 1
Tier the client book
annually + on engagement start

Classify each client into one of three tiers (light, standard, intensive) based on workflow intensity. Tiering drives both the price the client pays and the budget the agency commits.

Foundation
Step 2
Decompose tokens by workflow + model
per-engagement scoping

Map each client's monthly workflow into the 4×3 matrix (workflow type × model class). The matrix is what tells you where the cost actually lives.

Visibility layer
Step 3
Build the forecast (100-row template)
monthly · spreadsheet-native

100 rows: one per client × workflow × model class triplet. Estimate monthly tokens and cost per row. Sum to portfolio-level forecast. Update on engagement changes.

Forecast layer
Step 4
Reconcile weekly + variance flag
weekly cadence · 15% threshold

Pull actual usage weekly. Compare to forecast at the row level. Flag any row variance >15% for review. Course-correct on the next sprint.

Control layer

03 · Step 1 · Three client tiers.

The three tiers (light, standard, intensive) capture the vast majority of agency mix. The cut-points are tokens-per-month and workflow-intensity bands; the price band associated with each tier should reflect the budget commitment, not just the time commitment.

Tier 1
Light · ≤ 50K tokens/month

Retrieval-heavy workflows: weekly performance summaries, simple SEO audits, content briefs from existing material. Mostly cheap workhorse models with occasional frontier-reasoning use. Typical client fee $2-5K/month; AI-services token cost $5-25/month; gross margin 99%+.

Margin-rich, low-risk
Tier 2
Standard · 50-300K tokens/month

Mixed workflows: full content drafts, GEO audits, multi-source research, competitive intel. Mix of mid-tier and frontier models. Typical client fee $4-12K/month; AI-services token cost $50-300/month; gross margin 95-98%.

Sweet spot
Tier 3
Intensive · 300K-2M tokens/month

Heavy drafting, agentic workflows, multi-step pipelines, daily deliverables. Heavy frontier-reasoning use. Typical client fee $10-30K/month; AI-services token cost $300-3,000/month; gross margin 85-95%.

Margin-watch
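
The tier cut-points above can be sketched as a small classifier. This is a minimal illustration, not the authors' tooling: the names `Tier`, `classify_tier`, and `is_above_intensive_band` are hypothetical, and the handling of the shared band edges (exactly 300K counts as standard) is an assumption.

```python
from enum import Enum


class Tier(Enum):
    LIGHT = "light"
    STANDARD = "standard"
    INTENSIVE = "intensive"


# Cut-points in tokens per client per month, taken from the tier bands above.
LIGHT_MAX = 50_000
STANDARD_MAX = 300_000
INTENSIVE_MAX = 2_000_000


def classify_tier(tokens_per_month: int) -> Tier:
    """Map a client's monthly token volume to one of the three tiers."""
    if tokens_per_month < 0:
        raise ValueError("tokens_per_month must be non-negative")
    if tokens_per_month <= LIGHT_MAX:
        return Tier.LIGHT
    # Assumption: exactly 300K counts as standard, since the bands share that edge.
    if tokens_per_month <= STANDARD_MAX:
        return Tier.STANDARD
    return Tier.INTENSIVE


def is_above_intensive_band(tokens_per_month: int) -> bool:
    """True when usage exceeds the 2M ceiling of the intensive band."""
    return tokens_per_month > INTENSIVE_MAX


# Hypothetical monthly volumes for three clients.
for tokens in (32_000, 180_000, 1_200_000):
    print(tokens, classify_tier(tokens).value)
```

A client above the 2M ceiling still classifies as intensive here; the separate check exists so that such a client can be reviewed rather than silently absorbed into the tier.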

04 · Step 2 · Decompose by workflow + model.

The 4×3 matrix below is the visibility layer. Without it, you can see the total token spend per client but not where the spend is coming from. With it, you can see that 72% of the spend on a specific client is coming from one workflow on a frontier reasoning model — and that swapping that workflow to a mid-tier model recovers half the cost.

Workflow A
Research
Multi-source research workflows

Multi-step retrieval, synthesis, fact-checking. Highest tokens-per-task class — 8-25K tokens per output. Frontier reasoning preferred for synthesis; mid-tier for retrieval. Avoid running pure retrieval through frontier reasoning.

8-25K tokens/task
Workflow B
Drafting
Content drafting + revision

First-draft generation against a brief, multi-pass revision, voice tuning. 4-12K tokens per output. Mid-tier model is the right default; frontier reasoning only for high-stakes pieces (PR, exec voice, regulated content).

4-12K tokens/task
Workflow C
Audit
Audit + analysis workflows

Structured reviews — content audits, GEO audits, technical SEO sweeps. Most are checklist-driven and run cleanly on cheap workhorse models with strict structured output. Avoid running audits on frontier reasoning unless the audit is opinion-led.

1-4K tokens/task
Workflow D
Ops
Operations + classification

Tagging, classification, routing, lightweight summarisation. Cheap workhorse models exclusively; structured output mandatory. Highest task volume; lowest unit cost. Often the largest single line by token count even though margin is best here.

0.2-1K tokens/task
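
To make the 4×3 matrix concrete, here is a minimal sketch that aggregates one client's spend into workflow × model-class cells and reports the largest cell's share. The function names and the sample line items are hypothetical; the sample is chosen so that one cell holds 72% of spend, matching the illustration in the section intro.

```python
WORKFLOWS = ("research", "drafting", "audit", "ops")
MODEL_CLASSES = ("frontier", "mid_tier", "workhorse")


def build_matrix(line_items):
    """Sum cost per (workflow, model_class) cell.

    line_items: iterable of (workflow, model_class, cost_usd) tuples.
    """
    matrix = {(w, m): 0.0 for w in WORKFLOWS for m in MODEL_CLASSES}
    for workflow, model_class, cost_usd in line_items:
        key = (workflow, model_class)
        if key not in matrix:
            raise ValueError(f"unknown cell: {key}")
        matrix[key] += cost_usd
    return matrix


def largest_cell(matrix):
    """Return ((workflow, model_class), share_of_total), or None if spend is zero."""
    total = sum(matrix.values())
    if total == 0:
        return None
    cell = max(matrix, key=matrix.get)
    return cell, matrix[cell] / total


# Hypothetical month of spend for one client (USD).
items = [
    ("research", "frontier", 720.0),
    ("drafting", "mid_tier", 180.0),
    ("audit", "workhorse", 60.0),
    ("ops", "workhorse", 40.0),
]
cell, share = largest_cell(build_matrix(items))
print(cell, f"{share:.0%}")
```

The cell with the largest share is the first candidate for a model-class swap.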

05 · Step 3 · The 100-row forecast template.

One row per client × workflow × model class triplet. For a typical 12-client agency that maps to roughly 80-120 rows depending on the mix; the template is sized to fit comfortably in a single spreadsheet view.

Column 1
Client + tier + workflow + model class
left side · descriptive

Identifies the row. Column 1 stays editable as engagements change. Columns 2-3 are editable estimates, columns 4-5 are formula-driven, and columns 6-7 are filled in during the weekly reconcile.

Anchor columns
Column 2-3
Tasks/month + tokens/task
estimates · adjustable

Estimated tasks per month for the workflow on this client and the median tokens per task for the workflow type. Both are editable; both feed into column 4.

Volume + unit
Column 4-5
Total tokens + cost
formula · per-row

Total tokens = tasks/month × tokens/task. Cost = total tokens × per-token rate (looked up from a separate model-rate table). Both recalculate automatically when columns 2-3 change.

Calculated
Column 6-7
Actual + variance
filled weekly during reconcile

Actual tokens and cost pulled from provider dashboards weekly. Variance computed automatically; conditional formatting highlights any row above 15% variance for review.

Reconciliation
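
The column logic above translates directly into a row type. The sketch below is an illustration under stated assumptions: `ForecastRow` is a hypothetical name, and the values in `RATE_PER_MILLION_USD` are placeholders, not real provider pricing. The template itself is a spreadsheet; this only mirrors its formulas.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder rate table (USD per 1M tokens). These numbers are assumptions
# for illustration only; use the provider rate card in practice.
RATE_PER_MILLION_USD = {"frontier": 30.0, "mid_tier": 6.0, "workhorse": 1.0}


@dataclass
class ForecastRow:
    # Column 1: row identifier.
    client: str
    tier: str
    workflow: str
    model_class: str
    # Columns 2-3: editable estimates.
    tasks_per_month: int
    tokens_per_task: int
    # Column 6: actual cost, filled during the weekly reconcile.
    actual_cost_usd: Optional[float] = None

    @property
    def total_tokens(self) -> int:
        """Column 4: tasks/month × tokens/task."""
        return self.tasks_per_month * self.tokens_per_task

    @property
    def forecast_cost_usd(self) -> float:
        """Column 5: total tokens × per-token rate."""
        return self.total_tokens / 1_000_000 * RATE_PER_MILLION_USD[self.model_class]

    @property
    def variance(self) -> Optional[float]:
        """Column 7: (actual - forecast) / forecast, or None before reconcile."""
        if self.actual_cost_usd is None or self.forecast_cost_usd == 0:
            return None
        return (self.actual_cost_usd - self.forecast_cost_usd) / self.forecast_cost_usd


row = ForecastRow("client-a", "standard", "drafting", "mid_tier",
                  tasks_per_month=20, tokens_per_task=8_000)
print(row.total_tokens, round(row.forecast_cost_usd, 2))
```

Keeping the rate table separate from the rows means a provider price change is one edit, after which every forecast cost recalculates.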

06 · Step 4 · Weekly reconciliation.

Reconciliation is the control layer. Without it, the forecast is a once-and-done document that drifts. With weekly reconciliation and a 15% variance flag, the agency catches drift in time to course-correct on the next sprint instead of finding it at the month-end review.
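
A minimal sketch of the row-level check follows. It assumes forecast and actual cost cover the same period (for example, the monthly forecast pro-rated to one week) and that the flag applies to the absolute variance, so underspend is flagged as well as overspend. Both are assumptions; the names and sample rows are hypothetical.

```python
ROW_VARIANCE_THRESHOLD = 0.15  # 15% per-row flag


def row_variance(forecast_cost: float, actual_cost: float) -> float:
    """Relative variance of actual against forecast for one row."""
    if forecast_cost <= 0:
        raise ValueError("forecast_cost must be positive")
    return (actual_cost - forecast_cost) / forecast_cost


def flag_rows(rows, threshold: float = ROW_VARIANCE_THRESHOLD):
    """Return (row_id, variance) for rows whose absolute variance exceeds threshold.

    rows: iterable of (row_id, forecast_cost, actual_cost) for the same period.
    """
    flagged = []
    for row_id, forecast_cost, actual_cost in rows:
        variance = row_variance(forecast_cost, actual_cost)
        if abs(variance) > threshold:
            flagged.append((row_id, variance))
    return flagged


# Hypothetical week of data (USD).
week = [
    ("client-a/drafting/mid_tier", 40.0, 43.0),   # 7.5% over: not flagged
    ("client-a/research/frontier", 90.0, 117.0),  # 30% over: flagged
    ("client-b/ops/workhorse", 10.0, 8.0),        # 20% under: flagged
]
for row_id, variance in flag_rows(week):
    print(f"{row_id}: {variance:+.0%}")
```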

Cadence
Weekly · Monday morning · 30 minutes

Pull actual usage from each provider (OpenAI, Anthropic, Google). Match to row in the forecast template. Conditional formatting flags rows above 15% variance. Total time: 30 min/week.

Monday cadence
Variance flag
15% per row, 8% portfolio-level

Per-row 15% catches workflow-specific drift; portfolio-level 8% catches mix shifts that show up across multiple rows. Both flags trigger a 15-minute next-sprint conversation.

Two-level flag
Course correction
Three standard moves

1) Swap workflow to a cheaper model class (frontier→mid-tier, mid-tier→workhorse). 2) Cap workflow volume on the next sprint. 3) Reprice the engagement at renewal. Use moves in this order; pricing only after model swaps and volume caps.

Standard plays
Escalation
30-day variance > 25% → renewal review

If variance persists above 25% for 30 days despite course corrections, escalate to renewal review. Either reprice the engagement or restructure the AI-services scope. Persistent variance is a pricing signal.

Pricing signal
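
The portfolio-level flag, the order of course-correction moves, and the escalation rule can be sketched together. This is an illustration under assumptions: "persists for 30 days" is read as an unbroken run of readings above 25% spanning at least 30 days, and all function names are hypothetical.

```python
from datetime import date

PORTFOLIO_THRESHOLD = 0.08   # 8% portfolio-level flag
ESCALATION_THRESHOLD = 0.25  # 25% persistent variance
ESCALATION_DAYS = 30

# The three standard moves, in the order they should be tried.
COURSE_CORRECTIONS = (
    "swap the workflow to a cheaper model class",
    "cap workflow volume on the next sprint",
    "reprice the engagement at renewal",
)


def portfolio_variance(rows) -> float:
    """rows: iterable of (forecast_cost, actual_cost) pairs for one period."""
    rows = list(rows)
    forecast = sum(f for f, _ in rows)
    actual = sum(a for _, a in rows)
    if forecast <= 0:
        raise ValueError("total forecast must be positive")
    return (actual - forecast) / forecast


def next_move(moves_tried: int):
    """Next course correction to try, or None once all three are used."""
    if moves_tried < len(COURSE_CORRECTIONS):
        return COURSE_CORRECTIONS[moves_tried]
    return None


def needs_renewal_review(readings) -> bool:
    """readings: list of (date, variance), oldest first.

    True when the latest unbroken run above the escalation threshold
    spans at least ESCALATION_DAYS.
    """
    run_start = None
    for day, variance in readings:
        if abs(variance) > ESCALATION_THRESHOLD:
            if run_start is None:
                run_start = day
        else:
            run_start = None
    if run_start is None:
        return False
    return (readings[-1][0] - run_start).days >= ESCALATION_DAYS


# Every row is under 15%, yet the portfolio is 11% over: a mix shift.
rows = [(100.0, 112.0), (100.0, 110.0), (100.0, 111.0)]
print(portfolio_variance(rows) > PORTFOLIO_THRESHOLD)
```

The sample rows show why the second flag exists: no single row trips the 15% threshold, but the portfolio as a whole is well past 8%.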

07 · Worked example · 12-client agency forecast.

The example below is anonymised but real: a 12-client mid-market agency, March 2026 forecast. Total monthly token cost: $24,800. Total monthly AI-services revenue: $86,400. Implied gross margin: 71%.
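
The implied margin follows from the two totals. A minimal sketch, assuming token cost is the only cost counted against AI-services revenue (which is what the quoted figures imply); `gross_margin` is a hypothetical helper name.

```python
def gross_margin(revenue_usd: float, token_cost_usd: float) -> float:
    """Gross margin as a fraction of revenue."""
    if revenue_usd <= 0:
        raise ValueError("revenue_usd must be positive")
    return (revenue_usd - token_cost_usd) / revenue_usd


margin = gross_margin(86_400, 24_800)
print(f"{margin:.1%}")  # 71.3%
```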

Tier mix
12
2 light + 7 standard + 3 intensive

Two retainers in the light tier ($35/month tokens). Seven retainers in the standard tier ($120-380/month tokens, $1,800 total). Three retainers in the intensive tier ($1,400-4,800/month tokens, $8,400 total).

Tier distribution
Workflow mix
$24.8K
Research 38% / Drafting 41% / Audit 12% / Ops 9%

Drafting is the largest workflow line by cost. Research is second. Audit and ops together make up 21%. The mix is typical for a content/SEO-heavy agency book; agencies heavier on paid media or analytics shift mix toward audit and ops.

Workflow distribution
Model mix
Frontier 53% / Mid-tier 31% / Workhorse 16%

Frontier reasoning is 53% of cost despite being only 22% of token volume. The forecast highlighted three intensive-tier engagements where moving research synthesis from frontier to mid-tier would save $4,200/month at no quality cost.

Where margin lives
Margin
71%
Gross margin on AI services

71% is the field median across our 18-engagement sample. After the proposed model-mix change, the same agency forecasts 78% margin — the difference between a healthy AI-services line and a great one.

Field benchmark
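
The percentage mixes above convert to dollars by multiplying each share by the $24.8K total. The sketch below does that conversion and also derives, from the stated 53% cost share and 22% volume share, how much more a frontier token costs than a blended non-frontier token. The dollar figures and the ratio are derived here, not quoted from the forecast, and the helper names are hypothetical.

```python
TOTAL_TOKEN_COST_USD = 24_800

WORKFLOW_SHARE = {"research": 0.38, "drafting": 0.41, "audit": 0.12, "ops": 0.09}
MODEL_SHARE = {"frontier": 0.53, "mid_tier": 0.31, "workhorse": 0.16}
FRONTIER_VOLUME_SHARE = 0.22  # frontier share of token volume


def to_dollars(shares: dict, total_usd: float) -> dict:
    """Convert cost shares into whole-dollar amounts."""
    return {name: round(total_usd * share) for name, share in shares.items()}


def relative_unit_cost(cost_share: float, volume_share: float) -> float:
    """Per-token cost of one class relative to all other classes combined."""
    inside = cost_share / volume_share
    outside = (1 - cost_share) / (1 - volume_share)
    return inside / outside


workflow_usd = to_dollars(WORKFLOW_SHARE, TOTAL_TOKEN_COST_USD)
model_usd = to_dollars(MODEL_SHARE, TOTAL_TOKEN_COST_USD)
ratio = relative_unit_cost(MODEL_SHARE["frontier"], FRONTIER_VOLUME_SHARE)

print(workflow_usd)
print(model_usd)
print(f"frontier tokens cost about {ratio:.1f}x the blended rate of the rest")
```

On these figures drafting is roughly $10.2K, research roughly $9.4K, and frontier models roughly $13.1K of the monthly total.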

08 · Conclusion · Margin lives in the spreadsheet.

Token budget framework, April 2026

The agencies making money on AI services run a forecast, reconcile weekly, and catch margin drift before it shows up in the P&L.

Token spend is now a real line item on agency P&Ls. Without a forecast and weekly reconciliation, the line drifts and the margin disappears quietly. The framework above is the operations work that turns AI-services into a tracked line item with a target margin instead of a margin risk.

Adopt the four steps in order. Tier the client book this week. Decompose tokens by workflow and model class on the next planning cycle. Build the forecast template; share it with the operations team. Reconcile weekly; flag variance above 15%; course-correct on the next sprint.

Target 71% gross margin on AI services. Persistent margin below 65% almost always traces to using frontier reasoning where a mid-tier model would do; persistent margin above 78% usually means the agency is under-serving the workflow. Either signal is useful. Both require the spreadsheet.
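
The margin bands can be turned into a simple monthly check. This sketch treats "persistent" as three consecutive monthly readings on the same side of a band, which is an assumption, not a figure from the framework; `margin_signal` is a hypothetical name.

```python
LOW_BAND = 0.65   # below this, suspect the model mix
HIGH_BAND = 0.78  # above this, suspect under-serving
TARGET = 0.71     # field median, shown for reference


def margin_signal(monthly_margins, months: int = 3) -> str:
    """Classify the most recent `months` gross-margin readings."""
    recent = list(monthly_margins)[-months:]
    if len(recent) < months:
        return "insufficient history"
    if all(m < LOW_BAND for m in recent):
        return "check model mix"
    if all(m > HIGH_BAND for m in recent):
        return "check for under-serving"
    return "in band"


print(margin_signal([0.70, 0.63, 0.62, 0.61]))  # check model mix
print(margin_signal([0.72, 0.64, 0.71]))        # in band
```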

Agency cost engineering

Stop guessing at token spend. Run a forecast.

We help agencies stand up token-budget programs end-to-end — tiering the client book, decomposing the workflow×model matrix, building the forecast template, and instrumenting the weekly reconciliation cadence. Most engagements recover 6-12 percentage points of gross margin within 90 days.

What we work on

Token-budget engagements

  • Client tier classification + pricing realignment
  • Workflow×model decomposition matrix
  • 100-row forecast template + weekly reconciliation
  • Provider rate-card maintenance + alerting
  • Margin-recovery playbook for intensive-tier clients
FAQ · Token budget planning

The questions we get every week.

The template does not always need 100 rows. Smaller agencies (5-8 retainers) often run cleanly on 30-50 rows. The 100-row template is sized for a typical 10-15 retainer book where each retainer averages 6-8 distinct workflow×model triplets. The principle that matters is one row per workflow×model combination per client; the template can flex up or down. Avoid simplifying to one row per client: that loses the visibility into where the cost actually lives, which is the whole point of step 2.
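
The sizing rule is plain multiplication, sketched here with a hypothetical `estimate_rows` helper. The cap of 12 combinations per client comes from the 4×3 matrix; the inputs in the example are the ranges quoted in the answer.

```python
WORKFLOWS = ("research", "drafting", "audit", "ops")
MODEL_CLASSES = ("frontier", "mid_tier", "workhorse")
MAX_COMBINATIONS_PER_CLIENT = len(WORKFLOWS) * len(MODEL_CLASSES)  # 12


def estimate_rows(retainers: int, combinations_per_retainer: int) -> int:
    """Rows needed: one per workflow×model combination per client."""
    if not 0 <= combinations_per_retainer <= MAX_COMBINATIONS_PER_CLIENT:
        raise ValueError("a client has at most 12 workflow×model combinations")
    if retainers < 0:
        raise ValueError("retainers must be non-negative")
    return retainers * combinations_per_retainer


# 10-15 retainers at 6-8 combinations each.
print(estimate_rows(10, 6), estimate_rows(15, 8))  # 60 120
# A smaller book: 5-8 retainers at about 6 combinations each.
print(estimate_rows(5, 6), estimate_rows(8, 6))    # 30 48
```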