Two years into the agentic-AI services category, the agencies making money on AI work are not the ones with the cleverest prompts. They are the ones with a token budget. Without a budget, an account manager who promised "weekly AI-generated SEO briefs" on a $4,000 retainer can quietly burn $1,800 of that in tokens before anyone notices. The retainer is technically still profitable; the gross margin on AI services is gone.

The framework below is the four-step plan we run for our own agency book and ship to client agencies that have asked us to fix their margin. It is not a prompt-engineering trick or a model swap. It is the spreadsheet, the cadence, and the variance flags that turn AI services from a margin risk into a tracked line item with a target.

Key takeaways

01
Token budgets are spreadsheet work, not engineering work, and that's why agencies skip them. Engineering teams build clever cost optimisations; agency operations teams build forecasts. The framework lives in the operations org, not the engineering org. Most agencies fail at this because they assume the engineering team will solve it.
02
Three client tiers (light, standard, intensive) capture 90% of agency mix. Light: ≤ 50K tokens/client/month, mostly retrieval and audit. Standard: 50-300K tokens, mixed workflows. Intensive: 300K-2M tokens, heavy drafting and agentic workflows. Tier each client; price the tier; budget the tier.
03
Decompose tokens by workflow type AND model class, not just by client. Research, drafting, audit, ops are the four workflow categories. Frontier reasoning, mid-tier, cheap workhorse are the three model classes. The 4×3 matrix is what makes forecasts actionable: moving a workflow from frontier to mid-tier is where the margin lives.
04
Reconcile weekly, with a 15% variance flag. Monthly reconciliation is too late: by the time the variance shows up, two weeks of margin are gone. Weekly reconciliation with a 15% variance flag (per client, per workflow) catches drift in time to course-correct on the next sprint.
05
Target 71% gross margin on AI services; structurally below 65% means the model mix is wrong. 71% is the field median across our agency book and 18 client engagements. Margin below 65% almost always traces to using a frontier model where a mid-tier model would do; above 78% usually means the agency is under-serving the workflow.

01 · Context: Why agencies need a token budget.

For the first 18 months of the agentic-AI services category, tokens were a rounding error. A typical retainer ran $20-40 of tokens against $4,000 of fee. Margin was 95%+. Nobody needed a forecast.

That stopped being true in late 2025. The shift came from three directions: clients started asking for higher-fidelity outputs (drafts instead of outlines, full audits instead of spot-checks), agentic workflows became multi-step (every step consumes tokens), and frontier reasoning models became the default for quality-led work (10-30× more expensive per token than the workhorses).

By Q1 2026, the median AI-services line on agency P&Ls is 6% of revenue with margins running 60-75%. That is a real line item and a real risk. Without a forecast, token cost drifts upward silently, and the eroded margin is the first thing leadership notices.

"We were profitable on the retainer, sure. We were unprofitable on the AI-services portion of the retainer, and nobody had noticed because the line was unmarked."
Managing partner, mid-market agency, Feb 2026

02 · Framework: The four-step framework.

Step 1

Tier the client book

annually + on engagement start

Classify each client into one of three tiers (light, standard, intensive) based on workflow intensity. Tiering drives both the price the client pays and the budget the agency commits.

Foundation

Step 2

Decompose tokens by workflow + model

per-engagement scoping

Map each client's monthly workflow into the 4×3 matrix (workflow type × model class). The matrix is what tells you where the cost actually lives.

Visibility layer

Step 3

Build the forecast (100-row template)

monthly · spreadsheet-native

Roughly 100 rows: one per client × workflow × model class triplet. Estimate monthly tokens and cost per row. Sum to a portfolio-level forecast. Update on engagement changes.

Forecast layer

Step 4

Reconcile weekly + variance flag

weekly cadence · 15% threshold

Pull actual usage weekly. Compare to forecast at the row level. Flag any row variance >15% for review. Course-correct on the next sprint.

Control layer

03 · Step 1: Three client tiers.

The three tiers (light, standard, intensive) capture the vast majority of agency mix. The cut-points are tokens-per-month and workflow-intensity bands; the price band associated with each tier should reflect the budget commitment, not just the time commitment.

Tier 1

Light · ≤ 50K tokens/month

Retrieval-heavy workflows: weekly performance summaries, simple SEO audits, content briefs from existing material. Mostly cheap workhorse models with occasional frontier-reasoning use. Typical client fee $2-5K/month; AI-services token cost $5-25/month; gross margin against the full client fee 99%+.

Margin-rich, low-risk

Tier 2

Standard · 50-300K tokens/month

Mixed workflows: full content drafts, GEO audits, multi-source research, competitive intel. Mix of mid-tier and frontier models. Typical client fee $4-12K/month; AI-services token cost $50-300/month; gross margin against the full client fee 95-98%.

Sweet spot

Tier 3

Intensive · 300K-2M tokens/month

Heavy drafting, agentic workflows, multi-step pipelines, daily deliverables. Heavy frontier-reasoning use. Typical client fee $10-30K/month; AI-services token cost $300-3,000/month; gross margin against the full client fee 85-95%.

Margin-watch
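
As a rough illustration of the tiering step, the cut-points above can be expressed as a small lookup. This is a minimal sketch, not our production tooling: the function name `classify_tier` and the `out_of_band` label are hypothetical, and assigning exactly 300K tokens to the standard tier is an assumption, since the bands above share that boundary.

```python
# Cut-points from the tier definitions above (tokens per client per month).
LIGHT_MAX_TOKENS = 50_000
STANDARD_MAX_TOKENS = 300_000
INTENSIVE_MAX_TOKENS = 2_000_000


def classify_tier(monthly_tokens: int) -> str:
    """Return the tier name for a client's expected monthly token volume."""
    if monthly_tokens < 0:
        raise ValueError("monthly_tokens must be non-negative")
    if monthly_tokens <= LIGHT_MAX_TOKENS:
        return "light"
    # Assumption: a client at exactly 300K tokens stays in the standard tier.
    if monthly_tokens <= STANDARD_MAX_TOKENS:
        return "standard"
    if monthly_tokens <= INTENSIVE_MAX_TOKENS:
        return "intensive"
    # Assumption: no tier is defined above 2M tokens, so flag for manual scoping.
    return "out_of_band"


print(classify_tier(120_000))  # standard
```

In a spreadsheet the same logic is a nested IF or a lookup against a three-row table of cut-points.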

04 · Step 2: Decompose by workflow + model.

The 4×3 matrix below is the visibility layer. Without it, you can see the total token spend per client but not where the spend is coming from. With it, you can see that 72% of the spend on a specific client is coming from one workflow on a frontier reasoning model — and that swapping that workflow to a mid-tier model recovers half the cost.

Workflow A

Multi-source research workflows

Research

Multi-step retrieval, synthesis, fact-checking. Highest tokens-per-task class — 8-25K tokens per output. Frontier reasoning preferred for synthesis; mid-tier for retrieval. Avoid running pure retrieval through frontier reasoning.

8-25K tokens/task

Workflow B

Content drafting + revision

Drafting

First-draft generation against a brief, multi-pass revision, voice tuning. 4-12K tokens per output. Mid-tier model is the right default; frontier reasoning only for high-stakes pieces (PR, exec voice, regulated content).

4-12K tokens/task

Workflow C

Audit + analysis workflows

Audit

Structured reviews — content audits, GEO audits, technical SEO sweeps. Most are checklist-driven and run cleanly on cheap workhorse models with strict structured output. Avoid running audits on frontier reasoning unless the audit is opinion-led.

1-4K tokens/task

Workflow D

Operations + classification

Ops

Tagging, classification, routing, lightweight summarisation. Cheap workhorse models exclusively; structured output mandatory. Highest task volume; lowest unit cost. Often the largest single line by token count even though margin is best here.

0.2-1K tokens/task
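
To make the visibility layer concrete, the sketch below sums per-task cost into the 4×3 matrix and reports which cell carries the largest share of spend. The record format, the function names, and the dollar figures are all hypothetical; the example month is made up so that one research-on-frontier cell accounts for 72% of cost.

```python
WORKFLOWS = ("research", "drafting", "audit", "ops")
MODEL_CLASSES = ("frontier", "mid_tier", "workhorse")


def cost_matrix(records):
    """Sum cost into a workflow x model-class matrix.

    records: iterable of (workflow, model_class, cost_usd) tuples.
    An unknown workflow or model class raises KeyError.
    """
    matrix = {w: {m: 0.0 for m in MODEL_CLASSES} for w in WORKFLOWS}
    for workflow, model_class, cost_usd in records:
        matrix[workflow][model_class] += cost_usd
    return matrix


def largest_cell(matrix):
    """Return (workflow, model_class, share_of_total) for the costliest cell."""
    total = sum(sum(row.values()) for row in matrix.values())
    if total == 0:
        return None
    workflow, model_class = max(
        ((w, m) for w in WORKFLOWS for m in MODEL_CLASSES),
        key=lambda cell: matrix[cell[0]][cell[1]],
    )
    return workflow, model_class, matrix[workflow][model_class] / total


# Hypothetical one-client month; the figures are illustrative only.
example_records = [
    ("research", "frontier", 720.0),
    ("research", "mid_tier", 80.0),
    ("drafting", "mid_tier", 150.0),
    ("ops", "workhorse", 50.0),
]
example_matrix = cost_matrix(example_records)
top_cell = largest_cell(example_matrix)
print(top_cell)
```

A pivot table over the same three columns gives the identical view in a spreadsheet; the point is that the cell, not the client total, is the unit you act on.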

05 · Step 3: The 100-row forecast template.

One row per client × workflow × model class triplet. For a typical 12-client agency that maps to roughly 80-120 rows depending on the mix; the template is sized to fit comfortably in a single spreadsheet view.

Column 1

Client + tier + workflow + model class

left side · descriptive

Identifies the row. Column 1 stays editable as engagements change; columns 2-3 hold editable estimates, columns 4-5 are formula-driven, and columns 6-7 are filled during the weekly reconcile.

Anchor columns

Column 2-3

Tasks/month + tokens/task

estimates · adjustable

Estimated tasks per month for the workflow on this client and the median tokens per task for the workflow type. Both are editable; both feed into column 4.

Volume + unit

Column 4-5

Total tokens + cost

formula · per-row

Total = tasks × tokens. Cost = tokens × per-token rate (looked up from a separate model-rate table). Auto-recalculates when columns 2-3 change.

Calculated

Column 6-7

Actual + variance

filled weekly during reconcile

Actual tokens and cost pulled from provider dashboards weekly. Variance computed automatically; conditional formatting highlights any row above 15% variance for review.

Reconciliation
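
The column formulas above amount to two multiplications per row and one sum across rows. The sketch below mirrors them under stated assumptions: the `ForecastRow` class is hypothetical, and the per-million-token rates are placeholder values, not any provider's actual prices.

```python
from dataclasses import dataclass

# Hypothetical rates in USD per million tokens; replace with the real rate table.
RATE_PER_MILLION_USD = {"frontier": 15.0, "mid_tier": 3.0, "workhorse": 0.6}


@dataclass
class ForecastRow:
    client: str            # column 1 (row identifier)
    tier: str
    workflow: str
    model_class: str
    tasks_per_month: int   # column 2 (editable estimate)
    tokens_per_task: int   # column 3 (editable estimate)

    @property
    def total_tokens(self) -> int:
        """Column 4: tasks x tokens per task."""
        return self.tasks_per_month * self.tokens_per_task

    @property
    def cost_usd(self) -> float:
        """Column 5: total tokens x per-token rate from the rate table."""
        return self.total_tokens * RATE_PER_MILLION_USD[self.model_class] / 1_000_000


# Two illustrative rows for one hypothetical standard-tier client.
rows = [
    ForecastRow("client_a", "standard", "drafting", "mid_tier", 20, 8_000),
    ForecastRow("client_a", "standard", "ops", "workhorse", 400, 500),
]
portfolio_cost = sum(row.cost_usd for row in rows)
print(f"forecast cost: ${portfolio_cost:.2f}")
```

Keeping the rates in one separate table means a provider price change is a single edit, which is the same reason the spreadsheet looks the rate up instead of hard-coding it per row.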

06 · Step 4: Weekly reconciliation.

Reconciliation is the control layer. Without it, the forecast is a once-and-done document that drifts. With weekly reconciliation and a 15% variance flag, the agency catches drift in time to course-correct on the next sprint instead of finding it at the month-end review.

Cadence

Weekly · Monday morning · 30 minutes

Pull actual usage from each provider (OpenAI, Anthropic, Google). Match each usage line to its row in the forecast template. Conditional formatting flags rows above 15% variance. Total time: 30 min/week.

Monday cadence

Variance flag

15% per row, 8% portfolio-level

Per-row 15% catches workflow-specific drift; portfolio-level 8% catches mix shifts that show up across multiple rows. Both flags trigger a 15-minute next-sprint conversation.

Two-level flag
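
The two-level flag can be sketched as follows. Two assumptions are baked in and should be adjusted to taste: variance is measured as (actual − forecast) / forecast, and drift in either direction counts, so the absolute value is compared to the threshold. The function names and the sample week are hypothetical, and forecast and actual are assumed to cover the same period.

```python
ROW_THRESHOLD = 0.15        # per-row flag
PORTFOLIO_THRESHOLD = 0.08  # portfolio-level flag


def variance(forecast: float, actual: float) -> float:
    """Relative variance of actual against forecast; positive means overspend."""
    if forecast <= 0:
        raise ValueError("forecast must be positive")
    return (actual - forecast) / forecast


def reconcile(rows):
    """rows: list of (row_id, forecast_cost, actual_cost) for the same period.

    Returns (flagged_row_ids, portfolio_flag).
    """
    flagged = [
        row_id
        for row_id, forecast, actual in rows
        if abs(variance(forecast, actual)) > ROW_THRESHOLD
    ]
    total_forecast = sum(forecast for _, forecast, _ in rows)
    total_actual = sum(actual for _, _, actual in rows)
    portfolio_flag = abs(variance(total_forecast, total_actual)) > PORTFOLIO_THRESHOLD
    return flagged, portfolio_flag


# Illustrative week: only one row trips the 15% flag, yet the portfolio trips 8%.
week = [
    ("client_a/drafting/mid_tier", 100.0, 112.0),  # +12%, below the row flag
    ("client_a/research/frontier", 200.0, 240.0),  # +20%, flagged
    ("client_b/ops/workhorse", 50.0, 51.0),        # +2%
]
flagged_rows, portfolio_flag = reconcile(week)
print(flagged_rows, portfolio_flag)
```

The sample week shows why both levels exist: a +12% row passes the row check on its own, but combined with the flagged row it pushes the portfolio total about 15% over forecast.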

Course correction

Three standard moves

1) Swap workflow to a cheaper model class (frontier→mid-tier, mid-tier→workhorse). 2) Cap workflow volume on the next sprint. 3) Reprice the engagement at renewal. Use moves in this order; pricing only after model swaps and volume caps.

Standard plays

Escalation

30-day variance > 25% → renewal review

If variance persists above 25% for 30 days despite course corrections, escalate to renewal review. Either reprice the engagement or restructure the AI-services scope. Persistent variance is a pricing signal.

Pricing signal
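
One way to encode the ordering of moves and the escalation rule is sketched below. It assumes that four consecutive weekly readings stand in for "30 days", which is our approximation for a weekly cadence and not a rule stated above; the identifiers are hypothetical.

```python
# The three standard moves, in the order they should be tried.
COURSE_CORRECTIONS = (
    "swap_to_cheaper_model_class",
    "cap_workflow_volume",
    "reprice_at_renewal",
)

ESCALATION_THRESHOLD = 0.25
CONSECUTIVE_WEEKS = 4  # assumption: four weekly reconciliations approximate 30 days


def next_move(moves_already_tried):
    """Return the first standard move not yet tried, or None when all are used."""
    for move in COURSE_CORRECTIONS:
        if move not in moves_already_tried:
            return move
    return None


def needs_renewal_review(weekly_variances):
    """True when the most recent CONSECUTIVE_WEEKS readings all exceed 25%."""
    recent = weekly_variances[-CONSECUTIVE_WEEKS:]
    return len(recent) == CONSECUTIVE_WEEKS and all(
        v > ESCALATION_THRESHOLD for v in recent
    )


print(next_move(["swap_to_cheaper_model_class"]))
print(needs_renewal_review([0.30, 0.28, 0.31, 0.27]))
```

A single week back under 25% resets the streak in this sketch; a stricter reading would use a rolling 30-day average instead.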

07 · Worked example: 12-client agency forecast.

The example below is anonymised but real: a 12-client mid-market agency, March 2026 forecast. Total monthly token cost: $24,800. Total monthly AI-services revenue: $86,400. Implied gross margin: 71%.

Tier mix

2 light + 7 standard + 3 intensive

Two retainers in the light tier ($35/month tokens). Seven retainers in the standard tier ($120-380/month tokens, $1,800 total). Three retainers in the intensive tier ($1,400-4,800/month tokens, $8,400 total).

Tier distribution

Workflow mix

Research 38% / Drafting 41% / Audit 12% / Ops 9%

$24.8K

Drafting is the largest workflow line by cost. Research is second. Audit and ops together make up 21%. The mix is typical for a content/SEO-heavy agency book; agencies heavier on paid media or analytics shift mix toward audit and ops.

Workflow distribution

Model mix

Frontier 53% / Mid-tier 31% / Workhorse 16%

Frontier reasoning is 53% of cost despite being only 22% of token volume. The forecast highlighted three intensive-tier engagements where moving research synthesis from frontier to mid-tier would save $4,200/month at no quality cost.

Where margin lives

Margin

Gross margin on AI services

71%

71% is the field median across our 18-engagement sample. After the proposed model-mix change, the same agency forecasts 78% margin — the difference between a healthy AI-services line and a great one.

Field benchmark
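
The headline margin in the worked example is simple arithmetic on the two totals, and the 65% and 78% bands give a quick way to read the result. The sketch below treats token cost as the only cost of goods, as the example does; the function names and the band labels are hypothetical.

```python
def gross_margin(revenue_usd: float, token_cost_usd: float) -> float:
    """Gross margin on AI services, counting token cost as the only cost."""
    if revenue_usd <= 0:
        raise ValueError("revenue_usd must be positive")
    return (revenue_usd - token_cost_usd) / revenue_usd


def read_margin(margin: float) -> str:
    """Interpret a margin against the 65% and 78% bands."""
    if margin < 0.65:
        return "check model mix"
    if margin > 0.78:
        return "check for under-serving"
    return "in band"


march_margin = gross_margin(86_400, 24_800)
print(f"{march_margin:.1%}", read_margin(march_margin))  # 71.3% in band
```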

08 · Conclusion: Margin lives in the spreadsheet.

Token budget framework, April 2026

The agencies making money on AI services run a forecast, reconcile weekly, and catch margin drift before it shows up in the P&L.

Token spend is now a real line item on agency P&Ls. Without a forecast and weekly reconciliation, the line drifts and the margin disappears quietly. The framework above is the operations work that turns AI services into a tracked line item with a target margin instead of a margin risk.

Adopt the four steps in order. Tier the client book this week. Decompose tokens by workflow and model class on the next planning cycle. Build the forecast template; share it with the operations team. Reconcile weekly; flag variance above 15%; course-correct on the next sprint.

Target 71% gross margin on AI services. Persistent margin below 65% almost always traces to using frontier reasoning where a mid-tier model would do; persistent margin above 78% usually means the agency is under-serving the workflow. Either signal is useful. Both require the spreadsheet.
