Business · Pricing Tracker · 11 min read · Published July 3, 2026

Three agent classes · index dated July 2026 · every cell recomputable

The AI Agent Build & Run Cost Index 2026

Dev-shop guides quote $8K to $500K+ for the same “simple agent” with no math shown. This index takes the opposite approach: a stated-assumption cost model (engineering days × day rate, token volumes × verified July 2026 prices, plus hosting, vector DB, monitoring, and senior oversight) in which every cell can be recomputed or swapped for your own numbers.

Digital Applied Team
Senior strategists · Published Jul 3, 2026
Sources: 19 cited sources
Simple agent build: $7.2K (median · 8 eng-days × $900)
Multi-agent run cost: ~$6.3K per month (all line items)
Model spread per turn: 9.1x (GLM-5.2 → Fable 5, July 2026)
SERP spread, same tier: 10x (“simple agent” $8K–$80K)

The cost to build an AI agent in 2026 is the most-searched, worst-answered question in the agentic economy: the top-ranking guides quote anywhere from $8,000 to $500,000+ for functionally identical descriptors, and not one of them publishes the math behind the range. This index takes the opposite approach: a stated-assumption model where every dollar figure traces to a formula you can recompute with your own numbers.

Getting this number wrong is expensive in both directions. Overestimate and you shelve an automation that would have paid for itself in a quarter; underestimate and you join the projects Gartner predicts will be canceled — the firm forecast in June 2025 that more than 40% of agentic AI projects will be scrapped by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls.

Below: the five competing cost pages and why they disagree with each other by 10x on the same tier, the Digital Applied build and run cost model with every assumption stated, a per-model price sensitivity table spanning 1x to 9.1x, and the July 2026 price-war context that will force this index to be re-dated. Everything is a snapshot, deliberately labeled with its date.

Key takeaways
  1. The SERP disagrees with itself by 10x. Five dev-shop guides published within four months quote “simple agent” builds from $8,000 to $80,000. None state token volumes, engineering days, or a day rate, so none can be recomputed or compared.
  2. Our index: three classes, every cell recomputable. Under the Digital Applied methodology, a simple task agent costs $5,400–$9,000 to build and ~$800 a month to run; a RAG-workflow agent $13,500–$22,500 and ~$2,700; a multi-agent system $27,000–$45,000 and ~$6,250. All figures derive from stated engineering days × $900/day and July 2026 token prices.
  3. Tokens are no longer the dominant run-cost line. At July 2026 rates, model tokens are 8% of the simple-agent run median, roughly 16% of the RAG median, and roughly 27% of the multi-agent median. Senior oversight time is the largest line in all three classes.
  4. Model choice is a 1x-to-9.1x multiplier. The same 1,000 agent turns cost $3.86 on GLM-5.2 and $35.00 on Claude Fable 5 at list prices, with Sonnet 5, GPT-5.6 Terra preview, Opus 4.8, and GPT-5.5 spread between them.
  5. Serverless compute is a rounding error. A worked example on Vercel’s published rate card puts raw compute for a 10,000-request/month agent at ≈$0.26. Active-CPU billing pauses during LLM waits, so the real hosting line is the $20/month Pro seat.

01 · The Problem · Five guides, one descriptor, a 10x spread.

Search “AI agent development cost” and the results are dominated by dev-shop guides — SoftTeco, DestiLabs, ProductCrafters, Cleveroad, and Azilen all published or refreshed theirs between February and June 2026. Put their numbers side by side and the genre’s problem is immediate: the same “simple agent” descriptor spans $8,000 at the bottom of DestiLabs’ band to $80,000 at the top of SoftTeco’s — a 10x disagreement on the same tier, published within a four-month window.

Commonly quoted AI agent build-cost ranges from five dev-shop guides published February through June 2026 — SoftTeco, DestiLabs, ProductCrafters, Cleveroad, and Azilen — with each page’s simple, mid, and complex tiers and whether any reproducible methodology is stated. All retrieved July 3, 2026.
| Source | Published | “Simple” tier | Mid tier | Complex tier | Methodology shown? |
| --- | --- | --- | --- | --- | --- |
| SoftTeco | Jun 9, 2026 | $10K–$30K (POC) · $20K–$80K (simple) | $20K–$60K (MVP) | $100K–$500K+ | Partial: cites $40–$180/hr US rates, never ties them to the quoted totals |
| DestiLabs | Feb 23, 2026 | $8K–$25K (conversational) | $25K–$80K (task) · $80K–$200K (multi-agent) | $200K–$500K+ (platform) | Claims a “50+ projects” basis; underlying dataset undisclosed |
| ProductCrafters | Feb 24, 2026 | $20K–$35K+ | $40K–$70K+ · $80K–$120K+ | $100K–$200K+ | None: explicit “no one-size-fits-all price tag” disclaimer |
| Cleveroad | Feb 18, 2026 | $20K–$35K+ | $40K–$70K+ · $80K–$120K | $100K–$200K+ | None: ranges read as experience-based, no derivation shown |
| Azilen | Feb 18, 2026 | $10K–$50K (chatbot) | $50K–$120K+ (task) · $80K–$180K+ (RAG) | $150K–$400K+ | None disclosed; states $3,200–$13,000/mo average run cost |
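
The 10x figure can be recomputed from the “simple” tier bands in the table above. The sketch below is only illustrative: the dictionary layout is ours, and open-ended bounds such as “$35K+” are treated as the stated number.

```python
# "Simple" tier bands in USD thousands (low, high), copied from the table above.
# Open-ended bounds like "$35K+" are taken at the stated figure (an assumption).
simple_tier = {
    "SoftTeco": [(10, 30), (20, 80)],
    "DestiLabs": [(8, 25)],
    "ProductCrafters": [(20, 35)],
    "Cleveroad": [(20, 35)],
    "Azilen": [(10, 50)],
}

lows = [lo for bands in simple_tier.values() for lo, _ in bands]
highs = [hi for bands in simple_tier.values() for _, hi in bands]

# Ratio of the highest quoted ceiling to the lowest quoted floor.
spread = max(highs) / min(lows)
print(f"${min(lows)}K to ${max(highs)}K: {spread:g}x")
```

The lowest floor is DestiLabs’ $8K and the highest ceiling is SoftTeco’s $80K, which is where the 10x comes from.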

To be fair to the genre, some of the spread is real scope variation — enterprise compliance, on-prem deployment, and deep legacy integrations legitimately multiply cost. But none of the five pages state token volumes, engineering-day counts, or the day rate behind their totals, so a reader cannot tell scope variation from guesswork. ProductCrafters says it outright: “there isn’t a one-size-fits-all price tag.” True — which is exactly why the useful artifact is a formula with stated assumptions, not another point estimate. Even the recurring maintenance convention — three of the five cite 15–25% of build cost per year — is quoted as a rule of thumb with no derivation anywhere.

Why this matters
The internal inconsistency of the cost-guide SERP is itself the evidence: when five pages publishing in the same four-month window disagree by an order of magnitude on the same descriptor, the genre’s numbers cannot all be load-bearing. A cost model you can recompute is worth more than a range you have to trust.

02 · Methodology · One day rate, stated volumes, published token prices.

Everything in this index derives from two formulas. Build cost = stated engineering days × a stated day rate. Monthly run cost = stated token volumes × verified July 2026 model prices, plus hosting, vector database, monitoring, and senior-oversight maintenance. That’s it — every cell in the tables below is one of those two multiplications plus a sum.
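
As a minimal sketch, the two formulas can be written as a pair of Python functions. The function and parameter names are ours, chosen for illustration rather than taken from any published tool; the sample inputs are the low end of the simple task agent as stated elsewhere in this index.

```python
def build_cost(eng_days: float, day_rate: float = 900.0) -> float:
    """Build cost = stated engineering days x stated day rate (USD)."""
    return eng_days * day_rate


def monthly_run_cost(
    input_mtok: float,        # monthly input tokens, in millions
    output_mtok: float,       # monthly output tokens, in millions
    price_in: float,          # USD per million input tokens
    price_out: float,         # USD per million output tokens
    hosting: float = 0.0,
    vector_db: float = 0.0,
    monitoring: float = 0.0,
    oversight_days: float = 0.0,
    day_rate: float = 900.0,
) -> float:
    """Monthly run cost = token spend + fixed line items + oversight days x day rate."""
    tokens = input_mtok * price_in + output_mtok * price_out
    return tokens + hosting + vector_db + monitoring + oversight_days * day_rate


# Low end of the simple task agent: 12M in / 4M out at $2 / $10 per Mtok,
# $20 hosting, no vector DB, free-tier monitoring, half an oversight day.
simple_build_low = build_cost(6)
simple_run_low = monthly_run_cost(12, 4, 2.0, 10.0, hosting=20, oversight_days=0.5)
print(simple_build_low, simple_run_low)
```

Swapping any argument for your own figure is the whole exercise: the structure stays the same and only the inputs change.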

The day rate is $900 per engineering day, and it is explicitly Digital Applied’s own assumption, not an industry benchmark. It derives from our published pricing: 1 unit = €100, framed as roughly one hour of senior-level work delivered as senior-led, agent-assisted capacity — a senior person scopes, briefs, and reviews while agents compress execution. Eight units a day is €800; converted at the July 3, 2026 EUR/USD spot rate of roughly 1.146 that is about $917, rounded down to $900 for clean arithmetic. Our own published engagements cross-check the scale: packaged tiers run €2,000–€4,000 for 20–40 units, and a typical 10-week AI transformation engagement is scoped at 100–150 units (€10,000–€15,000).
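
The day-rate derivation is short enough to check in a few lines. This is a sketch of the arithmetic as described above; the variable names are ours, and rounding down to the nearest $100 is our reading of “rounded down to $900 for clean arithmetic.”

```python
UNIT_PRICE_EUR = 100   # 1 unit = EUR 100, roughly one hour of senior-level work
UNITS_PER_DAY = 8
EUR_USD = 1.146        # approximate spot rate on July 3, 2026, as stated in the text

day_rate_eur = UNIT_PRICE_EUR * UNITS_PER_DAY
day_rate_usd = day_rate_eur * EUR_USD

# Assumption: "rounded down" means floor to the nearest $100.
index_day_rate = int(day_rate_usd // 100) * 100

print(day_rate_eur, round(day_rate_usd), index_day_rate)
```

If your currency pair or unit price differs, change the three constants and the rest of the index recomputes from the new rate.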

For market color — and only color — freelance marketplaces quote wide bands: Upwork lists a $50/hour median for AI engineers with senior specialists at $100–$300/hour, and Toptal AI/ML listings are publicly reported at $100–$200+/hour billed to clients. If your delivery model prices differently, substitute your day rate into the formula; the index survives the swap, which is the point.

Digital Applied methodology — stated, not surveyed
These bands are our stated-assumption model, not an industry survey. Build = engineering days × $900/day (senior-led, agent-assisted delivery). Run = monthly tokens × July 2026 list prices + hosting + vector DB + monitoring + senior oversight days × $900. Every assumption is printed so you can replace any of them and recompute.

The index prices three agent classes, each with a stated monthly workload. These volumes are the second lever you should swap for your own — they drive every token figure downstream.

Class 01 · Simple task agent
Workload: 10,000 requests/mo · 1,200 in / 400 out per request
Build effort: 6–10 engineering days

One LLM call per request behind a defined trigger: form triage, inbox routing, structured extraction, FAQ answering over a small prompt. Short system prompt, short response, no retrieval.

Class 02 · RAG-workflow agent
Workload: 30,000 requests/mo · 4,000 in / 600 out per request
Build effort: 15–25 engineering days

Retrieval over your own corpus plus a multi-step workflow. The input tokens carry retrieved context, and a vector database joins the bill of materials.

Class 03 · Multi-agent system
Workload: 20,000 tasks/mo · ~6 calls each · 3,500 in / 700 out per call
Build effort: 30–50 engineering days

An orchestrator decomposing tasks across subagents, for 120,000 total model calls a month. Coordination overhead multiplies token volume and makes evaluation tooling non-optional.
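
The stated workloads convert to monthly token volumes with one multiplication each. The sketch below uses our own dictionary keys and helper name; the multi-agent figures are treated as per-call token counts, which is how the line-item table in this index labels them.

```python
# (model calls per month, input tokens per call, output tokens per call)
workloads = {
    "simple": (10_000, 1_200, 400),
    "rag": (30_000, 4_000, 600),
    "multi_agent": (20_000 * 6, 3_500, 700),  # 20,000 tasks x ~6 calls each
}


def monthly_mtok(calls: int, tokens_in: int, tokens_out: int) -> tuple[float, float]:
    """Return (input, output) monthly token volume in millions."""
    return calls * tokens_in / 1e6, calls * tokens_out / 1e6


volumes = {name: monthly_mtok(*w) for name, w in workloads.items()}
for name, (m_in, m_out) in volumes.items():
    print(f"{name}: {m_in:g}M in / {m_out:g}M out")
```

Replacing the tuples with your own traffic numbers is the first substitution to make, since every token figure downstream depends on these volumes.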

03 · The Index · The Build & Run Cost Index, July 2026.

The headline table. Build bands are engineering days × $900; run bands are the sums of the line items broken out in the next section, with tokens priced at Claude Sonnet 5’s introductory rate ($2 input / $10 output per million tokens) — the realistic mid-market default, since Sonnet 5 is Anthropic’s default model on its Free and Pro tiers as of July 1, 2026. This index is dated July 2026 and priced from vendor pages verified July 3, 2026.

The Digital Applied AI Agent Build and Run Cost Index, dated July 2026 — for each of three agent classes, the stated engineering days, build cost band and median at $900 per day, and the monthly run cost band and median summed from token, hosting, vector database, monitoring, and maintenance line items at July 2026 published rates. Digital Applied stated-assumption methodology, not an industry survey.
| Agent class | Eng-days (stated) | Build cost (days × $900) | Build median | Monthly run band | Run median |
| --- | --- | --- | --- | --- | --- |
| Simple task agent | 6–10 | $5,400–$9,000 | $7,200 (8 days) | $534–$1,068 | ~$800 |
| RAG-workflow agent | 15–25 | $13,500–$22,500 | $18,000 (20 days) | $1,925–$3,420 | ~$2,700 |
| Multi-agent system | 30–50 | $27,000–$45,000 | $36,000 (40 days) | $4,709–$7,790 | ~$6,250 |
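
The build columns can be regenerated for any day rate. In this sketch the “median” is computed as the midpoint of the stated day range, which matches the 8, 20, and 40 days printed in the table; the names are ours.

```python
DAY_RATE = 900  # USD per engineering day (Digital Applied's stated assumption)

eng_days = {
    "Simple task agent": (6, 10),
    "RAG-workflow agent": (15, 25),
    "Multi-agent system": (30, 50),
}

build = {}
for agent, (lo, hi) in eng_days.items():
    median_days = (lo + hi) / 2  # midpoint of the stated day range
    build[agent] = (lo * DAY_RATE, hi * DAY_RATE, median_days * DAY_RATE)
    low, high, median = build[agent]
    print(f"{agent}: ${low:,} to ${high:,}, median ${median:,.0f}")
```

Changing `DAY_RATE` to your own delivery model's rate rescales all three bands at once.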

Two honest notes on reading it. First, our build bands sit below most of the SERP’s — a multi-agent system tops out at $45,000 here against $80,000–$500,000 elsewhere — because the day count assumes senior-led, agent-assisted delivery rather than a traditional team-weeks staffing model, and because the bands price the agent itself, not enterprise procurement, compliance programs, or legacy integration projects that can legitimately dominate larger quotes. Second, the run medians are workload-dependent: double the request volumes and the token line doubles with them, which is why the volumes are printed. If you want a spend ceiling before you commit, our guide to budgeting a token spend cap pairs directly with this table.

04 · Run Cost · Where the monthly money actually goes.

This is the table the dev-shop guides don’t publish: the line items. Sum any column’s monthly items and you get the run band in the headline table — that’s the recompute check, and it should be run on every number here before you budget from it.

Monthly run-cost line items for the three agent classes in the Digital Applied index — stated workload assumptions, then token costs at Claude Sonnet 5 introductory pricing, Vercel hosting, Pinecone vector database, monitoring tooling, and senior oversight maintenance, with the total monthly run band and median each column sums to. July 2026 published rates.
| Line item | Simple task agent | RAG-workflow agent | Multi-agent system |
| --- | --- | --- | --- |
| Workload assumption (stated) | | | |
| Requests / model calls per month | 10,000 requests | 30,000 requests | 120,000 calls (20,000 tasks × ~6) |
| Tokens per call (input / output) | 1,200 / 400 | 4,000 / 600 | 3,500 / 700 |
| Monthly tokens (input / output) | 12M / 4M | 120M / 18M | 420M / 84M |
| Monthly line items (July 2026 rates) | | | |
| Model tokens (Sonnet 5 intro, $2 / $10 per Mtok) | $64 | $420 | $1,680 |
| Hosting (Vercel Pro seat + Fluid Compute) | $20–$25 | $25–$40 | $30–$60 |
| Vector DB (Pinecone Standard floor) | n/a (no retrieval) | $50–$100 | $50–$150 |
| Monitoring (free tier → first paid step) | $0–$79 | $80–$160 | $249–$500 |
| Maintenance (senior oversight days × $900) | $450–$900 (0.5–1 day) | $1,350–$2,700 (1.5–3 days) | $2,700–$5,400 (3–6 days) |
| Totals | | | |
| Monthly run band (sum of line items) | $534–$1,068 | $1,925–$3,420 | $4,709–$7,790 |
| Run median | ~$800 | ~$2,700 | ~$6,250 |
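
The recompute check can be run mechanically. The sketch below copies the line items from the table above as (low, high) pairs and sums them per class; the simple agent's vector DB line is entered as zero because that class has no retrieval. The data structure is ours.

```python
# Monthly line items as (low, high) in USD, copied from the table above.
line_items = {
    "simple": {"tokens": (64, 64), "hosting": (20, 25), "vector_db": (0, 0),
               "monitoring": (0, 79), "maintenance": (450, 900)},
    "rag": {"tokens": (420, 420), "hosting": (25, 40), "vector_db": (50, 100),
            "monitoring": (80, 160), "maintenance": (1350, 2700)},
    "multi_agent": {"tokens": (1680, 1680), "hosting": (30, 60), "vector_db": (50, 150),
                    "monitoring": (249, 500), "maintenance": (2700, 5400)},
}

bands = {}
for agent, items in line_items.items():
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    bands[agent] = (low, high)
    print(f"{agent}: ${low:,} to ${high:,}, midpoint ${(low + high) / 2:,.1f}")
```

The midpoints land at roughly $801, $2,672, and $6,250, which the index rounds to the ~$800, ~$2,700, and ~$6,250 run medians.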

The trend hiding in this table is the one most 2026 cost guides miss because they were written before the mid-2026 price moves: tokens are no longer the dominant run-cost line for most agents. At July 2026 rates, the token line is 8% of the simple-agent run median ($64 of ~$800), roughly 16% of the RAG median ($420 of ~$2,700), and roughly 27% of the multi-agent median ($1,680 of ~$6,250). The largest line in every class is senior oversight — the human hours that keep an agent accurate, evaluated, and improving. Several competitor pages still frame LLM API usage as the budget driver, quoting $1,000–$8,000+ per month for it; at current prices that framing is a year out of date.
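
The token shares quoted above follow directly from the token line and the rounded run medians. A small sketch, with our own names, for checking them or re-running them on your own totals:

```python
# (token line, run median) in USD per month, from the index tables.
run = {"simple": (64, 800), "rag": (420, 2700), "multi_agent": (1680, 6250)}

# Share of the monthly run median that goes to model tokens.
shares = {agent: tokens / median for agent, (tokens, median) in run.items()}
for agent, share in shares.items():
    print(f"{agent}: {share:.1%}")
```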

Hosting is the clearest example of how cheap the plumbing has become. On Vercel’s published rate card (updated June 16, 2026), Fluid Compute bills active CPU at $0.128 per CPU-hour in US regions — and billing pauses during I/O waits, which for an agent means the long seconds spent waiting on the LLM API cost nothing in CPU. A worked example: 10,000 requests a month at 1GB memory, ~5 seconds average lifetime but only ~0.3 seconds of active CPU per request, comes to roughly $0.11 of CPU, $0.15 of provisioned memory, and under a cent of invocation fees — ≈$0.26 a month of raw compute. The real hosting line is the $20/month Pro seat (which itself includes a $20 usage credit); EU regions run higher at $0.184 per CPU-hour in Frankfurt, and usage-based observability adds $1.20 per million events if you turn it on.
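
The CPU part of that worked example can be reproduced from the two rates quoted above. This sketch computes only the active-CPU line; the $0.15 memory figure is taken as stated in the text rather than derived, and the sub-cent invocation fees are left out. Variable names are ours.

```python
REQUESTS = 10_000
ACTIVE_CPU_SECONDS = 0.3      # per request; time spent waiting on the LLM API is not billed
CPU_RATE_US = 0.128           # USD per active CPU-hour, US regions
CPU_RATE_FRANKFURT = 0.184    # USD per active CPU-hour, Frankfurt
MEMORY_COST = 0.15            # USD per month, the text's stated provisioned-memory figure

cpu_hours = REQUESTS * ACTIVE_CPU_SECONDS / 3600
cpu_cost_us = cpu_hours * CPU_RATE_US
cpu_cost_frankfurt = cpu_hours * CPU_RATE_FRANKFURT
compute_us = cpu_cost_us + MEMORY_COST  # invocation fees (under a cent) not included

print(f"{cpu_hours:.3f} CPU-hours")
print(f"CPU: ${cpu_cost_us:.2f} (US), ${cpu_cost_frankfurt:.2f} (Frankfurt)")
print(f"Compute before invocation fees: ${compute_us:.2f}")
```

Even at the Frankfurt rate the CPU line stays in the cents, so the seat fee remains the number that matters for the hosting row.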

The other two lines deserve their floors stated plainly. Pinecone’s Standard serverless tier carries a $50/month minimum commitment regardless of usage — the well-corroborated anchor for a managed vector database (per-unit read/write rates vary by plan; verify them on the pricing page before modeling to the cent). And monitoring runs free at light volume — LangSmith’s Developer tier covers 5,000 traces a month, Helicone’s free tier 10,000 requests, Braintrust’s roughly 10,000 scores — before the first paid step at $39/seat (LangSmith Plus), $79 flat (Helicone Pro), or $249 flat (Braintrust Pro). Enterprise platforms are murkier: Datadog’s LLM Observability pricing is not published on its pricing pages, and third-party reports of a premium that activates automatically when LLM spans are detected are just that — reported, not vendor-confirmed. Treat enterprise observability as a line that adds meaningful per-day cost and demands a quote. Our full breakdown of evals, traces, and what they cost goes deeper on this line item.

Hosting compute · Raw serverless compute, worked example: $0.26/mo
10,000 requests/month at 1GB memory, ~5s lifetime, ~0.3s active CPU on Vercel Fluid Compute (US rates). Active-CPU billing pauses during LLM waits, so the seat fee, not usage, is the hosting line.
Source: Vercel rate card, Jun 16, 2026

Vector DB floor · Pinecone Standard minimum: $50/mo
The Standard serverless tier carries a $50/month minimum commitment regardless of usage, which makes it the reliable anchor. Per-unit read/write rates vary; verify on the pricing page before modeling to the cent.
Source: pinecone.io/pricing, Jul 2026

Observability step · Free tier → first paid tier: $0–249/mo
Free tiers cover light volume (LangSmith 5K traces, Helicone 10K requests, Braintrust ~10K scores). First paid steps: LangSmith Plus $39/seat, Helicone Pro $79 flat, Braintrust Pro $249 flat.
Source: Vendor pricing pages, Jul 2026

05 · Model Sensitivity · One workload, seven prices: 1x to 9.1x.

The single biggest run-cost decision is the model. To compare fairly, we standardize one “agent turn” at 1,500 input and 400 output tokens and price 1,000 of them at each model’s verified July 2026 list rate. The spread is 9.1x between the cheapest and most expensive current-generation options — for the identical workload.

Model price sensitivity for AI agents, July 2026: for seven price points across six current models (Claude Sonnet 5 appears at both its introductory and standard rates), the list price per million input and output tokens, the computed cost of 1,000 standardized agent turns of 1,500 input and 400 output tokens each, and the multiplier versus the cheapest option. Vendor list prices verified July 3, 2026; Digital Applied arithmetic.
| Model | $/Mtok in | $/Mtok out | Per 1,000 turns | vs cheapest |
| --- | --- | --- | --- | --- |
| GLM-5.2 (Z.ai list) | $1.40 | $4.40 | $3.86 | 1.0x (baseline) |
| Claude Sonnet 5 (intro, through Aug 31) | $2.00 | $10.00 | $7.00 | 1.8x |
| GPT-5.6 Terra (preview pricing) | $2.50 | $15.00 | $9.75 | 2.5x |
| Claude Sonnet 5 (from Sep 1, 2026) | $3.00 | $15.00 | $10.50 | 2.7x |
| Claude Opus 4.8 | $5.00 | $25.00 | $17.50 | 4.5x |
| GPT-5.5 (standard, under 272K context) | $5.00 | $30.00 | $19.50 | 5.1x |
| Claude Fable 5 | $10.00 | $50.00 | $35.00 | 9.1x |
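
Every row of the table above is the same calculation with different prices. A minimal sketch, assuming the standardized turn of 1,500 input and 400 output tokens; the dictionary keys and function name are ours.

```python
TURN_IN, TURN_OUT, TURNS = 1_500, 400, 1_000

# (USD per Mtok input, USD per Mtok output), July 2026 list prices from the table above.
prices = {
    "GLM-5.2": (1.40, 4.40),
    "Sonnet 5 (intro)": (2.00, 10.00),
    "GPT-5.6 Terra (preview)": (2.50, 15.00),
    "Sonnet 5 (from Sep 1)": (3.00, 15.00),
    "Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Fable 5": (10.00, 50.00),
}


def cost_per_turns(price_in: float, price_out: float) -> float:
    """USD for TURNS standardized turns at the given per-Mtok prices."""
    return TURNS * (TURN_IN * price_in + TURN_OUT * price_out) / 1e6


costs = {model: cost_per_turns(*p) for model, p in prices.items()}
cheapest = min(costs.values())
for model, cost in costs.items():
    print(f"{model}: ${cost:.2f} ({cost / cheapest:.1f}x)")
```

To price your own workload, change `TURN_IN` and `TURN_OUT`; output-heavy turns widen the spread because the output-price gap between models is larger than the input-price gap.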

Chart: Cost per 1,000 agent turns · July 2026 list prices
GLM-5.2 · $1.40 / $4.40 per Mtok · Z.ai list: $3.86
Sonnet 5 (intro) · $2 / $10 · through Aug 31, 2026: $7.00
GPT-5.6 Terra (preview) · $2.50 / $15 · not GA, subject to change: $9.75
Sonnet 5 (from Sep 1) · $3 / $15 · standard rate: $10.50
Opus 4.8 · $5 / $25 · 1M context: $17.50
GPT-5.5 · $5 / $30 · standard, under 272K context: $19.50
Fable 5 · $10 / $50 · cache reads $1/M, Batch $5/$25: $35.00
Source: vendor list prices (Z.ai, Anthropic, OpenAI) verified Jul 3, 2026 · cost per 1,000 agent turns of 1,500 in / 400 out tokens · Digital Applied arithmetic
The Sonnet 5 caveat — mandatory for run-cost math
Anthropic’s own framing of the introductory price is unusually candid: “The introductory pricing is set so that the transition to Sonnet 5 is roughly cost-neutral.” The reason is the new tokenizer, which Anthropic says produces approximately 30% more tokens for the same text (up to 1.35x in some analyses) — so the sticker drop from Sonnet 4.6 is partly offset by higher token counts per request. After August 31, 2026 the rate itself rises to $3/$15, and the tokenizer inflation compounds on top of it. Vendor-stated framing; budget on effective per-request cost, not the sticker.
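
To see what budgeting on effective cost looks like, the sketch below applies the roughly 30% token inflation to the two Sonnet 5 sticker figures from the sensitivity table. It rests on two assumptions of ours: that the 1,500 / 400 turn was measured with the previous tokenizer, and that the inflation applies equally to input and output tokens.

```python
TOKEN_INFLATION = 1.30  # ~30% more tokens for the same text, per the vendor framing above

# Sticker cost per 1,000 standardized turns (1,500 in / 400 out), from the sensitivity table.
sticker = {"intro ($2/$10)": 7.00, "from Sep 1 ($3/$15)": 10.50}

# Assumption: inflation hits input and output tokens equally, so the cost of
# processing the same text scales by the same factor as the token count.
effective = {label: cost * TOKEN_INFLATION for label, cost in sticker.items()}
for label, cost in effective.items():
    print(f"{label}: ${cost:.2f} effective per 1,000 turns of equivalent text")
```

Under those assumptions the intro rate behaves more like $9.10 than $7.00 per 1,000 turns, and the standard rate more like $13.65; measure your own prompts before relying on either number.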

Three labeling notes so this table stays honest. GPT-5.6 Terra is preview-only: OpenAI positions it as competitive with GPT-5.5 at roughly half the price, but it is not GA and the $2.50/$15 rate is preview pricing, subject to change. Claude Fable 5 returned to full global availability on July 1, 2026 after a June 12–30 export-control suspension, which is worth knowing before treating the premium tier as a stable default. And GLM-5.2’s $1.40/$4.40 is Z.ai’s list price; third-party hosts run $0.93–$1.40 in and $3.00–$4.40 out, a spread we map provider-by-provider in our GLM-5.2 pricing comparison. For the running picture across every major vendor, our LLM pricing index tracks these rates as they move.

06 · Market Context · The price war that keeps re-dating this index.

Why date a cost index to the month? Because the inputs keep moving. Forbes reported in June 2026 that Anthropic cut Opus pricing 67% at the Opus 4.5 launch in November 2025, that OpenAI’s Flex processing offers a 50% discount against standard rates, and — citing underlying usage data — that business token consumption grew roughly 1,001% between January 2025 and April 2026 while token spend grew only about 497%. Volume is outrunning spend because per-token prices are falling faster than usage is rising; secondary trackers reportedly put the blended cost decline at roughly $18.40 to $6.07 per million tokens between Q1 2025 and Q1 2026.
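
The two growth figures imply an effective per-token price change that can be worked out directly. The sketch assumes “grew X%” means an increase of X% over the starting level, so the multiple is 1 + X/100; the tracker figures cover a different period and are computed separately.

```python
consumption_growth_pct = 1001  # business token consumption, Jan 2025 to Apr 2026
spend_growth_pct = 497         # token spend over the same period

# Assumption: "grew X%" is an increase of X%, so the multiple is 1 + X/100.
consumption_multiple = 1 + consumption_growth_pct / 100
spend_multiple = 1 + spend_growth_pct / 100

# Spend = volume x price, so the implied price ratio is spend growth over volume growth.
implied_price_ratio = spend_multiple / consumption_multiple

# Separate data point: blended USD per Mtok from secondary trackers, Q1 2025 to Q1 2026.
tracker_decline = 1 - 6.07 / 18.40

print(f"Implied effective price per token: {implied_price_ratio:.2f}x the starting level")
print(f"Tracker blended decline: {tracker_decline:.0%}")
```

The two sources are not measuring the same basket, so the gap between a roughly 46% implied decline and a roughly 67% tracker decline is not a contradiction, only a reminder that blended figures depend on the model mix.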

Falling unit prices do not automatically mean falling bills. The same Forbes piece reported that Uber consumed its entire 2026 AI coding budget in four months, with per-engineer costs for tools like Claude Code and Cursor running $500–$2,000 a month. Cheaper tokens invite more usage; without stated volumes and a cap, the budget conversation happens after the money is gone. That failure mode — cost without a traceable line to value — is the same one behind the pilot-graveyard statistics we unpacked in our analysis of why most agent pilots never reach production.

"If you're not actually able to draw a direct line to useful features and functionality you're shipping to your users, that trade becomes harder to justify."— Andrew Macdonald, COO, Uber — quoted in Forbes, June 11, 2026
The enterprise stakes — Gartner
Gartner’s July 1, 2026 press release estimates that up to $234 billion of enterprise application software spend — roughly 20% of enterprise application SaaS spend — is exposed to “agentic arbitrage” between now and 2030. “Agentic AI changes the economics of software,” as Gartner’s George Brocklehurst puts it. The same firm predicts more than 40% of agentic AI projects will be canceled by end-2027 on escalating costs and unclear value — both claims are Gartner’s analysis, and together they frame the stakes: the economics are real, and so is the failure rate for teams that don’t cost their builds honestly.

Projecting forward from the July snapshot: at least three dated events will move this index before year-end. Sonnet 5’s introductory pricing ends August 31, taking the mid-market reference rate from $2/$10 to $3/$15 — on top of a tokenizer that produces more tokens per request. GPT-5.6 Terra will exit preview at some point, and if its GA pricing holds near $2.50/$15 it undercuts the post-intro Sonnet rate directly. And if the past 18 months of cuts are any guide, at least one vendor reprices again before January. The build side of the index moves slower — day rates don’t drop 67% in a quarter — which is precisely why the run-cost side needs a dated, recomputable formula rather than a static range.
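
The first of those events can already be priced. The sketch below re-runs the token line at Sonnet 5's standard rate and shifts each July run median by the difference, holding every other line item and the workload fixed. It ignores the tokenizer effect and uses our own names throughout.

```python
# Monthly token volumes in millions (input, output) and July run medians in USD.
classes = {
    "simple": ((12, 4), 800),
    "rag": ((120, 18), 2700),
    "multi_agent": ((420, 84), 6250),
}
INTRO = (2.0, 10.0)     # Sonnet 5 introductory rate, USD per Mtok (in, out)
STANDARD = (3.0, 15.0)  # Sonnet 5 rate from Sep 1, 2026


def token_line(volume, price):
    """Monthly token spend in USD for (input, output) Mtok at (in, out) prices."""
    return volume[0] * price[0] + volume[1] * price[1]


repriced = {}
for name, (volume, median) in classes.items():
    new_tokens = token_line(volume, STANDARD)
    delta = new_tokens - token_line(volume, INTRO)
    repriced[name] = (new_tokens, median + delta)
    print(f"{name}: token line ${new_tokens:,.0f}, run median ~${median + delta:,.0f}")
```

Because both rates rise by the same 50%, the token line rises 50% in every class, but the run median moves far less in the simple class than in the multi-agent class.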

07 · Budgeting Levers · Four levers that move your number.

The index is a starting grid, not a quote. Four substitutions adapt it to any project — and each is a genuine decision, not a formality.

Lever 01
Swap the model

The 1x–9.1x per-turn spread is the biggest run-cost dial. Route routine turns to a budget model and reserve premium models for the reasoning-heavy fraction — a two-tier stack can cut the token line hard without capping capability.

Biggest run-cost impact
Lever 02
Swap the volumes

Our stated workloads (10K/30K/120K calls per month) drive every token figure. Replace them with your real traffic and recompute — then set a hard monthly token budget so growth is a decision, not a surprise.

Do this before anything else
Lever 03
Swap the day rate

Our $900/day is a stated Digital Applied assumption from senior-led, agent-assisted delivery. A US contract-senior staffing model or an offshore team model will produce a different — and equally recomputable — build band.

Changes build cost only
Lever 04
Question the build itself

Some agents shouldn’t be built — an off-the-shelf tool or a paid feature that funds its own run cost can beat a custom build. Run the build-vs-buy test before pricing engineering days.

The zero-cost option
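
For the first lever, a two-tier stack can be costed as a weighted average of the cheapest and most expensive per-turn figures from the sensitivity table. The 20% premium share in the example is purely hypothetical; measure what fraction of your turns actually needs the premium model.

```python
# Cost per 1,000 standardized turns at July 2026 list prices, from the sensitivity table.
BUDGET_COST = 3.86    # GLM-5.2
PREMIUM_COST = 35.00  # Claude Fable 5


def blended_cost(premium_share: float) -> float:
    """Cost per 1,000 turns when premium_share of turns go to the premium model."""
    if not 0.0 <= premium_share <= 1.0:
        raise ValueError("premium_share must be between 0 and 1")
    return premium_share * PREMIUM_COST + (1 - premium_share) * BUDGET_COST


# Hypothetical split: 20% of turns are reasoning-heavy enough for the premium model.
example = blended_cost(0.20)
print(f"${example:.2f} per 1,000 turns vs ${PREMIUM_COST:.2f} all-premium")
```

The sketch leaves out the cost of the routing step itself and any quality loss on misrouted turns, both of which belong in a real comparison.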

Two levers that mostly don’t work in 2026 deserve a warning label. Self-hosting an open-weight model to escape API pricing sounds like a run-cost lever, but the hardware reality is brutal at the frontier — GLM-5.2 at FP16 needs roughly 1.57TB of VRAM, on the order of 25 A100-80GB cards — so for most SMB and mid-market buyers it isn’t a real option; our self-hosting decision guide covers when it genuinely beats API pricing. And squeezing the maintenance line to zero is the false economy behind most agent decay: the oversight days are what keep the agent worth running. As Alex Salazar, CEO of Arcade.dev, wrote in CIO.com in December 2025: “Stop chasing moonshots. These vaguely scoped, overgeneralized agent dreams are often expensive, they rarely ship and they burn resources faster than they can create value.” Scope tightly, cost honestly. For the upstream question of whether to build at all, see the build-vs-buy decision — and if the agent faces customers, consider charging for AI-built features to cover run cost.

08 · Conclusion · Own the formula, not the range.

The index, restated

A cost model you can recompute beats a range you have to trust.

The July 2026 numbers: a simple task agent at $5,400–$9,000 to build and ~$800 a month to run; a RAG-workflow agent at $13,500–$22,500 and ~$2,700; a multi-agent system at $27,000–$45,000 and ~$6,250. Every cell derives from stated engineering days × $900/day and stated token volumes × verified July 2026 prices — swap any assumption and the index recomputes for your project.

The structural finding matters more than any single band: tokens have stopped being the budget driver. At current prices the model line is 8–27% of the monthly run median across our three classes, serverless compute is effectively a rounding error, and the largest recurring cost is the senior oversight that keeps an agent accurate. Teams still budgeting agents as “an API bill plus hosting” are pricing the smallest lines and ignoring the biggest one.

This page is a snapshot, deliberately dated. Sonnet 5’s introductory pricing ends August 31, 2026; GPT-5.6 Terra is still in preview; the price war has more rounds in it. When the inputs move, re-run the formula — that’s the entire point of publishing one.

FAQ · Agent cost index

The questions we get every week.

Under Digital Applied’s stated-assumption methodology — engineering days × a $900/day senior-led, agent-assisted rate — a simple task agent runs $5,400–$9,000 (6–10 days), a RAG-workflow agent $13,500–$22,500 (15–25 days), and a multi-agent system $27,000–$45,000 (30–50 days). These are our stated assumptions, not an industry survey: dev-shop guides quote anywhere from $8,000 to $500,000+ for similar descriptors, usually without publishing the math. Because our model is a formula, you can substitute your own day rate and day count and get a number for your delivery model instead of trusting anyone’s range — ours included.