The cost to build an AI agent in 2026 is the most-searched, worst-answered question in the agentic economy: the top-ranking guides quote anywhere from $8,000 to $500,000+ for functionally identical descriptors, and not one of them publishes the math behind the range. This index takes the opposite approach: a stated-assumption model where every dollar figure traces to a formula you can re-run with your own numbers.
Getting this number wrong is expensive in both directions. Overestimate and you shelve an automation that would have paid for itself in a quarter; underestimate and you join the projects Gartner predicts will be canceled — the firm forecast in June 2025 that more than 40% of agentic AI projects will be scrapped by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls.
Below: the five competing cost pages and why they disagree with each other by 10x on the same tier, the Digital Applied build and run cost model with every assumption stated, a per-model price sensitivity table spanning 1x to 9.1x, and the July 2026 price-war context that will force this index to be re-dated. Everything is a snapshot, deliberately labeled with its date.
- 01 · The SERP disagrees with itself by 10x. Five dev-shop guides published within four months quote “simple agent” builds from $8,000 to $80,000. None state token volumes, engineering days, or a day rate, so none can be recomputed or compared.
- 02 · Our index: three classes, every cell recomputable. Under the Digital Applied methodology, a simple task agent costs $5,400–$9,000 to build and ~$800 a month to run; a RAG-workflow agent $13,500–$22,500 and ~$2,700; a multi-agent system $27,000–$45,000 and ~$6,250. All from stated engineering days × $900/day and July 2026 token prices.
- 03 · Tokens are no longer the dominant run-cost line. At July 2026 rates, model tokens are 8% of the simple-agent run median, roughly 16% of the RAG median, and roughly 27% of the multi-agent median. Senior oversight time is the largest line in all three classes.
- 04 · Model choice is a 1x-to-9.1x multiplier. The same 1,000 agent turns cost $3.86 on GLM-5.2 and $35.00 on Claude Fable 5 at list prices, with Sonnet 5, GPT-5.6 Terra preview, Opus 4.8, and GPT-5.5 spread between them.
- 05 · Serverless compute is a rounding error. A worked example on Vercel’s published rate card puts raw compute for a 10,000-request/month agent at ≈$0.26. Active-CPU billing pauses during LLM waits, so the real hosting line is the $20/month Pro seat.
01 — The Problem: Five guides, one descriptor, a 10x spread.
Search “AI agent development cost” and the results are dominated by dev-shop guides — SoftTeco, DestiLabs, ProductCrafters, Cleveroad, and Azilen all published or refreshed theirs between February and June 2026. Put their numbers side by side and the genre’s problem is immediate: the same “simple agent” descriptor spans $8,000 at the bottom of DestiLabs’ band to $80,000 at the top of SoftTeco’s — a 10x disagreement on the same tier, published within a four-month window.
| Source | Published | “Simple” tier | Mid tier | Complex tier | Methodology shown? |
|---|---|---|---|---|---|
| SoftTeco | Jun 9, 2026 | $10K–$30K (POC) · $20K–$80K (simple) | $20K–$60K (MVP) | $100K–$500K+ | Partial — cites $40–$180/hr US rates, never ties them to the quoted totals |
| DestiLabs | Feb 23, 2026 | $8K–$25K (conversational) | $25K–$80K (task) · $80K–$200K (multi-agent) | $200K–$500K+ (platform) | Claims a “50+ projects” basis; underlying dataset undisclosed |
| ProductCrafters | Feb 24, 2026 | $20K–$35K+ | $40K–$70K+ · $80K–$120K+ | $100K–$200K+ | None — explicit “no one-size-fits-all price tag” disclaimer |
| Cleveroad | Feb 18, 2026 | $20K–$35K+ | $40K–$70K+ · $80K–$120K | $100K–$200K+ | None — ranges read as experience-based, no derivation shown |
| Azilen | Feb 18, 2026 | $10K–$50K (chatbot) | $50K–$120K+ (task) · $80K–$180K+ (RAG) | $150K–$400K+ | None disclosed; states $3,200–$13,000/mo average run cost |
To be fair to the genre, some of the spread is real scope variation — enterprise compliance, on-prem deployment, and deep legacy integrations legitimately multiply cost. But none of the five pages state token volumes, engineering-day counts, or the day rate behind their totals, so a reader cannot tell scope variation from guesswork. ProductCrafters says it outright: “there isn’t a one-size-fits-all price tag.” True — which is exactly why the useful artifact is a formula with stated assumptions, not another point estimate. Even the recurring maintenance convention — three of the five cite 15–25% of build cost per year — is quoted as a rule of thumb with no derivation anywhere.
02 — Methodology: One day rate, stated volumes, published token prices.
Everything in this index derives from two formulas. Build cost = stated engineering days × a stated day rate. Monthly run cost = stated token volumes × verified July 2026 model prices, plus hosting, vector database, monitoring, and senior-oversight maintenance. That’s it — every cell in the tables below is one of those two multiplications plus a sum.
The day rate is $900 per engineering day, and it is explicitly Digital Applied’s own assumption, not an industry benchmark. It derives from our published pricing: 1 unit = €100, framed as roughly one hour of senior-level work delivered as senior-led, agent-assisted capacity — a senior person scopes, briefs, and reviews while agents compress execution. Eight units a day is €800; converted at the July 3, 2026 EUR/USD spot rate of roughly 1.146 that is about $917, rounded down to $900 for clean arithmetic. Our own published engagements cross-check the scale: packaged tiers run €2,000–€4,000 for 20–40 units, and a typical 10-week AI transformation engagement is scoped at 100–150 units (€10,000–€15,000).
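The day-rate derivation above is short enough to recompute directly. The sketch below uses only the stated figures (the €100 unit, eight units a day, and the approximate 1.146 spot rate); the `round_down_to` helper and the choice of a $100 rounding step are illustrative assumptions, since the text says only that the figure is rounded down to $900.

```python
# Recompute the day rate from the stated unit price and FX rate.
UNIT_PRICE_EUR = 100      # 1 unit = EUR 100 (stated)
UNITS_PER_DAY = 8         # eight units per engineering day (stated)
EUR_USD_SPOT = 1.146      # approximate July 3, 2026 spot rate (stated)


def round_down_to(value: float, step: int) -> int:
    """Round down to a multiple of `step` (hypothetical helper; the $100 step is an assumption)."""
    return int(value // step) * step


day_rate_eur = UNIT_PRICE_EUR * UNITS_PER_DAY        # EUR per day
day_rate_usd_raw = day_rate_eur * EUR_USD_SPOT       # USD per day before rounding
day_rate_usd = round_down_to(day_rate_usd_raw, 100)  # rate used in the index

print(f"EUR {day_rate_eur}/day -> USD {day_rate_usd_raw:.2f} raw -> USD {day_rate_usd} used")
```

To model a different delivery setup, change `UNIT_PRICE_EUR`, `UNITS_PER_DAY`, or the spot rate and carry the resulting day rate into the build formula.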
For market color — and only color — freelance marketplaces quote wide bands: Upwork lists a $50/hour median for AI engineers with senior specialists at $100–$300/hour, and Toptal AI/ML listings are publicly reported at $100–$200+/hour billed to clients. If your delivery model prices differently, substitute your day rate into the formula; the index survives the swap, which is the point.
The index prices three agent classes, each with a stated monthly workload. These volumes are the second lever you should swap for your own — they drive every token figure downstream.
Simple task agent
One LLM call per request behind a defined trigger (form triage, inbox routing, structured extraction, FAQ answering over a small prompt), at 10,000 requests a month. Short system prompt, short response, no retrieval.
RAG-workflow agent
Retrieval over your own corpus plus a multi-step workflow, at 30,000 requests a month. The input tokens carry retrieved context, and a vector database joins the bill of materials.
Multi-agent system
An orchestrator decomposing tasks across subagents — 120,000 total model calls a month. Coordination overhead multiplies token volume and makes evaluation tooling non-optional.
03 — The Index: The Build & Run Cost Index, July 2026.
The headline table. Build bands are engineering days × $900; run bands are the sums of the line items broken out in the next section, with tokens priced at Claude Sonnet 5’s introductory rate ($2 input / $10 output per million tokens) — the realistic mid-market default, since Sonnet 5 is Anthropic’s default model on its Free and Pro tiers as of July 1, 2026. This index is dated July 2026 and priced from vendor pages verified July 3, 2026.
| Agent class | Eng-days (stated) | Build cost (days × $900) | Build median | Monthly run band | Run median |
|---|---|---|---|---|---|
| Simple task agent | 6–10 | $5,400–$9,000 | $7,200 (8 days) | $534–$1,068 | ~$800 |
| RAG-workflow agent | 15–25 | $13,500–$22,500 | $18,000 (20 days) | $1,925–$3,420 | ~$2,700 |
| Multi-agent system | 30–50 | $27,000–$45,000 | $36,000 (40 days) | $4,709–$7,790 | ~$6,250 |
Two honest notes on reading it. First, our build bands sit below most of the SERP’s — a multi-agent system tops out at $45,000 here against $80,000–$500,000 elsewhere — because the day count assumes senior-led, agent-assisted delivery rather than a traditional team-weeks staffing model, and because the bands price the agent itself, not enterprise procurement, compliance programs, or legacy integration projects that can legitimately dominate larger quotes. Second, the run medians are workload-dependent: double the request volumes and the token line doubles with them, which is why the volumes are printed. If you want a spend ceiling before you commit, our guide to budgeting a token spend cap pairs directly with this table.
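The build columns of the headline table are a single multiplication, so they can be sketched in a few lines. The day counts and the $900 rate come from the table; the function name `build_band` and the $1,400 alternative day rate in the usage line are hypothetical, included only to show the substitution.

```python
# Build cost = stated engineering days x stated day rate.
DAY_RATE_USD = 900  # Digital Applied's stated assumption

# (low, median, high) engineering days per agent class, from the headline table.
ENG_DAYS = {
    "simple task agent": (6, 8, 10),
    "RAG-workflow agent": (15, 20, 25),
    "multi-agent system": (30, 40, 50),
}


def build_band(days: tuple, day_rate: int = DAY_RATE_USD) -> tuple:
    """Return (low, median, high) build cost in USD for a (low, median, high) day count."""
    return tuple(d * day_rate for d in days)


for name, days in ENG_DAYS.items():
    low, median, high = build_band(days)
    print(f"{name}: ${low:,}-${high:,} (median ${median:,})")

# Substituting a hypothetical $1,400/day rate re-prices the same day counts.
print(build_band(ENG_DAYS["simple task agent"], day_rate=1_400))
```

Only the day rate and the day counts are inputs here; scope items such as compliance programs or legacy integration would have to be added as extra days before the multiplication.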
04 — Run Cost: Where the monthly money actually goes.
This is the table the dev-shop guides don’t publish: the line items. Sum any column’s monthly items and you get the run band in the headline table — that’s the recompute check, and it should be run on every number here before you budget from it.
| Line item | Simple task agent | RAG-workflow agent | Multi-agent system |
|---|---|---|---|
| Workload assumption (stated) | | | |
| Requests / model calls per month | 10,000 requests | 30,000 requests | 120,000 calls (20,000 tasks × ~6) |
| Tokens per call (input / output) | 1,200 / 400 | 4,000 / 600 | 3,500 / 700 |
| Monthly tokens (input / output) | 12M / 4M | 120M / 18M | 420M / 84M |
| Monthly line items (July 2026 rates) | | | |
| Model tokens — Sonnet 5 intro, $2 / $10 per Mtok | $64 | $420 | $1,680 |
| Hosting — Vercel Pro seat + Fluid Compute | $20–$25 | $25–$40 | $30–$60 |
| Vector DB — Pinecone Standard floor | — | $50–$100 | $50–$150 |
| Monitoring — free tier → first paid step | $0–$79 | $80–$160 | $249–$500 |
| Maintenance — senior oversight days × $900 | $450–$900 (0.5–1 day) | $1,350–$2,700 (1.5–3 days) | $2,700–$5,400 (3–6 days) |
| Totals | | | |
| Monthly run band (sum of line items) | $534–$1,068 | $1,925–$3,420 | $4,709–$7,790 |
| Run median | ~$800 | ~$2,700 | ~$6,250 |
The trend hiding in this table is the one most 2026 cost guides miss because they were written before the mid-2026 price moves: tokens are no longer the dominant run-cost line for most agents. At July 2026 rates, the token line is 8% of the simple-agent run median ($64 of ~$800), roughly 16% of the RAG median ($420 of ~$2,700), and roughly 27% of the multi-agent median ($1,680 of ~$6,250). The largest line in every class is senior oversight — the human hours that keep an agent accurate, evaluated, and improving. Several competitor pages still frame LLM API usage as the budget driver, quoting $1,000–$8,000+ per month for it; at current prices that framing is a year out of date.
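The recompute check for the run-cost table can be sketched as follows: price the stated token volumes at the Sonnet 5 introductory rate, add the low and high ends of the other line items, and compare the token line to the band. All figures are copied from the table; the `AgentClass` structure is illustrative, and using the band midpoint as the "median" is an assumption (the article's medians are rounded).

```python
from dataclasses import dataclass

# Sonnet 5 introductory rate, USD per million tokens (from the table).
PRICE_IN, PRICE_OUT = 2.00, 10.00


@dataclass
class AgentClass:
    name: str
    calls: int          # model calls per month
    tokens_in: int      # input tokens per call
    tokens_out: int     # output tokens per call
    other_lines: dict   # line item -> (low, high) USD per month

    def token_cost(self, price_in: float = PRICE_IN, price_out: float = PRICE_OUT) -> float:
        """Monthly token spend: volumes in millions of tokens times per-Mtok prices."""
        mtok_in = self.calls * self.tokens_in / 1_000_000
        mtok_out = self.calls * self.tokens_out / 1_000_000
        return mtok_in * price_in + mtok_out * price_out

    def run_band(self) -> tuple:
        """(low, high) monthly run cost: token line plus every other line item."""
        tokens = self.token_cost()
        low = tokens + sum(lo for lo, _ in self.other_lines.values())
        high = tokens + sum(hi for _, hi in self.other_lines.values())
        return low, high


CLASSES = [
    AgentClass("simple task agent", 10_000, 1_200, 400,
               {"hosting": (20, 25), "monitoring": (0, 79), "maintenance": (450, 900)}),
    AgentClass("RAG-workflow agent", 30_000, 4_000, 600,
               {"hosting": (25, 40), "vector_db": (50, 100),
                "monitoring": (80, 160), "maintenance": (1_350, 2_700)}),
    AgentClass("multi-agent system", 120_000, 3_500, 700,
               {"hosting": (30, 60), "vector_db": (50, 150),
                "monitoring": (249, 500), "maintenance": (2_700, 5_400)}),
]

for c in CLASSES:
    low, high = c.run_band()
    midpoint = (low + high) / 2  # assumption: midpoint stands in for the rounded median
    share = c.token_cost() / midpoint
    print(f"{c.name}: tokens ${c.token_cost():,.0f}, "
          f"band ${low:,.0f}-${high:,.0f}, token share {share:.0%}")
```

Replacing `calls`, the per-call token counts, or any `(low, high)` pair with your own figures re-derives the band without touching the rest of the model.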
Hosting is the clearest example of how cheap the plumbing has become. On Vercel’s published rate card (updated June 16, 2026), Fluid Compute bills active CPU at $0.128 per CPU-hour in US regions — and billing pauses during I/O waits, which for an agent means the long seconds spent waiting on the LLM API cost nothing in CPU. A worked example: 10,000 requests a month at 1GB memory, ~5 seconds average lifetime but only ~0.3 seconds of active CPU per request, comes to roughly $0.11 of CPU, $0.15 of provisioned memory, and under a cent of invocation fees — ≈$0.26 a month of raw compute. The real hosting line is the $20/month Pro seat (which itself includes a $20 usage credit); EU regions run higher at $0.184 per CPU-hour in Frankfurt, and usage-based observability adds $1.20 per million events if you turn it on.
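The worked example above can be reproduced as a rough sketch. Only the $0.128 per CPU-hour rate is quoted in the text; the provisioned-memory rate ($0.0106 per GB-hour) and invocation rate ($0.60 per million) are assumptions chosen to be consistent with the stated $0.15 and sub-cent results, so check them against the current rate card before relying on them.

```python
# Raw Fluid Compute cost for the stated workload (US rates).
REQUESTS = 10_000        # requests per month (stated)
ACTIVE_CPU_S = 0.3       # active CPU seconds per request (stated)
LIFETIME_S = 5.0         # average request lifetime in seconds (stated)
MEMORY_GB = 1.0          # provisioned memory per instance (stated)

CPU_RATE_PER_HOUR = 0.128        # USD per active CPU-hour (stated)
MEM_RATE_PER_GB_HOUR = 0.0106    # USD per GB-hour: ASSUMPTION, not quoted in the text
INVOCATION_RATE_PER_M = 0.60     # USD per million invocations: ASSUMPTION

# CPU is billed only while active; memory is billed for the full lifetime.
cpu_hours = REQUESTS * ACTIVE_CPU_S / 3600
gb_hours = REQUESTS * LIFETIME_S / 3600 * MEMORY_GB

cpu_cost = cpu_hours * CPU_RATE_PER_HOUR
mem_cost = gb_hours * MEM_RATE_PER_GB_HOUR
inv_cost = REQUESTS / 1_000_000 * INVOCATION_RATE_PER_M
total = cpu_cost + mem_cost + inv_cost

print(f"cpu ${cpu_cost:.2f} + memory ${mem_cost:.2f} + invocations ${inv_cost:.3f} = ${total:.2f}")
```

This treats every request as occupying its own 1GB instance for the full five seconds, which overstates memory cost if instances serve requests concurrently; it is a ceiling for this workload, not a bill.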
The other two lines deserve their floors stated plainly. Pinecone’s Standard serverless tier carries a $50/month minimum commitment regardless of usage — the well-corroborated anchor for a managed vector database (per-unit read/write rates vary by plan; verify them on the pricing page before modeling to the cent). And monitoring runs free at light volume — LangSmith’s Developer tier covers 5,000 traces a month, Helicone’s free tier 10,000 requests, Braintrust’s roughly 10,000 scores — before the first paid step at $39/seat (LangSmith Plus), $79 flat (Helicone Pro), or $249 flat (Braintrust Pro). Enterprise platforms are murkier: Datadog’s LLM Observability pricing is not published on its pricing pages, and third-party reports of a premium that activates automatically when LLM spans are detected are just that — reported, not vendor-confirmed. Treat enterprise observability as a line that adds meaningful per-day cost and demands a quote. Our full breakdown of evals, traces, and what they cost goes deeper on this line item.
Raw serverless compute, worked example
10,000 requests/month at 1GB memory, ~5s lifetime, ~0.3s active CPU on Vercel Fluid Compute (US rates). Active-CPU billing pauses during LLM waits — the seat fee, not usage, is the hosting line.
Pinecone Standard minimum
The Standard serverless tier carries a $50/month minimum commitment regardless of usage — the reliable anchor. Per-unit read/write rates vary; verify on the pricing page before modeling to the cent.
Free tier → first paid tier
Free tiers cover light volume (LangSmith 5K traces, Helicone 10K requests, Braintrust ~10K scores). First paid steps: LangSmith Plus $39/seat, Helicone Pro $79 flat, Braintrust Pro $249 flat.
05 — Model Sensitivity: One workload, seven prices, 1x to 9.1x.
The single biggest run-cost decision is the model. To compare fairly, we standardize one “agent turn” at 1,500 input and 400 output tokens and price 1,000 of them at each model’s verified July 2026 list rate. The spread is 9.1x between the cheapest and most expensive current-generation options — for the identical workload.
| Model | $/Mtok in | $/Mtok out | Per 1,000 turns | vs cheapest |
|---|---|---|---|---|
| GLM-5.2 (Z.ai list) | $1.40 | $4.40 | $3.86 | 1.0x (baseline) |
| Claude Sonnet 5 (intro, through Aug 31) | $2.00 | $10.00 | $7.00 | 1.8x |
| GPT-5.6 Terra (preview pricing) | $2.50 | $15.00 | $9.75 | 2.5x |
| Claude Sonnet 5 (from Sep 1, 2026) | $3.00 | $15.00 | $10.50 | 2.7x |
| Claude Opus 4.8 | $5.00 | $25.00 | $17.50 | 4.5x |
| GPT-5.5 (standard, under 272K context) | $5.00 | $30.00 | $19.50 | 5.1x |
| Claude Fable 5 | $10.00 | $50.00 | $35.00 | 9.1x |
Cost per 1,000 agent turns · July 2026 list prices
Source: vendor list prices (Z.ai, Anthropic, OpenAI) verified Jul 3, 2026 · cost per 1,000 agent turns of 1,500 in / 400 out tokens · Digital Applied arithmetic
Three labeling notes so this table stays honest. GPT-5.6 Terra is preview-only: OpenAI positions it as delivering performance competitive with GPT-5.5 at roughly half the price, but it is not GA and the $2.50/$15 rate is preview pricing, subject to change. Claude Fable 5 returned to full global availability on July 1, 2026 after a June 12–30 export-control suspension, which is worth knowing before treating the premium tier as a stable default. And GLM-5.2’s $1.40/$4.40 is Z.ai’s list price; third-party hosts run $0.93–$1.40 in and $3.00–$4.40 out, a spread we map provider-by-provider in our GLM-5.2 pricing comparison. For the running picture across every major vendor, our LLM pricing index tracks these rates as they move.
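The sensitivity table can be recomputed from its own inputs. The sketch below prices 1,000 standardized turns (1,500 input and 400 output tokens) at each listed rate and expresses every result as a multiple of the cheapest; the prices are copied from the table, and the function name is illustrative.

```python
# (input, output) list prices in USD per million tokens, from the table.
MODELS = {
    "GLM-5.2 (Z.ai list)": (1.40, 4.40),
    "Claude Sonnet 5 (intro)": (2.00, 10.00),
    "GPT-5.6 Terra (preview)": (2.50, 15.00),
    "Claude Sonnet 5 (from Sep 1, 2026)": (3.00, 15.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5 (standard)": (5.00, 30.00),
    "Claude Fable 5": (10.00, 50.00),
}

TURN_IN, TURN_OUT = 1_500, 400  # tokens per standardized agent turn (stated)


def cost_per_turns(price_in: float, price_out: float, turns: int = 1_000) -> float:
    """USD cost of `turns` standardized agent turns at the given per-Mtok prices."""
    mtok_in = turns * TURN_IN / 1_000_000
    mtok_out = turns * TURN_OUT / 1_000_000
    return mtok_in * price_in + mtok_out * price_out


costs = {name: cost_per_turns(*prices) for name, prices in MODELS.items()}
cheapest = min(costs.values())

for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name:36s} ${cost:6.2f}  {cost / cheapest:.1f}x")
```

Changing `TURN_IN` and `TURN_OUT` to match your own prompt and response sizes shifts the multiples, because output-heavy workloads widen the gap between models with high output prices and the rest.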
06 — Market Context: The price war that keeps re-dating this index.
Why date a cost index to the month? Because the inputs keep moving. Forbes reported in June 2026 that Anthropic cut Opus pricing 67% at the Opus 4.5 launch in November 2025, that OpenAI’s Flex processing offers a 50% discount against standard rates, and, citing underlying usage data, that business token consumption grew roughly 1,001% between January 2025 and April 2026 while token spend grew only about 497%. Volume is outrunning spend because per-token prices fell over the same period, though not fast enough to offset the growth in usage; secondary trackers reportedly put the blended cost at roughly $18.40 per million tokens in Q1 2025 and $6.07 in Q1 2026.
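The reported growth figures imply an average price change that can be backed out with one division. This is a rough sketch under the assumption that spend equals volume times a single blended price; the two sources cover different periods, so the two ratios are not expected to match.

```python
def growth_multiplier(pct_growth: float) -> float:
    """Convert a percentage growth figure into an end/start multiplier."""
    return 1 + pct_growth / 100


volume_x = growth_multiplier(1001)  # token consumption, Jan 2025 to Apr 2026 (reported)
spend_x = growth_multiplier(497)    # token spend over the same period (reported)

# If spend = volume x blended price, the price ratio is spend growth over volume growth.
implied_price_ratio = spend_x / volume_x

# Secondary-tracker figures, Q1 2025 to Q1 2026 (reported, different period).
tracker_ratio = 6.07 / 18.40

print(f"implied blended price: {implied_price_ratio:.2f}x of the starting level")
print(f"tracker blended price: {tracker_ratio:.2f}x of the starting level")
```

Both ratios sit well below 1 while spend still grew almost sixfold, which is the pattern the next paragraph describes: cheaper units, larger bills.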
Falling unit prices do not automatically mean falling bills. The same Forbes piece reported that Uber consumed its entire 2026 AI coding budget in four months, with per-engineer costs for tools like Claude Code and Cursor running $500–$2,000 a month. Cheaper tokens invite more usage; without stated volumes and a cap, the budget conversation happens after the money is gone. That failure mode — cost without a traceable line to value — is the same one behind the pilot-graveyard statistics we unpacked in our analysis of why most agent pilots never reach production.
"If you're not actually able to draw a direct line to useful features and functionality you're shipping to your users, that trade becomes harder to justify."— Andrew Macdonald, COO, Uber — quoted in Forbes, June 11, 2026
Projecting forward from the July snapshot, at least three events will move this index before year-end. Sonnet 5’s introductory pricing ends August 31, taking the mid-market reference rate from $2/$10 to $3/$15, on top of a tokenizer that produces more tokens per request. GPT-5.6 Terra will exit preview at some point, and if its GA pricing holds near $2.50/$15 it undercuts the post-intro Sonnet rate directly. And if the past 18 months of cuts are any guide, at least one vendor reprices again before January. The build side of the index moves slower (day rates don’t drop 67% in a quarter), which is precisely why the run-cost side needs a dated, recomputable formula rather than a static range.
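The end of Sonnet 5's introductory pricing is the one change with a known date and known rates, so its effect on the token line can be sketched now. The monthly volumes are the index's stated workloads; the sketch holds token counts constant and therefore ignores the tokenizer effect, which the text does not quantify.

```python
# Monthly token volumes in millions (input, output), from the run-cost table.
MONTHLY_MTOK = {
    "simple task agent": (12, 4),
    "RAG-workflow agent": (120, 18),
    "multi-agent system": (420, 84),
}

INTRO = (2.00, 10.00)       # USD per Mtok through Aug 31, 2026
POST_INTRO = (3.00, 15.00)  # USD per Mtok from Sep 1, 2026


def token_line(mtok: tuple, prices: tuple) -> float:
    """Monthly token spend for (input, output) Mtok at (input, output) prices."""
    return mtok[0] * prices[0] + mtok[1] * prices[1]


repriced = {
    name: (token_line(mtok, INTRO), token_line(mtok, POST_INTRO))
    for name, mtok in MONTHLY_MTOK.items()
}

for name, (before, after) in repriced.items():
    print(f"{name}: ${before:,.0f} -> ${after:,.0f} (+${after - before:,.0f}/month)")
```

Because both prices rise by the same 50%, every token line rises 50% regardless of the input/output mix; the other line items in the run band are unaffected.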
07 — Budgeting Levers: Four levers that move your number.
The index is a starting grid, not a quote. Four substitutions adapt it to any project — and each is a genuine decision, not a formality.
Swap the model
The 1x–9.1x per-turn spread is the biggest run-cost dial. Route routine turns to a budget model and reserve premium models for the reasoning-heavy fraction — a two-tier stack can cut the token line hard without capping capability.
Swap the volumes
Our stated workloads (10K/30K/120K calls per month) drive every token figure. Replace them with your real traffic and recompute — then set a hard monthly token budget so growth is a decision, not a surprise.
Swap the day rate
Our $900/day is a stated Digital Applied assumption from senior-led, agent-assisted delivery. A US contract-senior staffing model or an offshore team model will produce a different — and equally recomputable — build band.
Question the build itself
Some agents shouldn’t be built — an off-the-shelf tool or a paid feature that funds its own run cost can beat a custom build. Run the build-vs-buy test before pricing engineering days.
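The first lever, a two-tier model stack, reduces to a weighted average. The sketch below blends the per-1,000-turn costs of a budget model and a premium model from the sensitivity table; the pairing of GLM-5.2 with Opus 4.8 and the routing shares are hypothetical, and it assumes routine and reasoning-heavy turns have the same token profile, which real traffic may not.

```python
# Per-1,000-turn costs at July 2026 list prices, from the sensitivity table.
BUDGET_PER_1K = 3.86     # GLM-5.2 (Z.ai list)
PREMIUM_PER_1K = 17.50   # Claude Opus 4.8


def blended_cost_per_1k(routine_share: float,
                        budget: float = BUDGET_PER_1K,
                        premium: float = PREMIUM_PER_1K) -> float:
    """Cost of 1,000 turns when `routine_share` of them go to the budget model."""
    if not 0.0 <= routine_share <= 1.0:
        raise ValueError("routine_share must be between 0 and 1")
    return routine_share * budget + (1 - routine_share) * premium


# Hypothetical routing shares, for illustration only.
for share in (0.0, 0.5, 0.8, 1.0):
    cost = blended_cost_per_1k(share)
    saving = 1 - cost / PREMIUM_PER_1K
    print(f"{share:.0%} routine -> ${cost:.2f} per 1,000 turns ({saving:.0%} below all-premium)")
```

The saving depends entirely on how much traffic can be routed down without hurting quality, so the routing share is a figure to measure on your own workload rather than assume.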
Two levers that mostly don’t work in 2026 deserve a warning label. Self-hosting an open-weight model to escape API pricing sounds like a run-cost lever, but the hardware reality is brutal at the frontier — GLM-5.2 at FP16 needs roughly 1.57TB of VRAM, on the order of 25 A100-80GB cards — so for most SMB and mid-market buyers it isn’t a real option; our self-hosting decision guide covers when it genuinely beats API pricing. And squeezing the maintenance line to zero is the false economy behind most agent decay: the oversight days are what keep the agent worth running. As Alex Salazar, CEO of Arcade.dev, wrote in CIO.com in December 2025: “Stop chasing moonshots. These vaguely scoped, overgeneralized agent dreams are often expensive, they rarely ship and they burn resources faster than they can create value.” Scope tightly, cost honestly. For the upstream question of whether to build at all, see the build-vs-buy decision — and if the agent faces customers, consider charging for AI-built features to cover run cost.
08 — Conclusion: Own the formula, not the range.
A cost model you can recompute beats a range you have to trust.
The July 2026 numbers: a simple task agent at $5,400–$9,000 to build and ~$800 a month to run; a RAG-workflow agent at $13,500–$22,500 and ~$2,700; a multi-agent system at $27,000–$45,000 and ~$6,250. Every cell derives from stated engineering days × $900/day and stated token volumes × verified July 2026 prices — swap any assumption and the index recomputes for your project.
The structural finding matters more than any single band: tokens have stopped being the budget driver. At current prices the model line is 8–27% of the monthly run median across our three classes, serverless compute is effectively a rounding error, and the largest recurring cost is the senior oversight that keeps an agent accurate. Teams still budgeting agents as “an API bill plus hosting” are pricing the smallest lines and ignoring the biggest one.
This page is a snapshot, deliberately dated. Sonnet 5’s introductory pricing ends August 31, 2026; GPT-5.6 Terra is still in preview; the price war has more rounds in it. When the inputs move, re-run the formula — that’s the entire point of publishing one.