Agent washing, Gartner’s term for rebranding chatbots, RPA, and AI assistants as “agentic” without substantial agentic capability, has become the defining diligence problem in enterprise software buying. Gartner estimated in June 2025 that of the thousands of vendors claiming agentic AI, only about 130 are the real thing. Statistically, almost any vendor deck on your desk is selling a relabel.
The stakes are not abstract. The same Gartner release predicts that more than 40% of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls — and a mislabeled product is a head start on all three. Meanwhile the term itself has jumped from analyst vocabulary into vendor marketing, legal risk memos, and a second Gartner warning in May 2026.
This guide does three things. It pins the definition to its primary sources with the dates most coverage gets wrong. It grounds the pattern in two regulator-confirmed enforcement cases that predate the buzzword. And it gives you the Digital Applied Agent-Washing Test: six axes, scored 0-5, a 30-point maximum, and bands that tell you whether you’re buying an agent or a rebrand.
- 01 · Agent washing has a precise, citable definition. Gartner’s June 25, 2025 release defines it as rebranding existing products (AI assistants, RPA, chatbots) without substantial agentic capabilities, and estimates only ~130 of thousands of self-described agentic vendors are real.
- 02 · Market conditions reward the label. Gartner’s 2026 CIO survey finds only 17% of organizations have deployed AI agents while more than 60% expect to within two years. Buying intent is running far ahead of hands-on experience, which is exactly the gap a relabel exploits.
- 03 · The pattern is already regulator territory. SEC charges against Presto Automation and the FTC’s DoNotPay order both punished claims of autonomous capability that reality didn’t match. Neither regulator used the term agent washing, but both cases document the exact claims-vs-reality gap Gartner describes.
- 04 · Gartner expects over 40% of agentic projects to be canceled by end-2027. The cited causes (escalating costs, unclear business value, inadequate risk controls) all get worse when the underlying product was never agentic to begin with. The same firm still forecasts 15% of daily work decisions running through agentic AI by 2028.
- 05 · Our 30-point test turns the question into a number. Six axes (autonomy, planning, tool use, memory, feedback loop, human-in-loop boundaries) scored 0-5 against anchored rubrics. 0-10 reads as a rebranded chatbot or RPA, 11-20 a partial or assisted agent, 21-30 genuinely agentic.
01 — The Definition
What agent washing actually means.
The term comes from Gartner, introduced in mid-2025. In its June 25, 2025 press release — the one predicting the 40%-plus cancellation rate — the firm named the practice and sized it. Vendors were, in Gartner’s words, engaging in “agent washing”: the rebranding of existing products, such as AI assistants, robotic process automation (RPA) and chatbots, without substantial agentic capabilities. The deliberate echo of greenwashing is the point — it’s a claims problem, not a technology problem.
What made the release land was the number attached to it. Gartner estimated that of the thousands of vendors marketing agentic AI products, only about 130 were genuinely agentic. That is not a rounding-error minority: it implies the overwhelming majority of “agentic” pitches a buyer receives are repackaged automation wearing this year’s label.
Gartner analyst Anushree Verma’s framing in that release is worth internalizing rather than quoting back at vendors: most agentic projects today are early-stage experiments driven by hype and often misapplied, which blinds organizations to the real cost and complexity of deploying agents at scale. Her sharper observation cuts both ways for buyers — many use cases positioned as agentic today don’t require agentic implementations at all. Sometimes the washed product isn’t just mislabeled; it’s answering a question nobody in your workflow asked.
02 — Get the Dates Right
Two Gartner warnings, eleven months apart.
Here is the detail nearly all secondary coverage gets wrong. Gartner has issued two separate, dated agent-washing warnings, and they don’t say the same thing. The ~130-vendors statistic belongs exclusively to the June 25, 2025 release. On May 20, 2026, Gartner warned again — this time about agent washing in the supply chain planning technology market — without restating that figure. Articles that attribute “130 vendors” to the 2026 release are conflating two distinct statements made nearly eleven months apart.
| Aspect | June 25, 2025 release | May 20, 2026 release |
|---|---|---|
| Announced | June 25, 2025 · Sydney, Australia | May 20, 2026 · Barcelona, Spain |
| Market scope | The agentic AI market at large | Supply chain planning (SCP) technology specifically |
| Analyst | Anushree Verma, Senior Director Analyst | Jan Snoeckx, Senior Director Analyst, Supply Chain practice |
| Core warning | Over 40% of agentic AI projects will be canceled by end-2027; “agent washing” defined as rebranding assistants, RPA, and chatbots without substantial agentic capability | Relabeling conventional automation as agentic obscures real differences between planning tools; vendors claiming end-to-end autonomous supply chain planning before 2027 are overstating what is possible |
| The “~130 vendors” figure | Yes — this release is the origin of the statistic | No — it repeats the warning, not the number |
The 2026 release matters for a different reason: it shows the warning aging into specific markets rather than fading. Per the release, Snoeckx warned that agent washing obscures the real differences between planning tools “by relabeling conventional automation as agentic, increasing the risk of misaligned investments and long-term lock-in.” And it sets a concrete autonomy bar: “True autonomous planning would require the automatic generation of plans, automatic selection of the optimal plan and seamless execution without human intervention. Most current solutions have not reached that level of end-to-end autonomy, and vendors claiming end-to-end autonomous supply chain planning before 2027 are overstating what is possible in the near term.” Swap “planning” for your own domain and that sentence is a usable diligence question.
03 — Why It Works
Washing works because the market wants to believe.
Agent washing isn’t a con that succeeds against buyers' better judgment — it succeeds because of a structural gap between enthusiasm and experience. Gartner’s 2026 CIO survey, cited in its April 15, 2026 Hype Cycle for Agentic AI, found that only 17% of organizations have deployed AI agents to date, yet more than 60% expect to do so within the next two years — described as the most aggressive adoption curve among the emerging technologies measured. When intent runs more than three times ahead of deployment, most buyers evaluating an “agent” have never operated one. That inexperience is the surface agent washing sticks to.
17% · Orgs running AI agents
Per Gartner’s 2026 CIO survey, cited in the April 15, 2026 Hype Cycle for Agentic AI article. Agentic AI currently sits at the Peak of Inflated Expectations on that curve.
60%+ · Within the next two years
The steepest intent curve of any emerging technology in the survey: a roughly 3.5x gap between expectation and lived deployment experience.
40%+ · Agentic projects canceled by end-2027
Gartner’s June 25, 2025 forecast, attributed to escalating costs, unclear business value, and inadequate risk controls. All three are amplified when the product was washed.
The money tells the same story. A January 2025 Gartner poll of 3,412 webinar attendees found the largest bloc — 42% — making only conservative investments in agentic AI, with 19% investing significantly, 31% waiting or unsure, and 8% not investing at all. Commitment is cautious even while the vocabulary is everywhere.
Chart: How organizations were investing in agentic AI · early 2025. Source: Gartner poll of 3,412 webinar attendees, January 2025 (reported June 25, 2025).
None of this means the category is fake. The same June 2025 Gartner release forecasts that by 2028, at least 15% of day-to-day work decisions will be made autonomously through agentic AI (up from 0% in 2024) and 33% of enterprise software applications will include agentic AI (up from under 1% in 2024). The washing problem and the genuine trend coexist; that’s precisely what makes the label so profitable to misuse. It also compounds a separate, well-documented problem: even honestly built agent projects struggle to ship, and most AI agent projects never reach production for execution reasons that have nothing to do with labeling. A washed product stacks a definitional failure on top of an already unfavorable base rate.
The clearest evidence the term has escaped analyst-land: vendors now use it against each other on stage.
“We want companies to build agents and orchestrate agents and there’s a lot of agent washing going on in the market. So that’s really important.”
Kevin Li, SVP of Product, Optimizely · CMSWire, September 10, 2025
04 — Named Cases
The same pattern, regulator-confirmed.
An important precision: no major current AI-agent vendor has been publicly accused of “agent washing” by that name, and neither of the two regulatory cases below uses the term. What both cases document — with the evidentiary weight of federal enforcement — is the exact claims-versus-reality pattern Gartner describes: products marketed as autonomous that, in practice, ran on undisclosed human labor or untested capability claims. Treat them as the pattern’s case law, not its name-and-shame list.
Presto Automation
The SEC found Presto made materially false and misleading statements about its Presto Voice product between November 2021 and May 2023 — claiming its AI speech recognition eliminated the need for human order-taking when the vast majority of orders required human intervention, and for a period not disclosing that the AI in deployed units was owned and operated by a third party. Presto consented to a cease-and-desist without admitting or denying findings; cooperation spared it a civil penalty.
DoNotPay
Under its Operation AI Comply initiative, the FTC alleged DoNotPay never tested whether its “AI lawyer” performed at the level of a human lawyer and didn’t retain attorneys to check its law-related outputs. The final order — approved 5-0 — imposed $193,000 in monetary relief, required notice to 2021-2023 subscribers, and prohibits advertising that the service performs like a real lawyer without evidence to back it.
Read the SEC’s Presto order and the FTC’s DoNotPay release with a buyer’s eye and two diligence questions fall straight out of them. From Presto: what percentage of this product’s output currently requires human intervention, and who exactly operates the AI? From DoNotPay: what testing supports the capability claim, and can we see it? Both questions reappear as scored axes in the test below.
05 — The Baseline
What a real agent does that a rebrand can’t.
This post won’t re-litigate the full definition of agentic AI — our agentic AI glossary and AI agent glossary cover the vocabulary in depth. What matters here is the dividing line the washing claim blurs. IBM’s reference definition draws it cleanly: an AI agent autonomously performs tasks by designing workflows with available tools — deciding, acting, and interacting with external systems beyond its training data. A chatbot is a modality; agency is a capability. An LLM-powered chat interface with no tools, no memory, and no reasoning loop is non-agentic no matter how fluent it sounds, and traditional automation that follows predefined rules doesn’t become an agent by being renamed.
AWS’s prescriptive-guidance comparison makes the same cut as a spectrum, scoring traditional AI, software agents, and agentic AI across characteristics like autonomy, proactivity, and agency — from “tools for humans” through “operates independently within predefined bounds” to “operates with purpose, goals, and self-direction.” Its strategic guide sketches the autonomy ladder most washed products quietly sit at the bottom of:
Rung 1 · Chain
Every step and its order are authored by a human. Reliable, auditable, and categorically not an agent; this is where classic RPA lives.
Rung 2 · Workflow
The system picks the order of known actions at runtime. Useful flexibility, still bounded by a human-authored action set.
Rung 3 · Partially autonomous
The agent plans, executes, and adjusts with a human at defined checkpoints. Most genuine enterprise agents in 2026 live here, and honest vendors say so.
Rung 4 · Fully autonomous
Rare in production. Gartner’s May 2026 warning says vendors claiming end-to-end autonomous supply chain planning before 2027 are overstating what is possible; similar claims in other complex domains deserve the same skepticism.
Buyers aren’t short of definitions — they’re short of a way to score a specific product against them. Frameworks exist, but they’re fragmented: AWS’s comparison is taxonomic rather than evaluative; enterprise vendor WRITER published a four-level autonomy framework (noting most misrepresentation involves marketing level-1 tools as level-3 or level-4 — though WRITER sells its own agent product, so read it as positioned analysis); and other vendor blogs offer question checklists without scoring. Nothing in that pile gives a procurement team a number. That’s the gap the next section fills.
06 — Our Rubric
The Digital Applied Agent-Washing Test.
The test scores a product on six axes, 0-5 each, for a maximum of 30 points. The axes are a Digital Applied synthesis — informed by IBM’s agent framing, AWS’s autonomy spectrum, and the public Gartner warnings above, but the anchors, weights, and bands are our editorial methodology. Score against what you observe in a live demonstration on your own scenario, never against the deck. Anchor descriptions are written for 0, 3, and 5; score 1-2 or 4 when the product sits between anchors.
| Axis | 0 · Washed | 3 · Assisted | 5 · Agentic |
|---|---|---|---|
| Autonomy: Does it act toward a goal, or wait for a prompt at every step? | Responds only when a human prompts it; every step is human-initiated. A chat window with a new label. | Executes a multi-step sequence once kicked off, but halts at any branch it wasn’t configured for. | Owns a goal end to end: initiates steps, works through branches, and escalates only at defined checkpoints. |
| Planning: Does it decompose and replan, or run a fixed script? | A fixed flow authored by a human; changing the process means reconfiguring the tool. | Chooses among pre-built paths, with limited re-sequencing when inputs vary. | Decomposes a novel goal into steps and replans when a step fails or new information arrives. |
| Tool use: Does it choose and call systems itself? | No live tool calls; output is text a human then executes somewhere else. | Calls a fixed menu of pre-wired integrations in a pre-set order. | Selects which system or API to call per step, sets its own parameters, and handles errors and retries. |
| Memory: Does state persist across sessions and change behavior? | Resets every session; no recall of prior work or outcomes. | Holds session context and retrieves documents, but nothing it learns persists as state. | Retains durable state across sessions and uses it to make different decisions next time. |
| Feedback loop: Does it verify its own output and self-correct? | No self-check; every unit of verification is human or external. | Applies basic validation rules and flags violations, but cannot fix its own failures. | Evaluates output against the goal, detects failure, and retries or corrects before handing off. |
| Human-in-loop boundaries: Are the human checkpoints explicit and disclosed? | Humans quietly do the work behind an “autonomous” label; the intervention rate is undisclosed. | Checkpoints exist and are acknowledged, but intervention rates aren’t measured or shared. | Checkpoints are documented, intervention rates are measured and reported, and the autonomy claim matches them. |
Total: max 30 points (6 axes × 5). Digital Applied bands: 0-10 rebranded chatbot or RPA · 11-20 partial / assisted agent · 21-30 genuinely agentic.
Two scoring notes. First, the sixth axis is deliberately about honesty, not capability: a vendor with explicit, measured human checkpoints scores a 5 there even if the product is only partially autonomous — while a product hiding its humans behind an “autonomous” claim scores 0 on that axis regardless of how good the demo looks. That is the Presto lesson, encoded. Second, the bands measure labeling accuracy, not product quality — an 11-20 assisted agent can be an excellent purchase at an assisted-agent price. What the test catches is paying agentic prices, and accepting agentic risk assumptions, for rung-1 automation. For the price side of that equation, pair this test with what a real agent actually costs to build and run.
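The arithmetic of the test is simple enough to sketch in a few lines. The Python below is a minimal illustration, not an official Digital Applied tool: the names `AXES`, `BANDS`, and `score_product` are hypothetical, and the example scores are invented. It sums six 0-5 axis scores and maps the total to the bands given above.

```python
# Hypothetical sketch of the six-axis, 30-point scoring described above.
# All identifiers here are illustrative assumptions, not a published API.

AXES = (
    "autonomy",
    "planning",
    "tool_use",
    "memory",
    "feedback_loop",
    "human_in_loop_boundaries",
)

# (lowest total, highest total, band label) as stated in the rubric.
BANDS = (
    (0, 10, "rebranded chatbot or RPA"),
    (11, 20, "partial / assisted agent"),
    (21, 30, "genuinely agentic"),
)


def score_product(scores: dict[str, int]) -> tuple[int, str]:
    """Return (total, band label) for one product's axis scores."""
    if set(scores) != set(AXES):
        raise ValueError(f"expected exactly these axes: {AXES}")
    for axis, value in scores.items():
        # Each axis is a whole number from 0 to 5; bools are rejected too.
        if type(value) is not int or not 0 <= value <= 5:
            raise ValueError(f"{axis} must be an integer from 0 to 5")
    total = sum(scores[axis] for axis in AXES)
    for low, high, label in BANDS:
        if low <= total <= high:
            return total, label
    raise AssertionError("unreachable: totals always fall within 0-30")


if __name__ == "__main__":
    # Invented scores for an imaginary product observed in a live demo.
    example = {
        "autonomy": 3,
        "planning": 2,
        "tool_use": 3,
        "memory": 1,
        "feedback_loop": 2,
        "human_in_loop_boundaries": 3,
    }
    print(score_product(example))
```

Rejecting missing axes and out-of-range values is a design choice in this sketch, so that a half-filled scorecard cannot quietly land in a lower band.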
07 — In the Vendor Call
Running the test and acting on the score.
Scoring takes one structured session. Insist on a live run against a scenario you supply — not a recorded demo — and ask three questions the enforcement cases taught us to ask: What percentage of production outputs currently involve human intervention, and will you contractually report that number? Which steps in the flow we just watched were pre-scripted versus planned by the system? What testing evidence backs the headline capability claim? A vendor’s comfort with those questions is itself a signal — the genuinely agentic minority tends to answer them eagerly, because the answers are their moat.
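The first of those questions asks for a number, so it helps to agree on how that number is computed before it goes into a contract. The sketch below assumes a hypothetical log format, an `OutputRecord` with a `human_intervened` flag, that a vendor would have to supply. Neither the record shape nor the function name comes from any real product or from the enforcement cases.

```python
# Hypothetical sketch: computing an intervention rate from a vendor-supplied
# log. The record shape and field names are assumptions for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputRecord:
    output_id: str
    human_intervened: bool  # True if a person corrected or completed the output


def intervention_rate(records: list[OutputRecord]) -> float:
    """Share of production outputs that involved human intervention (0.0-1.0)."""
    if not records:
        raise ValueError("no production outputs to measure")
    touched = sum(1 for record in records if record.human_intervened)
    return touched / len(records)


if __name__ == "__main__":
    # Invented sample: three of four outputs needed a human.
    sample = [
        OutputRecord("a1", True),
        OutputRecord("a2", True),
        OutputRecord("a3", False),
        OutputRecord("a4", True),
    ]
    print(f"{intervention_rate(sample):.0%} of outputs involved a human")
```

What counts as an intervention (a correction, an approval, a full takeover) is the part worth negotiating; the division itself is trivial once that definition is written down.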
Gartner’s May 2026 release offers three buyer missteps to avoid that map cleanly onto the test: don’t mistake vendor positioning for true autonomy, avoid monolithic transformations and legacy retrofits, and don’t pursue high-risk autonomous use cases too early. Then act on the band you scored:
0-10 · Treat it as automation procurement
Nothing wrong with buying automation; RPA and chatbots earn their keep. But reprice against automation alternatives, strip agentic assumptions out of the ROI model, and ask why the vendor needed the label.
11-20 · Buy the assist, not the story
Real value with a human in the loop. Contract on measured behavior: intervention rates, checkpoint definitions, and roadmap commitments in writing, not the word “autonomous” in a slide.
21-30 · Verify with a scoped pilot
Plausibly one of the genuinely agentic minority. Run a bounded pilot with intervention-rate reporting before production rollout; the escalating costs, unclear value, and weak risk controls behind Gartner’s 40%-cancellation forecast are exactly what a bounded pilot surfaces early.
Any score · Undisclosed intervention is a red line
If the demo can’t reveal where humans sit in the loop, you’re looking at the exact pattern the SEC documented at Presto. Escalate diligence or exit: the label problem is now a disclosure problem.
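Read as a decision rule, the guidance above is a band lookup with one override. The following sketch assumes the red line takes precedence over the band, which is one reading of the text rather than a stated rule. The function name `recommended_action` and its `intervention_disclosed` parameter are hypothetical.

```python
# Hypothetical sketch: mapping a test total to the recommended next step.
# Assumption: undisclosed human intervention overrides whatever band was scored.


def recommended_action(total: int, intervention_disclosed: bool) -> str:
    """Return the action for a 0-30 total, checking the red line first."""
    if not 0 <= total <= 30:
        raise ValueError("total must be between 0 and 30")
    if not intervention_disclosed:
        # Red line: the vendor cannot show where humans sit in the loop.
        return "Escalate diligence or exit"
    if total <= 10:
        return "Treat it as automation procurement"
    if total <= 20:
        return "Buy the assist, not the story"
    return "Verify with a scoped pilot"


if __name__ == "__main__":
    print(recommended_action(14, intervention_disclosed=True))
    print(recommended_action(28, intervention_disclosed=False))
```

Checking disclosure before the band keeps a polished demo with a high total from masking the one finding that should stop the purchase.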
The scorecard also sharpens two adjacent decisions. If you’re evaluating the big CRM platforms’ agent offerings, run the test axis by axis against our guide to CRM AI agents from Salesforce, HubSpot, and Zoho; the same six questions expose exactly where each platform’s “agent” sits on the autonomy ladder. And a low score reframes the build-vs-buy decision: if the branded “agent” scores 8, a purpose-built workflow you own, honestly scoped to rung 2 or 3, often beats renting the label. That comparative evaluation is the first step of our AI transformation engagements, where we score shortlisted vendors against live demos before any budget is committed.
Looking forward, we expect the test’s sixth axis to become the whole game. With securities lawyers now framing overstated agent claims as disclosure risk, and Gartner tracking governance and FinOps profiles for agentic AI on its 2026 Hype Cycle, the direction of travel is toward measured, reported autonomy — the intervention rate as a contract term, not a secret. Vendors who can publish that number will take the premium; vendors who can’t will quietly drop the label. Buyers who score today are simply ahead of where procurement standards are headed by 2027.
08 — Conclusion
Score the claim, then buy the product.
Agent washing is a labeling problem — so measure the label.
The definition is settled and citable: Gartner named agent washing in June 2025 — rebranding assistants, RPA, and chatbots without substantial agentic capability — and estimated only about 130 of thousands of claimed agentic vendors were real. Eleven months later it was warning specific markets about the same pattern, and regulators had already punished the underlying behavior at Presto and DoNotPay without ever needing the word.
The buyer’s error isn’t believing in agents; Gartner itself forecasts agentic AI inside a third of enterprise software by 2028. The error is letting the vendor’s label substitute for measurement while intent runs more than three times ahead of deployed experience. Six axes, thirty points, one live demo on your own scenario: the score is the diligence.
And keep the test’s spirit rather than just its arithmetic: an honest 14 that’s priced and contracted as an assisted agent will outperform a washed “28” every time. The point of catching agent washing isn’t to avoid the category — it’s to pay the right price, set the right risk controls, and reserve agentic budgets for the minority of products that have earned the noun.