Domain expertise is the new moat in agentic coding — and the data now backs the claim. In June 2026 Anthropic published an analysis of roughly 400,000 Claude Code sessions from about 235,000 users, and its sharpest finding is also its most counterintuitive: how well someone understands the problem they are solving predicts success more reliably than whether they were trained to write code.
That reframes a debate the industry has had backwards for two years. The fear was that coding agents would commoditize developers and hand the job to anyone who could type a prompt. The data tells a more selective story. Agents have not democratized coding for everyone equally; they have amplified the people who already know what correct output looks like. Software engineers reached a 34% verified success rate on code-producing sessions; non-software occupations reached 29% — a gap of just five points — and management occupations scored highest of any group.
This guide walks through what Anthropic actually measured, why management beating engineers is less surprising than it sounds, how expertise compounds across four dimensions of agentic work, and what the shift means for agencies and engineering teams deciding what to hire for. The numbers are vendor-stated and carry caveats we flag throughout — but the direction is clear and the strategic implication is sharper still.
- 01 Domain knowledge predicts success, not coding background. Across ~400K Claude Code sessions, how well a person understood the problem mattered more than formal coding training. Software engineers hit 34% verified success on code sessions; non-software occupations hit 29%, within five points.
- 02 Management occupations scored highest of any group. The study's most counterintuitive finding: management occupations slightly exceeded software engineers on verified coding tasks. All ten largest occupation groups landed within seven points of each other.
- 03 Expertise compounds across four dimensions at once. Experts triggered 12 Claude actions and 3,200 words of output per prompt versus 5 actions and 600 words for novices: a 2.4x action gap and a 5.3x output gap. The more expertise you bring, the more work the agent does per instruction.
- 04 Experts abandon struggling sessions far less often. When a session hit trouble, novices walked away 19% of the time versus 5–7% for intermediate and expert users, roughly a 3x resilience gap. Knowing how to steer through a stuck agent is itself a form of expertise.
- 05 Verification, not generation, is the new scarce skill. Anthropic's companion trends report names verification as a rising bottleneck: the problem is shifting from writing code to judging whether generated code is correct. Domain experts are best positioned to make that call.
01 — The Study: What Anthropic actually measured.
On June 16, 2026, Anthropic published "Agentic Coding and Persistent Returns to Expertise," an economic analysis of roughly 400,000 Claude Code sessions drawn from about 235,000 users between October 2025 and April 2026. The headline conclusion is stated plainly in the research: success is determined by how well a person understands the problem they are trying to solve, not whether they are trained in coding.
The methodology matters for how much weight to put on the numbers. Anthropic used a privacy-preserving pipeline in which a Claude model classified transcripts at scale rather than humans reading them. The study deliberately excludes third-party IDE integrations and headless, fully automated Claude Code usage — so every figure below describes human-interactive sessions, not CI/CD automation. Treat the results as a portrait of how people work alongside a coding agent, not as a benchmark of the agent running on its own.
- ~400K sessions. A seven-month window of real Claude Code usage, classified by a Claude model rather than human reviewers to preserve privacy. Large enough that occupation-level patterns are visible.
- Transcript classification. Sessions were labelled by success level and task type at scale. Excludes third-party IDE integrations and headless automated usage, so figures describe interactive human-plus-agent work only.
02 — The Paradox: The finding nobody leads with: management scored highest.
Here is the result most coverage buries. On verified coding tasks, management occupations achieved the highest success rates of any occupation group — slightly exceeding software engineers themselves. Management, legal, and sales professionals were among the fastest-growing and highest-performing groups using agentic coding tools, per the study.
Read carefully, this is not a blowout. Anthropic frames the management edge as slight, and it reports that all ten of the largest occupation groups landed within roughly seven percentage points of each other. The point is not that managers out-engineer engineers; it is that the technical-versus-non-technical binary has stopped working as a performance predictor. When the agent handles syntax, what remains is judgment about what to build and whether the result is right — and that judgment is distributed across professions, not concentrated in one.
Figure: Verified success by occupation · agentic coding sessions
Source: Anthropic research, Jun 16, 2026 (vendor-stated). The management value is qualitative: the paper reports it as highest and 'slightly' above software engineers, not as a published percentage.
03 — The Multiplier: How expertise compounds across four dimensions.
The single most useful idea in the study is that expertise does not improve one metric — it improves several at once, and they compound. The greater the domain expertise a person brings to a session, the more work Claude does per instruction. Experts triggered around 12 Claude actions and 3,200 words of output per prompt; novices triggered roughly 5 actions and 600 words. That is a 2.4x gap in actions and a 5.3x gap in output volume from the same single instruction.
The table below assembles four of the study's metrics into a single leverage view. Anthropic states actions and output per prompt only for the novice and expert tiers. For the intermediate tier the paper reports success and abandonment but not actions or output volume, so those cells are marked "est." as placeholders rather than filled with a number. The leverage column is our own calculation: words of output per prompt divided by the novice figure, so novice = 1.0.
| Expertise level | Verified success | Abandonment (troubled) | Actions / prompt | Words / prompt | Output leverage vs novice |
|---|---|---|---|---|---|
| Novice | 15% | 19% | 5 | 600 | 1.0× |
| Intermediate | 28% | 5–7% | est. | est. | est. |
| Expert | 33% | 5–7% | 12 | 3,200 | 5.3× |
The leverage figure is the part worth sitting with. An expert and a novice issue the same kind of instruction, and the expert extracts more than five times the output and well over double the agent actions from it. Verified success climbs from 15% to 33% over the same span. Expertise is not a tiebreaker at the margins here — it is the primary throttle on how much value an agent returns per prompt.
- Experts vs novices. Experts triggered ~12 Claude actions per prompt against ~5 for novices. A clearer prompt, grounded in domain knowledge, unlocks more autonomous work from the same single instruction.
- Words generated. Experts drew ~3,200 words of output per prompt versus ~600 for novices. Indexed to a novice baseline of 1.0, that is the leverage multiplier in the table above.
- Novice to expert. Verified session success rose from 15% (novice) to 33% (expert). Intermediate users landed at 28%: most of the gain arrives early, then expertise keeps pushing the ceiling.
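The multipliers in this section are simple ratios, and it is worth being able to reproduce them. The sketch below assumes only the per-prompt figures quoted above (approximate and vendor-stated); the `TIERS` dictionary and the `leverage` helper are our own illustrative names, not anything from the paper.

```python
# Per-prompt figures as quoted in this article (approximate, vendor-stated).
TIERS = {
    "novice": {"actions": 5, "words": 600},
    "expert": {"actions": 12, "words": 3200},
}


def leverage(metric: str, tier: str, baseline: str = "novice") -> float:
    """Ratio of a tier's per-prompt metric to the baseline tier's value."""
    return TIERS[tier][metric] / TIERS[baseline][metric]


action_gap = leverage("actions", "expert")  # 12 / 5
output_gap = leverage("words", "expert")    # 3200 / 600

print(f"actions: {action_gap:.1f}x, output: {output_gap:.1f}x")
# prints "actions: 2.4x, output: 5.3x"
```

Indexing to the novice tier keeps the baseline at exactly 1.0, which is why the table's leverage column reads 1.0× for novices and 5.3× for experts.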
04 — Resilience: Why experts don't quit when the agent gets stuck.
A quieter finding may be the most operationally important. When a session ran into trouble, novices abandoned it about 19% of the time, while intermediate and expert users walked away only 5–7% of the time — roughly a 3x resilience gap. The agent gets stuck for everyone; what differs is whether the human can diagnose why and redirect it.
That is the part the "anyone can prompt" narrative misses. Steering an agent through a failure — recognizing a wrong assumption, reframing the task, knowing which constraint to relax — is itself expertise. A novice who cannot tell whether the agent is confused or correct has no choice but to abandon the session. An expert reads the same failing transcript and knows exactly which lever to pull. The agent amplifies the people who can recover, and strands the people who cannot.
"The binding constraint has moved from 'can you build it' to 'can you tell whether it's right.'"
— Aaron Brethorst, software engineer
Brethorst's framing, written shortly before the study landed, turns out to match the data. The scarce skill is no longer production — it is evaluation. The expert's lower abandonment rate is what evaluation looks like in practice: the ability to keep judging output until it is actually correct, instead of giving up when the first attempt fails.
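Because the experienced-user abandonment rate is quoted as a range (5–7%), the "roughly 3x" gap in this section is really a range too. A minimal sketch of that arithmetic, using only the percentages quoted above; the constant and function names are ours:

```python
NOVICE_ABANDON = 0.19               # troubled sessions abandoned by novices
EXPERIENCED_ABANDON = (0.05, 0.07)  # low and high ends for intermediate/expert users


def ratio_range(novice: float, experienced: tuple[float, float]) -> tuple[float, float]:
    """Smallest and largest novice-to-experienced abandonment ratios."""
    low, high = experienced
    return novice / high, novice / low


smallest, largest = ratio_range(NOVICE_ABANDON, EXPERIENCED_ABANDON)
print(f"novices abandon {smallest:.1f}x to {largest:.1f}x as often")
# prints "novices abandon 2.7x to 3.8x as often"
```

"Roughly 3x" sits inside that band, toward its lower end.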
05 — Division of Labor: Who decides what versus who decides how.
The study quantifies the human-AI partnership in a way that explains why domain knowledge matters so much. In a typical Claude Code session, humans made roughly 70% of the planning decisions — what to build — while Claude made roughly 80% of the execution decisions — how to build it. The labor split is clean: people own intent, agents own implementation.
That division is the whole argument. If the agent owns four-fifths of the "how," then the human's leverage lives almost entirely in the "what." Knowing what to build, in what order, against which constraints, and what a correct result looks like — that is domain expertise by another name. The session evidence also shows the work itself moving up the value chain: average task value rose roughly 27% across the seven-month window, with building and fixing sessions climbing 32–43% in value as agents absorbed more of the routine execution.
Figure: The human–AI division of labor in Claude Code sessions
Source: Anthropic research, Jun 16, 2026 (vendor-stated). Planning and execution shares are approximate.
Task composition shifted underneath these numbers, too. Debugging fell from about 33% of all sessions in October 2025 to roughly 19% by April 2026: agents are quietly absorbing routine bug-fixing. Meanwhile non-code work grew: in the study's later snapshot, roughly 56% of sessions involved writing, fixing, testing, or orchestrating code, while about 44% were non-code work such as operating software, planning, and analysis. The tool is no longer only for writing code; it is for getting work done, and the people who know the work win.
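When a share of sessions moves, it helps to separate the percentage-point change from the relative change, since the two are easy to conflate. The sketch below applies that distinction to the debugging figures quoted above (about 33% falling to roughly 19%); `share_change` is a hypothetical helper of our own.

```python
def share_change(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change) between two shares."""
    points = (after - before) * 100
    relative = (after - before) / before
    return points, relative


points, relative = share_change(0.33, 0.19)
print(f"{points:+.0f} points, {relative:+.0%} relative")
# prints "-14 points, -42% relative"
```

On the article's approximate inputs, debugging lost about 14 points of session share, which is a drop of roughly two-fifths relative to where it started.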
06 — The Bottleneck: Generation is solved. Verification is the new scarcity.
If agents now handle most of the "how," the constraint moves downstream to a single question: is the output actually correct? Anthropic's companion 2026 Agentic Coding Trends Report names verification as a rising bottleneck — the problem shifts from generating code to evaluating whether generated code is right. The same report describes a delegation gap: developers report using AI in a majority of their work, yet can fully hand off only a small share of tasks end to end.
The market evidence points the same way. Industry surveys through 2026 show that AI coding tools have become near-ubiquitous among developers, while deep trust in their unreviewed output remains strikingly low and agent adoption lags raw tool adoption by a wide margin. Read together, those signals describe a verification economy: usage is everywhere, trust is scarce, and someone with judgment has to close the gap between what the agent produced and what is actually shippable. That someone is a domain expert, not another model.
07 — Implications: What this means for agencies and engineering teams.
For a boutique agency, this research is a competitive repositioning argument. If domain knowledge — not coder headcount — is what drives agentic success, then senior practitioners with deep vertical expertise are now capable agentic operators in their own right. Not because they learned to code, but because they already know what good output looks like in their field. That is a direct answer to larger competitors who sell volume of junior engineers.
The decision matrix below maps the study's findings onto concrete staffing and pitching choices. Each row pairs a finding with the move it implies.
- Hire for judgment, not just syntax. Verified success tracks domain understanding more than coding pedigree, and management occupations scored highest. Weight problem-understanding and taste over years-of-language-X in agentic roles.
- Sell senior judgment as the product. When agents own ~80% of execution, the human's value is in the ~70% of planning they still own. Pitch the irreplaceable layer (what to build and whether it's right), not the headcount that types it.
- Turn vertical experts into operators. Strategists, account leads, and analysts who know a domain cold can now ship working software with an agent. Equip them with the workflow rather than routing every build through a separate engineering queue.
- Make verification a named role. With trust in unreviewed AI output low and a real delegation gap, someone must own evaluation. Treat 'can tell whether it's right' as a first-class responsibility, not an afterthought bolted onto delivery.
This is the practical core of an AI digital transformation engagement: not handing teams a tool and hoping, but reorganizing who owns intent, who owns execution, and who owns verification. The same logic applies when we build production software — the senior practitioner steering the agent is the leverage point, and the workflow is designed around their judgment. For the mechanics of that workflow, see our practical agentic engineering workflow, and for how a small shop actually rolls this out, the 30-person dev shop adoption case study.
There is a forward-looking read worth stating directly. If verification is the binding constraint and expertise is what closes it, then the next phase of competitive advantage in services is not about adopting agents — almost everyone will — but about concentrating scarce domain judgment where it compounds. The agencies that win the next two years will look less like coding shops and more like benches of senior operators, each amplified by agents they know how to steer. This dovetails with the Q2 2026 agentic hiring slowdown: teams are not freezing because the work dried up, but because the shape of the valuable hire changed.
08 — Reading the Data: How to read these numbers honestly.
A single vendor study, however large, is not the final word. Four caveats keep the conclusions sober. First, the figures are vendor-stated and not yet independently reproduced: Anthropic both built the tool and graded the transcripts. Second, the success rates describe human-interactive sessions only; the study excludes third-party IDE integrations and headless automation, so none of this generalizes to autonomous CI/CD pipelines. Third, the management-beats-engineers finding is directional, not a blowout: the paper calls the edge slight and places all ten of the largest occupation groups within roughly a seven-point band.
Fourth, some of the supporting market figures circulating alongside this story — developer-survey adoption and trust percentages, vendor subscriber counts — reach us through secondary summaries rather than primary releases, so we describe them qualitatively here and would verify the exact numbers against the original sources before citing them in a decision. None of this undercuts the core finding. It simply means the right posture is to treat the study as strong directional evidence about how expertise interacts with agents, run your own measurements on your own work, and let the pattern — not any single percentage — drive the strategy. For where the tooling goes next, see what ships next in agentic coding.
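One way to run your own measurements is to log a handful of fields per session and compute the same two rates the study reports: verified success, and abandonment among sessions that hit trouble. The sketch below is a starting point under stated assumptions. The `Session` schema and tier labels are hypothetical (they are not Anthropic's classifier output), and the Wilson score interval is our addition, included so that small samples do not masquerade as precise percentages.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional


@dataclass
class Session:
    # Hypothetical schema for your own log; not Anthropic's classifier output.
    tier: str        # your own label, e.g. "novice" or "expert"
    verified: bool   # output passed a check you trust
    troubled: bool   # session visibly ran into a problem
    abandoned: bool  # user stopped without a usable result


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (roughly 95% at z = 1.96)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def abandon_rate(group: list[Session]) -> Optional[float]:
    """Share of troubled sessions that were abandoned; None if none were troubled."""
    troubled = [s for s in group if s.troubled]
    if not troubled:
        return None
    return sum(s.abandoned for s in troubled) / len(troubled)


def summarize(sessions: list[Session]) -> dict[str, dict]:
    """Per-tier verified-success rate (with interval) and troubled-session abandonment."""
    summary = {}
    for tier in sorted({s.tier for s in sessions}):
        group = [s for s in sessions if s.tier == tier]
        wins = sum(s.verified for s in group)
        summary[tier] = {
            "n": len(group),
            "verified_rate": wins / len(group),
            "verified_ci": wilson_interval(wins, len(group)),
            "abandon_rate_troubled": abandon_rate(group),
        }
    return summary


# Toy data for illustration only; replace with your own session log.
sample = [
    Session("novice", verified=False, troubled=True, abandoned=True),
    Session("novice", verified=True, troubled=False, abandoned=False),
    Session("expert", verified=True, troubled=True, abandoned=False),
    Session("expert", verified=True, troubled=False, abandoned=False),
]
for tier, stats in summarize(sample).items():
    print(tier, stats)
```

With only a few sessions per tier the interval will be very wide, which is the point: compare the pattern across tiers before trusting any single percentage.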
09 — Conclusion: The moat moved, and it moved toward judgment.
Agents amplify the people who already know what correct looks like.
Anthropic's analysis of roughly 400,000 Claude Code sessions lands on a finding the industry should have seen coming: domain understanding predicts success more reliably than a coding background. Software engineers and non-software professionals finished within five points of each other, management occupations scored highest, and experts extracted more than five times the output per prompt that novices did — while abandoning struggling sessions roughly a third as often.
The strategic translation is clean. When agents own most of the execution and humans own intent, the scarce and defensible asset is judgment — knowing what to build and whether the result is right. Verification, not generation, is the bottleneck, and verification is a domain-expertise problem. That is the moat: not the ability to write code, but the ability to tell whether it is right.
For agencies and teams, the move follows directly. Hire for understanding over syntax, turn the vertical experts you already have into agentic operators, and make verification a named, owned responsibility. The numbers are vendor-stated and worth re-measuring on your own work — but the direction is unambiguous, and the organizations that reorganize around it first will hold the advantage longest.