AI Development · Industry Guide · 11 min read · Published June 20, 2026

~400K sessions · domain knowledge beats coding background · 5-pt occupation gap

Domain Expertise Is the New Moat in Agentic Coding

Anthropic analyzed roughly 400,000 Claude Code sessions and found that domain knowledge predicts success more reliably than a coding background does. Software engineers and non-software professionals land within a few points of each other — and management occupations score highest of all. The moat isn't who can write code. It's who can tell whether the code is right.

Digital Applied Team
Senior strategists · Published Jun 20, 2026
Sources: Anthropic research + 6 more
Sessions analyzed: ~400K (~235K users · Oct 2025–Apr 2026)
Expert verified success: 33% vs 15% for novices (+18 pts)
SWE vs non-SWE gap: 5 pts (34% vs 29% on code sessions)
Work Claude does per prompt: 2.4× (experts vs novices, actions per prompt)

Domain expertise is the new moat in agentic coding — and the data now backs the claim. In June 2026 Anthropic published an analysis of roughly 400,000 Claude Code sessions from about 235,000 users, and its sharpest finding is also its most counterintuitive: how well someone understands the problem they are solving predicts success more reliably than whether they were trained to write code.

That reframes a debate the industry has had backwards for two years. The fear was that coding agents would commoditize developers and hand the job to anyone who could type a prompt. The data tells a more selective story. Agents have not democratized coding for everyone equally; they have amplified the people who already know what correct output looks like. Software engineers reached a 34% verified success rate on code-producing sessions; non-software occupations reached 29% — a gap of just five points — and management occupations scored highest of any group.

This guide walks through what Anthropic actually measured, why management beating engineers is less surprising than it sounds, how expertise compounds across four dimensions of agentic work, and what the shift means for agencies and engineering teams deciding what to hire for. The numbers are vendor-stated and carry caveats we flag throughout — but the direction is clear and the strategic implication is sharper still.

Key takeaways
  1. Domain knowledge predicts success, not coding background. Across ~400K Claude Code sessions, how well a person understood the problem mattered more than formal coding training. Software engineers hit 34% verified success on code sessions; non-software occupations hit 29%, within five points.
  2. Management occupations scored highest of any group. The study's most counterintuitive finding: management occupations slightly exceeded software engineers on verified coding tasks. All ten largest occupation groups landed within seven points of each other.
  3. Expertise compounds across four dimensions at once. Experts triggered 12 Claude actions and 3,200 words of output per prompt versus 5 actions and 600 words for novices, a 2.4x action gap and a 5.3x output gap. The more expertise you bring, the more work the agent does per instruction.
  4. Experts abandon struggling sessions far less often. When a session hit trouble, novices walked away 19% of the time versus 5–7% for intermediate and expert users, roughly a 3x resilience gap. Knowing how to steer through a stuck agent is itself a form of expertise.
  5. Verification, not generation, is the new scarce skill. Anthropic's companion trends report names verification as a rising bottleneck: the problem is shifting from writing code to judging whether generated code is correct. Domain experts are best positioned to make that call.

01 · The Study · What Anthropic actually measured.

On June 16, 2026, Anthropic published "Agentic Coding and Persistent Returns to Expertise," an economic analysis of roughly 400,000 Claude Code sessions drawn from about 235,000 users between October 2025 and April 2026. The headline conclusion is stated plainly in the research: success is determined by how well a person understands the problem they are trying to solve, not whether they are trained in coding.

The methodology matters for how much weight to put on the numbers. Anthropic used a privacy-preserving pipeline in which a Claude model classified transcripts at scale rather than humans reading them. The study deliberately excludes third-party IDE integrations and headless, fully automated Claude Code usage — so every figure below describes human-interactive sessions, not CI/CD automation. Treat the results as a portrait of how people work alongside a coding agent, not as a benchmark of the agent running on its own.

Source and scope
The figures in this article are reported in Anthropic's study Agentic Coding and Persistent Returns to Expertise (June 16, 2026), authored by Zoe Hitzig, Maxim Massenkoff, Eva Lyubich, Ryan Heller, and Peter McCrory. They are vendor-stated and cover human-interactive sessions only. Independent reproduction is not yet available, so read them as directional evidence rather than settled fact.
Scale: ~400K sessions (~235K users · Oct 2025 – Apr 2026)
A seven-month window of real Claude Code usage, classified by a Claude model rather than human reviewers to preserve privacy. Large enough that occupation-level patterns are visible.
anthropic.com/research/claude-code-expertise

Method: Transcript classification (model-graded · privacy-preserving)
Sessions were labelled by success level and task type at scale. Excludes third-party IDE integrations and headless automated usage, so figures describe interactive human-plus-agent work only.
Human-interactive sessions only

02 · The Paradox · The finding nobody leads with: management scored highest.

Here is the result most coverage buries. On verified coding tasks, management occupations achieved the highest success rates of any occupation group — slightly exceeding software engineers themselves. Management, legal, and sales professionals were among the fastest-growing and highest-performing groups using agentic coding tools, per the study.

Read carefully, this is not a blowout. Anthropic frames the management edge as slight, and it reports that all ten of the largest occupation groups landed within roughly seven percentage points of each other. The point is not that managers out-engineer engineers; it is that the technical-versus-non-technical binary has stopped working as a performance predictor. When the agent handles syntax, what remains is judgment about what to build and whether the result is right — and that judgment is distributed across professions, not concentrated in one.

Verified success by occupation · agentic coding sessions

Source: Anthropic research, Jun 16, 2026 (vendor-stated). Management value is qualitative — the paper reports it as highest and 'slightly' above software engineers, not as a published percentage.
  • Management occupations: Highest (highest verified success of any group, slightly above SWE)
  • Software engineering: 34% (verified success on code-producing sessions)
  • Non-software occupations: 29% (verified success on code-producing sessions)
  • Spread across top 10 occupation groups: ≤7 pts (all largest groups fell within this band)
Our read of the data
The cleanest interpretation is that coding agents are not replacing domain expertise so much as amplifying it. A manager who deeply understands a workflow can now turn that understanding into working software without first clearing the syntax hurdle. The agent removes the keyboard tax on knowing what should exist — which is exactly the asset senior people already hold. (This framing echoes secondary coverage of the study; the precise wording is our own paraphrase, not a quote from Anthropic.)

03 · The Multiplier · How expertise compounds across four dimensions.

The single most useful idea in the study is that expertise does not improve one metric — it improves several at once, and they compound. The greater the domain expertise a person brings to a session, the more work Claude does per instruction. Experts triggered around 12 Claude actions and 3,200 words of output per prompt; novices triggered roughly 5 actions and 600 words. That is a 2.4x gap in actions and a 5.3x gap in output volume from the same single instruction.

The table below assembles four of the study's metrics into a single leverage view, a consolidated picture we have not seen in secondary coverage. Actions and output per prompt appear only for the two expertise tiers Anthropic states explicitly (novice and expert); those cells in the intermediate row are marked "est." and left unfilled, because the paper reports intermediate values for success and abandonment but not for actions or output volume. The leverage column is our own calculation: output per prompt indexed to novice = 1.0.

The Expertise Leverage Multiplier: verified success rate, session abandonment when troubled, actions per prompt, words of output per prompt, and relative output leverage versus a novice baseline, across novice, intermediate, and expert Claude Code users. Source: Anthropic research, June 16, 2026. Intermediate action and output figures are estimated; the paper publishes only novice and expert values for those columns.
Expertise level | Verified success | Abandonment (troubled) | Actions / prompt | Words / prompt | Output leverage vs novice
Novice | 15% | 19% | 5 | 600 | 1.0×
Intermediate | 28% | 5–7% | est. | est. | est.
Expert | 33% | 5–7% | 12 | 3,200 | 5.3×

The leverage figure is the part worth sitting with. An expert and a novice issue the same kind of instruction, and the expert extracts more than five times the output and well over double the agent actions from it. Verified success climbs from 15% to 33% over the same span. Expertise is not a tiebreaker at the margins here — it is the primary throttle on how much value an agent returns per prompt.
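The ratios in the table can be re-derived from the four novice and expert figures the study reports. What follows is a minimal Python sketch that takes the rounded, vendor-stated figures quoted in this article as inputs. The `Tier` class and the `leverage` function are our own illustrative names, not anything from Anthropic's analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Per-tier figures as quoted in this article (rounded, vendor-stated)."""
    name: str
    verified_success_pct: float
    actions_per_prompt: float
    words_per_prompt: float


# Values copied from the table above. Only novice and expert are published
# for actions and words, so the intermediate tier is deliberately left out.
NOVICE = Tier("novice", 15, 5, 600)
EXPERT = Tier("expert", 33, 12, 3200)


def leverage(expert: Tier, baseline: Tier) -> dict[str, float]:
    """Index the expert tier against a baseline tier (baseline = 1.0)."""
    return {
        "actions_ratio": round(expert.actions_per_prompt / baseline.actions_per_prompt, 1),
        "output_ratio": round(expert.words_per_prompt / baseline.words_per_prompt, 1),
        "success_gap_pts": expert.verified_success_pct - baseline.verified_success_pct,
    }


result = leverage(EXPERT, NOVICE)
print(result)
```

Run as written, this reproduces the 2.4x action ratio, the 5.3x output ratio, and the 18-point success gap. Because the inputs are already rounded, the ratios should be read as approximate.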

Actions per prompt (experts vs novices): 2.4× (12 vs 5)
Experts triggered ~12 Claude actions per prompt against ~5 for novices. A clearer prompt, grounded in domain knowledge, unlocks more autonomous work from the same single instruction.

Output per prompt (words generated): 5.3× (3,200 vs 600)
Experts drew ~3,200 words of output per prompt versus ~600 for novices. Indexed to a novice baseline of 1.0, that is the leverage multiplier in the table above.

Verified success (novice to expert): +18 pts (15% → 33%)
Verified session success rose from 15% (novice) to 33% (expert). Intermediate users landed at 28%: most of the gain arrives early, then expertise keeps pushing the ceiling.

04 · Resilience · Why experts don't quit when the agent gets stuck.

A quieter finding may be the most operationally important. When a session ran into trouble, novices abandoned it about 19% of the time, while intermediate and expert users walked away only 5–7% of the time — roughly a 3x resilience gap. The agent gets stuck for everyone; what differs is whether the human can diagnose why and redirect it.

That is the part the "anyone can prompt" narrative misses. Steering an agent through a failure — recognizing a wrong assumption, reframing the task, knowing which constraint to relax — is itself expertise. A novice who cannot tell whether the agent is confused or correct has no choice but to abandon the session. An expert reads the same failing transcript and knows exactly which lever to pull. The agent amplifies the people who can recover, and strands the people who cannot.

"The binding constraint has moved from 'can you build it' to 'can you tell whether it's right.'" (Aaron Brethorst, software engineer)

Brethorst's framing, written shortly before the study landed, turns out to match the data. The scarce skill is no longer production — it is evaluation. The expert's lower abandonment rate is what evaluation looks like in practice: the ability to keep judging output until it is actually correct, instead of giving up when the first attempt fails.
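The "roughly 3x" resilience gap cited earlier in this section compares one number (19%) against a range (5–7%), so the ratio is itself a range. The short Python sketch below bounds it, using only the abandonment figures quoted in this article. The constant and function names are our own.

```python
NOVICE_ABANDON_PCT = 19            # novices, troubled sessions (as quoted above)
EXPERIENCED_ABANDON_PCT = (5, 7)   # intermediate and expert users, reported as a range


def resilience_ratio_bounds(novice_pct, experienced_range):
    """Return (lower, upper) bounds on how much more often novices abandon."""
    low_pct, high_pct = experienced_range
    # Dividing by the higher abandonment rate gives the smaller ratio, and vice versa.
    return novice_pct / high_pct, novice_pct / low_pct


lower, upper = resilience_ratio_bounds(NOVICE_ABANDON_PCT, EXPERIENCED_ABANDON_PCT)
print(f"novices abandon {lower:.1f}x to {upper:.1f}x as often")
```

The result spans about 2.7x to 3.8x, and at the 6% midpoint of the range the ratio is about 3.2x, so "roughly 3x" is a fair summary rather than an exact figure.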

05 · Division of Labor · Who decides what versus who decides how.

The study quantifies the human-AI partnership in a way that explains why domain knowledge matters so much. In a typical Claude Code session, humans made roughly 70% of the planning decisions — what to build — while Claude made roughly 80% of the execution decisions — how to build it. The labor split is clean: people own intent, agents own implementation.

That division is the whole argument. If the agent owns four-fifths of the "how," then the human's leverage lives almost entirely in the "what." Knowing what to build, in what order, against which constraints, and what a correct result looks like — that is domain expertise by another name. The session evidence also shows the work itself moving up the value chain: average task value rose roughly 27% across the seven-month window, with building and fixing sessions climbing 32–43% in value as agents absorbed more of the routine execution.

The human–AI division of labor in Claude Code sessions

Source: Anthropic research, Jun 16, 2026 (vendor-stated). Planning and execution shares are approximate.
  • Human-led planning decisions: ~70% (what to build, owned by the person)
  • AI-led execution decisions: ~80% (how to build it, owned by Claude)
  • Average task value increase: +27% (across the Oct 2025 – Apr 2026 window)
  • Building-session value increase: +43% (build and fix work climbed most in value)

Task composition shifted underneath these numbers, too. Debugging fell from about 33% of all sessions in October 2025 to roughly 19% by April 2026 — agents are quietly absorbing routine bug-fixing. Meanwhile non-code work grew: in the study's later snapshot, roughly 56% of sessions involved writing, fixing, testing, or orchestrating code, while about 44% were non-code work such as operating software, planning, and analysis. The tool is no longer only for writing code; it is for getting work done, and the people who know the work win.
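When reading share shifts like the debugging figures above, it helps to separate the change in percentage points from the relative change, since the two are easy to conflate. The Python sketch below uses the approximate 33% and 19% shares quoted in this article; the `share_shift` helper is our own illustration.

```python
def share_shift(start_pct: float, end_pct: float) -> tuple[float, float]:
    """Return (change in percentage points, relative change in percent)."""
    point_change = end_pct - start_pct
    relative_change = point_change / start_pct * 100
    return point_change, relative_change


# Debugging share of sessions: ~33% in October 2025, ~19% by April 2026.
points, relative = share_shift(33, 19)
print(f"{points:+.0f} points, {relative:+.1f}% relative")
```

Debugging lost about 14 percentage points of session share, which is a relative decline of a little over 40%. Both inputs are approximate, so the outputs are too.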

06 · The Bottleneck · Generation is solved. Verification is the new scarcity.

If agents now handle most of the "how," the constraint moves downstream to a single question: is the output actually correct? Anthropic's companion 2026 Agentic Coding Trends Report names verification as a rising bottleneck — the problem shifts from generating code to evaluating whether generated code is right. The same report describes a delegation gap: developers report using AI in a majority of their work, yet can fully hand off only a small share of tasks end to end.

The market evidence points the same way. Industry surveys through 2026 show that AI coding tools have become near-ubiquitous among developers, while deep trust in their unreviewed output remains strikingly low and agent adoption lags raw tool adoption by a wide margin. Read together, those signals describe a verification economy: usage is everywhere, trust is scarce, and someone with judgment has to close the gap between what the agent produced and what is actually shippable. That someone is a domain expert, not another model.

Verification is the moat
Anthropic's trends report frames verification as one of the defining shifts of agentic development — the bottleneck moves from writing code to judging whether code is correct, and product judgment becomes the scarce input. We paraphrase rather than quote that report directly; treat the framing as the report's thesis, not a verbatim line. The practical takeaway holds either way: hire and develop people who can tell whether it is right, because generation is no longer the hard part.

07 · Implications · What this means for agencies and engineering teams.

For a boutique agency, this research is a competitive repositioning argument. If domain knowledge — not coder headcount — is what drives agentic success, then senior practitioners with deep vertical expertise are now capable agentic operators in their own right. Not because they learned to code, but because they already know what good output looks like in their field. That is a direct answer to larger competitors who sell volume of junior engineers.

The decision matrix below maps the study's findings onto concrete staffing and pitching choices. Each row pairs a finding with the move it implies.

  • Hiring signal: Hire for judgment, not just syntax. Verified success tracks domain understanding more than coding pedigree, and management occupations scored highest. Weight problem-understanding and taste over years-of-language-X in agentic roles. Move: Screen for domain depth.
  • Positioning: Sell senior judgment as the product. When agents own ~80% of execution, the human's value is in the ~70% of planning they still own. Pitch the irreplaceable layer (what to build and whether it's right), not the headcount that types it. Move: Lead with the moat.
  • Capability: Turn vertical experts into operators. Strategists, account leads, and analysts who know a domain cold can now ship working software with an agent. Equip them with the workflow rather than routing every build through a separate engineering queue. Move: Upskill the experts you have.
  • Quality gate: Make verification a named role. With trust in unreviewed AI output low and a real delegation gap, someone must own evaluation. Treat 'can tell whether it's right' as a first-class responsibility, not an afterthought bolted onto delivery. Move: Staff the verification layer.

This is the practical core of an AI digital transformation engagement: not handing teams a tool and hoping, but reorganizing who owns intent, who owns execution, and who owns verification. The same logic applies when we build production software — the senior practitioner steering the agent is the leverage point, and the workflow is designed around their judgment. For the mechanics of that workflow, see our practical agentic engineering workflow, and for how a small shop actually rolls this out, the 30-person dev shop adoption case study.

There is a forward-looking read worth stating directly. If verification is the binding constraint and expertise is what closes it, then the next phase of competitive advantage in services is not about adopting agents — almost everyone will — but about concentrating scarce domain judgment where it compounds. The agencies that win the next two years will look less like coding shops and more like benches of senior operators, each amplified by agents they know how to steer. This dovetails with the Q2 2026 agentic hiring slowdown: teams are not freezing because the work dried up, but because the shape of the valuable hire changed.

08 · Reading the Data · How to read these numbers honestly.

A single vendor study, however large, is not the final word. Four caveats keep the conclusions sober. First, the figures are vendor-stated and not yet independently reproduced — Anthropic both built the tool and graded the transcripts. Second, the success rates describe human-interactive sessions only; the study excludes third-party IDE integrations and headless automation, so none of this generalizes to autonomous CI/CD pipelines. Third, the management-beats-engineers finding is directional, not a blowout: the paper calls the edge slight and groups it within a seven-point band.

Fourth, some of the supporting market figures circulating alongside this story — developer-survey adoption and trust percentages, vendor subscriber counts — reach us through secondary summaries rather than primary releases, so we describe them qualitatively here and would verify the exact numbers against the original sources before citing them in a decision. None of this undercuts the core finding. It simply means the right posture is to treat the study as strong directional evidence about how expertise interacts with agents, run your own measurements on your own work, and let the pattern — not any single percentage — drive the strategy. For where the tooling goes next, see what ships next in agentic coding.
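The advice to run your own measurements can start very small. As a hedged sketch, the Python below tallies verified success by expertise tier from a hand-labelled session log and attaches a Wilson score interval to each rate, so a small sample is not over-read. The log format, the tier labels, and the sample records are all hypothetical, and this is not the method Anthropic used.

```python
import math
from collections import defaultdict


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion."""
    if total == 0:
        return (0.0, 1.0)
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def success_by_tier(sessions):
    """Group (tier, verified) records and report rate plus interval per tier."""
    counts = defaultdict(lambda: [0, 0])  # tier -> [successes, total]
    for tier, verified in sessions:
        counts[tier][1] += 1
        if verified:
            counts[tier][0] += 1
    return {
        tier: {"rate": s / n, "n": n, "ci95": wilson_interval(s, n)}
        for tier, (s, n) in counts.items()
    }


# Hypothetical hand-labelled log: (self-assessed tier, did the output pass review?)
sample_log = [
    ("novice", False), ("novice", False), ("novice", True), ("novice", False),
    ("expert", True), ("expert", False), ("expert", True), ("expert", True),
]
report = success_by_tier(sample_log)
for tier, stats in report.items():
    low, high = stats["ci95"]
    print(f"{tier}: {stats['rate']:.0%} of {stats['n']} sessions (95% CI {low:.0%} to {high:.0%})")
```

With only a handful of sessions per tier the intervals are very wide, which is the point: the pattern across many sessions, not any single percentage, is what should drive a staffing decision.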

09 · Conclusion · The moat moved, and it moved toward judgment.

The shape of the moat, mid-2026

Agents amplify the people who already know what correct looks like.

Anthropic's analysis of roughly 400,000 Claude Code sessions lands on a finding the industry should have seen coming: domain understanding predicts success more reliably than a coding background. Software engineers and non-software professionals finished within five points of each other, management occupations scored highest, and experts extracted more than five times the output per prompt that novices did — while abandoning struggling sessions roughly a third as often.

The strategic translation is clean. When agents own most of the execution and humans own intent, the scarce and defensible asset is judgment — knowing what to build and whether the result is right. Verification, not generation, is the bottleneck, and verification is a domain-expertise problem. That is the moat: not the ability to write code, but the ability to tell whether it is right.

For agencies and teams, the move follows directly. Hire for understanding over syntax, turn the vertical experts you already have into agentic operators, and make verification a named, owned responsibility. The numbers are vendor-stated and worth re-measuring on your own work — but the direction is unambiguous, and the organizations that reorganize around it first will hold the advantage longest.

FAQ · Domain expertise & agentic coding

The questions we get every week.

Anthropic analyzed roughly 400,000 Claude Code sessions from about 235,000 users between October 2025 and April 2026, published June 16, 2026. The central finding is that domain knowledge — how well a person understands the problem they are solving — predicts session success more reliably than a coding background does. Software engineers reached a 34% verified success rate on code-producing sessions while non-software occupations reached 29%, a gap of just five points, and all ten of the largest occupation groups landed within about seven points of each other. The figures are vendor-stated and cover human-interactive sessions only.