You don't need to code to ship with AI — and a 400,000-session study from Anthropic is the strongest evidence yet. In June 2026 the company analyzed roughly 400,000 Claude Code sessions from about 235,000 users and found that people in non-software occupations reached verified success at nearly the same rate as software engineers. The thing that actually separated good sessions from wasted ones was not coding ability. It was domain knowledge — knowing precisely what a correct result looks like before the agent starts.
That is the encouraging headline. The useful part is what comes after it: the study quietly hands you a self-diagnostic. Its classifier rated each session by reading three behavioral signals, and those same three signals double as a checklist you can run on your own prompts. Get them right and you behave like an expert user regardless of whether you have ever opened a terminal before.
This is the practical playbook. We cover what the data actually says (and where it's noisy), the three-signal self-test, what an expert session looks like next to a novice one, why aiming for "intermediate" captures most of the gain, and a five-step routine you can apply to your next AI build. If you want the strategic argument for why this shift creates a durable advantage, read its companion — our deeper analysis on why domain expertise is the new moat in agentic coding. This post is the how-to.
- 01 Non-coders land within five points of engineers. On Anthropic's strict verified-success measure, software engineers reached roughly 34% and non-software occupations roughly 29%, a five-point gap that held steady across all seven months. Every one of the ten largest occupation groups landed within seven points of engineers.
- 02 Three behaviors decide whether you're an expert user. Anthropic's classifier read three signals: how precisely you frame the outcome, what you ask Claude to verify, and whether you correct Claude or Claude corrects you. Those three are a self-test you can run on your own prompts, no coding background required.
- 03 Domain knowledge unlocks more work per prompt. Expert-rated users triggered about 12 Claude actions and ~3,200 words of output per prompt; novices triggered about 5 actions and ~600 words. That is a 2.4x gap in actions and more than 5x in output volume from the same single instruction.
- 04 The biggest win is leaving novice, not reaching expert. Anthropic states that most of the gain is concentrated at the novice-to-intermediate transition. You don't have to become an expert to capture the bulk of the benefit; you need to stop prompting like a novice. That is a skills question, not a background question.
- 05 Verify the population before you generalize. Success rates come from Anthropic's proprietary classifier and have not been independently replicated. The study also excludes Cursor, VS Code, and Agent SDK usage, so it describes Anthropic's own-tool users. Treat the numbers as directional and run your own measurements.
01. The Finding: What the data actually says.
On June 16, 2026, Anthropic published "Agentic Coding and Persistent Returns to Expertise," a privacy-preserving analysis of roughly 400,000 Claude Code sessions from about 235,000 users between October 2025 and April 2026. Rather than read transcripts by hand, the researchers built a classifier on Claude Sonnet 4.6 that rated each session on a five-point scale from novice to expert. It never looked at job titles — it read behavior. Anthropic reports the classifier's agreement with independent telemetry exceeded 90%.
The occupational result is the one to anchor on, because it's the one most people get wrong in both directions. It is not true that "non-coders are now as good as coders." And it is not true that the gap is large. On verified success — which required both the classifier judging the session a success and independent telemetry confirming it through passing tests, commits, or explicit user confirmation — software engineers reached about 34% and non-software occupations about 29%. A real five-point gap, but a narrow one. Every one of the ten largest occupation groups, Anthropic reports, landed within seven points of software engineers.
Verified success by occupation · agentic coding sessions
Source: Anthropic research, Jun 16, 2026. Success rates are from Anthropic's proprietary classifier and have not been independently replicated.
Here is the interpretation worth holding onto. The technical-versus-non-technical binary has stopped working as a performance predictor. When the agent handles syntax, the remaining variable is judgment about what to build and whether the result is right, and that judgment is distributed across professions, not concentrated in one. The narrowness of the gap is exactly why a non-coder should feel invited in, and the existence of the gap is exactly why a non-coder shouldn't expect the agent to think for them.
02. The Self-Test: The three signals that quietly grade you.
Anthropic's classifier didn't guess at expertise from a résumé. It measured three concrete things in each transcript: how precisely the user framed the work, what the user asked Claude to verify, and the direction of correction — whether the user corrected Claude or Claude had to correct the user. Crucially, the measure is task-specific, not credential-specific. A senior engineer asking their first Rust question registers as a novice for that session; an accountant who specifies reconciliation rules precisely and catches the edge case registers as an expert, regardless of whether they know Python.
That makes the three signals the most useful artifact in the entire study, because you can turn them into a self-test. Before you send your next prompt, score yourself against all three. None of them require knowing a programming language. All of them require knowing your own domain — which, if you're reading this for your field, you already do.
Framing precision
Did you describe the exact result you want — the output shape, the inputs it works from, the constraints it must respect — or did you ask for a vague 'make me a dashboard'? Precise framing is the single loudest expert signal in the transcript.
Verification you name first
Experts say what passing looks like up front: these three test cases, this edge case handled, this number reconciling. Novices accept the first plausible-looking output. Naming the check is what turns a guess into a verified success.
Correction direction
When the agent goes wrong, do you identify exactly which part failed and why, or do you re-prompt vaguely and hope? The expert reads a failing transcript and knows which lever to pull. That is domain judgment, not coding.
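To make the self-test concrete, here is a minimal sketch of it as a checklist you answer yourself before sending a prompt. The class and field names are our own invention for illustration; nothing here comes from Anthropic's classifier, which reads transcripts rather than asking yes/no questions.

```python
from dataclasses import dataclass


@dataclass
class PromptSelfTest:
    """Yes/no answers to the three signals, judged by you (hypothetical helper)."""
    framing_is_precise: bool       # output shape, inputs, and constraints are stated
    verification_is_named: bool    # the prompt says what "passing" looks like
    correction_is_specific: bool   # on a miss, you can name which part failed and why

    def gaps(self) -> list[str]:
        """Return the signals that still need work, in the order listed above."""
        checks = [
            ("framing precision", self.framing_is_precise),
            ("verification named first", self.verification_is_named),
            ("correction direction", self.correction_is_specific),
        ]
        return [name for name, passed in checks if not passed]


# Example: a "make me a dashboard" prompt with no stated check.
vague = PromptSelfTest(
    framing_is_precise=False,
    verification_is_named=False,
    correction_is_specific=True,
)
print(vague.gaps())  # ['framing precision', 'verification named first']
```

An empty list means the prompt clears all three signals by your own judgment. The value is in being forced to answer each question honestly, not in the code itself.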
03. Side by Side: What an expert session looks like next to a novice one.
The study describes the classifier's signals but never publishes them as a user-facing checklist. The table below is our own: it maps each of the three signals, plus the session-structure data on how experts steer through trouble, into a concrete contrast between novice and expert behavior, and names what each shift unlocks inside the agent. Nothing in the novice column is permanent: every row is a behavior you can change on your very next prompt.
| Dimension | Novice behavior | Expert behavior | What it unlocks |
|---|---|---|---|
| Outcome framing | "Make me a dashboard." | "A component that renders daily-revenue data from this schema, passes these three test cases, and outputs TSX." | Claude works toward a target it can actually hit, instead of guessing at intent. |
| Context supply | No constraints; lets the agent assume. | Names the constraints, edge cases, and the relevant files or data up front. | More autonomous actions per prompt — fewer round-trips spent filling in blanks. |
| Verification | Accepts the first plausible output. | States what passing looks like before Claude starts. | A result that clears a real bar — the difference between "looks done" and verified success. |
| Correction direction | Re-prompts vaguely; often abandons. | Identifies exactly which part failed and why, then redirects. | Recovery instead of a dead end — experts steer through trouble rather than quitting. |
The leverage these behaviors unlock is measurable. Anthropic reports that expert-rated users triggered around 12 Claude actions and ~3,200 words of output per prompt, while novices triggered about 5 actions and ~600 words. That is a 2.4x gap in actions and more than a 5x gap in output volume — from the same single instruction. The expert isn't typing faster. They're front-loading enough clarity that one prompt does the work of several.
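The two multipliers are simple divisions of the reported per-prompt figures. A quick sketch, using only the approximate, vendor-stated numbers quoted above:

```python
# Per-prompt figures as reported in the text above (approximate, vendor-stated).
expert = {"actions": 12, "words": 3200}
novice = {"actions": 5, "words": 600}

# Expert-to-novice ratio for each metric.
ratios = {metric: expert[metric] / novice[metric] for metric in expert}

for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.1f}x")
# actions: 2.4x
# words: 5.3x
```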
"With the emergence of coding agents, a coding background appears to be becoming less important than before for programming success."
Zoe Hitzig, lead author of the study
04. The Shortcut: Aim for intermediate first; that's where the gain is.
The most actionable thing in the entire study is also the most easily missed: Anthropic states that most of the gain is concentrated at the novice-to-intermediate transition. The jump from novice to intermediate is larger than the additional jump from intermediate to expert. Verified success climbed from about 15% for novices to the 28–33% range for intermediate and expert sessions — meaning the bulk of the improvement happens the moment you stop prompting like a beginner.
That reframes the whole project. You do not need to become a world-class agentic operator to get most of the value. You need to cross one threshold: from vague to specific, from accepting output to checking it, from abandoning when stuck to redirecting. Becoming "intermediate" is not a years-long apprenticeship — it's a handful of habits drawn directly from the three signals. For the prompt-level patterns that move you across that line, our review of which prompt patterns actually worked in H1 2026 is a useful companion.
Where most of the gain sits
Verified success rises from ~15% (novice) to the 28–33% band (intermediate and expert). Anthropic states the novice-to-intermediate jump is the larger of the two steps — so leaving novice captures most of the benefit.
Experts vs novices
Expert-rated sessions triggered ~12 Claude actions per prompt against ~5 for novices. A clearer prompt, grounded in domain knowledge, unlocks more autonomous work from a single instruction.
Words generated
Experts drew ~3,200 words of output per prompt versus ~600 for novices — more than a fivefold gap. The output multiplier is even larger than the action multiplier; both come from the same precise framing.
05. The Ladder: What each expertise tier unlocks in one view.
The study reports these numbers across several different sections; no single table in the paper or the coverage collects them in one place. Below we assemble the tier-by-tier picture so the "get to intermediate" argument is visually obvious. We publish the success and abandonment figures Anthropic states for each tier, and mark the action and output columns as expert/novice only — because Anthropic publishes those two metrics for the endpoints, not for the intermediate tier. Every number is from Anthropic's classifier and is vendor-stated.
| Tier | Verified success | Abandonment (troubled) | Actions / prompt | Words / prompt | In practice |
|---|---|---|---|---|---|
| Novice | ~15% | 19% | ~5 | ~600 | Vague asks; accepts first output; quits when stuck. |
| Intermediate | ~28–33% | 5–7% | not published | not published | Specifies outcome and format; names a check; redirects when stuck. |
| Expert | ~28–33% | 5–7% | ~12 | ~3,200 | Precise constraints and edge cases; pinpoints failures; recovers reliably. |
Read the abandonment column alongside the success column. When sessions ran into trouble, Anthropic reports that expert users recovered to success about 15% of the time versus 4% for novices, and that novices abandoned struggling sessions at 19% against 5–7% for intermediate and expert users. The resilience gap matters as much as the headline success gap: a lot of "failure" is really just giving up early. Knowing how to steer a stuck agent — which is signal three, correction direction — is what keeps a hard session alive.
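The arithmetic behind the tier table is worth seeing once. The sketch below uses only the figures from the table and the paragraph above. The "at most five points" line is our own inference from the width of the shared 28–33% band, assuming the band's endpoints bound both tiers; it is not a number Anthropic states.

```python
# Figures from the tier table above (vendor-stated; ranges kept as (low, high)).
novice_success = 15
upper_tier_success = (28, 33)   # intermediate and expert share this band
novice_abandon = 19
upper_tier_abandon = (5, 7)
recovery = {"expert": 15, "novice": 4}

# Percentage-point gain from leaving the novice tier.
gain_points = tuple(rate - novice_success for rate in upper_tier_success)

# Assumption: the band endpoints bound both tiers, so this is the most
# that can separate intermediate from expert.
max_within_band = upper_tier_success[1] - upper_tier_success[0]

# How many times more often novices abandon a troubled session (low, high).
abandon_ratio = tuple(
    round(novice_abandon / rate, 1) for rate in reversed(upper_tier_abandon)
)

# How many times more often experts recover a troubled session.
recovery_ratio = recovery["expert"] / recovery["novice"]

print(gain_points)      # (13, 18)
print(max_within_band)  # 5
print(abandon_ratio)    # (2.7, 3.8)
print(recovery_ratio)   # 3.75
```

On these figures, leaving novice is worth 13 to 18 points of verified success, while the remaining step can be worth at most 5 under the stated assumption.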
06. The Counterintuitive Bit: Why managers tend to do well at this.
One result is too instructive to skip: management occupations ranked highest on verified success, slightly above software engineers. Read it the right way and it's a gift, because it tells you what mental model the work rewards. Translating a goal into precise requirements, judging whether an output meets the bar, managing iteration when the first attempt misses — that is the daily job of a good manager, and it maps almost exactly onto the three expert signals. If you've run projects or people, you already own the mental model. Your challenge is applying it to an agent, not learning a new skill from scratch.
One honest caveat, which Anthropic itself flags. The management edge may partly be a measurement artifact: managers are more likely to explicitly confirm success in the transcript, and the classifier rewards that explicit confirmation. So the ranking might reflect genuine skill transfer, a quirk of how the metric is read, or both. Don't treat "managers beat engineers" as a clean, settled finding — treat it as a strong hint about which habits travel well into agentic work.
07. The Playbook: The five-step routine for shipping without code.
Here is the practical translation — a routine that operationalizes the three signals plus the resilience finding. Run it on your next AI build, whether that's a report, an automation, a small tool, or a web page. Each step pairs a finding from the study with the concrete move it implies.
Frame the exact outcome
State the result, its output format, and the inputs it works from — specifically enough that a stranger could build the right thing. Precise framing is the loudest expert signal in the data; it's also the cheapest one to fix.
Hand over your context
Name the constraints, edge cases, and relevant files or data before the agent starts. Withheld context is the gap the agent fills with assumptions — and assumptions are where novice sessions quietly go wrong.
Define correct before you start
Say what passing looks like — the test cases, the number that must reconcile, the edge case that must hold — up front. Naming verification first is what turns a plausible-looking output into a genuinely verified one.
Correct precisely, don't abandon
When the agent drifts, identify exactly which part failed and why, then redirect. Experts recover from troubled sessions far more often than novices; the lever is precise correction, not a vague 'try again.'
Stay the human in the loop
You own the 'what' and the judgment of whether it's right; let the agent own the 'how.' Production work still needs a person who can tell whether the output is correct — and on your own domain, that person is you.
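If it helps to see the routine as a template, here is one possible sketch. The first three steps become a written brief, the fourth becomes a structured correction, and the fifth stays with you. The `Brief` class, the `correction` helper, and the file names in the example are all hypothetical, chosen for illustration; adapt the fields to your own domain.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    """A written brief covering the first three steps (hypothetical template)."""
    outcome: str                                             # result and output format
    inputs: list[str] = field(default_factory=list)          # what the agent works from
    constraints: list[str] = field(default_factory=list)     # rules and edge cases
    passing_checks: list[str] = field(default_factory=list)  # what "correct" means

    def missing(self) -> list[str]:
        """Name the sections that are still empty, so gaps show up before sending."""
        sections = {
            "outcome": [self.outcome.strip()] if self.outcome.strip() else [],
            "inputs": self.inputs,
            "constraints": self.constraints,
            "passing_checks": self.passing_checks,
        }
        return [name for name, items in sections.items() if not items]

    def render(self) -> str:
        """Format the brief as plain text to paste into the agent."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        return (
            f"Outcome:\n{self.outcome}\n\n"
            f"Inputs:\n{bullets(self.inputs)}\n\n"
            f"Constraints and edge cases:\n{bullets(self.constraints)}\n\n"
            f"Done means:\n{bullets(self.passing_checks)}"
        )


def correction(failed_part: str, reason: str, redirect: str) -> str:
    """Step four: a precise redirect instead of a vague 'try again'."""
    return f"The {failed_part} is wrong because {reason}. {redirect}"


# Illustrative example: an accountant's reconciliation task (made-up file names).
brief = Brief(
    outcome="A CSV listing every invoice in ledger.csv with no matching bank entry",
    inputs=["ledger.csv", "bank_export.csv"],
    constraints=["Match on invoice number, then amount", "Treat refunds as negative amounts"],
)
print(brief.missing())  # ['passing_checks']
```

Here `missing()` flags that no passing check has been named yet, which is exactly the gap the third step is meant to close before the agent starts.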
This is the practical core of how we run an AI digital transformation engagement: not handing a team a tool and hoping, but teaching senior practitioners to brief, verify, and steer agents inside their own domain. The same discipline underpins how we build production software — the person steering the agent is the leverage point, and the workflow is designed around their judgment. For broader context on the tool landscape and how teams adopt it, see the H1 2026 AI coding retrospective and the 30-person dev shop adoption case study.
A forward-looking read is worth stating directly. As of April 2026 the expertise gap had not narrowed — Anthropic flags this as a trend to watch, not a temporary artifact. If that holds, the advantage doesn't accrue to whoever buys the best model; it accrues to whoever brings the sharpest domain judgment to it. For a non-coder, that's genuinely good news: the moat is built from the expertise you already have, and the only thing standing between you and an intermediate-level session is a more disciplined brief.
08. Reading the Data: How to read these numbers honestly.
A single vendor study, however large, is not the final word. Four caveats keep the conclusions sober. First, the success rates come from Anthropic's own proprietary classifier and have not been independently replicated — read them as "Anthropic's data shows," not as settled fact. Second, the study excludes Cursor, VS Code integrations, headless sessions, and Agent SDK usage, so the population skews toward users of Anthropic's own CLI, claude.ai, and desktop app; the patterns may not generalize one-to-one to Cursor or Windsurf users.
Third, the management-beats-engineers result carries Anthropic's own measurement-artifact caveat and should be read as a directional hint, not a clean finding. Fourth, the study's figure for the average session's economic value rising about 27% is a relative benchmark estimated against public freelance posting rates — the researchers call the direction reliable but the specific percentage measurement-noisy, so we treat it as a trend toward higher-complexity work rather than a dollar figure. None of this undercuts the core finding. It simply means the right posture is to treat the study as strong directional evidence, run your own measurements on your own work, and let the pattern — not any single percentage — drive how you prompt.
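Running your own measurements can be as light as a session log. The sketch below is one way to do it, under loud assumptions: the record fields and the `summarize` function are our own, the entries are made up, and "verified" here means you confirmed the result against your own stated check, which is a rough personal analogue of the study's metric, not the same measure.

```python
from dataclasses import dataclass


@dataclass
class SessionRecord:
    task: str
    ran_into_trouble: bool   # the agent went wrong at least once
    abandoned: bool          # you gave up without a usable result
    verified: bool           # the result passed the check you named up front


def rate(hits: int, total: int) -> float:
    """Share of hits in total, or 0.0 when there is nothing to count."""
    return hits / total if total else 0.0


def summarize(log: list[SessionRecord]) -> dict[str, float]:
    """Compute personal success, abandonment, and recovery rates from a log."""
    troubled = [s for s in log if s.ran_into_trouble]
    return {
        "verified_success": rate(sum(s.verified for s in log), len(log)),
        "abandoned_when_troubled": rate(sum(s.abandoned for s in troubled), len(troubled)),
        "recovered_when_troubled": rate(sum(s.verified for s in troubled), len(troubled)),
    }


# Made-up entries for illustration only.
log = [
    SessionRecord("monthly report", ran_into_trouble=False, abandoned=False, verified=True),
    SessionRecord("invoice matcher", ran_into_trouble=True, abandoned=False, verified=True),
    SessionRecord("landing page", ran_into_trouble=True, abandoned=True, verified=False),
    SessionRecord("data cleanup", ran_into_trouble=False, abandoned=False, verified=False),
]
summary = summarize(log)
print(summary)
# {'verified_success': 0.5, 'abandoned_when_troubled': 0.5, 'recovered_when_troubled': 0.5}
```

A few dozen entries will not be statistically meaningful, but they are enough to show whether your own abandonment and recovery habits are moving in the right direction.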
09. Conclusion: The barrier was never the code.
You already have the scarce input. Now apply it to the agent.
Anthropic's analysis of roughly 400,000 Claude Code sessions lands on a finding that should change how non-coders approach AI tools: domain knowledge, not coding skill, is what separates a productive session from a wasted one. Non-software professionals finished within five points of engineers on strict verified success, and management occupations ranked highest of all. The thing that moved the needle was framing, verification, and correction — three behaviors anyone can run on their own prompts.
The practical translation is encouraging. You don't need to become an expert; the data says most of the gain comes from simply leaving novice. Be specific about the outcome, hand over your context, define what correct looks like before you start, correct precisely instead of abandoning, and stay the human who judges whether the result is right. Those five habits are drawn straight from the study, and none of them require a programming language.
Read this beside its companion on why domain expertise is the new moat for the strategic case. The takeaway from both is the same: the barrier was never the code. It was the rigor of the brief and the judgment to know when the output is right — and if you know your own field, you already hold both.