Vibe-coding anti-patterns are the recurring AI-assisted coding mistakes that turn productivity gains into tech debt — accept-without-read approvals, copy-paste sprawl the agent never saw, agent-blame in postmortems, skipped evals on AI-authored changes, and six others that compound quietly until the debt curve crosses the productivity curve. We see them in every engineering audit.
The frame matters because the productivity story is real. Teams running Claude Code, Copilot, or Cursor at default settings genuinely ship faster in the first quarter — sometimes meaningfully faster. The debt story is also real. Without explicit gates, the same teams accumulate review-bypassed code, duplicated implementations the agent didn't see, untested branches that pass CI because the agent wrote the test against the same hallucinated contract, and postmortems that stop at "the model got it wrong." The curves cross somewhere between twelve and eighteen months.
This essay walks the ten patterns in severity order, with a diagnostic signal you can run today, the corrective pattern that prevents recurrence, and a worked example for the three highest-severity items. The intent is a punch list for engineering leaders — the kind of artifact that survives the meeting where someone says "we should write a policy about that."
- 01 — Accept-without-read is the silent debt accumulator. When reviewers approve AI diffs without reading the code, debt accumulates invisibly. The diffs land, CI passes, the team ships — and the duplicated logic, subtle API misuse, and confident-but-wrong type signatures pile up in places no human ever looked. It's the highest-leverage pattern to fix because the cost of correction grows with every merged PR.
- 02 — Agent-blame poisons postmortems. Stopping the analysis at "the model hallucinated" produces no preventive action and erodes trust in the tooling. The root cause is always a human decision — insufficient review, missing test gate, agent given too much scope. Postmortems that trace the decision chain backward produce better policy; postmortems that blame the model produce nothing.
- 03 — Eval gates on AI-authored changes are non-negotiable. AI is faster at writing tests than humans are at noticing missing ones. The asymmetric requirement — AI-authored code must include tests for changed paths even when human-authored code does not — is the cheapest enforcement. Score the gate at the PR template, not at the postmortem.
- 04 — Secrets-in-prompts incidents are policy failures. Every incident where a secret leaks into a prompt is traceable to one of three policy gaps: missing IDE-side scrubbing, missing workspace exclusions on .env-style files, or missing outbound proxy logging. Write the rule once, back it with tooling, and the failure mode disappears.
- 05 — Test coverage requires review, not just lines. AI-generated tests that pin the implementation rather than the contract look like coverage but aren't. They make refactors painful, mask regressions, and produce false-confidence reports. Coverage is a quality property — count it after a human reviewed the assertions, not before.
01 — Why It Becomes Debt
Productivity gains compound — and so does debt.
The productivity story behind AI-assisted coding is real. Teams adopting Claude Code, Copilot, or Cursor at default settings consistently report meaningful first-quarter velocity gains — faster PRs, more parallel feature work, fewer hours spent on boilerplate. The story we don't hear as often is what happens in quarters three through six, when the velocity curve flattens and the debt curve starts to bite.
The mechanism is straightforward. AI-assisted coding accelerates output at every layer where the human used to be the bottleneck: scaffolding, test writing, refactor sweeps, doc generation, boilerplate translation. Each of those layers also accelerates the production of code that nobody read carefully, code that duplicates work the agent didn't see, code that passes a test written against a hallucinated contract, and commit messages that don't mention which tool produced the diff. None of that is inherently bad — it's the absence of gates that turns it into debt.
The ten anti-patterns below are the ones that recur across the engineering audits we run. They're grouped by severity tier rather than by chronological order — S1 patterns are incident-drivers that should be addressed within a sprint, S2 patterns accumulate debt over quarters and need gating in the current planning cycle, and S3 patterns are smells worth tracking but not at the top of the punch list.
Severity model · prioritize the gates that close incident-drivers first
Source: Digital Applied audit framework, internal field data 2025-2026
02 — Accept Without Read
The silent debt accumulator.
Accept-without-read is the highest-severity pattern in this list, and the one most engineering leaders underweight. The shape: an engineer asks an agent to fix a bug, the agent produces a fifty-line diff across three files, the engineer skims the summary, accepts the changes, opens a PR, and the reviewer approves on the same skim. The code ships. Nobody read it carefully — not the author, not the reviewer, not anyone after.
The diagnostic signal is easy to instrument. Sample any merged AI-authored PR from the last quarter and ask the original author three questions about the diff that require having read it — "why did you choose this API over the obvious alternative," "what does this catch block do when the upstream service returns 429," "why is the loop bounded at thirty rather than the configured limit." If the author can't answer without re-reading, the pattern is live in the workflow. The follow-up question to the reviewer surfaces the same gap one layer up.
The corrective pattern has three components. First, the PR template requires explicit AI-attribution and a one-line summary the author wrote (not the agent) explaining what the diff does and why. Second, the review rubric for AI-authored diffs includes a single sentence requirement that reviewers cannot complete by skimming — "name one decision in this diff that you'd push back on if a junior engineer had written it," for example. Third, retroactive sampling of merged AI-authored PRs at five to ten percent per quarter, with the sample reviewed by an engineer who wasn't the original reviewer.
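A minimal sketch of the third component, assuming AI-attribution lands as an "AI-tools:" trailer in merge commits (the convention discussed under missing AI-attribution below); the trailer name and sampling rate are assumptions to adjust to whatever the PR template actually enforces:

```python
import random
import subprocess

# Sample merged commits that carry an "AI-tools:" trailer and hand a random
# 5-10% slice to a second reviewer who wasn't on the original PR. Adjust the
# trailer, window, and rate to the conventions the PR template enforces.

def ai_attributed_commits(since="3 months ago"):
    log = subprocess.run(
        ["git", "log", "--merges", f"--since={since}",
         "--grep=AI-tools:", "--format=%H %s"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in log.stdout.splitlines() if line.strip()]

def quarterly_sample(rate=0.10, seed=None):
    commits = ai_attributed_commits()
    random.seed(seed)
    k = max(1, int(len(commits) * rate)) if commits else 0
    return random.sample(commits, k)

if __name__ == "__main__":
    for commit in quarterly_sample(rate=0.10):
        print(commit)  # feed this list to the second-reviewer rotation
```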
Diagnostic · Three questions the author can't answer
Pick any merged AI-authored PR and ask three specific questions about the diff that require reading. If the author has to re-open the PR to answer, accept-without-read is live in the workflow.
Diagnostic · Reviewer time below 30 seconds
Audit log of review duration on AI-authored PRs. Median below thirty seconds on diffs over fifty lines means reviewers are signing off without reading. The threshold is rough, but the trend line is informative.
Corrective · AI-attribution at PR template
Required field — AI tools used, plus an author-written one-line summary that explains what the diff does. The act of writing the summary forces the read; the template makes the requirement non-negotiable.
Corrective · Quarterly retroactive sampling
Five to ten percent of merged AI-authored PRs go to a second reviewer who wasn't involved in the original merge. Surfaces patterns no individual review would catch — accumulating debt, drift, security smells that became normal.
One operational note. Accept-without-read does not require malice or laziness — it's the default that emerges when the agent produces a clean-looking diff, the CI passes, and the review tool doesn't differentiate AI-authored diffs from human-authored ones. The fix is structural, not cultural. Reviewers who genuinely care about quality still skip the read when the workflow doesn't prompt them. The gates do.
"Accept-without-read is not a culture problem — it's a workflow problem. The default emerges when the diff looks clean, CI passes, and nothing in the review surface prompts the read."— Field-audit observation · Digital Applied 2026
03 — Copy-Paste Sprawl
Across-file duplication the AI didn't see.
Copy-paste sprawl is the second-highest-severity pattern because it scales with adoption rate and is almost invisible in any single PR. The shape: an engineer asks an agent to write a function that handles, say, retry-with-backoff on an upstream service call. The agent obliges and produces a perfectly reasonable implementation. The engineer ships it. A week later, a different engineer in a different file asks an agent for the same thing — the agent has no memory of the first request and produces a second, slightly different, perfectly reasonable implementation. Multiply by ten agents, fifty engineers, six months.
The diagnostic signal is structural duplication that wouldn't have happened with a human-only workflow. Static analysis tools that detect near-duplicate functions across files are the obvious primitive — codebase-wide grep for common patterns (retry logic, error formatting, validation helpers, date math) and count the distinct implementations. A baseline number doubling over two quarters with no corresponding feature growth is the leading indicator.
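A minimal sketch of that baseline count, assuming a Python codebase and illustrative name patterns; the point is the quarter-over-quarter trend, not the exact regexes:

```python
import re
from collections import defaultdict
from pathlib import Path

# Count distinct definitions of utility-shaped functions (retry, validation,
# error formatting) across the repo. The patterns are illustrative; tune them
# to the helpers your team actually duplicates, and track totals per quarter.

PATTERNS = {
    "retry/backoff": re.compile(r"def\s+\w*(retry|backoff)\w*\s*\(", re.I),
    "validation": re.compile(r"def\s+\w*(validate|sanitize)\w*\s*\(", re.I),
    "error formatting": re.compile(r"def\s+\w*(format_error|error_message)\w*\s*\(", re.I),
}

def count_implementations(root="."):
    counts = defaultdict(list)
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                counts[label].append(f"{path}:{match.start()}")
    return counts

if __name__ == "__main__":
    for label, sites in count_implementations().items():
        print(f"{label}: {len(sites)} definitions")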
The corrective pattern is the "shared module first" rule for agent prompts. Before asking an agent to write a new utility, the prompt template requires a one-sentence search for existing implementations. Some agents handle this natively if the prompt asks them to. Others need scaffolding — a pre-prompt hook or workspace rule that injects "search the codebase for existing implementations before writing a new one" into every agent call. The rule is cheap and the effect compounds.
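A minimal sketch of the scaffolding variant, with the rule kept in a tracked file and prepended to every request; the file name and helper are illustrative, and tools that read a project instructions file natively (Claude Code's CLAUDE.md, for example) can carry the same rule without the wrapper:

```python
from pathlib import Path

# The "shared module first" rule lives in one version-controlled file; every
# agent request is built through this helper instead of typed ad hoc, so the
# search-first expectation is reviewable rather than remembered by individuals.

RULES_FILE = Path("agent_rules.md")  # hypothetical tracked rules file

DEFAULT_RULE = (
    "Before writing a new utility, search the codebase for existing "
    "implementations and propose reuse where possible."
)

def load_rules() -> str:
    if RULES_FILE.exists():
        return RULES_FILE.read_text(encoding="utf-8").strip()
    return DEFAULT_RULE

def build_prompt(task: str) -> str:
    # Prepend the workspace rule to every agent call.
    return f"{load_rules()}\n\nTask: {task}"

if __name__ == "__main__":
    print(build_prompt("write a retry-with-backoff helper for the billing client"))
```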
Anti-pattern · Agent writes in isolation
Default behavior. Agent has no memory of prior requests, doesn't search the codebase by default, produces a fresh implementation each time. Compounds duplication at adoption-rate × time. The failure mode is invisible at the PR level.
Corrective · Search-first injected at prompt time
Workspace-level system prompt: "Before writing a new utility, search the codebase for existing implementations and propose reuse where possible." Cheap to install, dramatically reduces sprawl. Score 3 when the rule is enforced via pre-prompt hook.
Defense in depth · Near-duplicate detection at merge
Static analysis tool runs on every PR, flags near-duplicate functions across files. The PR template requires an explicit acknowledgment when a duplicate is intentional (most aren't). The combination of prompt-side rule plus CI gate is what holds.
Detection layer · Quarterly sprawl audit
Track distinct implementations of common utilities over time. A doubling baseline with no feature growth is the trigger to revisit the prompt template or the workspace rule. The audit is cheap; the prevention is cheaper.
The sprawl pattern is the strongest argument for treating agent prompts as code — versioned, reviewed, and explicit about codebase-search expectations. A workspace rule that lives in a tracked file and applies to every agent call is dramatically cheaper than the cleanup required when twelve near-duplicate retry implementations need consolidation in eighteen months.
04 — Agent Blame
In incident postmortems — don't.
Agent-blame is the postmortem pattern where the root-cause analysis stops at "the model hallucinated" or "the agent produced incorrect code." It looks like a complete answer because the immediate cause of the bad diff genuinely was the model. It is not a complete answer because it produces no preventive action. Models will continue to hallucinate. The policy question is what humans did to allow the hallucination to ship.
The diagnostic signal is the postmortem template itself. If your postmortems include a root-cause field and the AI-related entries consistently read "model produced X" without naming the human decision that allowed X to merge, the pattern is live. Read the last three AI-attributable incidents and ask whether the preventive actions would have prevented the incident if the model had behaved identically. If no, the analysis stopped too early.
The corrective pattern is a postmortem template that explicitly treats AI tools as part of the workflow rather than as autonomous agents. Required fields include: which human approved the diff, what review gate did or didn't catch the issue, what test gate did or didn't exist for the changed path, what policy or tooling change makes the catch automatic next time. The test for a useful postmortem is that the preventive actions would close the gap independent of which specific model version produced the diff.
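A minimal sketch of that template as an enforceable check rather than a document; the field names and the blame-phrase list are illustrative, not a complete taxonomy:

```python
# Flags postmortem write-ups whose root cause invokes model blame, so the
# author also names the human decision chain, and rejects drafts missing the
# required accountability fields. Field names are illustrative.

REQUIRED_FIELDS = [
    "approving_engineer",   # which human approved the diff
    "review_gate_outcome",  # what the review gate did or didn't catch
    "test_gate_outcome",    # what test gate existed for the changed path
    "preventive_change",    # policy or tooling change that makes the catch automatic
]

MODEL_BLAME = ("model error", "hallucinat", "agent produced incorrect")

def validate_postmortem(entry: dict) -> list[str]:
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not entry.get(name)]
    root_cause = entry.get("root_cause", "").lower()
    if any(phrase in root_cause for phrase in MODEL_BLAME):
        problems.append(
            "root cause leans on model blame; trace the decision chain back to "
            "the review, test, or scoping gate that let the diff ship"
        )
    return problems

if __name__ == "__main__":
    draft = {"root_cause": "The model hallucinated a nonexistent API."}
    for problem in validate_postmortem(draft):
        print(problem)
```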
One operational note. The agent-blame pattern is sometimes a symptom of a deeper problem — engineers who don't feel psychologically safe naming the human decision in front of leadership will default to blaming the tool. The fix is partly cultural (blameless postmortem hygiene applied consistently to AI incidents the same way it would be to human-authored incidents) and partly structural (the template doesn't accept "model error" as a terminal root cause). Both layers earn their keep.
For the policy framework that operationalizes these accountability rules across review gates, IP exposure, secrets, and approved tools, our 50-point vibe-coding policy audit walks the gates that prevent the patterns in this essay from ever becoming postmortems.
05 — Eval Skip
AI-authored changes without tests.
Eval-skip is the pattern where AI-authored changes ship without tests covering the changed paths, on the implicit assumption that the agent produced "obviously correct" code. The assumption is wrong roughly the same fraction of the time that human-authored code is wrong — except the AI is faster, the volume is higher, and the "obviously correct" label is harder to push back on at review time because the diff looks polished.
The diagnostic signal is the asymmetry between coverage on AI-authored versus human-authored diffs. Sample fifty PRs from the last quarter, label each by AI-authorship, and compare the rate at which test files are touched. If AI-authored diffs touch test files at a meaningfully lower rate than human-authored diffs, eval-skip is live. Some teams find the asymmetry is the other direction — AI writes more tests than humans do — which is its own pattern (see Coverage Illusion below).
The corrective pattern is asymmetric eval policy. AI-authored changes require tests for the changed paths even when the equivalent human-authored change would not. The asymmetry is intentional and grounded in the cost structure: AI is faster at writing tests than humans are at noticing missing ones, so the cheapest enforcement is to require tests on AI diffs and let CI block merges that don't include them. Score the gate at the PR template, not at the postmortem.
Required · AI diff ships with tests for changed paths
PR template requires explicit confirmation that tests exist for the changed paths. CI gate blocks merge when AI-attributed diffs touch source files without corresponding test changes. Reviewer rubric includes a test-existence check. Cheap to install, dramatically reduces eval-skip incidents.
Anti-pattern · AI diff ships without tests
Default state in most teams adopting AI-assisted coding without policy. The diff looks clean, CI passes (because there's nothing new to fail), the reviewer skims. Untested AI-authored branches accumulate at adoption-rate × time and become the first thing to break under refactor.
One nuance worth surfacing. The policy should require tests for the "changed paths" rather than tests for the diff as a whole. AI-generated tests that exercise the exact lines the agent wrote without exercising the contract those lines fulfill are not real coverage — they pass forever because they pin the implementation. Reviewers checking the eval-skip gate need a second sentence in the rubric: "do the tests exercise the contract or pin the implementation?" The first answer is the useful one.
The pattern is hard to enforce without tooling because reviewers face their own time pressure. A PR template field that asks "tests added for changed paths — Y/N" with required justification on N is the cheapest first layer. CI checks that flag AI-attributed diffs without test changes form the second layer. The combination holds; either layer alone slips.
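A minimal sketch of the CI layer, assuming attribution via an "AI-tools:" commit trailer and tests that live under tests/ or end in _test.py; both conventions are assumptions to swap for your own:

```python
import subprocess
import sys

# Fail the check when an AI-attributed change touches source files without
# touching any test file. The "AI-tools:" trailer, the tests/ layout, and
# the .py filter are assumptions; tune them to the repo's actual conventions.

def changed_files(base: str = "origin/main") -> list[str]:
    diff = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [f for f in diff.stdout.splitlines() if f.strip()]

def is_ai_attributed(base: str = "origin/main") -> bool:
    log = subprocess.run(
        ["git", "log", f"{base}..HEAD", "--format=%B"],
        capture_output=True, text=True, check=True,
    )
    return "AI-tools:" in log.stdout

def main() -> int:
    files = changed_files()
    # Crude split between test and source files; refine per repo layout.
    tests = [f for f in files if f.startswith("tests/") or f.endswith("_test.py")]
    source = [f for f in files if f.endswith(".py") and f not in tests]
    if is_ai_attributed() and source and not tests:
        print("AI-attributed diff touches source files without test changes.")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```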
06 — Five More
Prompt graveyard, model drift, secrets-in-prompts, refactor storms, missing AI-attribution.
The remaining five anti-patterns share a structural property — each one accumulates quietly across weeks and surfaces as an incident only when something else triggers it. Treat them as S2 (quarterly debt) rather than S1 (incident-driver) and gate them in the next planning cycle. None require novel tooling; all require explicit policy.
Prompt graveyard · S2 · Quarterly debt
Untracked agent prompts · personal IDE configs · drift
Engineers accumulate dozens of personal agent prompts in IDE settings, none version-controlled, none shared. When the "magic prompt" produces production code, the team has no audit trail and no reproducibility. Corrective: tracked prompt library in-repo, with named prompts referenced by version.
Model drift in shared prompts · S2 · Quarterly debt
Same prompt · different model · different output
A prompt that worked in Claude Sonnet 4.5 produces meaningfully different output in Claude Opus 4.7 or GPT-5.5. Shared prompts pinned to no specific model version accumulate silent drift. Corrective: model version pinned per shared prompt, change-log on version bumps, eval comparison against the prior version.
Secrets in prompts · S1 · Incident-driver
Production credentials · .env contents · screenshots
Engineers paste production credentials, .env contents, or screenshots containing tokens into agent prompts. Every instance is a third-party data transfer to the AI vendor. Corrective: IDE-side prompt scrubbing, workspace exclusions on credentials directories, outbound proxy logging, written incident playbook for suspected leaks (a minimal scrubbing sketch follows this list).
Refactor storms · S2 · Quarterly debt
One bug fix · ten unrelated file changes · sprawl PR
Agents are willing to refactor ten files when asked to fix one bug. The PR balloons, the diff becomes unreviewable, reviewers approve on faith. Corrective: scope-discipline rule in workspace prompt ("stay scoped to the originating task, propose unrelated refactors separately"), file-count thresholds at CI, sprawl-gate in review template.
Missing AI-attribution in commits · S2 · Quarterly debt
Commit messages don't mention which tool produced the diff
Future archaeology — incident postmortems, license audits, training-data questions — all rely on commit metadata. When commits don't record AI authorship, the team loses the audit trail entirely. Corrective: required AI-attribution field at PR template, commit-message footer (e.g. "AI-tools: claude-code/4.7"), enforced via pre-commit hook.
Of the five, secrets-in-prompts is the only S1 — every other pattern accumulates over quarters rather than surfacing as a named incident. The reason secrets-in-prompts ranks differently is that one instance is enough to trigger a compliance event, rotate production credentials, and write a postmortem. The other four don't produce that kind of immediate harm, but they consistently produce the long-tail debt that makes AI-assisted workflows less productive over time than they appeared at adoption.
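A minimal sketch of the IDE-side scrubbing layer named in the secrets-in-prompts corrective; the patterns are illustrative and will miss plenty, which is why workspace exclusions and proxy logging stay in the stack:

```python
import re

# Scrub obvious credential shapes from a prompt before it leaves the machine.
# The patterns (AWS-style keys, bearer tokens, .env-style KEY=value lines)
# are illustrative, not exhaustive; the point is that the check runs
# automatically on every prompt, not that it catches everything.

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key id
    re.compile(r"(?i)bearer\s+[a-z0-9\-_\.]{20,}"),        # bearer tokens
    re.compile(r"(?im)^[A-Z][A-Z0-9_]*\s*=\s*\S{8,}$"),    # .env-style KEY=value
]

def scrub(prompt: str) -> tuple[str, int]:
    hits = 0
    for pattern in SECRET_PATTERNS:
        prompt, n = pattern.subn("[REDACTED]", prompt)
        hits += n
    return prompt, hits

if __name__ == "__main__":
    clean, hits = scrub("DATABASE_URL=postgres://user:hunter2@prod-db:5432/app")
    print(hits, clean)
```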
The prompt-graveyard pattern deserves a brief expansion because it's the one most teams underweight. When the team's best agent prompts live in individual IDE configs and never get shared, three things happen. The team can't reproduce its own production output. New hires bootstrap from worse prompts than tenured engineers. And when a tenured engineer leaves, the prompts leave with them. A tracked prompt library — even a flat file in the repo — fixes all three failures at near-zero cost.
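A minimal sketch of that library, with prompts referenced by name and version from a tracked module; the names, versions, and prompt text are illustrative:

```python
# A flat, versioned prompt registry that lives in the repo, so the "magic
# prompt" is reviewable, reproducible, and survives the engineer who wrote it.

PROMPTS = {
    "retry-helper@2": (
        "Search the codebase for existing retry/backoff helpers before "
        "writing a new one. If none fit, write a single shared helper with "
        "jittered exponential backoff and a configurable attempt cap."
    ),
    "migration-review@1": (
        "Review this database migration for lock duration, backfill cost, "
        "and rollback safety. List risks before proposing changes."
    ),
}

def get_prompt(name: str) -> str:
    # Callers reference prompts by name@version, so output is reproducible
    # and version bumps show up in review like any other code change.
    return PROMPTS[name]
```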
Model-version drift is the under-discussed counterpart. The same prompt produces meaningfully different output across model versions. Teams running prompts that were tuned against an older model and not re-evaluated when the version bumped are silently shipping different behavior than the prompt's comment block suggests. Pin the model version per shared prompt, run a quick eval against the new version on every bump, and the drift becomes an explicit decision rather than a surprise.
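A minimal sketch of the pin-and-eval step; the model identifiers, eval cases, and run_model client are placeholders for whatever the team actually runs:

```python
from dataclasses import dataclass, field
from typing import Callable

# Each shared prompt records the model it was tuned against; bumping the pin
# requires re-running a small eval set and committing the comparison, so
# drift becomes an explicit decision rather than a surprise.

@dataclass
class SharedPrompt:
    name: str
    text: str
    pinned_model: str  # e.g. "claude-sonnet-4-5" (placeholder identifier)
    eval_cases: list[tuple[str, Callable[[str], bool]]] = field(default_factory=list)

def check_bump(prompt: SharedPrompt, new_model: str,
               run_model: Callable[[str, str], str]) -> list[str]:
    """Return the eval cases that fail under the proposed model version."""
    failures = []
    for case_input, passes in prompt.eval_cases:
        output = run_model(new_model, f"{prompt.text}\n\n{case_input}")
        if not passes(output):
            failures.append(case_input)
    return failures
```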
07 — Coverage Illusion
Generated tests that look like coverage, aren't.
The coverage illusion is the pattern where AI-generated tests inflate coverage metrics without actually validating the contract the code is supposed to fulfill. The shape: an engineer asks an agent to add tests for a module, the agent generates a thorough-looking test file with high line coverage, the engineer ships it, the coverage dashboard goes up, and the team marks the work done. Six months later, a refactor breaks the tests in ways that surface no bugs — because the tests were pinning the implementation, not the contract.
The diagnostic signal is brittleness under refactor. If routine refactors break a meaningful fraction of AI-generated tests without surfacing real regressions, the tests are pinning implementation details. Sample five AI-generated test files and run a structural refactor on the code they cover — rename a private method, reorder argument lists, inline a helper. The tests that break without the public contract changing are the ones measuring nothing useful.
The corrective pattern has two layers. First, the review rubric for AI-generated test files includes a single question — "do these tests exercise the public contract or pin the implementation?" — that reviewers cannot answer by skimming. Second, the prompt template for generating tests asks the agent to enumerate the contract first, then write tests against it, rather than generating tests directly from the implementation. Both layers earn their keep.
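A minimal illustration of the rubric question, using a hypothetical slugify() helper; the first test duplicates the implementation and measures nothing, the second asserts the contract callers rely on:

```python
import re

# Hypothetical helper under test.
def slugify(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# Pins the implementation: re-derives the output from the same internals,
# so it passes forever and breaks only under refactors that change nothing
# a caller can observe.
def test_slugify_pins_implementation():
    assert slugify("Hello World") == re.sub(r"[^a-z0-9]+", "-", "Hello World".lower()).strip("-")

# Exercises the contract: stable expectations about observable behavior
# that survive any refactor preserving that behavior.
def test_slugify_contract():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  Already-slugged  ") == "already-slugged"
    assert slugify("") == ""
```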
Test-coverage quality tiers · count tests at Tier 2+ only
Source: Field-audit test-quality tiers, Digital Applied 2025-2026
The coverage illusion is the pattern with the longest tail. Lines-covered is easy to measure, easy to optimize, easy to celebrate. Contract-asserted is none of those things — it requires a human reading the test file with the public API in front of them. The policy that lands is the one that explicitly retires lines-covered as a quality metric and replaces it with a per-PR review checkpoint: does this test file exercise the contract, or pin the implementation? Answer once per PR, in writing, and the pattern stops compounding.
For teams ready to install the gates that prevent the patterns in this essay, our AI transformation engagements audit the AI-coding practice against the ten anti-patterns and ship the policy plus tooling that closes them. The companion framework — the Claude Code team adoption audit — scores the adoption side of the same coin, so the gates land without slowing the productivity story.
Productivity gains compound — but so does the debt when anti-patterns ship unchecked.
The ten anti-patterns are not exotic, not subtle, not new. Each one is the kind of pattern an experienced engineering leader would recognize on description. What's changed with AI-assisted coding is the rate at which they compound — and the difficulty of spotting them at the PR level because the diffs look polished and the CI passes. The gates that prevent them are correspondingly unexotic: explicit attribution, asymmetric eval requirements, scope discipline, secrets scrubbing, and a postmortem template that doesn't terminate at "the model got it wrong."
The honest framing is that the productivity gains and the debt both compound. Teams that install the gates before the curves cross keep the productivity. Teams that don't install the gates reach a point — usually somewhere between twelve and eighteen months in — where the debt service on AI-authored code starts eating into the velocity that justified the adoption. We've seen the second pattern more often than the first.
The practical next step: run the ten patterns against the team you actually have, rank them by severity for your codebase, and install the S1 gates this sprint. The S2 patterns earn the quarter. The S3 patterns earn the tracker. None of the work is conceptually hard — the leverage is entirely in doing it before the first AI-attributable postmortem makes the case for you.