Your CLAUDE.md best practices probably need a rewrite. A June 2026 study of 100 popular open-source AGENTS.md and CLAUDE.md files found that 91 of them carried at least one configuration smell — a recurring, problematic pattern in the very files meant to steer a coding agent. The files written to make agents smarter are, more often than not, making them skim, waste tokens, and ignore the rules that actually matter.
The term comes from software engineering: a code smell is not a bug, but a surface symptom of a deeper structural problem. Researchers at the Federal University of Minas Gerais (UFMG) borrowed the idea and applied it to agent configuration, producing the first formal taxonomy of six distinct smells — each with a measured prevalence and a clear structural cause. The pattern is everywhere because almost nobody treats these files as code: they get appended to, never pruned, and never reviewed.
This guide covers what the study found, names and explains all six smells with their prevalence rates, maps each one to a precise fix, and reconciles three concurrent research streams that appear to contradict each other on whether context files help at all. The through-line: a lean, non-obvious-only config file is a genuine asset; a bloated or auto-generated one is a liability.
- 01 · 91 of 100 popular config files have a smell. The UFMG study analyzed 532,000 files across 100 popular open-source repositories that each carry an AGENTS.md or CLAUDE.md. Ninety-one contained at least one of six named configuration smells.
- 02 · Lint Leakage and Context Bloat dominate. Lint Leakage (62% of files) repeats rules a linter already enforces; Context Bloat (42%) over-specifies behavior. The paper reports that when certain smells co-occur, the likelihood of Context Bloat rises substantially.
- 03 · Anthropic warns bloat makes agents ignore you. Anthropic's own best-practices documentation states that bloated CLAUDE.md files cause Claude to ignore your actual instructions, and suggests keeping the file lean; its guidance points toward roughly 200 lines.
- 04 · More context is not always better. A separate ETH Zurich study reports that LLM-generated context files reduced task success versus no file at all, and that any context file added meaningful inference cost: a counterintuitive case against auto-generated config.
- 05 · Instructions are probabilistic; hooks are deterministic. An instruction in CLAUDE.md is followed most of the time, not always. For something that absolutely must not happen, Anthropic says an instruction is the wrong tool and a hook that runs outside the model's reasoning is the right one.
01 — The Study
The first formal catalog of config smells.
The paper at the center of this is arXiv:2606.15828, submitted June 14, 2026 by a team at the Federal University of Minas Gerais (UFMG) in Brazil. It is described as the first academic catalog of configuration smells for coding-agent config files. The authors combined a grey-literature review with automated heuristic detection, analyzing 532,000 files across 100 popular open-source repositories that each contained an AGENTS.md or CLAUDE.md file.
The headline number — 91 of 100 files (91%) carrying at least one smell — was picked up by mainstream tech press, including coverage in InfoWorld and The Register. The interesting part is not that bad config exists — every engineer knows that — but that the badness is systematic enough to name, measure, and remediate. That is exactly what makes config smells worth treating as a discipline rather than a vibe.
One caveat worth keeping straight: this UFMG group also published an earlier descriptive study, arXiv:2511.09268 (Decoding the Configuration of AI Coding Agents), which analyzed 328 CLAUDE.md files and found software architecture specified in 72.6% of them — establishing architecture as the first thing teams document. That paper describes what teams put in their files. The June 2026 paper identifies the smells — the problematic patterns. Same authors, different question.
02 — The Taxonomy
Six smells, named and measured.
The taxonomy is the contribution. Each smell is a distinct failure mode with its own prevalence across the studied files. Recognizing them by name is the first step to auditing your own config — most teams have two or three without realizing it. Config smells sit at the top of the broader stack of common deployment anti-patterns for AI agents: a misconfigured file quietly degrades everything downstream.
[Figure: Configuration smell prevalence, share of files affected. Source: arXiv:2606.15828 (UFMG), June 2026; prevalence across 100 popular open-source config files.]
Lint Leakage (62%) is the most common by a wide margin: agent instructions repeating rules that linters, formatters, and static-analysis tools already enforce reliably. Every line spent telling an agent to “use 2-space indentation” is a token wasted on guidance a formatter handles deterministically.
Context Bloat (42%) is over-specification: the file tries to anticipate every behavior. Per the researchers, bloated files increase token consumption, raise costs, and reduce the visibility of the instructions that actually matter — the important rules get buried under the noise.
Skill Leakage (35%) loads rarely-used, task-specific workflows into every session instead of isolating them in separate skill files. Conflicting Instructions (28%) are contradictory directives within the same file that confuse the agent and raise error rates. Init Fossilization (24%) is stale rules accumulated at project setup that no longer reflect the current state of the project. And Blind References (16%) cite external documents without explaining their relevance, timing, or how the agent should use them.
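To make two of these smells concrete, here is a minimal sketch of keyword heuristics for Lint Leakage and Blind References. The patterns, function names, and sample text are all hypothetical and chosen for illustration; they are not the detection heuristics used in the UFMG paper, and a real audit would need a far richer rule set.

```python
import re

# Hypothetical keyword patterns; NOT the heuristics used in the UFMG paper.
LINT_PATTERNS = [
    r"\bindent(ation)?\b",
    r"\b(single|double) quotes?\b",
    r"\bsemicolons?\b",
    r"\bimport order\b",
    r"\btrailing (commas?|whitespace)\b",
]

# A reference to a document-like path (ends in a common doc extension).
REFERENCE = re.compile(r"[\w./-]+\.(md|txt|rst|pdf)\b", re.IGNORECASE)
# Words suggesting the line explains why or when to read the reference.
CONTEXT_HINT = re.compile(r"\b(when|before|after|because|if)\b", re.IGNORECASE)


def find_lint_leakage(text: str) -> list[str]:
    """Return lines that look like formatting rules a linter could own."""
    hits = []
    for line in text.splitlines():
        if any(re.search(p, line, re.IGNORECASE) for p in LINT_PATTERNS):
            hits.append(line.strip())
    return hits


def find_blind_references(text: str) -> list[str]:
    """Return lines that cite a document without any why/when wording."""
    hits = []
    for line in text.splitlines():
        if REFERENCE.search(line) and not CONTEXT_HINT.search(line):
            hits.append(line.strip())
    return hits


sample = """\
- Use 2-space indentation.
- See docs/architecture.md
- Read docs/release.md before cutting a release.
- Payments code must go through the ledger service.
"""

print(find_lint_leakage(sample))
print(find_blind_references(sample))
```

On the sample, the indentation rule is flagged as likely lint leakage and the bare `docs/architecture.md` pointer as a blind reference, while the annotated release reference passes. Keyword matching of this kind will produce false positives, so treat the output as a list of lines to review by hand.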
"Bloated configuration files increase token consumption, raise costs, and reduce the visibility of important instructions."— UFMG researchers (Helio Victor F. dos Santos et al.), arXiv:2606.15828
03 — Severity Matrix
Every smell mapped to its fix and effort.
Most coverage of the paper stops at listing the smells, without mapping each one to a concrete remediation and an effort estimate. The matrix below does: prevalence comes from the UFMG study, fix mechanisms are drawn from Anthropic’s official documentation, and the effort rating reflects how much structural change the fix demands. The smells are grouped by kind and, within each group, ordered by prevalence, highest first.
| Smell | Prevalence | Primary harm | Fix mechanism | Effort |
|---|---|---|---|---|
| Token-waste smells: cheap to remove, biggest immediate win | | | | |
| Lint Leakage | 62% | Token waste | Delete the rule; let the linter / formatter enforce it (e.g. move it to .eslintrc or a pre-commit hook). | Low |
| Init Fossilization | 24% | Stale guidance | Schedule a periodic review; delete rules that no longer match the project’s current state. | Low |
| Blind References | 16% | Wasted lookups | Add explicit context: why the file matters, when to read it, and how the agent should use it. | Low |
| Structural smells: require moving content to another layer | | | | |
| Context Bloat | 42% | Cost + buried rules | Move selective content to path-scoped Rules in .claude/rules/ that load only for relevant work. | Medium |
| Skill Leakage | 35% | Cost inflation | Move task-specific workflows into Skills (.claude/skills/<name>/SKILL.md) that load on demand. | Medium |
| Conflicting Instructions | 28% | Agent confusion | Review for contradictions; resolve to a single source of truth per rule. Hardest to catch without a careful read. | High |
Read the matrix top to bottom and a triage order falls out. The three low-effort smells (Lint Leakage, Init Fossilization, Blind References) include the single most prevalent one and cost almost nothing to fix: you are deleting or annotating lines, not restructuring. Start there. The remaining smells take more work. Context Bloat and Skill Leakage need content moved to another configuration layer, and Conflicting Instructions need a careful end-to-end read, but the effort pays off in both cost and reliability.
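The triage order can be sketched as a simple sort: cheapest effort tier first and, within a tier, highest prevalence first. The prevalence and effort values are copied from the matrix; the `Smell` class, the `EFFORT_RANK` mapping, and the `triage_order` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass

# Assumed numeric ranking of the matrix's effort labels.
EFFORT_RANK = {"Low": 0, "Medium": 1, "High": 2}


@dataclass(frozen=True)
class Smell:
    name: str
    prevalence: int  # percent of files affected, from the matrix
    effort: str      # "Low", "Medium", or "High", from the matrix


SMELLS = [
    Smell("Lint Leakage", 62, "Low"),
    Smell("Context Bloat", 42, "Medium"),
    Smell("Skill Leakage", 35, "Medium"),
    Smell("Conflicting Instructions", 28, "High"),
    Smell("Init Fossilization", 24, "Low"),
    Smell("Blind References", 16, "Low"),
]


def triage_order(smells):
    """Cheapest fixes first; within an effort tier, most prevalent first."""
    return sorted(smells, key=lambda s: (EFFORT_RANK[s.effort], -s.prevalence))


for s in triage_order(SMELLS):
    print(f"{s.effort:<6} {s.prevalence:>3}%  {s.name}")
```

Sorting on the pair (effort rank, negative prevalence) reproduces the row order of the matrix, which is one reasonable way to sequence an audit; a team that weights reliability over cost might reasonably sort differently.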
04 — The Paradox
When more context makes things worse.
Three concurrent research streams seem to contradict each other on whether context files help at all. They resolve cleanly once you read them together — and the synthesis is the most useful thing in this guide.
First, the smell catalog says most files are badly written. Second, an efficiency study (arXiv:2601.20404) found that a well-written AGENTS.md reduced an agent’s median wall-clock runtime by 28.64% (98.57s to 70.34s) and median output tokens by 16.58% (2,925 to 2,440) across 124 pull requests, while holding task completion comparable — though that study used only OpenAI Codex, so the result may not generalize. Third, an ETH Zurich study (arXiv:2602.11988) found that LLM-generated context files reduced task success rate by about 3% versus no file at all, while developer-written files offered only about a 4% average improvement — and both file types raised inference cost by 20 to 23%.
- Faster median runtime: A well-written AGENTS.md cut median wall-clock runtime from 98.57s to 70.34s and output tokens by 16.58% across 124 PRs, but on OpenAI Codex only.
- LLM-generated files hurt: Auto-generated context files reduced task success vs no file at all; developer-written files gained only ~4% on average. Both added cost.
- Cost of any context file: ETH Zurich reports any context file raised inference cost 20–23% (roughly 2.45–3.92 extra steps per task), so the file is never free.
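As a quick arithmetic check, the efficiency study's percentages follow directly from the before and after medians quoted above. The helper name `percent_reduction` is ours; the four input figures are the ones cited from arXiv:2601.20404.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Median wall-clock runtime in seconds and median output tokens, as quoted above.
runtime = percent_reduction(98.57, 70.34)
tokens = percent_reduction(2925, 2440)

print(f"runtime reduction: {runtime:.2f}%")       # 28.64%
print(f"output-token reduction: {tokens:.2f}%")   # 16.58%
```

Both results match the reported 28.64% and 16.58%, so the quoted medians and percentages are internally consistent.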
The contradiction dissolves once you separate quality from quantity. A lean file that surfaces only non-obvious, project-specific knowledge is net positive — it earns back its cost in faster, more correct runs. A bloated or auto-generated file is net negative: it duplicates documentation the agent would read anyway, adds inference cost, and buries the instructions that matter. The ETH Zurich finding that LLM-generated files improved performance only marginally when other documentation was removed confirms the redundancy — the agent already had the information. The cost of a smelly file is also visible in production: it shows up when you start measuring agent performance with evals and traces, where bloated context inflates both token spend and step count.
Our reading of the trend: the industry spent 2025 and early 2026 adding context files because tools made it trivial to generate one, and the default assumption was that more guidance helps. The 2026 research inverts that assumption. The skill that matters going forward is not writing a comprehensive config — it is writing a deliberately small one. The single highest-value content, per the ETH Zurich work, is non-obvious tooling: when a context file explicitly named a specific package manager, agents reportedly used it dramatically more often. We treat the exact multiplier reported in the summary as illustrative, but the direction is unambiguous — tell the agent the things it cannot infer, and nothing else.
"Human-written context files should describe only minimal requirements."— Thibaud Gloaguen, Niels Mündler et al. (ETH Zurich), arXiv:2602.11988
05 — The Fixes
The right layer for each kind of rule.
The structural smells share a root cause: everything is dumped into one root config file regardless of when it is actually needed. Anthropic’s documentation describes a layered model where different kinds of guidance live in different places, each with its own loading behavior and context cost. Matching the rule to the right layer is the fix.
- Root CLAUDE.md: Reserve for what is always relevant: project overview, architecture, non-obvious tooling, and the gotchas an agent could not infer. Keep it lean; Anthropic's guidance points toward roughly 200 lines.
- Rules in .claude/rules/: Rules with a paths field load only during relevant work, so a rule for src/api/** never loads during unrelated tasks. This is the direct fix for Context Bloat: selective loading instead of always-on.
- Skills in .claude/skills/: Task-specific workflows live as SKILL.md files. Only the name and description load at session start; the full body loads when the agent invokes the skill. This is the architectural fix for Skill Leakage.
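The effect of path scoping can be illustrated with a small simulation: given the file being worked on, only rules whose globs match are loaded. This is a sketch under stated assumptions. The rule names and globs are hypothetical, and Python's `fnmatchcase` stands in for whatever glob matching Claude Code actually performs, whose exact semantics may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical rule names and path globs, for illustration only.
RULES = {
    "api-conventions": ["src/api/**"],
    "migration-safety": ["db/migrations/**"],
    "react-patterns": ["src/ui/**", "src/components/**"],
}


def rules_for(path: str) -> list[str]:
    """Return the names of the rules whose globs match the given file path."""
    return [
        name
        for name, globs in RULES.items()
        if any(fnmatchcase(path, g) for g in globs)
    ]


print(rules_for("src/api/users/handler.ts"))  # only the API rule matches
print(rules_for("README.md"))                 # no scoped rule matches
```

The point of the design is visible in the second call: work outside every scoped path pays no context cost for any of these rules, whereas the same text in a root file would load every time.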
The same logic extends to the rest of the configuration surface. A subagent runs in its own context window, so its instructions cost nothing in the main session. A hook runs outside the model entirely — zero context cost and, crucially, deterministic enforcement. The mistake almost every smelly file makes is putting medium- and low-frequency content in the always-loaded layer, where it pays full context cost on every single turn whether it is needed or not.
For the low-effort smells, the fix is even simpler. Lint Leakage: delete the rule and let the linter own it. Init Fossilization: schedule a quarterly review and cut anything that no longer matches the project. Blind References: for every external file you cite, add a sentence on why it matters and when to read it. If your team treats agent config like the production code it effectively is, these stop recurring. That discipline is part of what we build into our AI transformation engagements, where agent configuration is reviewed like any other shipped artifact.
06 — Enforcement
The gap between “should” and “must”.
Here is the sleeper finding. A rule in CLAUDE.md is an instruction, and an instruction is probabilistic — the model follows it most of the time, not every time. Practitioner write-ups commonly put CLAUDE.md adherence at roughly 70%; Anthropic itself uses qualitative language rather than a precise figure, noting that Claude may fail to follow a prompted rule. Either way, the conclusion holds: an instruction is a strong nudge, not a guarantee.
A hook is different. Hooks run outside the model’s reasoning chain — wired to events such as PreToolUse and PostToolUse — and they cannot be overridden by the model. That makes them deterministic. The distinction maps directly onto risk: nudges for preferences, hard gates for things that must never happen.
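A hard gate of this kind might look like the following sketch of a PreToolUse hook script. It rests on assumptions that should be checked against the current Claude Code hooks reference: that the hook receives a JSON payload on stdin with `tool_name` and `tool_input.command` fields, and that exiting with code 2 blocks the tool call and passes stderr back to the model. The deny-list patterns and the function names `decide` and `run_hook` are hypothetical.

```python
import json
import re
import sys

# Hypothetical deny-list; replace with whatever must never happen in your repo.
# Note the first pattern also catches --force-with-lease.
BLOCKED = [
    (re.compile(r"\bgit\s+push\b.*--force"), "force-push is not allowed"),
    (re.compile(r"\brm\s+-rf\s+/(\s|$)"), "refusing to delete the filesystem root"),
]


def decide(payload: dict) -> tuple[int, str]:
    """Map a hook payload to (exit_code, message); 2 is assumed to mean block."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for pattern, reason in BLOCKED:
        if pattern.search(command):
            return 2, f"Blocked by hook: {reason}"
    return 0, ""


def run_hook(stdin_text: str) -> int:
    """Parse the JSON payload, write any block reason to stderr, return the code."""
    code, message = decide(json.loads(stdin_text))
    if message:
        print(message, file=sys.stderr)
    return code

# A real hook script would end with: sys.exit(run_hook(sys.stdin.read()))
```

Keeping the decision in a pure function makes the gate testable without an agent in the loop, which matters because the check runs as ordinary code: the same command produces the same verdict every time, regardless of how the model reasons about it.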
This reframes the entire purpose of the config file. CLAUDE.md is for the things you want the agent to know and generally do — conventions, architecture, preferences. It was never the right place for the things that must not happen under any circumstances. Teams that have been stuffing hard prohibitions into instructions and wondering why they occasionally get ignored were using the wrong layer. The fix is not a better-worded instruction; it is a hook.
07 — Format
AGENTS.md vs CLAUDE.md — which file?
Both file types share the smells, but they differ in reach. AGENTS.md is the broad-compatibility format — as of mid-2026 it is read natively by a wide set of tools including Claude Code, OpenAI Codex CLI, Cursor, Aider, and others, which makes it the pragmatic default for teams using more than one agent. CLAUDE.md is Claude Code-specific. Tooling support shifts quickly, so verify against each tool’s current changelog before standardizing on one format.
Scale data underlines how dominant context files have become. An exploratory study (arXiv:2602.14690) of 2,926 GitHub repositories found 4,860 context files (AGENTS.md / CLAUDE.md) — far more than the 601 Skills across 158 repositories or 452 Subagents across 131. The context file is often the only configuration mechanism a repository uses, which is precisely why getting it right matters so much: for most teams it carries the entire burden of steering the agent.
- You use more than one coding agent: AGENTS.md is read natively by a broad set of tools as of mid-2026. Standardize on it so one file steers every agent. Verify support in each tool's changelog before committing.
- Your stack is Claude Code end to end: CLAUDE.md unlocks the full layered model (path-scoped Rules, Skills, subagents, and hooks) that the smell fixes depend on. Use it and lean on the layers.
- Either way, keep the root file lean: Whichever you choose, the smell research applies equally. Reserve the root file for always-true, non-obvious context and push the rest to scoped layers.
- Avoid "auto-generate and forget": The ETH Zurich study reports LLM-generated files can reduce success vs no file at all by duplicating docs the agent already reads. Generation is a starting draft, not a finished config.
08 — The Checklist
Auditing your own config in an hour.
You do not need the research apparatus to clean up a config file. The following pass, run against any CLAUDE.md or AGENTS.md you own, will catch the bulk of the smells. Treat it as one habit inside a broader practical agentic engineering workflow. Anthropic frames the core test as a single question for each line: would removing this cause the agent to make mistakes? If not, cut it.
- Delete lint leakage. Any rule a linter, formatter, or type checker already enforces — indentation, quote style, import order — comes out. The tools handle it deterministically.
- Hunt for contradictions. Read the whole file in one sitting and look for rules that conflict. Resolve each to a single source of truth. This is the hardest smell to spot piecemeal.
- Move rare workflows to Skills. Anything that applies to one occasional task — a release process, a migration recipe — belongs in a Skill that loads on demand, not in the always-loaded file.
- Scope path-specific rules. Conventions that only matter for one part of the codebase move to path-scoped Rules so they load only during relevant work.
- Annotate every reference. For each external file you point the agent at, add why it matters and when to read it. No bare links.
- Promote the non-negotiables to hooks. Any rule phrased as “never” or “must not” is a candidate for a hook. Instructions nudge; hooks enforce.
- Schedule a recurring review. Put a calendar hold to re-read the file quarterly and cut stale rules before they fossilize.
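Two of the checklist items lend themselves to a mechanical first pass: checking the file against a line budget and listing "never" or "must not" rules as hook candidates. The sketch below assumes a budget of 200 non-blank lines, taken from the rough figure cited earlier; the `audit` function, its report keys, and the sample text are hypothetical.

```python
import re

LINE_BUDGET = 200  # rough figure cited above; treat it as a guideline, not a limit
HARD_RULE = re.compile(r"\b(never|must not)\b", re.IGNORECASE)


def audit(text: str) -> dict:
    """Report non-blank length against the budget and list hook candidates."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return {
        "non_blank_lines": len(lines),
        "over_budget": len(lines) > LINE_BUDGET,
        "hook_candidates": [ln.strip() for ln in lines if HARD_RULE.search(ln)],
    }


sample = """\
# Project notes
Use pnpm, not npm.
Never commit directly to main.
Generated files under build/ must not be edited by hand.
"""

report = audit(sample)
print(report["non_blank_lines"], report["over_budget"])
for rule in report["hook_candidates"]:
    print("hook candidate:", rule)
```

A script like this only surfaces candidates. Deciding whether a flagged line truly needs deterministic enforcement, and finding contradictions between rules, still takes the full read-through the checklist describes.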
09 — Conclusion
Treat agent config like the code it is.
A lean config is an asset; a bloated one is a liability.
The research converges on one message: the value of an agent config file is in what you leave out. Ninety-one of 100 popular files carry a smell because nobody prunes them — they get appended to and never reviewed. The fix is not better prose; it is discipline and the right layer for each kind of rule.
The deeper shift is in how we think about context. The intuition that more guidance helps is wrong often enough to be dangerous — auto-generated files can actively hurt, and every file carries an inference tax. The skill that matters now is writing a deliberately small config that surfaces only what the agent cannot infer, then pushing scoped conventions to Rules, rare workflows to Skills, and hard prohibitions to hooks.
Run the one-line test against your own files this week. Delete the lint leakage, resolve the contradictions, and move the “never” rules to hooks. The payoff is concrete: lower token spend, fewer ignored instructions, and an agent that follows the rules that actually matter because they are no longer buried under the ones that never did.