AGENTS.md: ETH Zurich Study on AI Agent Costs
ETH Zurich study finds AGENTS.md and similar convention files increase AI agent inference costs by up to 159% with minimal accuracy gains. Findings and alternatives.
Convention files like AGENTS.md, CLAUDE.md, and .cursorrules have become standard practice in AI-assisted development. The idea is simple: write down your coding standards, architecture preferences, and workflow rules in a file that gets loaded into the agent context automatically. Every tool interaction then follows your team conventions. The problem is that this approach has never been rigorously tested for its cost and performance implications.
A 2026 study from ETH Zurich changed that. Researchers systematically measured what happens when convention files are injected into coding agent sessions across multiple model families, task types, and file sizes. The results challenge a core assumption of the AI-assisted development workflow: more instructions do not produce better code. They produce more expensive code, and in many cases, worse code. This guide breaks down the findings, explains the mechanisms behind the cost increases, and provides a concrete alternative strategy that preserves the value of documented conventions without the overhead.
What the ETH Zurich Study Found
The ETH Zurich research team designed a controlled experiment comparing coding agent performance with and without convention file injection. They tested across three dimensions: instruction file size (1,000 to 8,000 tokens), model family (including Claude Opus 4.6 and GPT-5.2), and task complexity (from single-function generation to multi-file refactoring). The methodology isolated the convention file as the only variable, holding all other parameters constant.
- Up to 159% increase in inference costs per session
- Tokens billed on every API call, not just once
- Cost scales linearly with file size and call count
- Larger models amplify the cost penalty further
- Minimal improvement on task completion rates
- Performance degradation on complex multi-step tasks
- Models sometimes followed irrelevant instructions
- Smaller, targeted instruction sets outperformed large files
The central finding was a disconnect between cost and value. Convention files added significant token overhead to every interaction but did not produce proportionally better outcomes. In several test configurations, the convention file actively degraded performance by introducing conflicting or irrelevant instructions that distracted the model from the task at hand.
Why More Context Hurts Performance
Large language models process all tokens in the context window during inference. When a convention file adds thousands of tokens before the actual task description, the model must attend to both the instructions and the task simultaneously. This creates two problems: attention dilution and instruction interference.
Attention Dilution
The model distributes attention across all tokens in the context window. A 4,000-token instruction file competing with a 2,000-token task description means roughly two-thirds of the attention budget goes to instructions, not the actual work. This ratio worsens with larger convention files.
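A rough back-of-envelope sketch of this ratio, using the illustrative numbers from the paragraph above (not figures from the study's methodology):

```python
def instruction_share(instruction_tokens: int, task_tokens: int) -> float:
    """Fraction of the context window occupied by instructions rather than the task."""
    return instruction_tokens / (instruction_tokens + task_tokens)

# A 4,000-token instruction file alongside a 2,000-token task description:
print(round(instruction_share(4_000, 2_000), 2))  # roughly two-thirds goes to instructions

# Doubling the convention file to 8,000 tokens pushes the share higher still:
print(round(instruction_share(8_000, 2_000), 2))
```

The same two-line calculation makes it easy to estimate how much of your own context budget a given convention file consumes.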
Instruction Interference
When a file contains rules for testing, deployment, naming conventions, and architecture patterns simultaneously, the model may apply a testing rule during a code generation task or apply a naming convention to a refactoring operation where it introduces unnecessary renames.
Contradictory Signals
Large instruction files often contain rules that are contextually contradictory. A rule saying "prefer simple implementations" can conflict with another requiring "comprehensive error handling on every function." The model must resolve these tensions on every call, consuming capacity that could serve the task.
The ETH Zurich team observed that models with smaller, task-relevant instruction sets consistently outperformed those given the full convention file. This is not because the excluded instructions were bad. It is because their presence during unrelated tasks created noise that the model had to process and potentially act on, reducing the quality of the primary task output.
Instruction Types and Their Impact
The study categorized instructions found in convention files into four groups, then measured the impact of each category independently and in combination. The results revealed that instruction categories have vastly different value profiles depending on the task being performed.
- Coding conventions (task-dependent value): naming patterns, formatting rules, import ordering, comment styles. These showed moderate positive impact during code generation tasks but added pure noise during refactoring and debugging operations.
- Architecture rules (scope-dependent value): directory structure, module boundaries, dependency injection patterns. These were valuable during scaffolding and multi-file tasks but irrelevant for single-function edits, where they sometimes caused over-engineering.
- Testing requirements (highly task-specific): test framework preferences, coverage expectations, mocking patterns. These were strongly positive when the agent was writing tests and actively harmful when it was generating production code, often triggering unwanted test file creation.
- Workflow preferences (low value per token): commit message formats, PR description templates, deployment checklists. These had near-zero impact on code quality and consumed tokens on every call despite being relevant only during version control operations.

The key insight is that no single instruction category is universally valuable. Each category is relevant to a specific subset of operations. Loading all categories into every call means that at any given moment, 60-80% of the injected instructions are irrelevant to the current task. Those irrelevant tokens still cost money and still consume model attention.
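That waste can be made concrete with a small sketch. The token counts and the relevance mapping below are hypothetical, chosen only to mirror the four categories above:

```python
# Hypothetical token counts per category in a 4,000-token convention file.
CATEGORY_TOKENS = {
    "coding_conventions": 1_200,
    "architecture_rules": 1_000,
    "testing_requirements": 900,
    "workflow_preferences": 900,
}

# Which categories plausibly matter for each operation (illustrative mapping).
RELEVANT = {
    "generation": {"coding_conventions"},
    "scaffolding": {"coding_conventions", "architecture_rules"},
    "test_writing": {"testing_requirements"},
    "refactoring": {"architecture_rules"},
}

def wasted_fraction(operation: str) -> float:
    """Share of injected instruction tokens irrelevant to the current operation."""
    total = sum(CATEGORY_TOKENS.values())
    relevant = sum(CATEGORY_TOKENS[c] for c in RELEVANT[operation])
    return (total - relevant) / total

for op in RELEVANT:
    print(f"{op}: {wasted_fraction(op):.0%} of injected tokens irrelevant")
```

Under these assumed numbers, most operations waste well over half the injected tokens, in line with the 60-80% figure above.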
The 159% Cost Increase Breakdown
The headline 159% cost increase requires context. This figure represents the worst-case scenario: a large convention file (8,000+ tokens) loaded into a long-running agent session with a frontier model. The typical case is more moderate but still significant. Understanding the math helps teams estimate their own exposure.
CONVENTION FILE COST IMPACT CALCULATOR
═════════════════════════════════════════════════════
Variable │ Low │ Medium │ High
────────────────────────────┼──────────┼──────────┼──────────
Instruction file tokens │ 1,000 │ 4,000 │ 8,000
API calls per task │ 10 │ 40 │ 80
Base task tokens (no file) │ 5,000 │ 8,000 │ 12,000
────────────────────────────┼──────────┼──────────┼──────────
Added tokens per task │ 10,000 │ 160,000 │ 640,000
Cost increase percentage │ ~20% │ ~80% │ ~159%
TEAM COST PROJECTION (per month):
────────────────────────────────────────────────────
Team size: 5 developers
Tasks per developer/day: 8
Working days/month: 22
────────────────────────────────────────────────────
Monthly added tokens: 5 × 8 × 22 × 160,000
= 140,800,000 input tokens
At frontier model pricing, this overhead becomes a
meaningful line item in the engineering budget.

The cost compounds because convention files are input tokens, not output tokens. Input tokens are cheaper per unit, but the volume is massive. A 4,000-token file across 40 calls adds 160,000 input tokens to a single task. Across a five-person team running eight tasks per day over a month, that is over 140 million unnecessary input tokens. Even at the lower per-token rates of newer models, this represents real money that produces minimal value.
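The arithmetic in the table can be reproduced directly. The per-million-token price below is a placeholder assumption, not a quoted rate; substitute your provider's current pricing:

```python
def added_tokens_per_task(file_tokens: int, api_calls: int) -> int:
    """Convention-file tokens are re-sent as input on every API call."""
    return file_tokens * api_calls

def monthly_overhead(devs: int, tasks_per_day: int, days: int,
                     file_tokens: int, api_calls: int) -> int:
    """Total unnecessary input tokens a team generates per month."""
    return devs * tasks_per_day * days * added_tokens_per_task(file_tokens, api_calls)

# Medium scenario from the table: 4,000-token file, 40 calls per task.
print(added_tokens_per_task(4_000, 40))       # 160,000 added tokens per task
print(monthly_overhead(5, 8, 22, 4_000, 40))  # 140,800,000 tokens per month

# Rough dollar cost at a hypothetical $3.00 per million input tokens.
PRICE_PER_MILLION = 3.00  # placeholder; check your provider's rate card
cost = monthly_overhead(5, 8, 22, 4_000, 40) / 1e6 * PRICE_PER_MILLION
print(f"${cost:,.2f} per month in convention-file overhead")
```

Plugging in the high scenario (8,000 tokens, 80 calls) shows how quickly the overhead scales, since it grows with the product of file size and call count.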
Tiered Instruction Strategy
The ETH Zurich study does not recommend abandoning convention files entirely. Instead, it recommends restructuring how instructions are delivered. The tiered injection strategy categorizes instructions by operation type and injects only the relevant subset for each call. This approach preserves the value of documented conventions while eliminating the cost and performance overhead of monolithic injection.
- Tier 1 (always injected): language version, framework, import paths, critical project structure. Keep under 500 tokens. Cost: ~500 tokens per call.
- Tier 2 (injected based on operation type): coding conventions for generation, architecture rules for scaffolding, test patterns for test writing. Cost: ~500-1,500 tokens per call.
- Tier 3 (never auto-injected): workflow preferences, deployment checklists, commit message formats. Available on explicit request only. Cost: 0 tokens unless requested.
The tiered approach reduces injected tokens by 60-80% on average. Tier 1 provides the minimal context every call needs. Tier 2 adds operation-specific rules only when they are relevant. Tier 3 removes instructions that are useful for humans reading the documentation but add no value to the model during code generation. The study found that this structure maintained or improved accuracy on every task category tested.
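One way to sketch the tiered strategy in an agent orchestration layer. The instruction strings and operation names here are illustrative, not from the study:

```python
# Tier 1: always injected; kept deliberately under ~500 tokens.
TIER1 = "Python 3.12, Django 5. Source lives under app/. Absolute imports only."

# Tier 2: operation-scoped instruction sets, injected only when relevant.
TIER2 = {
    "generation": "Coding conventions: snake_case names, type hints on public APIs.",
    "scaffolding": "Architecture rules: one Django app per bounded context.",
    "test_writing": "Testing: pytest, factory fixtures, no mocks for the ORM layer.",
}

# Tier 3 (workflow preferences, deploy checklists) is never auto-injected.

def build_context(operation: str, task: str) -> str:
    """Assemble the prompt context: Tier 1 always, Tier 2 only when it matches."""
    parts = [TIER1]
    if operation in TIER2:
        parts.append(TIER2[operation])
    parts.append(task)
    return "\n\n".join(parts)

ctx = build_context("test_writing", "Write unit tests for the billing module.")
```

A refactoring call under this sketch would carry only Tier 1 plus the task, rather than the full convention file.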
What to Do Instead of AGENTS.md
The practical alternative is not to eliminate convention files but to restructure them. Most teams can implement the tiered strategy without changing their tooling. The key is separating what the agent needs from what the human developer needs, and further separating what the agent needs always from what it needs conditionally.
- Audit your current convention file. Categorize every instruction into one of the four types: coding conventions, architecture rules, testing requirements, or workflow preferences. Count the tokens in each category. Most teams find that 40-60% of their file is workflow preferences that add zero value during code generation.
- Create a minimal core file. Extract only the instructions that apply to every single operation: language version, framework, critical import paths, and project structure. This becomes your Tier 1 file. Target 500 tokens or fewer.
- Build task-specific instruction sets. Group the remaining valuable instructions by operation type. Create separate sections or files for code generation, testing, refactoring, and architecture decisions. Each set should stay under 1,500 tokens.
- Move workflow preferences to documentation. Commit message formats, PR templates, and deployment checklists belong in your contributing guide or project wiki, not in the agent context. These are human-readable references that the model does not need during code operations.
- Measure the difference. Run the same set of coding tasks with your original monolithic file and with the tiered approach. Compare token consumption, output quality, and task completion time. The data will confirm whether the restructuring delivers value for your specific codebase.
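The audit in step one can be partially automated. This is a minimal sketch that assumes the convention file uses markdown-style `## ` section headers, and it approximates tokens at roughly four characters each rather than calling a real tokenizer; the header patterns are hypothetical and should be adapted to your own file:

```python
import re

# Map section-header keywords to the four categories (illustrative patterns).
CATEGORY_PATTERNS = {
    "coding_conventions": r"naming|formatting|imports|comments",
    "architecture_rules": r"architecture|structure|modules",
    "testing_requirements": r"test|coverage|mock",
    "workflow_preferences": r"commit|pull request|deploy|workflow",
}

def approx_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English prose."""
    return len(text) // 4

def audit(convention_file: str) -> dict[str, int]:
    """Split a markdown-style file on '## ' headers and bucket token counts."""
    counts = {cat: 0 for cat in CATEGORY_PATTERNS}
    counts["uncategorized"] = 0
    for section in re.split(r"^## ", convention_file, flags=re.M):
        header = section.splitlines()[0].lower() if section else ""
        for cat, pattern in CATEGORY_PATTERNS.items():
            if re.search(pattern, header):
                counts[cat] += approx_tokens(section)
                break
        else:
            counts["uncategorized"] += approx_tokens(section)
    return counts
```

Running this over your AGENTS.md or CLAUDE.md gives a first-pass view of how many tokens each category consumes, which is the input to the Tier 1/2/3 split described above.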
For teams using Claude Code, the CLAUDE.md file supports this approach natively. You can structure the file with clear section headers and rely on the tool to extract relevant portions. For Cursor and other tools, consider splitting .cursorrules into multiple rule files if the tool supports it, or maintaining a lean primary file with extended rules in referenced documents. For a deeper look at how Claude Code handles convention files in practice, see our Claude Code Remote Control Feature Guide.
The Future of Agent Convention Files
The ETH Zurich study arrives at a moment when the ecosystem is actively evolving. Tool vendors are beginning to implement context-aware instruction loading, where the tool itself determines which subset of conventions to inject based on the detected operation type. This is the automated version of the tiered strategy described above.
Claude Code already performs partial context management by analyzing which portions of CLAUDE.md are relevant to the current task. Cursor has introduced rule scoping features that allow different rules for different file types. GitHub Copilot is experimenting with repository-level instruction weighting. These developments suggest that within 12-18 months, the monolithic convention file pattern will be replaced by structured instruction registries that tools consume intelligently.
For teams building custom AI agents, the implications are clear. The convention file is not disappearing, but its format is changing. Instead of a single flat text file loaded verbatim into the system prompt, the future is a structured configuration that maps instruction categories to operation types, with the orchestration layer handling selective injection. Teams that adopt this pattern now will see immediate cost savings and be positioned for seamless migration as tooling catches up. For guidance on building agents with this architecture in mind, see our Build and Sell Custom AI Agents Developer Guide.
Putting It All Together
The ETH Zurich study provides the first rigorous evidence that convention files, as currently implemented, impose a significant cost penalty with minimal accuracy benefit. The 159% worst-case cost increase and the observed performance degradation on complex tasks challenge the assumption that more instructions always produce better agent output.
The solution is not to abandon convention files but to restructure how they are consumed. The tiered injection strategy reduces token overhead by 60-80% while maintaining or improving code quality. Audit your current file, separate core instructions from task-specific rules and reference documentation, and measure the impact on your own codebase. The data consistently shows that focused, relevant instructions outperform comprehensive but unfocused ones.
As AI-assisted development matures, the convention file will evolve from a flat text document into a structured instruction registry. Teams that adopt tiered injection now benefit from immediate cost savings and a smoother transition as tooling vendors formalize these patterns. The research is clear: less context, more carefully delivered, produces better code at lower cost.