Practical Agentic Engineering Workflow: Production Guide 2025
Based on extensive analysis of production agentic workflows, this guide covers the strategies that ship code faster than experimental approaches: blast radius management, GPT-5 Codex optimization, parallel agent orchestration, model selection frameworks, and the anti-patterns that slow teams down.
[Figures: GPT-5 Codex usable context vs Claude Code; a concurrent terminal grid for throughput; code quality maintenance via agent-driven refactoring; AI agents writing production code.]
Key Takeaways
Agentic engineering has moved from experimental to production-ready. In the workflow described here, AI agents write 100% of the production code: not because of elaborate prompting tricks or complex tooling charades, but because we finally understand how to work with them effectively.
This isn't another post celebrating AI achievements with unrealistic benchmarks. It's a practical playbook from someone shipping a ~300K LOC TypeScript React app, Chrome extension, CLI tool, Tauri client app, and Expo mobile app—all built primarily by AI agents working in parallel terminal grids.
The Agentic Engineering Revolution
We crossed a threshold in 2025. Models moved from "this is interesting" (Sonnet 4 in May 2025) to "this is production ready" (GPT-5 Codex). The difference isn't just benchmark scores; it's how models approach problems, read codebases, and maintain context across complex changes.
From Code Completion to Code Generation
Traditional AI coding tools focused on tab completion—suggesting the next few lines based on immediate context. Agentic engineering represents a fundamental shift: agents reason about entire features, make architectural decisions, coordinate changes across dozens of files, and handle complex refactoring that would take human developers hours or days.
The workflow transition looks like this:
- Old: Write detailed specs, review 100+ file changes, manually fix inconsistencies
- New: Start conversations with screenshots, watch features build live, queue related changes, iterate in real-time
Model Selection: GPT-5 Codex vs Claude Code
After months working daily with both GPT-5 Codex and Claude Code, clear patterns emerged. This isn't about benchmark leaderboards or marketing claims—it's practical observations from shipping real-world applications.
Why GPT-5 Codex Won for Daily Work
230K tokens of usable context vs Claude's 156K. Claude technically offers a 1M-token context window (if you happen to get access), but in practice it gets "silly" long before that context is actually used up. Codex maintains coherence across far more files and longer conversations.
Context fills far more slowly. Whatever OpenAI does differently, context bloat is significantly reduced compared to Claude Code's frequent "Compacting..." messages. That means longer work sessions without a context reset.
Queue related tasks. Codex lets you queue multiple messages for sequential execution. Claude changed this behavior months ago so that new messages "steer" the model mid-work instead. Having both options (queue vs. steer) is far better: in Codex, press Escape and Enter when you want the steering behavior.
Rust-based CLI, no memory bloat. Codex is incredibly fast with no multi-second freezes, no gigabyte memory bloat, no terminal flickering. It feels lightweight and responsive in ways Claude Code doesn't match.
No "absolutely perfect" false confidence. Claude's language ("absolutely right," "100% production ready" while tests fail) causes genuine frustration. Codex communicates more like an introverted engineer—chugs along, gets stuff done, pushes back on silly requests. This matters for mental health during long coding sessions.
Reading Before Acting
The most underrated difference: GPT-5 Codex reads far more files before deciding what to do. It pushes back harder when you make questionable requests. Claude and other agents are more eager—they just try something even when uncertain.
This changes prompt engineering fundamentally. With Claude, you need extensive context in prompts to compensate. With Codex, prompts became significantly shorter—often just 1-2 sentences plus an optional screenshot. The model already understands your codebase deeply before suggesting changes.
The Blast Radius Principle
"Blast radius" refers to estimating how many files a change will touch before executing it. This concept transforms how you think about agent orchestration and parallel workflows.
Small Bombs vs Fat Man
When planning work, you have intuition about complexity and scope. You can throw many small bombs at your codebase or one "Fat Man" and a few small ones. The blast radius determines your approach (a rough code sketch of these thresholds follows the list):
- Small blast radius (1-5 files): Perfect for parallel agents, easy rollback, clean atomic commits
- Medium blast radius (6-20 files): Single agent with monitoring, "what's the status" check-ins
- Large blast radius (20+ files): Consider "give me options before making changes" to gauge impact
- Multiple large bombs: Impossible to do isolated commits, much harder to reset if something goes wrong
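As a rough illustration (not part of the original workflow), the thresholds above can be sketched as a small TypeScript helper; the file counts and strategy names come from the list, everything else is hypothetical.

```typescript
// Hypothetical sketch: mapping an estimated blast radius to an orchestration strategy.
// The thresholds mirror the list above; names and structure are illustrative only.
type Strategy =
  | "parallel-agents"        // small: safe for the terminal grid, atomic commits
  | "single-agent-monitored" // medium: one agent, periodic "what's the status" check-ins
  | "ask-for-options-first"  // large: gauge impact before any files change
  | "split-the-work";        // multiple large bombs: break into smaller tasks instead

function pickStrategy(estimatedFiles: number, largeTasksInFlight = 0): Strategy {
  if (largeTasksInFlight > 0 && estimatedFiles > 20) return "split-the-work";
  if (estimatedFiles <= 5) return "parallel-agents";
  if (estimatedFiles <= 20) return "single-agent-monitored";
  return "ask-for-options-first";
}

// Example: a refactor expected to touch ~12 files gets one monitored agent.
console.log(pickStrategy(12)); // "single-agent-monitored"
```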
Developing Blast Radius Intuition
Over time you develop feelings for task complexity. You know before starting if a change will touch 3 files or 30. This intuition guides decisions about:
- Whether to use parallel agents or a single focused agent
- Whether to ask for options/plan first or just start building
- Whether work needs separate folder/context or can share main context
- How long to let an agent run before checking status or intervening
Parallel Agent Orchestration Strategy
Run 3-8 agents in parallel in a 3x3 terminal grid, most of them in the same folder, with the occasional experiment in a separate folder. This setup ships features faster than any traditional branching strategy.
Why Same-Folder Beats Worktrees
One dev server, one application, multiple simultaneous changes. As the project evolves, click through and test multiple changes at once. This workflow proves significantly faster than alternatives:
- One dev server: Test all changes together by clicking through the application
- Atomic commits: Each agent commits only the files it edited
- Real-time integration: See how changes interact immediately
- No OAuth limits: A single domain covers all callback testing
- Faster iteration: No context switching between branches and servers
By contrast, worktrees and multiple dev servers bring their own problems:
- Multiple dev servers: Quickly get annoying and resource-heavy
- OAuth limitations: Only a limited number of domains can be registered for callbacks
- Context switching: Slower to test interactions between changes
- Setup overhead: Spinning environments up and down adds friction
Atomic Git Commits Per Agent
Agents make atomic git commits themselves for exactly the files they edited. Maintaining clean commit history required iterating on agent configuration to make git operations sharper. The result: 3-8 agents working simultaneously with minimal merge conflicts.
Models are incredibly clever—no hook will stop them if they're determined, but clear instructions in agent configuration work well. The key: explain that multiple agents work in the same folder and each should only commit their own changes.
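A minimal sketch of that per-agent commit discipline, assuming each agent keeps track of the files it actually edited (the file paths and commit message are hypothetical; the real setup relies on instructions in the agent configuration rather than a script):

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical example: an agent stages and commits only the files it edited,
// leaving other agents' work-in-progress untouched in the shared folder.
const editedFiles = ["src/billing/invoice.ts", "src/billing/invoice.test.ts"];

execFileSync("git", ["add", "--", ...editedFiles], { stdio: "inherit" });
execFileSync(
  "git",
  ["commit", "-m", "billing: handle zero-amount invoices"],
  { stdio: "inherit" },
);
```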
Prompt Engineering: Just Talk To It
The controversial truth: elaborate prompting tricks and agent instructions mostly don't matter with GPT-5 Codex. The model is good enough that you can just talk to it like a human colleague.
From Verbose to Concise
With Claude, extensive prompts helped compensate for context gaps—the more context supplied, the better results. With GPT-5 Codex, prompts became dramatically shorter. Often just 1-2 sentences plus a screenshot.
At least 50% of prompts contain screenshots. Drag image into terminal, model finds exactly what you show, matches strings, arrives at the right place. No annotation needed (though it helps for complex cases). A screenshot takes 2 seconds and provides immense context.
Trigger Words That Help
When things get hard, certain phrases improve results noticeably:
- "take your time" — prevents rushing through complex problems
- "comprehensive" — encourages thorough analysis
- "read all code that could be related" — broader context gathering
- "create possible hypothesis" — explores multiple solution paths
- "preserve your intent" — maintains code purpose through changes
- "add code comments on tricky parts" — helps future model runs
But these are gentle nudges, not elaborate instructions. The fundamental approach remains conversational.
Conversational Development Pattern
Start discussions with Codex by pasting websites, sharing ideas, asking it to read code. Flesh out features together. For complex features, ask it to write everything into a spec, send that to GPT-5-Pro via chatgpt.com for review (surprisingly often improves the plan significantly), then paste back useful suggestions.
For UI work: start with something woefully under-specified. Watch the model build. See browser update in real-time. Queue additional changes and iterate. Often don't fully know how something should look—play with ideas, see them come to life. Sometimes Codex builds something interesting you didn't even think of. Don't reset, iterate and morph the chaos into shape.
The Tooling Ecosystem Reality Check
Controversial opinion: Most agentic engineering tools solve non-problems. RAG, elaborate MCPs, custom plugins, subagent orchestration systems—these work around current inefficiencies that GPT-5 Codex largely eliminated.
Why MCPs Are Usually Wrong
Almost all MCPs should be CLIs instead. Reference a CLI by name and the model already knows how to use it from world knowledge, at zero context tax. On the first incorrect call the CLI prints its help, the context then has the full usage info, and it works from then on.
- GitHub MCP: 23K tokens gone (was 50K at launch)
- gh CLI alternative: Same feature set, zero context tax
- Models already know gh CLI from world knowledge—no explanation needed
Exception: chrome-devtools-mcp for closing the loop on web debugging; it replaced Playwright for browser interaction. Even this isn't needed daily, since hitting most API endpoints directly via curl with API keys is faster and more token-efficient.
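To make the CLI-over-MCP point concrete, here is a hedged sketch of shelling out to the gh CLI from a script: no MCP server, no schema loaded into context, just a tool the model already knows. The repository name is a placeholder.

```typescript
import { execFileSync } from "node:child_process";

// Roughly the same capability the GitHub MCP exposes, but via the gh CLI the
// model already knows from world knowledge. Output is plain JSON to parse or paste.
const out = execFileSync(
  "gh",
  ["pr", "list", "--repo", "acme/example-app", "--limit", "5", "--json", "number,title,author"],
  { encoding: "utf8" },
);

for (const pr of JSON.parse(out)) {
  console.log(`#${pr.number} ${pr.title} (${pr.author.login})`);
}
```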
Subagents vs Separate Windows
Subagents were originally called "subtasks" back in May: a way to spin work out into a separate context when the main conversation doesn't need the full output, mainly for parallelization or for keeping noisy build scripts from wasting context.
What others do with subagents, do with separate terminal windows. This gives complete control and visibility over context engineering, unlike subagents which make it harder to view and steer what gets sent back.
If you want to research something, do it in a separate terminal pane and paste results to another. Simple, controllable, visible.
The Plugin/Agent Instructions Reality
Claude Code promotes elaborate agent instructions and plugins as ways to improve model behavior. Looking at their recommended "AI Engineer" agent reveals the problem: an autogenerated soup of words that name-drops GPT-4o and o1 for integrations, with no actual meat.
Telling a model "You are an AI engineer specializing in production-grade LLM applications" doesn't change output quality. Giving it documentation, examples, and do/don't patterns helps. You'd get better results asking the agent to "google AI agent building best practices" and load some websites than using vague role-play instructions.
Real-World Implementation Patterns
After months shipping production code with agentic workflows, certain patterns consistently outperform others. These aren't theoretical—they're battle-tested approaches from real projects.
The Web-Based Agent Role
Codex web serves as a short-term issue tracker: capture ideas on the go via the iOS app, then review them later on the Mac. Mobile capabilities are intentionally kept limited; work is already addictive enough without being pulled back in during downtime.
Web tasks originally didn't count toward usage limits, but those days are numbered. They remain valuable for capturing ideas without disrupting flow.
Background Task Management
GPT-5 Codex currently lacks background task management—one of Claude's advantages. CLI tasks that don't end (dev servers, tests that deadlock) can get stuck.
Workaround: tmux, an old tool for running CLIs in persistent background sessions. The model has plenty of world knowledge about tmux; just prompt "run via tmux." No custom agent.md charade needed.
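A sketch of what "run via tmux" amounts to in practice: start the dev server in a detached session so the shell call returns, then peek at its output later. It is wrapped in TypeScript only to stay consistent with the other examples; the session name and command are assumptions.

```typescript
import { execFileSync } from "node:child_process";

// Start a long-running dev server in a detached tmux session so the agent's
// shell call returns immediately instead of hanging on a process that never exits.
execFileSync("tmux", ["new-session", "-d", "-s", "devserver", "npm run dev"]);

// Later, capture the current pane output to check logs or errors.
const logs = execFileSync("tmux", ["capture-pane", "-p", "-t", "devserver"], {
  encoding: "utf8",
});
console.log(logs);
```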
Queue Up Continue Messages
Instead of crafting the perfect prompt to keep a long-running task going, use a lazy workaround: for bigger refactors where Codex often stops mid-work, queue up a few "continue" messages before stepping away. If Codex finishes early, it happily ignores the leftover messages.
Write Tests After Each Feature
Ask the model to write tests after each feature/fix is done—use the same context. This leads to far better tests and likely uncovers bugs in your implementation. If it's purely UI tweaks, tests make less sense. For anything else, do it.
AI is generally bad at writing good tests, but tests written with full implementation context in the same conversation turn out far better than tests generated separately.
Agent Configuration Approach
The agent file is ~800 lines of organizational scar tissue. It wasn't written by hand; Codex wrote it. Anytime something notable happens, ask it to add a concise note. The file grew organically from actual pain points, not speculative "best practices."
Key sections: git instructions, product explanation, naming and API patterns, React Compiler notes, preferred React patterns, database migration management, testing, and ast-grep rules. Anything newer than the model's knowledge cutoff gets documented; anything the model already knows gets removed.
Refactoring & Code Quality Management
About 20% of time goes to agent-driven refactoring, and all of it is executed by agents, so no manual time is wasted. Refactor days are great for when you need less focus or feel tired, since you can make solid progress without intense concentration.
Typical Refactoring Work
- Code duplication: Using jscpd to identify and consolidate duplicates (see the sketch after this list)
- Dead code: Running knip to find unused exports and imports
- React Compiler: Running eslint-plugin-react-compiler and deprecation plugins
- API consolidation: Checking for routes that can be merged
- Documentation: Maintaining docs, adding comments for tricky parts
- File size: Breaking apart files that grew too large
- Test quality: Finding and rewriting slow tests
- Modern patterns: Updating to latest React patterns (you might not need useEffect)
- Dependencies: Tool upgrades and version updates
- File structure: Reorganizing for better clarity
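As one hedged example of how a refactor day can start, a tiny script can run the duplication and dead-code checks from the list above and hand the reports to an agent. The paths and thresholds are assumptions, not the article's actual setup.

```typescript
import { execFileSync } from "node:child_process";

// Illustrative only: gather refactoring signals, then paste the reports into an
// agent conversation ("consolidate the worst duplication cluster", etc.).
function run(cmd: string, args: string[]): string {
  try {
    return execFileSync(cmd, args, { encoding: "utf8" });
  } catch (err: any) {
    // jscpd and knip exit non-zero when they find issues; the report is still useful.
    return err.stdout?.toString() ?? String(err);
  }
}

const duplication = run("npx", ["jscpd", "src", "--min-tokens", "50"]);
const deadCode = run("npx", ["knip"]);

console.log("=== Duplication report (jscpd) ===\n" + duplication);
console.log("=== Unused exports and files (knip) ===\n" + deadCode);
```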
The "Code is Slop" Argument
Critics argue AI-generated code is "slop." The response: 20% of time on refactoring addresses this through systematic quality maintenance. This isn't unique to AI—human-written code also accumulates technical debt without regular cleanup.
The advantage: agents execute refactoring far faster than humans. What would take days of manual work completes in hours. This makes regular quality maintenance actually sustainable rather than perpetually postponed.
Conclusion: Develop Intuition, Skip Charades
Don't waste time on stuff like RAG, subagents, elaborate agent instructions, or custom tooling that solves non-problems. Just talk to it. Play with it. Develop intuition. The more you work with agents, the better your results will be.
Many of the skills needed to manage agents mirror those for managing human engineers; they are the characteristics of senior software engineers: understanding task complexity, breaking down problems, giving clear direction, and knowing when to intervene or let work continue. These are fundamentally people skills applied to AI systems.
Yes, writing good software is still hard. Just because you don't write code anymore doesn't mean you don't think hard about architecture, system design, dependencies, features, or how to delight users. Using AI simply means expectations of what to ship went up dramatically.