H1 2026 was the half in which AI coding stopped being a side experiment in most engineering organisations and became the default — Cursor, Claude Code, and Codex CLI all crossed the chasm into mainstream adoption, multi-agent workflows moved from demo to production, MCP consolidated as the connector standard, and observability + governance features started appearing inside the IDE rather than being bolted on after the fact.
What changed is not the underlying model story — the frontier model cadence is its own quarterly retrospective. What changed is the shape of the developer surface: how teams actually use the tools, how the tools ship features, what the productivity numbers look like when measured honestly, and where pricing settled after a year of cost compression. The data below is a six-month look at that surface, drawn from vendor announcements, team adoption surveys, and the configurations we've been rolling out for clients.
This guide covers adoption rates across seven tools, the feature-shipping cadence that defined the half, productivity benchmarks measured against honest baselines, the pricing-tier shifts that mattered, the four trend lines that explain the half, and a calibrated projection for what H2 probably looks like for engineering teams planning rollouts now.
- 01 — AI coding crossed the chasm in H1 2026. Cursor, Claude Code, and Codex CLI all moved past early-adopter share into mainstream engineering use. The conversation shifted from 'should we use it' to 'how do we roll it out without breaking governance'.
- 02 — Multi-agent workflows are mainstream. Subagents, background agents, and parallel-execution patterns shipped in every major tool. Engineering teams running dual-model collaborations or agent-per-task rollouts moved from experimental to production by Q2.
- 03 — MCP-as-standard is consolidating. Model Context Protocol crossed enough vendor adoption to become the default connector pattern for IDE-to-tool integrations. Bespoke plugin APIs are increasingly considered legacy.
- 04 — Subagent + skill libraries compound. Teams that built reusable skill libraries and per-task subagents reported the largest productivity gains. The 90th-percentile teams treat AI coding as orchestration over a library of playbooks rather than chat.
- 05 — Observability + governance are entering the IDE. Tool-call audit logs, cost dashboards, permission gates, and policy enforcement started shipping inside Cursor, Claude Code, and Copilot rather than as separate platforms. H2 will accelerate this.
01 — Why Retrospective
H1 2026 was the half in which AI coding crossed the chasm.
Retrospectives are easy to write badly. The temptation is to list every release announcement, count the version numbers, and call it a story. The harder and more useful framing is to ask what changed about the work itself — which teams reorganised around which capabilities, which patterns moved from experimental to production, and what the steady-state shape of the half looks like in hindsight.
By that test, H1 2026 is the cleanest inflection point we've seen since the original Copilot launch. The defining shift is not that the tools got better — though they did. The defining shift is that the cohort of engineering teams treating AI coding as background plumbing rather than a meeting-worthy experiment crossed from minority to majority. That is what "crossed the chasm" means in this context: the default question stopped being whether to roll out and became how.
Three signals confirm the inflection. First, the rollout conversations we have with clients changed shape — early-half calls were about pilot selection and seat counts; late-half calls are about governance, cost attribution, and which subagents to standardise. Second, the vendor roadmaps stopped competing on raw capability and started competing on enterprise readiness — audit logs, RBAC, MCP server registries, billing consolidation. Third, the productivity benchmarks normalised: the wild claims of the early days got replaced by measured 15–40% per-engineer hour reductions on the tasks AI coding actually helps with.
The rest of this retrospective is organised around the four questions a leadership team typically asks when planning H2: which tools are people actually using and at what intensity, what did the vendors ship that matters, what productivity numbers can we honestly expect, and what is the pricing reality after a year of compression. We close with the four trend lines that explain the half — and a calibrated view of what H2 will consolidate.
02 — Adoption
Seven tools, six months, team adoption rates.
Adoption rarely settles into a single number. Different team archetypes adopt different tools, often in parallel, and the interesting story is not the average — it's the shape of the distribution. Across the seven tools we tracked in H1, engineering teams clustered into four archetypes that explain most of the rollout variance.
The archetypes below are drawn from a combination of vendor disclosures, the rollouts we've directly observed with clients, and the public adoption surveys that ran across H1. They are intentionally rough — the point is to give leaders a mental model for where their team probably sits, not to claim precision the underlying data can't support.
IDE-native pair programming — Cursor 3.x · Composer · Window · ≈40% of teams we observed
Front-end-heavy product teams adopting Cursor as the primary IDE. They use Composer for multi-file edits, Window for inline pair programming, and increasingly Design Mode for design-to-code work. Often supplemented with Claude Code in the terminal for repo-wide refactors.

CLI + IDE hybrid — Claude Code 1.3 · subagents · hooks · ≈30% of teams we observed
Backend-heavy and platform teams running Claude Code as the agentic primitive: interactive REPL for exploratory work, print mode for CI pipelines (a minimal sketch follows this list), subagents for specialised tasks, skills for repeatable procedures. The IDE becomes a host for the terminal.

Headless agentic scripting — Codex CLI · --search · sandbox modes · ≈15% of teams we observed
Teams using Codex CLI for headless agent runs, often alongside Claude Code as a second-model review pass. Strong in research, data engineering, and security-audit contexts. Many teams pair this with the dual-model collaboration pattern.

Enterprise baseline plus specialisation — Copilot · Windsurf · Continue · Zed · ≈15% of teams we observed
Enterprise organisations that standardised on Copilot for licensing simplicity, then layered Windsurf, Continue, or Zed for teams with specific needs. The pluralist pattern, common in larger orgs where central procurement runs ahead of individual team preferences.

The headline read on the archetype distribution is that the market is meaningfully bifurcated between IDE-first and CLI-first cohorts, with the enterprise baseline pattern sitting alongside both as a procurement-driven third group. What the distribution does not show — and what we think matters for H2 planning — is that the IDE-first and CLI-first cohorts have started converging on workflow even while their primary surface differs. Cursor users increasingly run Claude Code in the integrated terminal; Claude Code users increasingly use Cursor or VS Code as the host. The terminal and the IDE are blending.
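The CLI + IDE hybrid archetype above leans on Claude Code's print mode for CI. A minimal sketch of what that looks like as a pipeline step, with an illustrative prompt and tool allowlist; verify the flag names against your installed Claude Code version before relying on them:

```sh
# Illustrative CI review pass: Claude Code in non-interactive print mode (-p).
# The prompt, allowlist, and output path are placeholders, not a recommended policy.
claude -p "Review the changes in this checkout for security issues and summarise findings" \
  --allowedTools "Read,Grep,Glob" \
  --output-format json > review.json
```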
For teams choosing now, the practical move is to pick a primary surface based on the work that dominates your day-to-day, then layer the second tool deliberately rather than waiting for the tools to converge further. A backend platform team adopting Claude Code in the terminal will benefit from VS Code as the host for syntax highlighting and file navigation. A front-end product team adopting Cursor will benefit from a Claude Code terminal for repo-wide refactors that Composer is less suited to. The right rollout is rarely a single tool.
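The second-model layering also reduces to a short script. A minimal sketch of the dual-model collaboration pattern mentioned in the Codex CLI archetype, assuming Claude Code as the implementer and Codex CLI's non-interactive `codex exec` as the reviewer; the ticket reference and diff path are hypothetical:

```sh
# Hypothetical dual-model pass: one model implements, a second model reviews.
# TICKET-123 and /tmp/change.diff are placeholders.
claude -p "Implement the retry logic described in TICKET-123"
git diff > /tmp/change.diff
codex exec "Review /tmp/change.diff for correctness, missed edge cases, and style"
```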
03 — Features
Multi-agent, MCP, skills, observability.
Vendors shipped a lot across H1. The pattern worth noticing is that the releases clustered around four themes rather than scattering across the surface — and the four themes overlap remarkably across Cursor, Claude Code, Codex CLI, and the adjacent tools. That convergence is itself a signal: when independent vendors arrive at the same feature set in the same half, the underlying product category is consolidating.
The four themes below are described in the order they crystallised across the half — multi-agent execution shipping first in late January, MCP standardisation reaching critical mass in March, skills and subagent libraries proliferating from March onwards, and observability + governance arriving as the late-half push that defines where H2 starts.
Background + parallel agents — shipped across all majors
Cursor shipped Background Agents, Claude Code formalised Subagents as a first-class surface, Codex CLI added --search agent mode, and every major tool added some flavour of parallel-execution primitive. By Q1 end, multi-agent stopped being a demo.

Model Context Protocol consolidation — Anthropic, vendors, IDEs aligned
MCP server registries, vendor-published servers for Supabase, Vercel, Linear, GitHub, and Zoho, and IDE-side MCP browsers shipped across Claude Code, Cursor, and adjacent tools. The bespoke-plugin-API era ended quietly.

Reusable playbook libraries — house style for senior teams
Claude Code Skills (.claude/skills/<name>/SKILL.md) and equivalents in adjacent tools turned ad-hoc procedures into versioned, trigger-driven libraries. Combined with subagents, the practical effect was a 5–10× compression of repeatable task setup.

Audit, cost, policy inside the IDE — H1 close → H2 acceleration
Tool-call audit logs, per-team cost dashboards, permission gates, and policy enforcement started shipping inside Cursor, Claude Code, and Copilot. Late-half feature set, but the trajectory into H2 is clear: enterprise governance moves to the IDE surface.

The single most consequential of the four themes for engineering leaders is the MCP consolidation. Before H1, every IDE-to-tool integration required a bespoke plugin or extension, with different auth flows, different schemas, and different maintenance burdens. After H1, MCP servers became the default connector — and the practical effect is that adding a new backing system to your AI coding setup went from a multi-week integration project to a one-line configuration change.
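Concretely, the one-line change is an entry in the tool's MCP configuration. A minimal sketch in the `.mcp.json` shape Claude Code reads from the project root; the server name, command, and URL are illustrative, so check the vendor's published server before wiring it in:

```json
{
  "mcpServers": {
    "linear": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://mcp.linear.app/sse"]
    }
  }
}
```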
The skills-plus-subagents pattern compounded the consolidation. Once MCP made it cheap to wire connections, skills made it cheap to encode procedures over those connections, and subagents made it cheap to scope those procedures to least-privilege execution. The three ship together, and teams that adopted all three reported the largest productivity gains in the half. We cover the numbers in the next section.
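For scale, a skill is just a directory holding a SKILL.md whose frontmatter tells the agent when to load it. An illustrative example; the skill name, description, and steps are invented for the sketch:

```markdown
---
name: release-notes
description: Draft release notes from merged PRs. Use when preparing a release or tagging a version.
---

1. List merge commits since the last tag: git log --merges <last-tag>..HEAD
2. Group changes by area (api, ui, infra) and summarise each in one line.
3. Flag any commit touching auth or billing for manual review before publishing.
```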
"The half's feature shipping converged on four themes — multi-agent, MCP, skills, observability — and teams that adopted all four are an order of magnitude ahead of teams that adopted any one in isolation."
— Our reading of the H1 rollout pattern across clients
For teams planning H2 rollouts, the operational read is that the feature set has stabilised enough to commit. The big primitives — multi-agent execution, MCP, skills, subagents, observability — are not going to be obsolete by Q3. They will be deepened, refined, and made more enterprise-ready, but the architectural shape is settled. That is the right environment in which to standardise — for a deeper dive on Claude Code specifically, see our Claude Code 1.3 deep dive.
04 — Productivity
Hours saved, PR throughput, defect rates.
Productivity claims in AI coding range from the absurd to the defensible, and the gap between them is mostly down to how the baseline gets defined. The most honest framing we've seen across H1 is a triplet of measures — engineer-hours saved on tasks AI coding actually helps with, PR throughput on repo-scoped work, and defect rates on AI-assisted versus unassisted changes. The figures below are H1 medians from the rollouts we've observed, weighted toward teams that adopted at least three of the four feature themes from Section 03.
[Chart] H1 2026 productivity ranges · seven-tool composite. Source: H1 2026 rollouts we observed · directional medians.

The two measures to read most carefully are defect rate and code-review burden. Defect rate is largely flat versus unassisted — slightly up on teams that skipped review discipline, slightly down on teams that invested in subagent-based pre-review. The headline "AI coding reduces bugs" claim is not supported by the H1 data we've seen; what is supported is that defect rate is roughly stable while throughput climbs meaningfully. That is still a strong business case, but it's a different business case than the one many rollout decks claim.
Code-review burden is the cost most teams underestimate. AI-generated PRs are larger on average, touch more files, and require reviewers to absorb context that the nominal author never fully held themselves. The honest reading is that review burden rises 10–18% on AI-assisted PRs unless review discipline gets paired with the rollout. Teams that invested in AI-assisted review (subagent pre-review, automated PR summaries, lint gates) saw that number stay flat or fall; teams that didn't saw it climb.
For engineering leaders setting H2 OKRs, the conservative calibration is a 15–25% per-engineer hour reduction on AI-amenable work, paired with a flat or slightly elevated defect rate, and a 10–18% rise in review burden that requires a corresponding investment. Teams hitting 30–40% gains consistently are running a deliberate stack — multi-agent execution plus skills plus subagents plus governance — not simply "Cursor turned on for everyone."
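To make that calibration concrete, here is the arithmetic on an illustrative 20-engineer team; the 50% AI-amenable share, the 30% of hours spent in review, and the 14% review-burden rise are assumptions picked from inside the ranges above:

```
gross saving:  20 engineers × 50% AI-amenable × 20% reduction ≈ 2.0 engineer-equivalents
review cost:   20 engineers × 30% hours in review × 14% rise  ≈ 0.84 engineer-equivalents
net (without review-side tooling):                            ≈ 1.2 engineer-equivalents
```

The net number is still comfortably positive, but the review line is the part most rollout decks leave out.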
05 — Pricing
Tier movement and value-per-dollar trends.
Pricing shifted meaningfully across H1, but the headline number — list price per seat — moved less than the underlying value calculation. What changed more is the bundling: which features sit inside the base tier, which require the pro tier, and how usage-based pricing interacts with seat-based pricing. The matrix below is the decision framework we use with clients to navigate the tier landscape now.
Individual + Pro plans — pick by primary surface
Cursor Pro, Claude Code Pro, Copilot Pro Plus, or Codex via OpenAI Plus. List prices clustered around $20–40/month/seat. Adequate for individual usage; subagent and MCP features are sometimes gated to higher tiers, so check before committing.

Pro plans + targeted Max tier — mixed-tier rollout
$200/month Max plans for senior engineers doing the agentic work, Pro plans for the rest. The Max tier unlocks higher usage limits, premium model access, and (increasingly) governance features. The 80/20 pattern: 20% of engineers on Max generate 80% of the value.

Business / Enterprise tier — governance-led decision
$40–60/seat business tiers with SAML, audit logs, centralised billing, MCP server registries, and policy enforcement. Often paired with vendor-managed deployment of Copilot or Cursor. Less price-sensitive than mid-team buyers; governance features dominate the decision.

API + AI Gateway — a layer above seat plans
For teams running custom integrations: multi-vendor model routing via Vercel AI Gateway, OpenRouter, or direct API consumption. Cost tracking and provider failover dominate the architecture choice. Adds operational complexity in exchange for vendor leverage.

The H1 pricing shift worth highlighting is that the Max-tier pattern emerged as the dominant mid-team rollout. The $200/month Max plans started as power-user oddities in January and ended the half as the obvious destination for engineering leaders and senior staff doing the bulk of agentic work. The mental model is straightforward: put your top quintile of engineers on Max, put everyone else on Pro, and let the productivity asymmetry justify the spend differential. The math works for almost any team that has measured its own throughput gains honestly.
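The arithmetic on an illustrative 50-seat team, assuming the $200 Max and a $30 Pro list price from the tiers above:

```
Max (top quintile): 10 seats × $200 = $2,000/month
Pro (remainder):    40 seats × $30  = $1,200/month
total:              $3,200/month, a blended ≈ $64/seat
```

Set against the per-engineer hour reductions in Section 04, the blended seat cost is a rounding error on a fully loaded engineer-month, which is why the mixed-tier pattern clears most procurement reviews.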
The trend that did not happen in H1 — despite repeated predictions — is a price war. The list prices on Pro tiers held remarkably steady from January to May. What moved is what those tiers include: more requests per month, broader model access, and (selectively) governance features that used to live in business tiers. The signal is that vendors are competing on value per dollar rather than headline price, and that pattern looks likely to continue into H2.
06 — Four Trends
Multi-agent mainstream, MCP-as-standard, subagent + skill compounding, observability + governance in the IDE.
Pull the threads from the prior sections together and four trend lines define H1 2026. They are not surprising individually — each was visible by Q1 — but the fact that they all consolidated within the same six-month window is the structural story of the half.
1. Multi-agent went mainstream. Background agents in Cursor, formalised subagents in Claude Code, and agent modes in Codex CLI moved multi-agent execution from experimental to default. The pattern we see in senior engineering teams: a primary interactive agent, one or two specialised subagents (code review, security audit, documentation), and the occasional one-shot background agent for long-running work; a minimal subagent sketch follows this list. That stack is now ordinary.
2. MCP-as-standard consolidated. Before H1, connecting your AI coding tool to Supabase, Linear, Vercel, or Zoho required a vendor-specific plugin and a multi-week integration. After H1, it requires an MCP server entry. The implication for tool selection is that "does this tool speak MCP?" is now a meaningful filter — and almost every serious tool says yes.
3. Subagent + skill libraries compounded. The teams getting the largest productivity gains share a structural pattern: a library of skills encoding repeatable procedures, a roster of subagents enforcing least-privilege execution on those procedures, and an MCP layer wiring both to the systems-of-record. The compounding is real because each addition makes the others more valuable — more skills make subagents reach further, more subagents make MCP connections safer, more MCP connections make skills more powerful.
4. Observability + governance moved to the IDE. The late-half push was audit logs, cost dashboards, permission gates, and policy enforcement inside the IDE rather than as separate platforms. Teams previously assembled governance externally; H1 closed with vendors shipping it natively. H2 will accelerate the pattern.
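The subagent roster in trend one is plain files on disk. A minimal sketch of a code-review subagent in the Markdown-with-frontmatter shape Claude Code reads from .claude/agents/; the name, tool list, and prompt are illustrative:

```markdown
---
name: code-reviewer
description: Reviews diffs for correctness, security, and style. Use proactively after code changes.
tools: Read, Grep, Glob
---

You are a senior reviewer. Read the changed files, check for missed edge cases
and security issues, and return findings ordered by severity. Do not edit files.
```

The read-only tool list is the least-privilege point from trend three: the reviewer can inspect but not mutate.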
"The four trend lines do not just coexist — they compound. Multi-agent execution needs MCP to be useful, MCP needs skills to be safe, skills need subagents to be governed, and governance needs observability to be auditable. That's the architecture H2 will refine."
— Our composite read of the H1 2026 stack
For engineering leaders, the practical synthesis is that the shape of a senior AI coding rollout is now legible. It is not a single tool. It is a multi-agent execution model, plumbed through MCP, encoded in a skill library, scoped by subagent policy, and instrumented for audit and cost. Teams that built that stack across H1 are well-positioned for H2; teams still at the "Cursor turned on for everyone" stage have a visible playbook to follow.
07 — H2 Projection
What H2 probably looks like.
Projections age badly, so the calibration here is deliberate: H1 was a phase change, and H2 reads more like a consolidation phase than another inflection. The new primitives — multi-agent, MCP, skills, observability — are not going to be replaced. They are going to be refined, deepened, and integrated. The interesting H2 questions are about scope, not category.
Vendor convergence continues. Cursor, Claude Code, and Codex CLI will keep adopting each other's distinguishing features. Background agents, MCP server registries, skill-style reusable procedures, and IDE-level observability are likely to be table stakes by Q4. The differentiation will move to ergonomics, governance depth, and which model partnerships unlock the best models at the best prices.
Governance becomes the buying decision. In H1, enterprise rollouts hinged on capability and licensing. In H2, we expect them to hinge on governance — audit log depth, cost-per-team attribution, policy expressiveness, and integration with corporate SSO and DLP. Vendors that nail governance will out-sell vendors that ship one more model partnership.
Productivity ranges tighten and lift modestly. The 15–40% range in Section 04 will probably tighten to roughly 20–35% with the bottom of the range lifting as average teams adopt the patterns the top quartile already uses. Defect rates and review-burden patterns will not change much without deliberate investment in review-side tooling, which is the H2 gap we expect to see closed.
Pricing stays mostly flat at list, value rises. Continued bundling of features into Pro tiers, with Max-tier adoption expanding from senior engineers to a broader cohort as the productivity case gets clearer. We do not expect a price war; we do expect the value-per-dollar gap between laggard vendors and leaders to widen.
H1 2026 was the half in which AI coding crossed the chasm — H2 will be the half in which it consolidates.
The half's defining shift was not the model cadence or the feature releases — it was the structural change in how engineering organisations relate to AI coding. The question stopped being whether to roll out and became how. Multi-agent execution moved from demo to default, MCP standardised the connector layer, skills and subagents became the house pattern for senior teams, and governance started arriving inside the IDE rather than bolted on after the fact.
For leaders planning H2, the calibration is straightforward. The primitives are settled — bet on multi-agent, bet on MCP, bet on skill libraries, bet on IDE-native governance. The productivity story is real but conditional — 15–40% gains require the full stack, not just a license. The pricing story is stable — Pro tiers for the team, Max tiers for the quintile doing the bulk of agentic work, enterprise governance where SOC2 and audit logs matter.
The teams that crossed the chasm in H1 spent the half building the platform. The teams that cross in H2 will spend it consolidating what worked, retiring what didn't, and tightening the production discipline that turns "measured productivity gains" into a durable engineering advantage. Either way, the conversation has permanently moved on from whether AI coding belongs in your engineering org. It does — and the only remaining question is how cleanly you run it.