Predictions are cheap. Specific predictions with probabilities and a named scoring metric are not — they expose the forecaster to actual grading. This sheet has thirty calls for H2 2026, each with a probability and a metric.

We split them across five clusters: model and capability (1–8), MCP and infrastructure (9–14), enterprise deployment (15–20), agency and labor (21–25), and regulation and policy (26–30). Eleven are high-confidence (≥ 0.70), sixteen are moderate (0.50 to below 0.70), and three are speculative (0.30 to below 0.50). The mix is calibrated against our Q1 scorecard, which hit 18 of 25.

Read the methodology section first if you have not used a confidence-rated forecast sheet before — the scoring discipline is what makes the predictions worth reading.

Key takeaways

01
Open-weights inference will become the default deployment for high-volume agentic workloads (0.78). DeepSeek V4 Preview and Llama 4.x push inference costs to roughly 5–10× below closed-frontier rack rate without losing on most non-frontier benchmarks. Closed models are routed to high-stakes calls only.
02
MCP server count crosses 18,000 published servers by year-end (0.72). Q2 hit 9,400 with +58% QoQ growth. Growth can slow to roughly 38% QoQ over the two remaining quarters and still carry the count past 18k; at 30% it lands near 15,900. The growth shape is the headline; the maturation of the curated registries is what makes it usable.
03
Pilot-to-production conversion stabilizes at 35–40% for the rest of the year (0.65). Q1→Q2 went 18% → 31% as MCP standardization removed the bespoke-integration tax. The remaining gap is eval-harness rigour and stakeholder buy-in, both slower to move. Plateau, not regression.
04
Mid-sized agency M&A wave hits volume in Q3 at 0.7–1.1× revenue multiples (0.62). Agentic-native digital agencies acquire traditional digital shops to apply agentic delivery to existing portfolios. The market structure rewards portfolio buyers because the productivity multiplier is largest on a legacy book of business.
05
EU AI Act enforcement triggers visible procurement disruption in Q3: 30%+ of mid-market RFPs require fresh AI documentation (0.74). The August 2026 enforcement window forces AI inventory and risk-register documentation into procurement. Vendors without ready artefacts get pulled from shortlists.

01 — Forecast Method
How we grade a forecast.

A forecast that cannot be wrong is not a forecast. Every prediction on this sheet meets four conditions.

Probability rating. A number on the 0.0–1.0 scale, calibrated against base rates. We use 0.30, 0.50, 0.65, 0.75, 0.85, 0.95 as anchor stops. Predictions below 0.30 are tail-risk and live on a separate sheet.
Watch-signal. The specific metric or event we are watching. "MCP gets adopted" is not a signal; "published-server count crosses 18,000 by Dec 31" is.
Year-end scoring metric. The exact thing we will measure when we publish the scorecard in January 2027. If we cannot define the measurement now, the prediction is too vague.
Failure mode disclosure. What kind of evidence would force us to mark this a miss versus a partial. Tracks the difference between "we were directionally right" and "we were factually wrong."
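The four conditions can be sketched as a record with a qualification check. This is a minimal illustration, not the sheet's actual tooling: the `Prediction` class, its field names, and the `problems` helper are hypothetical, and treating each band as lower-edge-inclusive is an assumption drawn from how the sheet labels 0.70 as high and 0.50 as moderate.

```python
from dataclasses import dataclass


def band(probability: float) -> str:
    """Map a probability to the sheet's confidence label.

    Assumption: bands include their lower edge and exclude their upper edge.
    """
    if probability >= 0.70:
        return "high"
    if probability >= 0.50:
        return "moderate"
    if probability >= 0.30:
        return "speculative"
    return "tail-risk"  # below 0.30 lives on the separate sheet


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record holding the four required fields."""
    title: str
    probability: float    # condition 1: probability rating
    watch_signal: str     # condition 2: specific metric or event
    year_end_metric: str  # condition 3: what gets measured in January 2027
    failure_mode: str     # condition 4: what separates a miss from a partial

    def problems(self) -> list[str]:
        """Return the reasons this record would not qualify for the sheet."""
        issues = []
        if not 0.0 <= self.probability <= 1.0:
            issues.append("probability outside 0.0-1.0")
        elif band(self.probability) == "tail-risk":
            issues.append("below 0.30: belongs on the tail-risk sheet")
        for name in ("watch_signal", "year_end_metric", "failure_mode"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        return issues


# Field wording below is illustrative, paraphrased from the sheet.
mcp = Prediction(
    title="MCP published-server count crosses 18,000 by Dec 31",
    probability=0.72,
    watch_signal="combined registry count of published MCP servers",
    year_end_metric="published-server count >= 18,000 on Dec 31",
    failure_mode="grew but stayed under 18,000: partial; count fell: miss",
)
print(band(mcp.probability), mcp.problems())
```

An empty `problems()` list means the record clears all four conditions; a vague entry such as "MCP gets adopted" with no watch-signal would be rejected before it reaches the sheet.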

"The point of a forecast is to be wrong about specific things, not right about vague things."
Internal forecasting note, March 2026

02 — Model & Capability
Predictions 1 through 8.

Open-weights becomes default for high-volume workloads (0.78)

Watch: % of mid-market enterprise inference spend on open-weights models. Q2 base: ~22%. Year-end: 50%+ on high-volume agentic workloads, with closed-frontier routed to high-stakes calls only.

0.78 · high

Two more 1M-context frontier model releases by Dec 31 (0.72)

Watch: announced model releases with 1M+ token context windows. Currently Opus 4.7 leads. We expect GPT-5.5 Pro 1M and Gemini 3.0 Ultra 1M to ship by year-end as competitive responses.

0.72 · high

MRCR-1M long-context retrieval crosses 95% on at least one closed frontier model (0.68)

Watch: Long-context retrieval scores. Opus 4.7 sits at 92.9%. Memory-architecture improvements + RLHF tuning probably get one model to 95%+ — making 1M context genuinely usable end-to-end.

0.68 · moderate

Cost-per-successful-task falls another 30–40% across blended rack rates (0.66)

Watch: Blended top-5 frontier-provider rack rate per 1M tokens. Q2 fell 42% QoQ. Open-weights pressure + cache pricing wars sustain compression, even if the slope flattens.

0.66 · moderate
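The headline here is stated per successful task, while the watch-signal is a per-token rack rate. One way to connect the two is sketched below. The formula is our illustrative assumption rather than a definition from the sheet: it treats every attempt as costing the same and retries as independent, so expected spend per success is cost per attempt divided by success rate. All input numbers are made up.

```python
def cost_per_successful_task(rate_per_million: float,
                             tokens_per_attempt: int,
                             success_rate: float) -> float:
    """Expected spend per successful task.

    Assumes failed attempts are retried independently and that every
    attempt consumes the same number of tokens.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = rate_per_million * tokens_per_attempt / 1_000_000
    return cost_per_attempt / success_rate


# Illustrative inputs only; none of these figures come from the forecast.
before = cost_per_successful_task(10.0, 200_000, 0.80)
after = cost_per_successful_task(6.5, 200_000, 0.80)  # rack rate down 35%
print(f"{before:.3f} -> {after:.3f} ({1 - after / before:.0%} lower)")
```

Under these assumptions a rack-rate cut passes through one-for-one only if token usage and success rate hold steady. A cheaper model that fails more often, or needs more tokens per attempt, gives back part of the saving.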

First widely-deployed agentic memory standard emerges (0.55)

Watch: Adoption of an open spec for cross-session agent memory by ≥3 of {Anthropic, OpenAI, Google, Meta, Alibaba}. The field is currently fragmented; pressure mounts as multi-session agents become the production pattern.

0.55 · moderate

Tool-use success rates flatten across top 3 models (0.74)

Watch: MCP-Atlas tool-use benchmarks. Opus, GPT-5.5, and tuned DeepSeek V4 are already within 4 points of each other. We expect a continued tightening — differentiation moves to other axes.

0.74 · high

Visual / video reasoning becomes the new frontier moat (0.62)

Watch: VideoMME, MMMU-Pro, OmniBench rankings. Text-reasoning gaps are nearly closed; the next visible quality differentiator is multimodal reasoning, especially video. Expect aggressive benchmarking through H2.

0.62 · moderate

Reasoning-effort dial generalizes — 3+ providers ship a reasoning-budget API (0.70)

Watch: Reasoning-budget controls in major provider APIs. Anthropic + OpenAI ship today; expect Google + DeepSeek + Mistral by year-end as the cost-quality dial becomes table stakes.

0.70 · high

03 — MCP & Infrastructure
Predictions 9 through 14.

MCP published-server count crosses 18,000 by Dec 31 (0.72)

Watch: Smithery + Glama + PulseMCP + Cloudflare AI MCP combined registry counts. Q2 hit 9,400 with +58% QoQ growth. Reaching the year-end target from there takes roughly 38% QoQ over the two remaining quarters; a slowdown to 30% would land near 15,900.

0.72 · high
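The growth arithmetic behind this call can be checked with a short compounding sketch. It assumes the 9,400 figure is an end-of-Q2 snapshot, that exactly two quarters remain before Dec 31, and that the quarterly rate is constant; the function names are ours.

```python
def project(count: float, qoq_growth: float, quarters: int) -> float:
    """Compound a starting count forward at a constant quarterly rate."""
    return count * (1.0 + qoq_growth) ** quarters


def required_qoq(start: float, target: float, quarters: int) -> float:
    """Constant quarterly growth rate needed to reach target from start."""
    return (target / start) ** (1.0 / quarters) - 1.0


Q2_COUNT = 9_400    # from the sheet
TARGET = 18_000     # from the sheet
QUARTERS_LEFT = 2   # assumption: Q3 and Q4 remain after the Q2 snapshot

for rate in (0.30, 0.40, 0.58):
    year_end = project(Q2_COUNT, rate, QUARTERS_LEFT)
    print(f"{rate:.0%} QoQ -> {year_end:,.0f}")

needed = required_qoq(Q2_COUNT, TARGET, QUARTERS_LEFT)
print(f"break-even rate: {needed:.1%}")
```

Under those assumptions the 18,000 threshold is cleared at 40% QoQ but not at 30%, with the break-even a little above 38%. Growth can fall well below the Q2 pace and still hit the target, but not all the way to 30%.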

First-party MCP servers from 18+ of the top-20 SaaS companies (0.68)

Watch: First-party MCP servers from Atlassian, Salesforce, Stripe, GitHub, Linear, HubSpot, Zendesk, Notion, Slack, ServiceNow, Workday, Snowflake, Datadog, Okta, etc. Q2 had ~12 of 20; year-end 18+.

0.68 · moderate

MCP-over-Workers (Cloudflare runtime) becomes default deployment (0.58)

Watch: % of new MCP servers deployed on managed runtimes (Cloudflare AI MCP, Vercel MCP, AWS MCP) versus self-hosted. Q2 base ~24%; year-end 40%+. Managed-runtime convenience wins on operational cost.

0.58 · moderate

MCP bot-id and authentication standard ratified (0.50)

Watch: An IETF-style standard for MCP server-to-agent authentication, with 3+ implementations. The space is currently fragmented; enterprise security teams are forcing convergence. It is a coin-flip whether the standard ships within the 2026 cycle.

0.50 · moderate

Agent observability platforms consolidate (0.62)

Watch: M&A activity among LangSmith, LangFuse, Arize, Braintrust, and Helicone. Procurement pressure favors fewer-vendor solutions. Expect 1–2 acquisitions or strategic partnerships by year-end.

0.62 · moderate

First $100M ARR agent-ops vendor (0.45)

Watch: Public revenue disclosures from LangSmith / LangFuse / Arize / Braintrust / Vellum / Restate / Pleck. The category is real but young. $100M ARR is a stretch by Dec 31; more likely Q2 2027.

0.45 · speculative

04 — Enterprise Deployment
Predictions 15 through 20.

Pilot-to-production conversion stabilizes at 35–40% (0.65)

Watch: Quarterly survey conversion rate. Q2 base 31%. Easy wins (MCP standardization, cheap inference) banked; remaining gaps (eval rigour, stakeholder buy-in) move slower. Plateau, not regression.

0.65 · moderate

Mid-market agentic-AI deployment crosses 80% (0.66)

Watch: % of mid-market enterprises (250–2500 FTE) reporting at least one production agentic-AI workflow. Q2 base 67%; year-end 80%+. The remaining 20% are concentrated in regulated verticals.

0.66 · moderate

Agentic-content production hits 50% of B2B blog output (0.62)

Watch: % of B2B blog content drafted by agentic workflows (with human review). Spot survey of agency clients suggests Q2 was 28%; AI Overview demand pressure pushes the rate to 50% by year-end.

0.62 · moderate

Enterprise agent-eval becomes a buying axis in RFPs (0.71)

Watch: % of mid-market AI vendor RFPs requiring documented eval methodology. Q1 base ~12%; year-end 35%+. EU AI Act and NIST guidance jointly force the discipline into procurement.

0.71 · high

Multi-vendor model routing becomes default architecture (0.78)

Watch: % of new agentic deployments routing across 2+ frontier providers. Q2 base ~31%; year-end 60%+. The cost of single-vendor commitment is increasingly visible because the leader on each axis rotates monthly.

0.78 · high
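A multi-vendor routing layer of the kind this prediction describes can be sketched in a few lines. Everything here is hypothetical: the tier names, the provider labels, and the single `high_stakes` flag stand in for whatever policy a real deployment would use, and the sketch only picks a target without calling any provider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    task: str
    high_stakes: bool  # hypothetical flag set by the calling application


# Hypothetical tier names and provider labels; a real deployment would map
# these to concrete model endpoints.
ROUTES = {
    "frontier": ["closed-provider-a", "closed-provider-b"],
    "volume": ["open-weights-self-hosted"],
}


def route(call: Call, unavailable: frozenset[str] = frozenset()) -> str:
    """Pick the first available target in the call's own tier,
    falling back to the other tier if the whole tier is unavailable."""
    primary = "frontier" if call.high_stakes else "volume"
    fallback = "volume" if call.high_stakes else "frontier"
    for tier in (primary, fallback):
        for target in ROUTES[tier]:
            if target not in unavailable:
                return target
    raise RuntimeError("no model target available")


print(route(Call("summarize support ticket", high_stakes=False)))
print(route(Call("approve refund", high_stakes=True)))
print(route(Call("approve refund", high_stakes=True),
            unavailable=frozenset({"closed-provider-a"})))
```

Listing two closed providers in the frontier tier is what makes the deployment count as routing across 2+ frontier providers. Whether a high-stakes call should ever fall back to the volume tier is a policy choice; a stricter variant would raise instead.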

Voice-first agent deployments cross 10% of customer-service workloads (0.55)

Watch: Customer-service agentic deployments using voice as primary interface. Q2 base ~3.5%. Voice models matured; latency dropped below 250ms. Adoption depends on regulator-approved patterns.

0.55 · moderate

Why pilot-to-prod stabilizes rather than accelerates

The Q1→Q2 jump from 18% to 31% banked the easy wins — MCP standardization removed the bespoke integration tax, and inference cost compression made business cases pencil out. The remaining friction is slower: eval-harness rigour, stakeholder change-management, and integration with legacy systems. We expect the curve to plateau at 35–40% through year-end before resuming growth in 2027.

05 — Agency & Labor
Predictions 21 through 25.

Agency M&A wave reaches 8–14 deals at 0.7–1.1× revenue (0.62)

Watch: Public M&A announcements involving agentic-native digital agencies acquiring traditional digital shops. Q2 had 3; expect 8–14 cumulative by year-end as PE-backed agencies build portfolios.

0.62 · moderate

Agency entry-level production roles fall 30%+ YoY (0.74)

Watch: SoDA + 4A's hiring panel net new entry-to-mid production role count. Q2 was -24% QoQ. Year-end YoY -30%+ as agentic delivery replaces drafting, formatting, scheduling, and version control work.

0.74 · high

Agentic engineering becomes top-3 hiring priority for agencies 250+ FTE (0.70)

Watch: Job-posting analytics from major agency networks. Q2 +34% QoQ growth. Year-end: agentic engineering ranks alongside senior strategy and creative direction in priority surveys.

0.70 · high

First Top-50 agency rebrands as 'AI-native' or equivalent (0.55)

Watch: Adweek / Digiday / Marketing Brew agency rebrand announcements. Major holding-company subsidiary or independent Top-50 leads with explicit positioning. The signal is the marketing, not just the tech adoption.

0.55 · moderate

First public agency union response to agentic AI rollout (0.42)

Watch: Formal union actions or organizing campaigns at major agencies tied to AI-driven layoffs. WGA/SAG playbook informs creative-side organizing. Possible by year-end; more likely 2027.

0.42 · speculative

06 — Regulation & Policy
Predictions 26 through 30.

EU AI Act enforcement triggers procurement disruption (0.74)

Watch: % of mid-market RFPs requiring fresh AI documentation. August enforcement window forces AI inventory + risk register into procurement. Year-end: 30%+ of RFPs require artefacts that did not exist in Q2.

0.74 · high

First major FTC action against AI-generated marketing claims ($50M+) (0.58)

Watch: Federal AI-marketing enforcement settlements. Q2 had $24M cumulative across smaller actions. Year-end likely brings a single major action that sets a precedent for the accuracy of AI-generated claims.

0.58 · moderate

AI-content disclosure rules pass in 5+ US states (0.62)

Watch: State legislatures passing AI-content disclosure mandates for political ads, marketing, or product reviews. Q2 had 2 states (CA, NY); year-end 5+ as election-cycle activity intensifies.

0.62 · moderate

China releases competitive open-weights model that outperforms DeepSeek V4 (0.70)

Watch: Open-weights releases from Alibaba (Qwen 4), DeepSeek (V5), Baidu (Ernie 5), or others. The open-weights pace from China outpaces US releases on the same dimension. Year-end likely brings 1–2 new leaders.

0.70 · high

Major nation-state AI sovereignty incident affects procurement (0.45)

Watch: Public incidents involving state-level AI access restrictions, model bans, or cross-border data-residency disputes. Possible flashpoints: India, Brazil, EU. Speculative, because the outcome is driven by geopolitics.

0.45 · speculative

The August enforcement window

Of the thirty predictions, the EU AI Act August enforcement signal is among those we are most confident about (0.74) and the one most enterprises appear least prepared for. Two of three Q2 client engagements found AI-system inventories, risk registers, and fundamental-rights impact assessments either undocumented or incomplete. If you are reading this and your program does not have those artefacts, Q3 is remediation quarter. Plan accordingly.

07 — How We Will Score
The grading rules.

We will publish a scorecard in the second week of January 2027. Each prediction is scored hit / miss / partial against the watch-signal and year-end metric specified above. Probability ratings are calibrated against the hit rate: a forecaster who says 0.70 should be right roughly 70% of the time across that band. Mis-calibration shows up as systematic over- or under-prediction by band.

Hit

Watch-signal cleared

Year-end metric meets prediction threshold

Counted as a hit if the specific metric defined in the prediction is met or exceeded. We use disclosed third-party data (CB Insights, registry snapshots, survey results) wherever possible to avoid scoring our own reports against our own forecast.

Counts toward hit rate

Partial

Directionally right, magnitude off

Trend correct; threshold missed

Counted as a half-hit when the direction is correct (e.g., MCP server count grew) but the specific number was missed (e.g., reached 14k instead of 18k). Partial credit forces probabilistic precision, not vague directional bets.

0.5 of a hit

Miss

Direction wrong or signal not visible

Year-end metric did not meet threshold

Counted as a miss if the direction was wrong, the specific signal did not occur, or the year-end measurement is below the threshold. We disclose the actual measurement so readers can pressure-test our scoring.

0 of a hit
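The grading rules above can be sketched as a per-band calibration check. This is an illustration under stated assumptions: hit, partial, and miss carry the 1 / 0.5 / 0 credits described above, bands include their lower edge, and the sample outcomes are invented, since real outcomes do not exist until the January 2027 scorecard. The function names are ours.

```python
from collections import defaultdict

CREDIT = {"hit": 1.0, "partial": 0.5, "miss": 0.0}


def band(probability: float) -> str:
    """Assumption: bands include their lower edge."""
    if probability >= 0.70:
        return "high"
    if probability >= 0.50:
        return "moderate"
    return "speculative"


def calibration_by_band(results):
    """results: iterable of (probability, outcome) pairs.

    Returns {band: (mean stated probability, realized hit rate, count)}.
    """
    grouped = defaultdict(list)
    for probability, outcome in results:
        grouped[band(probability)].append((probability, CREDIT[outcome]))
    report = {}
    for name, rows in grouped.items():
        n = len(rows)
        mean_p = sum(p for p, _ in rows) / n
        hit_rate = sum(credit for _, credit in rows) / n
        report[name] = (mean_p, hit_rate, n)
    return report


# Made-up outcomes for illustration only.
sample = [(0.78, "hit"), (0.72, "partial"), (0.74, "hit"), (0.70, "miss"),
          (0.62, "hit"), (0.55, "miss")]
for name, (mean_p, hit_rate, n) in calibration_by_band(sample).items():
    gap = hit_rate - mean_p
    print(f"{name}: stated {mean_p:.2f}, realized {hit_rate:.2f}, "
          f"gap {gap:+.2f} (n={n})")
```

A negative gap in a band means the ratings in that band were over-confident; a positive gap means they were under-confident. With only thirty calls, each band's sample is small, so a single scorecard gives a rough signal rather than a precise calibration estimate.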

08 — Conclusion
The shape of the back half.

The H2 2026 forecast · 30 calls · scored January 2027

The cluster matters more than any single prediction.

Read the predictions individually if you want to grade us; read them collectively if you want to plan. The cluster shape says four things at once. The model layer is approaching commodity faster than anyone's pricing model assumed. MCP infrastructure has crossed the noise floor and is now a procurement axis. Enterprise adoption is real but is plateauing because the remaining friction is organizational, not technical. And the regulatory clock is running.

That cluster shape is what should drive H2 budget choices. If you're running a business that depends on agentic AI, plan against multi-vendor routing as the default, MCP adoption as non-negotiable, eval rigour as a procurement gate, and August 2026 as the EU compliance deadline. Each of those is a high-confidence call individually; together they form the operating environment.

We will publish the scorecard in January 2027. Bookmark this page; we will update it with a hit / miss / partial result against each prediction.

30 Agentic AI Predictions for H2 2026
