Predictions are cheap. Specific predictions with probabilities and a named scoring metric are not — they expose the forecaster to actual grading. This sheet has thirty calls for H2 2026, each with a probability and a metric.
We split them across five clusters: model and capability (1–8), MCP and infrastructure (9–14), enterprise deployment (15–20), agency and labor (21–25), and regulation and policy (26–30). Eleven are high-confidence (≥ 0.70), sixteen are moderate (0.50–0.69), and three are speculative (0.30–0.49). The mix is calibrated against our Q1 scorecard, which hit 18 of 25.
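The band counts can be checked mechanically. Below is a minimal Python sketch. It assumes the cut-offs stated above: high at 0.70 and up, moderate from 0.50, speculative from 0.30. It transcribes the thirty ratings from the cluster sections below, in order.

```python
from collections import Counter

# Ratings for calls 1-30, transcribed in the order the cluster sections list them.
RATINGS = [
    0.78, 0.72, 0.68, 0.66, 0.55, 0.74, 0.62, 0.70,  # model & capability (1-8)
    0.72, 0.68, 0.58, 0.50, 0.62, 0.45,              # MCP & infrastructure (9-14)
    0.65, 0.66, 0.62, 0.71, 0.78, 0.55,              # enterprise deployment (15-20)
    0.62, 0.74, 0.70, 0.55, 0.42,                    # agency & labor (21-25)
    0.74, 0.58, 0.62, 0.70, 0.45,                    # regulation & policy (26-30)
]


def band(p: float) -> str:
    """Map a probability rating to the sheet's confidence band."""
    if p >= 0.70:
        return "high"
    if p >= 0.50:
        return "moderate"
    if p >= 0.30:
        return "speculative"
    return "tail-risk"  # below 0.30: belongs on the separate tail-risk sheet


counts = Counter(band(p) for p in RATINGS)
for name in ("high", "moderate", "speculative"):
    print(name, counts[name])
```

Run as written, this prints 11 high, 16 moderate and 3 speculative ratings.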
Read the methodology section first if you have not used a confidence-rated forecast sheet before — the scoring discipline is what makes the predictions worth reading.
- 01. Open-weights inference will become the default deployment for high-volume agentic workloads (0.78). DeepSeek V4 Preview and Llama 4.x run ~5–10× cheaper than closed-frontier rack rates without losing ground on most non-frontier benchmarks. Closed models get routed to high-stakes calls only.
- 02. MCP server count crosses 18,000 published servers by year-end (0.72). Q2 hit 9,400 with +58% QoQ growth. Averaging about 38% QoQ over Q3 and Q4, well below the Q2 pace, carries the count past 18k; a slowdown to 30% would leave it near 15,900. The growth shape is the headline; the maturation of the curated registries is what makes it usable.
- 03. Pilot-to-production conversion stabilizes at 35–40% for the rest of the year (0.65). Q1→Q2 went 18% → 31% as MCP standardization removed the bespoke-integration tax. The remaining gaps, eval-harness rigour and stakeholder buy-in, are slower to move. Plateau, not regression.
- 04. Mid-sized agency M&A wave hits volume in Q3 at 0.7–1.1× revenue multiples (0.62). Agentic-native digital agencies acquire traditional digital shops to apply agentic delivery to existing portfolios. The market structure rewards portfolio buyers because the productivity multiplier is largest on a legacy book of business.
- 05. EU AI Act enforcement triggers visible procurement disruption in Q3, with 30%+ of mid-market RFPs requiring fresh AI documentation (0.74). The August 2026 enforcement window forces AI inventory and risk-register documentation into procurement. Vendors without ready artefacts get pulled from short-lists.
01 — Forecast Method
How we grade a forecast.
A forecast that cannot be wrong is not a forecast. To qualify for the sheet, every prediction must meet four conditions.
- Probability rating. A number on the 0.0–1.0 scale, calibrated against base rates. We use 0.30, 0.50, 0.65, 0.75, 0.85, 0.95 as anchor stops. Predictions below 0.30 are tail-risk and live on a separate sheet.
- Watch-signal. The specific metric or event we are watching. "MCP gets adopted" is not a signal; "published-server count crosses 18,000 by Dec 31" is.
- Year-end scoring metric. The exact thing we will measure when we publish the scorecard in January 2027. If we cannot define the measurement now, the prediction is too vague.
- Failure mode disclosure. What kind of evidence would force us to mark this a miss rather than a partial. This tracks the difference between "we were directionally right" and "we were factually wrong."
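One way to enforce the four conditions is to make them required fields of the record a forecast lives in. Below is a minimal Python sketch. The `Forecast` class and its field names are hypothetical and not part of any published tooling. The example entry paraphrases the MCP server-count call from this sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forecast:
    """Hypothetical record: a call only qualifies if all four conditions are stated."""
    claim: str
    probability: float    # probability rating on the 0.0-1.0 scale
    watch_signal: str     # the specific metric or event being watched
    year_end_metric: str  # what gets measured for the January scorecard
    failure_mode: str     # evidence that separates a miss from a partial

    def __post_init__(self) -> None:
        if not 0.30 <= self.probability <= 1.0:
            # Below 0.30 is tail-risk and belongs on the separate sheet.
            raise ValueError(f"probability {self.probability} is outside 0.30-1.0")
        for field_name in ("claim", "watch_signal", "year_end_metric", "failure_mode"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must be stated before the call is published")


mcp_call = Forecast(
    claim="MCP published-server count crosses 18,000 by Dec 31",
    probability=0.72,
    watch_signal="Combined Smithery + Glama + PulseMCP + Cloudflare AI MCP registry counts",
    year_end_metric="Published-server count on Dec 31 is at least 18,000",
    failure_mode="Count grows but stays below 18,000: partial; flat or falling: miss",
)
```

Freezing the record means a published call cannot be quietly edited after the fact. That matches the point of grading against what was written in advance.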
"The point of a forecast is to be wrong about specific things, not right about vague things."— Internal forecasting note, March 2026
02 — Model & Capability
Predictions 1 through 8.
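The cost-per-successful-task call in this cluster is framed per task, but its watch signal is a per-token rack rate. The two connect under one simplifying assumption: failed attempts are retried independently until one succeeds, so the expected number of attempts is 1 / success rate. This Python sketch shows the link. The token count, price and success rate in the example are hypothetical.

```python
def cost_per_successful_task(
    tokens_per_attempt: int, usd_per_m_tokens: float, success_rate: float
) -> float:
    """Expected spend per successful task.

    Assumes attempts are independent and retried until one succeeds, so the
    expected number of attempts is 1 / success_rate (geometric distribution).
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_attempt / 1_000_000 * usd_per_m_tokens
    return cost_per_attempt / success_rate


# Hypothetical workload: 200k tokens per attempt at a $5.00 blended rate, 80% success.
before = cost_per_successful_task(200_000, 5.00, 0.80)
# Same workload after a 35% cut in the blended rate (mid-point of the 30-40% call).
after = cost_per_successful_task(200_000, 5.00 * 0.65, 0.80)
print(f"${before:.4f} -> ${after:.4f} per successful task ({1 - after / before:.0%} lower)")
```

At a constant success rate, the per-task cost falls by the same 35% as the rate. Any gain in success rate would cut it further, which is why the per-token metric is a conservative proxy for the per-task one.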
Open-weights becomes default for high-volume workloads (0.78)
Watch: % of mid-market enterprise inference spend on open-weights models. Q2 base: ~22%. Year-end: 50%+ on high-volume agentic workloads, with closed-frontier routed to high-stakes calls only.
0.78 · high
Two more 1M-context frontier model releases by Dec 31 (0.72)
Watch: announced model releases with 1M+ token context windows. Currently Opus 4.7 leads. We expect GPT-5.5 Pro 1M and Gemini 3.0 Ultra 1M to ship by year-end as competitive responses.
0.72 · high
MRCR-1M long-context retrieval crosses 95% on at least one closed frontier model (0.68)
Watch: MRCR-1M long-context retrieval scores. Opus 4.7 sits at 92.9%. Memory-architecture improvements plus RLHF tuning probably get one model to 95%+, which would make 1M context genuinely usable end-to-end.
0.68 · moderate
Cost-per-successful-task falls another 30–40% across blended rack rates (0.66)
Watch: Blended top-5 frontier-provider rack rate per 1M tokens. Q2 fell 42% QoQ. Open-weights pressure + cache pricing wars sustain compression, even if the slope flattens.
0.66 · moderate
First widely deployed agentic memory standard emerges (0.55)
Watch: An open spec for cross-session agent memory, adopted by ≥3 of {Anthropic, OpenAI, Google, Meta, Alibaba}. Currently fragmented; pressure mounts as multi-session agents become the production pattern.
0.55 · moderate
Tool-use success rates flatten across top 3 models (0.74)
Watch: MCP-Atlas tool-use benchmarks. Opus, GPT-5.5, and tuned DeepSeek V4 are already within 4 points of each other. We expect the gap to tighten further, with differentiation moving to other axes.
0.74 · high
Visual / video reasoning becomes the new frontier moat (0.62)
Watch: VideoMME, MMMU-Pro, OmniBench rankings. Text-reasoning gaps are nearly closed; the next visible quality differentiator is multimodal reasoning, especially video. Expect aggressive benchmarking through H2.
0.62 · moderate
Reasoning-effort dial generalizes: 3+ providers ship a reasoning-budget API (0.70)
Watch: Reasoning-budget controls in major provider APIs. Anthropic + OpenAI ship today; expect Google + DeepSeek + Mistral by year-end as the cost-quality dial becomes table stakes.
0.70 · high
03 — MCP & Infrastructure
Predictions 9 through 14.
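The first call in this cluster, 18,000 published servers by Dec 31 from a Q2 base of 9,400, is compound-growth arithmetic over the two remaining quarters (Q3 and Q4). A short Python sketch follows. It assumes 9,400 is the end-of-Q2 count and that growth compounds at a constant quarter-over-quarter rate.

```python
BASE_Q2 = 9_400     # published MCP servers at end of Q2 (assumed end-of-quarter figure)
TARGET = 18_000     # year-end threshold for the call
QUARTERS_LEFT = 2   # Q3 and Q4


def project(base: float, qoq: float, quarters: int) -> float:
    """Count after compounding a constant quarter-over-quarter growth rate."""
    return base * (1 + qoq) ** quarters


def required_qoq(base: float, target: float, quarters: int) -> float:
    """Constant QoQ growth rate needed to go from base to target."""
    return (target / base) ** (1 / quarters) - 1


print(f"required QoQ: {required_qoq(BASE_Q2, TARGET, QUARTERS_LEFT):.1%}")
for qoq in (0.30, 0.40, 0.58):
    print(f"{qoq:.0%} QoQ -> {project(BASE_Q2, qoq, QUARTERS_LEFT):,.0f} servers")
```

The threshold needs roughly 38% QoQ. A constant 30% would land near 15,900, and 40% would clear it at about 18,400. The sketch takes the combined registry count at face value; servers listed on more than one registry would need de-duplicating first.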
MCP published-server count crosses 18,000 by Dec 31 (0.72)
Watch: Smithery + Glama + PulseMCP + Cloudflare AI MCP combined registry counts. Q2 hit 9,400 with +58% QoQ growth. Reaching 18,000 by year-end takes about 38% QoQ growth across Q3 and Q4, well below the Q2 pace; at 30% QoQ the count would land near 15,900.
0.72 · high
First-party MCP servers from 18+ of the top-20 SaaS companies (0.68)
Watch: First-party MCP servers from Atlassian, Salesforce, Stripe, GitHub, Linear, HubSpot, Zendesk, Notion, Slack, ServiceNow, Workday, Snowflake, Datadog, Okta, etc. Q2 had ~12 of 20; year-end 18+.
0.68 · moderate
MCP-over-Workers (Cloudflare runtime) becomes default deployment (0.58)
Watch: % of new MCP servers deployed on managed runtimes (Cloudflare AI MCP, Vercel MCP, AWS MCP) versus self-hosted. Q2 base ~24%; year-end 40%+. Managed-runtime convenience wins on operational cost.
0.58 · moderate
MCP bot-id and authentication standard ratified (0.50)
Watch: An IETF-style standard for MCP server-to-agent authentication, with 3+ implementations. Currently fragmented; enterprise security teams are forcing convergence. It is a coin flip whether one ships in the 2026 cycle.
0.50 · moderate
Agent observability platforms consolidate (0.62)
Watch: M&A activity involving LangSmith, LangFuse, Arize, Braintrust, and Helicone. Procurement pressure favors consolidating onto fewer vendors. Expect 1–2 acquisitions or strategic partnerships by year-end.
0.62 · moderate
First $100M ARR agent-ops vendor (0.45)
Watch: Public revenue disclosures from LangSmith / LangFuse / Arize / Braintrust / Vellum / Restate / Pleck. The category is real but young. $100M ARR is a stretch by Dec 31; more likely Q2 2027.
0.45 · speculative
04 — Enterprise Deployment
Predictions 15 through 20.
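This cluster's multi-vendor routing call, combined with the open-weights call's rule that closed models handle only high-stakes work, amounts to a small routing policy. Below is a minimal Python sketch. The provider names, the quality scores (assumed to come from your own eval harness) and the prices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str                # hypothetical label, not a real endpoint
    open_weights: bool
    quality: float           # assumed 0-1 score from an in-house eval harness
    usd_per_m_tokens: float  # assumed blended rate


def route(providers: list[Provider], high_stakes: bool, quality_floor: float = 0.80) -> Provider:
    """Pick a provider for one call.

    High-stakes calls go to the best-scoring closed-frontier model regardless of
    price; everything else goes to the cheapest provider that clears the floor.
    """
    if high_stakes:
        closed = [p for p in providers if not p.open_weights]
        if not closed:
            raise LookupError("no closed-frontier provider configured")
        return max(closed, key=lambda p: p.quality)
    eligible = [p for p in providers if p.quality >= quality_floor]
    if not eligible:
        raise LookupError("no provider meets the quality floor")
    return min(eligible, key=lambda p: p.usd_per_m_tokens)


POOL = [
    Provider("closed-a", False, 0.93, 15.00),
    Provider("closed-b", False, 0.91, 10.00),
    Provider("open-a", True, 0.86, 1.50),
    Provider("open-b", True, 0.78, 0.90),
]

print(route(POOL, high_stakes=True).name)   # closed-a
print(route(POOL, high_stakes=False).name)  # open-a
```

Because the quality scores live in data rather than code, re-running your evals each month lets the router follow the leader-by-axis rotation without changing the routing logic.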
Pilot-to-production conversion stabilizes at 35–40% (0.65)
Watch: Quarterly survey conversion rate. Q2 base 31%. Easy wins (MCP standardization, cheap inference) banked; remaining gaps (eval rigour, stakeholder buy-in) move slower. Plateau, not regression.
0.65 · moderate
Mid-market agentic-AI deployment crosses 80% (0.66)
Watch: % of mid-market enterprises (250–2500 FTE) reporting at least one production agentic-AI workflow. Q2 base 67%; year-end 80%+. The remaining 20% are concentrated in regulated verticals.
0.66 · moderate
Agentic-content production hits 50% of B2B blog output (0.62)
Watch: % of B2B blog content drafted by agentic workflows (with human review). Spot survey of agency clients suggests Q2 was 28%; AI Overview demand pressure pushes the rate to 50% by year-end.
0.62 · moderate
Enterprise agent-eval becomes a buying axis in RFPs (0.71)
Watch: % of mid-market AI vendor RFPs requiring documented eval methodology. Q1 base ~12%; year-end 35%+. EU AI Act and NIST guidance jointly force the discipline into procurement.
0.71 · high
Multi-vendor model routing becomes default architecture (0.78)
Watch: % of new agentic deployments routing across 2+ frontier providers. Q2 base ~31%; year-end 60%+. The cost of committing to a single vendor is increasingly visible when the leader on each axis rotates monthly.
0.78 · high
Voice-first agent deployments cross 10% of customer-service workloads (0.55)
Watch: Customer-service agentic deployments using voice as the primary interface. Q2 base ~3.5%. Voice models have matured and latency has dropped below 250 ms. Adoption depends on regulator-approved patterns.
0.55 · moderate
05 — Agency & Labor
Predictions 21 through 25.
Agency M&A wave reaches 8–14 deals at 0.7–1.1× revenue (0.62)
Watch: Public M&A announcements involving agentic-native digital agencies acquiring traditional digital shops. Q2 had 3; expect 8–14 cumulative by year-end as PE-backed agencies build portfolios.
0.62 · moderate
Agency entry-level production roles fall 30%+ YoY (0.74)
Watch: SoDA + 4A's hiring panel net new entry-to-mid production role count. Q2 was -24% QoQ. Year-end YoY -30%+ as agentic delivery replaces drafting, formatting, scheduling, and version control work.
0.74 · high
Agentic engineering becomes top-3 hiring priority for agencies with 250+ FTE (0.70)
Watch: Job-posting analytics from major agency networks. Q2 +34% QoQ growth. Year-end: agentic engineering ranks alongside senior strategy and creative direction in priority surveys.
0.70 · high
First Top-50 agency rebrands as 'AI-native' or equivalent (0.55)
Watch: Adweek / Digiday / Marketing Brew agency rebrand announcements. Major holding-company subsidiary or independent Top-50 leads with explicit positioning. The signal is the marketing, not just the tech adoption.
0.55 · moderate
First public agency union response to agentic AI rollout (0.42)
Watch: Formal union actions or organizing campaigns at major agencies tied to AI-driven layoffs. WGA/SAG playbook informs creative-side organizing. Possible by year-end; more likely 2027.
0.42 · speculative
06 — Regulation & Policy
Predictions 26 through 30.
EU AI Act enforcement triggers procurement disruption (0.74)
Watch: % of mid-market RFPs requiring fresh AI documentation. August enforcement window forces AI inventory + risk register into procurement. Year-end: 30%+ of RFPs require artefacts that did not exist in Q2.
0.74 · high
First major FTC action against AI-generated marketing claims ($50M+) (0.58)
Watch: Federal AI-marketing enforcement settlements. Q2 had $24M cumulative across smaller actions. Year-end likely brings a single major action establishing case-law for AI-generated claim accuracy.
0.58 · moderate
AI-content disclosure rules pass in 5+ US states (0.62)
Watch: State legislatures passing AI-content disclosure mandates for political ads, marketing, or product reviews. Q2 had 2 states (CA, NY); year-end 5+ as election-cycle activity intensifies.
0.62 · moderate
China releases competitive open-weights model that outperforms DeepSeek V4 (0.70)
Watch: Open-weights releases from Alibaba (Qwen 4), DeepSeek (V5), Baidu (Ernie 5), or others. Chinese labs are shipping open-weights models faster than US labs. Year-end likely brings 1–2 new leaders.
0.70 · high
Major nation-state AI sovereignty incident affects procurement (0.45)
Watch: Public incidents involving state-level AI access restrictions, model bans, or cross-border data-residency disputes. Possible flashpoints: India, Brazil, EU. Speculative: geopolitics drives the outcome.
0.45 · speculative
07 — How We Will Score
The grading rules.
We publish a scorecard the second week of January 2027. Each prediction is scored hit / miss / partial against the watch-signal and year-end metric specified above. Probability ratings are calibrated against the hit rate — a forecaster who says 0.70 should be right roughly 70% of the time across that band. Mis-calibration shows up as systematic over- or under-prediction by band.
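The grading rules that follow (hit = 1, partial = 0.5, miss = 0) and the band-level calibration check can be sketched in Python. This is a minimal sketch under stated assumptions. `grade` assumes a threshold-style metric with a known baseline and a known predicted direction. The calibration example uses made-up outcomes purely to show the shape of the report.

```python
from collections import defaultdict

SCORE = {"hit": 1.0, "partial": 0.5, "miss": 0.0}


def grade(baseline: float, final: float, threshold: float, rising: bool = True) -> str:
    """Hit if the year-end value meets the threshold; partial if it moved in the
    predicted direction but fell short; miss otherwise. Set rising=False for
    calls that predict a decline."""
    sign = 1 if rising else -1
    b, f, t = sign * baseline, sign * final, sign * threshold
    if f >= t:
        return "hit"
    if f > b:
        return "partial"
    return "miss"


def band(p: float) -> str:
    return "high" if p >= 0.70 else "moderate" if p >= 0.50 else "speculative"


def calibration(results: list[tuple[float, str]]) -> dict[str, tuple[float, float, int]]:
    """Per band: (mean stated probability, realized hit rate, number of calls)."""
    buckets: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for p, outcome in results:
        buckets[band(p)].append((p, SCORE[outcome]))
    return {
        name: (
            sum(p for p, _ in rows) / len(rows),
            sum(s for _, s in rows) / len(rows),
            len(rows),
        )
        for name, rows in buckets.items()
    }


# The partial example from the rules: 18k predicted, 14k reached, up from 9.4k.
print(grade(9_400, 14_000, 18_000))  # partial

# Made-up outcomes, only to show the report shape.
demo = [(0.78, "hit"), (0.72, "partial"), (0.74, "hit"), (0.62, "miss"), (0.55, "hit")]
for name, (stated, realized, n) in calibration(demo).items():
    print(f"{name}: stated {stated:.2f}, realized {realized:.2f} over {n} calls")
```

A band whose realized rate sits consistently below its stated probability is over-predicting; one consistently above is under-predicting.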
- Hit. Watch-signal cleared and the year-end metric meets the prediction threshold. Counted as a hit if the specific metric defined in the prediction is met or exceeded. Wherever possible, we use disclosed third-party data (CB Insights, registry snapshots, survey results) so that we are not scoring our own reports against our own forecast. Counts toward the hit rate.
- Partial. Directionally right, magnitude off: the trend is correct but the threshold is missed. Counted as a half-hit when the direction is correct (e.g., MCP server count grew) but the specific number was missed (e.g., reached 14k instead of 18k). Giving only half credit for direction rewards probabilistic precision over vague directional bets. Counts as 0.5 of a hit.
- Miss. Direction wrong or signal not visible: the year-end metric did not meet the threshold. Counted as a miss if the direction was wrong, the specific signal did not occur, or the year-end measurement is below the threshold. We disclose the actual measurement so readers can pressure-test our scoring. Counts as 0 of a hit.
08 — Conclusion
The shape of the back half.
The cluster matters more than any single prediction.
Read the predictions individually if you want to grade us; read them collectively if you want to plan. The cluster shape says four things at once. The model layer is approaching commodity faster than anyone's pricing model assumed. MCP infrastructure has crossed the noise floor and is now a procurement axis. Enterprise adoption is real but is plateauing because the remaining friction is organizational, not technical. And the regulatory clock is running.
That cluster shape is what should drive H2 budget choices. If you're running a business that depends on agentic AI, plan against multi-vendor routing as the default, MCP adoption as non-negotiable, eval rigour as a procurement gate, and August 2026 as the EU compliance deadline. Each of those is a high-confidence call individually; together they form the operating environment.
We will publish the scorecard in January 2027. Bookmark this page; we will update it with hit / miss / partial against each prediction.