AI Development · Industry Guide · 15 min read · Published May 24, 2026

The definitive May 2026 model tracker — pricing, specs, and availability for every launch.

May 2026's release wave: every launch, priced and ranked.

Ten-plus frontier launches in 22 calendar days: Copilot Studio computer use GA, Grok Build CLI, Composer 2.5, Gemini 3.5 Flash, Gemini Omni Flash, Antigravity 2.0, Anthropic self-hosted sandboxes, and more. Every launch with pricing, benchmarks, and availability — all in one tracker.

Digital Applied Team
Senior strategists · Published May 24, 2026 · Read time: 15 min · Sources: 25

Frontier launches: 10+ (May 13–23, 2026 · all vendor-confirmed)
Cheapest new model: $0.10 per task (Composer 2.5 Std · 100K in / 20K out)
Google I/O headline: 3.2Q tokens/mo (7× YoY · Sundar Pichai, May 19)
Pricing events: 18 (Apr 30 → Jun 1 window · three expire ≤8 days)

May 2026 delivered more frontier-model launches in a single month than any prior period in 2026 — Gemini 3.5 Flash, Composer 2.5, Grok Build 0.1, Gemini Omni Flash, Antigravity 2.0, Managed Agents in the Gemini API, Anthropic self-hosted sandboxes, MCP tunnels, Microsoft Copilot Studio computer-use GA, and GLM-5.1 — all within 22 calendar days. This tracker assembles every confirmed launch with verified pricing, benchmark anchors, and availability data, sourced directly from vendor documentation retrieved on May 24, 2026.

The stakes behind the velocity: three separate pricing promos expire within eight days of publication. GitHub Copilot transitions from request-based to AI-credit billing on June 1 — and per-credit dollar pricing has not been published. Gemini 3.5 Pro is committed for June 2026 per Sundar Pichai's I/O stage remarks. The decisions teams make about model selection this week are pricing decisions, not capability decisions — and the promotional-rate window is closing.

This post covers: the master May 2026 model-launch tracker (Table 1), the 22-day pricing-event chronology (Table 2), the cost-per-task comparison matrix (Table 3), benchmark normalization rebased to Opus 4.7 = 100 (Table 4), the intro-promo cliff pattern as a recurring industry dynamic, the enterprise-deployment wave that completed the Big Four AI-deployment matrix, and the June outlook. For the Gemini 3.5 Flash deep-dive, see our Gemini 3.5 Flash benchmarks and API guide. For the comparable prior-month format, see our April 2026 coding-assistants tracker.

Key takeaways
  1. Ten-plus launches in 22 days: the densest release cluster of 2026. Between May 13 and May 23, 2026, the industry shipped Copilot Studio CUA GA (May 13), Grok Build CLI (May 14), Composer 2.5 (May 18), Gemini 3.5 Flash + Omni Flash + Spark + Antigravity 2.0 + Managed Agents (May 19), Anthropic self-hosted sandboxes + MCP tunnels (May 19), GitHub Copilot Gemini cull (May 20), Google May core update (May 21), EY × Microsoft $1B (May 21), and Antigravity IDE patch (May 23). No prior month in 2026 compressed this many frontier decisions into one window.
  2. Composer 2.5 is the cheapest frontier coding intelligence available, at $0.10/task. At $0.50 input / $2.50 output per Mtok, Composer 2.5 Standard delivers an effective $0.10 per task (100K input / 20K output, uncached): 10× cheaper than Opus 4.7 Standard ($1.00/task) and roughly a third of Gemini 3.5 Flash ($0.33/task). The catch: Composer 2.5 is Cursor-IDE-only at launch with no public API, so the cheapest rate is reachable only through the vendor's own IDE. Composer 2.5 also does not publish SWE-Bench Verified scores, only vendor-run benchmarks such as CursorBench v3.1.
  3. Gemini 3.5 Flash inverted the usual Pro-first Google cadence. Prior I/Os shipped a Pro model first (Gemini 1.5 Pro, Gemini 2.0 Pro) and Flash variants later. At I/O 2026, Flash led the keynote and reached GA the same day (May 19), while Pro was held for June with Pichai's 'rolling it out next month' commitment. This inversion (Flash first, Pro after) is the single biggest strategic signal of I/O 2026 and suggests Google is explicitly positioning smaller, faster, cheaper models as the primary agent-era interface rather than the flagship Pro tier.
  4. Three intro promos are expiring within days of publication. Composer 2.5's first-week 2× promo ends May 25 (tomorrow). Codex Pro's 2× promo ends May 31. Opus 4.7's Copilot promo already ended April 30, when the 7.5× multiplier doubled to 15×. GitHub Copilot transitions to usage-based billing June 1 with no per-credit dollar price published. Introductory rates are now the industry default; teams that don't lock in workflows before expiry should budget for 2–3× rate increases within 90 days of every major launch.
  5. The Big Four enterprise-AI deployment matrix completed on May 21. The May 4–21 wave completed the Big Four audit-firm AI matrix: PwC + Anthropic 30K seats (May 14), KPMG + Anthropic 276K seats (May 19), and EY × Microsoft $1B direct deal (May 21) joined Deloitte's Microsoft-adjacent posture. Three of the four Big Four firms now run on Claude. The enterprise signal is unambiguous: large-organization AI deployment has shifted from pilots to volume contracts at a scale that mirrors software-licensing deal structures from the late 2010s.

01 · May 2026 Master Tracker

Every May 2026 launch: priced, benchmarked, and sourced.

The table below is the canonical May 2026 model-release record. All pricing figures are live-verified from vendor documentation as of 2026-05-24 — specifically ai.google.dev/gemini-api/docs/pricing for Gemini rates and docs.x.ai/docs/models for Grok rates. Benchmark anchors marked ⚠️ are vendor-controlled or vendor-published — cross-variant comparison with SWE-Bench Verified is not appropriate for these rows. Rows for April 2026 launches are included as context anchors. For the full April 2026 launch record, see our March 2026 release-wave tracker and Q2 2026 release-velocity index.

May 13
Microsoft Copilot Studio Computer Use — GA all commercial geos

Underlying model: Claude Sonnet 4.5 (beta). Expanded to all commercial geographies in Microsoft Power Platform. Pricing: bundled in Copilot Studio messages. Context: N/A (computer-use surface, not token API). Source: Microsoft TechCommunity (MustaphaLazrek, May 13, 2026). Note: an earlier internal brief listed May 22; the TechCommunity primary date is May 13.

All Power Platform CCA tier
May 14
xAI Grok Build CLI (`grok-build-0.1`) — Early beta

Pricing: $1.00 input / $2.00 output per Mtok (live-verified docs.x.ai/docs/models, 2026-05-24). Context: 256K tokens. Benchmark: 70.8% SWE-Bench Verified ⚠️ xAI internal harness. Access: SuperGrok Heavy required ($99/mo intro for 6 months, then $300/mo list). Sub-agent parallelism: up to 8 concurrent agents. Source: xAI News + Bloomberg, May 14, 2026.

$99/mo intro → $300/mo list
May 18
Cursor Composer 2.5 Standard — GA (IDE-only)

Pricing: $0.50 input / $2.50 output per Mtok. Context: N/A (Cursor-IDE-only, no public API). Benchmarks: 63.2% CursorBench v3.1 ⚠️ vendor / 79.8% SWE-Bench Multilingual ⚠️ vendor / 69.3% Terminal-Bench 2.0 ⚠️ vendor. Base model: Moonshot Kimi K2.5 (1.04T / 32B active MoE). 25× more synthetic tasks vs Composer 2. First-week 2× promo ends May 25. Source: cursor.com/blog/composer-2-5.

Cursor IDE only — no external API
May 18
Cursor Composer 2.5 Fast — GA (IDE-only)

Pricing: $3.00 input / $15.00 output per Mtok. Same intelligence as Standard; 2× faster. Context: N/A. Note: Fast tier doubled vs Composer 2 Fast ($1.50/$7.50 → $3.00/$15.00) — 100% price increase that most coverage missed. Source: cursor.com/blog/composer-2-5.

2× faster, 6× standard price
May 19
Google Gemini 3.5 Flash (`gemini-3.5-flash`) — GA

Pricing: $1.50 input / $9.00 output per Mtok; cached input $0.15; cache storage $1.00/1M tokens/hr (live-verified ai.google.dev/gemini-api/docs/pricing). Context: 1.05M tokens. Benchmarks: Terminal-Bench 2.1 76.2% ⚠️ Google / MCP Atlas 83.6% ⚠️ Google / GDPval-AA 1656 Elo. Vendor claim: 4× faster tokens/sec vs 'other frontier models.' Replaces gemini-3-flash-preview as default in Gemini app and AI Mode. Source: Google I/O 2026 100 Things + ai.google.dev.

Default in Gemini app + AI Mode
May 19
Google Gemini Omni Flash — GA

First model in 'Omni' family. Generates samples in any output modality (text, image, video, audio) from any input modality. All Omni output watermarked by SynthID. Available to AI Plus / Pro / Ultra subscribers and via YouTube Shorts Remix (free). Pricing: N/A (included in subscription tiers). Source: Sundar Pichai I/O 2026 keynote, May 19, 2026.

AI Plus/Pro/Ultra + YouTube free
May 19
Google Antigravity 2.0 — GA

Standalone desktop app + CLI (agy) + SDK (Python/TS/Go) + Managed Agents API + Enterprise Agent Platform. Default model: Gemini 3.5 Flash. Co-developed using Antigravity itself. Multi-model support includes Claude Sonnet 4.5 and GPT-OSS. Launch was rocky — IDE gutted on May 19, restore patch + quota reset May 23. Source: TechCrunch + PiunikaWeb, May 2026. Note: distinct from Antigravity 1.0 (Nov 18, 2025 VS Code fork).

Free / $20 Pro / $100 / $200 Ultra
May 19
Google Managed Agents in Gemini API — GA

Single API call provisions a remote Linux execution environment for agent reasoning, code execution in sandbox, and web browsing. Default model: Gemini 3.5 Flash ($1.50/$9.00). Context: 1.05M tokens (inherits Flash). Source: Google I/O 2026 100 Things items 60–61, May 19, 2026.

GA via Gemini API + AI Studio
May 19
Anthropic Self-Hosted Sandboxes — Public beta

Enterprise agents run inside customer VPC; both sandbox execution and tool services run within enterprise security boundaries. Day-one managed sandbox providers: Cloudflare, Daytona, Modal, Vercel. Pricing: model-dependent (not sandbox-specific rate). Announced at Code with Claude London (May 19–20, 2026) as counter-programming to I/O. Source: Anthropic, claude.com/blog/claude-managed-agents-updates.

Cloudflare / Daytona / Modal / Vercel
May 19
Anthropic MCP Tunnels — Research preview

Agents reach private MCP servers via an outbound-only gateway — no inbound firewall rules required. Pricing: N/A (platform capability). Source: Anthropic, claude.com/blog/claude-managed-agents-updates, May 19, 2026.

Private-network MCP gateway
May (mid)
Z.ai GLM-5.1 — OSS release

754B parameters with FP8 variant. Open-source release on HuggingFace (zai-org/GLM-5.1). BYOK pricing. Source: HuggingFace model card. Context: N/D (open-weight). Benchmark: N/D at time of publication.

OSS — BYOK on HuggingFace

April 2026 context anchors (included for tracker completeness): Claude Opus 4.7 (April 16, $5/$25 per Mtok, 87.6% SWE-Bench Verified), Grok 4.3 (April 17, $1.25/$2.50 per Mtok, 1M context), GPT-5.5 (April 23, $5/$30 per Mtok with 272K long-context surcharge), DeepSeek V4 Preview (April 24, 1M context, preview status). For the Opus 4.7 deep-dive, see our Claude Opus 4.7 complete guide and the Opus 4.7 1M-context cost strategy guide. For the GPT-5.5 surcharge mechanics, see our GPT-5.5 complete guide.

02 · 22-Day Chronology

What changed in 22 days: the complete pricing-event timeline.

Seven separate pricing events and ten-plus model launches compressed into 22 calendar days. The table below is the complete record from the April 30 Opus 4.7 Copilot promo expiry through the June 1 GitHub Copilot billing transition — every date a pricing or availability signal changed. No competitor publication has assembled the May 2026 pricing-event chronology in one place; this becomes the evergreen reference for retrospectives and Q2 2026 wrap-ups.

Apr 30
Opus 4.7 Copilot promo expires
Anthropic / GitHub

7.5× → 15× premium-request multiplier. Each Opus 4.7 prompt on Copilot Pro now consumes 15 premium requests, so the 300 requests included in Pro cover ~20 Opus prompts. Source: GitHub Changelog, April 16, 2026 (anchors the promo expiry date).

15× multiplier from May 1
May 6
Claude Code 5-hour limits doubled
Anthropic

Doubling of Claude Code's five-hour rate limits for Pro, Max, Team, and seat-based Enterprise plans. Announced alongside Anthropic–SpaceX compute deal. Source: Anthropic news, anthropic.com/news/higher-limits-spacex, May 6, 2026.

2× 5-hr limits for all paid tiers
May 13
Copilot Studio CUA GA globally
Microsoft

Computer-use agents GA in all commercial geographies in Power Platform. Underlying model: Claude Sonnet 4.5 (beta). Expands the enterprise surface for computer-use automation across 185+ countries. Source: Microsoft TechCommunity, MustaphaLazrek, May 13, 2026.

All commercial geos
May 17
GitHub Business/Enterprise base → GPT-5.3-Codex
GitHub

Default model swap for all Copilot Business and Enterprise accounts. GPT-5.3-Codex pricing: $1.75 input / $14.00 output per Mtok; cached input $0.175. Source: GitHub Changelog, May 17, 2026.

Default model swap for Business/Enterprise
May 18
Composer 2.5 launch + first-week 2×
Cursor

Standard: $0.50/$2.50. Fast: $3.00/$15.00. First-week 2× promo active through May 25. Blackstone × Google $5B TPU JV ('N1') also announced May 18. Source: cursor.com/blog/composer-2-5 + Blackstone press release, May 18, 2026.

Promo ends May 25 — tomorrow
May 19
Gemini 3.5 Flash GA + I/O pricing tier changes
Google

Gemini 3.5 Flash: $1.50/$9.00. AI Ultra repriced $250 → $200/mo. New $100/mo tier at 5× Pro limits + 20TB storage. $25 pay-as-you-go credit packs. Antigravity 2.0 GA on same day. Source: Google I/O 2026 100 Things items 66–67 + TechCrunch.

AI Ultra $250 → $200 + new $100 tier
May 20
GitHub purges Gemini from Copilot Chat web
GitHub

All Gemini models + GPT-5.2 Codex + GPT-5.4 nano removed from Copilot Chat on the web only. Scope: web surface. VS Code, JetBrains, and CLI surfaces NOT explicitly confirmed in same notice. Source: GitHub Changelog, May 20, 2026.

Web surface only — not VS Code/JetBrains
Jun 1
GitHub Copilot → AI-credit billing
GitHub

Copilot transitions from request-based to usage-based 'AI credit' billing. Per-credit dollar pricing NOT published as of 2026-05-24. Verify pricing at github.com/settings/billing on or after June 1. Source: GitHub Changelog, May 12, 2026.

Per-credit pricing unpublished — 8 days

For further context on the Copilot billing transition and premium-request multiplier mechanics, see our Copilot vs Cursor vs Windsurf comparison. For the broader agentic-coding platform matrix across 20 tools, see the Q2 2026 20-platform agentic-coding matrix.

03 · Strategic Analysis

The Flash-first inversion: what Google's I/O cadence change signals.

Every prior major Google I/O shipped a Pro model first. Gemini 1.5 Pro appeared ahead of 1.5 Flash. Gemini 2.0 Pro headlined before Flash variants followed. The pattern was so consistent that developer convention assigned “Pro” as the keynote anchor and “Flash” as the follow-on efficiency tier.

I/O 2026 inverted this. Gemini 3.5 Flash reached GA on May 19 — keynote day. Gemini 3.5 Pro was held for June 2026 with Sundar Pichai's stage commitment: “Already being used internally and we look forward to rolling it out next month.” (Sundar Pichai, I/O 2026 keynote, May 19, 2026.)

Two explanations, not mutually exclusive: (1) Compute scarcity: Pro models require significantly more inference compute per token; launching Flash first may reflect that Google can serve Flash at scale while Pro capacity is still being ramped at the infrastructure layer. (2) Agent-era positioning: Gemini 3.5 Flash, at $1.50 per Mtok input and "less than half the cost of other frontier models" per Google's I/O 2026 100 Things recap, is explicitly positioned as the agent-optimized tier. Agentic workloads chain many API calls per task, and the economics only work at Flash pricing. Google may be signaling that Flash is the correct model for the agent era, with Pro as the fallback for tasks that genuinely require maximum capability.

The interpretation matters for practitioners. If Google is correct that Flash-class models are the right default for agent loops, then Composer 2.5 ($0.50/$2.50), Grok Build ($1.00/$2.00), and Gemini 3.5 Flash ($1.50/$9.00) represent the three most cost-relevant pricing points for 2026 agentic deployment, not Opus 4.7 or GPT-5.5. For a head-to-head analysis of Flash against those two flagship models on agentic coding tasks, see our Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7 head-to-head.

Pichai's account of Google running 3.5 Flash inside Antigravity for its own development is the strongest internal signal: the model powers the platform, and the platform was co-developed using itself. That recursive loop is both a product story and an architecture thesis about where the performance ceiling actually sits in this generation of models.

“We've been using 3.5 Flash with the reimagined version of our agent-first development platform, Antigravity. And it's dramatically accelerated how we build.”
Sundar Pichai, CEO, Google, I/O 2026 keynote, May 19, 2026

04 · Cost-Per-Task Matrix

Effective $/task across every May 2026 model: the four levers most comparisons ignore.

Most model-pricing comparisons print headline $/Mtok rates and stop there. Four levers — tokenizer overhead, long-context surcharges, cache hit rate, and Fast-mode pricing — routinely produce 2-10× effective-cost differences that never appear in the headline rate. The matrix below makes each lever explicit. Per-task math: $/Mtok-in × 0.1 (100K input tokens) + $/Mtok-out × 0.02 (20K output tokens), uncached. All rates from vendor docs retrieved 2026-05-24.
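The per-task formula above can be sketched in a few lines of Python. This is an illustrative calculation, not vendor tooling: the `Rate` class, `cost_per_task` function, and the 80% cache-hit figure are hypothetical names and assumptions introduced here, while the rates and the 100K/20K token mix come from this article. Cache storage fees are deliberately ignored.

```python
from dataclasses import dataclass

# Token mix the article uses for its per-task figures.
INPUT_TOKENS = 100_000
OUTPUT_TOKENS = 20_000


@dataclass(frozen=True)
class Rate:
    """List price in USD per million tokens (Mtok)."""
    input_per_mtok: float
    output_per_mtok: float


def cost_per_task(rate, input_tokens=INPUT_TOKENS, output_tokens=OUTPUT_TOKENS,
                  cached_fraction=0.0, cached_input_per_mtok=None):
    """Return USD per task; uncached unless cached_fraction > 0."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    # Fall back to the list input rate when no cached rate is supplied.
    cached_rate = (rate.input_per_mtok if cached_input_per_mtok is None
                   else cached_input_per_mtok)
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * rate.input_per_mtok
             + cached_tokens * cached_rate
             + output_tokens * rate.output_per_mtok)
    return total / 1_000_000


# Rates copied from this article's tracker and matrix.
RATES = {
    "Composer 2.5 Standard": Rate(0.50, 2.50),
    "Composer 2.5 Fast": Rate(3.00, 15.00),
    "Grok Build 0.1": Rate(1.00, 2.00),
    "Gemini 3.5 Flash": Rate(1.50, 9.00),
    "Opus 4.7 Standard": Rate(5.00, 25.00),
}

for name, rate in sorted(RATES.items(), key=lambda kv: cost_per_task(kv[1])):
    print(f"{name:<24} ${cost_per_task(rate):.3f}/task")

# Assumption: 80% of input tokens hit the cache at the $0.15 cached-input rate.
flash_cached = cost_per_task(RATES["Gemini 3.5 Flash"],
                             cached_fraction=0.8, cached_input_per_mtok=0.15)
print(f"Gemini 3.5 Flash, 80% cache hits: ${flash_cached:.3f}/task")
```

Keeping the token mix as parameters rather than constants makes it easy to rerun the comparison with a workload's real input/output ratio, which can reorder the ranking when output dominates.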

Effective cost per task — May 2026 model lineup (ascending)

Source: vendor pricing docs, all retrieved 2026-05-24. Per-task = 100K input + 20K output, uncached.
Composer 2.5 Standard · $0.50/$2.50 per Mtok · Cursor IDE only · no public API · $0.10/task
Grok Build 0.1 · $1.00/$2.00 per Mtok · 256K context · SuperGrok Heavy required · $0.14/task
GPT-5.4-mini · $0.75/$4.50 per Mtok · general availability · $0.165/task
Grok 4.3 · $1.25/$2.50 per Mtok · 1M context · $0.175/task
Haiku 4.5 · $1.00/$5.00 per Mtok · $0.20/task
Gemini 3.5 Flash · $1.50/$9.00 per Mtok · 1.05M context · live-verified 2026-05-24 · $0.33/task
GPT-5.4 · $2.50/$15.00 per Mtok · $0.55/task
Sonnet 4.6 · $3.00/$15.00 per Mtok · 1M context · $0.60/task
Opus 4.7 Standard · $5.00/$25.00 per Mtok · +35% tokenizer overhead risk · 1M context · $1.00/task
GPT-5.5 (≤272K session) · $5.00/$30.00 per Mtok · surcharge applies to FULL session above 272K · $1.10/task
Opus 4.7 Standard (+35% tokenizer) · Anthropic: 'new tokenizer may use up to 35% more tokens for same fixed text' · $1.35/task
Critical: The long-context surcharge applies to the FULL session

OpenAI's own documentation states: “Prompts with >272K input tokens are priced at 2× input and 1.5× output for the full session.” This is not a per-overflow charge: the entire session reprices once input exceeds 272K tokens. Applied to the reference token mix, those multipliers lift GPT-5.5's effective per-task cost from $1.10 to $1.90 ($1.00 input + $0.90 output), roughly 1.7×, and any session that crosses the threshold lands between 1.5× and 2× its unsurcharged price. Three vendors apply multipliers of this kind: Anthropic Fast mode charges 6× the standard rate; OpenAI charges 2× input / 1.5× output on the full session above 272K; Google applies a tiered surcharge on Gemini 3.1 Pro above 200K input. “1M context” in the headline never means flat pricing at 1M. Verify the surcharge structure before architecting long-context agent loops.
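To make the full-session rule concrete, the sketch below prices a GPT-5.5 session under the quoted rule and, for contrast, under a hypothetical per-overflow rule that the quoted text does not describe. The function names are invented for illustration; the rates, threshold, and multipliers are the ones stated in this article.

```python
LONG_CONTEXT_THRESHOLD = 272_000        # input tokens, per the quoted rule
INPUT_RATE, OUTPUT_RATE = 5.00, 30.00   # GPT-5.5 USD per Mtok, from this article


def full_session_cost(input_tokens, output_tokens):
    """Quoted rule: the whole session reprices once input exceeds the threshold."""
    over = input_tokens > LONG_CONTEXT_THRESHOLD
    in_mult, out_mult = (2.0, 1.5) if over else (1.0, 1.0)
    return (input_tokens * INPUT_RATE * in_mult
            + output_tokens * OUTPUT_RATE * out_mult) / 1_000_000


def per_overflow_cost(input_tokens, output_tokens):
    """Hypothetical contrast: only input tokens above the threshold pay 2x."""
    base = min(input_tokens, LONG_CONTEXT_THRESHOLD)
    overflow = max(input_tokens - LONG_CONTEXT_THRESHOLD, 0)
    return (base * INPUT_RATE
            + overflow * INPUT_RATE * 2.0
            + output_tokens * OUTPUT_RATE) / 1_000_000


for tokens in (272_000, 272_001, 300_000):
    full = full_session_cost(tokens, 20_000)
    overflow = per_overflow_cost(tokens, 20_000)
    print(f"{tokens:>7} input tokens: full-session ${full:.2f} "
          f"vs per-overflow ${overflow:.2f}")
```

Under these assumptions a single extra input token moves a 20K-output session from about $1.96 to about $3.62, which is why a hard cap below the threshold is usually a safer guard than trimming context after the fact.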

The Opus 4.7 tokenizer overhead is a separate risk category from the long-context surcharge. Anthropic's pricing documentation discloses: “Opus 4.7 uses a new tokenizer compared to previous models. This new tokenizer may use up to 35% more tokens for the same fixed text.” For workloads migrating from Sonnet 4.6 to Opus 4.7, this overhead means effective per-task cost can reach $1.35 even at the $5/$25 headline rate, a premium of up to 35% that the headline rate does not show. For the full cost-strategy analysis at 1M context scale, see our Opus 4.7 cost strategy guide.
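A small sweep shows how the tokenizer overhead scales the bill. The sketch assumes the overhead inflates input and output token counts equally, which is the assumption implied by the $1.35 figure rather than something Anthropic's quoted text specifies; `opus_task_cost` is a hypothetical helper name.

```python
OPUS_INPUT_RATE, OPUS_OUTPUT_RATE = 5.00, 25.00   # USD per Mtok, from this article


def opus_task_cost(input_tokens=100_000, output_tokens=20_000, token_overhead=0.0):
    """Cost of one task when the tokenizer emits (1 + token_overhead)x tokens.

    Assumption: the same overhead applies to input and output text.
    """
    scale = 1.0 + token_overhead
    return (input_tokens * scale * OPUS_INPUT_RATE
            + output_tokens * scale * OPUS_OUTPUT_RATE) / 1_000_000


# 0.35 is the documented upper bound; the lower values are illustrative.
for overhead in (0.0, 0.10, 0.20, 0.35):
    print(f"+{overhead:.0%} tokens -> ${opus_task_cost(token_overhead=overhead):.2f}/task")
```

Because "up to 35%" is a ceiling, measuring the real token ratio on a sample of production prompts gives a better budget input than the worst case.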

05 · Benchmark Table

SWE-Bench normalized: Opus 4.7 as 100 baseline.

Raw SWE-Bench Verified percentages are not directly comparable across vendors because different evaluation harnesses, task selections, and instruction formats produce materially different scores on the same underlying model. Rebasing to a common anchor (Opus 4.7 = 100) does not remove those differences, but it makes the relative gap readable: an index of 80 means the model's reported score is 80% of Opus 4.7's, which corresponds to completing roughly 80% as many tasks only if both models were run in the same harness.

Critical caveat: Composer 2.5 and Gemini 3.5 Flash do not publish SWE-Bench Verified scores at launch. Both publish only vendor-controlled or vendor-published benchmarks (CursorBench v3.1, SWE-Bench Multilingual, Terminal-Bench 2.0 and 2.1, MCP Atlas). Cross-variant comparison with SWE-Bench Verified for these models is not appropriate, per our SWE-Bench methodology research. For the full leaderboard analysis, see our Q2 2026 SWE-Bench leaderboard analysis.

Opus 4.7
SWE-Bench Verified — baseline (100)
87.6%

Anthropic published. API ID: claude-opus-4-7. Knowledge cutoff Jan 2026. New tokenizer — up to 35% more tokens vs prior models. $5/$25 per Mtok.

Index: 100 — anchor
Codex CLI / GPT-5.5
SWE-Bench Verified — index 101.3
88.7%

OpenAI published for Codex CLI on GPT-5.5. Marginal lead over Opus 4.7 baseline in vendor evaluation. GPT-5.5: $5/$30 per Mtok (272K surcharge applies to full session). Source: OpenAI; note — confirm primary source before citing.

Index: 101.3 — marginal lead
Grok Build 0.1
SWE-Bench Verified — index 80.8
70.8%

xAI internal harness ⚠️. At $1.00/$2.00 per Mtok, Grok Build costs one-fifth of Opus 4.7's input rate (about one-seventh of its per-task cost at 100K in / 20K out) for an index of 80.8. SuperGrok Heavy required ($99 intro → $300 list). Source: xAI News, May 14, 2026.

Index: 80.8 — cost vs capability trade
Kimi K2.5
SWE-Bench Verified — index 87.7
76.8%

Moonshot model card. Kimi K2.5 is the open-weight base model powering Composer 2.5 (1.04T / 32B active MoE, 256K context, Modified MIT). Cursor's proprietary RL post-training on top of K2.5 accounts for ~85% of Composer 2.5's total compute. Index: 87.7.

Index: 87.7 — Composer 2.5 base
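The index values in the cards above follow from dividing each reported score by the anchor score. A minimal sketch, using only the percentages listed in this section (the `rebase` helper is a hypothetical name):

```python
ANCHOR_MODEL = "Opus 4.7"

# Reported SWE-Bench Verified percentages from the cards above.
REPORTED = {
    "Opus 4.7": 87.6,
    "Codex CLI / GPT-5.5": 88.7,
    "Grok Build 0.1": 70.8,   # xAI internal harness, not directly comparable
    "Kimi K2.5": 76.8,
}


def rebase(scores, anchor):
    """Express every score as a percentage of the anchor model's score."""
    anchor_score = scores[anchor]
    return {name: round(score / anchor_score * 100, 1)
            for name, score in scores.items()}


INDEX = rebase(REPORTED, ANCHOR_MODEL)
for name, value in sorted(INDEX.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} index {value}")
```

The rebasing is purely arithmetic: it changes the scale, not the comparability, so scores from different harnesses remain an approximate guide at best.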

The “open-weight base + proprietary RL” architecture that Composer 2.5 demonstrates — Moonshot's open-weight Kimi K2.5 as the foundation, with ~85% of total compute in Cursor's proprietary RL post-training — is the clearest commercial example of a two-layer split in 2026. The controlled experiment is unusually clean: Composer 2 and Composer 2.5 share the same base model (K2.5), separated by 60 days and 25× more synthetic tasks during RL post-training. The result — +11.0 points on CursorBench v3.1, +6.1 on SWE-Bench Multilingual, +7.6 on Terminal-Bench 2.0 — is the strongest single data point in 2026 for “post-training is the new scaling axis.” For the Chinese open-weight context for K2.5 and Kimi K2.6, see our Kimi K2.5 agent swarm guide and the Q2 2026 Chinese AI market share report.

06 · Industry Pattern

The intro-promo cliff pattern: four active discounts, three expiring this week.

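Four intro discounts frame the May 2026 frontier model landscape: three are still active and one expired on April 30. Three pricing deadlines fall within eight days of this publication date: the Composer 2.5 promo (May 25), the Codex Pro promo (May 31), and the GitHub Copilot billing transition (June 1). This is not a coincidence; it is a pattern that has now repeated across every major launch since Opus 4.7 in April.

The four promos (status as of May 24, 2026):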

1. Composer 2.5 first-week 2× rate. Standard: $0.25/$1.25 during promo period (effectively half the list price). Promo ends May 25 — tomorrow from publication. Standard rate of $0.50/$2.50 applies from May 26. Source: Cursor blog — Composer 2.5 launch post, May 18, 2026.

2. Codex Pro 2× promo. “Double your normal Codex usage on the $100/month tier until May 31, 2026.” Seven days from publication. Source: OpenAI Codex pricing page, retrieved 2026-05-24.

3. SuperGrok Heavy $99/mo intro. Six-month introductory rate, then $300/mo list price. Grok Build access requires SuperGrok Heavy, meaning the effective cost of Grok Build roughly triples after the intro period ends. Disambiguation: Standard SuperGrok at $300/year (~$25/mo) is a different tier. Heavy is $300/month post-promo. Both display “$300” in different billing contexts.

4. Opus 4.7 Copilot promo (already expired April 30). The 7.5× premium-request multiplier on GitHub Copilot doubled to 15× on April 30. For context: Copilot Pro includes 300 premium requests. At 15×, that is effectively 20 Opus prompts per billing period, a meaningful effective-cost change for teams that built workflows at the 7.5× rate.
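The multiplier arithmetic in the last item can be checked directly. This sketch assumes a plan's premium requests are consumed at exactly the multiplier per prompt; `effective_prompts` is a hypothetical helper, and the 300-request allowance and both multipliers come from this article.

```python
COPILOT_PRO_PREMIUM_REQUESTS = 300   # included per billing period, per this article


def effective_prompts(included_requests, multiplier):
    """Number of prompts an allowance covers when each prompt costs `multiplier` requests."""
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return included_requests / multiplier


promo_prompts = effective_prompts(COPILOT_PRO_PREMIUM_REQUESTS, 7.5)
list_prompts = effective_prompts(COPILOT_PRO_PREMIUM_REQUESTS, 15)

print(f"Opus 4.7 prompts at 7.5x: {promo_prompts:.0f}")
print(f"Opus 4.7 prompts at 15x:  {list_prompts:.0f}")
print(f"Capacity lost when the promo ended: {1 - list_prompts / promo_prompts:.0%}")
```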

The strategic implication: introductory rates are now the industry default for frontier model launches. Composer 2.5, Codex Pro, SuperGrok Heavy, and Opus 4.7 on Copilot all carried a time-limited discount. Teams that evaluate models during the promo window and budget at promo rates will face a 2–3× cost increase within 90 days of every major launch. The right organizational response is to build workflows that are cost-portable across model tiers rather than optimizing exclusively for the intro price. For the efficient frontier of price vs performance across the full Q2 2026 model landscape, see our Q2 2026 price-vs-performance efficient frontier.
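One way to budget for the cliff is to compute the promo-to-list multiplier and a blended first-year cost up front. The sketch below uses the promo and list prices quoted in this section; the helper names and the 12-month horizon are assumptions for illustration.

```python
# (promo price, list price) pairs quoted in this section.
PROMOS = {
    "Composer 2.5 Standard input ($/Mtok)": (0.25, 0.50),
    "Composer 2.5 Standard output ($/Mtok)": (1.25, 2.50),
    "SuperGrok Heavy ($/mo)": (99.0, 300.0),
}


def cliff_multiplier(promo_price, list_price):
    """How many times more the same usage costs once the promo ends."""
    return list_price / promo_price


def blended_subscription_cost(promo_price, list_price, promo_months, total_months):
    """Total spend over a horizon that starts on promo pricing."""
    promo_span = min(promo_months, total_months)
    return promo_price * promo_span + list_price * (total_months - promo_span)


for name, (promo, list_price) in PROMOS.items():
    print(f"{name}: {cliff_multiplier(promo, list_price):.2f}x after promo")

# SuperGrok Heavy: six intro months, then list price, over an assumed 12 months.
first_year = blended_subscription_cost(99.0, 300.0, promo_months=6, total_months=12)
naive_year = 99.0 * 12
print(f"First-year SuperGrok Heavy: ${first_year:,.0f} (vs ${naive_year:,.0f} at intro rate)")
```

Budgeting on the blended figure rather than the intro rate keeps the post-promo step from arriving as an unplanned overrun.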

07 · Enterprise Deployment

The Big Four matrix completed, and what it means for enterprise AI.

The May 4–21 deployment wave did not just deliver new models — it completed the Big Four audit-firm AI-deployment matrix, the clearest signal that enterprise AI has moved from pilot to volume contract infrastructure.

The sequence: Anthropic + Blackstone $1.5B JV (May 4) → OpenAI Deployment Co $4B (May 11) → SAP + Claude Joule (May 12) → PwC + Anthropic 30K seats (May 14) → KPMG + Anthropic 276K seats (May 19) → EY × Microsoft $1B direct (May 21). With Deloitte's Microsoft-adjacent posture, three of the four Big Four firms are now on Claude at scale. The EY + Microsoft $1B deal closes the matrix — the four largest professional services firms in the world have each made an explicit, public AI deployment commitment at the volume level.

The parallel Anthropic counter-programming at Code with Claude London (May 19–20) — shipping self-hosted sandboxes and MCP tunnels the same day as I/O — was not coincidental timing. The combination of the KPMG deal announcement (276K seats) and the enterprise sandbox features signals Anthropic's Q2 2026 go-to-market: land large-volume enterprise contracts, then deepen the moat with infrastructure that only runs in the customer's VPC.

For practitioners running AI strategy at large organizations: the Big Four adoption pattern is the clearest proxy signal for where enterprise AI spend is moving in H2 2026. Three-of-four on Claude is not coincidence — it reflects a specific set of security, compliance, and contract terms that Anthropic has built for the professional-services buyer. Our AI transformation advisory services work with organizations navigating these same vendor-selection and deployment-security decisions.

Quote: Anthropic Managed Agents — May 19, 2026

“Agents can now operate in a sandbox you control, with both the sandbox where an agent executes tools and the services it reaches running within the established boundaries of your enterprise, under your security and runtime controls.” — Anthropic engineering team, New in Claude Managed Agents, May 19, 2026.

08 · June 2026 Outlook

What ships in June: confirmed commitments and open questions.

June 2026 carries three confirmed commitments and one major strategic unknown that will shape the next quarter of AI pricing decisions.

June (date unspecified): Gemini 3.5 Pro rollout. Sundar Pichai committed on stage at I/O: “Already being used internally and we look forward to rolling it out next month.” No launch date within June has been specified. Pricing is not published. This is likely to be the most anticipated model release of Q2 2026: if Google holds the Flash-first pattern, 3.5 Pro will land as a premium tier above 3.5 Flash, potentially with a tiered surcharge above a context threshold, following the Gemini 3.1 Pro pattern (200K threshold, 2× rate above). Watch the Gemini API pricing page for the first indication of launch.

June 1: GitHub Copilot AI-credit billing begins. As of May 24, 2026, per-credit dollar pricing has not been published. Teams running Copilot at scale need to download their April usage reports (available from May 12, 2026) and model the credit-vs-request translation before June 1. The premium-request multipliers remain the best available proxy: Opus 4.7 at 15× and GPT-5.5 at 7.5× define the outer bounds of per-task cost in AI credits. For the detailed premium-request mechanics, see the GitHub Copilot requests documentation.

May 31: Codex Pro 2× promo expires. Seven days from publication. Teams on the $100/mo Codex Pro plan will see their effective Codex capacity halve after the promo window closes.

Strategic unknown: Cursor + SpaceX. The SpaceX $60B acquisition option (April 21–22, 2026) triggers approximately 30 days after the SpaceX IPO, currently targeted for June 12, 2026, at a $1.75T valuation. If the IPO proceeds and the option exercises, the acquisition closes approximately July 2026. Cursor's training roadmap (Composer 3 on Colossus compute) and pricing strategy would become subject to SpaceX + xAI decisions rather than Cursor's independent roadmap. This is the largest single structural uncertainty in the agentic-coding pricing landscape for H2 2026. For forward-looking analysis, see our 30 agentic AI predictions for H2 2026.

For the full Q2 2026 model-launch arc across March, April, and May, see our Q2 2026 release-velocity index. For the H1 2026 retrospective placing May 2026 in context, see the H1 2026 agentic-coding retrospective.

Conclusion

May 2026 was a pricing month as much as a launch month — and the promo windows are closing.

The most useful lens for May 2026 is not “which model is best” but “which pricing structure is sustainable for my workload.” Gemini 3.5 Flash at $1.50/$9.00, Composer 2.5 Standard at $0.50/$2.50, and Grok Build at $1.00/$2.00 represent a genuinely new pricing tier for frontier-class coding and agent intelligence. Three months ago, input pricing at or below $1.50/Mtok was only available on lower-capability or open-weight models. Today, all three ship with documented benchmark anchors and vendor support channels, though Grok Build remains in early beta and Composer 2.5 is IDE-only.

The Flash-first inversion at I/O 2026 is Google confirming what the pricing data has been showing for six months: smaller, faster, cheaper models are not compromises — they are the correct architecture for agent loops that run thousands of tasks per hour. Gemini 3.5 Pro will land in June as the capability ceiling for tasks that require it, but the default for new agentic deployments in Q3 2026 is Flash-tier economics, not Pro-tier economics.

Three things to act on before the end of May: lock in Composer 2.5 workflows before the first-week promo ends tomorrow, download April Copilot usage reports before June 1 billing changes, and confirm your GPT-5.5 sessions stay under 272K tokens if you want to avoid the full-session 2× input / 1.5× output surcharge. All three are actionable today. All three will cost more if you wait.
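The deadlines behind these three actions can be tracked with simple date arithmetic. A minimal sketch, assuming the publication date of this post as "today"; the dictionary labels and `days_until` helper are illustrative names.

```python
from datetime import date

PUBLISHED = date(2026, 5, 24)

# Deadlines as stated in this tracker.
DEADLINES = {
    "Composer 2.5 first-week promo ends": date(2026, 5, 25),
    "Codex Pro 2x promo ends": date(2026, 5, 31),
    "GitHub Copilot AI-credit billing begins": date(2026, 6, 1),
}


def days_until(deadline, today=PUBLISHED):
    """Whole days from `today` to `deadline` (negative if already past)."""
    return (deadline - today).days


for label, deadline in sorted(DEADLINES.items(), key=lambda kv: kv[1]):
    print(f"{deadline.isoformat()}  {label}: {days_until(deadline)} day(s) left")
```

Passing `today=date.today()` instead of the default turns the same snippet into a running countdown.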

FAQ · May 2026 Model Tracker

The questions developers and product teams ask about the May 2026 model releases.

Gemini 3.5 Flash (GA May 19, 2026) was the single highest-impact launch by downstream reach: it replaced gemini-3-flash-preview as the default model in the Gemini app and AI Mode in Google Search worldwide on launch day, serving 900 million Gemini app MAU and 1 billion AI Mode MAU per Pichai's I/O stage metrics. Composer 2.5 was the most significant pricing event for coding-focused teams — delivering the lowest effective per-task cost ($0.10 uncached at 100K in / 20K out) of any frontier-class model at launch. Microsoft Copilot Studio computer-use GA on May 13 was the most significant enterprise-automation deployment event, expanding computer-use agent availability to all commercial geographies.