Microsoft's MAI models arrived at Build 2026 on June 2, 2026, and the announcement is bigger than any one model: Microsoft launched seven in-house MAI (Microsoft AI) models in a single keynote, formally ending its long-running posture as a company that mostly resold OpenAI's frontier work to Azure customers.

The flagship is MAI-Thinking-1, described by Microsoft as Microsoft AI's first reasoning model. Around it sits a family that spans coding, image generation, transcription, and voice — and a deliberate distribution strategy that puts these models on OpenRouter, Fireworks, and Baseten as well as Azure. The subtext, delivered by Microsoft AI CEO Mustafa Suleyman, is that intelligence is now a function of compute, and Microsoft intends to own the stack that produces it.

This analysis covers what actually shipped, the architecture and Microsoft-reported benchmarks behind MAI-Thinking-1, why the "zero-distillation" training claim is a procurement argument rather than a marketing line, and the new three-way model-selection framework Azure builders inherit after the April 2026 OpenAI partnership renegotiation. Every benchmark figure below is Microsoft-reported unless stated otherwise.

Key takeaways

01
Seven first-party MAI models, one keynote.Microsoft unveiled seven in-house MAI models at Build 2026 — spanning reasoning, coding, image, transcription, and voice — ending its posture as a pure OpenAI reseller and making Azure a genuine multi-source platform.
02
MAI-Thinking-1 is the reasoning flagship.Microsoft's first reasoning model is a sparse Mixture-of-Experts design — roughly 35B active of ~1T total parameters, 256K context. Per Microsoft benchmarks it scores 97.0% on AIME 2025 and is competitive with Claude Opus 4.6 on SWE-Bench Pro.
03
Zero distillation is the enterprise hook.Microsoft says MAI-Thinking-1 was trained from scratch with no third-party model outputs and on commercially licensed data. For regulated buyers facing legal scrutiny over training-data lineage, clean provenance is becoming the price of admission.
04
MAI-Code-1-Flash brings 5B-scale coding to Copilot.A 5-billion-parameter coding model trained on GitHub Copilot's production harnesses began rolling out across Copilot plans on June 2. Per Microsoft benchmarks it leads Claude Haiku 4.5 on SWE-Bench Pro while using up to 60% fewer tokens.
05
Azure builders now choose across three tiers.After the April 2026 partnership change removed OpenAI exclusivity, Azure developers pick between first-party MAI (clean IP, Azure-first), OpenAI on Azure (frontier), or open-weight models from Foundry's 11,000+ catalog — each with distinct cost and lock-in tradeoffs.

01 — What ShippedA seven-model family, launched in one keynote.

Microsoft used the Build 2026 keynote in San Francisco to present a full in-house model family rather than a single headline release. Mustafa Suleyman, Executive VP and CEO of Microsoft AI, framed the MAI lineup as the output of an internal training stack Microsoft calls its "Hill-Climbing Machine" — a repeatable pipeline spanning accelerator co-design through reinforcement learning, built so that Microsoft no longer depends on a single external lab for its core model capability.

The two announcements that matter most for builders are the reasoning flagship and the coding model: MAI-Thinking-1, Microsoft AI's first reasoning model, available in private preview on Microsoft Foundry as of June 2; and MAI-Code-1-Flash, a compact coding model that began rolling out across GitHub Copilot plans the same day. The family also includes MAI-Image-2.5 (plus a Flash variant), MAI-Transcribe-1.5, and MAI-Voice-2 — collectively the seven models Microsoft cited, with the Flash variants counted within that number.

Reasoning flagship

MAI-Thinking-1

~35B active · ~1T total · 256K context

Microsoft AI's first reasoning model, a sparse Mixture-of-Experts design. Per Microsoft benchmarks: 97.0% on AIME 2025, competitive with Claude Opus 4.6 on SWE-Bench Pro. Private preview on Microsoft Foundry; pricing TBA via Azure AI Foundry.

microsoft.ai/models/mai-thinking-1

Coding model

MAI-Code-1-Flash

5B params · adaptive thinking

A 5-billion-parameter coding model trained on GitHub Copilot's production tool harnesses, not academic benchmarks. Rolling out across Copilot plans from June 2 via the VS Code model picker — no extra setup.

microsoft.ai/news/mai-code-1-flash

Launch snapshot

The MAI family debuted at Build 2026 on June 2, 2026. MAI-Thinking-1 is in private preview on Microsoft Foundry with a public preview on MAI Playground forthcoming and Chat Completions API compatibility; pricing is TBA via Azure AI Foundry (not published as of the announcement). All seven models are also distributed on OpenRouter, Fireworks AI, and Baseten in addition to Azure — and accessible through the GitHub Models sandbox with a free GitHub account, broadening access beyond paying enterprise customers.

The multi-provider distribution is the tell. A first-party model that Microsoft also publishes on third-party inference platforms is not a captive Azure feature — it is a product positioned to win on merit in the open market. Microsoft also used Build to announce supporting infrastructure: Azure Cobalt 200 VMs (Microsoft claims a 50% performance improvement over the prior generation, optimized for agentic workloads), the Azure HorizonDB enterprise PostgreSQL service, and a joint frontier-model partnership with Mayo Clinic aimed at hospital deployment — a signal of where MAI customization is headed for regulated sectors.

02 — Inside The FlagshipMAI-Thinking-1: medium-weight scale, frontier-adjacent claims.

MAI-Thinking-1 is architected as a sparse Mixture-of-Experts model: Microsoft reports roughly 35 billion active parameters out of approximately 1 trillion total, with a 256,000-token context window — about 600 pages of text. That puts it in a deliberately medium-weight class for a reasoning model, and the pitch is that it punches well above that weight. Its 256K window sits mid-range against the long-context field we tracked in our AI context window comparison — generous for enterprise document work, short of the 1M+ tier some rivals advertise.

On benchmarks — all Microsoft-reported — MAI-Thinking-1 scores 97.0% on AIME 2025 and 94.5% on AIME 2026, strong mathematical and scientific reasoning for its size class. On SWE-Bench Pro, Microsoft reports a score that is competitive with Claude Opus 4.6; independent tech coverage corroborated the claim that the two are matched on that benchmark, while the specific score remains vendor-reported. The chart below shows MAI-Thinking-1's reported scores; read them as Microsoft's numbers pending independent evaluation.

MAI-Thinking-1 · Microsoft-reported benchmarks

Source: Microsoft AI (vendor-reported)

AIME 2025Competition math · per Microsoft benchmarks

97.0%

AIME 2026Competition math · per Microsoft benchmarks

94.5%

SWE-Bench ProSoftware engineering · competitive with Opus 4.6

vendor-stated

Microsoft also leans on a preference study to argue real-world quality beyond benchmark scores: in blind, side-by-side evaluations run by independent human raters on Surge across 1,276 single- and multi-turn tasks, raters preferred MAI-Thinking-1 over Claude Sonnet 4.6 on dimensions including task comprehension, instruction following, detail, clarity, and user value. The honest caveat: this evaluation was commissioned by Microsoft, so treat it as a vendor-run signal of usefulness — promising, but not a substitute for running the model against your own prompts. For how to do that kind of comparison rigorously, see our roundup of independent evaluations of frontier reasoning models.

"Intelligence is now a function of compute."— Mustafa Suleyman, CEO of Microsoft AI · Build 2026 keynote

03 — ProvenanceThe real moat may be clean IP, not raw scores.

The most strategically interesting MAI-Thinking-1 claim is not a benchmark — it's the training-data story. Microsoft says the model was trained from scratch with zero distillation from third-party models: no GPT-series outputs, no Anthropic model outputs, no borrowed reasoning traces, and pre-training that excluded AI-generated content, using enterprise-grade, commercially licensed data. Microsoft brands this "Capabilities Learned, Not Inherited."

Here is why that matters commercially, and why most coverage underplays it. Through early 2026, enterprise legal teams in regulated industries began scrutinizing the training-data lineage of popular open models — the distillation controversy around DeepSeek R1 being the most visible example — because a model whose outputs may have been derived from another vendor's model carries unresolved IP and indemnification questions. A model with a documented clean-lineage claim sidesteps that review. For financial services, healthcare, and defense buyers, provenance stops being a footnote and becomes a procurement gate.

Why provenance is a procurement argument

For regulated buyers, a documented zero-distillation training claim plus commercially licensed data is an IP-indemnification story that models with murkier lineage cannot easily match. That is the concrete enterprise wedge MAI-Thinking-1 is built around — and the part of the announcement most likely to move enterprise architecture decisions, regardless of where the benchmark numbers land once independently verified.

Our read: the zero-distillation claim is the most durable part of the MAI pitch. Benchmark leadership is transient — every lab leapfrogs the others quarterly — but a clean, auditable training lineage is a structural property that a buyer's legal team can sign off on once and rely on. Expect Microsoft to make provenance, not raw capability, the centerpiece of its enterprise MAI sales motion. As always, "trained on commercially licensed data" is a claim worth getting in contract language, not just in a keynote slide.

04 — Coding EconomicsMAI-Code-1-Flash: 5B that lands in your Copilot picker.

For most developers, the MAI release that changes day-to-day work isn't the reasoning flagship — it's the coding model. MAI-Code-1-Flash is a 5-billion-parameter model trained directly on GitHub Copilot's production tool harnesses rather than academic benchmarks, which is to say it was optimized for the actual edit-run-fix loop developers use rather than for leaderboard problems. It features what Microsoft calls adaptive thinking — allocating its reasoning budget by task complexity. (Note this is Microsoft's own dynamic-budget mechanism, distinct in implementation from extended-thinking features in other model families.)

The economics are the headline. Per Microsoft benchmarks, MAI-Code-1-Flash scores 51.2% on SWE-Bench Pro versus 35.2% for Claude Haiku 4.5 — a roughly 16-point lead in its size class — while using up to 60% fewer tokens than comparable models on complex coding tasks. It also reportedly scored 85.8% on Microsoft's own adversarial coding benchmark of 186 questions across 34 categories. That now sits inside the GitHub Copilot model picker, which reshapes the selection calculus we covered in our guide to GitHub Copilot model selection.

SWE-Bench Pro

MAI-Code-1-Flash (per Microsoft)

51.2%

Microsoft-reported score versus 35.2% for Claude Haiku 4.5 — a ~16-point lead in the small-model coding class. Treat as vendor-reported until independent evaluation lands.

5B params

Token efficiency

Fewer tokens on complex tasks

60%

Microsoft reports up to 60% fewer tokens than comparable models on complex coding tasks. For agentic coding loops where output tokens dominate the bill, per-task token spend often matters more than headline rates.

Cost lever

Distribution

GitHub Copilot plans

All

Rolling out from June 2 across Copilot Free, Pro, Pro+, and Max via the VS Code model picker — no additional setup. Also reachable through the GitHub Models sandbox on a free account.

From June 2

One number to handle carefully: a secondary source listed MAI-Code-1-Flash at roughly $0.75 per million input tokens and $4.50 per million output tokens, but described that pricing as still being finalized — so we treat it as unconfirmed and would not budget against it. Confirm the rate on the official Azure pricing page before committing. When it lands, slot it into a like-for-like comparison using our per-token pricing index for AI agent deployments — and weigh it against the token-efficiency gain, since fewer tokens at a slightly higher rate can still win on total cost.

05 — The Full FamilyBeyond reasoning and code: image, voice, and transcription.

The remaining MAI models target the multimodal surfaces Microsoft already controls — Office, Dynamics, and the developer tools. They matter less to a model-selection decision than MAI-Thinking-1 or MAI-Code-1-Flash, but they show how broadly Microsoft intends to embed first-party AI across its products. The table below is our consolidated view of the family; it pulls together pricing, key Microsoft-reported metrics, and availability that other coverage treats model-by-model.

MAI model

MAI-Thinking-1

Key metric (per Microsoft)

97.0% AIME 2025; competitive with Opus 4.6 on SWE-Bench Pro

Pricing & availability

~35B active / ~1T total, 256K context. Private preview on Microsoft Foundry; public preview on MAI Playground forthcoming. Pricing TBA via Azure AI Foundry.

MAI model

MAI-Code-1-Flash

Key metric (per Microsoft)

51.2% SWE-Bench Pro; up to 60% fewer tokens

Pricing & availability

5B coding model trained on Copilot's production harnesses. Rolling out across all GitHub Copilot plans from June 2. Specific per-token pricing unconfirmed at launch.

MAI model

MAI-Image-2.5

Key metric (per Microsoft)

#3 text-to-image, #2 image-editing on Arena (vendor-stated)

Pricing & availability

Live in PowerPoint, rolling out to OneDrive, plus Foundry & MAI Playground. Microsoft-listed: $5 / $8 / $47 per 1M text-input / image-input / image-output tokens.

MAI model

MAI-Image-2.5-Flash

Key metric (per Microsoft)

Lower-cost variant of MAI-Image-2.5

Pricing & availability

Microsoft-listed: $1.75 per 1M text and image input; $19.50 per 1M image output. Same surfaces as the full model.

MAI model

MAI-Transcribe-1.5

Key metric (per Microsoft)

43 languages; entity biasing for domain keywords

Pricing & availability

Microsoft's benchmarks claim leading word error rate on FLEURS and up to 5x faster than Gemini 3.1 on long audio. 1.5 pricing not yet announced (prior version was $0.36/hour).

MAI model

MAI-Voice-2

Key metric (per Microsoft)

Zero-shot voice prompting; 15+ languages

Pricing & availability

Custom voice from 5–60s of reference audio, no retraining. System-level consent enforcement; Microsoft states no unlicensed cloning. Integrating into VS Code and Dynamics 365 Contact Center.

MAI model	Key metric (per Microsoft)	Pricing & availability
`MAI-Thinking-1`	97.0% AIME 2025; competitive with Opus 4.6 on SWE-Bench Pro	~35B active / ~1T total, 256K context. Private preview on Microsoft Foundry; public preview on MAI Playground forthcoming. Pricing TBA via Azure AI Foundry.
`MAI-Code-1-Flash`	51.2% SWE-Bench Pro; up to 60% fewer tokens	5B coding model trained on Copilot's production harnesses. Rolling out across all GitHub Copilot plans from June 2. Specific per-token pricing unconfirmed at launch.
`MAI-Image-2.5`	#3 text-to-image, #2 image-editing on Arena (vendor-stated)	Live in PowerPoint, rolling out to OneDrive, plus Foundry & MAI Playground. Microsoft-listed: $5 / $8 / $47 per 1M text-input / image-input / image-output tokens.
`MAI-Image-2.5-Flash`	Lower-cost variant of MAI-Image-2.5	Microsoft-listed: $1.75 per 1M text and image input; $19.50 per 1M image output. Same surfaces as the full model.
`MAI-Transcribe-1.5`	43 languages; entity biasing for domain keywords	Microsoft's benchmarks claim leading word error rate on FLEURS and up to 5x faster than Gemini 3.1 on long audio. 1.5 pricing not yet announced (prior version was $0.36/hour).
`MAI-Voice-2`	Zero-shot voice prompting; 15+ languages	Custom voice from 5–60s of reference audio, no retraining. System-level consent enforcement; Microsoft states no unlicensed cloning. Integrating into VS Code and Dynamics 365 Contact Center.

A few items deserve a caveat. MAI-Image-2.5's Arena leaderboard rankings (Microsoft cites #3 text-to-image and #2 image-editing, surpassing Google's Nano Banana 2 on editing) are presented from Microsoft's announcement, not independently verified here. MAI-Transcribe-1.5's speed and accuracy claims are likewise vendor benchmarks. The genuinely novel capabilities — entity biasing for domain-specific transcription, and system-enforced consent on voice cloning — are the parts worth piloting, because they address real enterprise blockers (jargon accuracy, voice-likeness rights) rather than chasing a leaderboard rank.

06 — The OpenAI ContextWhat the April renegotiation actually changed.

MAI doesn't exist in a vacuum — it follows a material change to the Microsoft-OpenAI relationship. On April 27, 2026, the two companies renegotiated their partnership and removed exclusivity: Microsoft's license to OpenAI IP became non-exclusive and runs through 2032, OpenAI can now serve its products across any cloud provider (including AWS and Google Cloud), OpenAI pays Microsoft a capped 20% revenue share through 2030, and Microsoft stopped paying OpenAI a revenue share. Azure remains OpenAI's primary cloud, and OpenAI products still ship first on Azure unless Microsoft opts out.

The common framing — "Microsoft versus OpenAI" — misreads this. Microsoft has invested more than $13 billion in OpenAI since 2019, and as of October 2025 its stake in OpenAI's for-profit arm was valued at roughly $135 billion (about a 27% diluted stake). You do not build first-party models to spite a company you own a quarter of. The accurate framing is portfolio optionality: Microsoft is ensuring that OpenAI's pricing power cannot become a ceiling on Azure's gross margins, and that cost-sensitive workloads where a frontier model is overkill can run on cheaper first-party MAI instead.

"Today, we are announcing an amended agreement to simplify our partnership and the way we work together, grounded in flexibility, certainty, and a focus on delivering the benefits of AI broadly."— OpenAI official statement · April 27, 2026

For Azure builders, the practical consequence is that the once-implicit "default to OpenAI" assumption is gone. With OpenAI now multi-cloud and MAI now first-party, the platform's value proposition shifts from "the place to get GPT" to "the place to choose among the strongest options for each workload" — which is exactly the decision the next section maps.

07 — Decision FrameworkThe new three-tier Azure model choice.

Put the pieces together and Azure builders inherit a genuinely new three-way selection framework after Build 2026. It is a specific instance of the broader build-vs-buy decision for enterprise AI — except now all three options are "buy," differentiated by IP provenance, cost, and lock-in rather than by whether you train your own model. The choice matrix below maps the tradeoffs.

Tier 1 — First-party

MAI on Azure

Best for cost-sensitive, high-volume, or compliance-bound workloads where a documented zero-distillation lineage and commercially licensed training data matter. Azure-first availability, enterprise SLAs, and the cleanest IP story — at the cost of tighter Microsoft-ecosystem alignment.

Clean IP · cost lever

Tier 2 — Partner frontier

OpenAI on Azure

Best for the hardest reasoning and the most capable GPT-series work. Still first-to-ship on Azure, but no longer Azure-exclusive after the April 2026 renegotiation — so there's less of an Azure premium for choosing it, and more leverage to negotiate.

Frontier capability

Tier 3 — Open-weight

Foundry's 11,000+ catalog

Best for fine-tuning, sovereignty-bound deployment, and price-sensitive bulk inference. Azure AI Foundry hosts 11,000+ models from Microsoft, OpenAI, Anthropic, DeepSeek, xAI, Meta, Mistral, Cohere, NVIDIA, and more — pick by license, capability, and self-hosting needs.

Flexibility · sovereignty

The most underappreciated point: these tiers are not mutually exclusive. A well-architected production system routes by task class — MAI-Code-1-Flash for high-volume routine coding inside Copilot, a frontier model for the genuinely hard reasoning, and an open-weight model from Foundry where fine-tuning or data residency is the binding constraint. The April renegotiation is what makes this routing economically sensible: with no exclusivity premium pinning you to one provider, optimizing per-workload is now the default-correct strategy rather than a niche one.

08 — ImplicationsWhat it means for agencies and engineering teams.

For teams already building on Azure, MAI is less a disruption than a new set of defaults to re-evaluate. The practical moves are concrete: pilot MAI-Code-1-Flash in Copilot on real repositories and measure token spend against your current model, not the leaderboard; for any regulated workload, ask vendors for the training-data lineage in writing and treat MAI's zero-distillation claim as a comparison baseline; and re-run your model-routing logic now that the exclusivity premium is gone.

Looking forward, the more important shift is structural. When the largest cloud provider ships a first-party model family with a clean-IP narrative and prices it as a cost lever beneath frontier models, it normalizes a two-speed pattern: expensive frontier models for the small share of genuinely hard tasks, and cheaper, governed, in-house models for the high-volume remainder. Over the next few quarters, expect provenance and per-task cost — not raw benchmark leadership — to become the primary axes on which enterprise model decisions are made. That is the trend MAI accelerates, and it favors teams disciplined enough to route deliberately rather than defaulting to whatever model is loudest this quarter.

If you're weighing where MAI, OpenAI, and open-weight models fit across your own pipelines, our AI digital transformation engagements start with exactly this kind of comparative, cost-aware model evaluation — benchmarked on your prompts and your governance constraints, not a vendor slide.

09 — ConclusionMicrosoft stops reselling and starts owning.

The shape of Microsoft's AI strategy, June 2026

The MAI launch is less about one benchmark than about who owns the stack.

Build 2026 marked the moment Microsoft stopped being primarily an OpenAI distribution channel and became a first-party model lab in its own right. Seven MAI models, a reasoning flagship with frontier-adjacent Microsoft-reported numbers, and a coding model already inside every Copilot picker — all shipped in a single keynote and distributed well beyond Azure.

The honest framing is that the benchmark claims are vendor-reported and await independent evaluation, the headline pricing for the flagship is still TBA, and the preference study was commissioned by Microsoft. None of that diminishes the strategic significance. The durable advantage here is the zero-distillation, commercially-licensed-data provenance story — a structural property a buyer's legal team can sign off on, in a market where training-data lineage is becoming a procurement gate.

For builders, the takeaway is practical: the era of defaulting to one model on Azure is over. After Build 2026, the right move is to choose deliberately across first-party MAI, partner-frontier OpenAI, and open-weight Foundry models — routing by provenance, cost, and capability per workload. The teams that win the next phase won't be the ones chasing the highest benchmark; they'll be the ones who match the right model to the right job, and can prove where that model came from.

Microsoft's MAI Models at Build 2026: First-Party AI Bet

01 — What ShippedA seven-model family, launched in one keynote.

MAI-Thinking-1

MAI-Code-1-Flash

02 — Inside The FlagshipMAI-Thinking-1: medium-weight scale, frontier-adjacent claims.

MAI-Thinking-1 · Microsoft-reported benchmarks

03 — ProvenanceThe real moat may be clean IP, not raw scores.

04 — Coding EconomicsMAI-Code-1-Flash: 5B that lands in your Copilot picker.

MAI-Code-1-Flash (per Microsoft)

Fewer tokens on complex tasks

GitHub Copilot plans

05 — The Full FamilyBeyond reasoning and code: image, voice, and transcription.

06 — The OpenAI ContextWhat the April renegotiation actually changed.

07 — Decision FrameworkThe new three-tier Azure model choice.

MAI on Azure

OpenAI on Azure

Foundry's 11,000+ catalog

08 — ImplicationsWhat it means for agencies and engineering teams.

09 — ConclusionMicrosoft stops reselling and starts owning.

The MAI launch is less about one benchmark than about who owns the stack.

A clean-IP first-party model changes how enterprises choose what to deploy.

Model-selection engagements

The questions we get every week.

Continue exploring frontier releases.

Enterprise-Governed AI Coding Lands in VS Code Copilot

Dataverse Meets Claude, Cursor and Copilot via MCP

Grok 4.3 on Amazon Bedrock: xAI Goes Enterprise 2026

ChatGPT Lockdown Mode: The AI Data-Exfiltration Control