MiniMax M3 launched on May 31, 2026, and the pitch is bold: the first open-weight release to hold frontier coding, a one-million-token context window, and native multimodality at the same time. The architecture behind it is MiniMax Sparse Attention, the launch price is a fraction of closed frontier, and the API was live day one. The honest framing matters just as much as the headline.
Two things temper the excitement, and a credible read has to surface both. First, every benchmark MiniMax published was run on its own infrastructure with no independent validation at launch. Second, the open weights were not actually available on launch day; MiniMax committed to releasing them within roughly ten days, and the license was unconfirmed. So the "fully open" story is, for now, a promise rather than a delivered fact.
This guide covers what shipped, the sparse-attention design that makes 1M context affordable, the vendor-stated benchmark picture, the awkward timing against Claude Opus 4.8, the pricing math that makes the token plans genuinely interesting, and a clear decision framework for who should adopt now versus wait.
- 01Three capabilities in one open-weight model.MiniMax bills M3 as the first open-weight release to combine frontier coding, a 1M-token context window, and native multimodal input (images, video, computer use) in a single model.
- 02Sparse Attention is the headline engine.MiniMax Sparse Attention (MSA) selects relevant blocks of uncompressed key-values instead of running full quadratic attention, cutting per-token compute to a vendor-stated 1/20th of M2 at 1M-token context.
- 03Every benchmark is vendor-run.SWE-Bench Pro 59.0%, Terminal-Bench 2.1 66.0%, OSWorld-Verified 70.06%, BrowseComp 83.5, and MCP Atlas 74.2% were all produced on MiniMax infrastructure with no independent validation confirmed at launch.
- 04Level with Opus 4.7, behind Opus 4.8.M3's comparisons targeted Claude Opus 4.7. On directly comparable agent benchmarks it trails the three-days-older Opus 4.8 by roughly 10 to 14 points, while undercutting closed-frontier pricing dramatically.
- 05Open weights were pending, not shipped.At launch the weights were not yet on Hugging Face and the license was unconfirmed. MiniMax committed to an open-weight release within roughly ten days. Treat enterprise on-prem plans as a near-term promise.
01 — What ShippedAn API release, with open weights committed within days.
What went live on May 31 was the model behind an API, not a weights drop. M3 appeared same-day on OpenRouter under minimax/minimax-m3, exposed a 1M-token context window, and shipped with day-one compatibility for IDE integrations including Claude Code, Cursor, Roo Code, and Cline. The API uses a toggleable thinking mode: on for deep reasoning and long-horizon planning, off for low-latency completion.
MiniMax positions M3 as a clean version break from the M2 line, not a decimal increment. Its direct predecessor was M2.7, a self-evolving model that the company reported was handling a meaningful share of its internal reinforcement-learning workflow autonomously. M3 carries that agentic-first philosophy forward and adds native multimodality. If you followed the lineage, our MiniMax M2.7 and MiniMax M2.5 guides set the context for how aggressively this team has been iterating.
M3 API
Available via the MiniMax platform and on OpenRouter under minimax/minimax-m3 from launch day, with day-one support for Claude Code, Cursor, Roo Code, and Cline. The model card lists a 1M-token context window.
Open weights
MiniMax committed to publishing open weights and a technical report within roughly ten days of launch. At launch the weights were not yet available and the exact license was unconfirmed. Verify before planning any on-prem deployment.
One detail builders should not over-read: M3's total parameter count is undisclosed. The M2.7 predecessor was a 229B-total / 9.8B-active mixture-of-experts model, but MiniMax has not confirmed M3 inherits those figures, so we treat the size as unknown rather than carrying forward old numbers. Always read the official model card and license text once the weights actually publish.
02 — The Three-Way JamWhy coding, context, and multimodality have been incompatible.
The reason this release reads as ambitious is that those three properties have historically pulled against each other. Quadratic attention scaling makes a genuinely usable million-token window expensive, which is why long-context models have leaned on compression tricks that trade away precision. Multimodality bolted on after the fact has tended to weaken visual reasoning rather than strengthen it. And frontier-level coding has typically demanded the kind of dense compute budgets that push long-context inference costs past what most teams will pay.
MiniMax's framing is that M3 breaks all three constraints at once: a sparse-attention design that keeps long context affordable, a training corpus that was multimodal from step zero, and agentic coding scores it claims rival closed frontier. Whether the model fully delivers on that promise is exactly the question independent benchmarks have not yet answered. The architecture is real and documented; the capability claims are vendor-stated.
Interleaved pretraining corpus
MiniMax states M3 was pretrained on over 100 trillion tokens of natively interleaved text, image, and video data from step zero, rather than fitting a vision adapter onto a text model after the fact.
OmniDocBench (vendor)
MiniMax reports M3 scoring above Gemini 3.1 Pro on OmniDocBench document understanding, attributing the result to the multimodal training pipeline. No independent confirmation at launch.
SVG-Bench (vendor)
On SVG-Bench, which measures turning a visual into code, MiniMax claims M3 surpasses Claude Opus 4.7. Video benchmarks were run on up to 1,024 frames, but no numeric video scores were published at launch.
03 — MiniMax Sparse AttentionMSA: block-level selection of real key-values.
The technical centerpiece is MiniMax Sparse Attention (MSA). Rather than computing full quadratic self-attention across the whole sequence, MSA performs block-level selection on the key-value cache. The distinction MiniMax draws against latent-compression approaches matters: where some designs compress key-values into a smaller latent space, MSA selects relevant blocks of uncompressed grouped-query key-values. The claimed benefit is that precision and prefix-caching compatibility are preserved, because the model still attends to the actual stored representations rather than a lossy summary.
The kernel engineering is part of the story. MiniMax describes a "KV outer gather Q" pattern, where key-value blocks form the outer loop and every query that hits a given block is batched together so that block is read from memory exactly once, in contiguous rather than scattered access. MiniMax claims this runs more than four times faster than open-source sparse-attention alternatives such as Flash-Sparse-Attention or flash-moba. As with the rest of the architecture claims, that figure is vendor-stated.
MSA efficiency vs M2 at 1M-token context · vendor-stated
Source: MiniMax M3 blog (vendor-stated; not independently validated)Read the bars carefully. The compute figure is the model's claimed reduction to roughly one-twentieth of M2's per-token cost at one million tokens, and the speedups are the vendor's prefill and decode numbers on its own hardware. The earlier MiniMax teaser cited more precise figures of about 9.7 times prefill and 15.6 times decode; the final launch materials rounded these to more than nine and more than fifteen times. Either way, none of it has been reproduced by a third party, so treat it as a design claim worth testing rather than a settled result.
MSA operates on a standard GQA backbone but utilizes block-level selection on real, uncompressed Key-Values.— Elie Bakouch, Prime Intellect (AI training infrastructure)
04 — BenchmarksThe numbers, every one of them vendor-run.
MiniMax published a strong agentic benchmark sheet. Before quoting any of it, set expectations: these scores were produced on MiniMax infrastructure and had no independent validation at launch. If you are unsure what these evaluations actually measure, our SWE-Bench Pro and Terminal-Bench 2.1 guide breaks down the methodology. The headline figures: SWE-Bench Pro 59.0%, Terminal-Bench 2.1 66.0%, OSWorld-Verified 70.06% for computer-use task completion, BrowseComp 83.5 for autonomous web search, and MCP Atlas 74.2% for tool use.
M3 benchmark sheet · vendor-stated, not independently validated
Source: MiniMax M3 blog + VentureBeat (all M3 scores vendor-stated)Two long-horizon autonomy demos give a more textured sense of what M3 can attempt. In one, MiniMax reports the model ran roughly twelve hours without human intervention, produced 18 commits and 23 experimental figures, and reproduced an ICLR 2025 award-winning paper with a vendor-stated reproduction score of 0.650. In another, M3 reportedly improved NVIDIA Hopper FP8 GEMM hardware utilization from 7.6% to 71.3% across 147 submissions over about a day with no reference solution, where comparable models gave up after a few dozen attempts. Both demos are vendor-reported.
Every one of those numbers is vendor-run, on MiniMax's own infrastructure.— Thomas Wiegold, independent researcher
That caveat is the editorial spine of this release. A useful contrast: other recent frontier launches had independent intelligence-index numbers within roughly a day. M3 did not, at launch. The disciplined move is to wait for independent leaderboard and arena results before treating any of these figures as production-grade evidence, then run your own evaluation on the prompts you actually care about.
05 — The Timing GapM3 was benchmarked against the wrong Opus.
Here is the framing most day-of coverage either missed or buried. M3 launched on May 31, three days after Claude Opus 4.8 shipped on May 28. MiniMax's comparisons were set against Claude Opus 4.7, the pre-Opus-4.8 frontier. On the agent benchmarks where a direct comparison exists, the newer Opus 4.8 leads M3 by double-digit margins: SWE-Bench Pro 69.2% versus 59.0%, Terminal-Bench 2.1 74.6% versus 66.0%, and OSWorld-Verified 83.4% versus 70.06%.
That is not a reason to dismiss M3 — it is a reason to frame it correctly. The accurate read is that M3 lands roughly level with Opus 4.7 on agentic work while costing a small fraction of the closed price, and trails the three-days-older Opus 4.8 by ten to fourteen points on directly comparable evals. Stated that way, the story stays compelling without overclaiming. For the closed-frontier reference point, see our Claude Opus 4.8 release coverage.
M3 vs the model it did not compare against · Claude Opus 4.8
Source: VentureBeat (M3 scores vendor-stated; Opus 4.8 from independent benchmarks)06 — Pricing & PlansWhere M3 is genuinely disruptive.
The cost story is where M3 makes its strongest case. Launch pricing is $0.30 per million input tokens and $1.20 per million output on a limited-time 50% promotion, with a standard rate of $0.60 and $2.40 after. At standard rates that is roughly 8% to 20% of leading closed-frontier per-token pricing. Requests up to 512K input tokens bill at the standard rate; longer contexts cost more, with the exact surcharge not publicly disclosed at launch. For broader context on how this sits against the rest of the market, see our API pricing comparison.
The subscription tiers are the more interesting wrinkle. MiniMax offers shared multimodal quota across text, image, speech, and music: a Plus tier at roughly 1.7 billion tokens a month with 3 to 4 concurrent agents, a Max tier at roughly 5.1 billion tokens with 4 to 5 concurrent agents, and an Ultra tier at roughly 9.8 billion tokens with 6 to 7 concurrent agents. For high-volume agentic builders the breakeven math is striking, which is what the table below works out.
Tokens per month
Around 1.7 billion shared tokens a month with 3 to 4 concurrent agents. At the $0.60 standard input rate, 1.7 billion input tokens alone would run on the order of a thousand dollars on pay-as-you-go.
Tokens per month
Around 5.1 billion shared tokens with 4 to 5 concurrent agents, plus a small daily allowance of video clips. The middle tier for teams running several agents in parallel through the day.
Tokens per month
Around 9.8 billion shared tokens with 6 to 7 concurrent agents and a larger daily video allowance. Aimed at heavy multi-agent workloads where pay-as-you-go would be far more expensive.
Solo developer, multi-agent
~1.7B tokens and 3-4 concurrent agents. The equivalent pay-as-you-go input cost alone is on the order of ~$1,000/month at standard rates, so the subscription is a large discount for steady high-volume use. Confirm billing cadence before committing.
Small team, parallel agents
~5.1B tokens and 4-5 concurrent agents. The natural fit when several builders or pipelines hit M3 through the day and the workload is predictable enough to favor a flat rate over metered spend.
Heavy multi-agent workloads
~9.8B tokens and 6-7 concurrent agents. For sustained automation at scale this dwarfs pay-as-you-go pricing, but only if your usage actually approaches the quota each month.
Bursty or unpredictable usage
$0.30/$1.20 promo or $0.60/$2.40 standard per 1M tokens. Best when volume is low or spiky and you would not get near a subscription quota. Watch the 512K long-context billing threshold for big-document workloads.
07 — The Open-Weight CaveatA Day-0 promise, not a delivered reality.
The "open-weight" label is doing a lot of work in the launch messaging, so be precise about it. On May 31 the weights were not on Hugging Face, and the license was unconfirmed — the candidates named in coverage were a permissive open license, but nothing was settled. MiniMax committed to publishing the weights and a technical report within roughly ten days. For anyone planning on-prem deployment, fine-tuning, or sovereignty-bound use, that means the open story is a near-term commitment to verify, not a capability you could act on at launch.
There is also a governance consideration that belongs in any enterprise evaluation, even for API use. As a Chinese company, MiniMax operates under China's 2017 National Intelligence Law, which obligates domestic firms to support and cooperate with state intelligence work. That applies to API-routed prompts regardless of where the user sits. It is not a reason to rule M3 out, but teams handling sensitive or regulated data should account for it alongside the usual model-selection criteria.
08 — Who Should SwitchA decision framework for builders and teams.
M3 is not a universal default, but it is a strong fit for specific profiles today and a wait-and-watch for others. The deciding factors are usage volume, tolerance for unvalidated benchmarks, and whether you need open weights now or can act on the API while they ship.
Cost-sensitive parallel agents
If you run many concurrent agents on a predictable workload, the token plans and per-token pricing make M3 hard to ignore. Benchmark it on your own tasks against your current default, then decide on the subscription tier that matches real usage.
Documents, images, video at scale
Native multimodality plus a 1M-token window at this price is a genuinely differentiated combination. Validate the multimodal quality on your data, since the supporting scores are vendor-stated and the numeric video benchmarks were not published.
Maximum capability, cost secondary
If you need the strongest agentic coding available and price is a secondary concern, the current independent picture favors closed frontier such as Claude Opus 4.8, which leads M3's vendor numbers on the comparable benchmarks.
On-prem or compliance-bound
Wait. The open weights and license were not confirmed at launch, and the data-governance considerations need accounting for. Revisit once the weights publish, the license is known, and independent evaluations land.
For most agencies and engineering teams the right first step is a scoped evaluation: run M3 on the prompts and repositories you actually care about, measure token spend and latency against your current default, and decide per-workload rather than per-headline. If you want help structuring that comparison, our AI digital transformation engagements start with exactly this kind of model-selection eval, and our development team can wire the winning model into your agent stack.
09 — ConclusionA real release with an asterisk.
A compelling open-weight option — once the verification arrives.
MiniMax M3 is an ambitious release that claims something genuinely new for open models: frontier coding, a million-token context window, and native multimodality fused into one model, powered by a sparse attention design that makes long context affordable rather than aspirational. The pricing, especially the token-plan breakeven math, is the most immediately actionable part of the story.
The honest framing keeps two facts in view. The benchmarks are vendor-run and unvalidated, and the open weights were committed within days rather than shipped at launch. M3 lands roughly level with the prior Opus 4.7 frontier on agentic work and trails the newer Opus 4.8 by ten to fourteen points on the comparable evals — not the agentic frontier, but a serious option at a small fraction of the closed price.
The broader signal is that open-weight competition is now setting the cost floor for agentic work, and pushing closed frontier on price even when it cannot match it on capability. The practical move is the same one that always wins: wait for independent results, run your own evals on the workloads you care about, and let the numbers you can verify decide.