AI DevelopmentNew Release10 min readPublished May 31, 2026

Open-weight, 1M context, agentic · 1/20th the compute of M2 · vendor-stated benchmarks only

MiniMax M3: 1M Context, Open-Weight, Agentic Frontier

MiniMax M3 launched on May 31, 2026, billed as the first open-weight model to fuse frontier coding, a 1M-token context window, and native multimodality in one release. The engine is MiniMax Sparse Attention, and the price is a fraction of closed frontier. The catch: every benchmark is vendor-run, and the weights had not shipped at launch.

DA
Digital Applied Team
Senior strategists · Published May 31, 2026
PublishedMay 31, 2026
Read time10 min
Sources8 primary + analysis
Compute at 1M context
1/20th
per-token vs M2 (vendor)
Decode speedup at 1M
15×
vs full-attention M2 (vendor)
Launch input price
$0.30
per 1M tokens (50% promo)
SWE-Bench Pro
59.0%
vendor-stated · trails Opus 4.8

MiniMax M3 launched on May 31, 2026, and the pitch is bold: the first open-weight release to hold frontier coding, a one-million-token context window, and native multimodality at the same time. The architecture behind it is MiniMax Sparse Attention, the launch price is a fraction of closed frontier, and the API was live day one. The honest framing matters just as much as the headline.

Two things temper the excitement, and a credible read has to surface both. First, every benchmark MiniMax published was run on its own infrastructure with no independent validation at launch. Second, the open weights were not actually available on launch day; MiniMax committed to releasing them within roughly ten days, and the license was unconfirmed. So the "fully open" story is, for now, a promise rather than a delivered fact.

This guide covers what shipped, the sparse-attention design that makes 1M context affordable, the vendor-stated benchmark picture, the awkward timing against Claude Opus 4.8, the pricing math that makes the token plans genuinely interesting, and a clear decision framework for who should adopt now versus wait.

Key takeaways
  1. 01
    Three capabilities in one open-weight model.MiniMax bills M3 as the first open-weight release to combine frontier coding, a 1M-token context window, and native multimodal input (images, video, computer use) in a single model.
  2. 02
    Sparse Attention is the headline engine.MiniMax Sparse Attention (MSA) selects relevant blocks of uncompressed key-values instead of running full quadratic attention, cutting per-token compute to a vendor-stated 1/20th of M2 at 1M-token context.
  3. 03
    Every benchmark is vendor-run.SWE-Bench Pro 59.0%, Terminal-Bench 2.1 66.0%, OSWorld-Verified 70.06%, BrowseComp 83.5, and MCP Atlas 74.2% were all produced on MiniMax infrastructure with no independent validation confirmed at launch.
  4. 04
    Level with Opus 4.7, behind Opus 4.8.M3's comparisons targeted Claude Opus 4.7. On directly comparable agent benchmarks it trails the three-days-older Opus 4.8 by roughly 10 to 14 points, while undercutting closed-frontier pricing dramatically.
  5. 05
    Open weights were pending, not shipped.At launch the weights were not yet on Hugging Face and the license was unconfirmed. MiniMax committed to an open-weight release within roughly ten days. Treat enterprise on-prem plans as a near-term promise.

01What ShippedAn API release, with open weights committed within days.

What went live on May 31 was the model behind an API, not a weights drop. M3 appeared same-day on OpenRouter under minimax/minimax-m3, exposed a 1M-token context window, and shipped with day-one compatibility for IDE integrations including Claude Code, Cursor, Roo Code, and Cline. The API uses a toggleable thinking mode: on for deep reasoning and long-horizon planning, off for low-latency completion.

MiniMax positions M3 as a clean version break from the M2 line, not a decimal increment. Its direct predecessor was M2.7, a self-evolving model that the company reported was handling a meaningful share of its internal reinforcement-learning workflow autonomously. M3 carries that agentic-first philosophy forward and adds native multimodality. If you followed the lineage, our MiniMax M2.7 and MiniMax M2.5 guides set the context for how aggressively this team has been iterating.

Live day one
M3 API
1M context · thinking mode toggle

Available via the MiniMax platform and on OpenRouter under minimax/minimax-m3 from launch day, with day-one support for Claude Code, Cursor, Roo Code, and Cline. The model card lists a 1M-token context window.

openrouter.ai/minimax/minimax-m3
Committed within days
Open weights
Hugging Face + GitHub · license TBC

MiniMax committed to publishing open weights and a technical report within roughly ten days of launch. At launch the weights were not yet available and the exact license was unconfirmed. Verify before planning any on-prem deployment.

Pending release · ~10-day window
Release snapshot
MiniMax M3 launched May 31, 2026 as an open-weight release committed within days. The API and OpenRouter listing went live same-day; the weights and technical report were promised for a follow-on window on Hugging Face and GitHub. Launch pricing on a 50% promotion is $0.30 / $1.20 per 1M tokens (input / output), with a standard rate of $0.60 / $2.40 after the promotion. Total parameter count was not disclosed at launch.

One detail builders should not over-read: M3's total parameter count is undisclosed. The M2.7 predecessor was a 229B-total / 9.8B-active mixture-of-experts model, but MiniMax has not confirmed M3 inherits those figures, so we treat the size as unknown rather than carrying forward old numbers. Always read the official model card and license text once the weights actually publish.

02The Three-Way JamWhy coding, context, and multimodality have been incompatible.

The reason this release reads as ambitious is that those three properties have historically pulled against each other. Quadratic attention scaling makes a genuinely usable million-token window expensive, which is why long-context models have leaned on compression tricks that trade away precision. Multimodality bolted on after the fact has tended to weaken visual reasoning rather than strengthen it. And frontier-level coding has typically demanded the kind of dense compute budgets that push long-context inference costs past what most teams will pay.

MiniMax's framing is that M3 breaks all three constraints at once: a sparse-attention design that keeps long context affordable, a training corpus that was multimodal from step zero, and agentic coding scores it claims rival closed frontier. Whether the model fully delivers on that promise is exactly the question independent benchmarks have not yet answered. The architecture is real and documented; the capability claims are vendor-stated.

Trained natively multimodal
Interleaved pretraining corpus
100T

MiniMax states M3 was pretrained on over 100 trillion tokens of natively interleaved text, image, and video data from step zero, rather than fitting a vision adapter onto a text model after the fact.

Vendor-stated
Document understanding
OmniDocBench (vendor)
Lead

MiniMax reports M3 scoring above Gemini 3.1 Pro on OmniDocBench document understanding, attributing the result to the multimodal training pipeline. No independent confirmation at launch.

vs Gemini 3.1 Pro
Visual-to-code
SVG-Bench (vendor)
Lead

On SVG-Bench, which measures turning a visual into code, MiniMax claims M3 surpasses Claude Opus 4.7. Video benchmarks were run on up to 1,024 frames, but no numeric video scores were published at launch.

vs Claude Opus 4.7

03MiniMax Sparse AttentionMSA: block-level selection of real key-values.

The technical centerpiece is MiniMax Sparse Attention (MSA). Rather than computing full quadratic self-attention across the whole sequence, MSA performs block-level selection on the key-value cache. The distinction MiniMax draws against latent-compression approaches matters: where some designs compress key-values into a smaller latent space, MSA selects relevant blocks of uncompressed grouped-query key-values. The claimed benefit is that precision and prefix-caching compatibility are preserved, because the model still attends to the actual stored representations rather than a lossy summary.

The kernel engineering is part of the story. MiniMax describes a "KV outer gather Q" pattern, where key-value blocks form the outer loop and every query that hits a given block is batched together so that block is read from memory exactly once, in contiguous rather than scattered access. MiniMax claims this runs more than four times faster than open-source sparse-attention alternatives such as Flash-Sparse-Attention or flash-moba. As with the rest of the architecture claims, that figure is vendor-stated.

MSA efficiency vs M2 at 1M-token context · vendor-stated

Source: MiniMax M3 blog (vendor-stated; not independently validated)
M2 full attention (baseline)Per-token compute at 1M-token context
100%
M3 with MSAPer-token compute at 1M context · vendor-stated
~5%
Prefill speedupM3 vs full-attention M2 at 1M · vendor-stated
>9×
Decode speedupM3 vs full-attention M2 at 1M · vendor-stated
>15×

Read the bars carefully. The compute figure is the model's claimed reduction to roughly one-twentieth of M2's per-token cost at one million tokens, and the speedups are the vendor's prefill and decode numbers on its own hardware. The earlier MiniMax teaser cited more precise figures of about 9.7 times prefill and 15.6 times decode; the final launch materials rounded these to more than nine and more than fifteen times. Either way, none of it has been reproduced by a third party, so treat it as a design claim worth testing rather than a settled result.

MSA operates on a standard GQA backbone but utilizes block-level selection on real, uncompressed Key-Values.— Elie Bakouch, Prime Intellect (AI training infrastructure)

04BenchmarksThe numbers, every one of them vendor-run.

MiniMax published a strong agentic benchmark sheet. Before quoting any of it, set expectations: these scores were produced on MiniMax infrastructure and had no independent validation at launch. If you are unsure what these evaluations actually measure, our SWE-Bench Pro and Terminal-Bench 2.1 guide breaks down the methodology. The headline figures: SWE-Bench Pro 59.0%, Terminal-Bench 2.1 66.0%, OSWorld-Verified 70.06% for computer-use task completion, BrowseComp 83.5 for autonomous web search, and MCP Atlas 74.2% for tool use.

M3 benchmark sheet · vendor-stated, not independently validated

Source: MiniMax M3 blog + VentureBeat (all M3 scores vendor-stated)
SWE-Bench ProVendor 59.0% · ahead of DeepSeek V4 Pro 55.4%
59.0%
Open lead
Terminal-Bench 2.1Vendor 66.0% · roughly level with Opus 4.7
66.0%
≈ Opus 4.7
BrowseCompVendor 83.5 · claims to exceed Opus 4.7's 79.3
83.5
Vendor lead
MCP AtlasVendor 74.2% · narrowly over DeepSeek V4 Pro 73.6%
74.2%
Narrow lead
OSWorld-VerifiedVendor 70.06% · trails Opus 4.8 at 83.4%
70.06%
Opus 4.8
SWE-fficiencyVendor 34.8% · harder agentic efficiency eval
34.8%
Lower band
KernelBench HardVendor 28.8% · low-level kernel synthesis
28.8%
Lower band
M3 (vendor-stated)Where a comparison model leads

Two long-horizon autonomy demos give a more textured sense of what M3 can attempt. In one, MiniMax reports the model ran roughly twelve hours without human intervention, produced 18 commits and 23 experimental figures, and reproduced an ICLR 2025 award-winning paper with a vendor-stated reproduction score of 0.650. In another, M3 reportedly improved NVIDIA Hopper FP8 GEMM hardware utilization from 7.6% to 71.3% across 147 submissions over about a day with no reference solution, where comparable models gave up after a few dozen attempts. Both demos are vendor-reported.

Every one of those numbers is vendor-run, on MiniMax's own infrastructure.— Thomas Wiegold, independent researcher

That caveat is the editorial spine of this release. A useful contrast: other recent frontier launches had independent intelligence-index numbers within roughly a day. M3 did not, at launch. The disciplined move is to wait for independent leaderboard and arena results before treating any of these figures as production-grade evidence, then run your own evaluation on the prompts you actually care about.

05The Timing GapM3 was benchmarked against the wrong Opus.

Here is the framing most day-of coverage either missed or buried. M3 launched on May 31, three days after Claude Opus 4.8 shipped on May 28. MiniMax's comparisons were set against Claude Opus 4.7, the pre-Opus-4.8 frontier. On the agent benchmarks where a direct comparison exists, the newer Opus 4.8 leads M3 by double-digit margins: SWE-Bench Pro 69.2% versus 59.0%, Terminal-Bench 2.1 74.6% versus 66.0%, and OSWorld-Verified 83.4% versus 70.06%.

That is not a reason to dismiss M3 — it is a reason to frame it correctly. The accurate read is that M3 lands roughly level with Opus 4.7 on agentic work while costing a small fraction of the closed price, and trails the three-days-older Opus 4.8 by ten to fourteen points on directly comparable evals. Stated that way, the story stays compelling without overclaiming. For the closed-frontier reference point, see our Claude Opus 4.8 release coverage.

M3 vs the model it did not compare against · Claude Opus 4.8

Source: VentureBeat (M3 scores vendor-stated; Opus 4.8 from independent benchmarks)
SWE-Bench ProM3 vendor 59.0% · Opus 4.8 69.2%
69.2%
Opus 4.8 +10.2
Terminal-Bench 2.1M3 vendor 66.0% · Opus 4.8 74.6%
74.6%
Opus 4.8 +8.6
OSWorld-VerifiedM3 vendor 70.06% · Opus 4.8 83.4%
83.4%
Opus 4.8 +13.3
M3 (vendor-stated)Claude Opus 4.8 (independent)
The honest read
M3 is not the new agentic frontier; Claude Opus 4.8, released three days earlier, leads it by double digits on every directly comparable agent benchmark. What M3 is: a credible open-weight option that lands near the prior frontier on agentic work at a small fraction of closed-frontier pricing. Both of those statements are true at the same time.

06Pricing & PlansWhere M3 is genuinely disruptive.

The cost story is where M3 makes its strongest case. Launch pricing is $0.30 per million input tokens and $1.20 per million output on a limited-time 50% promotion, with a standard rate of $0.60 and $2.40 after. At standard rates that is roughly 8% to 20% of leading closed-frontier per-token pricing. Requests up to 512K input tokens bill at the standard rate; longer contexts cost more, with the exact surcharge not publicly disclosed at launch. For broader context on how this sits against the rest of the market, see our API pricing comparison.

The subscription tiers are the more interesting wrinkle. MiniMax offers shared multimodal quota across text, image, speech, and music: a Plus tier at roughly 1.7 billion tokens a month with 3 to 4 concurrent agents, a Max tier at roughly 5.1 billion tokens with 4 to 5 concurrent agents, and an Ultra tier at roughly 9.8 billion tokens with 6 to 7 concurrent agents. For high-volume agentic builders the breakeven math is striking, which is what the table below works out.

Plus subscription
Tokens per month
1.7B

Around 1.7 billion shared tokens a month with 3 to 4 concurrent agents. At the $0.60 standard input rate, 1.7 billion input tokens alone would run on the order of a thousand dollars on pay-as-you-go.

Entry tier
Max subscription
Tokens per month
5.1B

Around 5.1 billion shared tokens with 4 to 5 concurrent agents, plus a small daily allowance of video clips. The middle tier for teams running several agents in parallel through the day.

Team tier
Ultra subscription
Tokens per month
9.8B

Around 9.8 billion shared tokens with 6 to 7 concurrent agents and a larger daily video allowance. Aimed at heavy multi-agent workloads where pay-as-you-go would be far more expensive.

Heavy tier
Plus · ~$20/mo
Solo developer, multi-agent

~1.7B tokens and 3-4 concurrent agents. The equivalent pay-as-you-go input cost alone is on the order of ~$1,000/month at standard rates, so the subscription is a large discount for steady high-volume use. Confirm billing cadence before committing.

Subscription wins for steady use
Max · ~$50/mo
Small team, parallel agents

~5.1B tokens and 4-5 concurrent agents. The natural fit when several builders or pipelines hit M3 through the day and the workload is predictable enough to favor a flat rate over metered spend.

Subscription for predictable load
Ultra · ~$120/mo
Heavy multi-agent workloads

~9.8B tokens and 6-7 concurrent agents. For sustained automation at scale this dwarfs pay-as-you-go pricing, but only if your usage actually approaches the quota each month.

Subscription for sustained scale
Pay-as-you-go
Bursty or unpredictable usage

$0.30/$1.20 promo or $0.60/$2.40 standard per 1M tokens. Best when volume is low or spiky and you would not get near a subscription quota. Watch the 512K long-context billing threshold for big-document workloads.

PAYG for low or spiky volume
Confirm before you commit
Subscription pricing references reflect launch-day reporting, and at least one tier was reported as annually billed rather than month-to-month. Treat the dollar figures and quotas as indicative and verify the current terms and billing cadence on the MiniMax platform before purchasing.

07The Open-Weight CaveatA Day-0 promise, not a delivered reality.

The "open-weight" label is doing a lot of work in the launch messaging, so be precise about it. On May 31 the weights were not on Hugging Face, and the license was unconfirmed — the candidates named in coverage were a permissive open license, but nothing was settled. MiniMax committed to publishing the weights and a technical report within roughly ten days. For anyone planning on-prem deployment, fine-tuning, or sovereignty-bound use, that means the open story is a near-term commitment to verify, not a capability you could act on at launch.

There is also a governance consideration that belongs in any enterprise evaluation, even for API use. As a Chinese company, MiniMax operates under China's 2017 National Intelligence Law, which obligates domestic firms to support and cooperate with state intelligence work. That applies to API-routed prompts regardless of where the user sits. It is not a reason to rule M3 out, but teams handling sensitive or regulated data should account for it alongside the usual model-selection criteria.

Verification checklist
Before treating M3 as production-ready, confirm three things directly from primary sources: that the open weights have actually published and under which license, that an independent evaluation corroborates the vendor benchmark numbers for your workload class, and that the current API pricing and long-context billing threshold match what you budgeted.

08Who Should SwitchA decision framework for builders and teams.

M3 is not a universal default, but it is a strong fit for specific profiles today and a wait-and-watch for others. The deciding factors are usage volume, tolerance for unvalidated benchmarks, and whether you need open weights now or can act on the API while they ship.

High-volume agent builder
Cost-sensitive parallel agents

If you run many concurrent agents on a predictable workload, the token plans and per-token pricing make M3 hard to ignore. Benchmark it on your own tasks against your current default, then decide on the subscription tier that matches real usage.

Pilot M3 now
Long-context multimodal work
Documents, images, video at scale

Native multimodality plus a 1M-token window at this price is a genuinely differentiated combination. Validate the multimodal quality on your data, since the supporting scores are vendor-stated and the numeric video benchmarks were not published.

Evaluate on your corpus
Top-of-stack agentic coding
Maximum capability, cost secondary

If you need the strongest agentic coding available and price is a secondary concern, the current independent picture favors closed frontier such as Claude Opus 4.8, which leads M3's vendor numbers on the comparable benchmarks.

Stay with closed frontier
Sovereignty / regulated data
On-prem or compliance-bound

Wait. The open weights and license were not confirmed at launch, and the data-governance considerations need accounting for. Revisit once the weights publish, the license is known, and independent evaluations land.

Wait for weights + audits

For most agencies and engineering teams the right first step is a scoped evaluation: run M3 on the prompts and repositories you actually care about, measure token spend and latency against your current default, and decide per-workload rather than per-headline. If you want help structuring that comparison, our AI digital transformation engagements start with exactly this kind of model-selection eval, and our development team can wire the winning model into your agent stack.

09ConclusionA real release with an asterisk.

The shape of open frontier, May 2026

A compelling open-weight option — once the verification arrives.

MiniMax M3 is an ambitious release that claims something genuinely new for open models: frontier coding, a million-token context window, and native multimodality fused into one model, powered by a sparse attention design that makes long context affordable rather than aspirational. The pricing, especially the token-plan breakeven math, is the most immediately actionable part of the story.

The honest framing keeps two facts in view. The benchmarks are vendor-run and unvalidated, and the open weights were committed within days rather than shipped at launch. M3 lands roughly level with the prior Opus 4.7 frontier on agentic work and trails the newer Opus 4.8 by ten to fourteen points on the comparable evals — not the agentic frontier, but a serious option at a small fraction of the closed price.

The broader signal is that open-weight competition is now setting the cost floor for agentic work, and pushing closed frontier on price even when it cannot match it on capability. The practical move is the same one that always wins: wait for independent results, run your own evals on the workloads you care about, and let the numbers you can verify decide.

Evaluate open-weight frontier the honest way

Pick the model your workload actually needs — evidence first.

Our team helps businesses evaluate, benchmark, and operate frontier and open-weight models — including MiniMax M3 — for agentic coding, long-context retrieval, and multimodal workloads, delivered in days not quarters.

Free consultationExpert guidanceTailored solutions
What we work on

Model-selection engagements

  • M3 benchmarking against closed frontier on your corpus
  • Multimodal and long-context workload evaluation
  • Token-plan vs pay-as-you-go cost modeling
  • Multi-vendor routing — M3 / Opus 4.8 / GPT-5.5 / Gemini
  • Governance & verification before production rollout
FAQ · MiniMax M3 guide

The questions we get every week.

MiniMax M3 is an agentic frontier model that launched on May 31, 2026. MiniMax bills it as the first open-weight release to combine frontier-level coding, a 1-million-token context window, and native multimodal input (images, video, and computer use) in a single model. It went live the same day via the MiniMax platform and on OpenRouter under minimax/minimax-m3, with day-one compatibility for IDE integrations including Claude Code, Cursor, Roo Code, and Cline. The model's total parameter count was not disclosed at launch, and the open weights were committed for release within roughly ten days rather than shipped on day one.