Kimi K2.7-Code is Moonshot AI's latest coding model, released June 12, 2026 as an open-source, coding-focused successor to Kimi K2.6 — with weights live on Hugging Face under a Modified MIT license. Moonshot's announcement leads with three numbers: +21.8% over K2.6 on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite, alongside a claim that matters at least as much for production use — roughly 30% lower reasoning-token usage than K2.6.
The release is deliberately paired with distribution. K2.7-Code is available immediately through the Kimi API at platform.moonshot.ai and through Kimi Code, Moonshot's terminal-first coding agent. A “6x High-Speed Mode” is announced as coming soon. That pairing is the strategic read: Moonshot is not just shipping weights, it is shipping a subscription coding platform around them — the same model-plus-plan playbook Anthropic runs with Claude Code.
This post covers what shipped, the benchmark deltas and their first-party caveat, the token-efficiency angle, the long-horizon coding upgrades, the Kimi Code CLI and its membership plans, API access, and where K2.7-Code sits in the June 2026 field. For the foundation this release builds on, see our Kimi K2.6 deep dive from April.
- 01A coding-first release, not a general flagship.K2.7-Code is built on K2.6 and tuned specifically for coding and agentic tasks — code generation, debugging, tool use, and multi-step programming workflows. Moonshot positions it as the coding successor; K2.6 remains the general-purpose anchor of the K2 line.
- 02The headline deltas are vendor-run benchmarks.Kimi Code Bench v2 (+21.8%, 50.9 to 62.0), Program Bench (+11.0%), and MLS Bench Lite (+31.5%) are Moonshot's own benchmark suites. The deltas are directionally meaningful but are first-party numbers — wait for SWE-Bench Pro, Terminal-Bench, and Aider re-runs before treating them as settled.
- 03The 30% reasoning-token cut may be the real story.Moonshot reports roughly 30% lower reasoning-token usage versus K2.6 with less overthinking. For agentic coding workloads where output tokens dominate the bill, a 30% thinking-token reduction compounds across every step of a long run — it is a cost and latency claim, not just a quality claim.
- 04Open weights under Modified MIT keep the self-host path.Both the code repository and the model weights ship under a Modified MIT license on Hugging Face. For regulated industries and teams with data-residency constraints, K2.7-Code continues the K2 line's status as the most permissively licensed frontier-adjacent coding stack.
- 05The platform pairing is the strategic signal.K2.7-Code launches inside Kimi Code — Moonshot's open-source terminal agent — with membership plans listed from $19/month and a 6x High-Speed Mode announced as coming soon. Moonshot is competing on the full stack: model, CLI, and subscription economics.
01 — Release OverviewWhat Moonshot shipped on June 12 — and what it left open.
Moonshot AI announced Kimi K2.7-Code on June 12, 2026 via kimi.com/code and published the full model weights to Hugging Face the same day. The model card describes K2.7-Code as a coding-focused agentic model built on Kimi K2.6, with substantial improvements on real-world long-horizon coding tasks and reduced thinking-token usage. It inherits the K2.6 lineage — the 1T-class mixture-of-experts architecture that shipped in April with 12-hour autonomous runs and 300-agent swarms.
Three things are confirmed at launch: open weights under a Modified MIT license, availability through the Kimi API and Kimi Code, and the vendor-reported benchmark and efficiency deltas covered below. Two things are not: standalone K2.7-Code API pricing had not been published at the time of writing, and the 6x High-Speed Mode is “coming soon” with no date or pricing attached. Treat both as open items rather than assuming K2.6's numbers carry over.
Announced and open-sourced same day
Announcement via kimi.com/code; weights live at huggingface.co/moonshotai/Kimi-K2.7-Code. Code repository and weights both ship under a Modified MIT license.
Built on April's general flagship
K2.7-Code is the coding-focused successor to K2.6 (April 20, 2026) — the 1T-class MoE that set open-source SOTA on SWE-Bench Pro (58.6) at its release. K2.6 remains the general-purpose model in the line.
Kimi API + Kimi Code
Available today via the Kimi API at platform.moonshot.ai and inside Kimi Code, Moonshot's open-source terminal coding agent. Self-hosting via the Hugging Face weights is the third path.
High-Speed Mode announced
Moonshot says a 6x High-Speed Mode is coming soon. No launch date, pricing, or technical details were disclosed at release — treat it as a roadmap item, not a shipping feature.
Moonshot's release notes, verbatim: improved coding and agent performance over K2.6 — +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite; reasoning efficiency with “less overthinking” and 30% lower reasoning-token usage compared to K2.6; and improved instruction following with higher end-to-end coding task success rates. Source: kimi.com/code.
02 — BenchmarksThe benchmark deltas — and the first-party caveat.
The three headline numbers are real improvements on real test suites — but all three suites are Moonshot's own. Kimi Code Bench v2, Program Bench, and MLS Bench Lite are first-party benchmarks, which means the deltas measure progress against Moonshot's internal definition of coding quality, not against the community standards the industry ranks models by. That does not make them meaningless: a 50.9-to-62.0 jump on the vendor's hardest internal coding suite is the kind of generational delta labs rarely fake, because their own users verify it within days. It does mean the cross-vendor comparisons have to wait.
The early third-party signal is promising on tool use specifically. According to an early independent analysis of the K2.7-Code model card, the model scores 81.1% on MCPMark Verified — a benchmark testing correct tool invocation via the Model Context Protocol — ahead of Claude Opus 4.8's 76.4% on the same suite. The same analysis notes Opus 4.8 and GPT-5.5 still lead on raw code generation. If that pattern holds under wider testing, K2.7-Code's fine-tuning bet is clear: win the agentic loop (tool calls, MCP, multi-step execution) rather than the single-shot completion.
What to watch next: SWE-Bench Pro and Terminal-Bench 2.0 re-runs. K2.6 posted an open-source SOTA 58.6 on SWE-Bench Pro in April — if K2.7-Code's internal +21.8% translates even partially to SWE-Bench Pro, it would pressure the closed frontier on the benchmark that matters most for repo-scale work. As of publication, no independent leaderboard had re-run with K2.7-Code — qualify any number you see today accordingly.
K2.7-Code vs K2.6 — vendor-reported deltas
Source: Moonshot AI K2.7-Code release notes (kimi.com/code), June 12, 2026 — all three are vendor-run benchmark suites03 — Reasoning Efficiency30% fewer reasoning tokens is a cost claim, not just a quality claim.
The most operationally interesting line in the announcement is the efficiency one: roughly 30% lower reasoning-token usage than K2.6, framed as “less overthinking.” Reasoning models burn a large share of their output budget on thinking tokens — the internal chain-of-thought that precedes every tool call and code edit. In an agentic coding session that runs hundreds or thousands of steps, that overhead compounds: every plan, every retry, every verification pass pays the thinking tax again.
A 30% reduction, if it holds in practice, lands in three places at once. First, cost — output tokens are typically the expensive side of any API price card, and thinking tokens bill as output. Second, latency — fewer thinking tokens per step means faster steps, which matters more in interactive CLI sessions than in batch runs. Third, run depth — at a fixed context and budget, a model that thinks 30% leaner can take more steps before hitting limits, which is exactly the constraint long-horizon agentic work runs into first. Our June 2026 AI coding tool pricing guide covers how token economics translate to per-seat costs across the major stacks.
The caveat mirrors the benchmark caveat: 30% is Moonshot's measurement on Moonshot's workloads. Token efficiency is workload-dependent — a model that thinks leaner on routine edits may still think long on novel problems. Teams evaluating K2.7-Code should measure thinking-token share on their own repos before building cost models around the 30% figure.
04 — Long-Horizon CodingLong-horizon coding: finishing tasks, not just starting them.
The third pillar of the announcement is long-horizon coding: improved instruction following and higher end-to-end task success rates. This builds directly on what made K2.6 notable in April — 12-hour autonomous coding runs, 4,000+ tool calls, and agent swarms scaling to 300 parallel sub-agents. K2.6 proved the K2 line could sustain very long sessions; K2.7-Code's pitch is that more of those sessions now end in a merged, working result instead of a long transcript that needs human rescue.
End-to-end success rate is arguably the metric that separates agentic coding tools in 2026. Benchmark suites score isolated patches; production teams care whether a multi-hour refactor lands without intervention. Instruction following is the underrated half of that equation — long runs fail less often from bad code than from drift, where the agent gradually departs from the original constraints. Moonshot claiming gains on both fronts, paired with the reasoning-token cut, describes a model tuned to stay on task longer for less money. The K2 line's trajectory here has been consistent since K2.5 — the open-weight MoE that Cursor built Composer 2 on — each release has pushed run length and autonomy rather than chat-benchmark scores.
05 — Platform PairingKimi Code: the CLI and the plans behind the model.
K2.7-Code does not ship into a vacuum — it ships into Kimi Code, Moonshot's terminal-first coding agent, and that pairing is the strategic half of the release. The Kimi Code CLI is open source on GitHub, rewritten in TypeScript and distributed via npm. It runs autonomous multi-step workflows from the terminal — reading and editing code, running shell commands, searching files, and fetching web pages — with built-in coder, explore, and plan subagents that run in isolated contexts, per MarkTechPost's coverage of the CLI release. The shape will be familiar to anyone who has used Claude Code — our terminal coding tools comparison maps the category.
Access runs through Kimi's membership tiers. Per Kimi's pricing page, plans at the time of writing start at $19/month (Moderato) with Kimi Code access included, rising through Allegretto ($39), Allegro ($99), and Vivace ($199) — higher tiers adding larger Kimi Code credit allowances, Agent Swarm scaling toward 300 parallel sub-agents, and cloud deployment options. Pricing and tier contents can change; verify against the live page before committing a team.
The entry point for individual developers
Kimi chat with the latest K2-line models, agent credits, Deep Research, and Kimi Code access. The tier most individual developers should trial first — enough to evaluate K2.7-Code against a real repo.
More credits for daily-driver use
Larger Kimi Code credit allowance for developers using the CLI as a daily driver rather than an occasional tool. The step up when Moderato's credits run out mid-month.
Agent Swarm at working scale
Unlocks larger Agent Swarm parallelism and bigger quotas — the tier aimed at parallelizable work like batch refactors and large-scale generation where sub-agent fan-out pays for itself.
The full-scale tier
Top listed tier: maximum Kimi Code credits, Agent Swarm toward the 300-sub-agent ceiling, and cloud deployment options. Comparable in price to Claude Max-class plans — compare on credits per dollar for your workload.
06 — API & SpeedAPI access, pricing TBD, and the 6x High-Speed Mode.
For teams integrating directly, K2.7-Code is available via the Kimi API at platform.moonshot.ai. Standalone K2.7-Code pricing had not been published at the time of writing. For reference, K2.6 has listed at roughly $0.67-$0.95 per million input tokens and $3.39-$4.00 per million output tokens across OpenRouter and the official platform, with context caching discounting repeated input substantially — but do not assume those rates carry over to K2.7-Code until Moonshot publishes its card.
Two integration notes. First, Moonshot's platform documentation has historically supported using Kimi models inside third-party agents — including Claude Code, Cline, and Roo Code — via compatible API endpoints, which lowers the switching cost of a trial to an environment-variable change. Second, the announced 6x High-Speed Mode signals Moonshot is following the speed-tier pattern the closed labs established — Anthropic's fast mode on Opus 4.8 runs 2.5x speed at 2x price. No pricing or date exists for Moonshot's version yet; if the 6x figure refers to output speed at a viable price, it would be a meaningful differentiator for interactive agentic sessions.
07 — Competitive ContextWhere K2.7-Code lands in the June 2026 coding-model field.
K2.7-Code enters a crowded quarter. Anthropic shipped Opus 4.8 on May 28; Alibaba's Qwen 3.7 Max took Terminal-Bench 2.0 and SWE-Bench Pro wins in late May as a closed model; and DeepSeek's V4 preview reset the efficiency bar for open weights in April. The matrix below positions the realistic alternatives a team would evaluate this month — with the caveat that K2.7-Code's independent benchmark placements are days away, not in hand.
The open-weight coding specialist · new today
Released June 12, 2026. Open weights, Modified MIT, coding-focused on the K2.6 1T-class MoE lineage. Vendor-reported: +21.8% Kimi Code Bench v2, 30% fewer reasoning tokens, improved end-to-end task success. Early third-party signal: 81.1% MCPMark Verified (ahead of Opus 4.8's 76.4 on that suite). API pricing TBD; Kimi Code plans from $19/mo listed. Pick when: you want frontier-adjacent agentic coding with a self-host path and low platform cost.
The closed frontier default · May 28, 2026
Anthropic's most capable general-access model: $5/$25 per Mtok, 1M context, Dynamic Workflows orchestration in Claude Code, fast mode at 2x price for 2.5x speed. Still the reference point for raw code generation and the deepest agent harness ecosystem. Pick when: maximum capability and tooling maturity outweigh cost and weights access.
The closed benchmark leader · May 20, 2026
Alibaba's flagship: 1M context, $2.50/$7.50 per Mtok, launch wins on Terminal-Bench 2.0 (69.7), SWE-Bench Pro (60.6), and MCP-Atlas (76.4) against Opus 4.6 Max baselines. Closed weights — no self-hosting. Pick when: you want top coding benchmarks at mid-tier pricing and weights access doesn't matter.
The efficiency open-weight rival · Apr 24, 2026
Two-model preview: V4-Pro (1.6T/49B active) and V4-Flash (284B/13B), both 1M context with dramatic FLOPs and KV-cache reductions vs V3.2. The open-weight alternative optimized for context length and inference economics rather than coding-specific tuning. Pick when: 1M context and serving efficiency dominate your requirements.
08 — Action GuideWhat dev teams and agencies should do now.
Run a contained trial this week. The evaluation cost is unusually low: a $19/month Moderato plan or an API key, the open-source CLI, and one real mid-complexity task — a multi-file refactor or a failing-test fix — on a branch. Score it on end-to-end completion without intervention, since that is the axis Moonshot claims to have improved. Compare against your current stack on the same task; our terminal tools comparison provides a scoring frame.
Measure the token-efficiency claim yourself. If you adopt K2.7-Code for agentic work, instrument thinking-token share per session before trusting the 30% figure in cost models. Token efficiency varies by workload, and the difference between a vendor benchmark and your monorepo is exactly where budget surprises live.
Treat the self-host path as the strategic option, not the default. Modified MIT weights make K2.7-Code one of the few frontier-adjacent coding models a regulated team can run inside its own perimeter — but a 1T-class MoE is a serious serving commitment. Most teams should start on the hosted API and keep self-hosting as negotiating leverage and a compliance fallback. For organizations weighing that build-out, our AI transformation services cover model selection and deployment strategy, and our web development team ships with these agentic stacks daily.
Wait for independent numbers before re-platforming. A migration decision made on vendor benchmarks alone is premature. SWE-Bench Pro, Terminal-Bench 2.0, and Aider leaderboard re-runs typically land within one to two weeks of a major open-weight release. The trial can start today; the platform decision should wait for the third-party data.
The open-source coding race is now a platform race.
K2.7-Code's benchmark deltas will get the headlines, but the durable story is structural. Moonshot shipped a coding-specialized open-weight model, an open-source TypeScript CLI, a subscription ladder from $19 to $199 a month, and a speed tier on the roadmap — on the same day. That is the full Claude Code playbook, executed with open weights and lower listed prices. Whether the model itself holds up against Opus 4.8 and Qwen 3.7 Max on independent suites is a question the next two weeks will answer.
The two claims worth tracking are the unglamorous ones: the 30% reasoning-token reduction and the end-to-end task success rate. Cost-per-completed-task — not benchmark position — is what decides which agentic stack a team standardizes on, and both claims attack that metric directly. If they survive independent verification, K2.7-Code becomes the strongest open-weight answer yet to the closed coding frontier. If they don't, K2.6 remains an excellent model and Moonshot remains a lab that ships every quarter. Either way, the price of finding out is one evening and $19.