
Google Gemini 3.1 Pro: Benchmarks, Pricing & Guide

Gemini 3.1 Pro scores 77.1% on ARC-AGI-2 and 2887 Elo on LiveCodeBench Pro at $2/$12 per million tokens. Full benchmarks, pricing, and competitive comparison guide.

Digital Applied Team
February 19, 2026
8 min read
  • ARC-AGI-2 score: 77.1%
  • LiveCodeBench Pro Elo: 2,887
  • Price per 1M tokens (input/output): $2 / $12
  • Context window: 1M tokens

Key Takeaways

  • 2x reasoning leap: ARC-AGI-2 jumps from 31.1% to 77.1%, the largest single-generation reasoning gain among frontier models.
  • Benchmark leader: #1 on 12+ of 18 tracked benchmarks, dominating coding, reasoning, and agentic tasks.
  • Same price, more power: $2/$12 per 1M tokens, identical to Gemini 3 Pro, making it the best-value flagship model.
  • Not unbeatable everywhere: Opus 4.6 edges it out on SWE-Bench Verified and expert tasks; GPT-5.3-Codex leads specialized coding.
  • Available now in preview: accessible via AI Studio, Antigravity, Vertex AI, the Gemini CLI, and the Gemini app.

Google released Gemini 3.1 Pro on February 19, 2026. It is the company's most capable model yet, described as "designed for tasks where a simple answer isn't enough." It delivers a 2x+ reasoning performance boost over Gemini 3 Pro, tops most major benchmarks, and maintains the same $2/$12 pricing. This is Google's strongest play for the frontier AI crown.

The numbers speak for themselves: 77.1% on ARC-AGI-2 (up from 31.1%), 2887 Elo on LiveCodeBench Pro, 94.3% on GPQA Diamond, and #1 rankings on 12 of 18 tracked benchmarks. Gemini 3.1 Pro is the first ".1" increment between major Gemini versions — Google previously used ".5" for mid-cycle updates — and the jump in capability justifies the naming change.

What's New in Gemini 3.1 Pro

Gemini 3.1 Pro represents a fundamental shift in how Google delivers mid-cycle updates. Previous Gemini generations used ".5" increments (e.g., Gemini 1.5 and 2.5), but the ".1" naming signals a tighter, more targeted improvement cycle focused on reasoning depth and agentic capability rather than broad architecture changes.

The core improvement is a 2x+ reasoning boost. On ARC-AGI-2 — the benchmark that tests novel problem-solving without memorized patterns — Gemini 3.1 Pro scores 77.1% compared to Gemini 3 Pro's 31.1%. That 46 percentage point jump is the largest single-generation reasoning gain seen in any frontier model family. Google attributes this to more efficient thinking, where the model extracts more insight per compute token during its reasoning chain.

What's New
Key improvements over Gemini 3 Pro
  • 77.1% ARC-AGI-2 (up from 31.1%)
  • 2887 Elo on LiveCodeBench Pro (+18%)
  • New "Medium" thinking level
  • Improved agentic and SWE capabilities
Unchanged
What stays the same from Gemini 3 Pro
  • $2/$12 per million tokens
  • 1M token context window
  • 64K token output limit
  • Multimodal input support

Complete Benchmark Breakdown

Google published results across 18 benchmarks covering academic reasoning, coding, agentic tasks, multimodal understanding, and knowledge. The comparison includes Gemini 3 Pro, Claude Sonnet 4.6, Claude Opus 4.6, GPT-5.2, and GPT-5.3-Codex where available. Gemini 3.1 Pro leads on more benchmarks than any other model.

Reasoning & Knowledge

| Benchmark | 3.1 Pro | 3 Pro | Opus 4.6 | GPT-5.2 |
|---|---|---|---|---|
| HLE (No Tools) | 44.4% | 37.5% | 40.0% | 34.5% |
| HLE (Search+Code) | 51.4% | 45.8% | 53.1% | 45.5% |
| ARC-AGI-2 | 77.1% | 31.1% | 68.8% | 52.9% |
| GPQA Diamond | 94.3% | 91.9% | 91.3% | 92.4% |
| MMMLU | 92.6% | 91.8% | 91.1% | 89.6% |
| MMMU Pro | 80.5% | 81.0% | 73.9% | 79.5% |
| BrowseComp | 85.9% | 59.2% | 84.0% | 65.8% |

Coding & Software Engineering

| Benchmark | 3.1 Pro | 3 Pro | Opus 4.6 | GPT-5.2 |
|---|---|---|---|---|
| SWE-Bench Verified | 80.6% | 76.2% | 80.8% | 80.0% |
| SWE-Bench Pro (Public) | 54.2% | 43.3% | 55.6% | n/a |
| LiveCodeBench Pro (Elo) | 2887 | 2439 | n/a | 2393 |
| Terminal-Bench 2.0 | 68.5% | 56.9% | 65.4% | 54.0% |
| SciCode | 59% | 56% | 52% | 52% |

Agentic Tasks

| Benchmark | 3.1 Pro | 3 Pro | Sonnet 4.6 | Opus 4.6 |
|---|---|---|---|---|
| APEX-Agents | 33.5% | 18.4% | 29.8% | n/a |
| GDPval-AA (Elo) | 1317 | 1195 | 1633 | 1606 |
| τ²-bench Retail | 90.8% | 85.3% | 91.7% | 91.9% |
| τ²-bench Telecom | 99.3% | 98.0% | 97.9% | 99.3% |
| MCP Atlas | 69.2% | 54.1% | 61.3% | 59.5% |
| MRCR v2 128k | 84.9% | 77.0% | 84.9% | 84.0% |

The standout result is ARC-AGI-2 — Gemini 3.1 Pro's 77.1% represents a 2.5x improvement over Gemini 3 Pro and exceeds Opus 4.6 by over 8 percentage points. On LiveCodeBench Pro, the 2887 Elo rating places it significantly ahead of both GPT-5.2 (2393) and Gemini 3 Pro (2439). The MCP Atlas score of 69.2% also demonstrates strong tool-use coordination, leading all models tested.

Where Gemini 3.1 Pro Leads (and Where It Doesn't)

Where Gemini 3.1 Pro Dominates

Gemini 3.1 Pro holds the #1 position on at least 12 of 18 tracked benchmarks. Its strongest leads are in novel reasoning (ARC-AGI-2: 77.1%, 12% ahead of Opus 4.6), competitive coding (LiveCodeBench Pro: 2887 Elo, 21% above GPT-5.2), graduate-level science (GPQA Diamond: 94.3%), scientific coding (SciCode: 59%), autonomous agent tasks (APEX-Agents: 33.5%), tool coordination (MCP Atlas: 69.2%), and web research (BrowseComp: 85.9%).

Reasoning

77.1% ARC-AGI-2, 94.3% GPQA Diamond, 44.4% HLE (no tools). Best-in-class on novel problem-solving and graduate-level science.

Coding

2887 Elo LiveCodeBench, 80.6% SWE-Bench, 68.5% Terminal-Bench. Dominates competitive coding and matches top models on SWE tasks.

Agentic

33.5% APEX-Agents, 69.2% MCP Atlas, 99.3% telecom tool use. Best autonomous agent and tool coordination performance.

Where Competitors Still Lead

Gemini 3.1 Pro is not the best everywhere. Claude Sonnet 4.6 and Opus 4.6 dominate GDPval-AA expert tasks (1633 and 1606 Elo respectively vs 1317 for Gemini 3.1 Pro). Opus 4.6 narrowly leads SWE-Bench Verified (80.8% vs 80.6%) and Humanity's Last Exam with tools (53.1% vs 51.4%). GPT-5.3-Codex leads specialized coding on Terminal-Bench 2.0 (77.3%) and SWE-Bench Pro (56.8%).

| Area | Leader | Leader Score | Gemini 3.1 Pro |
|---|---|---|---|
| GDPval-AA (Expert Tasks) | Sonnet 4.6 | 1633 Elo | 1317 Elo |
| SWE-Bench Verified | Opus 4.6 | 80.8% | 80.6% |
| Terminal-Bench 2.0 | GPT-5.3-Codex | 77.3% | 68.5% |
| SWE-Bench Pro | GPT-5.3-Codex | 56.8% | 54.2% |
| HLE (Search+Code) | Opus 4.6 | 53.1% | 51.4% |

The pattern is clear: Gemini 3.1 Pro is the best general-purpose model available, but specialized models still win in their niches. Claude excels at expert office tasks and real-world SWE, while GPT-5.3-Codex is the specialist for dedicated coding workflows.

Pricing and Value Analysis

Gemini 3.1 Pro maintains the same pricing as Gemini 3 Pro — a massive performance upgrade at zero additional cost. At $2 per million input tokens and $12 per million output tokens, it's significantly cheaper than Claude Opus 4.6 ($15/$75) and competitive with Sonnet 4.6 ($3/$15). Context caching can reduce costs by up to 75%.

| Model | Input (per 1M) | Output (per 1M) | Context |
|---|---|---|---|
| Gemini 3.1 Pro | $2.00 | $12.00 | 1M tokens |
| Gemini 3.1 Pro (>200K prompt) | $4.00 | $18.00 | 1M tokens |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M tokens |
| Claude Opus 4.6 | $15.00 | $75.00 | 1M tokens |
| GPT-5.2 | $2.50 | $10.00 | 1M tokens |

  • $2 / $12 input/output per 1M tokens
  • Up to 75% savings with context caching
  • 7.5x cheaper than Claude Opus 4.6 on input

The value proposition is clear: Gemini 3.1 Pro leads on more benchmarks than Opus 4.6 while costing 7.5x less on input and 6.25x less on output. Even against the similarly priced GPT-5.2 ($2.50 input), Gemini 3.1 Pro delivers meaningfully better results on most benchmarks. For teams optimizing cost-to-performance, this is the strongest option available.
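
To make the rates concrete, here is a quick back-of-the-envelope calculation in Python. Only the per-token prices come from the table above; the monthly token volumes are invented purely for illustration.

```python
# Rough monthly cost estimate for a hypothetical workload.
# Prices are the list rates quoted above; token volumes are made-up examples.

PRICES = {                      # (input, output) in USD per 1M tokens
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.6": (15.00, 75.00),
    "GPT-5.2": (2.50, 10.00),
}

input_tokens_per_month = 500_000_000    # assumed: 500M input tokens
output_tokens_per_month = 50_000_000    # assumed: 50M output tokens

for model, (inp, out) in PRICES.items():
    cost = (input_tokens_per_month / 1e6) * inp + (output_tokens_per_month / 1e6) * out
    print(f"{model:18s} ~${cost:,.0f}/month")

# Gemini 3.1 Pro at these volumes: 500 * $2 + 50 * $12 = $1,600/month.
```

At the assumed volumes, the same workload would run roughly $1,600/month on Gemini 3.1 Pro versus about $11,250/month on Opus 4.6, which is where the 6-7x cost gap becomes tangible.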

Where to Access Gemini 3.1 Pro

Gemini 3.1 Pro is available in preview across Google's full ecosystem, with general availability coming soon. Developers can start building immediately through multiple platforms.

  • Google AI Studio — Free-tier access with rate limits, ideal for prototyping and experimentation
  • Google Antigravity IDE — Google's agentic development environment, purpose-built for AI-assisted coding
  • Vertex AI — Enterprise deployment with GCP infrastructure, SLAs, and compliance
  • Gemini CLI — Terminal-based access for developers who prefer command-line workflows
  • Android Studio — Integrated access for mobile developers building Android applications
  • Gemini App — Consumer access on Pro and Ultra plans for general-purpose use
  • NotebookLM — Research and analysis tool with Gemini 3.1 Pro powering document understanding (Pro/Ultra)
  • Gemini Enterprise — Google Workspace integration for business teams

What This Means for Developers and Businesses

For Developers

Gemini 3.1 Pro is the best-in-class coding model when combining competitive coding (2887 Elo LiveCodeBench), real-world SWE (80.6% SWE-Bench), and massive 1M token context at $2/$12 pricing. The new Medium thinking level lets developers balance cost and reasoning depth per request — use Low for autocomplete, Medium for code review, and High for complex debugging.
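
Here is a minimal sketch of that per-request routing using the google-genai Python SDK. The model ID and the "medium" thinking level are assumptions based on the preview naming and on how Gemini 3 Pro exposes thinking levels today, so confirm the exact values against the current API reference.

```python
# Sketch: pick a thinking level per request based on task type.
# Assumptions: the model ID and the "medium" thinking level are illustrative,
# not confirmed API values; verify against the current Gemini API docs.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

THINKING_BY_TASK = {
    "autocomplete": "low",     # fast, cheap completions
    "code_review": "medium",   # assumed new level: balance of cost and depth
    "debugging": "high",       # maximum reasoning for hard problems
}

def ask(task_type: str, prompt: str) -> str:
    level = THINKING_BY_TASK.get(task_type, "medium")
    response = client.models.generate_content(
        model="gemini-3.1-pro-preview",  # hypothetical preview model ID
        contents=prompt,
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_level=level),
        ),
    )
    return response.text

print(ask("code_review", "Review this function for off-by-one errors: ..."))
```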

For Agencies and Businesses

With the strongest general-purpose performance across reasoning, coding, and agentic tasks, Gemini 3.1 Pro is the model to evaluate for complex AI workflows. The 69.2% MCP Atlas score means it handles multi-tool coordination well — critical for building automated pipelines that interact with multiple APIs, databases, and services.
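
As a rough sketch of what multi-tool coordination looks like at the API level, the example below declares two hypothetical functions and lets the model decide which to call. The tool names, schemas, and model ID are illustrative assumptions, and a production pipeline would add a loop that executes each call and feeds the result back to the model.

```python
# Sketch: declaring two hypothetical tools for a multi-tool pipeline.
# The function names, parameters, and model ID are illustrative only.
from google import genai
from google.genai import types

client = genai.Client()

lookup_order = types.FunctionDeclaration(
    name="lookup_order",           # hypothetical CRM lookup
    description="Fetch an order record by ID.",
    parameters={
        "type": "OBJECT",
        "properties": {"order_id": {"type": "STRING"}},
        "required": ["order_id"],
    },
)

send_refund = types.FunctionDeclaration(
    name="send_refund",            # hypothetical payments API call
    description="Issue a refund for an order.",
    parameters={
        "type": "OBJECT",
        "properties": {"order_id": {"type": "STRING"}, "amount": {"type": "NUMBER"}},
        "required": ["order_id", "amount"],
    },
)

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",  # hypothetical preview model ID
    contents="Order 8123 arrived damaged; refund the customer $42.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[lookup_order, send_refund])],
    ),
)

# The model responds with structured function calls; a real agent would execute
# them and return the results in a follow-up turn.
for call in response.function_calls or []:
    print(call.name, call.args)
```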

For the AI Landscape

Google retakes the lead from Anthropic and OpenAI on most benchmarks with this release. The competition is tighter than ever — Gemini 3.1 Pro dominates general-purpose tasks, Claude excels at expert office work and computer use, and OpenAI's Codex models lead specialized coding. No single model wins everywhere, which is healthy for the ecosystem and gives developers real choices based on their specific needs. For a broader look at the pre-3.1 landscape, see our Opus 4.5 vs GPT-5.2 vs Gemini 3 Pro comparison.

How to Get Started

Getting started with Gemini 3.1 Pro takes minutes. The fastest path is through Google AI Studio, which offers free access with rate limits. For production workloads, use the Gemini API directly or through Vertex AI.

Quick Start via AI Studio

  1. Visit Google AI Studio and sign in with your Google account
  2. Select Gemini 3.1 Pro from the model dropdown
  3. Choose a thinking level: Low, Medium, or High
  4. Start prompting — no API key required for the playground

Recommended Thinking Levels

Low

Simple queries, autocomplete, classification, and summarization. Fastest response, lowest cost.

Medium (New)

Code review, data analysis, document Q&A, and multi-step tasks. Best balance of speed and depth.

High

Complex reasoning, advanced coding, research, and agentic workflows. Maximum capability at higher cost.

Migration from Gemini 3 Pro

Gemini 3.1 Pro is a drop-in replacement for Gemini 3 Pro — same API format, same pricing, same context window. Update the model ID in your API calls and you get the performance upgrade immediately. The only new consideration is the Medium thinking level, which can help optimize cost for workloads that previously used High thinking but don't require maximum reasoning depth. For cost-conscious use cases, consider Gemini 3 Flash as a faster, cheaper alternative for simpler tasks.
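
In code, the migration can be as small as the snippet below. Both the old and new model IDs shown are assumptions for illustration; use the identifiers listed in AI Studio or the API model catalog.

```python
# Sketch of the drop-in swap: only the model ID changes between calls.
# Both IDs are assumed for illustration; check the model list in AI Studio.
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",   # previously "gemini-3-pro-preview"
    contents="Summarize the attached design doc.",
)
print(response.text)
```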

Conclusion

Gemini 3.1 Pro is Google's most capable model to date and arguably the strongest general-purpose AI model available. With 77.1% on ARC-AGI-2, 2887 Elo on LiveCodeBench Pro, and #1 rankings on 12+ benchmarks, it delivers a meaningful capability leap over Gemini 3 Pro at exactly the same $2/$12 price point. The new Medium thinking level adds cost optimization flexibility that competitors lack.

That said, no single model wins everywhere. Claude Opus 4.6 and Sonnet 4.6 still lead on expert office tasks (GDPval-AA) and edge out on SWE-Bench Verified. GPT-5.3-Codex dominates specialized coding benchmarks. The right choice depends on your workload — but for teams that need one model to handle reasoning, coding, agentic tasks, and multimodal understanding, Gemini 3.1 Pro is now the default recommendation.

Ready to Build with Gemini 3.1 Pro?

Whether you're deploying agentic AI, building coding assistants, or automating complex workflows, our team can help you leverage frontier models for measurable business results.

  • Free consultation
  • Expert AI integration guidance
  • Tailored solutions

