AI DevelopmentNew Release8 min readPublished May 19, 2026

Stable model ID · gemini-3.5-flash · 1M input context · new thinking_level API

Gemini 3.5 Flash: Benchmarks, Thinking & API Guide

Google DeepMind released Gemini 3.5 Flash on May 19, 2026 — a Flash-tier model that, on Google's own published benchmark table, leads Claude Opus 4.7 and GPT-5.5 on five separate evaluations. Gemini 3.5 Pro is rolling out next month.

DA
Digital Applied Team
Senior strategists · Published May 19, 2026
PublishedMay 19, 2026
Read time8 min
SourcesGoogle blog + ai.google.dev docs
MCP Atlas
83.6%
multi-step MCP workflows
CharXiv Reasoning
84.2%
multimodal chart synthesis
Input context window
1.05M
65,536 output tokens
Knowledge cutoff
Jan 2025
per ai.google.dev

Google DeepMind announced Gemini 3.5 Flash on May 19, 2026 — the latest model in the 3.x Flash line and the first member of the Gemini 3.5 family. The stable API model ID is gemini-3.5-flash, replacing the gemini-3-flash-preview identifier used during the preview window.

Two things stand out about the launch. First, only the Flash tier ships today — the official announcement confirms that Gemini 3.5 Pro is rolling out next month, which inverts the usual pattern of leading with the top-tier model. Second, Google's own published benchmark table shows a Flash-tier model leading Claude Opus 4.7 and GPT-5.5 on five separate evaluations — the cleanest signal yet that the Flash / Pro tier line inside Google is narrower than the marketing label suggests.

This guide covers what shipped today, the verified benchmark comparison versus Gemini 3.1 Pro, Claude Sonnet 4.6, Claude Opus 4.7, and GPT-5.5, the new thinking_level API surface, the migration path from gemini-3-flash-preview, and the practical question of when to actually reach for 3.5 Flash versus the alternatives. Every number below is anchored to the Google blog post or the ai.google.dev docs page published the same day.

Key takeaways
  1. 01
    Released today, May 19, 2026.Gemini 3.5 Flash is GA across the Gemini app, AI Mode in Google Search, Google Antigravity, the Gemini API (AI Studio and Android Studio), Gemini Enterprise, and the Gemini Enterprise Agent Platform. Stable model ID: gemini-3.5-flash. Gemini 3.5 Pro is rolling out next month.
  2. 02
    A Flash model that beats Pro tiers.On Google's published benchmark table, 3.5 Flash leads all reported models on MCP Atlas (83.6%), Toolathlon (56.5%), Finance Agent v2 (57.9%), CharXiv Reasoning (84.2%), and MMMU-Pro (83.6%) — including Claude Opus 4.7 and GPT-5.5. Google also claims roughly 4x output tokens per second versus other frontier models.
  3. 03
    thinking_level replaces thinking_budget.The new API uses a string enum — minimal, low, medium (default), high — replacing the integer budget. The default level moved from high to medium, and low-effort thinking was retuned for code and agentic tasks. Google also dropped its recommendation for temperature, top_p, and top_k.
  4. 04
    1M-token context, text-out only.1,048,576 input tokens, 65,536 output tokens, knowledge cutoff January 2025. Inputs accept text, image, video, audio, and PDF. Outputs are text only — no image generation, no audio generation, no Live API on 3.5 Flash.
  5. 05
    Computer Use stays on Gemini 3 Flash.Gemini 3.5 Flash does not support Computer Use. The ai.google.dev migration notes are explicit: keep gemini-3-flash-preview for any browser- or desktop-control agent that depends on the Computer Use surface.

01What's newA Flash release that leads before the Pro arrives.

The launch lands with a deliberately small surface area on day one. One model — Gemini 3.5 Flash — across many platforms, with Gemini 3.5 Pro held back for a release next month. The announcement frames 3.5 Flash as a small, fast model that nevertheless outperforms Gemini 3.1 Pro on challenging coding and agentic benchmarks, and runs roughly 4x faster than other frontier models on output tokens per second.

Read against the benchmark table in Section 02, that framing understates the result. 3.5 Flash leads the entire reported field — including Claude Opus 4.7 and GPT-5.5 — on MCP Atlas, Toolathlon, Finance Agent v2, CharXiv Reasoning, and MMMU-Pro. That is unusual for a Flash-tier model.

Release snapshot
Gemini 3.5 Flash launched May 19, 2026. Stable API ID: gemini-3.5-flash. Available across the Gemini app, AI Mode in Google Search, Google Antigravity, the Gemini API (AI Studio and Android Studio), Gemini Enterprise, and the Gemini Enterprise Agent Platform. Gemini 3.5 Pro is in development and rolling out next month. Gemini Spark — a personal AI agent built on 3.5 Flash — is rolling out to trusted testers today, with a Beta planned for Google AI Ultra subscribers in the US next week.
Model ID
gemini-3.5-flash
3.5

Stable identifier across all Gemini API surfaces. Replaces the gemini-3-flash-preview identifier used during the preview window — that name still resolves to the prior-generation model.

stable · GA
Input context
1,048,576 tokens
1.05M

Native 1M-token input window with multimodal inputs (text, image, video, audio, PDF). On long-context retrieval at 1M pointwise (MRCR v2), 3.5 Flash posts the highest reported figure in Google's published table.

65,536 output
Output speed
Output tokens / sec
4x

Google's announcement claims roughly 4x faster output tokens per second than other frontier models. The page does not publish a head-to-head latency number — treat the claim as Google's framing until independent benchmarks land.

per Google announcement

02BenchmarksWhere 3.5 Flash leads, matches, and trails.

The chart below is the five-benchmark headline view: every evaluation where Gemini 3.5 Flash posts the highest score in Google's published table. The full multi-vendor table follows beneath. Methodology source: deepmind.google/models/evals-methodology/gemini-3-5-flash.

Where Gemini 3.5 Flash leads the field

Source: Google evals methodology table for gemini-3-5-flash
MCP AtlasMulti-step workflows using MCP · 3.5 Flash 83.6 · Opus 4.7 79.1
83.6%
Flash wins
CharXiv ReasoningInfo synthesis from complex charts · GPT-5.5 84.1 · Opus 4.7 82.1
84.2%
Flash wins
MMMU-ProMultimodal understanding · GPT-5.5 81.2 · Opus 4.7 75.2
83.6%
Flash wins
Finance Agent v2Financial analysis · GPT-5.5 51.8 · Opus 4.7 51.5
57.9%
Flash wins
ToolathlonReal-world tool use · GPT-5.5 55.6 · 3 Flash 49.4
56.5%
Flash wins

The full table compares 3.5 Flash against Gemini 3 Flash (the predecessor in the 3.x Flash line, released December 17, 2025), Gemini 3.1 Pro, Claude Sonnet 4.6, Claude Opus 4.7, and GPT-5.5. Bold cells mark the field leader per benchmark.

Benchmark3.5 Flash3 Flash3.1 ProSonnet 4.6Opus 4.7GPT-5.5
Terminal-Bench 2.176.2%58.0%70.3%66.1%78.2%
SWE-Bench Pro55.1%49.6%54.2%64.3%58.6%
MCP Atlas83.6%62.0%78.2%69.5%79.1%75.3%
Toolathlon56.5%49.4%55.6%
OSWorld-Verified78.4%65.1%76.2%72.5%78.0%78.7%
Finance Agent v257.9%42.6%43.0%51.0%51.5%51.8%
GDPval-AA (Elo)165612041314167617531769
CharXiv Reasoning84.2%80.3%83.3%72.4%82.1%84.1%
MMMU-Pro83.6%81.2%80.5%74.5%75.2%81.2%
Blueprint-Bench 233.6%0.0%26.5%6.7%24.5%36.2%
MRCR v2 (128k avg)77.3%67.2%84.9%84.9%59.3%94.8%
MRCR v2 (1M pointwise)26.6%22.1%26.3%
Humanity's Last Exam40.2%33.7%44.4%33.2%46.9%41.4%
ARC-AGI-272.1%33.6%77.1%58.3%75.8%84.6%
Source: Google evals methodology page for gemini-3-5-flash. Em-dash indicates the score is not published for that model. Bold marks the field leader per row.

Where it doesn't lead

Pro-tier strengths persist where you'd expect them. Opus 4.7 wins on SWE-Bench Pro (64.3%) and Humanity's Last Exam (46.9%). GPT-5.5 takes Terminal-Bench 2.1 (78.2%), GDPval-AA (1769 Elo), OSWorld-Verified (78.7%), Blueprint-Bench 2 (36.2%), MRCR v2 at 128k (94.8%), and ARC-AGI-2 (84.6%). Long-context retrieval at the 128k slice is the most obvious soft spot for 3.5 Flash versus the Pro tier — if your pipeline depends on dense mid-context retrieval rather than agentic tool calls, GPT-5.5 still leads by a wide margin.

"Outperforms Gemini 3.1 Pro on challenging coding and agentic benchmarks — true on the table, but 3.1 Pro still beats 3.5 Flash on MRCR v2 at 128k (84.9% vs 77.3%). Read the rows, not the headline."— Our reading of Google's published benchmark table, May 19, 2026

03Thinking APIthinking_level replaces thinking_budget.

The biggest developer-facing change is the thinking control surface. The integer thinking_budget parameter that shipped with Gemini 3 Flash Preview is replaced by a string enum, thinking_level. The new values are minimal, low, medium (default), and high.

Two notes from the ai.google.dev migration page. The default level dropped from high to medium — meaning a naive port from gemini-3-flash-preview to gemini-3.5-flash will silently reason less than the old preview did unless you opt back in. Separately, Google retuned low-effort thinking specifically for code and agentic tasks, so "low" is no longer a hint to skip reasoning — it is a tuned setting that holds up on coding workflows.

# Python — generate with thinking_level
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Plan a 3-step MCP workflow to file a GitHub issue.",
    config={
        "thinking_config": {
            "thinking_level": "medium"  # minimal | low | medium | high
        }
    },
)

print(response.text)

Other API changes worth knowing

  • Google removed its recommendation for temperature, top_p, and top_k. Strip hardcoded sampling values from production configs unless an eval explicitly justifies the override.
  • Function responses now require matching id and name fields between the function call and the response. Update any tool-execution layer that previously relied on positional matching.
  • Thought preservation across multi-turn is now automatic. Google flags that this can increase token usage on long sessions — budget accordingly.
  • Thinking with encrypted reasoning context now combines with structured outputs, code execution on images, and the full tool set (Search grounding, URL context, function calling) without the prior compatibility caveats.

04SpecsModalities, features, and the not-supported list.

The ai.google.dev model reference for gemini-3.5-flash gives the load-bearing specs you need before adopting. Inputs span text, image, video, audio, and PDF. Output is text only — image generation, audio generation, and the Live API are explicitly not supported on this model. Anyone shipping voice agents or live conversational UX needs to stay on a different model in the Gemini family.

Supported
Tools, thinking, caching, batch
from ai.google.dev model reference

Batch API, context caching, code execution (including on images), file search, flex inference, priority inference, function calling, grounding with Google Maps, search grounding, structured outputs, thinking with encrypted reasoning context, and URL context — all GA on gemini-3.5-flash.

12 features GA
Not supported
Computer Use, image / audio output, Live
stay on 3 Flash Preview for these

Computer Use is unavailable — use gemini-3-flash-preview for browser/desktop agents. Image and audio generation are unavailable — route to the Gemini 2.5 Flash Image and TTS models. The Live API is unavailable — keep Gemini 3.1 Flash Live for streaming conversational UX.

4 capabilities gated

Pricing was not stated in the Google blog post or the ai.google.dev docs page on launch day. The official Gemini API pricing page now lists Gemini 3.5 Flash at $1.50 / Mtok input and $9.00 / Mtok output on the standard tier, with context caching at $0.15 / Mtok (plus $1.00 / Mtok-hour storage) and batch / flex tiers at $0.75 / $4.50 — a 50% discount across the board. The priority tier runs $2.70 / $16.20. For reference, GPT-5.5 is $5 / $30 and Claude Opus 4.7 is $5 / $25, so 3.5 Flash sits at roughly 3.3x cheaper on input and 2.8-3.3x cheaper on output than the current Pro tiers from OpenAI and Anthropic.

05MigrationFrom gemini-3-flash-preview to gemini-3.5-flash.

The migration is small in lines of code and slightly larger in behavior. Three changes touch most codebases.

1. Rename the model

# Before — preview identifier
model = "gemini-3-flash-preview"

# After — stable identifier
model = "gemini-3.5-flash"

2. Swap thinking_budget for thinking_level

# Before — integer budget
config = {
    "thinking_config": {"thinking_budget": 8192}
}

# After — string enum (default is "medium")
config = {
    "thinking_config": {"thinking_level": "medium"}
}

Picking the right level is the actual decision. If your previous runs used a high thinking budget, set high explicitly — the new default of medium will silently reduce reasoning effort otherwise. If your workload is mostly tool calling or code generation, try low first — Google retuned it for that exact case.

3. Drop sampling parameter overrides

# Remove these from your default config
# temperature = 0.7
# top_p = 0.95
# top_k = 40

Google removed the prior recommendation for these values. Defaults are considered tuned for the model. Keep them only if you have a specific eval that justifies the override.

Don't migrate if you need Computer Use
Gemini 3.5 Flash does notsupport Computer Use. The official "What's new" guidance is explicit: keep using gemini-3-flash-preview for any browser- or desktop-control agent that depends on the Computer Use surface. This is the one workload class where the preview identifier remains the recommended target.

For the broader picture of how Gemini's agentic stack composes — Antigravity, the IDE surface, and what 3.5 Flash means alongside them — our existing guide on Google Antigravity and the Gemini agentic IDE still applies; only the Flash-tier model behind the agent has changed.

06AvailabilitySix surfaces today, 3.5 Pro next month.

Gemini 3.5 Flash ships across Google's full surface area on day one. The announcement lists six platforms:

  • Gemini app — global rollout.
  • AI Mode in Google Search — global rollout.
  • Google Antigravity — developer platform.
  • Gemini API — Google AI Studio and Android Studio.
  • Gemini Enterprise Agent Platform.
  • Gemini Enterprise.

Two related releases ship alongside the model. Gemini 3.5 Pro is confirmed in development and rolling out next month — the announcement does not give a specific date. Gemini Spark, a personal AI agent built on 3.5 Flash, is rolling out to trusted testers today, with a Beta for Google AI Ultra subscribers in the US planned for next week.

For operators tracking the broader frontier-model picture, our cross-vendor comparison on GPT vs Opus vs Gemini at the Pro tier covers how the Gemini lineup stacks up to OpenAI and Anthropic on coding, reasoning, and price-performance — the 3.5 Flash result sharpens that picture without redrawing it.

07Decision matrixWhen to actually reach for 3.5 Flash.

On the published benchmarks, Gemini 3.5 Flash makes the most sense when you want frontier-grade agentic behavior at a Flash-tier latency and cost profile. The matrix below maps the verified scores to concrete workload picks.

MCP / agentic workflows
Multi-step tool orchestration

83.6% MCP Atlas leads the entire reported field including Opus 4.7 (79.1%) and GPT-5.5 (75.3%). 56.5% Toolathlon also leads. For MCP-driven agents and general tool-use orchestration, 3.5 Flash is the strongest single pick today.

Pick 3.5 Flash
Multimodal reasoning
Charts, docs, images

84.2% CharXiv Reasoning and 83.6% MMMU-Pro lead the field. For chart-heavy document analysis, multimodal RAG over PDFs, and information synthesis from complex visuals, 3.5 Flash leads Opus 4.7 and GPT-5.5 simultaneously.

Pick 3.5 Flash
Finance / analyst workflows
Domain-specialised reasoning

57.9% Finance Agent v2 leads the field by 6+ points over GPT-5.5 (51.8%) and Opus 4.7 (51.5%). For financial analysis, decision support, and analyst-style synthesis, 3.5 Flash punches well above its weight.

Pick 3.5 Flash
Complex code edits
Repo-scale software engineering

Opus 4.7 still leads SWE-Bench Pro at 64.3% versus 3.5 Flash 55.1%. For complex multi-file software-engineering tasks where benchmark headroom matters, Opus 4.7 remains the better pick.

Stay with Opus 4.7
Hardest reasoning
Abstract puzzles and academic eval

GPT-5.5 leads ARC-AGI-2 at 84.6% (3.5 Flash 72.1%). Opus 4.7 leads Humanity's Last Exam at 46.9% (3.5 Flash 40.2%). For the hardest reasoning tasks, route to the Pro tier of either OpenAI or Anthropic.

Route by benchmark
Computer Use agents
Browser / desktop control

Computer Use is not supported on 3.5 Flash. The official migration guidance is to keep gemini-3-flash-preview for any agent that depends on the Computer Use surface. Image and audio generation similarly stay on Gemini 2.5 Flash Image and TTS.

Stay with 3 Flash Preview

08ConclusionA Flash release that redraws the tier line.

The shape of frontier Flash, May 2026

The Flash / Pro distinction inside Google is narrower than the marketing label suggests.

Gemini 3.5 Flash is the first Flash-tier model from any frontier lab that posts the highest score in a multi-vendor benchmark table on five separate evaluations — including against Claude Opus 4.7 and GPT-5.5. The honest framing is that the wins cluster on agentic and multimodal benchmarks, not on hardest-reasoning evals like ARC-AGI-2 or Humanity's Last Exam. But for the workloads most operators actually deploy — MCP-driven agents, multimodal document analysis, finance and analyst synthesis — 3.5 Flash is the strongest single pick today.

The thinking_level API change is small in code surface and meaningful in behavior. Defaults moved from high to medium, low-effort thinking was retuned for code and agentic tasks, and sampling-parameter recommendations were dropped entirely. Treat the migration as a re-evaluation prompt, not a like-for-like swap — run your own evals before locking in production thinking levels.

The bigger signal is the release order. Shipping Flash first and holding Pro for next month says Google trusts the Flash tier to carry the headline. With pricing now on the table — $1.50 / $9.00 per Mtok versus $5 / $25-30 for the Pro tiers from Anthropic and OpenAI — the cost picture lands hard: roughly 3.3x cheaper input and 2.8-3.3x cheaper output. On capability and price-performance together, the Pro tier has a real question to answer when it lands next month.

Ship on the right model per workload

The Flash tier just caught the Pro tier on agentic benchmarks.

Our team helps operators evaluate frontier model releases, route workloads across Gemini / Claude / GPT, and ship production agentic systems — including the migration from preview to stable model IDs.

Free consultationExpert guidanceTailored solutions
What we work on

Frontier model engagements

  • Per-workload model selection — Gemini / Claude / GPT
  • MCP-driven agent pipelines on Gemini 3.5 Flash
  • Multimodal document analysis at 1M context
  • Migration from preview to stable model IDs
  • Cost & governance programs across vendors
FAQ · Gemini 3.5 Flash launch

The questions we get on day one.

Gemini 3.5 Flash is Google DeepMind's next-generation Flash-tier model, released on May 19, 2026. The stable API model ID is gemini-3.5-flash. It is positioned as a fast, low-cost frontier model with strong agentic and multimodal performance. On Google's own benchmark table, it leads Claude Opus 4.7 and GPT-5.5 on MCP Atlas, Toolathlon, Finance Agent v2, CharXiv Reasoning, and MMMU-Pro, while running roughly 4x faster than other frontier models on output tokens per second.