Google DeepMind announced Gemini 3.5 Flash on May 19, 2026 — the latest model in the 3.x Flash line and the first member of the Gemini 3.5 family. The stable API model ID is gemini-3.5-flash, replacing the gemini-3-flash-preview identifier used during the preview window.
Two things stand out about the launch. First, only the Flash tier ships today — the official announcement confirms that Gemini 3.5 Pro is rolling out next month, which inverts the usual pattern of leading with the top-tier model. Second, Google's own published benchmark table shows a Flash-tier model leading Claude Opus 4.7 and GPT-5.5 on five separate evaluations — the cleanest signal yet that the Flash / Pro tier line inside Google is narrower than the marketing label suggests.
This guide covers what shipped today, the verified benchmark comparison versus Gemini 3.1 Pro, Claude Sonnet 4.6, Claude Opus 4.7, and GPT-5.5, the new thinking_level API surface, the migration path from gemini-3-flash-preview, and the practical question of when to actually reach for 3.5 Flash versus the alternatives. Every number below is anchored to the Google blog post or the ai.google.dev docs page published the same day.
- 01Released today, May 19, 2026.Gemini 3.5 Flash is GA across the Gemini app, AI Mode in Google Search, Google Antigravity, the Gemini API (AI Studio and Android Studio), Gemini Enterprise, and the Gemini Enterprise Agent Platform. Stable model ID: gemini-3.5-flash. Gemini 3.5 Pro is rolling out next month.
- 02A Flash model that beats Pro tiers.On Google's published benchmark table, 3.5 Flash leads all reported models on MCP Atlas (83.6%), Toolathlon (56.5%), Finance Agent v2 (57.9%), CharXiv Reasoning (84.2%), and MMMU-Pro (83.6%) — including Claude Opus 4.7 and GPT-5.5. Google also claims roughly 4x output tokens per second versus other frontier models.
- 03thinking_level replaces thinking_budget.The new API uses a string enum — minimal, low, medium (default), high — replacing the integer budget. The default level moved from high to medium, and low-effort thinking was retuned for code and agentic tasks. Google also dropped its recommendation for temperature, top_p, and top_k.
- 041M-token context, text-out only.1,048,576 input tokens, 65,536 output tokens, knowledge cutoff January 2025. Inputs accept text, image, video, audio, and PDF. Outputs are text only — no image generation, no audio generation, no Live API on 3.5 Flash.
- 05Computer Use stays on Gemini 3 Flash.Gemini 3.5 Flash does not support Computer Use. The ai.google.dev migration notes are explicit: keep gemini-3-flash-preview for any browser- or desktop-control agent that depends on the Computer Use surface.
01 — What's newA Flash release that leads before the Pro arrives.
The launch lands with a deliberately small surface area on day one. One model — Gemini 3.5 Flash — across many platforms, with Gemini 3.5 Pro held back for a release next month. The announcement frames 3.5 Flash as a small, fast model that nevertheless outperforms Gemini 3.1 Pro on challenging coding and agentic benchmarks, and runs roughly 4x faster than other frontier models on output tokens per second.
Read against the benchmark table in Section 02, that framing understates the result. 3.5 Flash leads the entire reported field — including Claude Opus 4.7 and GPT-5.5 — on MCP Atlas, Toolathlon, Finance Agent v2, CharXiv Reasoning, and MMMU-Pro. That is unusual for a Flash-tier model.
gemini-3.5-flash
Stable identifier across all Gemini API surfaces. Replaces the gemini-3-flash-preview identifier used during the preview window — that name still resolves to the prior-generation model.
1,048,576 tokens
Native 1M-token input window with multimodal inputs (text, image, video, audio, PDF). On long-context retrieval at 1M pointwise (MRCR v2), 3.5 Flash posts the highest reported figure in Google's published table.
Output tokens / sec
Google's announcement claims roughly 4x faster output tokens per second than other frontier models. The page does not publish a head-to-head latency number — treat the claim as Google's framing until independent benchmarks land.
02 — BenchmarksWhere 3.5 Flash leads, matches, and trails.
The chart below is the five-benchmark headline view: every evaluation where Gemini 3.5 Flash posts the highest score in Google's published table. The full multi-vendor table follows beneath. Methodology source: deepmind.google/models/evals-methodology/gemini-3-5-flash.
Where Gemini 3.5 Flash leads the field
Source: Google evals methodology table for gemini-3-5-flashThe full table compares 3.5 Flash against Gemini 3 Flash (the predecessor in the 3.x Flash line, released December 17, 2025), Gemini 3.1 Pro, Claude Sonnet 4.6, Claude Opus 4.7, and GPT-5.5. Bold cells mark the field leader per benchmark.
| Benchmark | 3.5 Flash | 3 Flash | 3.1 Pro | Sonnet 4.6 | Opus 4.7 | GPT-5.5 |
|---|---|---|---|---|---|---|
| Terminal-Bench 2.1 | 76.2% | 58.0% | 70.3% | — | 66.1% | 78.2% |
| SWE-Bench Pro | 55.1% | 49.6% | 54.2% | — | 64.3% | 58.6% |
| MCP Atlas | 83.6% | 62.0% | 78.2% | 69.5% | 79.1% | 75.3% |
| Toolathlon | 56.5% | 49.4% | — | — | — | 55.6% |
| OSWorld-Verified | 78.4% | 65.1% | 76.2% | 72.5% | 78.0% | 78.7% |
| Finance Agent v2 | 57.9% | 42.6% | 43.0% | 51.0% | 51.5% | 51.8% |
| GDPval-AA (Elo) | 1656 | 1204 | 1314 | 1676 | 1753 | 1769 |
| CharXiv Reasoning | 84.2% | 80.3% | 83.3% | 72.4% | 82.1% | 84.1% |
| MMMU-Pro | 83.6% | 81.2% | 80.5% | 74.5% | 75.2% | 81.2% |
| Blueprint-Bench 2 | 33.6% | 0.0% | 26.5% | 6.7% | 24.5% | 36.2% |
| MRCR v2 (128k avg) | 77.3% | 67.2% | 84.9% | 84.9% | 59.3% | 94.8% |
| MRCR v2 (1M pointwise) | 26.6% | 22.1% | 26.3% | — | — | — |
| Humanity's Last Exam | 40.2% | 33.7% | 44.4% | 33.2% | 46.9% | 41.4% |
| ARC-AGI-2 | 72.1% | 33.6% | 77.1% | 58.3% | 75.8% | 84.6% |
| Source: Google evals methodology page for gemini-3-5-flash. Em-dash indicates the score is not published for that model. Bold marks the field leader per row. | ||||||
Where it doesn't lead
Pro-tier strengths persist where you'd expect them. Opus 4.7 wins on SWE-Bench Pro (64.3%) and Humanity's Last Exam (46.9%). GPT-5.5 takes Terminal-Bench 2.1 (78.2%), GDPval-AA (1769 Elo), OSWorld-Verified (78.7%), Blueprint-Bench 2 (36.2%), MRCR v2 at 128k (94.8%), and ARC-AGI-2 (84.6%). Long-context retrieval at the 128k slice is the most obvious soft spot for 3.5 Flash versus the Pro tier — if your pipeline depends on dense mid-context retrieval rather than agentic tool calls, GPT-5.5 still leads by a wide margin.
"Outperforms Gemini 3.1 Pro on challenging coding and agentic benchmarks — true on the table, but 3.1 Pro still beats 3.5 Flash on MRCR v2 at 128k (84.9% vs 77.3%). Read the rows, not the headline."— Our reading of Google's published benchmark table, May 19, 2026
03 — Thinking APIthinking_level replaces thinking_budget.
The biggest developer-facing change is the thinking control surface. The integer thinking_budget parameter that shipped with Gemini 3 Flash Preview is replaced by a string enum, thinking_level. The new values are minimal, low, medium (default), and high.
Two notes from the ai.google.dev migration page. The default level dropped from high to medium — meaning a naive port from gemini-3-flash-preview to gemini-3.5-flash will silently reason less than the old preview did unless you opt back in. Separately, Google retuned low-effort thinking specifically for code and agentic tasks, so "low" is no longer a hint to skip reasoning — it is a tuned setting that holds up on coding workflows.
# Python — generate with thinking_level
from google import genai
client = genai.Client()
response = client.models.generate_content(
model="gemini-3.5-flash",
contents="Plan a 3-step MCP workflow to file a GitHub issue.",
config={
"thinking_config": {
"thinking_level": "medium" # minimal | low | medium | high
}
},
)
print(response.text)Other API changes worth knowing
- Google removed its recommendation for
temperature,top_p, andtop_k. Strip hardcoded sampling values from production configs unless an eval explicitly justifies the override. - Function responses now require matching
idandnamefields between the function call and the response. Update any tool-execution layer that previously relied on positional matching. - Thought preservation across multi-turn is now automatic. Google flags that this can increase token usage on long sessions — budget accordingly.
- Thinking with encrypted reasoning context now combines with structured outputs, code execution on images, and the full tool set (Search grounding, URL context, function calling) without the prior compatibility caveats.
04 — SpecsModalities, features, and the not-supported list.
The ai.google.dev model reference for gemini-3.5-flash gives the load-bearing specs you need before adopting. Inputs span text, image, video, audio, and PDF. Output is text only — image generation, audio generation, and the Live API are explicitly not supported on this model. Anyone shipping voice agents or live conversational UX needs to stay on a different model in the Gemini family.
Tools, thinking, caching, batch
Batch API, context caching, code execution (including on images), file search, flex inference, priority inference, function calling, grounding with Google Maps, search grounding, structured outputs, thinking with encrypted reasoning context, and URL context — all GA on gemini-3.5-flash.
Computer Use, image / audio output, Live
Computer Use is unavailable — use gemini-3-flash-preview for browser/desktop agents. Image and audio generation are unavailable — route to the Gemini 2.5 Flash Image and TTS models. The Live API is unavailable — keep Gemini 3.1 Flash Live for streaming conversational UX.
Pricing was not stated in the Google blog post or the ai.google.dev docs page on launch day. The official Gemini API pricing page now lists Gemini 3.5 Flash at $1.50 / Mtok input and $9.00 / Mtok output on the standard tier, with context caching at $0.15 / Mtok (plus $1.00 / Mtok-hour storage) and batch / flex tiers at $0.75 / $4.50 — a 50% discount across the board. The priority tier runs $2.70 / $16.20. For reference, GPT-5.5 is $5 / $30 and Claude Opus 4.7 is $5 / $25, so 3.5 Flash sits at roughly 3.3x cheaper on input and 2.8-3.3x cheaper on output than the current Pro tiers from OpenAI and Anthropic.
05 — MigrationFrom gemini-3-flash-preview to gemini-3.5-flash.
The migration is small in lines of code and slightly larger in behavior. Three changes touch most codebases.
1. Rename the model
# Before — preview identifier
model = "gemini-3-flash-preview"
# After — stable identifier
model = "gemini-3.5-flash"2. Swap thinking_budget for thinking_level
# Before — integer budget
config = {
"thinking_config": {"thinking_budget": 8192}
}
# After — string enum (default is "medium")
config = {
"thinking_config": {"thinking_level": "medium"}
}Picking the right level is the actual decision. If your previous runs used a high thinking budget, set high explicitly — the new default of medium will silently reduce reasoning effort otherwise. If your workload is mostly tool calling or code generation, try low first — Google retuned it for that exact case.
3. Drop sampling parameter overrides
# Remove these from your default config
# temperature = 0.7
# top_p = 0.95
# top_k = 40Google removed the prior recommendation for these values. Defaults are considered tuned for the model. Keep them only if you have a specific eval that justifies the override.
gemini-3-flash-preview for any browser- or desktop-control agent that depends on the Computer Use surface. This is the one workload class where the preview identifier remains the recommended target.For the broader picture of how Gemini's agentic stack composes — Antigravity, the IDE surface, and what 3.5 Flash means alongside them — our existing guide on Google Antigravity and the Gemini agentic IDE still applies; only the Flash-tier model behind the agent has changed.
06 — AvailabilitySix surfaces today, 3.5 Pro next month.
Gemini 3.5 Flash ships across Google's full surface area on day one. The announcement lists six platforms:
- Gemini app — global rollout.
- AI Mode in Google Search — global rollout.
- Google Antigravity — developer platform.
- Gemini API — Google AI Studio and Android Studio.
- Gemini Enterprise Agent Platform.
- Gemini Enterprise.
Two related releases ship alongside the model. Gemini 3.5 Pro is confirmed in development and rolling out next month — the announcement does not give a specific date. Gemini Spark, a personal AI agent built on 3.5 Flash, is rolling out to trusted testers today, with a Beta for Google AI Ultra subscribers in the US planned for next week.
For operators tracking the broader frontier-model picture, our cross-vendor comparison on GPT vs Opus vs Gemini at the Pro tier covers how the Gemini lineup stacks up to OpenAI and Anthropic on coding, reasoning, and price-performance — the 3.5 Flash result sharpens that picture without redrawing it.
07 — Decision matrixWhen to actually reach for 3.5 Flash.
On the published benchmarks, Gemini 3.5 Flash makes the most sense when you want frontier-grade agentic behavior at a Flash-tier latency and cost profile. The matrix below maps the verified scores to concrete workload picks.
Multi-step tool orchestration
83.6% MCP Atlas leads the entire reported field including Opus 4.7 (79.1%) and GPT-5.5 (75.3%). 56.5% Toolathlon also leads. For MCP-driven agents and general tool-use orchestration, 3.5 Flash is the strongest single pick today.
Charts, docs, images
84.2% CharXiv Reasoning and 83.6% MMMU-Pro lead the field. For chart-heavy document analysis, multimodal RAG over PDFs, and information synthesis from complex visuals, 3.5 Flash leads Opus 4.7 and GPT-5.5 simultaneously.
Domain-specialised reasoning
57.9% Finance Agent v2 leads the field by 6+ points over GPT-5.5 (51.8%) and Opus 4.7 (51.5%). For financial analysis, decision support, and analyst-style synthesis, 3.5 Flash punches well above its weight.
Repo-scale software engineering
Opus 4.7 still leads SWE-Bench Pro at 64.3% versus 3.5 Flash 55.1%. For complex multi-file software-engineering tasks where benchmark headroom matters, Opus 4.7 remains the better pick.
Abstract puzzles and academic eval
GPT-5.5 leads ARC-AGI-2 at 84.6% (3.5 Flash 72.1%). Opus 4.7 leads Humanity's Last Exam at 46.9% (3.5 Flash 40.2%). For the hardest reasoning tasks, route to the Pro tier of either OpenAI or Anthropic.
Browser / desktop control
Computer Use is not supported on 3.5 Flash. The official migration guidance is to keep gemini-3-flash-preview for any agent that depends on the Computer Use surface. Image and audio generation similarly stay on Gemini 2.5 Flash Image and TTS.
08 — ConclusionA Flash release that redraws the tier line.
The Flash / Pro distinction inside Google is narrower than the marketing label suggests.
Gemini 3.5 Flash is the first Flash-tier model from any frontier lab that posts the highest score in a multi-vendor benchmark table on five separate evaluations — including against Claude Opus 4.7 and GPT-5.5. The honest framing is that the wins cluster on agentic and multimodal benchmarks, not on hardest-reasoning evals like ARC-AGI-2 or Humanity's Last Exam. But for the workloads most operators actually deploy — MCP-driven agents, multimodal document analysis, finance and analyst synthesis — 3.5 Flash is the strongest single pick today.
The thinking_level API change is small in code surface and meaningful in behavior. Defaults moved from high to medium, low-effort thinking was retuned for code and agentic tasks, and sampling-parameter recommendations were dropped entirely. Treat the migration as a re-evaluation prompt, not a like-for-like swap — run your own evals before locking in production thinking levels.
The bigger signal is the release order. Shipping Flash first and holding Pro for next month says Google trusts the Flash tier to carry the headline. With pricing now on the table — $1.50 / $9.00 per Mtok versus $5 / $25-30 for the Pro tiers from Anthropic and OpenAI — the cost picture lands hard: roughly 3.3x cheaper input and 2.8-3.3x cheaper output. On capability and price-performance together, the Pro tier has a real question to answer when it lands next month.