AI DevelopmentPlaybook12 min readPublished May 23, 2026

The migration guide that covers what breaks — not just what changed.

Migrate to Gemini 3.5 Flash: Three Gotchas Nobody Else Covers

Earlier this week at Google I/O 2026, Gemini 3.5 Flash went GA with stable model ID gemini-3.5-flash. Three migration gotchas no other coverage is publishing: Computer Use does not migrate; 3.5 Flash is 3× pricier on input than gemini-3-flash-preview; and gemini-2.0-flash shuts down in eight days. Here are the code diffs, the pricing matrix, and the routing decision.

DA
Digital Applied Team
Senior engineers · Published May 23, 2026
PublishedMay 23, 2026
Read time12 min
Sources11
Input pricing
$1.50
per Mtok standard
3× pricier than 3-flash-preview
Output pricing
$9.00
per Mtok standard
50% off via Batch API
Stable model ID
gemini-3.5-flash
replaces 3-flash-preview
GA: May 19, 2026
2.0 Flash shutdown
June 1
2026 — 8 days away
Migrate immediately

Gemini 3.5 Flash went generally available on May 19, 2026 at Google I/O with stable model ID gemini-3.5-flash, replacing the gemini-3-flash-preview identifier used during the December 2025 to May 2026 preview window. This migration playbook covers the code-level diffs, the pricing math across four migration paths, and the three breaking-change gotchas that no other coverage is publishing — starting with the fact that Computer Use does not migrate to 3.5 Flash.

The stakes are compressed by hard deadlines. gemini-2.0-flashshuts down on June 1, 2026 — eight days from this post's publish date. If your codebase still references gemini-2.0-flash, it stops working next week. On the other end, gemini-2.5-pro has until October 16, 2026 before its shutdown, giving those teams a five-month runway — but the pricing math may make an earlier migration worthwhile. And for Gemini 3 Pro users: that model shut down on March 9, 2026, so anyone on gemini-3-pro-preview has already migrated to gemini-3.1-pro-preview. Gemini 3.5 Flash is now a third option for those teams — a capability downshift in exchange for meaningfully lower cost.

This guide covers the deprecation timeline and urgency, the proprietary pricing matrix across four predecessor models, the SDK and package rename, the three breaking-change gotchas (thinking_level enum, Computer Use, function-calling id requirement), the model routing decision matrix, the Managed Agents API context, and a cost-per-task analysis across five agentic workloads. For benchmark numbers, see the sibling deep-dive: Gemini 3.5 Flash: Benchmarks, Thinking and API Guide— this post does not duplicate that table.

Key takeaways
  1. 01
    gemini-2.0-flash shuts down June 1 — eight days away.As of this post's publish date of May 23, 2026, the gemini-2.0-flash shutdown is eight days out. The replacement path is gemini-2.5-flash (not gemini-3.5-flash — the pricing jump is extreme). Source: Gemini deprecations page, retrieved 2026-05-24. If your production code still calls gemini-2.0-flash, this is the most urgent item in this guide.
  2. 02
    Computer Use does NOT migrate to gemini-3.5-flash — keep it on gemini-3-flash-preview.The gemini-3.5-flash model page explicitly lists Computer Use as unsupported. gemini-3-flash-preview supports Computer Use and currently has no announced shutdown date. Any browser-control or desktop-agent workload using the Computer Use API must stay on gemini-3-flash-preview. Migrating those workloads to gemini-3.5-flash breaks them silently — the API call succeeds but Computer Use tool calls are not executed.
  3. 03
    3.5 Flash is 3× MORE expensive on input than 3-flash-preview.Gemini 3.5 Flash prices at $1.50 input / $9.00 output per Mtok. gemini-3-flash-preview was $0.50 input / $3.00 output. Most Flash-to-Flash migration coverage assumes a cost drop — it does not apply here. Migrating from 3-flash-preview to 3.5 Flash is a deliberate capability upgrade, not a cost optimization. Run the cost-per-task math in Section 09 before migrating high-volume pipelines.
  4. 04
    thinking_budget integer is replaced by a thinking_level enum — breaking change.The thinking_config API changed from an integer thinking_budget (used with Gemini 2.5 Pro and earlier) to a string enum thinking_level: 'minimal' | 'low' | 'medium' | 'high'. The default is 'medium' for Gemini 3.5 Flash. Passing thinking_budget: 8192 to the new SDK will either throw or silently be ignored depending on the SDK version. The code diffs in Section 04 show the exact migration for both Python and JavaScript.
  5. 05
    Function calling now requires a matching id on every functionResponse.All Gemini 3.x models generate a unique id for every function call. When returning results manually, the functionResponse must include the matching id from the original call. SDKs handle this automatically in tool-use mode, but any team that constructs functionResponse objects manually — common in multi-step agentic loops — must add the id field. Bare function_response objects without ids will fail. Source: Gemini function-calling docs, retrieved 2026-05-24.

01Deprecation TimelineThe shutdown calendar: four models, four different clocks.

Before choosing a migration path, orient against the actual shutdown dates. These are sourced verbatim from the Gemini deprecations page (retrieved 2026-05-24), not from third-party coverage:

gemini-2.0-flash — shutdown June 1, 2026. Eight days from this post's publish date. Replacement: gemini-2.5-flash. This is the most urgent migration in the Gemini family right now. If you're on 2.0 Flash for a cost-first pipeline — it was priced at $0.10 input / $0.40 output per Mtok — the cost math changes dramatically regardless of which model you move to. See the pricing matrix in Section 02.

gemini-3-pro-preview — shut down March 9, 2026. Already gone. The gemini-pro-latest alias moved to gemini-3.1-pro-preview on March 6, 2026, and Gemini 3 Pro Preview was shut down three days later. If you were on Gemini 3 Pro, you have already completed one forced migration — to gemini-3.1-pro-preview. Gemini 3.5 Flash is now a potential second step: a capability downshift (from Pro-class to Flash-class) in exchange for lower cost. Whether that tradeoff makes sense for your workload is what Section 07's routing decision matrix is designed to answer. For context on the March migration, see the Gemini 3 Pro to 3.1 Deep Think migration playbook.

gemini-2.5-pro — shutdown October 16, 2026. Replacement: gemini-3.1-pro-preview. Five-month runway. This is the longest-runway migration in the current Gemini deprecation schedule. Teams on 2.5 Pro can evaluate 3.5 Flash as a cost-reduction option without urgency pressure. Gemini 3.5 Flash is approximately 17% cheaper on input than 2.5 Pro for prompts under 200k tokens ($1.50 vs $1.25 — note: 2.5 Pro is $1.25 input at ≤200k but $10.00 output vs 3.5 Flash's $9.00; the output saving is real). For deeper benchmarks and capability comparisons, see Gemini 3.1 Pro benchmarks and pricing guide.

gemini-3-flash-preview — no shutdown date announced. This is intentional on Google's part because Computer Use is supported on gemini-3-flash-preview and not on gemini-3.5-flash. Until Google ships Computer Use support on a stable-ID model, gemini-3-flash-preview will remain the only live path for browser- and desktop-control agents.

June 1, 2026 — Eight days from publish

gemini-2.0-flash shuts down on June 1, 2026. If your production code calls this model, you have eight days to migrate before your requests start failing. The recommended path is gemini-2.5-flash (not 3.5 Flash — the pricing jump from 2.0 Flash to 3.5 Flash is 15× on input tokens). Source: Gemini deprecations page, retrieved 2026-05-24.

02Pricing MatrixFour migration paths, one pricing matrix — with routing recommendations.

Every other Gemini 3.5 Flash pricing post compares against a single predecessor. Below is the matrix across all four active predecessor paths, with per-Mtok input/output pricing sourced from the Gemini API pricing page (retrieved 2026-05-24) and a routing recommendation for each path. Gemini 3.5 Flash standard pricing: $1.50 input / $9.00 output per Mtok; $0.15 cached; Batch API 50% off ($0.75 / $4.50). These figures were current as of May 24, 2026 — always verify against the live pricing page before making budget commitments.

2.0 Flash → 3.5 Flash
Shutdown in 8 days — urgent, but 15× price jump

BEFORE: $0.10 input / $0.40 output per Mtok. AFTER: $1.50 input / $9.00 output. Delta: +1,400% input / +2,150% output. Context window: unchanged (1M in / 8K out → 1M in / 65K out — output limit improves). Computer Use: 2.0 Flash did not support Computer Use. Shutdown: June 1, 2026. Recommended action: For cost-first pipelines, migrate to gemini-2.5-flash (not 3.5 Flash) to minimize cost increase. Only migrate to 3.5 Flash if your workload specifically needs the enhanced agentic/coding capabilities.

Migrate to 2.5-flash first — cheaper bridge
2.5 Pro → 3.5 Flash
5-month runway · 17% cheaper on input, 10% cheaper on output

BEFORE: $1.25 input / $10.00 output (≤200k tokens); $2.50 / $15.00 (>200k). AFTER: $1.50 input / $9.00 output. Delta at ≤200k: +20% input / −10% output. Delta at >200k: −40% input / −40% output. Context window: unchanged (1M). Computer Use: 2.5 Pro did not support Computer Use. Shutdown: Oct 16, 2026. Recommended action: Strong candidate for migration at ≤200k prompts if output volume is high (output savings outweigh input increase). At >200k prompts, 3.5 Flash is meaningfully cheaper on both dimensions. Run the cost-per-task math in Section 09.

Migrate now if output-heavy — evaluate at ≤200k
3 Flash Preview → 3.5 Flash
3× price increase — only migrate non-Computer-Use workloads

BEFORE: $0.50 input / $3.00 output per Mtok (text/image/video). AFTER: $1.50 input / $9.00 output. Delta: +200% input / +200% output. Context window: unchanged (1M in / 65K out — same as preview). Computer Use: STAYS on gemini-3-flash-preview. Shutdown: no date announced. Recommended action: This is NOT a drop-in migration for workloads using Computer Use — those must stay on gemini-3-flash-preview. For all other workloads (RAG, coding, structured output), the upgrade to 3.5 Flash is a deliberate capability buy at 3× cost. Evaluate against your quality requirements.

Non-Computer-Use only — verify workload first
3.1 Pro Preview → 3.5 Flash
Capability downshift · pricing comparison requires live verification

gemini-3.1-pro-preview is where Gemini 3 Pro users landed after the March 9, 2026 forced migration. A standalone pricing row for gemini-3.1-pro-preview was not visible on the pricing page retrieved 2026-05-24 — see the Gemini API pricing page (ai.google.dev/gemini-api/docs/pricing) directly, or reference the Gemini 3.1 Pro benchmarks and pricing guide for verified figures. Migrating from 3.1 Pro to 3.5 Flash is a deliberate Pro→Flash capability downshift. Only appropriate for workloads where 3.5 Flash's benchmark performance is sufficient. Recommended action: benchmark on your task before committing.

Benchmark first — this is a capability downgrade

One pattern stands out across all four paths: the only migration where cost goes downon both input and output is the 2.5 Pro migration at large context (>200k tokens) — where 3.5 Flash saves 40% on both dimensions. For all other paths, migrating to 3.5 Flash represents a capability upgrade bought at higher cost. Budget models (2.0 Flash, 3-flash-preview) that migrate to 3.5 Flash see dramatic cost increases. Plan accordingly.

03SDK MigrationPackage rename first: google-generativeai is the legacy SDK.

Before touching model IDs or API parameters, ensure you are on the current Google GenAI SDK. The old package names are the legacy SDK and will not receive new Gemini 3.x features. The Gemini SDK migration guide (retrieved 2026-05-24) documents the full package transition:

Python — package rename:

# LEGACY (do NOT use for new work)
pip install google-generativeai
import google.generativeai as genai

# CURRENT
pip install -U -q "google-genai"
from google import genai

JavaScript / TypeScript — package rename:

// LEGACY (do NOT use for new work)
// npm install @google/generative-ai
import { GoogleGenerativeAI } from "@google/generative-ai";

// CURRENT
// npm install @google/genai
import { GoogleGenAI } from "@google/genai";

Go — module rename: was github.com/google/generative-ai-go, now google.golang.org/genai.

Client initialization after the rename:

# Python (new SDK)
client = genai.Client()
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Summarize this document."
)
print(response.text)
// JavaScript (new SDK)
const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: "Summarize this document.",
});
console.log(response.text);

Source: verbatim from Gemini text-generation docs, retrieved 2026-05-24. For workloads building full agentic pipelines on top of 3.5 Flash, the cross-vendor function-calling guide covers the SDK surface patterns across OpenAI, Anthropic, and Google simultaneously — useful context before the gotchas below.

04Gotcha 1thinking_budget integer is gone thinking_level enum replaces it.

This is the breaking change that silently misfires for teams that copy-paste code from Gemini 2.5 Pro examples into a 3.5 Flash context. The thinking_budget integer parameter used in earlier Gemini 2.x and some 3.x thinking-mode configurations is replaced by a thinking_level string enum on Gemini 3.5 Flash.

Accepted values: "minimal", "low", "medium" (default), "high". Source: Gemini thinking docs, retrieved 2026-05-24.

Important: Gemini 3.1 Pro does NOT support the "minimal" thinking level — that value is specific to 3.5 Flash. Gemini 3.5 Flash defaults to "medium" rather than the "high"dynamic default that 3.1 Pro uses. Google does not publish a per-thinking-level price discount — the pricing table does not tier on thinking level, so avoid inventing a "minimal-thinking discount" in your cost models.

Python — thinking config migration:

# LEGACY — Gemini 2.5 Pro style (thinking_budget integer)
# config = types.GenerateContentConfig(
#     thinking_config=types.ThinkingConfig(thinking_budget=8192)
# )

# CURRENT — Gemini 3.5 Flash (thinking_level enum)
from google import genai
from google.genai import types

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain this architecture diagram.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="low")
    )
)

JavaScript — thinking config migration:

// LEGACY — integer thinking_budget
// thinkingConfig: { thinkingBudget: 8192 }

// CURRENT — enum thinking_level
import { GoogleGenAI, ThinkingLevel } from "@google/genai";
const ai = new GoogleGenAI({});

const response = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: "Explain this architecture diagram.",
  config: {
    thinkingConfig: { thinkingLevel: ThinkingLevel.LOW },
  },
});

For the full thinking-capability context including benchmark implications, see Gemini 3.5 Flash: Benchmarks, Thinking and API Guide. This post focuses on the migration diff, not the capability depth.

05Gotcha 2Computer Use does not migrate — keep browser agents on gemini-3-flash-preview.

This is the single most consequential gotcha in the entire Gemini 3.5 Flash migration, and it is not mentioned in any mainstream coverage of the I/O 2026 launch. The Gemini 3.5 Flash model page explicitly lists the following as unsupported on gemini-3.5-flash: Live API, Computer Use, Image generation, Audio generation. Retrieved 2026-05-24.

What this means in practice: any team running browser-control agents or desktop-automation agents via the Computer Use API on gemini-3-flash-preview must stay on that model. The stable-ID migration from gemini-3-flash-preview to gemini-3.5-flash is safe for most workloads — but it is a breaking change for Computer Use workloads. Your agent will not throw an error on the API call; it will simply stop executing browser actions.

The good news: gemini-3-flash-preview has no announced shutdown date as of 2026-05-24. Google is preserving it specifically because Computer Use has no migration path to a stable-ID model yet. You can keep Computer Use workloads on gemini-3-flash-preview indefinitely until Google ships Computer Use support on a stable-ID model.

For a broader view of the agentic AI landscape this week — including other Computer Use developments — see the Agentic AI week in review for May 19-23, 2026.

Supported
Capabilities on gemini-3.5-flash
12

Batch API, Caching, Code execution, File search, Flex inference, Function calling, Grounding with Google Maps, Priority inference, Search grounding, Structured outputs, Thinking, URL context. Source: ai.google.dev model page, retrieved 2026-05-24.

Stable — migrate freely
Unsupported
Missing on gemini-3.5-flash
4

Live API, Computer Use, Image generation, Audio generation. Any workload that relies on these four capabilities must NOT migrate to gemini-3.5-flash. Computer Use stays on gemini-3-flash-preview. Source: ai.google.dev model page, retrieved 2026-05-24.

Breaking if migrated
Context window
Same as predecessor models
1Mtokens in

1,048,576 input tokens and 65,536 output tokens — identical to gemini-3-flash-preview. Knowledge cutoff: January 2025. Modalities: Text, Image, Video, Audio, PDF input; Text output only. Source: ai.google.dev model page, retrieved 2026-05-24.

No regression on context
Speed claim
Output tokens/sec vs frontier models

According to Google's Gemini 3.5 launch blog (Kavukcuoglu, Dean, Vinyals, Shazeer, May 19 2026): 'When looking at output tokens per second, it is 4 times faster than other frontier models.' Third-party independent benchmarks had not landed as of the publish date — treat as Google's framing.

Google-reported, unverified

06Gotcha 3Function calling now requires a matching id on every functionResponse.

All Gemini 3.x models generate a unique id for every function call they emit. When your code returns results manually via functionResponse, the response object must include the matching id from the original call. Source: Gemini function-calling docs, retrieved 2026-05-24.

If you are using the SDK's automatic tool-use mode, this is handled for you — the SDK reads the id from the model response and attaches it to the function response automatically. The breaking change only affects teams that construct functionResponse objects manually — a pattern common in multi-step agentic loops where the orchestration layer intercepts tool calls and routes them to external services.

Python — function response migration:

# LEGACY — Gemini 2.5 Pro style (no id required)
# function_response_part = types.Part(
#     function_response=types.FunctionResponse(
#         name="get_weather",
#         response={"result": "72°F, sunny"}
#     )
# )

# CURRENT — Gemini 3.x (id required — read it from the function call)
from google.genai import types

# First, capture the function call id from the model's response
function_call = response.candidates[0].content.parts[0].function_call
call_id = function_call.id  # e.g. "call_abc123"

# Then return the result with the matching id
function_response_part = types.Part(
    function_response=types.FunctionResponse(
        id=call_id,          # REQUIRED in Gemini 3.x
        name="get_weather",
        response={"result": "72°F, sunny"}
    )
)

Parallel function calling works identically across 3.x models: the model can return multiple function calls in one turn, and the results can come back in any order — the API maps each result to its originating call via the id. This makes parallel tool-use loops cleaner once you adopt the pattern.

The three gotchas — Computer Use staying on 3-flash-preview, the 3× price increase vs gemini-3-flash-preview, and the thinking_level enum change — are each potentially production-breaking. None of them appear in any mainstream I/O 2026 coverage. That's why this guide exists.Digital Applied analysis, May 23, 2026

07Routing DecisionWhich model for which workload — the four-path decision matrix.

With Gemini 3.5 Flash GA and Gemini 3.5 Pro arriving “next month” (June 2026, per the Gemini 3.5 launch blog), developers face up to four active routing options simultaneously. The matrix below uses capability and cost signals from the research above to assign each workload type to its recommended model.

gemini-3.5-flash
Agentic coding, structured output, long-context analysis
Recommended for most new workloads

GA with stable model ID. Best fit: agentic coding loops (MCP Atlas 83.6%, Terminal-Bench 76.2%), structured output pipelines, long-context RAG over 200K tokens where 2.5 Pro pricing hurts. 4× output-token speed reportedly. $1.50/$9.00 per Mtok. NOT for Computer Use, Live API, image/audio generation.

GA — May 19, 2026
gemini-3-flash-preview
Browser agents, desktop control, Computer Use
Only option for Computer Use

No shutdown date announced. The only active Gemini model supporting Computer Use. Keep all browser-control and desktop-automation workloads here until Google ships Computer Use on a stable-ID model. $0.50/$3.00 per Mtok — significantly cheaper than 3.5 Flash for high-volume non-Computer-Use workloads too.

No shutdown — Computer Use only
gemini-2.5-pro / gemini-3.1-pro-preview
Complex reasoning, Pro-class tasks — hold until 3.5 Pro
Hold for June 2026 — 3.5 Pro arriving

gemini-2.5-pro shuts down Oct 16, 2026. For Pro-class workloads (complex multi-step reasoning, advanced code generation), hold for gemini-3.5-pro which Google confirmed for June 2026 ('We look forward to rolling it out next month'). Don't rush a lateral migration from 2.5 Pro to 3.5 Flash for Pro-class tasks — wait six weeks for the proper upgrade.

Wait for 3.5 Pro in June 2026
gemini-2.5-flash
Budget pipelines migrating from 2.0 Flash— the cost-first bridge
2.0 Flash shutdown bridge (June 1 deadline)

For teams on gemini-2.0-flash who need to migrate before June 1, 2026 but cannot absorb the 15× cost jump to gemini-3.5-flash: gemini-2.5-flash is the cost-effective bridge. The exact 2.5-flash pricing is on the Gemini API pricing page. This recommendation applies to cost-first RAG, bulk classification, and high-volume summarization workloads. Quality-critical workloads may benefit from the 3.5 Flash upgrade despite the cost increase.

June 1 bridge — cost-first path

08Managed AgentsManaged Agents API + the May 26 Interactions schema break.

Alongside Gemini 3.5 Flash GA, Google also released the Managed Agents API into public preview on May 19, 2026. The general-purpose agent identifier is antigravity-preview-05-2026. Source: Gemini API release notes, May 19, 2026 entry. This is the API that backs Antigravity 2.0 — which according to TechCrunch's I/O coverage, ships with gemini-3.5-flash as its default backing model. Developers building agentic loops on 3.5 Flash can route through Managed Agents (sandboxed Linux execution, web browsing) rather than self-hosting an orchestration harness.

Critical deadline for Managed Agents users: May 26, 2026. A breaking Interactions API schema change goes live on May 26, 2026 — three days from this post's publish date. The outputs field becomes steps and the response format config changes. The legacy schema is removed on June 8, 2026. Source: Gemini API release notes, May 6, 2026 entry. If you are building on the Managed Agents API today, this schema break must be in your sprint plan this week.

Gemini 3.5 Flash is also the default backing model for AI Mode in Google Search, which reportedly reaches 1 billion monthly active users as of May 19, 2026. That production scale is the strongest signal that the model's reliability and throughput have been validated at infrastructure scale, not just in benchmarks. For context on the broader I/O week, see our complete I/O 2026 announcement guide. For the post-I/O marketing operations playbook, see the agent-first marketing ops post-I/O playbook.

09Cost AnalysisFive agentic workloads: what does each migration actually cost?

Pricing tables are abstract. The following analysis runs five representative agentic workloads through the per-Mtok pricing from the Gemini API pricing page (retrieved 2026-05-24) to produce monthly cost estimates. The caching row assumes a 30% cache-hit rate and a one-hour TTL at $0.15 per Mtok for cached input — these are illustrative assumptions, not guaranteed savings. All per-Mtok figures are sourced from ai.google.dev/gemini-api/docs/pricing. Verify live before committing budget.

Methodology: monthly cost = (queries per month) × (tokens per query) ÷ 1,000,000 × (price per Mtok). Cache saving = (30% of input tokens) × ($0.15 − standard input rate). Numbers are rounded to two significant figures. For the ROI framing on AI agent investments, see the sibling AI agent ROI calculator for enterprise business cases.

Monthly cost: gemini-3.5-flash vs predecessors (standard tier, no caching)

Pricing: ai.google.dev/gemini-api/docs/pricing, retrieved 2026-05-24. Workload volumes illustrative. 30% cache-hit assumption not reflected in bars above — adds further savings for caching-eligible workloads.
RAG retrieval — 1M queries/mo (5k in / 500 out)gemini-2.5-pro vs gemini-3.5-flash: $8,750 → $8,000/mo | gemini-3-flash-preview: $2,750/mo
$8,000/mo
Agentic coding loop — 10k loops/mo (20k in / 5k out)gemini-2.5-pro: $550/mo | gemini-3.5-flash: $345/mo (−37%) | gemini-3-flash-preview: $115/mo
$345/mo
Multimodal extraction — 100k docs/mo (50k in / 1k out)gemini-2.5-pro: $7,250/mo | gemini-3.5-flash: $8,400/mo (+16%) | gemini-3-flash-preview: $2,800/mo
$8,400/mo
Long-context analysis — 1k jobs/mo (800k in / 5k out)gemini-2.5-pro (>200k tier): $2,045/mo | gemini-3.5-flash: $1,245/mo (−39%) | Winner: 3.5 Flash
$1,245/mo
Tool-use agent — 100k turns/mo (8k in / 2k out)gemini-2.5-pro: $3,200/mo | gemini-3.5-flash: $3,000/mo (−6%) | gemini-3-flash-preview: $1,000/mo
$3,000/mo

Two patterns emerge from the workload analysis. First, the only case where migrating to 3.5 Flash delivers clear cost savings versus 2.5 Pro is the long-context workload (800k+ token inputs) — where 3.5 Flash is approximately 39% cheaper because it does not apply 2.5 Pro's >200k surcharge. Second, for any workload that would run on gemini-3-flash-preview, migrating to 3.5 Flash is always a cost increase (the 3× input premium is consistent across all five workloads). The upgrade should be justified by quality requirements, not cost optimization. For teams managing AI agent budgets at scale, our AI transformation services include cost-optimization reviews that run this analysis against your actual production token volumes.

Conclusion

Migration is not binary — there are four active paths and three things that break.

Gemini 3.5 Flash is a genuine capability upgrade for agentic coding, long-context analysis, and structured-output workloads. Google's claim of 4× output-token throughput versus frontier models — while unverified by independent benchmarks as of this post's publish date — aligns with the model's positioning as the default backing model for both AI Mode (reportedly 1B MAU) and Antigravity 2.0. The Managed Agents API entering public preview on the same day extends the 3.5 Flash surface further.

But the migration is not a simple model-ID swap. Computer Use workloads break silently. The thinking_budget integer pattern throws on the new SDK. Any manually constructed functionResponse without an id fails. And the cost math is counterintuitive: 3.5 Flash is 3× more expensive than the model it replaces by lineage (gemini-3-flash-preview), while being only 17% cheaper than 2.5 Pro on input at modest context lengths. Neither “Flash is cheap” nor “Flash is cheaper than Pro” is a reliable frame — it depends on your predecessor model and your token distribution.

The checklist: audit your codebase forgemini-2.0-flash references before June 1. Keep Computer Use on gemini-3-flash-preview. Update the package import before changing model IDs. Run the cost-per-task math against your actual token volumes, not abstract pricing tables. And if you are on Pro-class workloads, wait six weeks for Gemini 3.5 Pro rather than doing a lateral Flash migration now.

Migrate your Gemini stack with confidence

From breaking changes to production-ready migration.

We help engineering and product teams navigate Gemini API migrations — from cost audits and routing decisions to agentic pipeline architecture on Gemini 3.5 Flash and the Managed Agents API.

Free consultationExpert guidanceTailored solutions
What we work on

Gemini API strategy and migration

  • Gemini API migration audit (model ID, pricing, gotchas)
  • Agentic pipeline architecture on 3.5 Flash
  • Cost-per-task analysis across Gemini model tiers
  • Managed Agents API + Interactions schema migration
  • Computer Use agent routing and isolation strategy
FAQ · Gemini 3.5 Flash Migration

The questions developers ask about migrating to Gemini 3.5 Flash.

The stable model ID is gemini-3.5-flash, released as GA on May 19, 2026 at Google I/O 2026. It replaces the gemini-3-flash-preview identifier used during the December 2025 to May 2026 preview window. The gemini-3-flash-preview model is not being shut down immediately — it currently has no announced shutdown date, primarily because it supports Computer Use and gemini-3.5-flash does not. For codebases that do not use Computer Use, migrating from gemini-3-flash-preview to gemini-3.5-flash is generally safe, with the caveat that you are accepting a 3× input-cost increase in exchange for the capability upgrades. Source: Gemini API release notes and model page, retrieved 2026-05-24.