AI DevelopmentNew Release14 min readPublished June 26, 2026

Native computer use · browser, mobile, desktop · 78.4 OSWorld-Verified · preview, not GA

Gemini 3.5 Flash Computer Use: Agentic Automation

Google added computer use as a native, built-in tool inside Gemini 3.5 Flash on June 24, 2026 — the same production model used for function calling, Search grounding, and Maps, with no separate computer-use model to route to. It is a public preview, not general availability. The headline benchmark (78.4 on OSWorld-Verified) is self-reported, but the architecture shift and the pricing are the durable story.

DA
Digital Applied Team
Senior AI engineers · Published Jun 26, 2026
PublishedJun 26, 2026
Read time14 min
SourcesGoogle docs + 7 more
OSWorld-Verified
78.4
self-reported · GPT-5.5 at 78.7
Input price
$1.50/1M
GPT-5.5 $5 · exactly 30% per token
Context window
1M
8x the 128K standalone model
Output speed
289t/s
Google-stated · ~4x faster

Gemini 3.5 Flash computer use is now a native, built-in tool — announced June 24, 2026 as a public preview — letting a single Google production model see a screen and operate a browser, mobile device, or desktop without routing to a separate computer-use model. That architectural shift, not the headline benchmark, is the story: perception and action collapse into one inference pass.

This is not the standalone Gemini 2.5 Computer Use model that shipped in October 2025. That was a separate preview built on Gemini 2.5 Pro, browser-focused, capped at 128K tokens of context, and it forced developers to route between it and their main model. The June 24 update folds the capability directly into Gemini 3.5 Flash — Google’s fastest, most economical production model — and pairs it with a million-token context window.

This guide covers what actually shipped and how it differs from the legacy standalone model, an honest read of the OSWorld-Verified benchmark (every score on that board is self-reported), the cost case against GPT-5.5 specifically, the seven configurable safety categories you must lock down before deploying, and how a marketing or operations team can put it to work.

Key takeaways
  1. 01
    Computer use is native now, not a separate model.Announced June 24, 2026 as a public preview, computer use is built into gemini-3.5-flash alongside function calling, Google Search grounding, and Maps. One agent can see, reason, and act across browser, mobile, and desktop with no model-hopping.
  2. 02
    It is a preview, not general availability.The Gemini API changelog labels computer use a public preview as of June 24, 2026. Treat it as pre-production: prototype against it, but verify current status before you wire it into anything that touches real customer data or spend.
  3. 03
    The benchmark is effectively a tie, and self-reported.Gemini 3.5 Flash posts 78.4 on OSWorld-Verified versus GPT-5.5’s 78.7 — within 0.3 points. Every score on the OSWorld-Verified board is self-reported by the model provider, with no independent third-party verification as of June 2026.
  4. 04
    The cost case is specific to GPT-5.5.At $1.50 input / $9 output per million tokens, Gemini 3.5 Flash is priced at exactly 30% — roughly one-third — of GPT-5.5’s $5 / $30. That comparison holds against GPT-5.5, not against every frontier model, which all price differently.
  5. 05
    Safeguards exist, but most are opt-in.Seven configurable safety categories, a safety_decision on every action, plus opt-in enforced user confirmation and automatic task termination on detected prompt injection. Google’s own caveat is the honest one: “no single safeguard is foolproof.”

01What ShippedComputer use as a built-in tool, across three environments.

The change is small in the API and large in architecture. Where a developer previously called a dedicated computer-use model, computer use is now a tool you switch on inside Gemini 3.5 Flash — the same model you already use for function calling and grounding. You declare it with tools=[{"type": "computer_use", "environment": "browser|mobile|desktop"}], and the model returns actions on a normalized 0–1000 coordinate scale that you denormalize to the real screen.

One honest framing point up front: Gemini 3.5 Flash is not strictly the first Gemini model with built-in computer use — Gemini 3 Flash Preview supports it too. What June 24 delivered is the recommended and most capable built-in computer-use model, with browser, mobile, and desktop support and Google’s strongest stated agentic numbers to date.

For a developer the ergonomic change is concrete. Switching computer use on is a tool declaration, not a second integration: the model that already reasons about your task now also returns the click, the keystroke, and the scroll, in the same response stream as its function calls. The standalone path asked you to maintain two model clients, marshal context between them, and reconcile their separate failure modes. Collapsing that into one model is the kind of simplification that changes what a small team can actually ship — fewer moving parts, one bill, and one place to debug when an agent does the wrong thing.

Browser
Web automation
20+ action types

Click, double-click, right-click, type, scroll, navigate, drag-and-drop, hotkeys, and screenshot — the full vocabulary for driving a dashboard, ad console, or CRM in a real browser session.

click · type · navigate
Mobile
App-level control
distinct action set

A separate set tuned for devices: open_app, list_apps, click, type, long_press, drag-and-drop, press_key, go_back, and take_screenshot. The same model, a different interaction grammar.

open_app · long_press
Desktop
OS-level operation
browser-grade actions

Desktop shares the browser action vocabulary, extending control beyond the tab to the operating system — the environment the standalone 2.5 model was not optimized for.

Beyond the browser
Google, in its own words
“With built-in computer use capability, developers can now use 3.5 Flash to reliably build custom agents that can see, reason and take action across browser, mobile and desktop environments.” The capability ships via the Gemini API as a public preview and the Gemini Enterprise Agent Platform, with a Browserbase-hosted demo and a GitHub reference implementation to start from — not as a generally available product.

02What ChangedFrom a separate model to a native capability.

If you read our earlier guide to the standalone Gemini 2.5 Computer Use model, this is the next chapter — and a genuine architectural break, not a version bump. The 2.5 model (gemini-2.5-computer-use-preview-10-2025) was a discrete preview built on Gemini 2.5 Pro, optimized for the browser, and limited to 128K tokens of context. To use it you ran a two-model setup: a main model for reasoning, the computer-use model for screen actions, and your own glue code routing between them.

Native integration removes that hop. The same Gemini 3.5 Flash instance reasons and acts, carries a 1M-token context window — an eight-fold expansion over the standalone model’s 128K — and adds an intent field on every action that the 2.5 model did not emit. The table below traces the full lineage so the shift is visible at a glance.

Computer-use lineage across the Gemini model line, from the standalone Gemini 2.5 Computer Use preview (October 2025) to the native built-in tool in Gemini 3 Flash Preview and Gemini 3.5 Flash. Specifications are from the Gemini API computer-use documentation and Google blogs. OSWorld-Verified figures are self-reported by Google with no independent third-party verification as of June 2026.
Model IDComputer-use integrationEnvironmentsInput contextIntent fieldOSWorld-Verified
gemini-2.5-computer-use-preview-10-2025Standalone preview model (Oct 7, 2025), built on Gemini 2.5 ProBrowser-focused128K tokensNoNot on the board
gemini-3-flash-previewBuilt-in tool (preview)Browser · mobile · desktop1M tokensYes65.1 (self-reported)
gemini-3.5-flashBuilt-in tool · native · recommended (CU added Jun 24, 2026)Browser · mobile · desktop1M tokensYes78.4 (self-reported)

The eight-fold context jump is easy to under-read as a spec line, but it is the difference between an agent that remembers only the last few screenshots and one that holds an entire workflow in view. For a long-horizon task — search-ad setup, then the landing page, then the CRM record — 1M tokens lets the agent keep the whole chain coherent rather than losing the thread between steps.

The legacy model was no toy — it posted around 70% on the Online-Mind2Web web-task benchmark at its October 2025 launch, and for browser-only automation it remains usable. But it was a point solution: a model that did one job and handed control back. The native tool reframes computer use as a capability of a general production model rather than a destination you route traffic to. If you are running the standalone model today, treat 3.5 Flash as the forward path rather than a drop-in swap — the action vocabulary is broadly familiar, but the intent field, the wider environments, and the larger context change what you can ask the agent to attempt.

03The Architecture ShiftOne agent that sees, reasons, and acts.

The underreported value is what the native integration removes. With computer use, Google Search grounding, and Google Maps all available inside one Gemini 3.5 Flash call, a single agent can look at a screen, ground a fact against live search, and pull a location — without handing context between specialized models. Each model hop in a multi-model pipeline is a place where context gets translated, latency stacks, and errors propagate; collapsing perception and action into a single inference pass eliminates that entire class of failure.

See & act
Computer use
screen perception + actions

The agent takes a screenshot, decides the next action, and returns it with the intent behind it. Browser, mobile, and desktop, on a normalized 0–1000 coordinate grid.

The new built-in tool
Verify
Search grounding
live web facts

Google Search grounding lets the same agent check a claim against the live web mid-task — confirm a price, a policy, a competitor’s live offer — without a hand-off to a separate retrieval model.

Already in 3.5 Flash
Locate
Maps
places & geography

Google Maps resolves locations in the same call, so a workflow that needs an address, a store, or a service area does not break out into a third tool. One agent, three native capabilities.

Already in 3.5 Flash

The other quietly important addition is the intent field. Per the Gemini API docs, every action response includes an intent that explains the model’s reasoning for that step — for example, “Click the search box to type the destination.” For a regulated marketing or operations team this is not a nicety; it is an audit trail. You can log exactly why an agent clicked a button in your CRM, which is the difference between an action you can defend and one you cannot. The model runs the loop the docs describe: send a screenshot, receive a function call with an action plus its intent, execute it with denormalized coordinates, capture the next screenshot, and repeat until the task is done. For the broader capability profile, our full Gemini 3.5 Flash benchmark and API guide covers the non-computer-use side.

It is worth dwelling on why the intent field matters beyond debugging. Most coverage treated it as a developer convenience, but for a regulated buyer it is closer to a control. When an agent acts inside a CRM or a billing console, the question a compliance team asks is not “did it work” but “can you show why it did that.” An action log that pairs every click with the model’s stated reason is the difference between an automation you can put in front of an auditor and one you quietly turn off the first time it surprises someone. That is a capability the standalone 2.5 model did not offer, and it is arguably more consequential for enterprise adoption than the three-tenths of a benchmark point that dominated the headlines.

Context
Input token window
1M

1,048,576 input tokens with up to 65,536 output tokens (Google-stated) — an 8x expansion over the standalone 2.5 model’s 128K, enough to hold a full multi-step workflow in memory.

8x vs standalone
Speed
Output throughput
289t/s

Roughly 289 tokens per second, which Google frames as about 4x faster than competing frontier models. Treat both figures as vendor-stated; speed is the model’s central selling point.

Google-stated
Audit
Intent per action
1:1

Every action carries an intent field explaining the model’s reasoning for that step — new in 3.5 Flash relative to the 2.5 model, and the basis for a defensible audit log.

New in 3.5 Flash
"Computer use is now a built-in tool supported in Gemini 3.5 Flash, delivering our best performance yet for agentic computer use tasks."— Google DeepMind, Introducing computer use in Gemini 3.5 Flash, June 24, 2026

Dynamic thinking is on by default in Gemini 3.5 Flash, with the model allocating more compute to harder problems automatically — useful when a screen task occasionally needs deeper reasoning mid-loop. Google positions Antigravity, refreshed as Antigravity 2.0 at I/O 2026 and powered by 3.5 Flash, as the primary platform for building these agents, including parallel-executing subagent workflows.

The screen-driving numbers also sit on top of a model Google positions as broadly strong on agentic work. At I/O, Google reported Gemini 3.5 Flash outperforming Gemini 3.1 Pro on several agentic benchmarks — 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, 1656 Elo on GDPval-AA, and 84.2% on CharXiv multimodal reasoning. Treat those as vendor-stated like everything else here, but they explain why a Flash-tier model can credibly anchor an agent at all: computer use is only as good as the reasoning deciding what to click, and 3.5 Flash is not a lightweight model wearing a screen-control hat.

04BenchmarksThe leaderboard, read honestly.

Gemini 3.5 Flash scores 78.4 on OSWorld-Verified, the benchmark for agentic desktop control. GPT-5.5 sits at 78.7 and Claude Opus 4.7 at 78.0 — a gap of 0.3 points at the top, which independent coverage has described as effectively a three-way tie. The right reading is not that Gemini 3.5 Flash “matches” GPT-5.5 as a definitive claim, but that the three are within any plausible margin of error.

One caveat governs the entire chart below, so state it plainly: every score on the OSWorld-Verified board is self-reported by the model provider. There is no independent third-party verification as of June 2026. Read the bars as vendor claims arranged on a common axis, not as audited results.

OSWorld-Verified · self-reported computer-use scores

Source: OSWorld-Verified scores via The Decoder and MLQ.ai — all figures self-reported by model providers, no independent third-party verification as of June 2026
Claude Fable 5Self-reported · access suspended (export controls)
85.0
Suspended
Claude Opus 4.8Self-reported
83.4
GPT-5.5Self-reported · top of the active field
78.7
Tied tier
Gemini 3.5 FlashSelf-reported · within 0.3 of GPT-5.5
78.4
This release
Claude Sonnet 4.6Self-reported
78.4
Claude Opus 4.7Self-reported
78.0
GPT-5.4 miniSelf-reported
72.1
Gemini 3 FlashSelf-reported · prior built-in CU model
65.1
Gemini 3.5 FlashOther models (self-reported)

The more interesting question is strategic, not numeric. As one analysis of the launch put it, Google embedded computer use directly into its fastest, most economical model rather than reserving it for a premium tier — the opposite of how the other frontier labs have so far packaged the capability. That is a deliberate bet that cost and speed, not a marginal lead on raw capability, are what get agents into production. If the benchmark is a tie, the model that drives the screen for a third of the price tends to win the deployments, and the leaderboard row becomes a footnote to the procurement decision.

There is a real ceiling to acknowledge in the same breath. The absolute top of the self-reported board belongs to the Claude line — Opus 4.8 at 83.4 and Fable 5 at 85.0, the latter with access suspended under export controls. A team chasing the highest possible task success on the hardest desktop workflows may still reach for a premium model. The argument for Gemini 3.5 Flash is not that it is the most capable screen agent; it is that it is the most capable screen agent you can afford to run at volume, which for most real automation is the constraint that actually binds.

Why the headline number is the wrong fight
The “78.4 vs 78.7” story misses what changed. None of these scores captures the thing native integration actually improves — the wall-clock time and error rate of a full task, where removing the model-to-model hop matters more than a fraction of a point on per-step accuracy. For a side-by-side of how the frontier labs approach computer use, see our Claude, OpenAI, and Gemini computer-use comparison.

05The Cost CaseThe number that actually moves a decision.

With the benchmark a tie, price is the lever. Gemini 3.5 Flash is $1.50 per million input tokens and $9.00 per million output tokens, with cached input at $0.15. The widely-quoted “roughly one-third the cost” line is precise, and it is precise against GPT-5.5 specifically: GPT-5.5 lists $5 input and $30 output, so $1.50 is exactly 30% of $5 and $9 is exactly 30% of $30. That clean one-third holds against GPT-5.5 — not against every frontier model, each of which prices differently.

Cost-efficiency comparison of Gemini 3.5 Flash and GPT-5.5, the two models with published per-token pricing. The cost-efficiency index is the OSWorld-Verified score divided by the input price in dollars per million tokens (78.7 divided by 5 equals 15.7; 78.4 divided by 1.5 equals 52.3). OSWorld-Verified scores are self-reported by the model providers, so this index is directional, not audited.
ModelOSWorld-VerifiedInput ($/Mtok)Output ($/Mtok)Cost-efficiency index
GPT-5.578.7self-reported$5.00$30.0015.7
Gemini 3.5 Flash78.4self-reported$1.50$9.0052.3

Read the last column, not the second. On a cost-efficiency index — benchmark score divided by input price — Gemini 3.5 Flash returns 52.3 against GPT-5.5’s 15.7, roughly 3.3x more benchmark per dollar at statistically indistinguishable performance. The “78.4 vs 78.7” headline becomes close to irrelevant once you normalize for cost: you are paying three times less for the same measured capability. Two honest hedges keep this directional rather than definitive — the scores are self-reported, and real-world token mix per session shifts the effective ratio away from the clean per-token figure.

At a marketing team’s volume the gap stops being abstract. Agentic computer-use loops are token-hungry — every screenshot, every intent, every step of the loop adds input and output tokens — so a workflow run hundreds or thousands of times a month accumulates real spend, and it is the output line, billed at $9 against GPT-5.5’s $30, where the difference compounds across a long task. We avoid putting a single dollar figure on it here because the honest number depends on your token mix per session, which varies with how visual and how long each task is. The durable point is narrower and more useful than a hero stat: a model priced at a third of the alternative changes which automations clear the bar of being worth building at all.

Input
Per million tokens
$1.50

Exactly 30% of GPT-5.5’s $5 input rate. The same 30% ratio holds on output ($9 vs $30). This is the GPT-5.5 comparison specifically, not a claim about the whole frontier.

30% of GPT-5.5
Output
Per million tokens
$9.00

Against GPT-5.5’s $30 output rate. For agentic computer-use loops, which generate a lot of intermediate tokens, the output line is where the cost gap compounds across a long task.

vs $30 GPT-5.5
Cached
Cached input
$0.15

Cached input drops to $0.15 per million tokens. For agents that repeatedly re-send a stable system prompt and tool schema across a session, caching compresses the input bill further.

Repeat-prompt savings

06Safety & GovernanceWhat you must lock down before you deploy.

An agent that can click anything in a logged-in browser is exactly as dangerous as that sounds, and Google’s safety design reads like a map of what worries them. The API ships seven configurable safety-policy categories, and every action response carries a safety_decision with a value of regular/allowed, require_confirmation, or blocked. The categories can be overridden via disabled_safety_policies — which is precisely why you need to decide your posture deliberately rather than inherit a default.

The table below maps all seven categories to the marketing and operations failure each one guards against. They line up almost exactly with the threat model for an automation agent: paying for things, editing records, sending messages, agreeing to terms.

The seven configurable safety-policy categories in the Gemini computer-use API mapped to a marketing or operations risk scenario and a suggested posture for autonomous agents. Category names are verbatim from the Gemini API documentation; the risk-scenario and posture columns are Digital Applied editorial interpretation, not part of the API.
Safety categoryWhat it gatesMarketing / ops risk scenarioSuggested posture
FINANCIAL_TRANSACTIONSPayments, purchases, ad-spend commitmentsAn agent books campaign budget or pays an invoice in a billing console without sign-off.Keep enforced confirmation on
SENSITIVE_DATA_MODIFICATIONEdits to sensitive recordsAn agent overwrites a CRM contact, deal value, or customer record while updating the pipeline.Confirmation on
COMMUNICATION_TOOLSending messages and emailAn agent sends an email or direct message to a customer list from a connected inbox.Confirmation on
ACCOUNT_CREATIONCreating new accountsAn agent signs up for a SaaS tool or ad account mid-task.Block for autonomous agents
DATA_MODIFICATIONGeneral data writesAn agent edits a shared spreadsheet, dashboard, or scheduled report.Confirmation on
USER_CONSENT_MANAGEMENTConsent and privacy settingsAn agent toggles a cookie, consent, or subscription preference on a live property.Block for autonomous agents
LEGAL_TERMS_AND_AGREEMENTSAccepting terms and contractsAn agent clicks “I agree” on a vendor’s terms of service or data-processing addendum.Block for autonomous agents

A screen agent introduces a threat ordinary chat models do not face: the attack can arrive through the pixels. A malicious instruction hidden in a webpage, an ad, or a rendered document can try to hijack the agent mid-task — telling it to exfiltrate data or click something it should not. Because the agent operates inside your authenticated sessions, a successful injection inherits your permissions: it is not a chatbot saying something off-policy, it is software acting as a logged-in you. That is the specific reason a screen agent needs more than the safety categories above.

Two further safeguards address that exposure: prompt injection through the pixels themselves. Prompt-injection detection is opt-in — you enable it with enable_prompt_injection_detection — and it scans screenshots for hidden adversarial instructions, blocking execution when it finds them. Google says it applied targeted adversarial training to harden 3.5 Flash against these visual exploits, and offers two opt-in enterprise controls: enforced user confirmation before sensitive or irreversible actions, and automatic task termination on detecting indirect prompt injection. Both are off by default. If you are designing the defensive layer, our prompt-injection defense framework covers the layered approach production agents need.

The caveat Google states plainly
Google’s own line is the one to internalize: “No single safeguard is foolproof.” The categories, the confirmation gate, and the injection detector are opt-in layers, not a guarantee. Treat a screen agent in a logged-in environment as you would a junior employee with your passwords — scoped permissions, human confirmation on anything that spends money or changes a record, and a reviewable log of every action.

07Putting It to WorkWhere a marketing or ops team starts today.

Google named Salesforce, Xero, Shopify, and Ramp among the early adopters building on Gemini 3.5 Flash for automation — supplier identification, invoice OCR, growth forecasting, multi-subagent enterprise tasks. Read those as broad 3.5 Flash adoption rather than proof that each one runs computer-use workloads specifically; the coverage attributes them to the model, not to this preview tool in particular. The practical starting points for a smaller team are narrower and lower-risk, and they map cleanly onto the safety postures above.

The named examples are instructive even with that caveat. Coverage describes Salesforce wiring 3.5 Flash into multi-subagent enterprise tasks on its Agentforce platform, Xero running autonomous accounting steps like supplier identification and tax-form processing, Shopify using parallel subagents for merchant growth forecasting, Ramp reading complex invoices against historical patterns, and Macquarie Bank applying it to customer onboarding. The pattern across them is high-volume, repetitive back-office work where a fast, cheap model pays off — the same shape a marketing or operations team faces, just at a different scale. A smaller team does not need an enterprise platform to start; it needs one well-scoped workflow.

Lowest risk
Campaign & dashboard QA

Point a research-only agent at ad consoles and analytics dashboards to read state and flag anomalies — no writes, no spend. The intent field gives you a reviewable log of what it checked and why.

Start here, read-only
Human-in-loop
CRM data entry & lead routing

Let the agent draft record updates across a browser CRM, but keep enforced confirmation on for SENSITIVE_DATA_MODIFICATION and DATA_MODIFICATION so a person approves each write before it lands.

Confirm every write
Cross-surface
Unified reporting

Use the native stack — computer use plus Search grounding plus Maps in one agent — to pull a screen metric, verify a fact, and resolve a location in a single pass, instead of stitching three tools together.

One agent, one pass
Guardrails first
Anything that spends or sends

Booking ad budget, sending customer email, accepting vendor terms: keep FINANCIAL_TRANSACTIONS, COMMUNICATION_TOOL, and LEGAL_TERMS_AND_AGREEMENTS gated, and turn on prompt-injection detection before the agent touches a logged-in tab.

Gate, then automate

A concrete first project makes this tangible. Take weekly campaign QA: an agent opens each ad platform in a browser, reads spend pacing and policy status across accounts, and writes a single plain-language summary — reading state, never changing it. The intent field documents each check, so the output is auditable from day one, and because nothing is written, the FINANCIAL_TRANSACTIONS and DATA_MODIFICATION categories never come into play. It is the lowest-risk way to learn whether a screen agent is reliable enough on your own surfaces before you hand it anything that writes, spends, or sends.

The honest sequencing is the same one we use with clients: prove value on read-only QA, add human-confirmed writes once the audit log earns trust, and only then widen autonomy behind the safety categories. That scoping — which workflows, which guardrails, which confirmation gates — is exactly where our agentic AI transformation engagements begin, before any model commitment.

08ConclusionThe model becomes the agent.

The shape of computer use, June 2026

Native computer use makes the model the agent, not a step in the pipeline.

The June 24 update is best read as an architecture change wearing a benchmark headline. Folding computer use into Gemini 3.5 Flash — alongside Search grounding and Maps — lets a single agent see, reason, and act in one inference pass, and the new intent field gives that agent the audit trail an enterprise needs to deploy it. The 78.4 OSWorld-Verified score, self-reported like every score on that board, is a tie with GPT-5.5, not a lead.

Keep the framing precise. This is a public preview, not general availability. The cost advantage is real and specific — exactly one-third of GPT-5.5’s per-token price — but it is a GPT-5.5 comparison, not a claim about the whole field. And the safeguards, from the seven safety categories to injection detection, are mostly opt-in, behind Google’s own admission that no single safeguard is foolproof.

The forward read is straightforward. When the cheapest, fastest production model also drives a screen, computer use stops being a premium add-on and starts being a default expectation of an agent platform. The teams that win will not be the ones chasing the top leaderboard row — they will be the ones who wire a tied-but-cheaper model into a real workflow, behind real guardrails, with a log they can defend. That, not a fraction of a benchmark point, is what this release actually changes.

Turn screen agents into governed workflows

Native computer use makes governed agentic automation genuinely affordable.

We help marketing and operations teams design agentic computer-use pipelines on models like Gemini 3.5 Flash — read-only QA first, human-confirmed writes next, and the safety-category guardrails locked down before any autonomy, delivered in days not quarters.

Free consultationExpert guidanceTailored solutions
What we work on

Agentic automation engagements

  • Computer-use agent scoping — which workflows, which guardrails
  • Safety-category posture for marketing and ops risk
  • Audit logging on the intent field for compliance
  • Cost modelling — Gemini 3.5 Flash vs GPT-5.5 per workload
  • Read-only to human-in-loop to autonomous rollout sequencing
FAQ · Gemini 3.5 Flash computer use

The questions we get every week.

Computer use is a built-in tool inside Gemini 3.5 Flash that lets the model see a screen and operate a browser, mobile device, or desktop. Google announced it on June 24, 2026 as a public preview, available through the Gemini API and the Gemini Enterprise Agent Platform, with a Browserbase-hosted demo and a GitHub reference implementation. The important distinction is that it is native: the same gemini-3.5-flash model used for function calling, Google Search grounding, and Maps now also drives the screen, so you do not route between a reasoning model and a separate computer-use model. You declare it as a tool with a browser, mobile, or desktop environment, and the model returns actions on a normalized 0 to 1000 coordinate scale.
Related dispatches

Continue exploring frontier releases.