Gemini 3.5 Live Translate is Google's real-time engine for multilingual customer experience (CX): a streaming speech-to-speech model that, per Google, listens in one language and speaks back in another across 70+ languages with automatic detection, staying only a few seconds behind the speaker. It launched on June 9, 2026 across Google Translate, the Gemini Live API, Google AI Studio, and Google Meet.

The headline most coverage led with (70+ languages, "a few seconds" of lag) is the least interesting part of the story. The real shift is architectural: Google dropped the cascaded transcribe-then-translate-then-synthesize pipeline that powered Meet's earlier translation in favour of a single audio-to-audio model. That one change is why Google Meet jumps from 5 English-only language pairs to 2,000+ combinations in a single meeting, and why the translated voice can, by Google's account, track the speaker's own pacing instead of sounding flat.

This guide covers what actually shipped and on which surfaces, the architecture gap in plain English, a build-vs-buy decision matrix for multilingual CX teams, how to wire the public-preview Live API, the SynthID watermarking angle that maps directly onto the EU AI Act, and where the field — including DeepL Voice — sits today. Every number below is sourced to a primary or named-secondary citation, and the vendor-stated claims are labelled as such.

Key takeaways

01. A single audio-to-audio model, four surfaces, one day. Gemini 3.5 Live Translate launched June 9, 2026 across Google Translate (iOS/Android), the Gemini Live API (public preview), Google AI Studio (public preview), and Google Meet (private preview for select Workspace customers).
02. 70+ languages with automatic detection. No manual language configuration: the model detects the spoken language on the fly and supports translation across 70+ languages, a meaningful step up from turn-based systems that required a pre-set pair.
03. Google Meet jumps from 5 languages to 2,000+ combinations. Meet's prior speech translation (GA January 27, 2026) supported 5 English-only pairs. The new system supports 2,000+ language combinations in a single meeting, made possible by removing the intermediate text step.
04. The Live API is live for builders today; Meet is private preview. Any team with a developer can stand up a multilingual voice agent now via the public-preview Live API. Enterprise marketers can pursue the Meet private preview, with broader rollout planned for the second half of 2026.
05. SynthID watermarking lines up with the EU AI Act. All generated audio is watermarked with Google's SynthID. The EU AI Act's transparency obligation on AI-generated content becomes enforceable August 2, 2026, which makes the watermark a regulatory-readiness signal for enterprise procurement and legal teams.

01 — What Launched: One model, four surfaces, in preview.

On June 9, 2026, Google released Gemini 3.5 Live Translate — model ID gemini-3.5-live-translate-preview — and rolled it onto four surfaces at once. The consumer Google Translate apps on iOS and Android picked up a live mode; the Gemini Live API and Google AI Studio opened it in public preview for developers; and Google Meet got it in private preview for select business Workspace customers, with broader enterprise rollout planned for the second half of 2026.

The model accepts spoken audio and returns spoken audio in the target language, detecting the source language automatically across 70+ languages. It is, importantly, a preview release: Google has not committed to production-grade SLAs, stable pricing, or general availability. Treat it as a strong signal of where multilingual CX is heading and a viable surface to pilot — not a finished product to build a mission-critical SLA on.

Google Translate (consumer, live now; iOS / Android, any connected headphones or Listening Mode)

Tap 'Live translate' in the bottom-left corner with headphones connected. A new Android 'Listening Mode' lets users hold the phone to the ear like a call, with no Pixel Buds required, removing the earbud dependency the earlier feature carried.

Gemini Live API (builder, public preview; gemini-3.5-live-translate-preview, also in AI Studio; docs at ai.google.dev/gemini-api)

Audio-only, real-time translation via a translationConfig block. Available today in public preview through the Live API and Google AI Studio. Integration partners include Agora, Fishjam, LiveKit, Pipecat, and Vision Agents.

Google Meet (enterprise, private preview now for select Workspace customers; H2 2026 GA planned)

Speech translation via a new button in the Meet control row. Currently private preview for select business Workspace customers; broader enterprise rollout is planned for the second half of 2026.

Launch snapshot

Gemini 3.5 Live Translate launched June 9, 2026 across Google Translate, the Gemini Live API and AI Studio (public preview), and Google Meet (private preview). The model card, capability matrix, and safety documentation are public on the Google AI for Developers model page. Per the docs, input and output token limits are 131,072 and 65,536 respectively, and the model is described as a "low-latency, audio-to-audio model optimized for real-time translation of spoken conversations."

02 — The Architecture Gap: Why audio-to-audio beats the three-hop pipeline.

Most coverage says "continuous translation" and moves on. The detail that actually matters is what got removed. Older real-time translation — including Meet's previous system — ran a cascaded three-step pipeline: transcribe the speech to text (STT), translate that text, then synthesize the translated text back to speech (TTS). Each hop adds latency and a place for errors to compound — a mistranscription becomes a mistranslation becomes a confidently wrong spoken sentence.
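As a rough illustration of how errors compound across hops: if each stage passes a sentence through intact with some probability, and the stages fail independently, the end-to-end probability is the product of the three. The 0.95 per-stage figure below is purely hypothetical, chosen to make the arithmetic easy to follow. It is not a measured accuracy for Meet, Gemini, or any other system, and real pipelines will not have perfectly independent stages.

```python
# Illustrative only: how per-stage success rates multiply in a cascaded pipeline.
# The 0.95 figure is a made-up assumption, not a measurement of any real system.
from math import prod


def end_to_end_success(stage_success_rates):
    """Probability a sentence survives every stage, assuming independent stages."""
    return prod(stage_success_rates)


# Three hops: speech-to-text, text translation, text-to-speech.
cascaded = end_to_end_success([0.95, 0.95, 0.95])
print(f"{cascaded:.3f}")  # 0.857
```

Under these assumed numbers, three stages that each look reliable on their own lose roughly one sentence in seven end to end, which is the intuition behind removing hops.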

Gemini 3.5 Live Translate replaces all three hops with a single audio-to-audio model: speech goes in, speech comes out, with no intermediate text representation. Google describes the result as streaming continuously rather than waiting for sentence boundaries — which is what lets the system, in Google's words, stay "just a few seconds behind the speaker throughout the session." There is no independent latency benchmark for that claim as of this writing, so treat the "few seconds" figure as Google-stated.

What changed: Meet translation, old pipeline vs Gemini 3.5 Live Translate

Source: Google Blog (Jun 9, 2026) + Google Workspace Updates (Jan 27, 2026 GA). Latency and language figures are vendor-stated.

Metric	Before → after	Detail
Pipeline hops	3 → 0	Cascaded STT → translate → TTS vs single model
Languages (Meet)	5 → 70+	5 English-only pairs → 70+ with auto-detect
Combinations per meeting (Meet)	5 → 2,000+	5 → 2,000+ language combinations
Latency (Google-stated)	few sec	Turn-based wait → continuous, 'a few seconds' behind

The second-order effect is the one CX teams should care about. When translation no longer routes through a text bottleneck, the translated audio can — per Google — preserve the speaker's intonation, pacing, and pitch rather than producing flat synthesized output. That matters because tone carries meaning: a reassuring support agent and a curt one say the same words. We haven't seen an independent A/B listening test of that claim, so it remains Google-stated — but the architectural reason it's plausible (no TTS voice substitution step) is sound.

"balancing the trade-off between waiting for context to improve quality and translating immediately to stay in sync with the speaker" (Anuda Weerasinghe and Tony Lu, Google, via Technobezz)

That quote from Google's product and engineering leads is the whole game in one sentence. Every real-time translation system lives on a spectrum between waiting (more context, better quality, more lag) and translating immediately (lower lag, riskier on ambiguous phrasing). The continuous-stream architecture is Google's bet that you can sit closer to the immediate end without the quality cliff that turn-based systems hit at every sentence boundary. The consumer side of this evolution is worth watching alongside Google Translate's live headphone mode, which this release extends.

03 — Google Meet: From 5 languages to 2,000+ combinations, in 133 days.

Here is the speed story. Google Meet's speech translation feature reached general availability on January 27, 2026, supporting exactly five language pairs, each anchored on English: English to and from Spanish, French, German, Portuguese, and Italian, on Workspace Business Standard/Plus and Enterprise Standard/Plus plans. That was the state of the art for Meet roughly four months ago.

On June 9, 2026, that system was effectively leapfrogged: 70+ languages, 2,000+ combinations in a single meeting, automatic detection. From GA to superseded in 133 days. For enterprise marketers and CX leaders, the lesson isn't the specific numbers; it's the tempo. Translation capability that you'd have scoped as a year-long roadmap item in 2024 is now shipping and being superseded in under five months. The table below makes the jump concrete.

Google Meet translation: the January 2026 GA system compared with Gemini 3.5 Live Translate, June 2026.
Dimension	Meet · GA (Jan 2026)	Meet · Gemini 3.5 Live Translate (Jun 2026)
Architecture	Cascaded: STT → text translate → TTS (3 hops)	Single audio-to-audio model (0 intermediate text hops)
Languages supported	5 (each paired with English)	70+ (automatic detection)
Combinations per meeting	5 (English-only pairs)	2,000+
Latency mode	Turn-based (waits for sentence end)	Continuous streaming · "a few seconds" behind (Google-stated)
Voice preservation	Limited (synthesized voice, not speaker-matched)	Preserves intonation, pacing, pitch (Google-stated)
Rollout status	GA · Business Std/Plus, Enterprise Std/Plus	Private preview → H2 2026 broader rollout planned

One caveat worth setting for stakeholders: the January GA system is generally available across eligible plans today, while the new model is private-preview only. If you're planning multilingual all-hands or customer calls in the next quarter, the five-language GA system is what you can actually rely on right now; the 2,000+ combination experience is a pilot you sign up for, not a switch you flip.
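The two headline figures in this section can be sanity-checked with a few lines of arithmetic. The dates come from the article; the way combinations are counted is an assumption. Google has not published its counting method as far as this article's sources show, so the sketch treats a "combination" as an unordered pair of distinct languages, which is at least consistent with the old system's five pairs being described as five combinations.

```python
from datetime import date
from math import comb

ga_date = date(2026, 1, 27)     # Meet speech translation GA, per the article
launch_date = date(2026, 6, 9)  # Gemini 3.5 Live Translate launch, per the article
days_between = (launch_date - ga_date).days
print(days_between)  # 133


def language_pairs(n_languages: int) -> int:
    """Unordered pairs of distinct languages (assumed counting method)."""
    return comb(n_languages, 2)


print(language_pairs(70))  # 2415

# Smallest language count that clears 2,000 unordered pairs.
min_languages = next(n for n in range(2, 200) if language_pairs(n) >= 2000)
print(min_languages)  # 64
```

On that assumption, "2,000+" fits a roster of 64 or more languages, so it is compatible with the "70+" figure without being derived from it.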

04 — Build vs Buy: Four access paths, one decision matrix.

The most useful framing for a CX or marketing leader isn't "is this good?" — it's "which door do I walk through, and when?" There are four distinct ways to put Gemini 3.5 Live Translate to work, and they sort cleanly along two axes: how much engineering you have, and whether your channel is a custom app, a consumer touchpoint, or video meetings. The matrix below maps each path to a realistic timeline and its key constraint.

Build-vs-buy decision matrix for deploying Gemini 3.5 Live Translate across four access paths, with timeline and key constraint for each.
Buyer type	Channel	Access path	Timeline	Key constraint
Developer / startup	Custom voice app	Gemini Live API public preview	Today	Must handle a raw PCM audio pipeline
Developer / startup	Consumer touchpoint (kiosk, event)	Translate Android "Listening Mode"	Today	Single-device, in-person only
Enterprise marketer	Video meetings (sales, support, CS)	Google Meet private preview	June 2026 sign-up	Workspace eligibility required
Enterprise marketer	Full CX voice stack	Via Agora / LiveKit / Pipecat	Today (partner platforms)	Partner SLA / pricing varies
Non-developer team	Internal multilingual meetings	Google Meet → H2 2026 broader rollout	H2 2026 (wait)	No dev required — standard Meet

Have a developer, need it now: custom voice agent via the Live API

If you have engineering capacity and a contact-centre, webinar, or event use case, the public-preview Live API is the move: you can stand up a multilingual voice agent today. The trade-off is owning the raw audio pipeline and accepting preview-grade stability.

Enterprise, video-meeting use case: pilot Google Meet

If your need is multilingual sales, support, or CS calls and you're on an eligible Workspace plan, sign up for the Meet private preview now. Keep the January GA five-language system as the reliable fallback until broader rollout.

Production CX stack, no Google lock-in: embed via RTC partners

For a full CX voice stack you don't want tied to Meet, the Live API integrates through Agora, Fishjam, LiveKit, Pipecat, and Vision Agents. This is the path for embedding translation into an existing voice pipeline, but partner SLAs and pricing vary, so scope those before committing.

No engineering, internal-only need: wait for Meet GA

If you have no developer and the use case is internal multilingual meetings, the disciplined move is to wait for the broader Meet rollout planned for H2 2026. Don't over-build for a capability that's about to arrive as a standard Meet button.

Our read: most marketing and CX teams should be in one of two camps right now. If you have engineering capacity, prototype a single high-value voice touchpoint on the Live API this quarter — the learning is worth more than the polish. If you don't, sign up for the Meet preview to evaluate, but plan your reliable multilingual coverage around what's GA today. Either way, the voice model is one layer; standing up a production-grade voice agent infrastructure stack needs turn-taking, telephony, observability, and fallback handling around it.

05 — Wiring the API: The translationConfig block, decoded.

For builders, the Live API is refreshingly narrow by design. You configure translation through a translationConfig block inside generationConfig, with two key fields: targetLanguageCode (a BCP-47 tag like "pl", "es", defaulting to "en") and echoTargetLanguage (a boolean controlling whether input already in the target language is echoed back or silenced). Source language detection is automatic.
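A minimal sketch of what that configuration might look like as a plain Python dictionary. The field names generationConfig, translationConfig, targetLanguageCode, and echoTargetLanguage, plus the "en" default, come from the description above. The helper name build_translation_setup, the top-level "model" key, the False default for echoTargetLanguage, and the overall nesting of the payload are assumptions made for illustration, so check the Live API reference before relying on this shape.

```python
import json

MODEL_ID = "gemini-3.5-live-translate-preview"  # model ID quoted in this article


def build_translation_setup(target_language_code: str = "en",
                            echo_target_language: bool = False) -> dict:
    """Hypothetical helper: assemble a translation setup payload.

    The payload shape is an assumption based on the field names in the text,
    not a verified copy of the Live API schema.
    """
    if not target_language_code:
        raise ValueError("target_language_code must be a non-empty BCP-47 tag")
    return {
        "model": MODEL_ID,
        "generationConfig": {
            "translationConfig": {
                "targetLanguageCode": target_language_code,
                "echoTargetLanguage": echo_target_language,
            }
        },
    }


setup = build_translation_setup("pl")
print(json.dumps(setup, indent=2))
```

Note that no source language appears anywhere in the payload, which reflects the automatic detection described above.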

The audio specs are exact and worth getting right before you debug a silent stream: input is raw 16-bit PCM at 16 kHz, mono, little-endian; output is raw 16-bit PCM at 24 kHz, mono, little-endian; and the recommended chunk size is 100 ms. Optional inputAudioTranscription and outputAudioTranscription flags return text transcripts alongside the translated audio — useful for accessibility, logging, and compliance trails.
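The chunk arithmetic follows directly from those specs: 16,000 samples per second at 2 bytes each means a 100 ms input chunk is 3,200 bytes, and 100 ms of 24 kHz output is 4,800 bytes. The sketch below checks a WAV file against the stated input format and slices it into 100 ms chunks using only the Python standard library. The function names are illustrative, and how the chunks are actually sent to the Live API is deliberately left out.

```python
import io
import wave

INPUT_RATE_HZ = 16_000   # input sample rate stated above
OUTPUT_RATE_HZ = 24_000  # output sample rate stated above
SAMPLE_WIDTH_BYTES = 2   # 16-bit PCM
CHUNK_MS = 100           # recommended chunk size stated above


def chunk_bytes(rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Bytes in one mono 16-bit PCM chunk of the given duration."""
    return rate_hz * SAMPLE_WIDTH_BYTES * chunk_ms // 1000


def iter_input_chunks(wav_bytes: bytes):
    """Yield raw PCM chunks from a WAV file; reject anything but 16 kHz mono 16-bit."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        actual = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if actual != (1, SAMPLE_WIDTH_BYTES, INPUT_RATE_HZ):
            raise ValueError("expected 16-bit mono PCM at 16 kHz")
        frames_per_chunk = INPUT_RATE_HZ * CHUNK_MS // 1000
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk


def make_silent_wav(duration_ms: int) -> bytes:
    """Build an in-memory WAV of silence in the expected input format (demo helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(INPUT_RATE_HZ)
        wav.writeframes(b"\x00\x00" * (INPUT_RATE_HZ * duration_ms // 1000))
    return buffer.getvalue()


chunks = list(iter_input_chunks(make_silent_wav(250)))
chunk_sizes = [len(c) for c in chunks]
print(chunk_bytes(INPUT_RATE_HZ), chunk_bytes(OUTPUT_RATE_HZ), chunk_sizes)
# 3200 4800 [3200, 3200, 1600]
```

WAV stores PCM samples little-endian, so the frames read here already match the byte order the text calls for. The final chunk is shorter than 100 ms because 250 ms does not divide evenly.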

Audio input: raw 16-bit PCM, mono, 16 kHz (audio-only in). Little-endian, recommended 100 ms chunks. Text input is explicitly unsupported in translation mode; the model accepts audio only, to hold its real-time latency targets.

Audio output: raw 16-bit PCM, mono, 24 kHz (transcripts optional). Little-endian translated speech. Optional inputAudioTranscription and outputAudioTranscription flags add text transcripts for accessibility, logging, and compliance.

Token limits: 131K input · 65K output (translation mode only). Input limit 131,072, output 65,536 per the model card. Function calling, system instructions, and Search grounding are unavailable in translation mode; it is a single-purpose translation surface.

Security note for client-side apps

For client-side applications, the Live API supports ephemeral token authentication on the v1alpha endpoint. Tokens can lock the translation configuration by default, which prevents end users from tampering with the target language or other settings — the right pattern for a kiosk, event device, or public-facing app where you don't control the client.

Two limitations to plan around. First, translation mode strips the features you might reflexively reach for — no function calling, no system instructions, no Search grounding — because the model is optimised for one job. If your agent needs to do things as well as translate, you'll orchestrate translation as one node in a larger pipeline, not as the whole agent. Second, Google's own docs flag three preview-stage limitations: voice replication can drift across long pauses, language detection can struggle with heavy accents and similar language pairs, and background-audio filtering is available but incomplete. Build your pilot around clean-audio, single-speaker scenarios first.

06 — SynthID & Compliance: The watermark that maps onto the EU AI Act.

Here's the angle most launch coverage missed. Every audio output Gemini 3.5 Live Translate generates is watermarked with Google's SynthID, an imperceptible marker embedded directly into the audio waveform to flag the content as AI-generated. On its own, that's a responsible-AI footnote. In the context of the EU AI Act, it's a procurement differentiator.

The EU AI Act's transparency obligation requiring machine-readable labels on AI-generated content (Article 50) becomes enforceable on August 2, 2026. Enterprises deploying AI voice translation into European customer interactions will need a story for how generated audio is labelled. SynthID watermarking on every output means the model arrives ahead of that deadline by default — which is exactly the kind of vendor-risk question enterprise legal and procurement teams ask, and which most launch coverage skipped entirely.

For legal & procurement teams

The relevant EU AI Act provision for AI-content labelling is Article 50 (transparency), enforceable August 2, 2026, not Article 73, which covers serious-incident reporting for high-risk systems. Google's SynthID audio watermarking is described as embedded directly into the waveform on every generated output. Verify how this maps to your specific compliance obligations with counsel before relying on it as a control; vendor watermarking is one input to a labelling strategy, not the entirety of one.

07 — The Field: DeepL, the scale story, and what's real.

Google isn't alone in voice. DeepL — long known for text translation — launched DeepL Voice on April 16, 2026, with real-time voice-to-voice translation across 40+ languages targeting enterprise meetings and customer service. The important architectural distinction: DeepL Voice currently runs a cascaded STT → translate → TTS pipeline, not an end-to-end audio model. DeepL's roadmap mentions an end-to-end model in development, but as of this writing it has not shipped. So the head-to-head isn't "two end-to-end models" — it's Google's audio-to-audio approach versus DeepL's cascaded one, with DeepL's historically strong translation quality as its differentiator.

For scale context, Google Translate as a whole processes roughly one trillion words per month across Translate, Search, Lens, and Circle to Search, serving over a billion monthly users across nearly 250 languages and covering about 95% of the world's population. Notably, more than a third of Google Translate's live-translate sessions last longer than five minutes — a signal that people use it for real conversations, not just quick lookups. That's the install base Gemini 3.5 Live Translate is being pushed into.

"over a trillion words being translated for billions of users across our products every month" (Google, Translate product team, 20th anniversary blog)

One real-world deployment worth naming carefully: per a Google partner announcement, the Southeast Asian ride-hailing platform Grab is testing Gemini 3.5 Live Translate for real-time driver-passenger communication at pickups, on a platform Google describes as carrying over 10 million voice calls per month. That figure comes from Google's partner announcement rather than Grab's own filings, so treat it as indicative of scale rather than an audited number. The pattern it illustrates, though, is the one CX teams should internalise: a language mismatch at the moment of a pickup is the kind of silent friction that quietly fails interactions no dashboard flags.

08 — The CX Playbook: Where language mismatch silently costs you.

Strip away the model specs and the question for a CX leader is simple: where in your customer journey does a language mismatch cause churn that never shows up as a complaint? Unanswered support calls. Abandoned chats. Tickets that get opened, half-understood, and quietly closed. The business case for multilingual CX has always leaned on the intuition that people prefer to transact in their own language — an idea long popularised in localisation research — and real-time voice translation is the first technology that lets a small team serve that preference live, without a multilingual headcount.

The strategic move isn't "turn on translation everywhere." It's to map the two or three touchpoints where language is currently a silent failure point, pilot one of them on the access path that matches your resources, and instrument it so you can actually measure the difference — resolution rate, call completion, conversion on previously-stalled segments. Real-time voice is one layer of a multilingual CX strategy; the web layer still needs your multilingual SEO and hreflang strategy done right in parallel, or you'll translate the conversation and lose the customer before they ever start it.

The DA angle

The teams that win here won't be the ones who adopt the flashiest demo — they'll be the ones who pick the right access path for their resources and pilot a single high-value touchpoint with real measurement. Our AI transformation engagements start with exactly that: mapping where language friction costs you, choosing build-vs-buy honestly, and standing up a measured pilot — delivered in days, not quarters.

09 — Conclusion: The capability is here; the discipline is the differentiator.

Real-time multilingual CX, June 2026

The model is the easy part — knowing which door to walk through is the strategy.

Gemini 3.5 Live Translate is a genuine step change in real-time translation, and the reason is architectural, not numeric: removing the cascaded transcribe-translate-synthesize pipeline in favour of a single audio-to-audio model is what unlocks 70+ languages with automatic detection, 2,000+ combinations in a single Meet call, and a translated voice that — by Google's account — keeps the speaker's pacing.

The honest framing matters. The latency, voice-preservation, and partner-scale claims are Google-stated and not yet independently benchmarked, and the model is in preview — no production SLA, no stable pricing, no GA. That's not a reason to wait; it's a reason to pilot rather than bet the contact centre. The Live API is live for builders today, the Meet private preview is open for enterprise evaluation, and the reliable GA fallback is still the five-language system from January.

The differentiator for CX teams won't be access to the model — everyone gets that. It'll be the discipline of mapping where language mismatch silently costs you, picking the access path that matches your engineering reality, layering it onto the rest of your multilingual footprint, and measuring the result. The capability has arrived faster than most roadmaps planned for. The strategy is still yours to get right.
