Google launched Gemini Omni at I/O 2026 today — a new multimodal generation family that accepts text, image, audio, and video inputs in any combination and produces video output, with SynthID watermarking baked into every clip at the model level. Gemini Omni Flash is the first shipping member; it is separate from the Veo line and deliberately positioned as the consumer-first, chat-native counterpart to Veo's developer-first API surface.
The 10-second output cap at launch is not a model limitation — Google's product management director Nicole Brichtova called it “a decision based on a desire to get it into more hands.” The more consequential design choice is what Google held back: voice and speech editing, image-from-audio, audio-from-video, and unfettered real-person depiction are all absent at launch, with Google citing responsible-deployment review as the reason. In a US midterm-election year, the timing is notable.
This guide covers what actually shipped, what was withheld and why, where Omni runs, how pricing maps across AI Plus / Pro / Ultra and Google Flow credits, how Omni compares to Sora 2, Veo 3.1, Kling 2, and Runway Gen-4, and what the conversational editing surface means for creative workflows. Developer API and Omni Pro timelines are covered at the end.
- 01 Gemini Omni Flash shipped May 19: any-to-any multimodal input, 10-second video output. Text, image, audio, and video inputs are accepted in any combination. Output is video, capped at 10 seconds per clip at launch. SynthID watermark on every output; no API knob to disable it. The Omni family is separate from the Veo line.
- 02 Three consumer surfaces live today: two paid, one free. Gemini app and Google Flow require paid AI Plus ($7.99/mo), AI Pro ($19.99/mo), or AI Ultra ($100 or $200/mo) subscriptions. YouTube Shorts Remix and YouTube Create are free for users 18 and older.
- 03 Voice and speech editing were intentionally withheld at launch. Google stated it is still testing the capability to understand how to bring it to users responsibly. Third-party coverage frames this as an election-year deepfake-safety hedge. The model reportedly can do more than what shipped.
- 04 Avatar onboarding blocks unfettered real-person depiction. To generate a video featuring your own face, you record yourself speaking a number sequence. That spoken-number handshake is the anti-deepfake control for real-person depiction. Bypassing it is blocked at the model level.
- 05 AI Ultra repriced from $250 to $200/mo at I/O, and a new $100/mo entry tier was added. Google published the price reduction in its I/O subscription update. Flow credits per tier: AI Plus 200, AI Pro 1,000, AI Ultra (entry) 10,000, AI Ultra (higher) 25,000. Developer API is expected within weeks; Omni Pro has no committed date.
01 — What Launched
Gemini Omni Flash at I/O: a new family, not a Veo update.
Google announced Gemini Omni at the I/O 2026 keynote this morning. Koray Kavukcuoglu, CTO of Google DeepMind, wrote in the announcement: “Today we're introducing Gemini Omni — a new generation of multimodal models that creates anything from any input, starting with video.” Gemini Omni Flash is the first member of that family to ship. It is not a Veo update, not a Gemini 3.5 variant, and not a successor to Imagen. It is a distinct model family positioned around any-input-to-video generation.
Google's I/O 2026 recap lists Veo 3.1 alongside Omni as two active product lines — not one replacing the other. Veo remains the video-first, developer-API surface on Vertex AI and Google Flow. Omni is the multimodal-input, chat-native surface in the Gemini app and YouTube. Understanding the split is the most important thing for teams deciding which surface to build on.
Veo 3.1 takes text and video input and produces video: single-shot generation that lives on Vertex AI and the paid Flow surface. Omni takes text, image, audio, and video input and produces video: multimodal input and conversational editing in the Gemini app and YouTube. The two ship side by side. Most launch-week coverage conflates them, but they are deliberately separate product lines targeting different use cases and different buyer surfaces.
02 — Input / Output
Any-to-any inputs, a deliberate 10-second cap.
Gemini Omni Flash accepts any combination of text, images, audio, and video in a single prompt. Google describes the model as “grounded in Gemini's real-world knowledge”; capability claims include physics (gravity, fluid dynamics), spatial anatomy, lighting consistency, and cultural context. Treat those as Google marketing copy until independent benchmarks arrive; no published third-party evaluation existed as of the I/O launch.
Output at launch is video, capped at 10 seconds per clip. Third-party reporting from WaveSpeed's technical deep-dive suggests the output runs at approximately 720p and 24 fps, though Google did not publish a resolution spec in the May 19 launch post — treat those numbers as reported rather than confirmed.
Nicole Brichtova (Product Management Director, Google DeepMind) was explicit in TechCrunch's coverage: the 10-second cap “isn't a model limitation” but “a decision based on a desire to get it into more hands.” The strategic logic is clear: a 10-second ceiling keeps per-generation cost low, maps directly to YouTube Shorts, and avoids the per-minute economics that contributed to Sora's standalone app shutdown in March 2026.
- Output quality: Third-party reporting suggests 720p at 24 fps. Google did not publish a resolution spec at launch. Treat these figures as reported pending official API docs.
- Clip length at launch: A deployment choice, not a model limit, as stated by Google's product management director. The 10-second ceiling keeps per-generation cost viable for mass YouTube Shorts distribution.
- Any-combination input: Text, image, audio, and video accepted in any combination in a single prompt. Google frames Omni as the first multimodal generation model built on Gemini's reasoning and world-knowledge stack.
03 — SynthID Watermark
Every Omni output is watermarked, no exceptions.
SynthID is Google's imperceptible digital watermark. It embeds an invisible signal in the pixel data of every Omni-generated video — one that survives common post-processing like re-encoding and color grading. Kavukcuoglu confirmed in the launch post: “All videos created with Omni include our imperceptible SynthID digital watermark.” There is no API parameter to disable it. There is no tier that removes it.
SynthID is not new to Omni — it has shipped across Google's image and video generation tools since 2023. According to TechTimes' launch coverage, SynthID has now marked over 100 billion AI-generated images and videos across Google and its partner platforms, with OpenAI, ElevenLabs, and Kakao among the adopters.
Verification is built into three surfaces: the Gemini app, Gemini in Chrome, and Google Search. The practical implication for marketers and content teams is that any Omni-generated clip distributed publicly is identifiable as AI-generated. That's a compliance and disclosure asset in regulated categories; it may be a friction point for campaigns that depend on plausible human authorship.
“All videos created with Omni include our imperceptible SynthID digital watermark.” Koray Kavukcuoglu, CTO, Google DeepMind (Google DeepMind blog, May 19, 2026)
04 — Safety Hedge
Voice/speech editing withheld: Google's election-year framing.
The more interesting story is what did not ship. Voice and speech editing of generated or supplied video is absent at launch. Google stated: “We're still working to test [voice and speech editing] to better understand how we can bring this capability to users responsibly.” The phrasing is careful: “responsibly” rather than “technically feasible.” Third-party analysts, including TechTimes, interpret this as an election-year deepfake-safety hedge given the November 2026 US midterms.
Image-from-audio and audio-from-video generation are also absent, both reportedly on safety-review tracks. Real-person depiction is blocked at the model level — users who want to appear in an Omni video must complete an avatar onboarding sequence by recording themselves speaking a number sequence. That spoken-number handshake is Google's anti-deepfake control; bypassing it is prevented at the model level, not just the UI level.
The significance here is the precedent Google is setting on record. Google's statement puts it in writing that the model can do more than what shipped: the held-back features are a deployment choice, not a capability gap. Whether competitors follow with equivalent controls is the open question. As of I/O 2026, none of Sora 2, Veo 3.1, Kling 2, or Runway Gen-4 has an equivalent avatar-onboarding handshake for real-person depiction.
- Voice / speech editing: Editing spoken audio in generated or supplied video. Google's stated reason: responsible-deployment testing. Analyst framing: election-year deepfake risk. No committed date.
- Image-from-audio: Generating images from audio input alone. On a separate safety-review track from voice editing. Reported by third-party sources only; Google did not detail this holdback in the launch post.
- Audio-from-video: Extracting or regenerating audio from existing video. Also on a safety-review track. No timeline published.
- Real-person depiction: Appearing in an Omni video requires completing a spoken-number-sequence avatar onboarding. Bypassing the handshake is blocked at the model level, not just the UI level. Confirmed by TechCrunch coverage.
- Clips longer than 10 seconds: The model can generate longer clips. Google chose 10 seconds for broader access and cost viability. Brichtova: “isn't a model limitation.” Omni Pro is expected to raise this cap.
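The spoken-number handshake described above has the shape of a generic challenge-response check. The sketch below illustrates that general pattern only; the source does not describe how Google implements avatar onboarding, and the names `issue_challenge`, `normalize`, and `verify` are hypothetical. It also assumes a speech-to-text transcript is already available.

```python
import hmac
import secrets

DIGITS = "0123456789"


def issue_challenge(n_digits: int = 6) -> str:
    """Random digit sequence the user is asked to read aloud."""
    return "".join(secrets.choice(DIGITS) for _ in range(n_digits))


def normalize(transcript: str) -> str:
    """Keep only ASCII digits from a (hypothetical) speech-to-text transcript."""
    return "".join(ch for ch in transcript if ch in DIGITS)


def verify(challenge: str, transcript: str) -> bool:
    """True only when the spoken digits match the issued challenge exactly."""
    return hmac.compare_digest(challenge, normalize(transcript))


if __name__ == "__main__":
    challenge = issue_challenge()
    print("Please read aloud:", " ".join(challenge))
    # A transcript that repeats the digits passes; anything else fails.
    print(verify(challenge, " ".join(challenge)))
```

Because the sequence is freshly random per session, a pre-recorded clip of someone else is unlikely to contain the right digits, which is the property this kind of check relies on. A production system would also need to tie the voice to the face on camera, which this sketch does not attempt.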
05 — Availability
Three surfaces live today: two paid, one free.
Gemini Omni Flash launched across three consumer surfaces simultaneously. The paid surfaces — the Gemini app and Google Flow — require an active AI Plus, AI Pro, or AI Ultra subscription. The free surface — YouTube Shorts Remix and YouTube Create — is available to any signed-in user aged 18 or older, no subscription required.
The free-on-YouTube distribution is strategically significant. It positions Google against Pika Labs 2.0 (the fastest tool for sub-10-second social clips) and undercuts Kling 2's per-second pricing for clips in the Shorts format. For a deeper look at the post-Sora video-gen landscape, including how Runway Gen-4, Kling, and Pika compete, see our companion post.
Google Flow is the paid video-creation workspace where Omni and Veo 3.1 coexist. Flow access is metered in credits — AI Plus users get 200 credits per month, AI Pro users get 1,000, and AI Ultra users get 10,000 to 25,000 depending on tier. The developer API is not yet available; Google said it is “coming in the next few weeks.”
Chart: Omni Flash access surfaces · entry price by tier. Source: Google AI subscriptions blog, May 19, 2026.
06 — Pricing
AI Ultra dropped to $200, and Flow credits are the real meter.
The Google AI subscriptions update published at I/O confirmed two pricing moves: AI Ultra dropped from $250/mo to $200/mo, and a new $100/mo entry tier was added for AI Ultra. Google wrote explicitly: “We're also reducing the monthly price of our top-tier AI Ultra plan from $250 to $200.” The $50 cut is a quiet admission that the original Ultra tier was a sticker-shock fence. It did not receive a keynote callout — it's buried in the subscription blog post.
For Omni access in Google Flow, the operational metric is Flow credits. Hand-wavy “10x usage limits” messaging is gone; actual credit allocations are now published. AI Plus gets 200 credits, AI Pro gets 1,000, AI Ultra (entry) gets 10,000, and AI Ultra (higher) gets 25,000. Google has not yet published how many credits a single Omni Flash generation costs, so clips-per-month math requires waiting for the official API pricing docs. Veo 3.1 access also runs through Flow credits for subscribers — AI Pro includes Veo 3.1 Lite trial access; AI Ultra unlocks Veo 3.1 full. Omni is additive to that structure, not a replacement.
For teams evaluating whether Omni fits their AI content workflow, the YouTube Shorts free tier is the lowest-friction entry point. For production volume, the AI Pro tier at $19.99/mo with 1,000 Flow credits is the most cost-visible starting point pending API pricing disclosure.
- AI Plus · $7.99/mo · 200 Flow credits: Entry-level paid Omni access. Gemini app + Google Flow. YouTube Shorts Remix and Create remain free regardless of subscription tier for users 18+.
- AI Pro · $19.99/mo · 1,000 Flow credits: Includes Veo 3.1 Lite trial in Flow; Omni access is additive. 5x the Flow credits of AI Plus. Most cost-visible tier for production testing pending API pricing docs.
- AI Ultra (entry) · $100/mo · 10,000 Flow credits: New entry tier for AI Ultra, added at I/O 2026. Full Veo 3.1 access plus Omni Flash. 10x the credits of AI Pro.
- AI Ultra (higher) · $200/mo · 25,000 Flow credits: Top tier, repriced from $250 to $200 at I/O 2026. Full Omni and Veo 3.1 access. The tier previously blocked by $250 sticker shock.
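The tier figures above are enough for a rough cost-per-credit comparison. The sketch below uses only the prices and credit allocations quoted in this guide. It treats the whole subscription price as paying for Flow credits, which overstates the per-credit cost since each plan bundles other benefits. The `credits_per_clip` parameter is a placeholder, because Google has not published what a single Omni Flash generation costs.

```python
# Monthly price (USD) and Flow credits per tier, as quoted in this guide.
TIERS = {
    "AI Plus": (7.99, 200),
    "AI Pro": (19.99, 1_000),
    "AI Ultra (entry)": (100.00, 10_000),
    "AI Ultra (higher)": (200.00, 25_000),
}


def usd_per_credit(price_usd: float, credits: int) -> float:
    """Effective subscription cost of one Flow credit."""
    return price_usd / credits


def clips_per_month(credits: int, credits_per_clip: int) -> int:
    """Whole clips a monthly allowance would cover.

    credits_per_clip is an assumption supplied by the caller;
    no official per-generation figure exists yet.
    """
    if credits_per_clip <= 0:
        raise ValueError("credits_per_clip must be positive")
    return credits // credits_per_clip


if __name__ == "__main__":
    for name, (price, credits) in TIERS.items():
        print(f"{name:<18} ${usd_per_credit(price, credits):.4f} per credit")
```

On these numbers the per-credit cost falls from roughly 4 cents on AI Plus to under 1 cent on the higher AI Ultra tier. Once official pricing lands, plugging the real per-clip figure into `clips_per_month` gives the clips-per-month math.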
07 — Competitive Landscape
Q2 2026 AI video generation: where Omni fits.
The AI video generation landscape after the Sora standalone shutdown had five credible players before today: Veo 3.1, Sora 2 (folded back into ChatGPT), Kling 2, Runway Gen-4, and Pika Labs 2.0. Gemini Omni Flash enters as a sixth, with a different input surface than any of them. The comparison below uses independently reported data; no published cross-model benchmark existed at launch.
The single most important distinction in the comparison below is input modality. Sora 2 (ChatGPT), Runway Gen-4, and Kling 2 accept text and image input at most. Veo 3.1 accepts text and video. Omni accepts text, image, audio, and video simultaneously in a single prompt. No other consumer-facing model on the market today accepts all four input modalities.
- Gemini Omni Flash · Any-input multimodal: Text + image + audio + video input, any combination. 10-second output cap (deployment choice). SynthID watermark, non-disableable. Voice editing withheld. Free on YouTube Shorts; paid in Gemini app. No API yet. Conversational multi-turn editing.
- Veo 3.1 · Video-first, developer API: Text and video input. Single-shot generation. Lives on Vertex AI and Google Flow. AI Pro includes Veo 3.1 Lite trial; AI Ultra unlocks full. Not a replacement for Omni: different surfaces, different use cases.
- Sora 2 · Text and image input, folded into ChatGPT: The standalone Sora app was discontinued in March 2026, six months after launch. Video generation now lives inside ChatGPT Plus. Max length 10-60 seconds depending on tier. No simultaneous multi-input across all four modalities. No SynthID.
- Kling 2 · Volume play, cost-efficient: Currently positioned as the cost-efficiency leader at approximately 40% below Runway Gen-4 per second of output. Text and image input. No four-modality simultaneous input. Strong for social-media volume; no SynthID or equivalent.
- Runway Gen-4 · Creative control, professional tier: Text and image input. Positioned at the professional creative end, with stronger manual controls, longer clip options, and higher pricing. No four-modality simultaneous input. C2PA provenance, not SynthID. No free YouTube integration.
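The input-modality comparison above can be expressed as a small set-membership check. This is an illustrative sketch built only from the inputs reported in this guide, not from official spec sheets. Pika Labs 2.0 is left out because its inputs are not stated here, and the helper names are hypothetical.

```python
ALL_MODALITIES = frozenset({"text", "image", "audio", "video"})

# Input modalities per model, as reported in the comparison above.
INPUTS = {
    "Gemini Omni Flash": {"text", "image", "audio", "video"},
    "Veo 3.1": {"text", "video"},
    "Sora 2": {"text", "image"},
    "Kling 2": {"text", "image"},
    "Runway Gen-4": {"text", "image"},
}


def accepts_all(required, inputs=INPUTS):
    """Names of models whose input set covers every required modality."""
    required = set(required)
    return sorted(name for name, mods in inputs.items() if required <= mods)


def missing_modalities(model, inputs=INPUTS):
    """Modalities a model does not accept, out of the four tracked here."""
    return sorted(ALL_MODALITIES - inputs[model])


if __name__ == "__main__":
    print("All four inputs:", accepts_all(ALL_MODALITIES))
    for model in INPUTS:
        print(f"{model:<18} missing: {missing_modalities(model)}")
```

A team with a specific asset mix, say a voiceover plus reference stills, can pass that set to `accepts_all` to shortlist candidates before looking at price or clip length.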
The structural advantage Google has that no competitor can match today is the YouTube distribution layer. Making Omni Flash free for YouTube Shorts Remix and YouTube Create removes the cost friction that kept Pika Labs as the speed-and-price leader for sub-10-second social clips. A creator who would have paid $8-15/mo for a Pika subscription to generate Shorts can now generate the same clips in YouTube Create at zero cost. That is a structural moat, not a feature.
The weakness relative to Sora 2 and Runway Gen-4 is clip length. Ten seconds is a hard cap at launch. For video formats beyond Shorts — YouTube standard, TikTok long-form, Instagram Reels at 90 seconds — Omni Flash is not a viable tool today. That gap is expected to close with Omni Pro, but Google has given no date.
08 — UX Differentiator
Conversational editing: the real differentiator vs Sora and Veo.
The clip generation itself is table stakes. The genuine UX differentiator in Omni Flash is the multi-turn conversational editing surface. Users can type natural-language instructions after receiving a clip — “remove the person in the background,” “make the lighting warmer,” “slow down the motion in the second half” — and Omni re-renders while maintaining character and scene consistency across edits.
Google's claimed architectural advantage is Gemini's long-context window. Veo 3.1 is described as operating on a single-shot regeneration model — each edit is a new generation from the prompt, without memory of prior outputs. Omni preserves the full generation history across tool calls and user messages in a session, which is what enables consistent re-rendering across turns. The consistency claim matches the pattern established in the Gemini 3.5 Flash architecture, where long-context retention is the throughline across the model family.
In practice, this positions Omni as closer to an iterative creative tool than a one-shot generation API. A workflow where you generate a base clip, refine it across multiple conversational turns, and then finalize for YouTube Shorts is exactly the use case the surface is designed for. That workflow is absent from Sora 2's ChatGPT integration, which does not preserve generation context across video edits.
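The difference between history-preserving editing and single-shot regeneration can be shown with a toy data structure. This is a conceptual sketch only: the developer API has not shipped, so `EditSession` and its methods are hypothetical and do not correspond to any Google interface. It models only which instructions are visible to each render, not the rendering itself.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Toy model of a multi-turn video edit session (hypothetical)."""

    base_prompt: str
    edits: list[str] = field(default_factory=list)

    def refine(self, instruction: str) -> None:
        """Record one natural-language edit instruction."""
        self.edits.append(instruction)

    def conversational_context(self) -> list[str]:
        # History-preserving: every earlier instruction conditions the next render.
        return [self.base_prompt, *self.edits]

    def single_shot_context(self) -> list[str]:
        # Single-shot: only the base prompt and the latest instruction are seen.
        return [self.base_prompt, *self.edits[-1:]]


if __name__ == "__main__":
    session = EditSession("a dog surfing at sunset")
    session.refine("make the lighting warmer")
    session.refine("slow down the motion in the second half")
    print(session.conversational_context())
    print(session.single_shot_context())
```

In the single-shot view the "warmer lighting" instruction has dropped out by the second edit, which is the kind of drift that a retained session history is meant to avoid.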
For agency teams evaluating Omni alongside Gemini Spark — Google's always-on personal agent also launched today — the conversational editing surface in Omni is built on the same multi-turn reasoning stack that Spark uses for persistent task execution. The two products are architecturally aligned; a future integration where Spark manages video-generation workflows through Omni seems intentional.
09 — Release Outlook
Developer API in weeks; Omni Pro has no date.
Google committed to two forward developments at I/O without giving specific dates. The developer API is “coming in the next few weeks” per the launch post. Omni Pro is announced but without a committed timeline — Google said Pro arrives “when we see a step change above Flash,” implying capability gating rather than calendar gating.
The developer API is the more consequential near-term event. Once Omni Flash leaves the consumer app and lands on Vertex AI or AI Studio, integrators get a SynthID-watermarked video primitive that cannot be generated anonymously. That changes the calculus for platforms that need AI-generated video with provenance tracking — media publishers, advertising verification networks, brand-safety tools. A video primitive that is provenance-stamped by design, not by post-processing, is architecturally different from any current alternative.
Omni Pro is expected to raise the 10-second clip cap, given Brichtova's framing that the cap is a deployment choice. Google has not confirmed that explicitly, nor given a timeline. For the full I/O 2026 announcement context, including Spark, the Search overhaul, and Nano Banana Pro, see our companion hub.
The “coming weeks” API is the bigger story than any single consumer demo. Once Omni Flash reaches Vertex AI / AI Studio, the SynthID-watermarked video primitive becomes a building block for integrators — and a provenance layer that cannot be stripped. The question is whether Google ships the API before the November 2026 midterms and what guardrails ship alongside it.
Omni Flash is a category bet — any-to-any multimodal, watermark mandatory, voice editing withheld.
Gemini Omni Flash is a category bet. Any combination of text, image, audio, and video in; ten-second video clip out; SynthID watermark on every output at the model level, no exception. The conversational multi-turn editing surface — where character and scene consistency persist across natural-language refinements — is the genuine UX differentiator versus Sora 2, Veo 3.1, Kling 2, and Runway Gen-4. The 10-second cap is a deployment choice; the withheld voice-editing features are a more significant structural decision, and Google has put that on record.
The held-back features are the more interesting story. Voice and speech editing, image-from-audio, audio-from-video, and unfettered real-person depiction are all on safety-review tracks with no committed dates. Google's own statement puts on record that the model can do more than what shipped, and third-party analysts read the holdback as an election-year hedge. The rest of the industry has not matched Google's avatar-onboarding handshake for real-person depiction. Whether Sora 2, Runway, or Kling follow with equivalent controls is the open question for the second half of 2026.
Watch for the “coming weeks” developer API. Once Omni Flash leaves the consumer app and lands on Vertex AI or AI Studio, the SynthID-watermarked video primitive reaches integrators who can build provenance-aware video workflows at scale. That's the broader story beyond any one consumer demo: a video generation layer that is imperceptibly but indelibly stamped by design, not as an afterthought.