Gemini Omni: What Creators Need to Know
Google merged text, image, and video generation into a single model called Gemini Omni. We break down the capabilities and the tradeoffs.
Google I/O 2026 (May 19-20) brought the biggest structural shift in AI video generation since Kling 3.0 launched in February. The headline: Gemini Omni, a unified multimodal model that folds text generation, image generation, and video generation into a single system.
This is not a rebrand of Veo. Gemini Omni replaces Veo entirely, absorbing its video capabilities directly into the Gemini model family. The leaked codename "Toucan" (first spotted on May 2 by an X user who found UI strings inside the Gemini interface, and later confirmed by TestingCatalog) turned out to be the video generation pathway within the broader Omni architecture. A public demo on May 11 drew strong reactions from both the AI research community and the creator ecosystem.
What Gemini Omni Actually Does
Three capabilities stand out.
Conversational Video Editing
Instead of regenerating an entire clip for a minor change, Omni lets creators describe edits in plain language: "swap the red car for a black one," "make the lighting warmer," "add rain to the background." The model applies these as incremental modifications, preserving everything the creator did not ask to change. Most current video models treat each generation as a fresh start: if you want a variation, you regenerate and hope the seed cooperates. Omni's approach mirrors how human editors actually work, through iterative refinement rather than repeated coin flips.
Long-Context Character Consistency
Gemini Omni maintains character appearance, clothing, and proportions across multiple shots using the same long-context architecture that powers Gemini's text capabilities. Feed it a character description once, and it remembers that character across an entire sequence of generations. This directly addresses "character drift," the tendency for the same person to look subtly different in every shot, which has plagued AI video production since its earliest days.