Google's conversational video model — generate a clip, then refine it by chatting instead of re-prompting.
Last verified · 2026-07-02 · by Moe Ameen
Gemini Omni Flash is a video generation and editing model from Google, launched in public preview via the Gemini API on June 30, 2026 (model ID gemini-omni-flash-preview). It is the fast, cost-efficient tier of the new Gemini Omni family and shipped the same day as Nano Banana 2 Lite, Google's fast image model. Where a plain text-to-video tool takes one prompt and hands you a finished clip, Omni Flash is built around a conversation: you generate a clip, then keep talking to it — "make it night," "move the camera left," "swap the jacket to red" — and each turn builds on the last result.
The model is multimodal on the way in. It accepts text, images, and video as references and produces video as output, drawing on Gemini's broader reasoning to keep the physics and continuity of a scene plausible rather than just stitching frames that look right. The headline capability is stateful editing: Google describes it as remembering the video context across turns and applying your change while preserving the elements you did not mention, so you refine a shot by describing edits instead of re-rolling the whole generation.
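To make "preserving the elements you did not mention" concrete, here is a toy sketch of the turn-by-turn behavior described above. It says nothing about how the model works internally, and the `EditSession` class and its attribute names are hypothetical, invented only for illustration. Each edit overrides the attributes it names and carries everything else forward.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Toy model of a chat-to-edit session (hypothetical, not a Google API)."""

    scene: dict[str, str]
    history: list[dict[str, str]] = field(default_factory=list)

    def edit(self, **changes: str) -> dict[str, str]:
        # Keep a snapshot of the previous turn so earlier results stay inspectable.
        self.history.append(dict(self.scene))
        # Only the named attributes change; unmentioned ones carry over.
        self.scene = {**self.scene, **changes}
        return self.scene


session = EditSession({"time_of_day": "day", "camera": "static", "jacket": "blue"})
session.edit(time_of_day="night")   # "make it night"
session.edit(jacket="red")          # "swap the jacket to red"
print(session.scene)
```

After both turns the camera setting is unchanged, because no turn mentioned it.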
At launch there are real limits worth planning around. Clips are capped at 10 seconds, with longer durations described as coming soon. Output is 16:9 (the default) or 9:16 vertical. Every clip carries Google's invisible SynthID watermark for AI provenance. A handful of features are unsupported in the preview: audio references, multi-video referencing, and system-instruction and temperature controls. Google also notes that character consistency can wobble when you change scenes. Video editing of uploaded content is restricted in the EEA, Switzerland, and the UK. Treat any specific limit as a preview-era snapshot, since Google is changing this tier quickly.
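If you are planning shots in bulk, the launch limits above can be checked before you spend a generation. The sketch below is a local pre-flight check only and does not call any Google API. The `ClipRequest` fields and the `preview_limit_problems` function are hypothetical names, and reading "multi-video referencing" as "more than one video reference" is an assumption.

```python
from dataclasses import dataclass, field

# Preview-era limits as listed in the text; expect these to change.
MAX_CLIP_SECONDS = 10
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}


@dataclass
class ClipRequest:
    """Hypothetical description of a planned generation (not an API type)."""

    prompt: str
    seconds: int = 10
    aspect_ratio: str = "16:9"
    image_refs: list[str] = field(default_factory=list)
    video_refs: list[str] = field(default_factory=list)
    audio_refs: list[str] = field(default_factory=list)


def preview_limit_problems(req: ClipRequest) -> list[str]:
    """Return a human-readable list of launch-limit violations (empty if none)."""
    problems = []
    if not 0 < req.seconds <= MAX_CLIP_SECONDS:
        problems.append(f"duration must be 1-{MAX_CLIP_SECONDS} seconds")
    if req.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append("aspect ratio must be 16:9 or 9:16")
    if req.audio_refs:
        problems.append("audio references are unsupported in the preview")
    if len(req.video_refs) > 1:  # assumption: "multi-video" means more than one
        problems.append("only one video reference is supported in the preview")
    return problems


print(preview_limit_problems(ClipRequest(prompt="city at dusk", seconds=12)))
```

Returning a list rather than raising on the first problem lets you report every issue with a planned shot at once.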
Pricing is usage-based at $0.10 per second of video output, the same rate Google charges for Veo 3.1 Fast, so a full 10-second clip costs $1.00 in raw generation (10 × $0.10). Omni Flash is reachable through the Gemini API, Google AI Studio, the Gemini app, and Google Flow.
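Because refining a shot by chat usually means several renders, it helps to budget per session rather than per clip. The sketch below uses the $0.10 per second rate and the 10-second cap from the text. It assumes every edit turn that returns a new clip is billed for its full output length, which is not confirmed here, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Figures from the text above; both are preview-era values.
PRICE_PER_SECOND_USD = Decimal("0.10")
MAX_CLIP_SECONDS = 10


def estimate_cost(seconds: int, renders: int = 1) -> Decimal:
    """Raw generation cost in USD for `renders` clips of `seconds` each.

    Assumption: each render (the first generation and every edit turn)
    is billed for its full output duration.
    """
    if not 1 <= seconds <= MAX_CLIP_SECONDS:
        raise ValueError(f"seconds must be between 1 and {MAX_CLIP_SECONDS}")
    if renders < 1:
        raise ValueError("renders must be at least 1")
    return PRICE_PER_SECOND_USD * seconds * renders


print(estimate_cost(10))             # one full-length clip
print(estimate_cost(10, renders=5))  # one generation plus four edit turns
```

`Decimal` is used instead of floats so that currency totals do not pick up binary rounding error.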
Omni Flash's chat-to-edit loop is genuinely the fastest way to nail one 10-second shot. But a 10-second clip is not a post, and it is definitely not a content week. Kompozy is the layer that turns that clip into finished, scheduled content and then multiplies it. Drop an Omni Flash export into Kompozy and it burns in branded, on-style captions, reframes the clip cleanly for each destination's aspect ratio, and lets you stack hook text or lower-thirds through HyperFrames so the silent-autoplay first second actually reads. Then Kompozy schedules and fans that clip to TikTok, Reels, YouTube Shorts, X, LinkedIn, and the rest of its nine connected platforms from one queue — instead of you exporting and re-uploading into six apps by hand.
The bigger unlock is fan-out. A single Omni Flash clip can seed a whole content unit inside Kompozy: the video for short-form feeds, plus a quote graphic, a set of native text posts, and a thread written in your own voice through your Persona Brief — one 10-second render becomes a week of cross-platform posts. And where Omni Flash caps out (10 seconds, no talking head, no long-form), Kompozy generates the formats it can't: Persona Shorts and HeyGen avatar video, Clipped Shorts from long-form, carousels, blogs, and newsletters. Omni Flash owns the fast, conversational shot; Kompozy owns the captions, the format fan-out, the brand voice, the schedule, and the publish.
Gemini Omni Flash is a video generation and editing model from Google, launched in public preview via the Gemini API on June 30, 2026. Its defining feature is conversational editing — you generate a clip, then keep chatting to refine it, and each turn builds on the previous result while preserving what you did not change.
Clips are capped at 10 seconds in the launch preview, with longer durations described by Google as coming soon. Output is available in 16:9 (default) or 9:16 vertical.
It is priced at $0.10 per second of video output — the same rate as Veo 3.1 Fast — so a full 10-second clip costs roughly a dollar in raw generation. It is available via the Gemini API, Google AI Studio, the Gemini app, and Google Flow.
Omni Flash handles short clips of up to 10 seconds generated from text, an image, or a reference video; iterative edits to a generated clip via conversation; image-to-video motion; and output in 9:16 for vertical social feeds or 16:9 for landscape. Preview limits include no audio references, no multi-video referencing, and occasional character-consistency drift across scene changes.
Omni Flash generates the clip but does not publish it. Bring the export into Kompozy to add branded captions, reframe it per platform, and schedule and publish it across TikTok, Reels, YouTube Shorts, X, LinkedIn, and more — then fan the same clip out into a quote card, text posts, and a thread in your voice.