Grok Imagine Video 1.5 moved from preview to general availability on June 16, 2026, and for marketing teams the timing matters as much as the model. xAI’s video generator now sits at the top of the Image-to-Video Arena by Elo score, runs on an autoregressive engine called Aurora that prizes temporal coherence over raw resolution, and produces audio in the same inference pass as the picture. It lands in a market the discontinued Sora left wide open.
Why now: the Sora consumer app was discontinued on April 26, 2026, and its API is on a deprecated track scheduled to sunset on September 24, 2026, with no announced successor. That exit created an immediate opening, and Grok Imagine Video 1.5 is the most aggressive bid to fill it — reportedly priced well below the competition, with a free-tier path that lets a brand test it before committing budget. The catch is that the headline price has not reconciled across sources, so this guide treats it with care.
What follows: what actually shipped versus what is still consumer-only or merely announced, why Aurora’s architecture explains both the model’s strengths and its limits, the two distinct leaderboards most coverage conflates, an honest reading of the pricing, the marketing use cases xAI is pitching, and the platform risks a brand should weigh before putting Grok Imagine into a production pipeline. Where a figure is vendor-stated or unverified, we say so plainly.
- 01. Grok Imagine Video 1.5 went generally available on June 16, 2026. The API model string is grok-imagine-video-1.5, live across the Imagine API, grok.com/imagine, and the iOS and Android apps. It now leads the Image-to-Video Arena by Elo, which is a crowd-sourced, xAI-stated metric and not an overall video-model ranking.
- 02. Aurora is autoregressive, not diffusion, and that explains everything. Aurora generates each frame in sequence, conditioning every new frame on the prior ones. That sequential design is the source of the model's temporal-coherence edge and the direct cause of its 720p ceiling. Higher resolution is on xAI's roadmap, not in the product today.
- 03. Audio is generated in the same inference pass as the video. Sound effects, ambience, dialogue, and lip-synced speech come out alongside the visuals at no extra charge. xAI frames this as a differentiator from Runway, Kling, and the discontinued Sora, which required separate audio generation or post-production.
- 04. The pricing is reported, not reconciled: verify before you budget. Two secondary sources disagree on the 720p rate, citing figures that imply either roughly $4.20 or $8.40 per minute. We present a labelled range rather than a single price, and point you to the official xAI pricing docs to confirm the current number.
- 05. Treat it as a capability map, not a 'best model' headline. Arena data shows top-tier strength for people, fashion, action, fantasy, and animation, and weaker results for nature, animals, photorealistic, and moving-camera shots. Lead with what it does well; caveat the rest.
01 — The Release: From preview to GA, live on every surface.
The launch put one API model into general availability: grok-imagine-video-1.5, reachable through the Imagine API, the web app at grok.com/imagine, and the iOS and Android apps. The model had previously been in preview. For developers and marketing engineers building a pipeline, the API model string is the thing that matters — it is the stable target you integrate against.
Alongside it, xAI shipped a separate consumer-tier variant called Video 1.5 Fast, tuned for latency rather than for programmatic use. It is not a distinct API model string; it lives on grok.com/imagine and in the apps, and it generates a six-second 720p clip in roughly 25 seconds, down from 40-plus seconds in the previous model. Keep the two straight: developers want the API model, while the Fast variant is the consumer UI built for quick iteration. This is the GA follow-up to the workflow we covered in our earlier guide to Grok Imagine 1.5 for brand ad production.
grok-imagine-video-1.5
The stable integration target. Asynchronous flow: POST to receive a request_id, then poll until status is done. Authenticated with a standard xAI API key via the xai_sdk client, whose generate() method handles polling for you.
Video 1.5 Fast
A speed-optimized UI tier, not a separate API model string. Produces a 6-second, 720p clip in about 25 seconds versus 40+ seconds before, which cuts the wait by roughly 40% (at least 1.6 times faster) for latency-sensitive consumer workflows.
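The two timings quoted for the Fast tier can be read either as a shorter wait or as a higher throughput, and the two readings give different percentages. The check below is a minimal sketch that treats "40-plus seconds" as exactly 40, which is our assumption; if the old figure was higher, the improvement is larger.

```python
# Timings quoted above for a six-second 720p clip.
# Assumption: the "40-plus seconds" figure is taken as exactly 40.
old_seconds = 40.0
new_seconds = 25.0

# Share of the wait that disappears.
time_saved = (old_seconds - new_seconds) / old_seconds

# How many times faster the new tier is.
speedup = old_seconds / new_seconds

print(f"wait time cut by {time_saved:.1%}")
print(f"speed multiplier {speedup:.2f}x")
```

The distinction matters when quoting the number internally: a wait that shrinks by a little under 40% is the same change as a tier that runs 1.6 times faster.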
02 — The Engine: Aurora is autoregressive — and that is the whole story.
The single most useful thing a marketer can understand about this model is its architecture, because it predicts both what the model is good at and where it falls down. Aurora is an autoregressive mixture-of-experts video engine. Where diffusion-based models — Sora, Runway, Kling — denoise every frame in parallel, Aurora generates frames sequentially, conditioning each new frame on all the frames before it. The plain-language version: it is closer to autocomplete for video than to repainting every frame at once.
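To make the "autocomplete for video" idea concrete, here is a toy sketch of sequential generation. It is not Aurora's implementation: a "frame" is reduced to a single number (a subject's horizontal position), and `generate_autoregressive` and `smooth_step` are hypothetical names invented for illustration. The only point it shows is the dependency structure, where frame t cannot be produced until frames 0 to t-1 exist.

```python
from typing import Callable, List

Frame = float  # stand-in for a real frame; here just a subject's x-position


def generate_autoregressive(
    first_frame: Frame,
    num_frames: int,
    step: Callable[[List[Frame]], Frame],
) -> List[Frame]:
    """Build a clip one frame at a time, feeding the whole history to `step`."""
    frames = [first_frame]
    while len(frames) < num_frames:
        # The next frame depends on every frame generated so far,
        # so the loop cannot be parallelised across frames.
        frames.append(step(frames))
    return frames


def smooth_step(history: List[Frame]) -> Frame:
    """Hypothetical update rule: continue the average motion seen so far."""
    if len(history) < 2:
        return history[-1] + 1.0
    average_velocity = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + average_velocity


clip = generate_autoregressive(first_frame=0.0, num_frames=5, step=smooth_step)
print(clip)
```

Because every step reads the full history, motion stays consistent from frame to frame, and the cost of each extra frame or extra pixel is paid along the entire chain.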
Why it leads on coherence
Because each frame is built on the actual frames preceding it, the model holds a subject in place, keeps camera movement stable, and carries lighting transitions through a clip without the flicker and drift that parallel approaches can show. That is the source of the temporal coherence xAI leans on in its marketing, and it is genuinely useful for the kind of short, controlled motion brand creative needs.
Why it caps at 720p
The same sequential design is why the model tops out at 720p. Pushing to 1080p would mean processing 2.25 times as many pixels per frame (1920×1080 against 1280×720), and roughly that many more tokens, all the way down a sequential chain. That is a cost Aurora cannot absorb as easily as a parallel diffusion model can. xAI has said a higher-resolution “Pro Mode” is on the roadmap but has not committed to a date, so treat 1080p as announced intent, not a shipping capability. For social-first and concept-testing work, the 720p tradeoff is often acceptable; for hero product close-ups at full resolution, it is a real limit.
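The 2.25 figure is simple pixel arithmetic, and it is worth seeing next to the number of frames in a maximum-length clip. The sketch below assumes 16:9 frames and assumes tokens per frame scale linearly with pixel count; xAI has not published Aurora's tokenisation, so treat the second assumption as ours.

```python
# Pixel counts for the two resolutions, assuming 16:9 frames.
PIXELS_720P = 1280 * 720
PIXELS_1080P = 1920 * 1080

# Assumption: token count per frame scales linearly with pixel count.
ratio = PIXELS_1080P / PIXELS_720P

# Frames in the longest single clip: 15 seconds at the fixed 24 fps.
frames_in_max_clip = 15 * 24

print(ratio)               # 2.25
print(frames_in_max_clip)  # 360
```

Under those assumptions, a 15-second clip at 1080p would carry 2.25 times the per-frame load across all 360 sequential steps.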
"These are our best image-to-video models yet: better motion, better physics, better audio, at the fastest speeds."— xAI, Grok Imagine Video 1.5 launch announcement (June 16, 2026)
03 — The Leaderboards: Two arenas, two very different stories.
Nearly all the launch coverage compresses Grok Imagine Video 1.5 into a single “#1 AI video model” line. That is not accurate, and the distinction is the most useful thing this post can give a marketer. There are two separate evaluations, and the model sits in a very different place on each.
On the Image-to-Video Arena, xAI states the model holds the number-one Elo position, ahead of Sora 2, Veo 3.1, Seedance, and Kling on this metric. It also reports a +52 Elo gain over Video 1.0, one of the larger single-version jumps on that board. That is a crowd-sourced user-preference score, xAI-stated and not independently audited. On the Text-to-Video Arena, independent OpenRouter data puts the model at #6 overall, with 1,233 Elo across 6,500-plus matches. Same model, two arenas, two stories. For the competing model it outranks on image-to-video, see Seedance 2.5 from ByteDance.
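An Elo gap is easier to judge once it is translated into a head-to-head preference rate. The sketch below uses the conventional Elo expected-score formula on a 400-point scale. That scale is our assumption: arena leaderboards often fit ratings with related methods, so read the result as an approximation and not as the arena's own calculation.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected win share for the higher-rated side under the standard Elo curve."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# Assumption: the arena uses the conventional 400-point logistic scale.
gain_over_v1 = 52
print(f"{elo_expected_score(gain_over_v1):.3f}")
```

On that reading, a 52-point gain means voters would pick the newer model in about 57% of direct comparisons with Video 1.0: a clear improvement, not a rout.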
The category breakdown beneath the overall number is where the practical guidance lives. Lead with the strengths; caveat the rest.
| Category | Elo (OpenRouter) | Arena tier | Practical marketing implication |
|---|---|---|---|
| Lead with these (Top 1%) | | | |
| Cartoon & Anime | 1,339 | Top 1% | Animated explainers and stylized brand films |
| Fantasy | 1,329 | Top 1% | Concept and world-building creative |
| Action | 1,316 | Top 1% | Dynamic, motion-heavy ad spots |
| Fashion | 1,280 | Top 1% | Apparel and lifestyle reels |
| People | 1,276 | Top 1% | Talent-led, presenter-style content |
| Caveat these (Top 75% or below) | | | |
| Moving Camera | — | Top 75% or below | Keep camera moves simple; complex moves drift |
| Nature | — | Top 75% or below | Weak for landscape B-roll |
| Animals | — | Top 75% or below | Wildlife motion can read as off |
| Photorealistic | — | Top 75% or below | Risky for photoreal product close-ups |
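A team that wants to apply the table at brief-intake time could encode it as a small lookup. The sketch below is one hypothetical way to do that; the `triage` function and its "lead", "caveat", and "unknown" labels are our own naming, and the rule that any weak category outweighs a strong one is a cautious assumption, not guidance from xAI or the arena.

```python
# Tiers copied from the table above; categories not listed are "unknown".
LEAD = {"cartoon & anime", "fantasy", "action", "fashion", "people"}
CAVEAT = {"moving camera", "nature", "animals", "photorealistic"}


def triage(categories: list[str]) -> str:
    """Return 'caveat' if any tag is a weak category, else 'lead' or 'unknown'."""
    tags = {c.strip().lower() for c in categories}
    if tags & CAVEAT:
        return "caveat"
    if tags & LEAD:
        return "lead"
    return "unknown"


print(triage(["Fashion", "People"]))         # lead
print(triage(["Fashion", "Moving Camera"]))  # caveat
print(triage(["Food"]))                      # unknown
```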
04 — The Economics: The pricing is the story — and it has not reconciled.
Here is where most coverage repeats a number without checking the arithmetic, and where this post can be genuinely useful. The commercial pitch is that Grok Imagine Video 1.5 is dramatically cheaper than the alternatives. The problem is that the reported per-second and per-minute figures do not multiply out to each other. One widely cited report lists a 720p rate that, taken at the stated per-minute figure of about $4.20, implies roughly $0.07 per second — while a separate pre-GA pricing breakdown showed the preview tier at $0.14 per second, which works out to about $8.40 per minute. Those are not the same price.
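The mismatch is easy to reproduce, and the same arithmetic turns it into a budget range. The sketch below converts both reported per-second rates to per-minute prices and then costs a hypothetical batch of 100 six-second clips. The batch size is an invented example, and neither rate is confirmed, so the output is a bracket to plan around and not a quote.

```python
SECONDS_PER_MINUTE = 60

# The two reported 720p rates in USD per second. Neither is confirmed here.
REPORTED_RATES = {"lower report": 0.07, "pre-GA breakdown": 0.14}


def per_minute(rate_per_second: float) -> float:
    """Convert a per-second rate to a per-minute price, rounded to cents."""
    return round(rate_per_second * SECONDS_PER_MINUTE, 2)


def batch_cost_range(clips: int, seconds_per_clip: int) -> tuple[float, float]:
    """Low and high cost for a batch, using both reported rates."""
    total_seconds = clips * seconds_per_clip
    costs = [round(rate * total_seconds, 2) for rate in REPORTED_RATES.values()]
    return min(costs), max(costs)


for label, rate in REPORTED_RATES.items():
    print(label, per_minute(rate))

# Hypothetical batch: 100 clips of 6 seconds each.
print(batch_cost_range(clips=100, seconds_per_clip=6))
```

The high end of the bracket is exactly double the low end, which is why a forecast built on the lower figure alone can miss by a factor of two.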
The most likely explanations are that xAI halved pricing from preview to GA, or that the secondary report mixed a base-model rate into the 1.5 line. We cannot resolve that from here, so we do not print a single precise per-minute price as fact. The responsible move — and the one that builds trust with a budget owner — is to present the range, attribute it, and direct you to confirm the current number on the official xAI pricing docs at docs.x.ai before committing spend. The chart below uses the reported range against independently confirmed competitor rates.
720p video, cost per minute · Grok's reported range vs confirmed competitors
Sources: TechTimes (Jun 18, 2026), eesel.ai (Jun 5, 2026). Grok figures reported, not reconciled; verify at docs.x.ai.

| Model | Reported price / sec | Per minute (derived) | Native audio in-pass | Vs Sora 2 Pro ($30/min) |
|---|---|---|---|---|
| Grok Imagine Video 1.5 | $0.07–0.14 (reported) | $4.20–8.40 | Yes (same pass) | 72–86% lower (xAI claims 86%) |
| Veo 3.1 Fast | $0.15 | $9.00 | Not stated | 70% lower |
| Veo 3.1 Quality | $0.40 | $24.00 | Not stated | 20% lower |
| Sora 2 Pro (deprecated) | $0.50 | $30.00 | No (separate) | — baseline |
Read the right-hand column carefully. xAI claims the model is roughly 86% cheaper than Sora 2 Pro, but that figure only holds if the GA price is around $0.07 per second. At the alternative reported $0.14 per second, the saving is closer to 72%. Either way the model undercuts Veo 3.1 and the exiting Sora, which is why the commercial interest is real, although at $0.14 per second the gap to Veo 3.1 Fast ($0.15) is a single cent. The point is to act on the direction of the saving, not to hard-code the headline percentage into a forecast. For the wider field of alternatives, our read on the AI video generator landscape after Sora’s exit maps where each option fits.
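The percentages in the right-hand column can be rechecked in a few lines. The sketch below recomputes each saving against the $30-per-minute Sora 2 Pro baseline, using the per-minute prices from the table; the function name `saving_vs_baseline` is ours.

```python
SORA_2_PRO_PER_MINUTE = 30.00  # baseline from the table above


def saving_vs_baseline(price_per_minute: float,
                       baseline: float = SORA_2_PRO_PER_MINUTE) -> float:
    """Fractional saving relative to the baseline per-minute price."""
    return 1.0 - price_per_minute / baseline


per_minute_prices = {
    "Grok (low report)": 4.20,
    "Grok (high report)": 8.40,
    "Veo 3.1 Fast": 9.00,
    "Veo 3.1 Quality": 24.00,
}

for name, price in per_minute_prices.items():
    print(f"{name}: {saving_vs_baseline(price):.0%} lower")
```

Swapping in the current rate from the official pricing docs is a one-line change, which is a safer habit than carrying the 86% headline into a forecast.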
05 — The Spec: What it actually outputs — and how you drive it.
The output envelope is tight and worth memorizing. Clips run 1 to 15 seconds, configurable; resolution is 480p or 720p; frame rate is fixed at 24fps; output is H.264 MP4. Supported aspect ratios cover the full social spread — 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, and 2:3 — and for image-to-video the output defaults to the input image’s aspect ratio unless you override it. The 24fps lock matches cinematic convention but falls short of the 60fps used in gaming and sports content, which is a real constraint for some formats.
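Because the envelope is this tight, it is cheap to check a brief against it before any request is sent. The sketch below is a hypothetical local validator: `ClipSpec` and its field names are invented for illustration and are not the xAI request schema, so map them to the real parameters from the official docs before using anything like this.

```python
from dataclasses import dataclass

ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"}
MIN_SECONDS, MAX_SECONDS = 1, 15


@dataclass(frozen=True)
class ClipSpec:
    """Hypothetical local description of a clip; not an xAI request schema."""
    duration_seconds: int
    resolution: str = "720p"
    aspect_ratio: str = "16:9"

    def problems(self) -> list[str]:
        """Return human-readable reasons this spec falls outside the limits."""
        found = []
        if not MIN_SECONDS <= self.duration_seconds <= MAX_SECONDS:
            found.append(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
        if self.resolution not in ALLOWED_RESOLUTIONS:
            found.append(f"resolution {self.resolution!r} is not offered")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            found.append(f"aspect ratio {self.aspect_ratio!r} is not offered")
        return found


print(ClipSpec(duration_seconds=6, aspect_ratio="9:16").problems())
print(ClipSpec(duration_seconds=20, resolution="1080p").problems())
```

Returning a list of problems, instead of raising on the first one, lets a creative team see every reason a storyboard shot will not fit in a single pass.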
The primary workflow is image-to-video: a still image becomes the first frame, a prompt describes the motion, and the model animates forward while preserving the source composition, subject identity, and lighting. Beyond that, the Imagine API exposes five distinct video workflows — generation, image-to-video, video editing (capped at 8.7 seconds for edits), reference-to-video, and video extension. Audio is produced in the same inference pass, so dialogue, ambience, and effects land with the picture rather than in a separate step.
Duration: per generation, configurable
Clips run 1 to 15 seconds. Longer sequences require chaining via the video-extension feature, and community testing reports visible quality degradation after two to three chained extensions, so plan around the single-clip ceiling.
Resolution and frame rate: 480p or 720p, 24fps
The ceiling is 720p, a direct consequence of Aurora's sequential design. Frame rate is fixed at 24fps. A higher-resolution Pro Mode is on xAI's roadmap but has no committed release date.
Audio: generated with the video
Sound effects, background ambience, dialogue, and lip-synced speech are produced in the same inference pass at no additional charge — xAI's stated differentiator from Runway, Kling, and the discontinued Sora.
One operational detail matters for pipeline design: the API is asynchronous. You POST a generation request, receive a request_id, and poll until the status reads done. Generation can take up to several minutes, and the returned video URLs are temporary, so a production pipeline needs to fetch and store the file promptly. The accepted input image formats are broad (JPG, JPEG, PNG, WEBP, GIF, AVIF), and the rate limit is 60 requests per minute.
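For teams that poll by hand instead of relying on the SDK, the loop looks roughly like the sketch below. It deliberately makes no network calls: `fetch_status` is a hypothetical callable you would back with the real status request, and the "pending" status value and "url" key are placeholders we invented. Only the request_id and the "done" status come from the description above.

```python
import time
from typing import Callable, Dict


def wait_until_done(
    request_id: str,
    fetch_status: Callable[[str], Dict[str, str]],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll `fetch_status` until it reports "done", or give up after a timeout."""
    waited = 0.0
    while True:
        result = fetch_status(request_id)
        if result.get("status") == "done":
            return result
        if waited >= timeout_seconds:
            raise TimeoutError(f"{request_id} not done after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for the real status call: reports "pending" twice, then "done".
_replies = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "done", "url": "https://example.invalid/clip.mp4"},
])


def fake_fetch_status(request_id: str) -> Dict[str, str]:
    return next(_replies)


final = wait_until_done("req-123", fake_fetch_status, sleep=lambda s: None)
print(final["url"])
```

A five-second interval keeps a single job far below the 60-requests-per-minute limit, and the explicit timeout stops a stuck job from blocking the pipeline. Once the loop returns, the file should be downloaded and stored straight away, since the URL is temporary.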
06 — For Marketers: Where it earns its place in a content pipeline.
xAI markets the video model around concrete commerce and advertising jobs, and the Imagine API page names them directly. The four below are the ones most relevant to a brand creative pipeline. None of them replaces a full production for hero brand films, since the 720p ceiling and 15-second limit rule that out, but each maps cleanly to the high-volume, social-first content most teams now ship continuously. For the adjacent product-ad pattern, see our piece on AI video product swap and local redraw ads.
Photo to motion
Turn a single product photo into a cinematic short with smooth camera motion. The image-to-video workflow preserves the product's composition and lighting while animating forward — well suited to social product spots.
Person plus garment
Upload a person photo and a clothing item and generate a clip of the garment worn. A fit for apparel and lifestyle reels — one of the model's Top 1% arena categories (Fashion, People).
Restage the product
Place a product into a new environment without a reshoot — useful for testing settings and contexts before a full production commits budget to a location.
Cinematic to watercolor
Apply cinematic, anime, retro, or watercolor styles to existing footage. Strong for the stylized and animated categories where the model leads, weaker for photorealistic output.
The procurement advantage worth naming is the free tier. xAI lists video generation up to 15 seconds at 720p as available on the free consumer tier at grok.com — no X Premium subscription required, with limited quotas — while SuperGrok ($30/month) raises the limits. That matters because Sora required a paid subscription before a creator could even evaluate it. A brand can test Grok Imagine against its own creative, on its own products, before spending anything — a real advantage in a market where AI video vendors are increasingly asking for annual commitments. Standing up that kind of test-and-measure workflow inside an ongoing content operation is the work our social media content production and ecommerce growth engagements are built around.
07 — The Risks: What a brand should weigh before committing.
Most marketing-facing coverage skips the risk picture entirely, and a brand evaluating a vendor cannot afford to. Two categories matter: the hard capability limits, which are knowable today, and the platform and reputational exposure, which is more uncertain but material. Sort the decision by where you sit rather than treating the model as a single yes-or-no.
Lean in
For people, fashion, action, fantasy, and animated content at social resolutions, this is where the model leads and the price advantage is real. Test on your own assets via the free tier first, then scale on the API.
Hold for now
The 720p ceiling and weaker photorealistic-category results make this a poor fit for full-resolution flagship product detail. Wait for the announced Pro Mode, or keep these shots on a higher-resolution pipeline.
Mind the 15-second wall
Clips cap at 15 seconds and chained video extensions degrade visibly after two or three links. Storyboard around short, self-contained shots rather than expecting a single continuous long take.
Run a governance check
Weigh the platform's content-moderation track record and ongoing regulatory exposure against your own brand-safety standards before associating spend or assets with the channel.
08 — Conclusion: A strong, specific tool — read the asterisks.
Grok Imagine Video 1.5 is a real option for the right jobs — once you read past the headline.
The general availability of Grok Imagine Video 1.5 is a genuine moment for marketing teams. Aurora’s autoregressive engine delivers the temporal coherence short brand creative needs, native audio removes a production step, and the reported pricing undercuts both Veo 3.1 and the exiting Sora. With a free tier to trial it on, the barrier to testing is close to zero.
The discipline is in the asterisks. The number-one position is on the image-to-video leaderboard only; on text-to-video the model ranks #6. The 720p ceiling follows from the architecture, and the announced Pro Mode has no release date; the 15-second limit is a hard cap today. The celebrated price has not reconciled across sources, so the saving is a range to verify rather than a percentage to quote. And the platform carries a real moderation and legal history that a brand should weigh.
The right posture is neither hype nor dismissal. Match the model to the jobs it leads — people, fashion, action, fantasy, and animated social content — trial it on your own creative before committing, confirm the current price on the official docs, and keep hero, full-resolution work on a pipeline built for it. Used that way, Grok Imagine Video 1.5 earns a place in a 2026 content operation. Treated as a one-line “best AI video model,” it sets a team up to be surprised.