Marketing · New Release · 11 min read · Published June 26, 2026

xAI’s video model goes GA · 720p ceiling · top of the image-to-video arena

Grok Imagine Video 1.5: AI Video for Marketers in 2026

xAI moved Grok Imagine Video 1.5 from preview to general availability on June 16, 2026. It runs on Aurora, an autoregressive engine that buys temporal coherence at the cost of a 720p ceiling, generates audio in the same pass as the picture, and lands in the gap the discontinued Sora left open. This guide separates what is verified from what is vendor-stated — and flags a pricing story that has not yet reconciled.

Digital Applied Team
Senior strategists · Published June 26, 2026
Sources: xAI, TechTimes, OpenRouter +4
Went generally available: Jun 16 · preview → GA, 2026 · API live
Resolution ceiling: 720p · Aurora's sequential cap · no 1080p yet
Image-to-Video Arena: #1 · xAI-stated Elo · #6 text-to-video
Max clip length: 15s · 1–15s · fixed 24fps

Grok Imagine Video 1.5 moved from preview to general availability on June 16, 2026, and for marketing teams the timing matters as much as the model. xAI’s video generator now sits at the top of the Image-to-Video Arena by Elo score, runs on an autoregressive engine called Aurora that prizes temporal coherence over raw resolution, and produces audio in the same inference pass as the picture. It lands in a market the discontinued Sora left wide open.

Why now: the Sora consumer app was discontinued on April 26, 2026, and its API is on a deprecated track scheduled to sunset on September 24, 2026, with no announced successor. That exit created an immediate opening, and Grok Imagine Video 1.5 is the most aggressive bid to fill it — reportedly priced well below the competition, with a free-tier path that lets a brand test it before committing budget. The catch is that the headline price has not reconciled across sources, so this guide treats it with care.

What follows: what actually shipped versus what is still consumer-only or merely announced, why Aurora’s architecture explains both the model’s strengths and its limits, the two distinct leaderboards most coverage conflates, an honest reading of the pricing, the marketing use cases xAI is pitching, and the platform risks a brand should weigh before putting Grok Imagine into a production pipeline. Where a figure is vendor-stated or unverified, we say so plainly.

Key takeaways
  1. Grok Imagine Video 1.5 went generally available on June 16, 2026.
    The API model string is grok-imagine-video-1.5, live across the Imagine API, grok.com/imagine, and the iOS and Android apps. It now leads the Image-to-Video Arena by Elo, a crowd-sourced, xAI-stated metric, not an overall video-model ranking.
  2. Aurora is autoregressive, not diffusion, and that explains everything.
    Aurora generates each frame in sequence, conditioning every new frame on the prior ones. That sequential design is the source of the model's temporal-coherence edge and the direct cause of its 720p ceiling. Higher resolution is on xAI's roadmap, not in the product today.
  3. Audio is generated in the same inference pass as the video.
    Sound effects, ambience, dialogue, and lip-synced speech come out alongside the visuals at no extra charge. xAI frames this as a differentiator from Runway, Kling, and the discontinued Sora, which required separate audio generation or post-production.
  4. The pricing is reported, not reconciled: verify before you budget.
    Two secondary sources disagree on the 720p rate, citing figures that imply either roughly $4.20 or $8.40 per minute. We present a labelled range rather than a single price, and point you to the official xAI pricing docs to confirm the current number.
  5. Treat it as a capability map, not a 'best model' headline.
    Arena data shows top-tier strength for people, fashion, action, fantasy, and animation, and weaker results for nature, animals, photorealistic, and moving-camera shots. Lead with what it does well; caveat the rest.

01 · The Release · From preview to GA, live on every surface.

The launch put one API model into general availability: grok-imagine-video-1.5, reachable through the Imagine API, the web app at grok.com/imagine, and the iOS and Android apps. The model had previously been in preview. For developers and marketing engineers building a pipeline, the API model string is the thing that matters — it is the stable target you integrate against.

Alongside it, xAI shipped a separate consumer-tier variant called Video 1.5 Fast, tuned for latency rather than for programmatic use. It is not a distinct API model string; it lives on grok.com/imagine and in the apps, and it generates a six-second 720p clip in roughly 25 seconds, down from 40-plus seconds in the previous model. Keep the two straight: developers want the API model, while the Fast variant is the consumer UI built for quick iteration. This is the GA follow-up to the workflow we covered in our earlier guide to Grok Imagine 1.5 for brand ad production.
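The Fast variant's speed claim is easy to misquote, so here is the arithmetic as a small Python sketch. It uses only the reported timings for a six-second 720p clip (about 25 seconds now versus 40-plus before); the function names are ours. Because 40 seconds is a floor, the real saving is at least what this computes.

```python
def time_saved_fraction(old_seconds: float, new_seconds: float) -> float:
    """Share of wall-clock wait removed per clip (0.375 means 37.5% less waiting)."""
    return 1 - new_seconds / old_seconds


def throughput_multiplier(old_seconds: float, new_seconds: float) -> float:
    """How many clips now fit in the time one old clip took (1.6 means 1.6x)."""
    return old_seconds / new_seconds


# Reported timings for a 6-second 720p clip: "40-plus" seconds before, ~25 seconds on Video 1.5 Fast.
OLD_SECONDS = 40.0  # a lower bound, so the savings below are minimums
NEW_SECONDS = 25.0

if __name__ == "__main__":
    print(f"Time saved per clip: {time_saved_fraction(OLD_SECONDS, NEW_SECONDS):.1%}")  # 37.5%
    print(f"Throughput: {throughput_multiplier(OLD_SECONDS, NEW_SECONDS):.1f}x")        # 1.6x
    print(f"Clips per hour: {3600 / OLD_SECONDS:.0f} -> {3600 / NEW_SECONDS:.0f}")      # 90 -> 144
```

Read "roughly 40%" as about 37.5% less waiting per clip, which is the same as about 1.6 times as many clips per hour.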

API model: grok-imagine-video-1.5 · Imagine API · 60 requests/minute · what developers build against

The stable integration target. Asynchronous flow: POST to receive a request_id, then poll until status is done. Authenticated with a standard xAI API key via the xai_sdk client, whose generate() method handles polling for you.

Consumer variant: Video 1.5 Fast · grok.com/imagine + iOS/Android · consumer-only

A speed-optimized UI tier, not a separate API model string. Produces a 6-second, 720p clip in about 25 seconds versus 40+ seconds before, roughly 40% less waiting per clip for latency-sensitive consumer workflows.
What 'GA' actually covers here
General availability applies to the video model on the Imagine API and the consumer surfaces. xAI has also been shipping workspace features in the days after the June 16 launch — Projects to group related generations, parallel multi-agent generation, and library search — that reposition Grok Imagine from a single-prompt tool into a persistent creative workspace. The model is live and buildable today; the workspace layer is still filling in around it.

02 · The Engine · Aurora is autoregressive, and that is the whole story.

The single most useful thing a marketer can understand about this model is its architecture, because it predicts both what the model is good at and where it falls down. Aurora is an autoregressive mixture-of-experts video engine. Where diffusion-based models — Sora, Runway, Kling — denoise every frame in parallel, Aurora generates frames sequentially, conditioning each new frame on all the frames before it. The plain-language version: it is closer to autocomplete for video than to repainting every frame at once.
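To make the "autocomplete for video" idea concrete, here is a deliberately toy Python sketch of autoregressive generation. It is not Aurora: xAI has not published Aurora's internals, and the frame representation and `toy_next_frame` model below are hypothetical stand-ins. What it does show faithfully is the control flow described above: each new frame is computed from the frames already generated, so the steps cannot run in parallel across time.

```python
from typing import Callable, List

Frame = List[float]  # hypothetical: a frame reduced to a few numbers (e.g. a subject's position)


def generate_autoregressive(first_frame: Frame, n_frames: int,
                            next_frame: Callable[[List[Frame]], Frame]) -> List[Frame]:
    """Build a clip one frame at a time; each call receives the full history so far."""
    frames = [first_frame]
    while len(frames) < n_frames:
        # Sequential dependency: frame t cannot be computed until frames 0..t-1 exist.
        frames.append(next_frame(frames))
    return frames


def toy_next_frame(history: List[Frame]) -> Frame:
    """Hypothetical stand-in for the model: continue the most recent motion.

    This toy only looks at the last two frames; a real model can attend to all of them.
    """
    last = history[-1]
    if len(history) == 1:
        velocity = [0.1] * len(last)  # arbitrary starting motion for the toy
    else:
        velocity = [a - b for a, b in zip(history[-1], history[-2])]
    return [x + v for x, v in zip(last, velocity)]


if __name__ == "__main__":
    clip = generate_autoregressive([0.0, 1.0], n_frames=5, next_frame=toy_next_frame)
    for t, frame in enumerate(clip):
        print(t, [round(x, 2) for x in frame])
```

Because every frame extends the motion already on screen, the subject stays on its trajectory; the same loop is also why every added frame adds a step that must wait for the one before it.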

Why it leads on coherence

Because each frame is built on the actual frames preceding it, the model holds a subject in place, keeps camera movement stable, and carries lighting transitions through a clip without the flicker and drift that parallel approaches can show. That is the source of the temporal coherence xAI leans on in its marketing, and it is genuinely useful for the kind of short, controlled motion brand creative needs.

Why it caps at 720p

The same sequential design is why the model tops out at 720p. Pushing to 1080p would mean processing roughly 2.25 times more tokens per frame all the way down a sequential chain — a cost Aurora cannot absorb as easily as a parallel diffusion model can. xAI has said a higher-resolution “Pro Mode” is on the roadmap but has not committed to a date, so treat 1080p as announced-intent, not a shipping capability. For social-first and concept-testing work, the 720p tradeoff is often acceptable; for hero product close-ups at full resolution, it is a real limit.
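The 2.25 figure is the pixel ratio between 1080p and 720p. The sketch below shows that arithmetic and what it means for a full 15-second clip at 24fps. The 8-pixel patch size is a hypothetical tokenizer setting chosen for illustration (xAI has not published how Aurora tokenizes frames); any fixed patch size that divides both resolutions evenly gives the same 2.25 ratio.

```python
import math

FPS = 24  # Grok Imagine Video 1.5's fixed frame rate


def tokens_per_frame(width: int, height: int, patch: int = 8) -> int:
    """Hypothetical tokenizer: one token per patch x patch block of pixels (edges padded up)."""
    return math.ceil(width / patch) * math.ceil(height / patch)


def tokens_per_clip(width: int, height: int, seconds: int, patch: int = 8) -> int:
    """Tokens a purely frame-by-frame generator would produce for a whole clip."""
    return tokens_per_frame(width, height, patch) * seconds * FPS


if __name__ == "__main__":
    t720 = tokens_per_frame(1280, 720)    # 160 x 90 = 14,400
    t1080 = tokens_per_frame(1920, 1080)  # 240 x 135 = 32,400
    print(f"1080p / 720p tokens per frame: {t1080 / t720}")  # 2.25
    print(f"15 s clip at 720p:  {tokens_per_clip(1280, 720, 15):,} tokens")
    print(f"15 s clip at 1080p: {tokens_per_clip(1920, 1080, 15):,} tokens")
```

In a sequential engine, that extra work lands on each of the 360 frames in a 15-second clip in turn, which is the cost the paragraph above describes.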

"These are our best image-to-video models yet: better motion, better physics, better audio, at the fastest speeds."— xAI, Grok Imagine Video 1.5 launch announcement (June 16, 2026)

03 · The Leaderboards · Two arenas, two very different stories.

Nearly all the launch coverage compresses Grok Imagine Video 1.5 into a single “#1 AI video model” line. That is not accurate, and the distinction decides what a marketer can safely claim. There are two separate evaluations, and the model sits in a very different place on each.

On the Image-to-Video Arena, xAI states the model holds the number-one Elo position, ahead of Sora 2, Veo 3.1, Seedance, and Kling on this metric, after a +52 Elo gain over Video 1.0, one of the larger single-version jumps on that board. That is a crowd-sourced user-preference score, xAI-stated and not independently audited. On the Text-to-Video Arena, independent OpenRouter data puts the model at #6 overall, with 1,233 Elo across 6,500-plus matches. Same model, two arenas, two stories. For the competing model it outranks on image-to-video, see Seedance 2.5 from ByteDance.
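What does a +52 Elo gain mean in practice? Under the standard Elo model, a rating gap maps to an expected head-to-head win rate. This sketch assumes the arena uses the conventional 400-point Elo scale; leaderboards sometimes fit ratings differently (for example with a Bradley-Terry model), so treat the output as an approximation rather than the arena's own method.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head votes A wins against B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # xAI-stated: Video 1.5 gained +52 Elo over Video 1.0 on the Image-to-Video Arena.
    # Only the gap matters, so the absolute ratings passed in here are arbitrary.
    p = elo_expected_score(52, 0)
    print(f"Expected preference for 1.5 over 1.0: {p:.1%}")  # about 57.4%
```

On that reading, a 52-point lead means voters would prefer the newer model in roughly 57 of 100 pairings: a clear edge, but not a landslide.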

The category breakdown beneath the overall number is where the practical guidance lives. Lead with the strengths; caveat the rest.

Grok Imagine Video category performance from the OpenRouter Text-to-Video Arena (retrieved June 26, 2026): Elo score, arena tier, and the practical marketing implication for each category, split into top-tier strengths and weaker categories to caveat.
Category | Elo (OpenRouter) | Arena tier | Practical marketing implication
Lead with these (Top 1%)
Cartoon & Anime | 1,339 | Top 1% | Animated explainers and stylized brand films
Fantasy | 1,329 | Top 1% | Concept and world-building creative
Action | 1,316 | Top 1% | Dynamic, motion-heavy ad spots
Fashion | 1,280 | Top 1% | Apparel and lifestyle reels
People | 1,276 | Top 1% | Talent-led, presenter-style content
Caveat these (Top 75% or below)
Moving Camera | n/a | Top 75% or below | Keep camera moves simple; complex moves drift
Nature | n/a | Top 75% or below | Weak for landscape B-roll
Animals | n/a | Top 75% or below | Wildlife motion can read as off
Photorealistic | n/a | Top 75% or below | Risky for photoreal product close-ups
Do not say '#1 AI video model'
The number-one position is specifically and only on the Image-to-Video Arena, and even that is an xAI-stated, crowd-sourced Elo figure rather than an independently audited result. On the broader Text-to-Video Arena the model ranks sixth. A claim like “the best AI video model” collapses two different evaluations into one and will not survive scrutiny. The honest framing is “top of the image-to-video leaderboard, mid-pack on text-to-video,” and that nuance is exactly what a vendor evaluation needs.

04 · The Economics · The pricing is the story, and it has not reconciled.

Here is where most coverage repeats a number without checking the arithmetic, and where this post can be genuinely useful. The commercial pitch is that Grok Imagine Video 1.5 is dramatically cheaper than the alternatives. The problem is that the per-second and per-minute figures reported across sources do not reconcile. One widely cited report gives a 720p rate of about $4.20 per minute, which implies roughly $0.07 per second, while a separate pre-GA pricing breakdown listed the preview tier at $0.14 per second, which works out to about $8.40 per minute. Those are not the same price.
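The reconciliation check itself is one multiplication. This short sketch converts between per-second and per-minute rates and flags when two quoted figures cannot describe the same price; the 1% tolerance is our own assumption for absorbing rounding in published numbers.

```python
def per_minute(price_per_second: float) -> float:
    return price_per_second * 60


def per_second(price_per_minute: float) -> float:
    return price_per_minute / 60


def same_rate(price_per_second: float, price_per_minute: float, tolerance: float = 0.01) -> bool:
    """True if the two quotes agree within `tolerance` (1% by default, to allow for rounding)."""
    return abs(per_minute(price_per_second) - price_per_minute) <= tolerance * price_per_minute


if __name__ == "__main__":
    print(f"$4.20/min -> ${per_second(4.20):.3f}/sec")  # $0.070/sec
    print(f"$0.14/sec -> ${per_minute(0.14):.2f}/min")  # $8.40/min
    print("Same price?", same_rate(0.14, 4.20))         # False: the two reports differ 2x
```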

The most likely explanations are that xAI halved pricing from preview to GA, or that the secondary report mixed a base-model rate into the 1.5 line. We cannot resolve that from here, so we do not print a single precise per-minute price as fact. The responsible move — and the one that builds trust with a budget owner — is to present the range, attribute it, and direct you to confirm the current number on the official xAI pricing docs at docs.x.ai before committing spend. The chart below uses the reported range against independently confirmed competitor rates.

720p video, cost per minute · Grok's reported range vs confirmed competitors

Sources: TechTimes (Jun 18, 2026), eesel.ai (Jun 5, 2026). Grok figures reported, not reconciled; verify at docs.x.ai.
  • Sora 2 Pro (deprecated): $30/min · $0.50/sec at 1024p · the exiting incumbent
  • Veo 3.1 Quality: $24/min · $0.40/sec · independently confirmed
  • Veo 3.1 Fast: $9/min · $0.15/sec · independently confirmed
  • Grok Imagine V1.5, reported high: ~$8.40/min · $0.14/sec · pre-GA preview tier (eesel.ai)
  • Grok Imagine V1.5, reported low: ~$4.20/min · $0.07/sec · implied by the reported $4.20/min
AI video API cost matrix for marketing teams (June 2026): reported output price per second, the derived per-minute cost, native audio inclusion, and the saving versus the deprecated Sora 2 Pro baseline. Grok figures are reported and not reconciled.
Model | Reported price / sec | Per minute (derived) | Native audio in-pass | Vs Sora 2 Pro ($30/min)
Grok Imagine Video 1.5 | $0.07–0.14 (reported) | $4.20–8.40 | Yes (same pass) | 72–86% lower (xAI claims 86%)
Veo 3.1 Fast | $0.15 | $9.00 | Not stated | 70% lower
Veo 3.1 Quality | $0.40 | $24.00 | Not stated | 20% lower
Sora 2 Pro (deprecated) | $0.50 | $30.00 | No (separate) | Baseline

Read the right-hand column carefully. xAI claims the model is roughly 86% cheaper than Sora 2 Pro, but that figure only holds if the GA price is around $0.07 per second. At the alternative reported $0.14 per second, the saving is closer to 72%. Either way the model undercuts Veo 3.1 and the exiting Sora, though at the high end the gap to Veo 3.1 Fast is thin ($8.40 versus $9.00 per minute), and Sora 2 Pro's rate is quoted at 1024p rather than 720p. The commercial interest is real; the point is to act on the direction of the saving, not to hard-code the headline percentage into a forecast. For the wider field of alternatives, our read on the AI video generator landscape after Sora’s exit maps where each option fits.
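To keep a forecast honest, carry the range through the budget instead of a single number. This sketch reproduces the right-hand column of the cost matrix and turns the reported Grok range into a spend range. The `takes_per_keeper` multiplier is a hypothetical planning input (how many generations you expect to pay for per clip you actually use), and billing every generated second, discarded takes included, is our assumption rather than an xAI statement.

```python
SORA_2_PRO_PER_MIN = 30.00  # deprecated baseline: $0.50/sec, quoted at 1024p
GROK_LOW_PER_MIN, GROK_HIGH_PER_MIN = 4.20, 8.40  # reported, not reconciled


def saving_vs_baseline(candidate_per_min: float,
                       baseline_per_min: float = SORA_2_PRO_PER_MIN) -> float:
    """Fractional saving versus the baseline (0.86 means 86% lower)."""
    return 1 - candidate_per_min / baseline_per_min


def spend_range(finished_minutes: float, takes_per_keeper: float = 1.0):
    """Low/high Grok spend, assuming every generated take is billed per second."""
    generated_minutes = finished_minutes * takes_per_keeper
    return (round(generated_minutes * GROK_LOW_PER_MIN, 2),
            round(generated_minutes * GROK_HIGH_PER_MIN, 2))


if __name__ == "__main__":
    for label, ppm in [("Grok (reported low)", GROK_LOW_PER_MIN),
                       ("Grok (reported high)", GROK_HIGH_PER_MIN),
                       ("Veo 3.1 Fast", 9.00), ("Veo 3.1 Quality", 24.00)]:
        print(f"{label}: {saving_vs_baseline(ppm):.0%} lower than Sora 2 Pro")
    low, high = spend_range(finished_minutes=20, takes_per_keeper=3)
    print(f"20 finished minutes at 3 takes each: ${low:,.2f} to ${high:,.2f}")  # $252.00 to $504.00
```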

A vendor hedge does not fix a number
The cost matrix above is built from reported figures that do not reconcile across sources, so we present a range and attribute the 86% saving to xAI rather than asserting it. The native-audio inclusion and the per-minute gap against Sora and Veo are the durable signals; the exact per-second rate is the thing to confirm at docs.x.ai before you model a campaign budget. Treat the headline as directional and the official pricing page as the source of truth.

05 · The Spec · What it actually outputs, and how you drive it.

The output envelope is tight and worth memorizing. Clips run 1 to 15 seconds, configurable; resolution is 480p or 720p; frame rate is fixed at 24fps; output is H.264 MP4. Supported aspect ratios cover the full social spread — 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, and 2:3 — and for image-to-video the output defaults to the input image’s aspect ratio unless you override it. The 24fps lock matches cinematic convention but falls short of the 60fps used in gaming and sports content, which is a real constraint for some formats.

The primary workflow is image-to-video: a still image becomes the first frame, a prompt describes the motion, and the model animates forward while preserving the source composition, subject identity, and lighting. Beyond that, the Imagine API exposes five distinct video workflows — generation, image-to-video, video editing (capped at 8.7 seconds for edits), reference-to-video, and video extension. Audio is produced in the same inference pass, so dialogue, ambience, and effects land with the picture rather than in a separate step.
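Because the envelope is tight, it pays to reject out-of-spec requests before they cost a round trip. The sketch below encodes the limits stated above: 1 to 15 seconds, 480p or 720p, fixed 24fps, seven aspect ratios, and the 8.7-second edit cap. The `ClipRequest` shape and workflow names are hypothetical and are not the Imagine API's request schema; we also read the edit cap as a limit on the source clip's length, which is worth confirming in xAI's docs.

```python
from dataclasses import dataclass
from typing import List, Optional

FPS = 24
RESOLUTIONS = {"480p", "720p"}
ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"}
WORKFLOWS = {"generation", "image-to-video", "video-editing",
             "reference-to-video", "video-extension"}
MAX_EDIT_SOURCE_SECONDS = 8.7  # assumption: the edit cap applies to the input clip


@dataclass
class ClipRequest:  # hypothetical shape, for pre-flight checks only
    workflow: str
    seconds: float
    resolution: str = "720p"
    aspect_ratio: Optional[str] = None      # None = inherit the input image's ratio (image-to-video)
    source_seconds: Optional[float] = None  # length of the clip being edited, if any


def validate(req: ClipRequest) -> List[str]:
    """Return a list of problems; an empty list means the request fits the published envelope."""
    problems = []
    if req.workflow not in WORKFLOWS:
        problems.append(f"unknown workflow {req.workflow!r}")
    if not 1 <= req.seconds <= 15:
        problems.append(f"duration {req.seconds}s is outside 1-15s")
    if req.resolution not in RESOLUTIONS:
        problems.append(f"resolution {req.resolution} not offered (480p or 720p only)")
    if req.aspect_ratio is not None and req.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"aspect ratio {req.aspect_ratio} not supported")
    if req.workflow == "video-editing":
        if req.source_seconds is None or req.source_seconds > MAX_EDIT_SOURCE_SECONDS:
            problems.append(f"edits need a source clip of at most {MAX_EDIT_SOURCE_SECONDS}s")
    return problems


def frame_count(req: ClipRequest) -> int:
    """Frames in the output at the fixed 24fps."""
    return round(req.seconds * FPS)


if __name__ == "__main__":
    ok = ClipRequest("image-to-video", seconds=15, aspect_ratio="9:16")
    bad = ClipRequest("generation", seconds=20, resolution="1080p")
    print(validate(ok), frame_count(ok))  # [] 360
    print(validate(bad))                  # two problems: duration and resolution
```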

Clip length: 15s per generation, configurable (1–15s)

Clips run 1 to 15 seconds. Longer sequences require chaining via the video-extension feature, and community testing reports visible quality degradation after two to three chained extensions, so plan around the single-clip ceiling.

Resolution: 720p ceiling · 480p or 720p, 24fps · no 1080p today

The ceiling is 720p, a direct consequence of Aurora's sequential design. Frame rate is fixed at 24fps. A higher-resolution Pro Mode is on xAI's roadmap but has no committed release date.

Audio: generated with the video in 1 pass · native, no upcharge

Sound effects, background ambience, dialogue, and lip-synced speech are produced in the same inference pass at no additional charge; xAI positions this as its differentiator from Runway, Kling, and the discontinued Sora.
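Planning around the 15-second single-clip ceiling is easy to automate at the storyboard stage. This minimal sketch splits a target runtime into the fewest shots that each fit inside 15 seconds, keeping shot lengths within one second of each other; whole-second durations and the `plan_shots` name are our assumptions, not an xAI feature.

```python
import math
from typing import List

MAX_CLIP_SECONDS = 15  # per-generation ceiling


def plan_shots(total_seconds: int, max_clip: int = MAX_CLIP_SECONDS) -> List[int]:
    """Fewest whole-second shots covering total_seconds, each <= max_clip, evened out."""
    if total_seconds < 1:
        raise ValueError("total_seconds must be at least 1")
    n_shots = math.ceil(total_seconds / max_clip)
    base, extra = divmod(total_seconds, n_shots)
    # The first `extra` shots get one extra second so the lengths sum exactly to the target.
    return [base + 1] * extra + [base] * (n_shots - extra)


if __name__ == "__main__":
    for spot in (6, 30, 40, 60):
        print(spot, plan_shots(spot))  # e.g. 40 -> [14, 13, 13]
```

Generating each shot as its own self-contained clip, rather than extending the previous one, sidesteps the degradation testers report after two or three chained extensions; continuity between shots then becomes an editing job.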

One operational detail matters for pipeline design: the API is asynchronous. You POST a generation request, receive a request_id, and poll until the status reads done. Generation can take several minutes, and the returned video URLs are temporary, so a production pipeline needs to fetch and store the file promptly. The accepted input image formats are broad (JPG, JPEG, PNG, WEBP, GIF, AVIF), and the rate limit is 60 requests per minute.
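Here is a minimal sketch of that polling loop in Python. It is deliberately transport-agnostic: `get_status` is a hypothetical callable you would wire to the status request described in the official SDK or REST docs (we do not reproduce xAI's endpoint here). Only the `done` status value comes from the description above; `failed` and `pending` are assumed values. The sleep and clock are injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict


class GenerationTimeout(Exception):
    """Raised when a request has not finished within the allowed time."""


def wait_for_video(request_id: str,
                   get_status: Callable[[str], Dict[str, Any]],
                   poll_interval: float = 5.0,
                   timeout: float = 600.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll get_status(request_id) until it reports done, then return the final payload."""
    deadline = clock() + timeout
    while True:
        payload = get_status(request_id)
        state = payload.get("status")
        if state == "done":
            # Video URLs are temporary: the caller should download and store the file right away.
            return payload
        if state == "failed":  # assumed terminal state; check the official docs
            raise RuntimeError(f"generation {request_id} failed: {payload}")
        if clock() >= deadline:
            raise GenerationTimeout(f"{request_id} not done after {timeout:.0f}s")
        sleep(poll_interval)


if __name__ == "__main__":
    # Fake responses standing in for the real API; "pending" is a hypothetical in-progress value.
    responses = iter([{"status": "pending"}, {"status": "pending"}, {"status": "done"}])
    result = wait_for_video("req-123", lambda _rid: next(responses), sleep=lambda _s: None)
    print(result)  # {'status': 'done'}
```

The five-second interval is a starting point, not xAI guidance. If status checks count against the 60 requests-per-minute limit (an assumption worth confirming), each job polled every five seconds uses 12 requests a minute, so five concurrent jobs would exhaust the budget on polling alone.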

06 · For Marketers · Where it earns its place in a content pipeline.

xAI markets the video model around concrete commerce and advertising jobs, and the Imagine API page names them directly. The four below are the ones most relevant to a brand creative pipeline. None of them replaces a full production for hero brand films (the 720p ceiling and 15-second limit see to that), but each maps cleanly to the high-volume, social-first content most teams now ship continuously. For the adjacent product-ad pattern, see our piece on AI video product swap and local redraw ads.

Product demos: Photo to motion · image-to-video · pans, zooms, reveals · ecommerce + social

Turn a single product photo into a cinematic short with smooth camera motion. The image-to-video workflow preserves the product's composition and lighting while animating forward, which suits social product spots well.

Virtual try-on: Person plus garment · reference-to-video · apparel

Upload a person photo and a clothing item and generate a clip of the garment being worn. A fit for apparel and lifestyle reels, drawing on two of the model's Top 1% arena categories (Fashion and People).

Product placement: Restage the product · reference-to-video · concept testing

Place a product into a new environment without a reshoot, which is useful for testing settings and contexts before a full production commits budget to a location.

Creative restyle: Cinematic to watercolor · video editing · style transfer · stylized content

Apply cinematic, anime, retro, or watercolor styles to existing footage. Strong for the stylized and animated categories where the model leads, weaker for photorealistic output.

The procurement advantage worth naming is the free tier. xAI lists video generation up to 15 seconds at 720p as available on the free consumer tier at grok.com — no X Premium subscription required, with limited quotas — while SuperGrok ($30/month) raises the limits. That matters because Sora required a paid subscription before a creator could even evaluate it. A brand can test Grok Imagine against its own creative, on its own products, before spending anything — a real advantage in a market where AI video vendors are increasingly asking for annual commitments. Standing up that kind of test-and-measure workflow inside an ongoing content operation is the work our social media content production and ecommerce growth engagements are built around.

07 · The Risks · What a brand should weigh before committing.

Most marketing-facing coverage skips the risk picture entirely, and a brand evaluating a vendor cannot afford to. Two categories matter: the hard capability limits, which are knowable today, and the platform and reputational exposure, which is more uncertain but material. Sort the decision by where you sit rather than treating the model as a single yes-or-no.

Social-first, stylized content: Lean in · Adopt for these categories

For people, fashion, action, fantasy, and animated content at social resolutions, this is where the model leads and the price advantage is real. Test on your own assets via the free tier first, then scale on the API.

Hero product close-ups: Hold for now · Keep on another tool

The 720p ceiling and weaker photorealistic-category results make this a poor fit for full-resolution flagship product detail. Wait for the announced Pro Mode, or keep these shots on a higher-resolution pipeline.

Long-form sequences: Mind the 15-second wall · Design for short clips

Clips cap at 15 seconds and chained video extensions degrade visibly after two or three links. Storyboard around short, self-contained shots rather than expecting a single continuous long take.

Brand-safety-sensitive teams: Run a governance check · Diligence before commit

Weigh the platform's content-moderation track record and ongoing regulatory exposure against your own brand-safety standards before associating spend or assets with the channel.
The platform-risk context, stated precisely
Be precise about what happened: in late 2025 and early 2026, Grok Imagine’s image generation feature — a separate product from this video model, though it shares the brand and platform — was used to generate non-consensual sexualized content, including images appearing to depict minors. xAI subsequently faced federal lawsuits and regulatory investigations across the US, EU, UK, and Canada that remain ongoing as of June 2026, and it restricted image generation to paid subscribers and refined its classifiers in response. The video model is a distinct product, but a marketer making a vendor decision should factor in the platform’s moderation history and active legal exposure rather than evaluating the model in isolation.

08 · Conclusion · A strong, specific tool: read the asterisks.

The shape of AI video, June 2026

Grok Imagine Video 1.5 is a real option for the right jobs — once you read past the headline.

The general availability of Grok Imagine Video 1.5 is a genuine moment for marketing teams. Aurora’s autoregressive engine delivers the temporal coherence short brand creative needs, native audio removes a production step, and the reported pricing undercuts both Veo 3.1 and the exiting Sora. With a free tier to trial it on, the barrier to testing is close to zero.

The discipline is in the asterisks. The number-one position is on the image-to-video leaderboard only; on text-to-video the model is mid-pack. The 720p ceiling is architectural, with the higher-resolution Pro Mode still undated, and the 15-second limit is a hard cap today. The celebrated price has not reconciled across sources, so the saving is a range to verify rather than a percentage to quote. And the platform carries a real moderation and legal history that a brand should weigh.

The right posture is neither hype nor dismissal. Match the model to the jobs it leads — people, fashion, action, fantasy, and animated social content — trial it on your own creative before committing, confirm the current price on the official docs, and keep hero, full-resolution work on a pipeline built for it. Used that way, Grok Imagine Video 1.5 earns a place in a 2026 content operation. Treated as a one-line “best AI video model,” it sets a team up to be surprised.

Put AI video to work in your content engine

Match the right AI video model to each job — and trial it before you commit.

Our team helps brands fold AI video into a real content pipeline — matching the right model to each job, trialing on your own creative, and keeping measurement honest in a market where pricing and leaderboards move weekly.

Free consultation · Expert guidance · Tailored solutions
What we work on

AI video in production pipelines

  • Model selection by job — Grok, Veo, Seedance, and Kling
  • Free-tier trials on your own products before budget commits
  • Social-first video at scale for the categories that perform
  • Measurement for a market with no settled benchmarks
  • Brand-safety and governance checks on new AI vendors
FAQ · Grok Imagine Video 1.5

The questions marketers ask every week.

What is Grok Imagine Video 1.5?

Grok Imagine Video 1.5 is xAI's AI video generation model. It moved from preview to general availability on June 16, 2026, with the API model string grok-imagine-video-1.5 reachable through the Imagine API, the web app at grok.com/imagine, and the iOS and Android apps. The model runs on Aurora, an autoregressive engine, and generates audio in the same inference pass as the video. xAI also shipped a consumer-only speed variant called Video 1.5 Fast, which produces a six-second 720p clip in about 25 seconds. That Fast variant is a UI tier, not a separate API model string, so developers should integrate against grok-imagine-video-1.5.