NVIDIA bills Cosmos 3 as the first fully open physical-AI omnimodel: a single model that natively understands and generates text, images, video, and ambient sound, and also outputs the action signals a robot needs to move. The model was announced at GTC Taipei / Computex on May 31, 2026, and its Nano and Super weights are downloadable from Hugging Face today.

Earlier physical-AI systems chained separate specialist models (one for scene understanding, another for world generation, a third for predicting actions) and paid an integration tax at every handoff. Cosmos 3 collapses that pipeline into a single Mixture-of-Transformers model, and that is the part of this release that actually changes how robotics and autonomous-vehicle teams build their systems.

This guide covers what shipped, what "omnimodel" means in practice, the two-tower design, the five use modes a single model supports, the three hardware tiers, NVIDIA's leaderboard claims (and why we read every one of them as vendor-stated), and how the OpenMDW-1.1 license changes the calculus for commercial products.

Key takeaways

01 · One open model that reasons, simulates, and acts. Cosmos 3 natively handles text, images, video, ambient sound, and action trajectories in a single unified model, which NVIDIA calls an 'omnimodel'. NVIDIA describes it as the first physical-AI foundation model to put all of these in one set of weights.
02 · A two-tower Mixture-of-Transformers architecture. A Reasoner Tower (autoregressive vision-language model) feeds context to a Generator Tower (diffusion transformer). They share 3D rotary position embeddings, and the Generator cannot run without the Reasoner: reasoning comes before generation by design.
03 · Three tiers map to your hardware budget. Super (64B) targets Hopper/Blackwell datacenter GPUs, Nano (16B) runs on RTX PRO 6000 workstation-class GPUs, and Edge (4B) is built for Jetson devices. Edge is 'coming soon' with no announced date; Nano and Super weights are on Hugging Face now.
04 · It outputs robot control signals, not just video. Beyond text and video, Cosmos 3 produces numerical control outputs such as joint angles, gripper positions, and trajectory points. That action output is the key differentiator over prior vision-only generative world models.
05 · Open license, but read the leaderboard claims carefully. Cosmos 3 ships under the Linux Foundation's OpenMDW-1.1 license, which permits commercial use and derivatives with a 'Built on NVIDIA Cosmos' attribution. NVIDIA's #1 rankings are stated among open models and were not independently reproduced at launch.

01 · What Shipped: A launch at Computex, weights live the same week.

NVIDIA announced Cosmos 3 during Jensen Huang's GTC Taipei keynote at Computex 2026 on May 31, 2026, positioning it as the open frontier foundation model for physical AI. The launch arrived in the same event cycle that NVIDIA used to frame the rest of its physical-AI roadmap — for the wider keynote context, see our GTC Taipei / Computex 2026 keynote first take and the broader order-pipeline analysis from the same keynote.

Cosmos 3 is not NVIDIA's first move into world models. The original Cosmos family, released in 2025, shipped separate models for world generation, physical understanding, and controlled scene generation. Cosmos 3 is best understood as a unification: it folds what previously required chaining several specialist models into a single model with one set of weights. Two of the three tiers, Nano and Super, are downloadable now; the smaller Edge tier is announced but not yet released.

Cosmos 3 Super · Datacenter · 64B total (32B Reasoner + 32B Generator)
The maximum-capability tier, targeted at NVIDIA Hopper and Blackwell datacenter GPUs. Best for large-scale synthetic data generation and the most demanding world-modeling and policy workloads. Weights available now on Hugging Face.
huggingface.co/nvidia/Cosmos3-Super

Cosmos 3 Nano · Workstation · 16B total (8B Reasoner + 8B Generator)
Optimized for RTX PRO 6000 workstation-class GPUs. The pragmatic on-prem starting point for most teams: tractable to run, available as a NIM microservice, and exposed on build.nvidia.com for GPU-free trials.
huggingface.co/nvidia/Cosmos3-Nano

Cosmos 3 Edge · Embedded · 4B, Jetson family · Not yet released
A compact model built for Jetson edge devices and on-robot inference. NVIDIA describes it as 'coming soon': no release date has been announced, and it is not downloadable today. Prototype device-side work on Nano until Edge ships.

Release snapshot

Cosmos 3 launched May 31, 2026 at GTC Taipei / Computex. Cosmos 3 Super and Nano weights are on Hugging Face (the nvidia/cosmos3 collection lists 15 items), with deployable NIM microservices, a Cosmos3OmniPipeline in Hugging Face Diffusers, and a GPU-free trial on build.nvidia.com. Six open synthetic-data datasets ship alongside it, spanning robotics, physics, spatial reasoning, digital humans, autonomous driving, and warehouse operations.

02 · The Omnimodel Idea: What an omnimodel actually collapses.

NVIDIA's term for Cosmos 3 is "omnimodel": one model that natively understands and generates text, images, video, and ambient sound, and additionally produces robot action signals. The label matters less than what it removes. Modality encoders — a Vision Transformer for vision, a Variational Autoencoder for generation, and domain-aware vectors for actions — all project into a shared representation space, so the model reasons across modalities instead of bolting them together with glue code.

The differentiator over prior generative world models is the action output. Cosmos 3 does not just describe or render a scene; it can emit numerical control signals — joint angles, gripper positions, and trajectory points — that a robot controller or AV stack can execute directly. That is the difference between a model that imagines the world and one that can be wired into a physical control loop.

Cosmos 3 doesn't just understand the physical world — it generates it, predicts actions within it, and outputs the action trajectories that robot controllers and AV systems need to act in it.— NVIDIA Technical Blog

Here is why the pipeline-collapse framing is the right lens. Before an omnimodel, a robotics team typically ran a perception model to understand a scene, a separate world model to simulate outcomes, and a third policy model to predict actions — passing outputs between them and absorbing integration error and latency at every boundary. A single model that handles all of those tasks does more than improve any one score: it removes inference steps, cuts handoff latency, and simplifies the MLOps stack a team has to maintain. NVIDIA frames this as compressing physical-AI training and evaluation cycles "from months to days" — a vendor claim worth testing against your own workload rather than taking at face value.
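To make the handoff argument concrete, here is a minimal Python sketch of end-to-end latency for a chained pipeline versus a single model. The `Stage` type and every millisecond figure in it are hypothetical placeholders, not measurements of Cosmos 3 or any other model. The only point is structural: each boundary between models adds a handoff term, and a single model has no model-to-model boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One model in an inference pipeline (hypothetical, for illustration only)."""
    name: str
    compute_ms: float  # time spent running the model itself
    handoff_ms: float = 0.0  # cost of serializing/transferring its output to the next model


def end_to_end_ms(stages: list[Stage]) -> float:
    """Total latency: every stage's compute plus one handoff per boundary between stages."""
    compute = sum(s.compute_ms for s in stages)
    # The last stage hands its output to the controller, not to another model,
    # so only the first len(stages) - 1 handoffs are inter-model boundaries.
    handoffs = sum(s.handoff_ms for s in stages[:-1])
    return compute + handoffs


def boundaries(stages: list[Stage]) -> int:
    """Number of model-to-model handoffs a team has to build, monitor, and debug."""
    return max(len(stages) - 1, 0)


if __name__ == "__main__":
    # Placeholder numbers chosen only to show the arithmetic.
    chained = [
        Stage("perception", compute_ms=40, handoff_ms=10),
        Stage("world_model", compute_ms=120, handoff_ms=10),
        Stage("policy", compute_ms=30),
    ]
    unified = [Stage("omnimodel", compute_ms=190)]
    print(end_to_end_ms(chained), boundaries(chained))
    print(end_to_end_ms(unified), boundaries(unified))
```

The placeholder numbers hold total compute equal so that only the handoff term differs. A real comparison has to measure compute as well, since one large model is not automatically faster than three smaller ones.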

03 · Architecture: A two-tower design where reasoning comes first.

Cosmos 3 is built as a Mixture-of-Transformers (MoT) with a two-tower design. The two towers split the work — and the dependency direction between them is the whole point.

Reasoner Tower — autoregressive vision-language model

The Reasoner is an autoregressive vision-language model that ingests the scene and the instruction and builds a structured understanding of what is happening and what should happen next. It is the "think before you act" half of the model: it produces the context that conditions everything the Generator does.

Generator Tower — diffusion transformer

The Generator is a diffusion-based transformer that produces the output — future video frames, generated worlds, or action trajectories. Critically, the Generator cannot run without the Reasoner's context. Generation is always conditioned on reasoning, rather than the two operating as independent stages you could swap out.

Shared spatial-temporal structure

Both towers share a 3D multi-dimensional rotary position embedding (mRoPE), which gives them a consistent sense of spatial and temporal structure across modalities. That shared coordinate system is part of what lets reasoning and generation stay aligned on the same scene rather than drifting apart.
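The source does not say how Cosmos 3 lays out its 3D mRoPE internally, so what follows is a generic Python sketch of the multi-dimensional rotary idea as introduced with Qwen2-VL's M-RoPE, not NVIDIA's implementation. Each video patch gets a (time, height, width) position triple, text tokens repeat a single index on all three axes, and the rotary frequency pairs of a query or key vector are split into three sections, one per axis. The section sizes, the index offsets, and `base=10000` are assumptions.

```python
import math


def video_position_ids(t: int, h: int, w: int, start: int = 0) -> list[tuple[int, int, int]]:
    """One (time, height, width) triple per patch, frame by frame in raster order."""
    return [
        (start + ti, start + hi, start + wi)
        for ti in range(t)
        for hi in range(h)
        for wi in range(w)
    ]


def text_position_ids(n: int, start: int) -> list[tuple[int, int, int]]:
    """Text tokens repeat one index on every axis, which reduces 3D RoPE to ordinary 1D RoPE."""
    return [(start + i,) * 3 for i in range(n)]


def apply_3d_rope(vec: list[float], pos: tuple[int, int, int],
                  sections: tuple[int, int, int], base: float = 10000.0) -> list[float]:
    """Rotate consecutive (x, y) pairs of `vec`; pairs in section k use position axis k."""
    n_pairs = sum(sections)
    if len(vec) != 2 * n_pairs:
        raise ValueError("vec length must equal 2 * sum(sections)")
    out = list(vec)
    pair = 0
    for axis, count in enumerate(sections):
        for _ in range(count):
            # Standard RoPE frequency schedule: theta_i = base^(-2i/d) with d = 2 * n_pairs.
            angle = pos[axis] * base ** (-pair / n_pairs)
            c, s = math.cos(angle), math.sin(angle)
            x, y = vec[2 * pair], vec[2 * pair + 1]
            out[2 * pair] = x * c - y * s
            out[2 * pair + 1] = x * s + y * c
            pair += 1
    return out
```

Because every pair is a plain 2D rotation, the dot product between a rotated query and a rotated key depends only on the difference between their position triples. That relative property is what lets attention read time and spatial offsets consistently, whichever modality a token came from.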

The key insight

The two-tower split is not just an efficiency trick. Forcing the Generator to depend on the Reasoner is how Cosmos 3 enforces reason-before-generate — the model commits to an understanding of the scene before it predicts frames or actions, which is the behavior you want when the output drives a physical robot.
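The reason-before-generate dependency can be illustrated in Python with a hypothetical interface in which the generator's only entry point requires a context object that only the reasoner produces. None of these class or method names come from Cosmos 3 or the Diffusers `Cosmos3OmniPipeline`; the stub bodies stand in for the real autoregressive and diffusion computations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasonerContext:
    """Conditioning produced by the reasoning step (hypothetical type)."""
    scene_summary: str
    plan: str


class Reasoner:
    """Stand-in for the autoregressive vision-language tower."""

    def reason(self, frames: list[str], instruction: str) -> ReasonerContext:
        # A real reasoner would encode the frames and decode text tokens autoregressively.
        summary = f"{len(frames)} frame(s) observed"
        return ReasonerContext(scene_summary=summary, plan=f"to satisfy: {instruction}")


class Generator:
    """Stand-in for the diffusion tower: it has no entry point that skips the context."""

    def generate(self, context: ReasonerContext, steps: int = 4) -> list[str]:
        if not isinstance(context, ReasonerContext):
            raise TypeError("Generator requires a ReasonerContext from the Reasoner")
        # A real diffusion tower would iteratively denoise latents conditioned on `context`;
        # each list entry here stands in for one denoising step.
        return [f"step {i + 1}/{steps}: {context.plan}" for i in range(steps)]


def run(frames: list[str], instruction: str) -> list[str]:
    """Reason first, then generate: the only order the types allow."""
    context = Reasoner().reason(frames, instruction)
    return Generator().generate(context)
```

The same structural choice is what rules out swapping the generator for an independent stage: without a context object there is nothing to condition on, so there is nothing to call.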

04 · Five Modes: One set of weights, five ways to use it.

The same Cosmos 3 weights support five distinct use modes, which is the practical expression of the omnimodel claim. Rather than picking a specialist model per task, a builder selects a mode based on the input and output they need. The capability map below pairs each mode with a concrete robotics or autonomous-vehicle example.

Mode 01 · Perceive · Vision Language Model (VLM) · text/video → text
Text and video in, text reasoning out. Use it to answer questions about a scene or generate a structured description, e.g. a warehouse robot asked what is on a shelf, or an AV stack reasoning about a traffic situation.

Mode 02 · Simulate · World Model / video generation · text/image/video → video
Text, image, or video in, generated video out. Produce synthetic worlds and rollouts for training data: a manipulation scene rendered from a prompt, or rare driving scenarios generated for AV evaluation.

Mode 03 · Predict · Forward Dynamics Model (FDM) · action + image → video
Action plus image in, future video out. Given a candidate action and the current frame, predict what happens next, letting a robot 'imagine' the result of a grasp before it commits, or an AV preview a maneuver.

Mode 04 · Extract · Inverse Dynamics Model (IDM) · video → action
Video in, action out. Recover the actions implied by a demonstration video, which is useful for learning from human demonstrations or for auto-labeling teleoperation footage into action trajectories for training.

Mode 05 · Act · Policy Model · image + text → video + action
Image and text in, video and action out. The full policy loop: a dual-arm robot given a goal produces both the predicted rollout and the joint-angle trajectory to execute it. This is the mode early partners use for pick-and-place.
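The five modes amount to a routing table keyed on input and output modalities. As a hypothetical Python sketch (the `Mode` type and `candidate_modes` helper are illustrative, not part of any NVIDIA SDK), the signatures can be encoded and queried as below. The matching rule is an assumption drawn from how the modes are described here: inputs must match exactly, except for the world model, which accepts any mix of text, image, and video.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    ACTION = "action"


T, I, V, A = Modality.TEXT, Modality.IMAGE, Modality.VIDEO, Modality.ACTION


@dataclass(frozen=True)
class Mode:
    name: str
    inputs: frozenset[Modality]
    outputs: frozenset[Modality]
    any_input: bool = False  # True: any non-empty subset of `inputs` is accepted

    def accepts(self, given: frozenset[Modality]) -> bool:
        if self.any_input:
            return bool(given) and given <= self.inputs
        return given == self.inputs


# Input/output signatures transcribed from the five modes described above.
MODES = [
    Mode("VLM", frozenset({T, V}), frozenset({T})),
    Mode("World Model", frozenset({T, I, V}), frozenset({V}), any_input=True),
    Mode("FDM", frozenset({A, I}), frozenset({V})),
    Mode("IDM", frozenset({V}), frozenset({A})),
    Mode("Policy", frozenset({I, T}), frozenset({V, A})),
]


def candidate_modes(have: set[Modality], want: set[Modality]) -> list[str]:
    """Modes whose inputs match what you have and whose outputs cover what you want."""
    given = frozenset(have)
    return [m.name for m in MODES if m.accepts(given) and set(want) <= m.outputs]
```

Image plus text in, video out matches two modes: a plain world-model rollout and the policy mode, which additionally emits actions. That overlap is the practical meaning of "one set of weights": the choice is about which outputs you need, not which checkpoint you load.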

05 · Tier Selection: Which tier to pick for your hardware.

Most launch coverage skips the hardware reality, but the tier you choose is dictated by where you deploy. The decision matrix below maps each tier to its hardware target, primary use case, and availability — including the Edge tier that is announced but not yet downloadable, so device-side teams know not to plan around it arriving today.


Tier	Hardware target	Best for & availability
Cosmos 3 Super 64B total (32B + 32B)	Hopper / Blackwell datacenter GPUs	Large-scale synthetic data generation and the heaviest world-modeling and policy workloads. Quantization to BF16, FP8, and NVFP4 supported. Weights available now on Hugging Face; deployable as a NIM microservice.
Cosmos 3 Nano 16B total (8B + 8B)	RTX PRO 6000 workstation-class GPU	The pragmatic on-prem starting point. Tractable for most teams, exposed on build.nvidia.com for GPU-free trials, and available as a NIM microservice. Use this to prototype before committing datacenter spend.
Cosmos 3 Edge 4B · coming soon	Jetson family edge devices	Compact on-robot / on-device inference. Announced but not yet released — no date given. Do not plan an embedded launch around Edge being downloadable today; prototype on Nano and watch for the release.
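The table reduces to a small lookup plus one rule for the unreleased tier. Here is a minimal Python sketch, assuming three deployment targets named after the hardware column; the `plan_tier` helper and the target keys are hypothetical, and the `released` flag mirrors launch-day availability, so it would need updating when Edge ships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    params_b: int    # total parameters, billions
    hardware: str
    released: bool   # weights downloadable at launch


TIERS = {
    "datacenter": Tier("Cosmos 3 Super", 64, "Hopper / Blackwell datacenter GPUs", True),
    "workstation": Tier("Cosmos 3 Nano", 16, "RTX PRO 6000 workstation-class GPU", True),
    "jetson": Tier("Cosmos 3 Edge", 4, "Jetson family edge devices", False),
}


def plan_tier(target: str) -> tuple[str, str]:
    """Return (tier to use now, note) for a deployment target, falling back while Edge is unreleased."""
    tier = TIERS[target]
    if tier.released:
        return tier.name, f"deploy on {tier.hardware}"
    # Unreleased tier: prototype on the workstation tier, as the guidance above suggests.
    fallback = TIERS["workstation"]
    return fallback.name, f"prototype on {fallback.hardware}; migrate to {tier.name} once it ships"
```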

Deployment & quantization

Cosmos 3 supports BF16, FP8, and NVFP4 (4-bit) quantization, and NVIDIA states NVFP4 can deliver up to a 2x inference speedup versus baseline — a vendor figure, so benchmark it on your own hardware. Nano and Super are deployable as NIM microservices via Docker with an NGC API key, with infrastructure partners including CoreWeave, Microsoft Azure, Baseten, Nebius, Deep Infra, and Classmethod.
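A rough weight-memory estimate helps map precision choices onto the tiers. The Python sketch below multiplies total parameter count by bits per weight. It assumes every weight is stored at the chosen precision (a real quantized checkpoint may keep some layers in higher precision), it ignores activations, caches, and runtime overhead, and the NVFP4 figure assumes NVIDIA's published block layout of one FP8 scale per 16 values.

```python
# Effective bits per stored weight. BF16 and FP8 are exact; for NVFP4 the 4-bit values
# carry one FP8 (E4M3) scale per 16-value block, adding 8 / 16 = 0.5 bit per weight.
# The single per-tensor FP32 scale is ignored as negligible.
BITS_PER_WEIGHT = {"BF16": 16.0, "FP8": 8.0, "NVFP4": 4.0 + 8.0 / 16}

TOTAL_PARAMS_B = {"Super": 64, "Nano": 16, "Edge": 4}


def weight_gb(params_b: float, fmt: str) -> float:
    """Decimal gigabytes needed to hold the weights alone (no activations, caches, or runtime)."""
    # (params_b * 1e9 params) * bits / (8 bits per byte) / (1e9 bytes per GB)
    return params_b * BITS_PER_WEIGHT[fmt] / 8


if __name__ == "__main__":
    for tier, params in TOTAL_PARAMS_B.items():
        row = ", ".join(f"{fmt} {weight_gb(params, fmt):.1f} GB" for fmt in BITS_PER_WEIGHT)
        print(f"{tier:5s} ({params}B): {row}")
```

For the 64B Super tier this works out to about 128 GB of weights in BF16, 64 GB in FP8, and 36 GB in NVFP4, which is why the lower-precision formats matter most at the top of the range.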

06 · Benchmarks: The leaderboard claims, read honestly.

NVIDIA says Cosmos 3 ranks #1 across a wide set of physical-AI leaderboards. Every one of these is stated among open models, and we treat them all as vendor-stated: NVIDIA had not published point-score comparison tables at launch, and independent reproduction of the rankings was not available. These leaderboards are also new, with limited third-party audit history. We list the claims qualitatively below — no invented scores — because the honest version is more useful than a precise number we cannot verify.

NVIDIA leaderboard claims · among open models (vendor-stated)

Category	Leaderboards	Stated rank
World generation accuracy	Physics-IQ, PAI-Bench, R-Bench, Artificial Analysis	#1*
Robot policy (post-trained Nano)	RoboLab simulation, RoboArena real-world DROID	#1*
Smart-infrastructure vision	VANTAGE-Bench, TAR (Traffic Anomaly Reasoning)	#1*

Source: NVIDIA; all rankings among open models. *Vendor-stated, not independently reproduced at launch.

What can be said with confidence: the rankings span three distinct problem families — world generation, robot policy, and vision understanding — which is itself notable for a single model, since most systems specialize. What cannot be said yet: how Cosmos 3 compares to closed frontier physical-AI systems, or how the open-model rankings hold up under independent evaluation. For sophisticated teams, the right move is the same one we recommend for any new model — run the eval on your own scenes and tasks, not on the press release.
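Running your own eval raises the question of when a difference between two models is real. A common tool for pass/fail task outcomes is the Wilson score interval for a success rate; the Python sketch below is a generic statistical helper, not something NVIDIA ships, and the non-overlap check is deliberately conservative.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def clearly_different(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Conservative check on (successes, trials) pairs: True only if the intervals do not overlap."""
    lo_a, hi_a = wilson_interval(*a)
    lo_b, hi_b = wilson_interval(*b)
    return hi_a < lo_b or hi_b < lo_a
```

With 20 trials per model, 18 successes versus 15 still leaves heavily overlapping intervals. In that situation a leaderboard rank should not settle the choice; run more trials on your own scenes first.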

Trust signal

Physical-AI leaderboards like Physics-IQ, PAI-Bench, RoboArena, and VANTAGE-Bench are relatively new and largely self-reported. Treat NVIDIA's "#1 among open models" as a vendor claim until independent score tables appear — and weight it accordingly in any model-selection decision.

07 · License & Access: Open weights under a Linux Foundation license.

Cosmos 3 ships under OpenMDW-1.1, a license stewarded by the Linux Foundation. It permits commercial use, training, modification, redistribution, and derivative models. The one notable constraint: products built on Cosmos must display "Built on NVIDIA Cosmos" somewhere visible — a website, UI, about page, or documentation. NVIDIA does not claim ownership of outputs generated from Cosmos or its derivatives.

The licensing story is bigger than one model. NVIDIA adopted OpenMDW-1.1 across four model families at once — Cosmos, Isaac GR00T, Ising, and Nemotron — standardizing open-model licensing across its physical-AI, robotics, quantum, and general foundation-model lines. For the coding-model side of that same strategy, see our coverage of NVIDIA's Nemotron open-model strategy. Note that OpenMDW-1.1 is distinct from NVIDIA's prior NVIDIA Open Model License — Cosmos 3 uses the Linux Foundation standard.
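Because the 'Built on NVIDIA Cosmos' attribution travels with every product, a team might add a release check that fails when the notice is missing. The Python sketch below is a hypothetical CI helper: the file suffixes it scans and the choice of docs as the place to look are assumptions, and the OpenMDW-1.1 text, not this script, defines what counts as compliant placement.

```python
import sys
from pathlib import Path

ATTRIBUTION = "Built on NVIDIA Cosmos"


def files_with_attribution(root: str, suffixes: tuple[str, ...] = (".md", ".html", ".txt")) -> list[str]:
    """Return relative paths of files under `root` (with the given suffixes) containing the notice."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            if ATTRIBUTION in path.read_text(encoding="utf-8", errors="ignore"):
                hits.append(str(path.relative_to(root)))
    return hits


if __name__ == "__main__":
    found = files_with_attribution(sys.argv[1] if len(sys.argv) > 1 else ".")
    if not found:
        print(f'missing "{ATTRIBUTION}" in user-facing docs', file=sys.stderr)
        sys.exit(1)  # non-zero exit fails the CI job
    print("attribution found in:", ", ".join(found))
```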


Surface	What you get	Best for
`huggingface.co/nvidia`	Cosmos3-Super + Cosmos3-Nano weights	On-prem deployment, fine-tuning, quantization. Run via the Cosmos3OmniPipeline in Hugging Face Diffusers. Six open synthetic-data datasets are published in the same org for training.
NIM microservices	Docker deploy with an NGC API key	Production integration on managed infra. Partners include CoreWeave, Microsoft Azure, Baseten, Nebius, Deep Infra, and Classmethod. Best when you want managed serving rather than self-hosting weights.
build.nvidia.com	Cosmos 3 Nano Reasoner + full model	GPU-free trial and evaluation. The fastest path to test prompts and judge fit before any infrastructure decision — two hosted experiences are live at launch.

A full builder toolchain ships with the model: Cosmos Curator for data filtering, annotation, and deduplication; Cosmos Evaluator for output scoring; NVIDIA TAO 7 for fine-tuning; and the Cosmos Cookbook of domain-specific recipes, with post-training scripts on the github.com/nvidia/Cosmos repository. If you are weighing an open physical-AI model against alternatives for a specific pipeline, our AI & digital transformation engagements start with exactly this kind of comparative, scenario-grounded evaluation.

08 · Implications: What this means for builders and the ecosystem.

Cosmos 3 also launched with a coalition and a roster of early adopters, which together signal where this is headed. The NVIDIA Cosmos Coalition has six founding members — Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI — drawn deliberately from different corners of the field: Agile Robots is a hardware manufacturer, Black Forest Labs, LTX, and Runway are generative-video labs, Skild AI works on generalist robot policy, and Generalist rounds out the policy side. The coalition's premise is a shared ecosystem with DGX Cloud infrastructure access and open contribution — a different model from a typical vendor-partner program.

Agile Robots is already an early-access partner, using Cosmos 3 to generate action-conditioned training trajectories for its Thor 3 and FR3 robots on complex industrial pick-and-place tasks, with testing inside the European Industrial AI Cloud. The launch adopter list also spans Samsung, LG Electronics, and Doosan Robotics in robotics; Li Auto in autonomous vehicles; and Centific, Fogsphere, Linker Vision, Milestone Systems, and Yuan in vision AI.

Thanks to the training in NVIDIA Cosmos 3, our robotic arms Thor 3 and FR3 can grasp a variation of objects with greater accuracy.— Agile Robots, early-access partner

For builders, the decision tree depends on what you are building. The matrix below sorts the most common physical-AI workload classes. For the broader open-model landscape this fits into, our open-weight frontier models retrospective and our humanoid robotics pipeline coverage give useful context.

Workload	Guidance	Recommendation
Synthetic data generation (world-model rollouts for training)	Generating rare scenarios and labeled rollouts at scale is the clearest immediate win, replacing chained specialist models with one. Start on Super for throughput; validate output quality with Cosmos Evaluator before trusting the data.	Pick Cosmos 3 Super
On-prem prototyping (workstation-scale evaluation)	Most teams should prototype on Nano on an RTX PRO 6000 (or the GPU-free build.nvidia.com trial) before committing datacenter spend. Benchmark on your own scenes; the leaderboards are vendor-stated.	Pick Cosmos 3 Nano
Embedded / on-robot (device-side inference)	Edge (4B, Jetson) is the eventual fit, but it is not released yet. Do not block a roadmap on it. Prototype the policy on Nano now and migrate to Edge once it ships, treating the timeline as unannounced.	Wait on Edge
Production stack (managed serving vs self-host)	If you want managed infra, deploy via NIM microservices on a partner cloud; if you need sovereignty or fine-tuning control, self-host the open weights. Either way, the OpenMDW-1.1 'Built on NVIDIA Cosmos' attribution applies.	Route by control needs

09 · Conclusion: The first open omnimodel for the physical world.

The shape of open physical AI, June 2026

The real story is pipeline collapse, not a single benchmark.

Cosmos 3 is a genuine step in open physical AI: one model that reasons over a scene, generates the world, predicts outcomes, and outputs the action signals a robot needs — under a permissive Linux Foundation license with weights you can download today. The two-tower Mixture-of-Transformers design, where generation is always conditioned on reasoning, is the engineering choice that makes the omnimodel framing more than marketing.

The most consequential change is not any leaderboard position; every one of those is vendor-stated, among open models, and unproven by independent evaluation at launch. It is the pipeline collapse: replacing chained perception, world, and policy models with a single model removes model-to-model handoffs, the latency they add, and a large slice of MLOps complexity. That is a structural advantage that holds even if the headline rankings move.

The practical move for any serious team is the same as with any new model: ignore the press-release numbers, download Nano or open the build.nvidia.com trial, and run the evaluation on the scenes and tasks you actually care about. Cosmos 3 is the strongest open starting point for physical AI right now — but "strongest open starting point" is a reason to test it seriously, not a reason to ship it on faith.
