NVIDIA's COMPUTEX 2026 keynote did something no prior NVIDIA event has done in a single sitting: it pushed every compute tier — personal PC, deskside workstation, standalone CPU, rack-scale server, and physical-AI robot — into the agentic-AI era at once. On May 31, 2026, GTC Taipei was less a product launch than a unified thesis about where computing is heading.

The headline products are RTX Spark, a superchip that runs large models locally on a thin laptop; Vera, NVIDIA's first CPU designed specifically for AI agents; and Vera Rubin, the rack-scale system NVIDIA says has entered full production. Around them sit DGX Station for Windows, the open Cosmos 3 physical-AI model, and the Nemotron 3 model family. Each one is a story; together they are a strategy.

This is a builder's first take, not a press recap. We cover what actually shipped versus what was merely announced, which numbers are independently confirmed versus vendor-stated, and the practical question every engineering team should be asking after a keynote like this: which of my workloads can move on-device, and which still need a rack? Everything below is sourced to NVIDIA's newsroom and corroborating trade press from launch day.

Key takeaways

01. Every compute tier moved on the same day. RTX Spark (laptop), DGX Station for Windows (deskside), Vera (standalone CPU), Vera Rubin NVL72 (rack) and Cosmos 3 / Isaac GR00T (robotics) were announced together, forming a coherent agentic stack from pocket to data center.
02. RTX Spark rewrites the on-device privacy calculus. A 70-billion-transistor superchip on TSMC 3nm pairing a 20-core Grace CPU with a Blackwell GPU. NVIDIA says it runs 120B-parameter models locally at 1M-token context on a 14mm laptop, so sensitive data need never leave the machine.
03. Vera is the first CPU pitched at agents, not throughput. 88 Olympus cores, up to 1.2 TB/s memory bandwidth, designed for branch-heavy, memory-sensitive agentic code: tool calls, sandboxed execution, orchestration. NVIDIA reports more than 1.8x agentic-sandbox performance over x86 in its own tests.
04. Vera Rubin is in production, but the 10x is vendor-stated. NVIDIA says Vera Rubin entered full production with shipments beginning fall 2026 and a 10x agent-throughput gain over Grace Blackwell. That figure has no independent third-party replication as of the keynote; treat it as a strong signal, not a design spec.
05. The open-model lineup grew, with timing caveats. Cosmos 3 (physical-AI omnimodel) shipped in Super and Nano sizes; Nemotron 3 Ultra was announced with weights scheduled for June 4. The post-keynote release date means Ultra is an announcement, not yet a usable model.

01 — The Thesis: Five compute tiers, one keynote.

Most COMPUTEX coverage reads as a list of separate product launches. The more useful framing is that NVIDIA moved all five of its compute tiers into the agentic era on the same stage, on the same day. That simultaneity is the actual news: it signals that NVIDIA now treats "an agent" as the unit of work across the entire hardware ladder, from a laptop in your bag to a multi-rack AI factory.

The five tiers map cleanly onto where work happens. RTX Spark is the personal PC. DGX Station for Windows is the deskside workstation. Vera is the standalone CPU for agentic servers. Vera Rubin NVL72 is the rack-scale AI factory engine, and Vera Rubin POD knits five racks into one supercomputer. Cosmos 3 and the Isaac GR00T humanoid reference design extend the same stack out into physical AI.

"The PC is being reinvented. For forty years, you launched apps. Click. Type. With RTX Spark and Microsoft Windows, you ask — and the PC does the work." (Jensen Huang, NVIDIA CEO, COMPUTEX 2026 keynote)

That "you ask, the PC does the work" line is the through-line of the whole keynote. If the agent is the new unit of compute, then every tier needs to be re-architected around the agent's actual workload pattern — long context, lots of tool calls, branch-heavy reasoning, and unpredictable bursts of retrieval. NVIDIA's argument is that yesterday's silicon, optimized for dense matrix throughput, is not the right shape for that pattern. Whether the silicon delivers is the question the rest of this post pressure-tests.

02 — Personal AI: RTX Spark, a frontier model in your bag.

RTX Spark is the most consequential consumer announcement of the keynote. It is a 70-billion-transistor superchip built on TSMC's 3nm process, pairing a 20-core NVIDIA Grace CPU (developed with MediaTek, codename N1X) with a Blackwell RTX GPU over NVLink-C2C at 600 GB/s. NVIDIA positions it as the first Windows laptop chip to run the full CUDA software stack natively — a meaningful detail, because it means the existing CUDA developer ecosystem transfers to the device without a porting effort.

The specifications that matter for builders are the memory and the local-model claim. RTX Spark carries up to 128 GB of unified LPDDR5X memory at 300 GB/s, 6,144 Blackwell CUDA cores with fifth-generation Tensor Cores, and what NVIDIA states is 1 petaflop of FP4 AI performance. NVIDIA says that combination runs 120B-parameter LLMs locally with a 1-million-token context window — all vendor-stated, but corroborated in part by Tom's Hardware and Microsoft's own Windows Experience blog on launch day.
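A back-of-envelope check shows why the memory figure carries the local-model claim. The sketch below estimates weights plus KV cache for a 120B model at a 1M-token context. Every architectural parameter in it (36 layers, 8 KV heads, head dimension 64, and about 4.5 effective bits per weight for a 4-bit format with block scales) is a hypothetical assumption for illustration, not a published RTX Spark or model spec.

```python
from dataclasses import dataclass
from typing import Tuple

GB = 1e9  # decimal gigabytes, the unit memory capacities are usually quoted in


@dataclass
class ModelShape:
    """Hypothetical transformer shape for illustration; not a real model card."""
    params: float           # total parameters
    bits_per_weight: float  # effective bits per weight, including quantization scales
    layers: int
    kv_heads: int           # key/value heads (grouped-query attention)
    head_dim: int
    kv_bytes: float         # bytes per cached K or V element (1.0 = 8-bit, 2.0 = 16-bit)


def weight_bytes(m: ModelShape) -> float:
    return m.params * m.bits_per_weight / 8


def kv_cache_bytes(m: ModelShape, context_tokens: int) -> float:
    # Every token caches one K and one V vector per layer per KV head.
    per_token = 2 * m.layers * m.kv_heads * m.head_dim * m.kv_bytes
    return per_token * context_tokens


def fits(m: ModelShape, context_tokens: int, capacity_gb: float) -> Tuple[float, bool]:
    """Return (weights + KV cache in GB, whether that fits in capacity_gb).

    Ignores activations, the OS, and runtime overhead, which share the same pool.
    """
    total_gb = (weight_bytes(m) + kv_cache_bytes(m, context_tokens)) / GB
    return total_gb, total_gb <= capacity_gb


if __name__ == "__main__":
    for kv_bytes, label in [(1.0, "8-bit KV cache"), (2.0, "16-bit KV cache")]:
        shape = ModelShape(params=120e9, bits_per_weight=4.5, layers=36,
                           kv_heads=8, head_dim=64, kv_bytes=kv_bytes)
        total, ok = fits(shape, context_tokens=1_000_000, capacity_gb=128)
        print(f"{label}: {total:.1f} GB -> {'fits' if ok else 'does not fit'} in 128 GB")
```

With these made-up dimensions the weights alone take 67.5 GB, and KV-cache precision decides whether a 1M-token context fits (about 104 GB at 8-bit, about 141 GB at 16-bit). Models with sliding-window or compressed attention cache far less, which is one reason the claim needs checking model by model on shipping silicon.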

The chip · RTX Spark

70B transistors · TSMC 3nm · NVLink-C2C 600 GB/s. A 20-core Grace (Arm) CPU paired with a Blackwell RTX GPU carrying 6,144 CUDA cores; the first Windows laptop chip to run the full CUDA stack natively. NVIDIA-stated 1 petaflop FP4.

OEMs: ASUS, Dell, HP, Lenovo, Surface, MSI

The claim · Local 120B at 1M context

128 GB unified LPDDR5X · 300 GB/s. Up to 128 GB of unified memory means a large model and a long context can both sit on-device. NVIDIA-stated: 120B-parameter models, 1M-token windows, no data leaving the laptop.

Vendor-stated · verify on shipping silicon

The security layer · NVIDIA OpenShell

Per-agent sandboxes · YAML policy. A new runtime for personal agents on Windows: individual agent sandboxes, declarative policy enforcement, identity, containment, and end-to-end encryption. Ships alongside RTX Spark.

github.com/NVIDIA/OpenShell
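NVIDIA describes OpenShell policy as declarative YAML, but we have not reproduced its schema here. As a hypothetical illustration of the general pattern (deny by default, with each agent granted an explicit allowlist of tools and outbound hosts), a minimal Python sketch might look like this. The policy is shown as an already-parsed dictionary, and every field name is invented for the example; none of it is OpenShell's API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-agent policy, shaped like a parsed YAML document.
# Field names are invented for illustration; they are NOT OpenShell's schema.
POLICY = {
    "agents": {
        "crm-summarizer": {
            "tools": ["read_file", "search_crm"],
            "network": ["crm.internal.example"],
        },
        "code-runner": {
            "tools": ["run_python"],
            "network": [],  # sandboxed code gets no outbound network at all
        },
    }
}


@dataclass(frozen=True)
class Request:
    agent: str
    tool: str
    host: Optional[str] = None  # outbound host the tool call wants to reach, if any


def is_allowed(policy: dict, req: Request) -> bool:
    """Deny by default: unknown agents, unlisted tools, and unlisted hosts are refused."""
    rules = policy.get("agents", {}).get(req.agent)
    if rules is None:
        return False
    if req.tool not in rules.get("tools", []):
        return False
    if req.host is not None and req.host not in rules.get("network", []):
        return False
    return True


print(is_allowed(POLICY, Request("code-runner", "run_python")))              # True
print(is_allowed(POLICY, Request("code-runner", "run_python", "pypi.org")))  # False
print(is_allowed(POLICY, Request("unknown-agent", "read_file")))             # False
```

The design choice worth copying is the default: anything the policy does not name is refused, so adding a new agent or tool is an explicit, reviewable change.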

The proprietary insight

The real RTX Spark story for enterprise is the privacy calculus. If NVIDIA's 120B-at-1M-context claim holds on shipping silicon, the entire context of an enterprise agent — proprietary code, customer records, internal documents — can stay on a single laptop and never touch a cloud API. At 128 GB of unified memory and 300 GB/s bandwidth, the binding constraint for a 120B model at long context is no longer the silicon; it is the OEM's thermal design in a 14mm chassis. That is the spec to watch when devices ship in fall 2026.
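Memory bandwidth gives a second sanity check worth running when devices ship. During token-by-token decoding, each new token generally requires streaming the active weights from memory, so bandwidth divided by bytes read per token gives a rough upper bound on single-stream decode speed. The sketch below assumes about 4.5 effective bits per weight and ignores KV-cache reads, batching, and speculative decoding, all of which move the real number; the 5B-active mixture-of-experts case is a hypothetical shape, not a named model.

```python
def decode_ceiling_tokens_per_s(active_params: float, bits_per_weight: float,
                                bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed when weight reads dominate.

    Ignores KV-cache traffic, compute limits, batching, and speculative decoding.
    """
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    bw = 300  # GB/s, RTX Spark's NVIDIA-stated unified-memory bandwidth
    dense = decode_ceiling_tokens_per_s(120e9, 4.5, bw)
    moe = decode_ceiling_tokens_per_s(5e9, 4.5, bw)  # hypothetical MoE with ~5B active
    print(f"dense 120B: ~{dense:.1f} tok/s ceiling")
    print(f"MoE, 5B active: ~{moe:.0f} tok/s ceiling")
```

Under these assumptions a dense 120B model tops out around 4 tokens per second, while a sparse model of the same total size can be an order of magnitude faster. Read the gap as a hint about which model shapes are practical on a 300 GB/s laptop, not as a prediction; sustained performance inside the chassis thermal limit sets the real figure.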

One caution. Trade press (StorageReview) described RTX Spark as "roughly RTX 5070-class" for gaming, but that comparison is not in NVIDIA's official materials, so treat it as informed speculation rather than a spec. The same goes for the headline laptop dimensions — as thin as 14mm, as light as 3 lbs, single-digit watts up to roughly 80W — which are NVIDIA's targets, not measurements of a shipping product. RTX Spark is announced for fall 2026, with more than 30 laptop and 10-plus desktop configurations planned across the ecosystem; it is not sampling or in production today.

03 — The CPU for Agents: Vera, a CPU shaped for agents, not throughput.

Vera is the most architecturally interesting announcement for anyone building agentic systems. NVIDIA's pitch is that the CPU — not the GPU — has become the bottleneck for agents, because an agent spends enormous time on branch-heavy, memory-sensitive work that GPUs are bad at: tool calls, sandboxed code execution, Python and JavaScript runs, data retrieval, and orchestration. Vera is the first CPU NVIDIA has designed explicitly around that pattern.

The chip carries 88 NVIDIA Olympus cores with up to 1.2 TB/s of LPDDR5X memory bandwidth. NVIDIA reports up to 50% higher IPC than the previous-generation Grace core and roughly 40% lower peak memory latency than x86, plus support for Spatial Multithreading and up to 1.8 TB/s of coherent GPU-to-CPU bandwidth via NVLink-C2C. The named early adopters read like a who's-who of frontier labs and clouds: Anthropic, OpenAI, ByteDance, CoreWeave, Oracle Cloud Infrastructure, Lambda, Cloudflare, Akamai, and Together AI among them.

"AI agents will be the largest users of computing. Vera is the first CPU designed for that future." (Jensen Huang, NVIDIA CEO, COMPUTEX 2026 keynote)

The headline performance claim (more than 1.8x higher agentic sandbox performance than x86 CPUs across code compilation, code analysis, and Python execution) is where editorial discipline matters. NVIDIA does not identify the x86 SKU, the configuration, or the test harness behind that number, and there is no independent replication. We list it below alongside Vera's other headline figures, explicitly marked as NVIDIA-reported.
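One way to reason about the 1.8x figure is Amdahl's law: a faster CPU only speeds up the share of an agent's wall-clock time that is actually CPU-bound. The sketch below takes that share as an input you would measure on your own agents; the fractions in the example are hypothetical, and the 1.8x is NVIDIA's unreplicated number.

```python
def end_to_end_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the CPU-bound share gets faster."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


if __name__ == "__main__":
    vendor_claim = 1.8  # NVIDIA-reported sandbox speedup vs x86, not independently replicated
    for f in (0.2, 0.5, 0.8):  # hypothetical share of agent time spent in tool calls and sandboxes
        print(f"CPU-bound share {f:.0%}: {end_to_end_speedup(f, vendor_claim):.2f}x end to end")
```

Even if the 1.8x holds, an agent that spends half its time waiting on model calls sees roughly 1.29x end to end, which is why measuring your own CPU-bound share matters more than the headline multiplier.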

Olympus cores · 88 per Vera CPU

NVIDIA's new core, built for branch-heavy agentic code rather than raw FLOPs. Spatial Multithreading targets the unpredictable, memory-bound work an agent actually does between model calls. +50% IPC vs the Grace generation (vendor-stated).

Memory bandwidth · 1.2 TB/s LPDDR5X per CPU

High bandwidth is the point: agentic workloads are memory-sensitive, not just compute-sensitive. Up to 1.8 TB/s of coherent bandwidth links the CPU to the GPU over NVLink-C2C.

Sandbox perf vs x86 · 1.8× (NVIDIA internal benchmark)

NVIDIA reports more than 1.8x agentic-sandbox performance over x86 across compilation, analysis, and Python execution. The x86 SKU and harness are unspecified; no independent replication.

Vendor-stated · treat as signal

04 — The Rack: Vera Rubin, full production with a 10x claim attached.

Vera Rubin is the rack-scale story, and NVIDIA's framing is that it entered full production as of the keynote, with shipments beginning fall 2026. Both the Vera CPU and the Rubin GPU are built on TSMC's 3nm process — a generation ahead of the N4 used for Grace Blackwell. The flagship configuration, Vera Rubin NVL72, packs 36 Vera CPUs and 72 Rubin GPUs unified by NVLink Switch 6, delivering a stated 3.6 exaFLOPS of NVFP4 inference, 75 TB of fast memory, and 260 TB/s of scale-up bandwidth in a single rack.
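Dividing the rack-level figures by the component counts gives a per-GPU view that is easier to compare with other tiers. This is plain arithmetic on NVIDIA's stated numbers under one assumption, that the 3.6 exaFLOPS and 260 TB/s are spread evenly across the 72 GPUs; the 75 TB memory figure is left aside because the source does not break it down per GPU.

```python
# NVIDIA-stated Vera Rubin NVL72 rack totals (vendor figures, not measurements).
RACK = {
    "nvfp4_exaflops": 3.6,   # NVFP4 inference, whole rack
    "scale_up_tb_s": 260.0,  # NVLink scale-up bandwidth, whole rack
    "gpus": 72,
    "cpus": 36,
}


def per_gpu_view(rack: dict) -> dict:
    """Split rack totals evenly across GPUs (an assumption, not a measured figure)."""
    return {
        "nvfp4_petaflops_per_gpu": rack["nvfp4_exaflops"] * 1000 / rack["gpus"],
        "scale_up_tb_s_per_gpu": rack["scale_up_tb_s"] / rack["gpus"],
        "gpus_per_vera_cpu": rack["gpus"] / rack["cpus"],
    }


if __name__ == "__main__":
    for key, value in per_gpu_view(RACK).items():
        print(f"{key}: {value:.2f}")
```

Under the even-split assumption that works out to about 50 petaflops of NVFP4 and about 3.6 TB/s of scale-up bandwidth per GPU, with two Rubin GPUs per Vera CPU.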

Above the rack sits Vera Rubin POD: five purpose-built racks operating as one integrated AI supercomputer, unifying Vera Rubin NVL72, the Vera CPU, Groq 3 LPX, BlueField-4 storage, and Spectrum-6 Ethernet. Jensen Huang described the supply chain behind it as "twice as large as Grace Blackwell," involving 150 ecosystem partners in Taiwan and more than 350 factories across 30 countries. Cloud availability is slated for the second half of 2026 at providers including CoreWeave, Lambda, Oracle Cloud Infrastructure, Microsoft Azure, and IBM Cloud.

The number to watch

NVIDIA's headline rack claim is 10x agent throughput at scale versus the previous-generation Grace Blackwell platform. The comparison baseline is "the previous-generation platform," not a named GPU SKU, and there is no independent third-party replication as of the keynote. Read it as a strong directional signal about where rack-scale agentic inference is heading — not a guaranteed multiplier you can write into a capacity plan.
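If you pilot Vera Rubin in the cloud, the number that belongs in a capacity plan is the speedup you measure on your own traffic. A minimal sketch, assuming you have collected per-request throughput samples (tokens per second) from the same workload traces replayed on both platforms: take the ratio of medians and put a percentile-bootstrap interval around it. The sample values are made up for illustration, and only the Python standard library is used.

```python
import random
import statistics
from typing import List, Tuple


def median_ratio(new: List[float], old: List[float]) -> float:
    """Speedup as the ratio of median per-request throughput (new / old)."""
    return statistics.median(new) / statistics.median(old)


def bootstrap_ci(new: List[float], old: List[float], iters: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap confidence interval for median_ratio."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(iters):
        # Resample each platform's measurements independently, with replacement.
        resampled_new = rng.choices(new, k=len(new))
        resampled_old = rng.choices(old, k=len(old))
        ratios.append(median_ratio(resampled_new, resampled_old))
    ratios.sort()
    lo = ratios[int(alpha / 2 * iters)]
    hi = ratios[min(iters - 1, int((1 - alpha / 2) * iters))]
    return lo, hi


if __name__ == "__main__":
    # Made-up tokens/second per request from replaying the same traces on each platform.
    baseline = [410, 395, 430, 388, 402, 415, 399, 421]
    candidate = [1210, 1180, 1265, 1150, 1232, 1198, 1240, 1175]
    point = median_ratio(candidate, baseline)
    lo, hi = bootstrap_ci(candidate, baseline)
    print(f"measured speedup {point:.2f}x (95% CI {lo:.2f}x to {hi:.2f}x)")
```

A wide interval is a signal to collect more traces before committing capital; a narrow one around your own measured ratio is what a capacity plan can safely lean on.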

"Agentic AI is a new kind of workload. One prompt can launch a thousand-step journey of reasoning, retrieval, tool use and response generation." (Jensen Huang, NVIDIA CEO, COMPUTEX 2026 keynote)

05 — The Matrix: One table across every tier.

Every COMPUTEX recap we have seen is siloed by product. The table below is our attempt to put all the compute tiers side by side so a builder can read across them in one pass and see which tier fits a given inference budget. Figures are drawn from NVIDIA's newsroom releases, Tom's Hardware, and VideoCardz's NVL72 deep-dive; performance numbers marked as such are vendor-stated.


Tier	Headline specs	Where it fits
RTX Spark (personal PC)	1 PF FP4 · 128 GB unified · 300 GB/s	On-device agents that must keep data local. NVIDIA-stated 120B models at 1M context on a 14mm laptop. Fall 2026.
DGX Station for Windows (deskside)	Up to 20 PF FP4 · 748 GB coherent	A trillion-parameter-class model on an enterprise desk. GB300 Grace Blackwell Ultra, 800 Gb/s networking. Q4 2026.
Vera CPU (standalone)	88 Olympus cores · 1.2 TB/s memory	Agentic servers: branch-heavy orchestration, sandboxed code, tool calls. The CPU layer of an AI factory.
Vera Rubin NVL72 (rack)	3.6 EF NVFP4 · 75 TB · 260 TB/s	Rack-scale agentic inference. 36 Vera CPUs + 72 Rubin GPUs. Full production; cloud H2 2026.
Vera Rubin POD (multi-rack)	Five racks as one supercomputer	AI-factory scale. NVL72 + Vera CPU + Groq 3 LPX + BlueField-4 + Spectrum-6 unified into one system.

Reading across the matrix, the pattern NVIDIA wants you to see is a continuum rather than a catalog. The same agentic workload — a long-context reasoning loop with heavy tool use — can run on a laptop for privacy, a deskside box for a single power user, or a rack for production scale, with the open-model lineup and CUDA stack carrying across all of them. The practical decision is no longer "cloud or local" as a binary; it is choosing the tier that matches each workload's privacy, latency, and cost profile.
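A useful first-pass filter when mapping a workload to a tier is whether the model's weights fit in the tier's memory at all. The sketch below uses the capacities from the matrix (128 GB unified on RTX Spark, 748 GB coherent on DGX Station for Windows, 75 TB on a Vera Rubin NVL72 rack), checks tiers from smallest to largest, and assumes about 4.5 effective bits per weight plus a flat 20% headroom for KV cache and runtime. Both figures are illustrative assumptions; long-context workloads can need far more headroom.

```python
from typing import Optional

# Memory capacities as stated in the tier matrix, in GB. The NVL72 entry is
# NVIDIA's 75 TB "fast memory" figure for the whole rack.
TIERS_GB = {
    "RTX Spark": 128,
    "DGX Station for Windows": 748,
    "Vera Rubin NVL72": 75_000,
}


def weights_need_gb(params: float, bits_per_weight: float = 4.5,
                    headroom: float = 0.20) -> float:
    """Weight footprint plus a flat headroom fraction for KV cache and runtime (assumed)."""
    return params * bits_per_weight / 8 / 1e9 * (1 + headroom)


def smallest_fitting_tier(params: float, **kwargs: float) -> Optional[str]:
    need = weights_need_gb(params, **kwargs)
    for name, capacity in sorted(TIERS_GB.items(), key=lambda item: item[1]):
        if need <= capacity:
            return name
    return None  # larger than a single rack: a multi-rack POD or a smaller model


if __name__ == "__main__":
    for label, params in [("120B", 120e9), ("Nemotron 3 Ultra (~550B)", 550e9), ("1T", 1e12)]:
        print(f"{label}: needs ~{weights_need_gb(params):.0f} GB -> {smallest_fitting_tier(params)}")
```

Under these assumptions a 120B model lands on RTX Spark, while Nemotron 3 Ultra and a trillion-parameter model both need at least the deskside tier. Memory fit is only a floor; privacy, latency, and cost still decide the final placement.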

06 — Open Models: Cosmos 3 and Nemotron 3, software to feed the silicon.

Hardware needs models to run, and NVIDIA used the keynote to extend two open-model lines. Cosmos 3 is billed as the world's first open physical-AI omnimodel, using a mixture-of-transformers architecture that pairs a reasoning transformer with an expert generation transformer. It natively handles text, images, video, ambient sound, and action trajectories in a single model, and ships in two sizes available immediately — Super at 32B parameters and Nano at 8B — via build.nvidia.com and Hugging Face, with an Edge size to follow.

NVIDIA reports that, among open models, Cosmos 3 ranks first on seven or more robotics benchmarks, including Physics-IQ, R-Bench, RoboArena, and VANTAGE-Bench. Keep the qualifier: this is a claim about ranking among open models, not against closed frontier models, and the benchmark results are vendor-reported with independent verification still pending. NVIDIA also says Cosmos 3 was trained on roughly 20 trillion multimodal tokens — itself a vendor-stated figure.

Physical AI · Cosmos 3

Super 32B · Nano 8B · Edge soon. An open physical-AI omnimodel handling text, image, video, audio, and action in one mixture-of-transformers model. NVIDIA-stated #1 among open models on 7+ robotics benchmarks.

build.nvidia.com · Hugging Face

Open language · Nemotron 3 family

Ultra ~550B · Super ~100B · Nano 30B. Hybrid latent-MoE models. Nano (4x throughput vs Nemotron 2 Nano) was available at keynote time; Super and Ultra use NVIDIA's NVFP4 4-bit training format on Blackwell.

Ultra weights scheduled June 4, 2026

Robotics · Isaac GR00T reference

Unitree H2 Plus · Jetson AGX Thor. An open reference humanoid design (~6 ft, 31 DOF) on a Blackwell-powered Jetson Thor. Early adopters include ETH Zurich, Stanford, and UC San Diego. Hardware late 2026.

Open reference design

Date discipline

Nemotron 3 Ultra, a roughly 550B-parameter hybrid MoE model with about 55B active parameters, was announced at the keynote, but its weights are scheduled to drop on Hugging Face, ModelScope, and OpenRouter on June 4, 2026, four days after the keynote. Treat Ultra as an announcement, not a model you can pull today, and treat its stated 300+ tokens/second and 30%-lower-cost figures as vendor claims awaiting independent benchmarking. Nemotron 3 Nano was the family member available at keynote time.

07 — Reading the Numbers: The honest broker on vendor claims.

A keynote this dense is also a wall of benchmarks, and most of them share a property worth naming plainly: they are NVIDIA numbers, run on NVIDIA hardware, evaluated by NVIDIA. That does not make them wrong. It makes them a strong vendor signal rather than a verified design spec — and the difference matters when you are committing a budget. The chart below sorts the keynote's biggest performance claims by how confidently a builder should lean on them today.

How much to lean on each keynote claim · builder confidence

Source: our confidence reading of NVIDIA COMPUTEX 2026 claims + corroborating trade press

RTX Spark specs (transistors, cores, memory) · Confirmed-ish · corroborated by Tom's Hardware + Microsoft
Vera Rubin NVL72 configuration (72 GPUs / 36 CPUs / 260 TB/s) · Confirmed-ish · corroborated by VideoCardz
RTX Spark 120B @ 1M context local · Vendor-stated · partial third-party corroboration
Vera 1.8x agentic sandbox vs x86 · Vendor-stated · no x86 SKU, config, or harness disclosed
Vera Rubin 10x throughput vs Grace Blackwell · Vendor-stated · baseline is a platform, not a named SKU; no replication
Cosmos 3 #1 on 7+ robotics benchmarks · Vendor-stated · among open models only; verification pending

The interpretive point is not skepticism for its own sake. NVIDIA has a long track record of shipping silicon that broadly delivers on its architectural promises, and the configuration-level facts — core counts, memory capacities, process nodes — are the kind of thing independent outlets like Tom's Hardware and VideoCardz confirmed on launch day. It is the comparative multipliers — 1.8x, 10x, 300+ tokens/second — that should carry an asterisk until a third party runs the same workload on the same hardware. Build your plan on the confirmed specs; treat the multipliers as upside.

"These are NVIDIA benchmarks, run on NVIDIA hardware, evaluated by NVIDIA. Treat them as strong signals, not design specs." (Digital Applied, our reading of the COMPUTEX 2026 claims)

08 — What To Do: What builders should actually do next.

None of this hardware ships before fall 2026, so the right move now is not procurement — it is workload classification. The single most valuable exercise after this keynote is to audit your agentic workloads and sort each one by where it should run: on-device for privacy, deskside for a power user, or rack-scale for production throughput. The decision matrix below is how we frame that for clients.

Privacy-bound agents · On-device everything

If your agent touches regulated data, IP, or customer records, RTX Spark's local 120B-at-1M-context claim is the spec to validate when devices ship. No cloud round-trip means a fundamentally different compliance posture.

Next step: watch RTX Spark

Single power user · Deskside frontier

For one engineer or analyst who needs a trillion-parameter-class model with low latency and full data control, DGX Station for Windows is the deskside tier. It arrives in Q4 2026 with pricing undisclosed, so budget conservatively.

Next step: evaluate DGX Station

Production scale · Rack-scale inference

High-volume agentic inference still belongs on the rack. Vera Rubin NVL72 cloud availability starts H2 2026; pilot on a cloud provider before committing capital, and benchmark the 10x claim on your own traffic.

Next step: pilot in the cloud

Open-model strategy · Cosmos 3 + Nemotron 3

Cosmos 3 Super/Nano are downloadable today for physical-AI and multimodal work. Nemotron 3 Nano is available now; Ultra arrives June 4. Start evaluating the open lineup on your own tasks before the silicon lands.

Next step: test open models now
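The decision matrix above reduces to a few routing rules. As a hypothetical sketch of how a team might encode its own workload audit (field names and thresholds are illustrative assumptions, not NVIDIA guidance), each workload is tagged with data sensitivity, user count, request volume, and model size, and the first matching rule picks a tier and records why.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Tier(Enum):
    ON_DEVICE = "on-device (RTX Spark class)"
    DESKSIDE = "deskside (DGX Station class)"
    RACK = "rack-scale (Vera Rubin NVL72, piloted in the cloud)"


@dataclass
class Workload:
    name: str
    sensitive_data: bool    # regulated data, IP, or customer records in the agent's context
    users: int              # people who depend on this agent
    requests_per_day: int   # expected production volume
    model_params_b: float   # model size in billions of parameters


# Assumptions for illustration: the on-device ceiling is NVIDIA's unverified 120B claim,
# and the volume threshold is a made-up number to replace with your own cost model.
ON_DEVICE_MAX_PARAMS_B = 120
RACK_VOLUME_PER_DAY = 50_000


def route(w: Workload) -> Tuple[Tier, str]:
    """First matching rule wins. Mirrors the decision matrix; not NVIDIA guidance."""
    if w.users > 1 or w.requests_per_day >= RACK_VOLUME_PER_DAY:
        reason = "shared or high-volume traffic"
        if w.sensitive_data:
            reason += "; sensitive data needs a private-deployment review"
        return Tier.RACK, reason
    if w.model_params_b > ON_DEVICE_MAX_PARAMS_B:
        return Tier.DESKSIDE, "single user, model above the stated on-device ceiling"
    if w.sensitive_data:
        return Tier.ON_DEVICE, "single user, sensitive data stays on the laptop"
    return Tier.ON_DEVICE, "single user, model within the stated on-device ceiling"


if __name__ == "__main__":
    audit = [
        Workload("contract-review", True, 1, 40, 120),
        Workload("research-analyst", False, 1, 200, 1000),
        Workload("support-agent", True, 300, 80_000, 120),
    ]
    for w in audit:
        tier, why = route(w)
        print(f"{w.name}: {tier.value} ({why})")
```

Rule order encodes priorities: shared traffic routes to the rack before privacy is considered, which is why the sketch flags sensitive shared workloads for review instead of silently sending them to a laptop.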

For most agencies and engineering teams, the honest near-term action is to run your own evaluations on the open models that are already downloadable, and to build a cost-and-privacy model for each agentic workload so you are ready to map them onto tiers the moment hardware ships. If you are weighing on-device versus rack-scale deployment for a specific agentic pipeline, our AI and digital transformation engagements start with exactly this kind of workload audit. For teams further along, our CRM automation work is a frequent first home for privacy-bound, on-device agents.

This keynote is best read alongside the longer NVIDIA arc. The production ramp announced here is the operational follow-through on NVIDIA's trillion-dollar AI infrastructure pipeline from the earlier GTC keynote, and the Nemotron 3 and OpenShell announcements extend the enterprise agentic-AI platform unveiled at GTC 2026. Nemotron 3 Ultra is the direct successor to Nemotron 3 Super, NVIDIA's open coding model.

09 — Conclusion: The agentic stack, from pocket to data center.

The shape of agentic compute, May 2026

NVIDIA stopped selling chips and started selling a tier for every agent.

COMPUTEX 2026 was not a single product launch; it was a unified argument that the agent is now the unit of compute, and that every tier of hardware needs to be re-shaped around it. RTX Spark puts a frontier-class model on a laptop, Vera redesigns the CPU around branch-heavy agentic work, and Vera Rubin scales the same idea to a rack and a POD — with Cosmos 3 and Nemotron 3 supplying the open models to run across all of it.

The discipline a builder needs is to separate the confirmed from the vendor-stated. The configuration facts — process nodes, core counts, memory capacities — are solid and partly corroborated by independent outlets. The comparative multipliers — 1.8x for Vera, 10x for Vera Rubin, 300-plus tokens per second for Nemotron 3 Ultra — are NVIDIA's own numbers, awaiting third-party replication. Lean on the first; treat the second as upside to verify.

With hardware shipping in fall 2026, the work to do now is not buying — it is classifying. Audit your agentic workloads, sort them by privacy, latency, and cost, and decide which ones belong on-device, which on a desk, and which on a rack. Teams that finish that exercise before the silicon arrives will be the ones that can move the moment it does, instead of starting the conversation from scratch.

