NVIDIA's COMPUTEX 2026 keynote did something no prior NVIDIA event has done in a single sitting: it pushed every compute tier — personal PC, deskside workstation, standalone CPU, rack-scale server, and physical-AI robot — into the agentic-AI era at once. On May 31, 2026, GTC Taipei was less a product launch than a unified thesis about where computing is heading.
The headline products are RTX Spark, a superchip that runs large models locally on a thin laptop; Vera, NVIDIA's first CPU designed specifically for AI agents; and Vera Rubin, the rack-scale system NVIDIA says has entered full production. Around them sit DGX Station for Windows, the open Cosmos 3 physical-AI model, and the Nemotron 3 model family. Each one is a story; together they are a strategy.
This is a builder's first take, not a press recap. We cover what actually shipped versus what was merely announced, which numbers are independently confirmed versus vendor-stated, and the practical question every engineering team should be asking after a keynote like this: which of my workloads can move on-device, and which still need a rack? Everything below is sourced to NVIDIA's newsroom and corroborating trade press from launch day.
- 01Every compute tier moved on the same day.RTX Spark (laptop), DGX Station for Windows (deskside), Vera (standalone CPU), Vera Rubin NVL72 (rack) and Cosmos 3 / Isaac GR00T (robotics) were announced together — a coherent agentic stack from pocket to data center.
- 02RTX Spark rewrites the on-device privacy calculus.A 70-billion-transistor superchip on TSMC 3nm pairing a 20-core Grace CPU with a Blackwell GPU. NVIDIA says it runs 120B-parameter models locally at 1M-token context on a 14mm laptop, so sensitive data need never leave the machine.
- 03Vera is the first CPU pitched at agents, not throughput.88 Olympus cores, up to 1.2 TB/s memory bandwidth, designed for branch-heavy, memory-sensitive agentic code: tool calls, sandboxed execution, orchestration. NVIDIA reports more than 1.8x agentic-sandbox performance over x86 in its own tests.
- 04Vera Rubin is in production, but the 10x is vendor-stated.NVIDIA says Vera Rubin entered full production with shipments beginning fall 2026 and a 10x agent-throughput gain over Grace Blackwell. That figure has no independent third-party replication as of the keynote — treat it as a strong signal, not a design spec.
- 05The open-model lineup grew, with timing caveats.Cosmos 3 (physical-AI omnimodel) shipped in Super and Nano sizes; Nemotron 3 Ultra was announced with weights scheduled for June 4. The post-keynote release date means Ultra is an announcement, not yet a usable model.
01 — The ThesisFive compute tiers, one keynote.
Most COMPUTEX coverage reads as a list of separate product launches. The more useful framing is that NVIDIA moved all five of its compute tiers into the agentic era on the same stage, on the same day. That simultaneity is the actual news: it signals that NVIDIA now treats "an agent" as the unit of work across the entire hardware ladder, from a laptop in your bag to a multi-rack AI factory.
The five tiers map cleanly onto where work happens. RTX Spark is the personal PC. DGX Station for Windows is the deskside workstation. Vera is the standalone CPU for agentic servers. Vera Rubin NVL72 is the rack-scale AI factory engine, and Vera Rubin POD knits five racks into one supercomputer. Cosmos 3 and the Isaac GR00T humanoid reference design extend the same stack out into physical AI.
The PC is being reinvented. For forty years, you launched apps. Click. Type. With RTX Spark and Microsoft Windows, you ask — and the PC does the work.Jensen Huang, NVIDIA CEO — COMPUTEX 2026 Keynote
That "you ask, the PC does the work" line is the through-line of the whole keynote. If the agent is the new unit of compute, then every tier needs to be re-architected around the agent's actual workload pattern — long context, lots of tool calls, branch-heavy reasoning, and unpredictable bursts of retrieval. NVIDIA's argument is that yesterday's silicon, optimized for dense matrix throughput, is not the right shape for that pattern. Whether the silicon delivers is the question the rest of this post pressure-tests.
02 — Personal AIRTX Spark: a frontier model in your bag.
RTX Spark is the most consequential consumer announcement of the keynote. It is a 70-billion-transistor superchip built on TSMC's 3nm process, pairing a 20-core NVIDIA Grace CPU (developed with MediaTek, codename N1X) with a Blackwell RTX GPU over NVLink-C2C at 600 GB/s. NVIDIA positions it as the first Windows laptop chip to run the full CUDA software stack natively — a meaningful detail, because it means the existing CUDA developer ecosystem transfers to the device without a porting effort.
The specifications that matter for builders are the memory and the local-model claim. RTX Spark carries up to 128 GB of unified LPDDR5X memory at 300 GB/s, 6,144 Blackwell CUDA cores with fifth-generation Tensor Cores, and what NVIDIA states is 1 petaflop of FP4 AI performance. NVIDIA says that combination runs 120B-parameter LLMs locally with a 1-million-token context window — all vendor-stated, but corroborated in part by Tom's Hardware and Microsoft's own Windows Experience blog on launch day.
RTX Spark
20-core Grace (Arm) CPU + Blackwell RTX GPU + 6,144 CUDA cores. First Windows laptop chip to run the full CUDA stack natively. NVIDIA-stated 1 petaflop FP4.
Local 120B at 1M context
Up to 128 GB unified memory means a large model and a long context can both sit on-device. NVIDIA-stated: 120B-param models, 1M-token windows, no data leaving the laptop.
NVIDIA OpenShell
A new runtime for personal agents on Windows: individual agent sandboxes, declarative policy enforcement, identity, containment, and end-to-end encryption. Ships alongside RTX Spark.
One caution. Trade press (StorageReview) described RTX Spark as "roughly RTX 5070-class" for gaming, but that comparison is not in NVIDIA's official materials, so treat it as informed speculation rather than a spec. The same goes for the headline laptop dimensions — as thin as 14mm, as light as 3 lbs, single-digit watts up to roughly 80W — which are NVIDIA's targets, not measurements of a shipping product. RTX Spark is announced for fall 2026, with more than 30 laptop and 10-plus desktop configurations planned across the ecosystem; it is not sampling or in production today.
03 — The CPU for AgentsVera: a CPU shaped for agents, not throughput.
Vera is the most architecturally interesting announcement for anyone building agentic systems. NVIDIA's pitch is that the CPU — not the GPU — has become the bottleneck for agents, because an agent spends enormous time on branch-heavy, memory-sensitive work that GPUs are bad at: tool calls, sandboxed code execution, Python and JavaScript runs, data retrieval, and orchestration. Vera is the first CPU NVIDIA has designed explicitly around that pattern.
The chip carries 88 NVIDIA Olympus cores with up to 1.2 TB/s of LPDDR5X memory bandwidth. NVIDIA reports up to 50% higher IPC than the previous-generation Grace core and roughly 40% lower peak memory latency than x86, plus support for Spatial Multithreading and up to 1.8 TB/s of coherent GPU-to-CPU bandwidth via NVLink-C2C. The named early adopters read like a who's-who of frontier labs and clouds: Anthropic, OpenAI, ByteDance, CoreWeave, Oracle Cloud Infrastructure, Lambda, Cloudflare, Akamai, and Together AI among them.
AI agents will be the largest users of computing. Vera is the first CPU designed for that future.Jensen Huang, NVIDIA CEO — COMPUTEX 2026 Keynote
The headline performance claim — more than 1.8x higher agentic sandbox performance than x86 CPUs across code compilation, code analysis, and Python execution — is where editorial discipline matters. NVIDIA does not identify the x86 SKU, the configuration, or the test harness behind that number, and there is no independent replication. We present it below decomposed by workload type so builders can reason about it, with every row explicitly marked as NVIDIA-reported.
Per Vera CPU
NVIDIA's new core, built for branch-heavy agentic code rather than raw FLOPs. Spatial Multithreading targets the unpredictable, memory-bound work an agent actually does between model calls.
LPDDR5X per CPU
High bandwidth is the point: agentic workloads are memory-sensitive, not just compute-sensitive. Up to 1.8 TB/s coherent bandwidth links the CPU to the GPU over NVLink-C2C.
NVIDIA internal benchmark
NVIDIA reports more than 1.8x agentic-sandbox performance over x86 across compilation, analysis, and Python execution. The x86 SKU and harness are unspecified; no independent replication.
04 — The RackVera Rubin: full production, a 10x claim attached.
Vera Rubin is the rack-scale story, and NVIDIA's framing is that it entered full production as of the keynote, with shipments beginning fall 2026. Both the Vera CPU and the Rubin GPU are built on TSMC's 3nm process — a generation ahead of the N4 used for Grace Blackwell. The flagship configuration, Vera Rubin NVL72, packs 36 Vera CPUs and 72 Rubin GPUs unified by NVLink Switch 6, delivering a stated 3.6 exaFLOPS of NVFP4 inference, 75 TB of fast memory, and 260 TB/s of scale-up bandwidth in a single rack.
Above the rack sits Vera Rubin POD: five purpose-built racks operating as one integrated AI supercomputer, unifying Vera Rubin NVL72, the Vera CPU, Groq 3 LPX, BlueField-4 storage, and Spectrum-6 Ethernet. Jensen Huang described the supply chain behind it as "twice as large as Grace Blackwell," involving 150 ecosystem partners in Taiwan and more than 350 factories across 30 countries. Cloud availability is slated for the second half of 2026 at providers including CoreWeave, Lambda, Oracle Cloud Infrastructure, Microsoft Azure, and IBM Cloud.
Agentic AI is a new kind of workload. One prompt can launch a thousand-step journey of reasoning, retrieval, tool use and response generation.Jensen Huang, NVIDIA CEO — COMPUTEX 2026 Keynote
05 — The MatrixOne table across every tier.
Every COMPUTEX recap we have seen is siloed by product. The table below is our attempt to put all the compute tiers side by side so a builder can read across them in one pass and see which tier fits a given inference budget. Figures are drawn from NVIDIA's newsroom releases, Tom's Hardware, and VideoCardz's NVL72 deep-dive; performance numbers marked as such are vendor-stated.
| Tier | Headline specs | Where it fits |
|---|---|---|
| RTX Spark (personal PC) | 1 PF FP4 · 128 GB unified · 300 GB/s | On-device agents that must keep data local. NVIDIA-stated 120B models at 1M context on a 14mm laptop. Fall 2026. |
| DGX Station for Windows (deskside) | Up to 20 PF FP4 · 748 GB coherent | A trillion-parameter-class model on an enterprise desk. GB300 Grace Blackwell Ultra, 800 Gb/s networking. Q4 2026. |
| Vera CPU (standalone) | 88 Olympus cores · 1.2 TB/s memory | Agentic servers: branch-heavy orchestration, sandboxed code, tool calls. The CPU layer of an AI factory. |
| Vera Rubin NVL72 (rack) | 3.6 EF NVFP4 · 75 TB · 260 TB/s | Rack-scale agentic inference. 36 Vera CPUs + 72 Rubin GPUs. Full production; cloud H2 2026. |
| Vera Rubin POD (multi-rack) | Five racks as one supercomputer | AI-factory scale. NVL72 + Vera CPU + Groq 3 LPX + BlueField-4 + Spectrum-6 unified into one system. |
Reading across the matrix, the pattern NVIDIA wants you to see is a continuum rather than a catalog. The same agentic workload — a long-context reasoning loop with heavy tool use — can run on a laptop for privacy, a deskside box for a single power user, or a rack for production scale, with the open-model lineup and CUDA stack carrying across all of them. The practical decision is no longer "cloud or local" as a binary; it is choosing the tier that matches each workload's privacy, latency, and cost profile.
06 — Open ModelsCosmos 3 and Nemotron 3: software to feed the silicon.
Hardware needs models to run, and NVIDIA used the keynote to extend two open-model lines. Cosmos 3 is billed as the world's first open physical-AI omnimodel, using a mixture-of-transformers architecture that pairs a reasoning transformer with an expert generation transformer. It natively handles text, images, video, ambient sound, and action trajectories in a single model, and ships in two sizes available immediately — Super at 32B parameters and Nano at 8B — via build.nvidia.com and Hugging Face, with an Edge size to follow.
NVIDIA reports that, among open models, Cosmos 3 ranks first on seven or more robotics benchmarks, including Physics-IQ, R-Bench, RoboArena, and VANTAGE-Bench. Keep the qualifier: this is a claim about ranking among open models, not against closed frontier models, and the benchmark results are vendor-reported with independent verification still pending. NVIDIA also says Cosmos 3 was trained on roughly 20 trillion multimodal tokens — itself a vendor-stated figure.
Cosmos 3
Open physical-AI omnimodel handling text, image, video, audio, and action in one mixture-of-transformers model. NVIDIA-stated #1 among open models on 7+ robotics benchmarks.
Nemotron 3 family
Hybrid latent-MoE models. Nano (4x throughput vs Nemotron 2 Nano) is available at keynote; Super and Ultra use NVIDIA's NVFP4 4-bit training format on Blackwell.
Isaac GR00T reference
An open reference humanoid design (~6 ft, 31 DOF) on a Blackwell-powered Jetson Thor. Early adopters include ETH Zurich, Stanford, and UC San Diego. Hardware late 2026.
07 — Reading the NumbersThe honest broker on vendor claims.
A keynote this dense is also a wall of benchmarks, and most of them share a property worth naming plainly: they are NVIDIA numbers, run on NVIDIA hardware, evaluated by NVIDIA. That does not make them wrong. It makes them a strong vendor signal rather than a verified design spec — and the difference matters when you are committing a budget. The chart below sorts the keynote's biggest performance claims by how confidently a builder should lean on them today.
How much to lean on each keynote claim · builder confidence
Source: our confidence reading of NVIDIA COMPUTEX 2026 claims + corroborating trade pressThe interpretive point is not skepticism for its own sake. NVIDIA has a long track record of shipping silicon that broadly delivers on its architectural promises, and the configuration-level facts — core counts, memory capacities, process nodes — are the kind of thing independent outlets like Tom's Hardware and VideoCardz confirmed on launch day. It is the comparative multipliers — 1.8x, 10x, 300+ tokens/second — that should carry an asterisk until a third party runs the same workload on the same hardware. Build your plan on the confirmed specs; treat the multipliers as upside.
These are NVIDIA benchmarks, run on NVIDIA hardware, evaluated by NVIDIA. Treat them as strong signals, not design specs.Digital Applied — our reading of the COMPUTEX 2026 claims
08 — What To DoWhat builders should actually do next.
None of this hardware ships before fall 2026, so the right move now is not procurement — it is workload classification. The single most valuable exercise after this keynote is to audit your agentic workloads and sort each one by where it should run: on-device for privacy, deskside for a power user, or rack-scale for production throughput. The decision matrix below is how we frame that for clients.
On-device everything
If your agent touches regulated data, IP, or customer records, RTX Spark's local 120B-at-1M-context claim is the spec to validate when devices ship. No cloud round-trip means a fundamentally different compliance posture.
Deskside frontier
For one engineer or analyst who needs a trillion-parameter-class model with low latency and full data control, DGX Station for Windows is the deskside tier. Q4 2026, pricing undisclosed — budget conservatively.
Rack-scale inference
High-volume agentic inference still belongs on the rack. Vera Rubin NVL72 cloud availability starts H2 2026 — pilot on a cloud provider before committing capital, and benchmark the 10x claim on your own traffic.
Cosmos 3 + Nemotron 3
Cosmos 3 Super/Nano are downloadable today for physical-AI and multimodal work. Nemotron 3 Nano is available now; Ultra arrives June 4. Start evaluating the open lineup on your own tasks before the silicon lands.
For most agencies and engineering teams, the honest near-term action is to run your own evaluations on the open models that are already downloadable, and to build a cost-and-privacy model for each agentic workload so you are ready to map them onto tiers the moment hardware ships. If you are weighing on-device versus rack-scale deployment for a specific agentic pipeline, our AI and digital transformation engagements start with exactly this kind of workload audit. For teams further along, our CRM automation work is a frequent first home for privacy-bound, on-device agents.
This keynote is best read alongside the longer NVIDIA arc. The production ramp announced here is the operational follow-through on NVIDIA's trillion-dollar AI infrastructure pipeline from the earlier GTC keynote, and the Nemotron 3 and OpenShell announcements extend the enterprise agentic-AI platform unveiled at GTC 2026. Nemotron 3 Ultra is the direct successor to Nemotron 3 Super, NVIDIA's open coding model.
09 — ConclusionThe agentic stack, from pocket to data center.
NVIDIA stopped selling chips and started selling a tier for every agent.
COMPUTEX 2026 was not a single product launch; it was a unified argument that the agent is now the unit of compute, and that every tier of hardware needs to be re-shaped around it. RTX Spark puts a frontier-class model on a laptop, Vera redesigns the CPU around branch-heavy agentic work, and Vera Rubin scales the same idea to a rack and a POD — with Cosmos 3 and Nemotron 3 supplying the open models to run across all of it.
The discipline a builder needs is to separate the confirmed from the vendor-stated. The configuration facts — process nodes, core counts, memory capacities — are solid and partly corroborated by independent outlets. The comparative multipliers — 1.8x for Vera, 10x for Vera Rubin, 300-plus tokens per second for Nemotron 3 Ultra — are NVIDIA's own numbers, awaiting third-party replication. Lean on the first; treat the second as upside to verify.
With hardware shipping in fall 2026, the work to do now is not buying — it is classifying. Audit your agentic workloads, sort them by privacy, latency, and cost, and decide which ones belong on-device, which on a desk, and which on a rack. Teams that finish that exercise before the silicon arrives will be the ones that can move the moment it does, instead of starting the conversation from scratch.