AI Development · New Release · 11 min read · Published May 31, 2026

One keynote · five compute tiers · agentic AI from pocket to data center

NVIDIA COMPUTEX 2026: five tiers into the agentic era

On May 31, 2026, Jensen Huang's GTC Taipei keynote at COMPUTEX moved every compute tier — from a 14mm laptop to a rack-scale AI factory — into the agentic era on the same day. RTX Spark, Vera, Vera Rubin, DGX Station for Windows, Cosmos 3 and Nemotron 3 all landed together. This is a builder's first take: what shipped, what is still vendor-stated, and what to do about it.

Digital Applied Team
Senior strategists · Published May 31, 2026
Sources: NVIDIA Newsroom + trade press
  • RTX Spark local model: 120B params · 1M context (vendor-stated)
  • RTX Spark unified memory: 128 GB LPDDR5X · 300 GB/s
  • Vera Rubin throughput: 10× agent throughput (vendor-stated vs Grace Blackwell)
  • DGX Station memory: 748 GB total coherent

NVIDIA's COMPUTEX 2026 keynote did something no prior NVIDIA event has done in a single sitting: it pushed every compute tier — personal PC, deskside workstation, standalone CPU, rack-scale server, and physical-AI robot — into the agentic-AI era at once. On May 31, 2026, GTC Taipei was less a product launch than a unified thesis about where computing is heading.

The headline products are RTX Spark, a superchip that runs large models locally on a thin laptop; Vera, NVIDIA's first CPU designed specifically for AI agents; and Vera Rubin, the rack-scale system NVIDIA says has entered full production. Around them sit DGX Station for Windows, the open Cosmos 3 physical-AI model, and the Nemotron 3 model family. Each one is a story; together they are a strategy.

This is a builder's first take, not a press recap. We cover what actually shipped versus what was merely announced, which numbers are independently confirmed versus vendor-stated, and the practical question every engineering team should be asking after a keynote like this: which of my workloads can move on-device, and which still need a rack? Everything below is sourced to NVIDIA's newsroom and corroborating trade press from launch day.

Key takeaways
  1. Every compute tier moved on the same day. RTX Spark (laptop), DGX Station for Windows (deskside), Vera (standalone CPU), Vera Rubin NVL72 (rack) and Cosmos 3 / Isaac GR00T (robotics) were announced together: a coherent agentic stack from pocket to data center.
  2. RTX Spark rewrites the on-device privacy calculus. A 70-billion-transistor superchip on TSMC 3nm pairing a 20-core Grace CPU with a Blackwell GPU. NVIDIA says it runs 120B-parameter models locally at 1M-token context on a 14mm laptop, so sensitive data need never leave the machine.
  3. Vera is the first CPU pitched at agents, not throughput. 88 Olympus cores, up to 1.2 TB/s memory bandwidth, designed for branch-heavy, memory-sensitive agentic code: tool calls, sandboxed execution, orchestration. NVIDIA reports more than 1.8x agentic-sandbox performance over x86 in its own tests.
  4. Vera Rubin is in production, but the 10x is vendor-stated. NVIDIA says Vera Rubin entered full production with shipments beginning fall 2026 and a 10x agent-throughput gain over Grace Blackwell. That figure has no independent third-party replication as of the keynote; treat it as a strong signal, not a design spec.
  5. The open-model lineup grew, with timing caveats. Cosmos 3 (physical-AI omnimodel) shipped in Super and Nano sizes; Nemotron 3 Ultra was announced with weights scheduled for June 4. The post-keynote release date means Ultra is an announcement, not yet a usable model.

01 · The Thesis · Five compute tiers, one keynote.

Most COMPUTEX coverage reads as a list of separate product launches. The more useful framing is that NVIDIA moved all five of its compute tiers into the agentic era on the same stage, on the same day. That simultaneity is the actual news: it signals that NVIDIA now treats "an agent" as the unit of work across the entire hardware ladder, from a laptop in your bag to a multi-rack AI factory.

The five tiers map cleanly onto where work happens. RTX Spark is the personal PC. DGX Station for Windows is the deskside workstation. Vera is the standalone CPU for agentic servers. Vera Rubin NVL72 is the rack-scale AI factory engine, and Vera Rubin POD knits five racks into one supercomputer. Cosmos 3 and the Isaac GR00T humanoid reference design extend the same stack out into physical AI.

"The PC is being reinvented. For forty years, you launched apps. Click. Type. With RTX Spark and Microsoft Windows, you ask — and the PC does the work."
Jensen Huang, NVIDIA CEO · COMPUTEX 2026 Keynote

That "you ask, the PC does the work" line is the through-line of the whole keynote. If the agent is the new unit of compute, then every tier needs to be re-architected around the agent's actual workload pattern — long context, lots of tool calls, branch-heavy reasoning, and unpredictable bursts of retrieval. NVIDIA's argument is that yesterday's silicon, optimized for dense matrix throughput, is not the right shape for that pattern. Whether the silicon delivers is the question the rest of this post pressure-tests.

02 · Personal AI · RTX Spark: a frontier model in your bag.

RTX Spark is the most consequential consumer announcement of the keynote. It is a 70-billion-transistor superchip built on TSMC's 3nm process, pairing a 20-core NVIDIA Grace CPU (developed with MediaTek, codename N1X) with a Blackwell RTX GPU over NVLink-C2C at 600 GB/s. NVIDIA positions it as the first Windows laptop chip to run the full CUDA software stack natively — a meaningful detail, because it means the existing CUDA developer ecosystem transfers to the device without a porting effort.

The specifications that matter for builders are the memory and the local-model claim. RTX Spark carries up to 128 GB of unified LPDDR5X memory at 300 GB/s, 6,144 Blackwell CUDA cores with fifth-generation Tensor Cores, and what NVIDIA states is 1 petaflop of FP4 AI performance. NVIDIA says that combination runs 120B-parameter LLMs locally with a 1-million-token context window — all vendor-stated, but corroborated in part by Tom's Hardware and Microsoft's own Windows Experience blog on launch day.

The chip · RTX Spark
70B transistors · TSMC 3nm · NVLink-C2C 600 GB/s
20-core Grace (Arm) CPU + Blackwell RTX GPU + 6,144 CUDA cores. First Windows laptop chip to run the full CUDA stack natively. NVIDIA-stated 1 petaflop FP4.
OEMs: ASUS, Dell, HP, Lenovo, Surface, MSI

The claim · Local 120B at 1M context
128 GB unified LPDDR5X · 300 GB/s
Up to 128 GB unified memory means a large model and a long context can both sit on-device. NVIDIA-stated: 120B-param models, 1M-token windows, no data leaving the laptop.
Vendor-stated · verify on shipping silicon

The security layer · NVIDIA OpenShell
per-agent sandboxes · YAML policy
A new runtime for personal agents on Windows: individual agent sandboxes, declarative policy enforcement, identity, containment, and end-to-end encryption. Ships alongside RTX Spark.
github.com/NVIDIA/OpenShell

The proprietary insight
The real RTX Spark story for enterprise is the privacy calculus. If NVIDIA's 120B-at-1M-context claim holds on shipping silicon, the entire context of an enterprise agent — proprietary code, customer records, internal documents — can stay on a single laptop and never touch a cloud API. At 128 GB of unified memory and 300 GB/s bandwidth, the binding constraint for a 120B model at long context is no longer the silicon; it is the OEM's thermal design in a 14mm chassis. That is the spec to watch when devices ship in fall 2026.
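
A rough sizing check makes the memory side of that claim concrete. The sketch below estimates the weight footprint of a 120B-parameter model at a few precisions against the 128 GB and 300 GB/s figures quoted above, plus a ceiling on decode speed if every weight has to be read from memory once per generated token. It is a back-of-envelope model under stated assumptions, not an NVIDIA benchmark: it assumes dense weights, ignores the KV cache, activations and runtime overhead, and the function names and bytes-per-parameter table are illustrative.

```python
# Back-of-envelope sizing for a local model.
# Assumptions (ours, not NVIDIA data): dense weights, every weight read once
# per decoded token, no KV cache, activations, or runtime overhead.

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_footprint_gb(params_billion: float, precision: str) -> float:
    """Weight size in GB (1 GB = 1e9 bytes) for a given parameter count."""
    return params_billion * BYTES_PER_PARAM[precision]


def decode_ceiling_tokens_per_s(footprint_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s when decode is limited by streaming the weights."""
    return bandwidth_gb_s / footprint_gb


if __name__ == "__main__":
    MEMORY_GB = 128        # RTX Spark unified memory, as quoted in the article
    BANDWIDTH_GB_S = 300   # LPDDR5X bandwidth, as quoted in the article
    for precision in ("fp16", "fp8", "fp4"):
        size = weight_footprint_gb(120, precision)
        ceiling = decode_ceiling_tokens_per_s(size, BANDWIDTH_GB_S)
        print(f"{precision}: {size:.0f} GB weights, fits={size <= MEMORY_GB}, "
              f"at most {ceiling:.1f} tokens/s")
```

Under these assumptions, only the 4-bit case leaves real headroom in 128 GB (about 60 GB of weights), and a dense model would top out near 5 tokens/s on bandwidth alone. Sparse or mixture-of-experts models that read fewer weights per token would have a higher ceiling, which is one more reason to measure on shipping hardware rather than extrapolate.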

One caution. Trade press (StorageReview) described RTX Spark as "roughly RTX 5070-class" for gaming, but that comparison is not in NVIDIA's official materials, so treat it as informed speculation rather than a spec. The same goes for the headline laptop dimensions — as thin as 14mm, as light as 3 lbs, single-digit watts up to roughly 80W — which are NVIDIA's targets, not measurements of a shipping product. RTX Spark is announced for fall 2026, with more than 30 laptop and 10-plus desktop configurations planned across the ecosystem; it is not sampling or in production today.

03 · The CPU for Agents · Vera: a CPU shaped for agents, not throughput.

Vera is the most architecturally interesting announcement for anyone building agentic systems. NVIDIA's pitch is that the CPU — not the GPU — has become the bottleneck for agents, because an agent spends enormous time on branch-heavy, memory-sensitive work that GPUs are bad at: tool calls, sandboxed code execution, Python and JavaScript runs, data retrieval, and orchestration. Vera is the first CPU NVIDIA has designed explicitly around that pattern.

The chip carries 88 NVIDIA Olympus cores with up to 1.2 TB/s of LPDDR5X memory bandwidth. NVIDIA reports up to 50% higher IPC than the previous-generation Grace core and roughly 40% lower peak memory latency than x86, plus support for Spatial Multithreading and up to 1.8 TB/s of coherent GPU-to-CPU bandwidth via NVLink-C2C. The named early adopters read like a who's-who of frontier labs and clouds: Anthropic, OpenAI, ByteDance, CoreWeave, Oracle Cloud Infrastructure, Lambda, Cloudflare, Akamai, and Together AI among them.

"AI agents will be the largest users of computing. Vera is the first CPU designed for that future."
Jensen Huang, NVIDIA CEO · COMPUTEX 2026 Keynote

The headline performance claim (more than 1.8x higher agentic sandbox performance than x86 CPUs across code compilation, code analysis, and Python execution) is where editorial discipline matters. NVIDIA does not identify the x86 SKU, the configuration, or the test harness behind that number, and there is no independent replication. We present it below alongside Vera's core-count and bandwidth figures, with the vendor-stated numbers explicitly marked.

Olympus cores · Per Vera CPU · 88
NVIDIA's new core, built for branch-heavy agentic code rather than raw FLOPs. Spatial Multithreading targets the unpredictable, memory-bound work an agent actually does between model calls.
vs Grace generation

Memory bandwidth · LPDDR5X per CPU · 1.2 TB/s
High bandwidth is the point: agentic workloads are memory-sensitive, not just compute-sensitive. Up to 1.8 TB/s coherent bandwidth links the CPU to the GPU over NVLink-C2C.
+50% IPC vs Grace (vendor-stated)

Sandbox perf vs x86 · NVIDIA internal benchmark · 1.8×
NVIDIA reports more than 1.8x agentic-sandbox performance over x86 across compilation, analysis, and Python execution. The x86 SKU and harness are unspecified; no independent replication.
Vendor-stated · treat as signal
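
One way to reason about the 1.8x figure without the missing benchmark details is Amdahl's law: only the CPU-bound share of an agent step gets faster. The sketch below is a generic illustration, not NVIDIA's methodology, and the CPU-share values are hypothetical inputs you would replace with measurements from your own agent traces.

```python
def end_to_end_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the CPU-bound share accelerates.

    cpu_fraction: share of wall-clock time spent in CPU-side work
                  (tool calls, sandboxed code, orchestration), between 0 and 1.
    cpu_speedup:  speedup applied to that share (1.8 is the vendor-stated figure).
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


if __name__ == "__main__":
    # Hypothetical CPU shares; measure your own before planning around them.
    for share in (0.1, 0.3, 0.5, 0.9):
        overall = end_to_end_speedup(share, 1.8)
        print(f"CPU share {share:.0%}: {overall:.2f}x end to end")
```

A pipeline that spends half its time in sandboxed CPU work would see roughly 1.29x overall even if the 1.8x holds exactly, so the CPU share of your workload matters as much as the headline multiplier.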

04 · The Rack · Vera Rubin: full production, a 10x claim attached.

Vera Rubin is the rack-scale story, and NVIDIA's framing is that it entered full production as of the keynote, with shipments beginning fall 2026. Both the Vera CPU and the Rubin GPU are built on TSMC's 3nm process — a generation ahead of the N4 used for Grace Blackwell. The flagship configuration, Vera Rubin NVL72, packs 36 Vera CPUs and 72 Rubin GPUs unified by NVLink Switch 6, delivering a stated 3.6 exaFLOPS of NVFP4 inference, 75 TB of fast memory, and 260 TB/s of scale-up bandwidth in a single rack.

Above the rack sits Vera Rubin POD: five purpose-built racks operating as one integrated AI supercomputer, unifying Vera Rubin NVL72, the Vera CPU, Groq 3 LPX, BlueField-4 storage, and Spectrum-6 Ethernet. Jensen Huang described the supply chain behind it as "twice as large as Grace Blackwell," involving 150 ecosystem partners in Taiwan and more than 350 factories across 30 countries. Cloud availability is slated for the second half of 2026 at providers including CoreWeave, Lambda, Oracle Cloud Infrastructure, Microsoft Azure, and IBM Cloud.

The number to watch
NVIDIA's headline rack claim is 10x agent throughput at scale versus the previous-generation Grace Blackwell platform. The comparison baseline is "the previous-generation platform," not a named GPU SKU, and there is no independent third-party replication as of the keynote. Read it as a strong directional signal about where rack-scale agentic inference is heading — not a guaranteed multiplier you can write into a capacity plan.

"Agentic AI is a new kind of workload. One prompt can launch a thousand-step journey of reasoning, retrieval, tool use and response generation."
Jensen Huang, NVIDIA CEO · COMPUTEX 2026 Keynote
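
The caution about writing 10x into a capacity plan can be made concrete with a small sensitivity check. The sketch below computes how many new racks would replace an existing fleet under several realized multipliers. The fleet size and the range of multipliers are hypothetical planning inputs, not NVIDIA figures.

```python
import math


def racks_needed(current_racks: int, realized_multiplier: float) -> int:
    """New racks required to serve today's load if each one delivers
    `realized_multiplier` times the throughput of a current rack."""
    if current_racks < 0:
        raise ValueError("current_racks must be non-negative")
    if realized_multiplier <= 0:
        raise ValueError("realized_multiplier must be positive")
    return math.ceil(current_racks / realized_multiplier)


if __name__ == "__main__":
    CURRENT_RACKS = 40  # hypothetical fleet size
    # 10x is the vendor-stated claim; the lower values are what-if scenarios.
    for multiplier in (10.0, 5.0, 3.0, 2.0):
        print(f"{multiplier:>4.1f}x realized -> {racks_needed(CURRENT_RACKS, multiplier)} racks")
```

For a hypothetical 40-rack fleet the answer ranges from 4 racks at the full 10x to 20 racks at 2x, which is the spread a budget has to absorb until the multiplier is measured on your own traffic.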

05 · The Matrix · One table across every tier.

Every COMPUTEX recap we have seen is siloed by product. The table below is our attempt to put all the compute tiers side by side so a builder can read across them in one pass and see which tier fits a given inference budget. Figures are drawn from NVIDIA's newsroom releases, Tom's Hardware, and VideoCardz's NVL72 deep-dive; performance numbers marked as such are vendor-stated.

| Tier | Headline specs | Where it fits |
| --- | --- | --- |
| RTX Spark (personal PC) | 1 PF FP4 · 128 GB unified · 300 GB/s | On-device agents that must keep data local. NVIDIA-stated 120B models at 1M context on a 14mm laptop. Fall 2026. |
| DGX Station for Windows (deskside) | Up to 20 PF FP4 · 748 GB coherent | A trillion-parameter-class model on an enterprise desk. GB300 Grace Blackwell Ultra, 800 Gb/s networking. Q4 2026. |
| Vera CPU (standalone) | 88 Olympus cores · 1.2 TB/s memory | Agentic servers: branch-heavy orchestration, sandboxed code, tool calls. The CPU layer of an AI factory. |
| Vera Rubin NVL72 (rack) | 3.6 EF NVFP4 · 75 TB · 260 TB/s | Rack-scale agentic inference. 36 Vera CPUs + 72 Rubin GPUs. Full production; cloud H2 2026. |
| Vera Rubin POD (multi-rack) | Five racks as one supercomputer | AI-factory scale. NVL72 + Vera CPU + Groq 3 LPX + BlueField-4 + Spectrum-6 unified into one system. |

Reading across the matrix, the pattern NVIDIA wants you to see is a continuum rather than a catalog. The same agentic workload — a long-context reasoning loop with heavy tool use — can run on a laptop for privacy, a deskside box for a single power user, or a rack for production scale, with the open-model lineup and CUDA stack carrying across all of them. The practical decision is no longer "cloud or local" as a binary; it is choosing the tier that matches each workload's privacy, latency, and cost profile.
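
As a first-pass filter on that continuum, the memory figures from the table can be turned into a simple fit check. The sketch below picks the smallest tier whose memory holds a model's weights. It covers only the tiers with a stated capacity, and the 4-bit default and the 70% headroom factor are our assumptions for illustration. It says nothing about privacy, latency, or cost, which still have to be weighed separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    memory_gb: float  # capacity figure quoted in the table above


# Only tiers with a stated memory capacity, ordered smallest first.
TIERS = [
    Tier("RTX Spark", 128),
    Tier("DGX Station for Windows", 748),
    Tier("Vera Rubin NVL72", 75_000),
]


def smallest_fitting_tier(
    params_billion: float,
    bytes_per_param: float = 0.5,  # assumption: 4-bit weights
    headroom: float = 0.7,         # assumption: 70% of memory budgeted for weights
) -> Optional[Tier]:
    """Return the smallest tier whose weight budget holds the model, else None."""
    weights_gb = params_billion * bytes_per_param
    for tier in TIERS:
        if weights_gb <= tier.memory_gb * headroom:
            return tier
    return None


if __name__ == "__main__":
    for size in (120, 550, 1000):
        tier = smallest_fitting_tier(size)
        print(f"{size}B params -> {tier.name if tier else 'no single tier'}")
```

The headroom factor is the knob to argue about: long contexts push KV-cache memory up, so a workload with a very large context may need a lower weight budget than the 70% assumed here.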

06 · Open Models · Cosmos 3 and Nemotron 3: software to feed the silicon.

Hardware needs models to run, and NVIDIA used the keynote to extend two open-model lines. Cosmos 3 is billed as the world's first open physical-AI omnimodel, using a mixture-of-transformers architecture that pairs a reasoning transformer with an expert generation transformer. It natively handles text, images, video, ambient sound, and action trajectories in a single model, and ships in two sizes available immediately — Super at 32B parameters and Nano at 8B — via build.nvidia.com and Hugging Face, with an Edge size to follow.

NVIDIA reports that, among open models, Cosmos 3 ranks first on seven or more robotics benchmarks, including Physics-IQ, R-Bench, RoboArena, and VANTAGE-Bench. Keep the qualifier: this is a claim about ranking among open models, not against closed frontier models, and the benchmark results are vendor-reported with independent verification still pending. NVIDIA also says Cosmos 3 was trained on roughly 20 trillion multimodal tokens — itself a vendor-stated figure.

Physical AI · Cosmos 3
Super 32B · Nano 8B · Edge soon
Open physical-AI omnimodel handling text, image, video, audio, and action in one mixture-of-transformers model. NVIDIA-stated #1 among open models on 7+ robotics benchmarks.
build.nvidia.com · Hugging Face

Open language · Nemotron 3 family
Ultra ~550B · Super ~100B · Nano 30B
Hybrid latent-MoE models. Nano (4x throughput vs Nemotron 2 Nano) is available at keynote; Super and Ultra use NVIDIA's NVFP4 4-bit training format on Blackwell.
Ultra weights scheduled June 4, 2026

Robotics · Isaac GR00T reference
Unitree H2 Plus · Jetson AGX Thor
An open reference humanoid design (~6 ft, 31 DOF) on a Blackwell-powered Jetson Thor. Early adopters include ETH Zurich, Stanford, and UC San Diego. Hardware late 2026.
Open reference design

Date discipline
Nemotron 3 Ultra, a roughly 550B-parameter, ~55B-active hybrid MoE model, was announced at the keynote, but its weights are scheduled to drop on Hugging Face, ModelScope, and OpenRouter on June 4, 2026, four days after the keynote. Treat Ultra as an announcement, not a model you can pull today, and treat its stated 300+ tokens/second and 30%-lower-cost figures as vendor claims awaiting independent benchmarking. Nemotron 3 Nano was the family member available at keynote time.

07 · Reading the Numbers · The honest broker on vendor claims.

A keynote this dense is also a wall of benchmarks, and most of them share a property worth naming plainly: they are NVIDIA numbers, run on NVIDIA hardware, evaluated by NVIDIA. That does not make them wrong. It makes them a strong vendor signal rather than a verified design spec — and the difference matters when you are committing a budget. The chart below sorts the keynote's biggest performance claims by how confidently a builder should lean on them today.

How much to lean on each keynote claim · builder confidence
Source: our confidence reading of NVIDIA COMPUTEX 2026 claims + corroborating trade press

  • RTX Spark specs (transistors, cores, memory): Confirmed-ish. Corroborated by Tom's Hardware + Microsoft.
  • Vera Rubin NVL72 configuration: Confirmed-ish. 72 GPUs / 36 CPUs / 260 TB/s, VideoCardz corroborated.
  • RTX Spark 120B @ 1M context local: Vendor-stated. Partial third-party corroboration.
  • Vera 1.8x agentic sandbox vs x86: Vendor-stated. No x86 SKU, config, or harness disclosed.
  • Vera Rubin 10x throughput vs Grace Blackwell: Vendor-stated. Baseline is a platform, not a named SKU; no replication.
  • Cosmos 3 #1 on 7+ robotics benchmarks: Vendor-stated. Among open models only; verification pending.

The interpretive point is not skepticism for its own sake. NVIDIA has a long track record of shipping silicon that broadly delivers on its architectural promises, and the configuration-level facts — core counts, memory capacities, process nodes — are the kind of thing independent outlets like Tom's Hardware and VideoCardz confirmed on launch day. It is the comparative multipliers — 1.8x, 10x, 300+ tokens/second — that should carry an asterisk until a third party runs the same workload on the same hardware. Build your plan on the confirmed specs; treat the multipliers as upside.

"These are NVIDIA benchmarks, run on NVIDIA hardware, evaluated by NVIDIA. Treat them as strong signals, not design specs."
Digital Applied · our reading of the COMPUTEX 2026 claims

08 · What To Do · What builders should actually do next.

None of this hardware ships before fall 2026, so the right move now is not procurement — it is workload classification. The single most valuable exercise after this keynote is to audit your agentic workloads and sort each one by where it should run: on-device for privacy, deskside for a power user, or rack-scale for production throughput. The decision matrix below is how we frame that for clients.

Privacy-bound agents · On-device everything
If your agent touches regulated data, IP, or customer records, RTX Spark's local 120B-at-1M-context claim is the spec to validate when devices ship. No cloud round-trip means a fundamentally different compliance posture.
Watch RTX Spark

Single power user · Deskside frontier
For one engineer or analyst who needs a trillion-parameter-class model with low latency and full data control, DGX Station for Windows is the deskside tier. Q4 2026, pricing undisclosed; budget conservatively.
Evaluate DGX Station

Production scale · Rack-scale inference
High-volume agentic inference still belongs on the rack. Vera Rubin NVL72 cloud availability starts H2 2026. Pilot on a cloud provider before committing capital, and benchmark the 10x claim on your own traffic.
Pilot in the cloud

Open-model strategy · Cosmos 3 + Nemotron 3
Cosmos 3 Super/Nano are downloadable today for physical-AI and multimodal work. Nemotron 3 Nano is available now; Ultra arrives June 4. Start evaluating the open lineup on your own tasks before the silicon lands.
Test open models now
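
The workload audit itself can start as a few explicit rules. The sketch below is one hypothetical way to encode the matrix above: the field names, the rule order, and the 50-user threshold for "high volume" are our assumptions for illustration, and a real audit would replace them with your own compliance and traffic criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on-device"
    DESKSIDE = "deskside"
    RACK = "rack-scale"


@dataclass(frozen=True)
class Workload:
    name: str
    handles_sensitive_data: bool   # regulated data, IP, customer records
    concurrent_users: int
    needs_largest_models: bool     # trillion-parameter-class models


HIGH_VOLUME_USERS = 50  # hypothetical cutoff for "production scale"


def classify(workload: Workload) -> Tier:
    """Map a workload to a tier using the article's three-way split."""
    if workload.concurrent_users >= HIGH_VOLUME_USERS:
        return Tier.RACK        # production throughput belongs on the rack
    if workload.needs_largest_models:
        return Tier.DESKSIDE    # one power user, largest models, local control
    if workload.handles_sensitive_data:
        return Tier.ON_DEVICE   # keep private context on the laptop
    return Tier.RACK            # default: pilot in the cloud


if __name__ == "__main__":
    audit = [
        Workload("contract review agent", True, 1, False),
        Workload("research analyst copilot", True, 1, True),
        Workload("customer support triage", False, 500, False),
    ]
    for workload in audit:
        print(f"{workload.name}: {classify(workload).value}")
```

Workloads that are both sensitive and high-volume land on the rack under these rules, which is exactly the kind of case the audit should flag for a human decision rather than settle automatically.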

For most agencies and engineering teams, the honest near-term action is to run your own evaluations on the open models that are already downloadable, and to build a cost-and-privacy model for each agentic workload so you are ready to map them onto tiers the moment hardware ships. If you are weighing on-device versus rack-scale deployment for a specific agentic pipeline, our AI and digital transformation engagements start with exactly this kind of workload audit. For teams further along, our CRM automation work is a frequent first home for privacy-bound, on-device agents.

This keynote is best read alongside the longer NVIDIA arc. The production ramp announced here is the operational follow-through on NVIDIA's trillion-dollar AI infrastructure pipeline from the earlier GTC keynote, and the Nemotron 3 and OpenShell announcements extend the enterprise agentic-AI platform unveiled at GTC 2026. Nemotron 3 Ultra is the direct successor to Nemotron 3 Super, NVIDIA's open coding model.

09 · Conclusion · The agentic stack, from pocket to data center.

The shape of agentic compute, May 2026

NVIDIA stopped selling chips and started selling a tier for every agent.

COMPUTEX 2026 was not a single product launch; it was a unified argument that the agent is now the unit of compute, and that every tier of hardware needs to be re-shaped around it. RTX Spark puts a frontier-class model on a laptop, Vera redesigns the CPU around branch-heavy agentic work, and Vera Rubin scales the same idea to a rack and a POD — with Cosmos 3 and Nemotron 3 supplying the open models to run across all of it.

The discipline a builder needs is to separate the confirmed from the vendor-stated. The configuration facts — process nodes, core counts, memory capacities — are solid and partly corroborated by independent outlets. The comparative multipliers — 1.8x for Vera, 10x for Vera Rubin, 300-plus tokens per second for Nemotron 3 Ultra — are NVIDIA's own numbers, awaiting third-party replication. Lean on the first; treat the second as upside to verify.

With hardware shipping in fall 2026, the work to do now is not buying — it is classifying. Audit your agentic workloads, sort them by privacy, latency, and cost, and decide which ones belong on-device, which on a desk, and which on a rack. Teams that finish that exercise before the silicon arrives will be the ones that can move the moment it does, instead of starting the conversation from scratch.

FAQ · NVIDIA COMPUTEX 2026

The questions we get every week.

On May 31, 2026, Jensen Huang's GTC Taipei keynote at COMPUTEX moved every compute tier into the agentic-AI era at once. The major announcements were RTX Spark, a superchip for running large models locally on a laptop; Vera, NVIDIA's first CPU designed specifically for AI agents; Vera Rubin, the rack-scale system NVIDIA says entered full production; DGX Station for Windows, a deskside trillion-parameter-class supercomputer; the open Cosmos 3 physical-AI model; and the Nemotron 3 model family. NVIDIA also showed the Isaac GR00T humanoid reference design and a TSMC collaboration to bring AI into chip fabs. The unifying theme was that the AI agent has become the unit of compute across the entire hardware ladder.