NVIDIA RTX Spark is the company's first consumer-class superchip: a single system-on-chip that pairs a Blackwell RTX GPU, a 20-core Grace CPU, and up to 128 GB of unified memory, pitched as a way to run large AI agents entirely on the device rather than in the cloud. It was unveiled at the COMPUTEX 2026 keynote, delivered by Jensen Huang at the Taipei Music Center on June 1 Taipei time (May 31 in Pacific time).
The pitch is genuinely new for the personal computer: NVIDIA states that RTX Spark can run a 120-billion-parameter model with a context window of up to one million tokens locally, with no cloud round-trip and no data leaving the machine. If that holds up under independent testing, it reshapes the calculus for any team that has avoided local inference because the hardware couldn't hold a serious model.
This guide covers what was actually announced, the silicon underneath, the single most-confused spec in all the coverage (the bandwidth numbers), why the "1 petaflop" figure needs an FP4 asterisk, and NVIDIA's OpenShell runtime for keeping autonomous agents sandboxed. Every performance figure here is a vendor claim on pre-production hardware — we label them as such and tell you what to verify. For the wider keynote, see our COMPUTEX 2026 keynote first-take.
- 01. RTX Spark is one chip, three compute domains. A Blackwell RTX GPU, a 20-core Grace CPU (10× Cortex-X925 + 10× A725), and an NPU share up to 128 GB of LPDDR5X unified memory on a single SoC built on TSMC's 3nm process, with roughly 70 billion transistors.
- 02. The bandwidth numbers get conflated everywhere. 600 GB/s is the NVLink-C2C CPU-to-GPU interconnect bandwidth. The LPDDR5X memory bandwidth is a separate, lower figure (~273–300 GB/s). They are not the same number, and most coverage runs them together.
- 03. '1 petaflop' is an FP4 figure, not general compute. NVIDIA rates RTX Spark at 1 petaflop of FP4 (4-bit) AI performance via fifth-gen Tensor Cores. Lower-precision math inflates the TFLOP count; '1 petaflop of FP4' is not '1 petaflop of general AI compute.'
- 04. OpenShell is the agent-security layer, and it's early. An open-source runtime with kernel-level sandboxing, a YAML policy engine, and a Privacy Router that masks PII before routing to cloud models. It is a pre-1.0 release (0.0.x), so do not treat it as production-hardened.
- 05. Availability is fall 2026; pricing is unconfirmed. NVIDIA projects 30+ laptop models and 10+ desktop configurations across major OEMs, with 100+ Windows ISVs adopting the platform. No OEM has disclosed final retail pricing as of June 1, 2026.
01 — What Was Unveiled
A consumer superchip, announced at COMPUTEX.
NVIDIA unveiled RTX Spark during the GTC Taipei / COMPUTEX 2026 keynote, delivered by Jensen Huang at the Taipei Music Center. The timing spans a date line — May 31 in Pacific time, June 1 in Taipei — which is why some coverage dates it to May 31 and some to June 1. Both are correct for different time zones; this post follows the Taipei date.
What was announced is a Windows-first platform built around a single superchip, plus a wave of partner devices, a software ecosystem, and an agent-security runtime called OpenShell. NVIDIA positions it as turning the PC into a machine that runs AI agents locally rather than as a thin client to cloud models. The framing matters: this is not a discrete GPU you add to a tower — it is an integrated SoC that OEMs build laptops and mini PCs around.
RTX Spark
A single SoC with up to 128 GB LPDDR5X unified memory, fifth-gen Tensor Cores with FP4, and NVLink-C2C linking CPU and GPU. NVIDIA's consumer/creator answer for on-device agents.
OpenShell
A secure-by-design runtime for autonomous agents: kernel-level sandboxing, a declarative YAML policy engine, and a Privacy Router. Built on new Windows OS-enforced security primitives.
02 — The Silicon
Blackwell GPU, Grace CPU, one package.
RTX Spark integrates three compute domains on one chip. The GPU is a Blackwell RTX part with 6,144 CUDA cores — the same core count as the RTX 5070 laptop GPU — fabricated on TSMC's 3nm process, with roughly 70 billion transistors across the package (vendor figures). It carries fifth-generation Tensor Cores with FP4 (4-bit floating point) support, which is where the headline AI-performance number comes from.
The CPU is a custom NVIDIA Grace part with up to 20 Arm cores: ten Cortex-X925 performance cores at 4.0 GHz and ten A725 efficiency cores at 2.85 GHz, on the Armv9 architecture. That makes RTX Spark a Windows-on-Arm platform: Arm-native applications run directly on those cores, while x86 software runs through Windows' Prism emulation layer. NVIDIA states that its full CUDA stack (TensorRT, OptiX, DLSS, Reflex, G-SYNC) runs natively on the platform, which is the point that distinguishes it from earlier Arm Windows PCs.
6,144 Blackwell RTX CUDA cores
Same CUDA core count as the RTX 5070 laptop GPU, built on TSMC 3nm with fifth-gen Tensor Cores that add FP4 support. All figures vendor-stated.
20-core custom Grace CPU (Armv9)
10× Cortex-X925 performance cores at 4.0 GHz plus 10× A725 efficiency cores at 2.85 GHz. A Windows-on-Arm platform with native CUDA, per NVIDIA.
Single-digit W to ~80 W
NVIDIA states the chip scales from a few watts at idle to roughly 80 W sustained, enabling both ~14 mm thin laptops (~3 lbs) and desktop mini PCs.
03 — The Bandwidth Math
Why 600 GB/s is not the memory bandwidth.
This is the single most-confused spec in RTX Spark coverage, and getting it right is what separates a useful analysis from a press-release rewrite. There are two distinct bandwidth numbers, and most publications run them together as if they were one.
The 600 GB/s figure is the NVLink-C2C chip-to-chip interconnect bandwidth — the bidirectional link between the Grace CPU and the Blackwell GPU inside the package. NVIDIA states this is roughly 5× the bidirectional bandwidth of PCIe Gen 5 (a vendor comparison). It is not the rate at which the chip reads from its main memory. The system memory is LPDDR5X, and its bandwidth is a separate, lower figure of approximately 273–300 GB/s. Conflating the two overstates the memory subsystem by roughly 2×.
For LLM inference this distinction is load-bearing: token generation speed for a memory-bound model is gated by how fast the chip can stream weights out of DRAM, which is the LPDDR5X figure — not the CPU-GPU interconnect. The table below separates the two cleanly and sets them against other unified-memory platforms.
RTX Spark bandwidth: interconnect vs. memory
| Link | Bandwidth | Type | What moves across it |
|---|---|---|---|
| NVLink-C2C (RTX Spark) | ~600 GB/s | Chip-to-chip interconnect | Data between the Grace CPU and Blackwell GPU |
| LPDDR5X DRAM (RTX Spark) | ~273–300 GB/s | System memory bandwidth | Model weights streamed during inference |
| Apple unified memory (high-end) | See Apple spec sheet | Unified memory bandwidth | Shared CPU/GPU memory — verify the figure on apple.com |
The 600 GB/s headline is the interconnect, not the memory bus. For local LLM throughput, the number that matters is the ~273–300 GB/s LPDDR5X figure.
— Our reading of the RTX Spark spec disclosures
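To make the stakes of that distinction concrete, here is a minimal back-of-envelope sketch of the decode-speed ceiling for a memory-bound model. It assumes a dense 120B-parameter model stored at 4 bits per weight, with every weight streamed from DRAM once per generated token, and it ignores quantization scale overhead, KV-cache reads, and compute limits. The function name and constants are ours, not NVIDIA's, and the results are theoretical ceilings rather than benchmarks.

```python
# Rough upper bound on decode speed for a memory-bound dense model:
# each generated token requires streaming all weights from DRAM once.

def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling: tokens/s <= memory bandwidth / bytes read per token."""
    return bandwidth_gb_s / weights_gb

# Assumption: dense 120B parameters at 4 bits (0.5 bytes) per weight.
WEIGHTS_GB = 120e9 * 0.5 / 1e9  # 60 GB of weights

for label, bandwidth in [
    ("LPDDR5X, low end (vendor-stated)", 273.0),
    ("LPDDR5X, high end (vendor-stated)", 300.0),
    ("NVLink-C2C (the wrong number to use)", 600.0),
]:
    ceiling = decode_ceiling_tokens_per_s(bandwidth, WEIGHTS_GB)
    print(f"{label}: {ceiling:.2f} tokens/s ceiling")
```

Under these assumptions the ceiling is roughly 4.5 to 5 tokens per second from the LPDDR5X figures, versus 10 if the interconnect number is mistakenly used. Mixture-of-experts models read only their active parameters per token, so their ceiling would be higher; the sketch covers the dense case only.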
04 — The Petaflop Asterisk
What "1 petaflop" actually means.
NVIDIA rates RTX Spark at 1 petaflop of FP4 AI performance, achieved via the fifth-generation Tensor Cores. The qualifier "FP4" is the whole story. FP4 is 4-bit floating point; running math at 4 bits instead of 16 or 32 lets the same silicon push far more operations per second, because each operation moves a quarter the data of an FP16 operation. So "1 petaflop of FP4" is a real number — but it is not equivalent to "1 petaflop of general-purpose AI compute," and it should never be compared head-to-head against an FP16 or FP32 figure from another platform.
From the same FP4 capability comes the platform's marquee inference claim: that RTX Spark can run a 120-billion-parameter model with up to a one-million-token context window entirely on-device. The mechanism is FP4 quantization — a 120B model in 4-bit weights fits inside 128 GB of unified memory with room for the KV cache. That is a vendor claim about what the platform can do; no independent party has benchmarked a 120B model at 1M context on a shipping device, and the specific token-throughput and time-to-first-token numbers will only be knowable once reviewers test retail hardware.
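The fit claim is simple arithmetic, sketched below. It counts raw weight storage only, so it is an assumption-laden lower bound: real 4-bit formats carry some extra scale metadata, and the remaining headroom has to cover the KV cache, the operating system, and other applications. The KV cache size at 1M tokens depends on model architecture details NVIDIA has not tied to this claim, so it is deliberately not estimated here.

```python
def weights_gb(params: float, bits_per_weight: float) -> float:
    """Size of the raw weights in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 120e9             # 120B parameters (vendor claim)
UNIFIED_MEMORY_GB = 128.0  # top memory configuration (vendor-stated)

for name, bits in [("FP4", 4), ("FP8", 8), ("FP16", 16)]:
    size = weights_gb(PARAMS, bits)
    headroom = UNIFIED_MEMORY_GB - size
    verdict = "fits" if headroom > 0 else "does not fit"
    print(f"{name}: {size:.0f} GB of weights, {verdict} (headroom {headroom:.0f} GB)")
```

At 4 bits the weights take about 60 GB, leaving about 68 GB for everything else. At 8 bits they take 120 GB and barely fit; at 16 bits they take 240 GB and do not fit at all, which is why the 120B claim depends on FP4.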
NVIDIA also published inference-uplift claims for its software stack: roughly a 2× speedup on agentic models via multi-token prediction in llama.cpp, and a 2.6× improvement on the DGX Spark developer box using optimized NVFP4 checkpoints in vLLM. Two cautions. First, all of these are vendor figures with no independent benchmarks at publication. Second, the 2.6× vLLM number references DGX Spark — the Linux developer product — not necessarily the consumer Windows RTX Spark; NVIDIA's materials bundle the two, but they are different products with different software stacks. If on-device economics are your reason for looking at RTX Spark, our inference cost optimization playbook is the right place to model fixed-hardware versus per-token cloud spend.
05 — Agent Security
OpenShell: the privacy moat nobody is talking about.
If RTX Spark's silicon is the headline, OpenShell is the part that matters most for teams under data-handling obligations. OpenShell is NVIDIA's open-source, secure-by-design runtime for autonomous agents, hosted on GitHub and built on top of new Windows OS-enforced security primitives from Microsoft — identity controls, containment, and policy enforcement — so that an agent runs inside a sandbox on the user's own machine.
The architecture has three enforcement layers. First, programmable filesystem, network, and process isolation at the kernel level — true sandboxing, not a permissions prompt. Second, a declarative YAML policy engine evaluated at the binary, destination, method, and path level, so you can express exactly what an agent is and is not allowed to touch. Third — and this is the sleeper feature — a Privacy Router that keeps sensitive context on-device with local open models and only routes to frontier cloud models (Claude, the GPT family) when policy permits, masking PII in any query that does leave the machine.
Kernel-level isolation
Programmable sandboxing of what an agent can read, reach, and execute — enforced at the kernel rather than the application layer.
YAML policy engine
Declarative rules evaluated at four levels (binary, destination, method, and path), so you define what agents can and cannot do without writing enforcement code by hand.
Privacy Router
Keeps sensitive context on-device with local models, routes to frontier cloud models only when policy allows, and masks PII in any outbound query.
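To illustrate what evaluating a policy at the binary, destination, method, and path level can look like, here is a small Python sketch. It is not OpenShell's schema or API: the `Rule` and `Request` classes, the field names, the glob matching, and the default-deny behavior are all hypothetical choices made for illustration, and the real runtime expresses its rules in YAML and enforces them at the kernel level.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

@dataclass(frozen=True)
class Rule:
    """One allow rule. Field names are hypothetical, not OpenShell's schema."""
    binary: str       # glob for the executable making the request
    destination: str  # glob for the target host
    method: str       # HTTP-style verb, or "*" for any
    path: str         # glob for the request path

@dataclass(frozen=True)
class Request:
    binary: str
    destination: str
    method: str
    path: str

def is_allowed(request: Request, rules: list[Rule]) -> bool:
    """Default-deny (our assumption): pass only if one rule matches all four levels."""
    return any(
        fnmatchcase(request.binary, rule.binary)
        and fnmatchcase(request.destination, rule.destination)
        and (rule.method == "*" or rule.method == request.method)
        and fnmatchcase(request.path, rule.path)
        for rule in rules
    )

# Hypothetical policy: the agent may only read documents from one host.
RULES = [
    Rule(binary="agent.exe", destination="api.example.com", method="GET", path="/v1/docs/*"),
]

print(is_allowed(Request("agent.exe", "api.example.com", "GET", "/v1/docs/readme"), RULES))   # True
print(is_allowed(Request("agent.exe", "api.example.com", "POST", "/v1/docs/readme"), RULES))  # False
print(is_allowed(Request("curl.exe", "api.example.com", "GET", "/v1/docs/readme"), RULES))    # False
```

The point of the four-level match is that changing any one dimension (a different binary, a write instead of a read) is enough to deny the request, which is a finer grain than a per-application permission prompt.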
The compliance case is where this gets strategic. Because sensitive data can stay on the device under the Privacy Router policy, RTX Spark plus OpenShell is a plausible way to satisfy data-residency and data-handling rules — GDPR, HIPAA, and sector-specific obligations — that make cloud inference a liability for some enterprises. That regulatory framing is analyst reasoning rather than a direct NVIDIA claim, and the YAML-policy-plus-binary-level enforcement is a genuinely different posture from a marketing "runs on-device" label. For teams weighing where agents should live, this slots directly into the self-hosted column of our SaaS vs. self-hosted agent deployment matrix.
06 — The Ecosystem
30+ laptops, 100+ ISVs, in one season.
What stands out about RTX Spark is not just the chip but the breadth of the launch wave. NVIDIA projects 30+ laptop models and 10+ desktop configurations across major OEMs for the fall 2026 window, with 100+ Windows software providers adopting the platform — those counts are NVIDIA projections, not shipped-device tallies. The first wave of named devices spans ASUS (ProArt P16, P14, and a Mini PC), Dell (XPS 16 Creator Edition), HP (OmniBook Ultra 16, OmniBook X 14), Lenovo (Yoga Pro 9n), MSI, and Microsoft's own Surface Laptop Ultra as the hero device, with a Surface RTX Spark Dev Box for developer workloads.
On the software side, Adobe is rearchitecting Photoshop and Premiere Pro specifically for RTX Spark, targeting a 2× improvement in AI and graphics performance versus the current generation (a vendor target). The named ISV and game-developer list runs from Blackmagic Design, Blender, ComfyUI, and OTOY through KRAFTON, NetEase, Remedy, Riot Games, and Xbox. A multi-generation roadmap was also outlined: the current Blackwell/Grace platform now, a Vera Rubin Spark generation in the 2027–2028 window, and a Rosa Feynman Spark generation around 2029–2030 (all vendor-stated timeframes).
The PC is being reinvented. With RTX Spark and Microsoft Windows, you ask — and the PC does the work.
— Jensen Huang, Founder & CEO, NVIDIA (COMPUTEX 2026 keynote)
07 — The Comparison
RTX Spark vs. the rest of the local-AI field.
RTX Spark does not exist in a vacuum. The most common point of confusion is the split between RTX Spark and DGX Spark — NVIDIA's existing developer box. DGX Spark runs DGX Linux on similar Grace Blackwell silicon and is aimed at AI developers; RTX Spark is the Windows-first consumer and creator product that layers the full RTX stack on top. They are different products for different users, and benchmarks for one should not be quoted as proxies for the other.
The table below frames the choice by use case rather than raw spec, and deliberately leaves price as "unconfirmed" for RTX Spark — no OEM had disclosed retail pricing as of June 1, 2026.
Local AI hardware: which one do you actually need?
| Option | OS | Privacy posture | Pricing | Best for |
|---|---|---|---|---|
| RTX Spark | Windows on Arm | On-device via OpenShell (pre-1.0) | Fall 2026 · pricing TBA (premium tier expected) | Creators and teams wanting local agents on Windows |
| DGX Spark | DGX Linux | On-device (developer box) | ~$4,700 (existing product) | AI developers building on Linux |
| Apple unified-memory Mac | macOS | On-device | Verify on apple.com | Mac-native creative and ML workflows |
| Cloud inference API | Any | Data leaves the device (per provider terms) | Per-token, usage-based | Bursty workloads, frontier-model access |
Privacy-bound autonomous workflows
If sensitive data cannot leave the machine, RTX Spark + OpenShell's Privacy Router is the headline use case — once you have piloted the pre-1.0 runtime and verified shipping benchmarks.
Linux AI builds
If your stack is Linux and you are building/training rather than deploying to creators, DGX Spark is the product NVIDIA already ships for that — don't wait on RTX Spark.
Variable, frontier-grade demand
For spiky workloads or when you need the absolute frontier model, per-token cloud inference still wins on flexibility. Fixed local hardware only pays off above a steady usage floor.
Buying decisions this quarter
RTX Spark ships fall 2026 with pricing unannounced and no independent benchmarks yet. Treat it as a roadmap item to evaluate once independent reviews land, not a purchase to commit to today.
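The "steady usage floor" behind the fixed-versus-cloud trade-off can be estimated with a simple break-even calculation, sketched below. The hardware cost uses the ~$4,700 DGX Spark figure from the table only as a stand-in, since RTX Spark pricing is unannounced. The cloud price of $1.00 per million tokens is a hypothetical placeholder, not any provider's rate. The sketch ignores electricity, depreciation, and operations time.

```python
def breakeven_tokens(hardware_cost_usd: float, cloud_usd_per_million_tokens: float) -> float:
    """Tokens at which cumulative cloud spend equals the up-front hardware cost."""
    return hardware_cost_usd / cloud_usd_per_million_tokens * 1_000_000

HARDWARE_COST_USD = 4_700.0  # stand-in: DGX Spark price; RTX Spark pricing is unannounced
CLOUD_USD_PER_MTOK = 1.00    # hypothetical blended price; substitute your provider's rate

tokens = breakeven_tokens(HARDWARE_COST_USD, CLOUD_USD_PER_MTOK)
print(f"Break-even at about {tokens / 1e9:.1f} billion tokens")

for daily_mtok in (1, 10, 50):
    days = tokens / (daily_mtok * 1_000_000)
    print(f"{daily_mtok} M tokens/day: about {days:.0f} days to break even")
```

With these placeholder inputs, break-even lands at about 4.7 billion tokens: roughly 4,700 days at 1 M tokens per day, but about 94 days at 50 M per day. Whether a single device can sustain that daily volume on a large model is exactly the kind of throughput question that retail benchmarks will need to answer.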
08 — Implications
What on-device agents change for teams.
Step back from the spec sheet and the strategic shift is clear: if a 120B-class model genuinely runs on a laptop, the long-standing trade-off between capability and data control starts to dissolve. For most of the last few years, "keep it on-device" meant accepting a much smaller model. RTX Spark is NVIDIA's bet that unified memory plus aggressive FP4 quantization closes enough of that gap to make local the default for a meaningful slice of agent workloads — not a compromise.
Projecting forward, the more interesting consequence is competitive. A 30-plus-laptop, 10-plus-desktop wave from essentially every major OEM in a single season is a different cadence from how Arm Windows PCs rolled out before. If even a fraction of those devices land and independent benchmarks hold up, on-device inference stops being a niche enthusiast story and becomes a procurement option IT departments have to price against cloud. The teams that benefit first are the ones that have already done the self-hosting homework — our guide to self-hosting open-weight LLMs maps the deployment decisions this hardware makes cheaper, and the 120B-class models it targets are the same scale as releases like NVIDIA Nemotron 3 Super 120B.
The launch wave · planned RTX Spark ecosystem
Source: NVIDIA projections (vendor-stated counts, not shipped-device tallies)
The caveat that should govern every line above: this is a launch announcement, not a shipping-product review. The decisive questions (real token throughput on a 120B model, sustained thermals on a 14 mm laptop, OpenShell's stability under load, and actual retail pricing) are all unanswered until devices reach reviewers in fall 2026. The right posture is to track it closely and plan evaluations, not to rearchitect around vendor figures. If you want help modeling whether on-device agents fit your stack, that is exactly the kind of comparative work our AI transformation engagements begin with.
09 — Conclusion
A credible bet, with the receipts still pending.
On-device agents just got a serious platform — now they need independent benchmarks.
RTX Spark is the most consequential consumer-AI hardware announcement of the season: a single superchip that pairs a Blackwell GPU with a 20-core Grace CPU and up to 128 GB of unified memory, with a credible story for running 120B-class agents locally and an open-source security runtime — OpenShell — built to keep those agents sandboxed and privacy-respecting.
The honest framing is the useful one. Every performance figure — 1 petaflop of FP4, the 120B local model, the bandwidth and inference speedups — is a vendor claim on pre-production hardware. The bandwidth numbers in particular are routinely conflated across coverage: 600 GB/s is the NVLink-C2C interconnect, not the ~273–300 GB/s LPDDR5X memory bus, and that distinction governs real inference throughput. OpenShell's 0.0.x versioning signals early software, and pricing is unannounced ahead of a fall 2026 launch.
The broader signal is the one to act on: the industry is moving the agent from the cloud to the desk. If RTX Spark's claims survive independent testing, the question for teams stops being "can we run a serious model locally" and becomes "which workloads should live on-device for control and cost, and which still belong in the cloud." The right move now is to plan that evaluation — not to buy the headline.