AI Development · New Release · 12 min read · Published June 1, 2026

Blackwell + Grace in one SoC · 128 GB unified memory · on-device agent inference

NVIDIA RTX Spark: a 1 PF Local AI Agent Box

At the COMPUTEX 2026 keynote, NVIDIA unveiled RTX Spark — a consumer superchip pairing a Blackwell RTX GPU with a 20-core Grace CPU and up to 128 GB of unified memory. NVIDIA says the platform runs 120-billion-parameter agents on-device. Here is what the silicon actually is, where the headline numbers come from, and what to verify before you build around it.

Digital Applied Team
Senior strategists · Published Jun 1, 2026
Sources: NVIDIA, Microsoft, OEMs
AI performance: 1 PF (FP4 · vendor-stated)
Unified memory: 128 GB (LPDDR5X · shared)
Local model size: 120B (vendor-stated cap)
Availability: Fall 2026 (pricing TBA)

NVIDIA RTX Spark is the company's first consumer-class superchip — a single system-on-chip that pairs a Blackwell RTX GPU, a 20-core Grace CPU, and up to 128 GB of unified memory, pitched as a way to run large AI agents entirely on the device rather than in the cloud. It was unveiled at the COMPUTEX 2026 keynote, delivered by Jensen Huang at the Taipei Music Center on May 31 / June 1.

The pitch is genuinely new for the personal computer: NVIDIA states that RTX Spark can run a 120-billion-parameter model with a context window of up to one million tokens locally, with no cloud round-trip and no data leaving the machine. If that holds up under independent testing, it reshapes the calculus for any team that has avoided local inference because the hardware couldn't hold a serious model.

This guide covers what was actually announced, the silicon underneath, the single most-confused spec in all the coverage (the bandwidth numbers), why the "1 petaflop" figure needs an FP4 asterisk, and NVIDIA's OpenShell runtime for keeping autonomous agents sandboxed. Every performance figure here is a vendor claim on pre-production hardware — we label them as such and tell you what to verify. For the wider keynote, see our COMPUTEX 2026 keynote first-take.

Key takeaways
  1. RTX Spark is one chip, three compute domains. A Blackwell RTX GPU, a 20-core Grace CPU (10× Cortex-X925 + 10× A725), and an NPU share up to 128 GB of LPDDR5X unified memory on a single SoC built on TSMC's 3nm process, with roughly 70 billion transistors.
  2. The bandwidth numbers get conflated everywhere. 600 GB/s is the NVLink-C2C CPU-to-GPU interconnect bandwidth. The LPDDR5X memory bandwidth is a separate, lower figure (~273–300 GB/s). They are not the same number, and most coverage runs them together.
  3. '1 petaflop' is an FP4 figure, not general compute. NVIDIA rates RTX Spark at 1 petaflop of FP4 (4-bit) AI performance via fifth-gen Tensor Cores. Lower-precision math inflates the TFLOP count; '1 petaflop of FP4' is not '1 petaflop of general AI compute.'
  4. OpenShell is the agent-security layer, and it's early. An open-source runtime with kernel-level sandboxing, a YAML policy engine, and a Privacy Router that masks PII before routing to cloud models. It is a pre-1.0 release (0.0.x), so do not treat it as production-hardened.
  5. Availability is fall 2026; pricing is unconfirmed. NVIDIA projects 30+ laptop models and 10+ desktop configurations across major OEMs, with 100+ Windows ISVs adopting the platform. No OEM has disclosed final retail pricing as of June 1, 2026.

01 What Was Unveiled
A consumer superchip, announced at COMPUTEX.

NVIDIA unveiled RTX Spark during the GTC Taipei / COMPUTEX 2026 keynote, delivered by Jensen Huang at the Taipei Music Center. The timing spans a date line — May 31 in Pacific time, June 1 in Taipei — which is why some coverage dates it to May 31 and some to June 1. Both are correct for different time zones; this post follows the Taipei date.

What was announced is a Windows-first platform built around a single superchip, plus a wave of partner devices, a software ecosystem, and an agent-security runtime called OpenShell. NVIDIA positions it as turning the PC into a machine that runs AI agents locally rather than as a thin client to cloud models. The framing matters: this is not a discrete GPU you add to a tower — it is an integrated SoC that OEMs build laptops and mini PCs around.

The superchip
RTX Spark
Blackwell RTX GPU · 20-core Grace CPU · NPU

A single SoC with up to 128 GB LPDDR5X unified memory, fifth-gen Tensor Cores with FP4, and NVLink-C2C linking CPU and GPU. NVIDIA's consumer/creator answer for on-device agents.

nvidia.com/products/rtx-spark
The runtime
OpenShell
Open-source · pre-1.0 (0.0.x)

A secure-by-design runtime for autonomous agents: kernel-level sandboxing, a declarative YAML policy engine, and a Privacy Router. Built on new Windows OS-enforced security primitives.

github.com/NVIDIA/OpenShell
Read this before any number below
Every performance figure in this guide — 1 petaflop FP4, the 120B local model, the bandwidth specs, the inference speedups — is a claim NVIDIA made about pre-production hardware at launch. As of June 1, 2026, no independent third party has benchmarked a shipping RTX Spark device. Treat the numbers as vendor-stated and verify against reviews once retail units ship in fall 2026.

02 The Silicon
Blackwell GPU, Grace CPU, one package.

RTX Spark integrates three compute domains on one chip. The GPU is a Blackwell RTX part with 6,144 CUDA cores (the same core count as the desktop RTX 5070), fabricated on TSMC's 3nm process, with roughly 70 billion transistors across the package (vendor figures). It carries fifth-generation Tensor Cores with FP4 (4-bit floating point) support, which is where the headline AI-performance number comes from.

The CPU is a custom NVIDIA Grace part with up to 20 Arm cores: ten Cortex-X925 performance cores at 4.0 GHz and ten A725 efficiency cores at 2.85 GHz, on the Armv9 architecture. That makes RTX Spark a Windows-on-Arm platform: Arm-native applications run directly on those cores, and x86 software runs through Windows' Prism emulation layer. NVIDIA states that its full CUDA stack (TensorRT, OptiX, DLSS, Reflex, G-SYNC) runs natively on the platform, which is the point that distinguishes it from earlier Arm Windows PCs.

GPU cores
Blackwell RTX CUDA cores
6,144

Same CUDA core count as the desktop RTX 5070, built on TSMC 3nm with fifth-gen Tensor Cores that add FP4 support. All figures vendor-stated.

~70B transistors
CPU cores
Custom Grace (Armv9)
20

10× Cortex-X925 performance cores at 4.0 GHz plus 10× A725 efficiency cores at 2.85 GHz. A Windows-on-Arm platform with native CUDA, per NVIDIA.

10 P + 10 E
Power envelope
Single-digit W to ~80 W
~80W

NVIDIA states the chip scales from a few watts at idle to roughly 80 W sustained, enabling both ~14 mm thin laptops (~3 lbs) and desktop mini PCs.

vendor-stated

03 The Bandwidth Math
Why 600 GB/s is not the memory bandwidth.

This is the single most-confused spec in RTX Spark coverage, and getting it right is what separates a useful analysis from a press-release rewrite. There are two distinct bandwidth numbers, and most publications run them together as if they were one.

The 600 GB/s figure is the NVLink-C2C chip-to-chip interconnect bandwidth: the bidirectional link between the Grace CPU and the Blackwell GPU inside the package. NVIDIA states this is roughly 5× the bidirectional bandwidth of PCIe Gen 5 (a vendor comparison). It is not the rate at which the chip reads from its main memory. The system memory is LPDDR5X, and its bandwidth is a separate, lower figure of approximately 273–300 GB/s. Conflating the two overstates the memory subsystem by roughly 2× (600 ÷ 300 = 2.0 at the top of the range, 600 ÷ 273 ≈ 2.2 at the bottom).

For LLM inference this distinction is load-bearing: token generation speed for a memory-bound model is gated by how fast the chip can stream weights out of DRAM, which is the LPDDR5X figure — not the CPU-GPU interconnect. The table below separates the two cleanly and sets them against other unified-memory platforms.

Proprietary table · disambiguation

RTX Spark bandwidth: interconnect vs. memory

| Link | Bandwidth | Type | What moves across it |
| --- | --- | --- | --- |
| NVLink-C2C (RTX Spark) | ~600 GB/s | Chip-to-chip interconnect | Data between the Grace CPU and Blackwell GPU |
| LPDDR5X DRAM (RTX Spark) | ~273–300 GB/s | System memory bandwidth | Model weights streamed during inference |
| Apple unified memory (high-end) | See Apple spec sheet | Unified memory bandwidth | Shared CPU/GPU memory; verify the figure on apple.com |
Source: NVLink-C2C and LPDDR5X figures are NVIDIA / OEM vendor-stated; the two RTX Spark rows are distinct numbers and must not be conflated. Apple bandwidth figures vary by chip tier — verify the exact value against apple.com before comparing.
"The 600 GB/s headline is the interconnect, not the memory bus. For local LLM throughput, the number that matters is the ~273–300 GB/s LPDDR5X figure."
(Our reading of the RTX Spark spec disclosures)
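
To make the stakes of that distinction concrete, here is a back-of-envelope sketch of the bandwidth-bound ceiling on token generation. It assumes a dense model whose active weights are all streamed from DRAM once per generated token, and it ignores KV-cache reads, compute limits, and software overhead, so real throughput would land below these ceilings. The function names are ours, and none of the outputs are measurements.

```python
# Rough ceiling on decode speed for a memory-bound model.
# Assumption: every generated token streams all active weights from DRAM once.


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed to store `params` weights at the given precision."""
    return params * bits_per_weight / 8


def max_tokens_per_second(bandwidth_gb_s: float, active_weight_bytes: float) -> float:
    """Bytes available per second divided by bytes read per token."""
    return bandwidth_gb_s * 1e9 / active_weight_bytes


if __name__ == "__main__":
    dense_120b_fp4 = weight_bytes(120e9, 4)  # 120B parameters at 4 bits each
    scenarios = [
        ("LPDDR5X, low end", 273),
        ("LPDDR5X, high end", 300),
        ("NVLink-C2C (the wrong number)", 600),
    ]
    for label, bandwidth in scenarios:
        ceiling = max_tokens_per_second(bandwidth, dense_120b_fp4)
        print(f"{label:>30}: {ceiling:.2f} tokens/s ceiling")
```

Plugging in the interconnect figure roughly doubles the apparent ceiling, which is exactly the overstatement described above. Mixture-of-experts models read only their active experts per token, so their ceiling is higher than this dense-model estimate.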

04 The Petaflop Asterisk
What "1 petaflop" actually means.

NVIDIA rates RTX Spark at 1 petaflop of FP4 AI performance, achieved via the fifth-generation Tensor Cores. The qualifier "FP4" is the whole story. FP4 is 4-bit floating point; running math at 4 bits instead of 16 or 32 lets the same silicon push far more operations per second, because each operation moves a quarter the data of an FP16 operation. So "1 petaflop of FP4" is a real number — but it is not equivalent to "1 petaflop of general-purpose AI compute," and it should never be compared head-to-head against an FP16 or FP32 figure from another platform.

From the same FP4 capability comes the platform's marquee inference claim: that RTX Spark can run a 120-billion-parameter model with up to a one-million-token context window entirely on-device. The mechanism is FP4 quantization — a 120B model in 4-bit weights fits inside 128 GB of unified memory with room for the KV cache. That is a vendor claim about what the platform can do; no independent party has benchmarked a 120B model at 1M context on a shipping device, and the specific token-throughput and time-to-first-token numbers will only be knowable once reviewers test retail hardware.
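
The memory arithmetic behind that claim can be sketched as follows. The weight footprint follows directly from the parameter count and bit width. The KV-cache estimate depends on model architecture, which NVIDIA has not tied to this claim, so the layer count, KV-head count, head dimension, and OS reserve below are hypothetical placeholders chosen only to show how sensitive the "room for the KV cache" statement is to cache precision.

```python
# Memory budget sketch for a 120B model on a 128 GB unified-memory machine.
# All architecture values in the __main__ block are hypothetical.


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_value: int) -> float:
    """KV cache in GB: one key and one value vector per layer per token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * tokens / 1e9


def fits(total_gb: float, capacity_gb: float = 128.0, reserve_gb: float = 8.0) -> bool:
    """True if the model plus a reserve for OS and activations fits in memory."""
    return total_gb + reserve_gb <= capacity_gb


if __name__ == "__main__":
    weights = weights_gb(120e9, 4)
    for name, nbytes in [("8-bit cache", 1), ("16-bit cache", 2)]:
        cache = kv_cache_gb(1_000_000, layers=36, kv_heads=8,
                            head_dim=64, bytes_per_value=nbytes)
        total = weights + cache
        print(f"{name}: weights {weights:.1f} GB + cache {cache:.1f} GB "
              f"= {total:.1f} GB, fits: {fits(total)}")
```

With these placeholder values the one-million-token cache fits at 8-bit precision and does not at 16-bit, which is why the context-length claim is worth re-checking against the architecture of whatever model you actually plan to run.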

Why FP4 inflates the TFLOP count
A "petaflop" is a throughput number, and throughput rises as precision falls. FP4 operations move roughly a quarter the data of FP16, so the same Tensor Cores report a much larger flop figure at FP4 than they would at FP16. The honest reading: 1 PF of FP4 is the metric for quantized inference workloads, not a general benchmark you can line up against another vendor's mixed-precision number.
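
For a feel of what 4-bit floating point gives up in exchange for that throughput, here is a minimal quantization sketch. It assumes the common E2M1 layout (one sign bit, two exponent bits, one mantissa bit), which can represent only eight magnitudes, and a simple per-block scale derived from the largest absolute value. NVIDIA's announcement does not describe its encoding at this level, and production formats store their scales differently, so treat this as an illustration of the idea rather than the platform's method.

```python
# Toy FP4 quantizer. Assumption: E2M1 value grid with a per-block absmax scale.

E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(values):
    """Return (scale, codes) such that each value is approximately scale * code."""
    peak = max(abs(v) for v in values)
    scale = peak / 6.0 if peak else 1.0  # map the largest magnitude onto 6.0
    codes = []
    for v in values:
        target = abs(v) / scale
        magnitude = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(-magnitude if v < 0 else magnitude)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate values from the scale and the 4-bit codes."""
    return [scale * c for c in codes]


if __name__ == "__main__":
    block = [6.0, -3.0, 1.2, 0.1]
    scale, codes = quantize_block(block)
    print("codes:   ", codes)
    print("restored:", dequantize_block(scale, codes))
```

Small values collapse to zero and mid-range values snap to the nearest grid point, which is why quantized accuracy needs to be checked per model rather than assumed from the flop figure.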

NVIDIA also published inference-uplift claims for its software stack: roughly a 2× speedup on agentic models via multi-token prediction in llama.cpp, and a 2.6× improvement on the DGX Spark developer box using optimized NVFP4 checkpoints in vLLM. Two cautions. First, all of these are vendor figures with no independent benchmarks at publication. Second, the 2.6× vLLM number references DGX Spark — the Linux developer product — not necessarily the consumer Windows RTX Spark; NVIDIA's materials bundle the two, but they are different products with different software stacks. If on-device economics are your reason for looking at RTX Spark, our inference cost optimization playbook is the right place to model fixed-hardware versus per-token cloud spend.

05 Agent Security
OpenShell: the privacy moat nobody is talking about.

If RTX Spark's silicon is the headline, OpenShell is the part that matters most for teams under data-handling obligations. OpenShell is NVIDIA's open-source, secure-by-design runtime for autonomous agents, hosted on GitHub and built on top of new Windows OS-enforced security primitives from Microsoft — identity controls, containment, and policy enforcement — so that an agent runs inside a sandbox on the user's own machine.

The architecture has three enforcement layers. First, programmable filesystem, network, and process isolation at the kernel level — true sandboxing, not a permissions prompt. Second, a declarative YAML policy engine evaluated at the binary, destination, method, and path level, so you can express exactly what an agent is and is not allowed to touch. Third — and this is the sleeper feature — a Privacy Router that keeps sensitive context on-device with local open models and only routes to frontier cloud models (Claude, the GPT family) when policy permits, masking PII in any query that does leave the machine.
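
To show what evaluation at the binary, destination, method, and path level could look like, here is a small sketch of a first-match, default-deny rule check. The rule fields, the glob matching, and the first-match semantics are all our assumptions for illustration. This is not OpenShell's schema or API, and the rules are written as Python objects standing in for whatever a parsed YAML policy would contain.

```python
# Hypothetical policy check: not OpenShell's actual schema or semantics.
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    binary: str       # glob over the executable making the request
    destination: str  # glob over the target host
    method: str       # request verb, or "*" for any
    path: str         # glob over the request path
    allow: bool


@dataclass(frozen=True)
class Request:
    binary: str
    destination: str
    method: str
    path: str


def matches(rule: Rule, req: Request) -> bool:
    """A rule applies only if all four dimensions match."""
    return (fnmatchcase(req.binary, rule.binary)
            and fnmatchcase(req.destination, rule.destination)
            and rule.method in ("*", req.method)
            and fnmatchcase(req.path, rule.path))


def is_allowed(rules, req: Request) -> bool:
    """First matching rule wins; a request that matches nothing is denied."""
    for rule in rules:
        if matches(rule, req):
            return rule.allow
    return False


if __name__ == "__main__":
    rules = [
        Rule("git", "github.com", "GET", "/NVIDIA/*", allow=True),
        Rule("*", "*", "*", "*", allow=False),
    ]
    print(is_allowed(rules, Request("git", "github.com", "GET", "/NVIDIA/OpenShell")))
    print(is_allowed(rules, Request("git", "github.com", "POST", "/NVIDIA/OpenShell")))
```

Default-deny is the design choice worth copying regardless of the real schema: an agent that hits an unanticipated destination should fail closed rather than open.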

Layer 1
Kernel-level isolation
filesystem · network · process

Programmable sandboxing of what an agent can read, reach, and execute — enforced at the kernel rather than the application layer.

Sandboxed execution
Layer 2
YAML policy engine
binary · destination · method · path

Declarative rules evaluated at four levels of granularity, so you define what agents can and cannot do without writing enforcement code by hand.

Declarative control
Layer 3
Privacy Router
local-first · PII masking

Keeps sensitive context on-device with local models, routes to frontier cloud models only when policy allows, and disguises personal information in any outbound query.

Compliance unlock
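
The local-first routing idea in Layer 3 can be sketched in a few lines. Everything here is hypothetical: the function names, the two regular expressions, and the placeholder tokens are ours, not OpenShell's. Two regexes for emails and phone numbers are far narrower than real PII detection, so read this as the shape of the decision rather than a usable filter.

```python
# Hypothetical local-first router with PII masking; not OpenShell's implementation.
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def route(prompt: str, sensitive: bool, cloud_allowed: bool):
    """Return (target, payload): stay local unless policy permits the cloud."""
    if sensitive or not cloud_allowed:
        return "local", prompt        # unmasked text never leaves the device
    return "cloud", mask_pii(prompt)  # outbound queries are masked first


if __name__ == "__main__":
    prompt = "Mail jane@example.com or call +1 415-555-0100"
    print(route(prompt, sensitive=False, cloud_allowed=True))
    print(route(prompt, sensitive=True, cloud_allowed=True))
```

When piloting the real runtime, the cases to test are the ones this sketch gets wrong: names, addresses, account numbers, and anything else a pattern match would miss.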
Maturity check — read before relying on it
OpenShell is published as a pre-1.0 release (a 0.0.x version tag at the time of writing), and NVIDIA's developer blog framed it as a developer preview. The architecture is serious, but the versioning signals early-stage software: do not treat it as production-hardened, and pilot it before placing regulated workloads behind it.

The compliance case is where this gets strategic. Because sensitive data can stay on the device under the Privacy Router policy, RTX Spark plus OpenShell is a plausible way to satisfy data-residency and data-handling rules — GDPR, HIPAA, and sector-specific obligations — that make cloud inference a liability for some enterprises. That regulatory framing is analyst reasoning rather than a direct NVIDIA claim, and the YAML-policy-plus-binary-level enforcement is a genuinely different posture from a marketing "runs on-device" label. For teams weighing where agents should live, this slots directly into the self-hosted column of our SaaS vs. self-hosted agent deployment matrix.

06 The Ecosystem
30+ laptops, 100+ ISVs, in one season.

What stands out about RTX Spark is not just the chip but the breadth of the launch wave. NVIDIA projects 30+ laptop models and 10+ desktop configurations across major OEMs for the fall 2026 window, with 100+ Windows software providers adopting the platform — those counts are NVIDIA projections, not shipped-device tallies. The first wave of named devices spans ASUS (ProArt P16, P14, and a Mini PC), Dell (XPS 16 Creator Edition), HP (OmniBook Ultra 16, OmniBook X 14), Lenovo (Yoga Pro 9n), MSI, and Microsoft's own Surface Laptop Ultra as the hero device, with a Surface RTX Spark Dev Box for developer workloads.

On the software side, Adobe is rearchitecting Photoshop and Premiere Pro specifically for RTX Spark, targeting a 2× improvement in AI and graphics performance versus the current generation (a vendor target). The named ISV and game-developer list runs from Blackmagic Design, Blender, ComfyUI, and OTOY through KRAFTON, NetEase, Remedy, Riot Games, and Xbox. A multi-generation roadmap was also outlined: the current Blackwell/Grace platform now, a Vera Rubin Spark generation in the 2027–2028 window, and a Rosa Feynman Spark generation around 2029–2030 (all vendor-stated timeframes).

"The PC is being reinvented. With RTX Spark and Microsoft Windows, you ask, and the PC does the work."
Jensen Huang, Founder & CEO, NVIDIA (COMPUTEX 2026 keynote)
On the Microsoft partnership
Microsoft framed the collaboration in expansive terms in NVIDIA's announcement, with CEO Satya Nadella describing a goal of delivering unmetered intelligence to every home and every desk with Windows. The agent-security primitives that OpenShell builds on are Microsoft's Windows OS-level controls, making this a joint hardware-and-OS play rather than an NVIDIA-only launch.

07 The Comparison
RTX Spark vs. the rest of the local-AI field.

RTX Spark does not exist in a vacuum. The most common point of confusion is the split between RTX Spark and DGX Spark — NVIDIA's existing developer box. DGX Spark runs DGX Linux on similar Grace Blackwell silicon and is aimed at AI developers; RTX Spark is the Windows-first consumer and creator product that layers the full RTX stack on top. They are different products for different users, and benchmarks for one should not be quoted as proxies for the other.

The table below frames the choice by use case rather than raw spec, and deliberately leaves price as "unconfirmed" for RTX Spark — no OEM had disclosed retail pricing as of June 1, 2026.

Proprietary table · decision matrix

Local AI hardware: which one do you actually need?

| Option | OS | Privacy posture | Pricing | Best for |
| --- | --- | --- | --- | --- |
| RTX Spark | Windows on Arm | On-device via OpenShell (pre-1.0) | Fall 2026 · pricing TBA (premium tier expected) | Creators and teams wanting local agents on Windows |
| DGX Spark | DGX Linux | On-device (developer box) | ~$4,700 (existing product) | AI developers building on Linux |
| Apple unified-memory Mac | macOS | On-device | Verify on apple.com | Mac-native creative and ML workflows |
| Cloud inference API | Any | Data leaves the device (per provider terms) | Per-token, usage-based | Bursty workloads, frontier-model access |
Sources: NVIDIA vendor data (RTX Spark, DGX Spark price). RTX Spark pricing was unconfirmed as of June 1, 2026. Apple figures vary by tier — verify on apple.com. Not investment or procurement advice.
On-device agents
Privacy-bound autonomous workflows

If sensitive data cannot leave the machine, RTX Spark + OpenShell's Privacy Router is the headline use case — once you have piloted the pre-1.0 runtime and verified shipping benchmarks.

Watch RTX Spark
Developer tooling
Linux AI builds

If your stack is Linux and you are building/training rather than deploying to creators, DGX Spark is the product NVIDIA already ships for that — don't wait on RTX Spark.

Consider DGX Spark
Bursty inference
Variable, frontier-grade demand

For spiky workloads or when you need the absolute frontier model, per-token cloud inference still wins on flexibility. Fixed local hardware only pays off above a steady usage floor.

Stay on cloud
Procurement now
Buying decisions this quarter

RTX Spark ships fall 2026 with pricing unannounced and no independent benchmarks yet. Treat it as a roadmap item to evaluate at review, not a purchase to commit to today.

Wait for reviews
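
The "steady usage floor" in the bursty-inference card can be estimated with a simple break-even calculation. In this sketch the hardware price is the ~$4,700 DGX Spark figure from the table, used only as a stand-in because RTX Spark pricing is unannounced. The cloud price per million tokens and the monthly volume are hypothetical inputs you would replace with your own, and power, staffing, and depreciation are left out.

```python
# Break-even sketch: fixed local hardware versus per-token cloud pricing.
# All inputs in the __main__ block are placeholders, not quotes.


def break_even_tokens(hardware_usd: float, cloud_usd_per_mtok: float,
                      local_usd_per_mtok: float = 0.0) -> float:
    """Tokens needed before the hardware cost is recovered."""
    saving_per_mtok = cloud_usd_per_mtok - local_usd_per_mtok
    if saving_per_mtok <= 0:
        return float("inf")  # local never pays off if it costs as much per token
    return hardware_usd / saving_per_mtok * 1e6


def break_even_months(hardware_usd: float, cloud_usd_per_mtok: float,
                      tokens_per_month: float,
                      local_usd_per_mtok: float = 0.0) -> float:
    """Months of steady usage needed to recover the hardware cost."""
    tokens = break_even_tokens(hardware_usd, cloud_usd_per_mtok, local_usd_per_mtok)
    return tokens / tokens_per_month


if __name__ == "__main__":
    months = break_even_months(hardware_usd=4700, cloud_usd_per_mtok=2.0,
                               tokens_per_month=100e6)
    print(f"Break-even after about {months:.1f} months of steady usage")
```

The result is only as good as the volume estimate: a workload that is bursty rather than steady spends most of the payback period with the hardware idle.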

08 Implications
What on-device agents change for teams.

Step back from the spec sheet and the strategic shift is clear: if a 120B-class model genuinely runs on a laptop, the long-standing trade-off between capability and data control starts to dissolve. For most of the last few years, "keep it on-device" meant accepting a much smaller model. RTX Spark is NVIDIA's bet that unified memory plus aggressive FP4 quantization closes enough of that gap to make local the default for a meaningful slice of agent workloads — not a compromise.

Projecting forward, the more interesting consequence is competitive. A 30-plus-laptop, 10-plus-desktop wave from essentially every major OEM in a single season is a different cadence from how Arm Windows PCs rolled out before. If even a fraction of those devices land and independent benchmarks hold up, on-device inference stops being a niche enthusiast story and becomes a procurement option IT departments have to price against cloud. The teams that benefit first are the ones that have already done the self-hosting homework — our guide to self-hosting open-weight LLMs maps the deployment decisions this hardware makes cheaper, and the 120B-class models it targets are the same scale as releases like NVIDIA Nemotron 3 Super 120B.

The launch wave · planned RTX Spark ecosystem

Source: NVIDIA projections (vendor-stated counts, not shipped-device tallies)
  • OEM laptop models (planned): 30+ (across major partners · fall 2026)
  • Windows ISVs adopting: 100+ (software providers at launch)
  • OEM desktop configs (planned): 10+ (mini PCs and dev boxes)
  • Named hero device: 1 (Microsoft Surface Laptop Ultra)

The caveat that should govern every line above: this is a launch announcement, not a shipping-product review. The decisive questions — real token throughput on a 120B model, sustained thermals on a 14 mm laptop, OpenShell's stability under load, and actual retail pricing — are all unanswered until devices reach reviewers in fall 2026. The right posture is to track it closely and plan evaluations, not to rearchitect around vendor figures. If you want help modeling whether on-device agents fit your stack, that is exactly the kind of comparative work our AI transformation engagements begin with.

09 Conclusion
A credible bet, with the receipts still pending.

The shape of on-device AI, June 2026

On-device agents just got a serious platform — now they need independent benchmarks.

RTX Spark is the most consequential consumer-AI hardware announcement of the season: a single superchip that pairs a Blackwell GPU with a 20-core Grace CPU and up to 128 GB of unified memory, with a credible story for running 120B-class agents locally and an open-source security runtime — OpenShell — built to keep those agents sandboxed and privacy-respecting.

The honest framing is the useful one. Every performance figure — 1 petaflop of FP4, the 120B local model, the bandwidth and inference speedups — is a vendor claim on pre-production hardware. The bandwidth numbers in particular are routinely conflated across coverage: 600 GB/s is the NVLink-C2C interconnect, not the ~273–300 GB/s LPDDR5X memory bus, and that distinction governs real inference throughput. OpenShell's 0.0.x versioning signals early software, and pricing is unannounced ahead of a fall 2026 launch.

The broader signal is the one to act on: the industry is moving the agent from the cloud to the desk. If RTX Spark's claims survive independent testing, the question for teams stops being "can we run a serious model locally" and becomes "which workloads should live on-device for control and cost, and which still belong in the cloud." The right move now is to plan that evaluation — not to buy the headline.

Evaluate on-device AI for your stack

On-device agents are becoming real — make sure the decision is evidence-based.

Our team helps businesses evaluate on-device versus cloud AI, benchmark local-inference hardware against per-token economics, and design agent deployments that respect data-residency obligations — delivered in days not quarters.

What we work on

On-device & hybrid AI engagements

  • Local vs. cloud inference cost modeling
  • Agent deployment under GDPR / HIPAA data-residency rules
  • Open-weight model benchmarking on your workloads
  • Hybrid routing — on-device privacy plus frontier cloud
  • Hardware procurement strategy for AI teams
FAQ · RTX Spark guide

The questions we get every week.

What is NVIDIA RTX Spark?

RTX Spark is NVIDIA's first consumer-class superchip — a single system-on-chip that combines a Blackwell RTX GPU, a 20-core Grace CPU (ten Cortex-X925 performance cores and ten A725 efficiency cores), an NPU, and up to 128 GB of LPDDR5X unified memory, fabricated on TSMC's 3nm process. It was unveiled at the COMPUTEX 2026 keynote on May 31 / June 1. NVIDIA positions it as a Windows-on-Arm platform for running AI agents on-device rather than in the cloud, with the full CUDA stack supported natively. It is integrated into OEM laptops and mini PCs rather than sold as a discrete add-in GPU. All performance specifications NVIDIA disclosed are vendor claims on pre-production hardware.