
Google Cloud GTC 2026: Vera Rubin NVL72 Systems Guide

Google Cloud's GTC 2026 NVIDIA partnership brings Vera Rubin NVL72 rack systems to Cloud TPU infrastructure. Availability, performance specs, and guidance.

Digital Applied Team
March 25, 2026
10 min read
At a glance:
  • 1.4 exaFLOPs per rack (FP8)
  • 72 GPUs per NVL72 system
  • Q2 2026 preview availability start
  • 2 launch regions at preview

Key Takeaways

NVL72 delivers 1.4 exaFLOPs per rack at FP8 precision: The Vera Rubin NVL72 configuration brings rack-scale AI compute to Google Cloud with 1.4 exaFLOPs of FP8 training performance, NVLink 6 interconnects, and memory bandwidth that supports large language model training at scales that previously required on-premises NVIDIA DGX SuperPOD deployments.
A4 Ultra instances arrive in preview starting Q2 2026: Google Cloud is making NVL72 capacity available through a new A4 Ultra instance family, with preview access beginning in us-central1 and europe-west4 during Q2 2026. Enterprise customers and research institutions can apply for preview access through the Google Cloud console.
Google Cloud now competes directly for large training workloads: The NVL72 partnership closes a gap that previously pushed large model training toward AWS and Azure, which had earlier access to high-end NVIDIA hardware. Google Cloud customers no longer need to choose between proprietary TPU infrastructure and NVIDIA-native alternatives.
TPU v5 and NVL72 serve complementary workloads, not competing ones: Google Cloud's strategy is to offer both architectures simultaneously. TPU v5e and v5p remain the most cost-efficient option for inference and fine-tuning workloads optimized for Google's stack, while NVL72 targets large-scale pretraining and research workloads that rely on CUDA ecosystems.

Google Cloud's GTC 2026 appearance marked a turning point in how the platform positions itself for large-scale AI infrastructure. The announcement of Vera Rubin NVL72 rack-scale systems arriving via the new A4 Ultra instance family closes a gap that has existed since AWS and Azure gained earlier access to high-density NVIDIA hardware configurations. For enterprises evaluating cloud platforms for large model training workloads, this changes the calculus.

The partnership reflects a broader pattern at GTC 2026, where major cloud providers competed to demonstrate the deepest NVIDIA integration rather than positioning their proprietary accelerators as alternatives. Google Cloud is betting that customers want both TPU-native and NVIDIA-native options within the same infrastructure ecosystem. Viewed against the full scope of NVIDIA's GTC 2026 announcements, including NemoClaw and enterprise agentic AI, the hardware is part of a coherent platform strategy that extends from silicon to software.

Google Cloud GTC 2026 Announcement Overview

Google Cloud's GTC 2026 announcement confirmed a deepened infrastructure partnership with NVIDIA that brings Vera Rubin NVL72 rack-scale systems into Google Cloud's AI Hypercomputer offering alongside existing TPU v5e and v5p capacity. The announcement was made during Jensen Huang's keynote, where Google Cloud CEO Thomas Kurian appeared on stage to outline the availability timeline and positioning.

The core commitment is to provide NVL72 configurations as a first-class offering on Google Cloud, not as a limited research program. Google Cloud is launching the A4 Ultra instance family specifically to expose NVL72 capacity to enterprise customers, with the same SLA, networking, and operational tooling available for existing A3 instances built on H100 hardware.

Vera Rubin GPUs

72 Vera Rubin GPUs per rack connected via NVLink 6, representing the full NVL72 configuration and delivering the highest FLOP density available in cloud infrastructure.

A4 Ultra Instances

New instance family purpose-built for NVL72, available through standard Google Cloud console and APIs with the same operational tooling as existing accelerated compute offerings.

Q2 2026 Preview

Preview availability in us-central1 and europe-west4 from Q2 2026. Enterprise customers and research institutions can apply through the Google Cloud console early access program.

The announcement positions Google Cloud's AI Hypercomputer as a heterogeneous platform rather than a TPU-exclusive offering. This is a meaningful strategic shift. For several years, Google Cloud differentiated on proprietary TPU architecture while ceding large-scale NVIDIA-native training workloads to competitors. The NVL72 partnership signals that Google Cloud is prioritizing breadth of workload support over proprietary differentiation. For context on the financial scale behind this hardware cycle, see the analysis of NVIDIA's $1 trillion order pipeline revealed at the same GTC keynote.

Vera Rubin NVL72 Architecture and Specifications

Vera Rubin is NVIDIA's successor to the Blackwell architecture, named after the astronomer who provided foundational evidence for dark matter. The NVL72 configuration places 72 Vera Rubin GPUs in a single rack, connected by NVLink 6 rather than PCIe, creating a unified memory fabric that spans the entire rack. From the perspective of a training job, an NVL72 rack behaves more like a single very large accelerator than a cluster of individual GPUs.

The key architectural advances over Blackwell NVL72 configurations include higher per-GPU memory bandwidth, a move to higher-capacity HBM4 memory per GPU, and the NVLink 6 interconnect that doubles bandwidth per link over the NVLink 4 found in H100 systems. The FP8 training performance of 1.4 exaFLOPs per rack represents a significant step up from the roughly 720 petaFLOPs per rack achievable with H100 NVL8 configurations in dense cluster arrangements.
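As a quick sanity check using only the figures quoted above, the per-GPU throughput implied by the rack-level number works out as follows. This is a minimal sketch; the 1.4 exaFLOPs figure is the article's headline spec, not an independently measured benchmark:

```python
# Back-of-envelope check using only the figures quoted in this article.
rack_fp8_flops = 1.4e18      # 1.4 exaFLOPs per NVL72 rack at FP8
gpus_per_rack = 72

per_gpu_fp8_flops = rack_fp8_flops / gpus_per_rack
print(f"Per-GPU FP8 throughput: {per_gpu_fp8_flops / 1e15:.1f} petaFLOPs")
# -> roughly 19.4 petaFLOPs per Vera Rubin GPU at FP8
```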

Vera Rubin NVL72 Key Specifications

  • Training Performance: 1.4 exaFLOPs at FP8 per NVL72 rack configuration
  • GPU Count: 72 Vera Rubin GPUs per rack, NVLink 6 connected
  • Interconnect: NVLink 6 with double the per-link bandwidth of NVLink 4
  • Memory: HBM4 per GPU, unified memory fabric across the full rack

The NVL72 form factor matters because it changes the programming model for distributed training. In H100 cluster configurations, training jobs typically use tensor parallelism within a node of 8 GPUs connected via NVLink, then pipeline or data parallelism across nodes using InfiniBand. With NVL72, the NVLink domain extends to 72 GPUs, allowing more of the model to reside in the high-bandwidth interconnect fabric and reducing reliance on slower inter-rack communication for common training architectures.
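To make that concrete, here is a minimal PyTorch sketch of how the parallelism layout might change. The mesh shapes (a hypothetical 144-GPU job, tensor parallelism of 8 versus 72) are illustrative assumptions, not Google- or NVIDIA-published code:

```python
# Illustrative sketch: how a wider NVLink domain changes the device mesh.
# Assumes a hypothetical 144-GPU job (two NVL72 racks) launched with
# torchrun, so rank/world-size environment variables are already set.
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

dist.init_process_group("nccl")

# H100-era layout: tensor parallelism (tp) stays inside the 8-GPU NVLink
# node; the data-parallel (dp) dimension crosses InfiniBand.
h100_mesh = init_device_mesh("cuda", (18, 8), mesh_dim_names=("dp", "tp"))

# NVL72 layout: NVLink 6 spans the whole rack, so tp can grow to 72 GPUs
# and only the dp dimension rides the slower inter-rack fabric.
nvl72_mesh = init_device_mesh("cuda", (2, 72), mesh_dim_names=("dp", "tp"))

# Collectives issued on this group stay inside the rack's NVLink domain.
tp_group = nvl72_mesh["tp"].get_group()
```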

A4 Ultra Instance Family on Google Cloud

Google Cloud is exposing NVL72 capacity through a new A4 Ultra instance family that fits into the existing accelerated compute naming scheme alongside A3 Mega instances (H100) and A2 Ultra instances (A100). The A4 Ultra family is designed for workloads that need the full NVL72 rack as a single compute unit, rather than slicing individual GPUs into smaller instances.

Instance configurations will be offered at the rack level for large training jobs and at sub-rack configurations for teams that need NVL72 performance characteristics without committing a full rack. The sub-rack options allow research teams and smaller AI companies to evaluate the architecture without the cost of reserving an entire NVL72 system.

Full Rack Configuration

72-GPU NVL72 configurations for large pretraining jobs that benefit from the full NVLink 6 memory fabric. Targeted at enterprises training 100B+ parameter models and AI labs running benchmark-scale experiments.

Sub-Rack Configurations

Partial NVL72 configurations for teams evaluating the architecture without full rack commitment. Enables access to Vera Rubin GPU capabilities and NVLink 6 bandwidth at lower cost entry points.

On-Demand and Reserved

A4 Ultra instances will support both on-demand and committed-use discount pricing, with one-year and three-year reservation options for enterprises with predictable long-term training workloads.

Networking Integration

A4 Ultra instances connect to Google Cloud's Jupiter networking fabric for inter-rack communication, enabling multi-rack training runs that span NVL72 systems with low-latency cluster networking.
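Google Cloud has not yet published A4 Ultra machine type names or API details, so any provisioning example is necessarily speculative. As a sketch of what creating an instance might look like with the existing google-cloud-compute Python client, assuming a hypothetical machine type name:

```python
# Hypothetical provisioning sketch using the google-cloud-compute client.
# The machine type "a4-ultragpu-72g" is a placeholder: Google has not
# published A4 Ultra machine type names, so treat it as an assumption.
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"

instance = compute_v1.Instance(
    name="a4-ultra-preview-test",
    machine_type=f"zones/{zone}/machineTypes/a4-ultragpu-72g",  # placeholder
    disks=[
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                source_image="projects/debian-cloud/global/images/family/debian-12",
                disk_size_gb=200,
            ),
        )
    ],
    network_interfaces=[
        compute_v1.NetworkInterface(network="global/networks/default")
    ],
)

operation = compute_v1.InstancesClient().insert(
    project=project, zone=zone, instance_resource=instance
)
operation.result()  # block until the insert operation completes
```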

Availability Timeline and Regional Rollout

Google Cloud has committed to preview availability for A4 Ultra instances in Q2 2026, with initial capacity in us-central1 (Iowa) and europe-west4 (Netherlands). These regions were chosen based on existing data center infrastructure, power availability, and the geographic distribution of expected early adopters in North America and Europe.

Regional expansion beyond the two preview regions will follow Google Cloud's standard capacity rollout process. Based on historical patterns for A3 and previous accelerated instance families, additional regions such as us-east4 and asia-southeast1 can be expected within the first year following general availability. Google Cloud has not specified a GA date, but preview programs for previous instance families typically ran for three to six months.

Availability Roadmap Summary
Q2 2026

Preview access in us-central1 and europe-west4. Capacity-constrained; early access applications via Google Cloud console.

H2 2026

Expected GA based on historical preview cadence. Additional regions likely to follow shortly after GA announcement.

2027

Broader regional availability, additional instance size configurations, and reserved capacity pricing options expected.

Workload Guidance and Target Use Cases

Not every AI workload benefits equally from NVL72 infrastructure. Google Cloud's own guidance distinguishes between workloads that are compute-bound and bandwidth-bound, directing each to the appropriate infrastructure. Understanding this distinction is essential for making cost-effective infrastructure decisions.

Best Fit: Large Pretraining

Training 70B+ parameter models from scratch on large token budgets. NVL72's rack-scale NVLink domain eliminates the inter-node bottleneck that limits model FLOPs utilization (MFU) on H100 clusters at this scale.

Best Fit: Multimodal Foundation Models

Video generation, image-language models, and audio-visual architectures that require high memory capacity per GPU and high bandwidth for exchanging activations between components processing different modalities simultaneously.

Consider Alternatives: Fine-Tuning

Fine-tuning 7B to 70B models on task-specific datasets typically achieves similar outcomes on A3 (H100) instances at lower cost. The NVLink 6 bandwidth advantage is most valuable at training scales where communication becomes the bottleneck.

Consider Alternatives: Inference Serving

High-throughput inference workloads are generally better served by TPU v5e or A3 instances optimized for serving latency and cost per token, rather than NVL72 configurations optimized for training throughput.
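One way to build intuition for the communication crossover mentioned in the fine-tuning guidance above is a back-of-envelope comparison of per-step compute time against gradient all-reduce time. Every constant here is an illustrative assumption (the 6 FLOPs per parameter per token rule of thumb, the MFU, and the bandwidth figure), not a published benchmark:

```python
# Back-of-envelope: is a data-parallel training step compute-bound or
# communication-bound? All constants are illustrative assumptions.

def step_times(params, tokens_per_step, num_gpus, gpu_flops,
               bus_gb_per_s, mfu=0.4):
    """Return (compute_seconds, allreduce_seconds) for one optimizer step."""
    # ~6 FLOPs per parameter per token is the standard training rule of thumb.
    compute_s = 6 * params * tokens_per_step / (num_gpus * gpu_flops * mfu)
    # Ring all-reduce of bf16 gradients (2 bytes/param) moves ~2x the payload.
    allreduce_s = (2 * 2 * params) / (bus_gb_per_s * 1e9)
    return compute_s, allreduce_s

# Example: 70B-parameter model, 4M tokens per step, two NVL72 racks.
compute_s, comm_s = step_times(
    params=70e9,
    tokens_per_step=4e6,
    num_gpus=144,
    gpu_flops=19.4e15,     # per-GPU FP8 figure implied by the rack spec above
    bus_gb_per_s=400,      # assumed effective per-GPU all-reduce bandwidth
)
print(f"compute ~{compute_s:.1f}s vs all-reduce ~{comm_s:.2f}s per step")
# When the second number approaches the first, faster interconnect
# (the NVL72 advantage) starts to pay for itself.
```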

Google Cloud's AI Hypercomputer orchestration layer manages job placement across TPU and GPU infrastructure, theoretically allowing training pipelines to use NVL72 for pretraining passes and then shift to TPU v5e for distillation and inference serving without separate infrastructure management. This unified control plane is a competitive advantage Google Cloud holds over pure-play NVIDIA cloud providers.

How This Positions Google Cloud vs AWS and Azure

Prior to the NVL72 announcement, AWS and Azure held an advantage in high-density NVIDIA compute for enterprises that needed NVIDIA-native infrastructure. AWS P5 instances (H100 NVL8) and Azure ND H100 v5 instances were the primary options for large NVIDIA-based training, while Google Cloud offered A3 Mega (H100) as a more recent entrant. The Vera Rubin NVL72 on A4 Ultra leapfrogs both competitors' current NVIDIA offerings in raw compute density.

AWS and Azure are expected to announce Vera Rubin availability on their own timelines, and the competitive advantage Google Cloud gains from NVL72 preview access is likely to be temporary. However, for enterprises evaluating multi-year cloud commitments for AI infrastructure, being among the first to offer NVL72 strengthens Google Cloud's credibility as a serious AI infrastructure provider alongside its proprietary TPU heritage.

TPU v5 and NVL72 Coexistence Strategy

Google Cloud's strategy is emphatically not to replace TPU infrastructure with NVIDIA hardware. TPU v5e and v5p remain the recommended path for inference serving, fine-tuning of JAX-based models, and workloads that benefit from Google's custom optimizations. The NVL72 partnership adds CUDA-ecosystem support for workloads that need it, without requiring customers to abandon existing TPU investments.

In practice, large AI organizations often run mixed infrastructure. Pretraining happens on GPU clusters with PyTorch and CUDA, while inference serving uses TPUs or purpose-built inference accelerators. Google Cloud's ability to serve both phases within a single cloud reduces the friction of multi-cloud management that has led some enterprises to split training and inference workloads across providers.

NVL72 A4 Ultra Strengths
  • Large-scale pretraining with CUDA/PyTorch
  • Research workloads requiring latest NVIDIA GPU
  • Multimodal training with high memory bandwidth
  • Ecosystem compatibility with NVIDIA toolchain
TPU v5 Strengths
  • Cost-efficient inference serving at scale
  • JAX and XLA-optimized training pipelines
  • Fine-tuning with Google-native ML frameworks
  • Mature operational tooling and GA support

Business Impact and AI Strategy Implications

For enterprise technology buyers, the Google Cloud NVL72 announcement has immediate and medium-term implications. In the immediate term, organizations with upcoming large model training requirements should include Google Cloud A4 Ultra in their infrastructure evaluation alongside AWS and Azure NVIDIA offerings. The preview access program is worth entering even if GA timelines are uncertain, as preview benchmarks will inform procurement decisions.

In the medium term, the availability of Vera Rubin-class compute through Google Cloud accelerates the development timeline for foundation models and downstream AI products. Faster training runs at lower cost per token trained mean the gap between frontier models and enterprise-accessible models continues to shrink. For businesses building on top of foundation models via API, this translates to more capable base models and more competitive pricing from providers who can train and serve more efficiently. For deeper context on how this hardware cycle fits into broader digital transformation trends, see our AI and digital transformation services.

The strategic takeaway for businesses that are not training their own foundation models is that cloud infrastructure competition benefits end users through better models and lower inference costs. Google Cloud's NVL72 adoption, combined with its TPU v5 inference infrastructure, creates a vertically integrated AI platform that can compete for enterprise AI workloads that were previously distributed across specialized providers.

Conclusion

Google Cloud's GTC 2026 announcement of Vera Rubin NVL72 systems through the A4 Ultra instance family represents a meaningful shift in cloud AI infrastructure. The combination of 1.4 exaFLOPs per rack, NVLink 6 interconnects, and preview availability in Q2 2026 positions Google Cloud to compete directly for the large-scale training workloads that have historically preferred AWS and Azure.

The coexistence strategy — keeping TPU v5 infrastructure for inference and JAX-native workloads while adding NVL72 for CUDA-native training — is coherent and reflects the practical reality of how large AI organizations use heterogeneous infrastructure. Organizations planning AI infrastructure investments through 2026 and beyond should include Google Cloud's evolving AI Hypercomputer platform in their evaluation alongside the established NVIDIA-native offerings from AWS and Azure.

Ready to Navigate AI Infrastructure for Your Business?

Cloud AI infrastructure choices affect everything from model quality to operating costs. Our team helps businesses align technology strategy with the rapidly evolving AI hardware landscape.

Free consultation · Expert guidance · Tailored solutions
