
NVIDIA Unveils Vera Rubin POD 40-Rack AI Supercomputer for Agentic Workloads

2026/03/17 03:48
3 min read


Iris Coleman Mar 16, 2026 19:48

NVIDIA announces Vera Rubin POD featuring 1,152 GPUs across 40 racks, delivering 60 exaflops and 10x better inference performance per watt than Blackwell.


NVIDIA just dropped the specs on its most ambitious AI infrastructure play yet. The Vera Rubin POD packs 1,152 Rubin GPUs across 40 racks, delivering 60 exaflops of compute power and 10 petabytes per second of total scale-up bandwidth. Production units ship in the second half of 2026.

The numbers here are staggering: 1.2 quadrillion transistors, nearly 20,000 NVIDIA dies, all engineered to function as a single coherent supercomputer. NVIDIA claims 4x better training performance and 10x better inference performance per watt compared to its current Blackwell architecture—with token costs dropping to one-tenth of current levels.
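Those headline totals can be sanity-checked with quick arithmetic. The sketch below uses only the figures quoted above; the precision format behind the 60-exaflop number is not stated in the announcement, and the per-rack average is purely illustrative since only some of the 40 racks host GPUs.

```python
# Back-of-envelope math from the quoted Vera Rubin POD figures.
# Assumption: 60 EF is aggregate GPU throughput (precision format unspecified).

GPUS = 1152
RACKS = 40
TOTAL_EXAFLOPS = 60.0

pflops_per_gpu = TOTAL_EXAFLOPS * 1000 / GPUS  # exa -> peta
gpus_per_rack_avg = GPUS / RACKS               # average over ALL rack types

print(f"~{pflops_per_gpu:.1f} PFLOPS per GPU")        # ~52.1
print(f"~{gpus_per_rack_avg:.1f} GPUs/rack average")  # ~28.8
```

At roughly 52 PFLOPS per GPU, the figure only makes sense as low-precision inference throughput, which fits the announcement's focus on token economics.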

Five Purpose-Built Rack Systems

The POD combines five distinct rack-scale systems, each targeting specific bottlenecks in modern AI workloads:

Vera Rubin NVL72 serves as the core compute engine. Each rack integrates 72 Rubin GPUs and 36 Vera CPUs connected through NVLink 6, which pushes 3.6 TB/s bandwidth per GPU—more total bandwidth than the entire global internet, according to NVIDIA. The system targets all four AI scaling laws: pretraining, post-training, test-time scaling, and agentic scaling.
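Multiplying the per-GPU NVLink 6 figure across a rack gives a feel for the scale-up fabric. This is a simple illustration from the numbers quoted above, not an NVIDIA-published rack total:

```python
# Aggregate NVLink 6 bandwidth inside one NVL72 rack,
# derived from the quoted 3.6 TB/s per-GPU figure.

NVLINK6_TBPS_PER_GPU = 3.6
GPUS_PER_RACK = 72

rack_scaleup_tbps = NVLINK6_TBPS_PER_GPU * GPUS_PER_RACK
print(f"{rack_scaleup_tbps:.1f} TB/s aggregate per NVL72 rack")  # 259.2
```

Roughly 259 TB/s per rack is consistent in magnitude with the 10 PB/s total scale-up bandwidth claimed for the full POD.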

Groq 3 LPX racks tackle the latency problem. With 256 language processing units per rack using SRAM-only architecture, these pair with NVL72 to deliver what NVIDIA claims is 35x more tokens and 10x more revenue opportunity for trillion-parameter models versus Blackwell.

Vera CPU racks provide sandbox environments for agent testing. A single rack sustains over 22,500 concurrent reinforcement learning environments—critical for validating agentic AI outputs before deployment.

BlueField-4 STX racks introduce what NVIDIA calls "AI-native storage" through the CMX context memory platform. By offloading KV cache to dedicated high-bandwidth storage, the system claims 5x higher tokens-per-second and 5x better power efficiency than traditional approaches.

Spectrum-6 SPX networking racks tie everything together with 102.4 Tb/s switches featuring co-packaged optics.

The Token Economics Argument

NVIDIA frames this around a specific market reality: token consumption now exceeds 10 quadrillion annually, and the shift from human-AI to AI-AI interactions will accelerate that growth dramatically. Modern agentic systems generate massive reasoning token volumes while expanding KV cache requirements—exactly the bottleneck this architecture targets.

Third-party InferenceMax benchmarks from SemiAnalysis, cited by NVIDIA, show current Blackwell systems already delivering 50x better performance per watt and 35x lower cost per token than the H200. Vera Rubin aims to extend that lead.

Thermal and Power Engineering

The third-generation MGX rack architecture introduces Intelligent Power Smoothing with 6x more rack-level energy storage (400 joules per GPU) than previous generations. This reduces peak current demands by up to 25% and eliminates the need for massive battery packs.
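The per-GPU energy-storage figure translates to a modest rack-level buffer. The sketch below assumes the 400 J/GPU figure applies across the 72 Rubin GPUs of an NVL72 rack; the 1 kW transient used for the hold-up estimate is a hypothetical round number, not a published spec:

```python
# Rack-level energy-storage sketch from the Intelligent Power Smoothing figures.
# Assumption: 400 J/GPU across a 72-GPU NVL72 rack; 1 kW transient is hypothetical.

JOULES_PER_GPU = 400
GPUS_PER_RACK = 72
HYPOTHETICAL_TRANSIENT_W = 1000  # illustrative per-GPU power swing

rack_storage_kj = JOULES_PER_GPU * GPUS_PER_RACK / 1000
holdup_s = JOULES_PER_GPU / HYPOTHETICAL_TRANSIENT_W

print(f"{rack_storage_kj:.1f} kJ of buffer energy per rack")   # 28.8
print(f"~{holdup_s:.1f} s of ride-through at 1 kW per GPU")    # ~0.4
```

A buffer measured in hundreds of milliseconds is enough to smooth synchronized power spikes from training steps, which is the peak-current problem the MGX design targets.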

All racks operate at 45°C warm-water inlet temperatures, enabling data centers in many climates to use ambient air cooling. NVIDIA claims this frees enough power to add 10% more racks in the same facility power budget.

Looking Ahead

Beyond the initial POD configuration, NVIDIA previewed Vera Rubin Ultra NVL576 scaling to 576 GPUs across eight racks, and the next-generation Kyber architecture targeting NVL1152 with 144 GPUs per rack. The roadmap suggests NVIDIA sees multi-rack NVLink domains as the future of AI infrastructure—not just bigger GPUs, but fundamentally different system architectures.

For enterprises planning AI infrastructure investments, the message is clear: the economics of AI compute are shifting from chip-level to facility-level optimization. Those building out data centers now face a choice between current-generation systems and waiting for Vera Rubin availability in late 2026.

Image source: Shutterstock
  • nvidia
  • ai infrastructure
  • vera rubin
  • data centers
  • enterprise ai
