NVIDIA releases Dynamo 1.0, an open-source inference OS adopted by AWS, Azure, Google Cloud, and major AI companies. Claims 7x performance gains on Blackwell GPUs.

NVIDIA Dynamo 1.0 Ships With 7x Inference Boost for AI Data Centers

2026/03/17 05:10
Luisa Crawford Mar 16, 2026 21:10

NVIDIA shipped Dynamo 1.0 on March 16, 2026, marking the production release of what the company calls the first operating system purpose-built for AI inference at data center scale. The open-source framework has already secured adoption from AWS, Microsoft Azure, Google Cloud, and Oracle Cloud Infrastructure, alongside production deployments at Perplexity, PayPal, Pinterest, and Cursor.

The headline number: a 7x increase in requests served on NVIDIA Blackwell GPUs, according to the SemiAnalysis InferenceX benchmark running DeepSeek R1-0528. The gain comes from Dynamo's disaggregated serving architecture, which runs the prefill and decode phases on separate GPU pools, combined with wide expert parallelism across GB200 NVL72 systems.

What Dynamo Actually Does

Modern AI reasoning models have grown too large for single GPUs. Dynamo orchestrates inference workloads across multiple GPU nodes, handling the coordination that becomes nightmarish at scale. The framework splits work into three core components: a GPU Planner for dynamic resource management, a Smart Router that optimizes request distribution based on KV cache state, and a memory manager that shuttles data between GPU memory and cheaper storage tiers.
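To make the routing idea concrete, here is a minimal Python sketch of cache-aware request routing. It is not Dynamo's API or its actual algorithm: the `Worker` class, the `pick_worker` function, the block size, and the `load_weight` penalty are all hypothetical names and values chosen for illustration. The sketch assumes each worker remembers which prompt-prefix blocks it has already cached and that the router prefers the worker with the longest cached prefix, discounted by how busy that worker is.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical inference worker; not a Dynamo type."""
    name: str
    active_requests: int = 0
    cached_blocks: set = field(default_factory=set)


def split_blocks(tokens, block_size=4):
    """Return cumulative prefixes of the prompt, one per full block of tokens."""
    return [tuple(tokens[:end])
            for end in range(block_size, len(tokens) + 1, block_size)]


def prefix_hits(worker, blocks):
    """Count leading blocks the worker already holds in its cache."""
    hits = 0
    for block in blocks:
        if block not in worker.cached_blocks:
            break
        hits += 1
    return hits


def pick_worker(workers, tokens, load_weight=0.5, block_size=4):
    """Pick the worker with the best cache overlap minus a load penalty."""
    blocks = split_blocks(tokens, block_size)

    def score(worker):
        return prefix_hits(worker, blocks) - load_weight * worker.active_requests

    best = max(workers, key=score)
    # Assume the chosen worker caches the whole prompt after serving it.
    best.cached_blocks.update(blocks)
    best.active_requests += 1
    return best


if __name__ == "__main__":
    prompt = list(range(1, 13))
    warm = Worker("warm", active_requests=1,
                  cached_blocks=set(split_blocks(prompt[:8])))
    cold = Worker("cold")
    print(pick_worker([warm, cold], prompt).name)
```

In this toy setup the warm worker wins while its load is low, and the request spills to the cold worker once the load penalty outweighs the cache benefit. A production router would also have to account for cache eviction and for the cost of transferring cached state between nodes.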

For enterprises running agentic AI workflows—where multiple models interact with external tools—Dynamo introduces "agent hints" that let applications signal latency sensitivity and expected output length. Running with NVIDIA's NeMo Agent Toolkit, this delivered 4x lower time-to-first-token and 1.5x higher throughput on Llama 3.1 using Hopper GPUs.
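The article does not describe how hints are encoded or consumed, so the following Python sketch only illustrates the general idea of scheduling on such signals. `AgentHint`, `HintedQueue`, and the ordering rule (latency-sensitive requests first, then shorter expected outputs) are assumptions made for this example and are not taken from Dynamo or the NeMo Agent Toolkit.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentHint:
    """Hypothetical hint payload an application might attach to a request."""
    latency_sensitive: bool
    expected_output_tokens: int


class HintedQueue:
    """Toy scheduler queue that orders pending requests by their hints."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so equal-priority requests stay first-in, first-out.
        self._counter = itertools.count()

    def push(self, request_id, hint):
        priority = (0 if hint.latency_sensitive else 1,
                    hint.expected_output_tokens)
        heapq.heappush(self._heap, (priority, next(self._counter), request_id))

    def pop(self):
        """Return the request id that should be served next."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = HintedQueue()
    queue.push("batch-summary", AgentHint(False, 2000))
    queue.push("tool-call", AgentHint(True, 50))
    queue.push("chat-turn", AgentHint(True, 400))
    while queue:
        print(queue.pop())
```

Serving short, latency-sensitive steps first is one plausible way such hints could reduce time-to-first-token for interactive agent steps; the reported 4x and 1.5x figures come from NVIDIA's own setup, not from a scheme like this one.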

Production Adoption Accelerates

The adopter list reads like a who's who of cloud and AI infrastructure. AstraZeneca, ByteDance, CoreWeave, Tencent Cloud, and Together AI have deployed Dynamo in production. Storage vendors including Dell, IBM, NetApp, and WEKA have built integrations for KV cache offloading beyond GPU memory limits.

Open source integration runs deep. SGLang, vLLM, and TensorRT LLM all use Dynamo's NIXL library for KV cache transfers. LangChain built a direct integration for injecting routing hints. Microsoft contributed deployment guides and hardening patches after testing on Azure Kubernetes Service.

New Capabilities in 1.0

ModelExpress cuts replica startup time by 7x for large mixture-of-experts models like DeepSeek v3. Instead of each new worker downloading and initializing weights independently, Dynamo loads once and streams weights over NVLink to additional GPUs.

Multimodal workloads get dedicated optimizations. Disaggregated encode/prefill/decode separates image processing from text generation, with an embedding cache that skips GPU encoding for repeated images—yielding 30% faster time-to-first-token on the Qwen3-VL-30B model.
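The embedding cache can be pictured with a short Python sketch. This is an assumption-laden illustration, not Dynamo's implementation: `EmbeddingCache` and `fake_encoder` are hypothetical, the cache is keyed on a SHA-256 hash of the raw image bytes, and the stand-in encoder simply scales byte values where a real system would run a vision encoder on a GPU.

```python
import hashlib


class EmbeddingCache:
    """Toy content-addressed cache that skips re-encoding repeated images."""

    def __init__(self, encode_fn):
        self._encode_fn = encode_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, image_bytes):
        """Return the embedding for an image, encoding it only on first sight."""
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        embedding = self._encode_fn(image_bytes)
        self._store[key] = embedding
        return embedding


def fake_encoder(image_bytes):
    """Stand-in for an expensive vision encoder (hypothetical)."""
    return [byte / 255 for byte in image_bytes[:4]]


if __name__ == "__main__":
    cache = EmbeddingCache(fake_encoder)
    logo = b"\x00\xff"
    cache.get(logo)
    cache.get(logo)  # Second lookup is served from the cache.
    print(cache.hits, cache.misses)
```

Keying on a content hash means the same image is recognized even when it arrives in different requests. A real cache would also need a size bound and an eviction policy.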

Video generation support arrived through integrations with FastVideo and SGLang Diffusion. NVIDIA demonstrated generating a 5-second video in roughly 40 seconds on a single Hopper GPU using Wan2.1.

The Infrastructure Play

Dynamo fits NVIDIA's broader strategy of owning the full AI stack beyond silicon. As inference costs become the dominant expense for AI deployments, software that squeezes more throughput from existing hardware becomes as valuable as the GPUs themselves. The open-source approach—unusual for NVIDIA—suggests the company views ecosystem lock-in as more valuable than licensing revenue.

For data center operators evaluating Blackwell purchases, Dynamo's performance claims change the ROI math. If the same hardware serves 7x as many requests, the hardware cost per request falls to roughly one-seventh, though real-world results will vary based on model architecture and workload patterns. The framework's roadmap targets reinforcement learning and expanded multimodal capabilities, areas where inference demands are only growing.
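The arithmetic behind that cost claim is simple enough to sketch in Python. The GPU price and baseline request rate below are placeholder values invented for illustration, not figures from NVIDIA or the benchmark; only the 7x multiplier comes from the article. The sketch assumes hardware cost is fixed per hour and ignores power, networking, and utilization gaps.

```python
def cost_per_million_requests(gpu_hourly_usd, requests_per_second):
    """Hardware cost of serving one million requests at a steady rate."""
    requests_per_hour = requests_per_second * 3600
    return gpu_hourly_usd / requests_per_hour * 1_000_000


if __name__ == "__main__":
    GPU_HOURLY_USD = 10.0   # hypothetical rental price, not a quoted figure
    BASELINE_RPS = 10.0     # hypothetical baseline throughput
    SPEEDUP = 7.0           # the claimed throughput gain

    baseline = cost_per_million_requests(GPU_HOURLY_USD, BASELINE_RPS)
    boosted = cost_per_million_requests(GPU_HOURLY_USD, BASELINE_RPS * SPEEDUP)
    saving = 1 - boosted / baseline

    print(f"baseline: ${baseline:.2f} per million requests")
    print(f"with 7x:  ${boosted:.2f} per million requests")
    print(f"saving:   {saving:.1%}")
```

Whatever the placeholder inputs, the ratio is the same: a 7x throughput gain at constant hourly cost cuts per-request hardware cost by about 86 percent.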
