
NVIDIA Drops Nemotron 3 Super With 5x Throughput Gains for AI Agents

2026/03/12 06:44
3 min read

Felix Pinkston Mar 11, 2026 22:44

NVIDIA releases Nemotron 3 Super, a 120B parameter open model delivering 5x higher throughput for agentic AI with a 1M-token context window.

On March 11, 2026, NVIDIA launched Nemotron 3 Super, a 120-billion-parameter open model that delivers 5x higher throughput than its predecessor while targeting the computational bottlenecks that have plagued multi-agent AI systems.

The model activates only 12 billion of its 120 billion parameters per inference call. This sparse activation pattern, powered by a hybrid Mamba-Transformer Mixture-of-Experts architecture, slashes the compute requirements that typically make large reasoning models impractical for continuous operation.
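
The sparse-activation idea can be sketched with a toy top-1 Mixture-of-Experts layer. The expert count, dimensions, and routing below are illustrative only, not Nemotron's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: 10 experts, but each token is routed to only the
# top-1 expert, so ~1/10 of the layer's expert weights participate
# per inference call (mirroring the 12B-of-120B ratio described above).
n_experts, d_model = 10, 16
experts = rng.standard_normal((n_experts, d_model, d_model))  # expert weights
router = rng.standard_normal((d_model, n_experts))            # routing matrix

def moe_forward(x):
    logits = x @ router                # score each expert for this token
    chosen = int(np.argmax(logits))    # top-1 routing
    return x @ experts[chosen], chosen

x = rng.standard_normal(d_model)
y, expert_id = moe_forward(x)

active = experts[expert_id].size
total = experts.size
print(f"expert {expert_id}: {active}/{total} expert params active "
      f"({active/total:.0%})")
```

The compute saving is exactly the activation ratio: only one expert's weight matrix is multiplied per token, regardless of how many experts the model stores.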

Why Multi-Agent AI Has Been Stuck

Multi-agent systems generate up to 15x the tokens of standard chat applications. Every turn requires re-sending conversation history, tool outputs, and reasoning steps. NVIDIA calls this the "context explosion" problem—and it causes agents to gradually drift from their original objectives over extended tasks.
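
A toy calculation with assumed numbers (not NVIDIA's figures) shows how quickly re-sent history compounds: if each turn re-transmits everything before it, total tokens processed grow quadratically with turn count.

```python
# Illustrative arithmetic: a chat app processes each turn's tokens once,
# while an agent that re-sends full history processes turn t's context
# proportional to t. Both numbers below are hypothetical.
turn_tokens = 500   # assumed tokens produced per turn
turns = 30

chat_total = turn_tokens * turns                                  # each turn sent once
agent_total = sum(turn_tokens * t for t in range(1, turns + 1))   # history resent each turn

print(chat_total, agent_total, agent_total / chat_total)
```

With these assumed numbers, a 30-turn workflow processes about 15x the tokens of the equivalent chat session, which is the same order of magnitude as the figure quoted above.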

The second constraint? The "thinking tax." Running massive reasoning models for every subtask makes multi-agent applications too expensive and slow for production deployment.

Nemotron 3 Super attacks both problems simultaneously. Its native 1-million-token context window gives agents persistent memory across long workflows. The hybrid architecture keeps latency low enough for concurrent agent deployment at scale.

Technical Architecture Worth Noting

The model introduces several architectural innovations that separate it from standard transformer designs:

Latent MoE compresses token embeddings before routing to experts, enabling the model to consult 4x as many specialists for identical computational cost. This granularity matters when a single conversation spans tool calls, code generation, and data analysis within a few turns.
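
The cost logic behind latent routing can be sketched in a few lines. This is our illustration of the general idea, not NVIDIA's implementation; dimensions, the quadratic-form router, and expert counts are all made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "latent MoE": compress the token embedding before expert routing,
# so each expert operates in a narrower latent space and is cheaper.
d_model, d_latent = 64, 32                # 2x compression of the routed embedding
down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
up = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
experts = rng.standard_normal((8, d_latent, d_latent))  # experts act on latents

def latent_moe(x, k=4):
    z = x @ down                                   # compress before routing
    scores = np.array([z @ e @ z for e in experts])
    top = np.argsort(scores)[-k:]                  # consult top-k experts
    z_out = sum(z @ experts[i] for i in top) / k
    return z_out @ up                              # project back to d_model

x = rng.standard_normal(d_model)
y = latent_moe(x)

# Per-expert cost scales with the routed dimension squared, so a 2x
# narrower latent buys 4x as many expert consultations per unit FLOPs.
full_cost = d_model ** 2
latent_cost = d_latent ** 2
print(y.shape, full_cost // latent_cost)
```

The quadratic scaling is the point: halving the routed dimension quarters each expert's cost, which is where the "4x as many specialists for identical cost" headroom comes from.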

Multi-token prediction forecasts several future tokens in one forward pass. Beyond training benefits, this enables built-in speculative decoding—up to 3x wall-clock speedups for structured generation tasks like code without requiring a separate draft model.

Native NVFP4 pretraining runs the majority of operations in 4-bit precision from the first gradient update. The model learns accuracy within these constraints rather than suffering post-training quantization losses. NVIDIA claims 4x inference speedup on B200 GPUs compared to FP8 on H100.
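
The flavor of blockwise 4-bit quantization can be illustrated with a generic integer scheme. Note this is only an approximation for intuition: the actual NVFP4 format uses a floating-point E2M1 layout with shared block scales, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(2)

# Generic blockwise symmetric 4-bit quantization (illustrative only).
# Each block of 16 weights shares one scale; values are rounded to the
# signed 4-bit integer range -7..7.
def quantize_4bit(w, block=16):
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return (q * scale).ravel()

w = rng.standard_normal(256).astype(np.float32)
q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()
print(q.dtype, err)
```

Training natively under this kind of constraint, rather than quantizing afterward, lets the optimizer route around the rounding error instead of absorbing it as a post-hoc accuracy loss.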

Benchmark Performance

On PinchBench—a benchmark measuring LLM performance as the "brain" of autonomous agents—Nemotron 3 Super scores 85.6% across the full test suite. NVIDIA claims this makes it the best open model in its class for agentic applications.

The model was post-trained with reinforcement learning across 21 environment configurations using NeMo Gym, generating over 1.2 million environment rollouts during training. This trajectory-based approach optimizes for reliable behavior across multi-step workflows rather than for polished single-turn responses.

Open Everything

NVIDIA released the complete package: weights on Hugging Face, 10 trillion curated pretraining tokens, 40 million post-training samples, and full training recipes. The NVIDIA Nemotron Open Model License allows enterprise deployment anywhere.

Deployment cookbooks cover vLLM, SGLang, and TensorRT-LLM. The model runs through Perplexity Pro, OpenRouter, and build.nvidia.com, with additional availability through Baseten, Cloudflare, DeepInfra, Fireworks AI, and Together AI.

NVIDIA positions Nemotron 3 Super alongside Nemotron 3 Nano (released December 2025) for tiered deployment—Nano handles targeted individual steps while Super manages complex multi-step planning. The upcoming Nemotron 3 Ultra will complete the family for expert-level tasks.

  • nvidia
  • nemotron
  • ai agents
  • open source ai
  • machine learning