Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning
By Jakub Antkiewicz
March 12, 2026
NVIDIA today released Nemotron 3 Super, a 120-billion-parameter open model engineered to address persistent efficiency and cost challenges in agentic AI systems. The model is specifically aimed at mitigating the 'thinking tax'—the high computational cost of reasoning—and 'context explosion,' in which accumulating context causes agents to drift from their goals during long, complex tasks. With a native one-million-token context window and an architecture optimized for efficiency, Nemotron 3 Super targets demanding, multi-agent applications such as autonomous software development and cybersecurity analysis, which have often been impractical to deploy at scale.
The model’s design introduces several architectural innovations to balance performance with operational cost. It features a hybrid Mamba-Transformer backbone that uses Mamba layers for efficient long-sequence processing and Transformer layers for precise data recall. A key feature is 'Latent MoE,' which compresses data before routing it to expert sub-networks, allowing the model to consult four times as many specialists for the same inference cost. Additionally, the model uses multi-token prediction to speed up generation and was pretrained natively in NVIDIA's 4-bit NVFP4 format, a method that optimizes it for Blackwell-generation hardware and reduces its memory footprint from the outset rather than through post-training compression.
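The intuition behind 'Latent MoE' can be illustrated with a small sketch: tokens are down-projected into a compressed latent space, routing and expert computation happen at the reduced dimension, and the result is projected back up. All dimensions, names, and the routing scheme below are illustrative assumptions for exposition, not NVIDIA's actual implementation; the point is only that experts operating on a 4x-smaller latent cost roughly 4x less each, so 4x more of them fit in the same budget.

```python
import numpy as np

# Illustrative sketch of latent-space MoE routing (hypothetical dims/names).
rng = np.random.default_rng(0)

D_MODEL = 512    # full hidden size
D_LATENT = 128   # compressed latent (4x smaller -> ~4x cheaper per expert)
N_EXPERTS = 16   # so ~4x more experts fit in the same compute budget
TOP_K = 2

W_down = rng.standard_normal((D_MODEL, D_LATENT)) / np.sqrt(D_MODEL)
W_up = rng.standard_normal((D_LATENT, D_MODEL)) / np.sqrt(D_LATENT)
router = rng.standard_normal((D_LATENT, N_EXPERTS)) / np.sqrt(D_LATENT)
experts = [rng.standard_normal((D_LATENT, D_LATENT)) / np.sqrt(D_LATENT)
           for _ in range(N_EXPERTS)]

def latent_moe(x):
    """x: (tokens, D_MODEL) -> (tokens, D_MODEL)."""
    z = x @ W_down                                 # compress before routing
    logits = z @ router                            # route in the latent space
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]  # top-k experts per token
    out = np.zeros_like(z)
    for t in range(z.shape[0]):
        gates = logits[t, top[t]]
        gates = np.exp(gates - gates.max())
        gates /= gates.sum()                       # softmax over selected experts
        for g, e in zip(gates, top[t]):
            out[t] += g * (z[t] @ experts[e])      # expert runs on the latent
    return out @ W_up                              # decompress back to D_MODEL

x = rng.standard_normal((4, D_MODEL))
y = latent_moe(x)
print(y.shape)  # (4, 512)
```

Because the router and every expert see only the 128-dimensional latent, per-token expert FLOPs scale with D_LATENT rather than D_MODEL, which is the lever that lets the model consult more specialists at fixed inference cost.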
By releasing Nemotron 3 Super with open weights, datasets, and training recipes, NVIDIA enables developers to customize and run the model on their own infrastructure, supporting a more accessible development ecosystem. The company is also promoting a 'Super + Nano' deployment pattern, where the larger Super model handles complex planning while the smaller, previously released Nano model executes targeted sub-tasks. This tiered strategy presents a flexible framework for building sophisticated agentic systems, positioning open-source models as increasingly viable for workflows that might otherwise require expensive, proprietary APIs.
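The 'Super + Nano' pattern described above amounts to a planner/executor split. A minimal sketch, assuming hypothetical `call_super` and `call_nano` wrappers standing in for whatever self-hosted inference endpoints a team deploys:

```python
# Hypothetical sketch of the "Super + Nano" tiered pattern. call_super and
# call_nano are placeholder stand-ins for real model endpoints, not an API
# shipped by NVIDIA.

def call_super(goal: str) -> list[str]:
    # Placeholder planner: the Super model would decompose a complex goal
    # into targeted sub-tasks. Here we fake it by splitting on ';'.
    return [f"subtask {i}: {step.strip()}"
            for i, step in enumerate(goal.split(";"), 1)]

def call_nano(subtask: str) -> str:
    # Placeholder executor: the smaller Nano model handles one narrow step.
    return f"done: {subtask}"

def run_tiered_agent(goal: str) -> list[str]:
    plan = call_super(goal)                 # expensive model: global planning
    return [call_nano(s) for s in plan]     # cheap model: per-step execution

results = run_tiered_agent("scan repo; triage findings; draft patch")
print(len(results))  # 3
```

The design choice is economic: the costly large model is invoked once per goal to plan, while the cheap small model absorbs the many per-step calls that dominate agentic workloads.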
Nemotron 3 Super's release signals a strategic focus on solving the unit economics of agentic AI, coupling an open, hybrid architecture with native, low-precision pretraining that is tightly co-designed with next-generation hardware.