AiPhreaks

NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents

By Jakub Antkiewicz

2026-06-05T11:29:17Z

NVIDIA Targets Agentic AI with Open Nemotron 3 Ultra Model

NVIDIA has released Nemotron 3 Ultra, a 550B-parameter Mixture-of-Experts (MoE) open model specifically designed to orchestrate complex, long-running agentic AI workflows. The release addresses a growing industry need for models that can manage multi-step tasks with high reasoning capability without incurring prohibitive costs or performance bottlenecks. This is a critical challenge as developers shift focus from single-turn chatbots to more capable autonomous systems that must plan, use tools, and maintain context over extended periods.

Technical Specifications and Architectural Innovations

The model activates 55B of its parameters per token and introduces several architectural innovations aimed at balancing performance with efficiency. Two are central to its design: a hybrid Mamba-Transformer architecture for long-context processing, and NVFP4 quantization, a new precision format that enables up to 5x higher throughput. This quantization allows a single model checkpoint to run across NVIDIA's Hopper, Blackwell, and Ampere GPU generations, simplifying deployment for developers. The model is also reported to lower the token-based cost of agentic tasks by up to 30% on select benchmarks.

  • Model Size: 550B total parameters (Mixture-of-Experts) with 55B active parameters.
  • Training Method: Features Multi-Teacher On-Policy Distillation (MOPD), where the model learns from over ten specialized teacher models for continuous domain-specific improvement.
  • Key Features: LatentMoE for efficient expert routing and Multi-Token Prediction (MTP) to speed up generation in multi-turn conversations.
  • Data & Licensing: Released under the permissive OpenMDW-1.1 license with open weights, recipes, and a transparent data pipeline including 212B new domain-specific tokens.
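To put the headline numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the two parameter counts quoted above. The bits-per-weight table is an assumption for illustration (16 bits for BF16, 8 for FP8, 4 for NVFP4), and it ignores quantization scale factors, activations, and KV cache, so the results are approximate lower bounds on weight storage, not figures published by NVIDIA.

```python
# Back-of-the-envelope arithmetic for the parameter counts quoted in the article.
TOTAL_PARAMS = 550e9   # total parameters (from the article)
ACTIVE_PARAMS = 55e9   # active parameters (from the article)

# Assumed bits per weight for each storage format.
# Scale-factor overhead for quantized formats is deliberately ignored.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "NVFP4": 4}


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used in a single forward pass."""
    return active / total


def weight_memory_gb(params: float, bits: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
    for name, bits in BITS_PER_WEIGHT.items():
        gb = weight_memory_gb(TOTAL_PARAMS, bits)
        print(f"{name:>5}: ~{gb:,.0f} GB of weights")
```

Under these assumptions, only about 10% of the parameters are active at a time, and storing all 550B weights takes roughly 1,100 GB at 16 bits versus roughly 275 GB at 4 bits. That gap is one reason a 4-bit format matters for deployment cost.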

Ecosystem Impact and Open Availability

By releasing Nemotron 3 Ultra with fully open weights, training recipes, and a significant portion of its data pipeline, NVIDIA is positioning itself as a key infrastructure provider for the open-source agentic AI ecosystem. This move provides enterprises with a transparent, adaptable foundation for building sovereign and domain-specific AI agents. The release is complemented by a suite of tools, including the secure NVIDIA OpenShell runtime and the NemoClaw deployment blueprint, which together aim to standardize the stack for developing and running autonomous agents more safely.

Strategic Takeaway: NVIDIA's release of Nemotron 3 Ultra is less about introducing another frontier model and more about providing a complete, open, and highly optimized reference architecture for agentic AI. By bundling the model with secure runtimes (OpenShell), deployment frameworks (NemoClaw), and transparent training data, NVIDIA aims to make its hardware and software ecosystem the default choice for enterprises building complex, long-running autonomous systems, thereby reinforcing its full-stack dominance.