AiPhreaks

Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3

By Jakub Antkiewicz

2026-06-01T13:36:24Z

NVIDIA Open-Sources Cosmos 3 to Build Foundation for Physical AI

NVIDIA has released and open-sourced Cosmos 3, a foundation model designed to enable physical AI systems like robots and autonomous vehicles to understand and interact with the real world. The model integrates physical reasoning, world generation, and action generation into a single architecture, a move intended to streamline development for complex embodied AI tasks. By making the models, training scripts, and datasets publicly available, NVIDIA aims to create a more open and reproducible environment for researchers and developers in the field.

Unified Architecture and Open Ecosystem

At the core of Cosmos 3 is a Mixture-of-Transformers (MoT) architecture that combines functions that previously required separate models into two towers: a vision-language "Reasoner" that interprets multimodal inputs and a diffusion-based "Generator" that creates physics-aware video and action sequences. The release includes two model sizes: the 16B-parameter Cosmos 3 Nano for efficient inference on workstation GPUs such as the RTX PRO 6000, and the 64B-parameter Cosmos 3 Super for high-quality data generation on data center GPUs based on the Hopper and Blackwell architectures. Because a single model covers both reasoning and generation, developers no longer need to orchestrate multiple inference pipelines.
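The article does not describe Cosmos 3's layer internals, but the general Mixture-of-Transformers idea can be illustrated. The minimal, hypothetical Python sketch below assumes two towers named "reasoner" and "generator", each with its own query/key/value/output projections, while self-attention runs over the full mixed token sequence so that each tower can read the other's tokens. The class name `MoTAttentionLayer`, the dimensions, and the random weights are illustrative assumptions. The sketch omits feed-forward blocks, normalization, masking, and the diffusion process entirely, and it is not NVIDIA's implementation.

```python
import math
import random


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    # Random Gaussian weights scaled by 1/sqrt(cols); illustrative only.
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


class MoTAttentionLayer:
    """Hypothetical single MoT-style attention layer with two towers.

    Each tower owns separate Q/K/V/output projections (tower-specific weights),
    but attention is computed over the whole mixed sequence (shared attention).
    """

    def __init__(self, d_model, towers=("reasoner", "generator"), seed=0):
        rng = random.Random(seed)
        self.d = d_model
        self.params = {
            t: {name: random_matrix(d_model, d_model, rng) for name in ("q", "k", "v", "o")}
            for t in towers
        }

    def __call__(self, tokens, tower_ids):
        # 1) Tower-specific projections: each token uses its own tower's weights.
        q = [matvec(self.params[t]["q"], x) for x, t in zip(tokens, tower_ids)]
        k = [matvec(self.params[t]["k"], x) for x, t in zip(tokens, tower_ids)]
        v = [matvec(self.params[t]["v"], x) for x, t in zip(tokens, tower_ids)]
        scale = 1.0 / math.sqrt(self.d)
        out = []
        for i, t in enumerate(tower_ids):
            # 2) Shared global attention: token i attends to every token, in both towers.
            scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in range(len(tokens))]
            weights = softmax(scores)
            mixed = [sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(self.d)]
            # 3) Tower-specific output projection plus a residual connection.
            projected = matvec(self.params[t]["o"], mixed)
            out.append([x + y for x, y in zip(tokens[i], projected)])
        return out


# Usage: two hypothetical "reasoner" tokens and one hypothetical "generator" token.
layer = MoTAttentionLayer(d_model=4)
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
outputs = layer(tokens, ["reasoner", "reasoner", "generator"])
print(len(outputs), len(outputs[0]))  # 3 4
```

In this sketch, separate weights let each tower specialize (interpretation versus generation), while the shared attention step is what lets one tower's tokens condition on the other's within a single forward pass.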

The complete open-source release includes several key components:

  • Cosmos 3 Nano and Super model checkpoints available on Hugging Face.
  • Six new synthetic datasets for robotics, autonomous driving, and warehouse operations.
  • Open post-training scripts on GitHub for adapting the model to custom domains.
  • NVIDIA NIM microservices for optimized production deployment.

Shaping the Market with Tools and Benchmarks

By offering a full toolchain, from model checkpoints to production deployment via NIM microservices, NVIDIA is positioning Cosmos 3 as a foundational layer for the growing physical AI market. The open training recipes let organizations adapt the model to specific uses, such as policy learning in robotics or generating rare edge-case scenarios for autonomous driving. NVIDIA is also introducing new evaluation frameworks, such as the Cosmos Human Evaluation (HUE) benchmark, which signals an effort not only to provide the tools but also to influence the standards by which advanced generative AI systems are measured.

Taken together, the fully open Cosmos 3 stack is more than a tool release: it is a strategic effort to cultivate a physical AI developer ecosystem that is deeply integrated with NVIDIA GPU hardware, from workstations to data centers.