CCCL Runtime: A Modern C++ Runtime for CUDA
By Jakub Antkiewicz
2026-06-23T11:16:45Z
NVIDIA Introduces CCCL Runtime for Modern CUDA C++ Development
NVIDIA has unveiled the CUDA Core Compute Libraries (CCCL) Runtime, a new collection of C++ APIs designed to modernize the development experience on its CUDA platform. The new runtime offers a set of idiomatic C++ abstractions for core functionality such as stream management, memory allocation, and kernel launches. The initiative responds to the growing complexity of GPU-accelerated applications, in which multiple libraries often compete for shared resources and therefore need safer, more explicit programming models to prevent common runtime errors.
The CCCL Runtime is engineered to be an alternative to the traditional C-style CUDA runtime, shifting from implicit state management to explicit dependencies. For instance, instead of associating a stream with a 'current' device, the new API requires developers to specify the device upon stream creation, making code easier to reason about locally. This design philosophy is evident across the new API surface, which emphasizes compile-time safety and clear resource handling. Key technical enhancements include:
- Strong Typing: Dedicated types like cuda::device_ref replace raw integer IDs to help catch errors during compilation.
- Explicit Dependencies: Objects like streams are explicitly constructed with their dependencies (e.g., the device they run on), removing ambiguity from global states.
- Asynchronous by Default: Stream-ordered operations are the standard convention, promoting better performance through asynchronous execution and memory management via pools.
- Clear Resource Ownership: The API distinguishes between owning types (e.g., cuda::stream) and non-owning reference types (cuda::stream_ref), simplifying lifetime management and improving interoperability with existing code.
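The design ideas in the list above can be illustrated without a GPU. The following is a minimal sketch in plain C++, assuming hypothetical stand-in types in a `sketch` namespace. They borrow the names device_ref, stream, and stream_ref from the article but are not the CCCL Runtime API, and they make no CUDA calls. The sketch shows three things only: a strongly typed device handle that a raw integer cannot silently replace, a stream that takes its device as a constructor argument, and an owning type paired with a non-owning reference type.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch { // hypothetical stand-ins, NOT the real cuda:: types

// Number of live mock stream handles, so ownership is observable.
inline int& live_streams() {
    static int count = 0;
    return count;
}

// Strong typing: the explicit constructor stops a bare int from being
// passed where a device is expected.
class device_ref {
public:
    explicit constexpr device_ref(int ordinal) : ordinal_(ordinal) {}
    constexpr int ordinal() const { return ordinal_; }
private:
    int ordinal_;
};

static_assert(!std::is_convertible<int, device_ref>::value,
              "a raw integer must not silently become a device");

// Owning type: the device is an explicit constructor argument rather than
// implicit "current device" state, and the destructor releases the handle.
class stream {
public:
    explicit stream(device_ref dev) : dev_(dev), alive_(true) { ++live_streams(); }
    stream(stream&& other) noexcept : dev_(other.dev_), alive_(other.alive_) {
        other.alive_ = false; // the moved-from object no longer owns the handle
    }
    stream(const stream&) = delete;
    stream& operator=(const stream&) = delete;
    stream& operator=(stream&&) = delete;
    ~stream() { if (alive_) --live_streams(); }
    device_ref device() const { return dev_; }
private:
    device_ref dev_;
    bool alive_;
};

// Non-owning type: cheap to copy, never releases the underlying handle.
class stream_ref {
public:
    stream_ref(const stream& s) : dev_(s.device()) {} // implicit on purpose
    device_ref device() const { return dev_; }
private:
    device_ref dev_;
};

// A library entry point takes the non-owning type, so callers keep ownership.
// Here it only reports which device the work would target.
inline int submit(stream_ref s) { return s.device().ordinal(); }

// Walks through the lifetime rules; returns the live-handle count at the end.
inline int demo() {
    device_ref dev{1};
    {
        stream s{dev};          // device stated at construction
        stream_ref view = s;    // borrow without taking ownership
        stream_ref copy = view; // copying a reference creates no new handle
        assert(live_streams() == 1);
        assert(submit(copy) == 1);
        assert(submit(s) == 1); // an owning stream converts to a reference
    }                           // s is destroyed here; the references were only views
    return live_streams();
}

} // namespace sketch
```

Calling `sketch::demo()` runs the checks above. The point of the split is that a function such as the hypothetical `submit` can accept streams created by any caller without ever becoming responsible for destroying them.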
This release represents a significant effort by NVIDIA to enhance developer productivity within its ecosystem. By aligning the CUDA programming model with modern C++ practices, the CCCL Runtime aims to lower the barrier for writing robust, maintainable, and composable GPU code. For the broader AI market, this facilitates the construction of more complex software stacks, as libraries can interact more predictably without interfering with one another's implicit GPU states. This focus on the developer toolchain is a crucial component of maintaining the CUDA platform's dominance as the foundation for high-performance computing.
With the CCCL Runtime, NVIDIA is investing heavily in the developer experience to lower the friction of building complex GPU software, reinforcing the CUDA ecosystem's long-term stability and making it more difficult for competitors to lure away developers with promises of simpler programming models.