What specific performance optimizations did NVIDIA use to minimize the overhead of Confidential Computing?

To mitigate performance impacts from encryption and secure work submission, NVIDIA and its partners implemented several software optimizations. These include a CC-safe autotuner in FlashInfer that uses the GPU global timer for accurate kernel selection, an asynchronous device-to-host copy worker in SGLang to maintain compute/copy overlap, and piecewise CUDA graph support in SGLang to reduce kernel launch overhead, which is typically amplified in secure environments.
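The asynchronous device-to-host copy worker can be illustrated with a small Python sketch. This is not SGLang's implementation: the `AsyncCopyWorker` class and its methods are hypothetical names, a background thread stands in for the copy engine, and plain lists stand in for device and host buffers. It only shows the general pattern of handing a copy to a worker so the main loop can keep computing.

```python
import queue
import threading
from typing import List, Optional, Tuple

# (source buffer, destination buffer, completion event)
Job = Tuple[List[int], List[int], threading.Event]


class AsyncCopyWorker:
    """Hypothetical worker that performs copies off the main thread."""

    def __init__(self) -> None:
        self._jobs: "queue.Queue[Optional[Job]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, src: List[int], dst: List[int]) -> threading.Event:
        """Queue a copy and return at once; the event is set when it finishes."""
        done = threading.Event()
        self._jobs.put((src, dst, done))
        return done

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel: shut the worker down
                return
            src, dst, done = job
            dst[:] = src  # stand-in for the device-to-host transfer
            done.set()

    def close(self) -> None:
        """Drain remaining jobs, then stop the worker thread."""
        self._jobs.put(None)
        self._thread.join()


if __name__ == "__main__":
    worker = AsyncCopyWorker()
    host_buf = [0, 0, 0]
    done = worker.submit([1, 2, 3], host_buf)   # copy runs in the background
    _ = sum(x * x for x in range(1000))         # "compute" continues meanwhile
    done.wait()                                 # synchronize only when the data is needed
    worker.close()
```

The point of the design is that the caller blocks only at `done.wait()`, so any work issued between `submit` and `wait` overlaps with the copy.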

Hardware-Rooted AI Security That Won’t Slow You Down

NVIDIA Blackwell Delivers Secure AI Inference with Minimal Performance Impact

NVIDIA has released benchmark results for its Confidential Computing (CC) feature on new Blackwell GPUs, demonstrating that its hardware-rooted security incurs a performance overhead of less than 8% for large model inference. This data directly addresses a significant barrier to enterprise AI adoption, particularly in regulated industries, by showing that protecting sensitive data and proprietary models during active use does not require a substantial compromise on performance.
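For readers who want to check an overhead figure against their own measurements, the arithmetic can be sketched as follows. The function name is hypothetical, the definition of overhead as relative throughput loss is an assumption (the source does not say whether the figure is based on throughput or latency), and the numbers in the example are made-up placeholders, not NVIDIA's benchmark data.

```python
def cc_overhead_percent(baseline_tokens_per_s: float, cc_tokens_per_s: float) -> float:
    """Relative throughput loss with Confidential Computing enabled, in percent.

    Assumes overhead is defined as (baseline - cc) / baseline; this
    definition is an illustration, not taken from the benchmark report.
    """
    if baseline_tokens_per_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return 100.0 * (baseline_tokens_per_s - cc_tokens_per_s) / baseline_tokens_per_s


if __name__ == "__main__":
    # Placeholder values chosen only to show the calculation.
    overhead = cc_overhead_percent(baseline_tokens_per_s=1000.0, cc_tokens_per_s=930.0)
    print(f"{overhead:.1f}%")  # prints "7.0%"
```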

The Confidential Computing solution provides a security layer extending from the silicon to the system software. At its core, the technology relies on a hardware root of trust, where a private signing key is fused into the GPU during manufacturing. Before a workload runs, the NVIDIA Remote Attestation Service (NRAS) verifies the integrity of the compute environment. Performance tests on an HGX B300 system running the Qwen 3.5 397B model confirmed the low overhead across various batch sizes and sequence lengths. Key technical elements include:

Hardware Root of Trust: Private signing key fused into Blackwell GPUs at manufacturing.
Attestation: The NRAS remotely verifies the GPU and CPU Trusted Execution Environment (TEE) before secrets are deployed.
Performance Optimizations: Software improvements in frameworks like FlashInfer and SGLang mitigate latency from secure work submission and encrypted memory transfers.
Multi-GPU Security: NVLink encryption is supported for secure multi-GPU configurations of up to eight GPUs.
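
The attestation item above amounts to a gate: secrets are released only after a verifier confirms both the GPU and the CPU TEE. A minimal Python sketch of that gate follows. The `AttestationResult` fields and the `release_secret` function are hypothetical and do not reflect the NRAS API or its response format. The nonce check is an added assumption to illustrate freshness, and the signature verification that a real deployment would perform on the verifier's response is omitted.

```python
import hmac
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class AttestationResult:
    """Hypothetical verdict from a remote verifier (not the NRAS schema)."""
    nonce: str
    gpu_verified: bool
    cpu_tee_verified: bool


def release_secret(result: AttestationResult, expected_nonce: str, secret: bytes) -> bytes:
    """Return the secret only if the verdict is fresh and fully positive."""
    # Constant-time comparison; rejects a verdict issued for a different request.
    if not hmac.compare_digest(result.nonce, expected_nonce):
        raise PermissionError("attestation result does not match this request")
    # Both the GPU and the CPU TEE must pass before any secret leaves the key holder.
    if not (result.gpu_verified and result.cpu_tee_verified):
        raise PermissionError("compute environment failed attestation")
    return secret


if __name__ == "__main__":
    nonce = secrets.token_hex(16)
    verdict = AttestationResult(nonce=nonce, gpu_verified=True, cpu_tee_verified=True)
    key = release_secret(verdict, nonce, b"placeholder-model-key")
```

Failing closed with an exception, rather than returning a flag, keeps a caller from accidentally using the secret when either check fails.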

By quantifying how small the performance trade-off is, NVIDIA is positioning its Blackwell architecture as a practical platform for production AI in sectors like finance, healthcare, and government. These industries often face strict data privacy and sovereignty mandates, such as GDPR and HIPAA. The ability to secure model weights and user data while in use could accelerate the deployment of generative AI for sensitive applications, giving organizations the confidence to process confidential information without exposing it to the host system or software stack.

Strategic Takeaway: NVIDIA's benchmarks for Confidential Computing on Blackwell directly challenge the assumption that robust, hardware-rooted security must come at a steep performance cost. By demonstrating an overhead below 8% for large model inference, the company removes a key objection to enterprise AI adoption in regulated industries and positions its platform as a practical option for production-grade, secure inference.
