Is NVIDIA Fleet Intelligence a paid service and how is it deployed?

No. NVIDIA Fleet Intelligence is offered at no cost to NVIDIA data center GPU owners, operators, and cloud tenants. It is an agent-based managed service: a low-footprint, open-source agent installed on each GPU worker node streams telemetry to the fully managed Fleet Intelligence cloud service.

Introducing NVIDIA Fleet Intelligence for Real-Time GPU Fleet Visibility and Optimization

NVIDIA Releases Fleet Intelligence for GPU Fleet Management

NVIDIA has announced the general availability of NVIDIA Fleet Intelligence, a managed service designed to provide real-time visibility and optimization for large-scale data center GPU fleets. The service, offered at no cost to NVIDIA data center GPU customers, aims to address the significant operational challenges of managing complex AI infrastructure, including hardware heterogeneity, power constraints, and hard-to-identify performance bottlenecks that can lead to wasted resources and missed service-level agreements (SLAs).

The service operates via a low-footprint, open-source agent installed on worker nodes, which streams telemetry data to a managed cloud service hosted on NVIDIA NGC. It also incorporates cryptographic verification of GPU integrity, using the NVIDIA Attestation SDK to confirm that firmware and configurations have not been tampered with. The service currently supports the Vera Rubin, Blackwell, and Hopper GPU architectures. To ensure fleet health and efficiency, Fleet Intelligence monitors five critical areas of GPU operations:

Power: Tracks utilization and throttling to manage data center power budgets.
Temperature: Detects hotspots and potential airflow issues to prevent thermal throttling.
Performance: Monitors utilization, memory bandwidth, and interconnect health.
Health: Surfaces ECC errors, retired pages, and other signals to preempt hardware failures.
Uniform Configuration: Verifies driver, firmware, and BIOS consistency across the fleet.
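
As an illustration of the kind of check the Uniform Configuration area implies, the sketch below flags nodes whose driver, firmware, or BIOS version differs from the fleet majority. It is a minimal sketch under stated assumptions, not Fleet Intelligence's actual logic or API: the function name find_config_drift, the report layout, the node names, and every version string are hypothetical and chosen only for the example.

```python
from collections import Counter

# Hypothetical per-node reports. Field names and version strings are
# illustrative placeholders, not real Fleet Intelligence output.
reports = {
    "node-01": {"driver": "570.10", "firmware": "1.2.0", "bios": "A12"},
    "node-02": {"driver": "570.10", "firmware": "1.2.0", "bios": "A12"},
    "node-03": {"driver": "565.57", "firmware": "1.2.0", "bios": "A12"},
    "node-04": {"driver": "570.10", "firmware": "1.1.9", "bios": "A11"},
}


def find_config_drift(reports):
    """Return (baseline, drift).

    baseline maps each field to the most common value across the fleet.
    drift maps each deviating node to {field: (found, expected)}.
    """
    # Collect every field name that appears in any report.
    fields = sorted({field for r in reports.values() for field in r})

    # Treat the most common value of each field as the expected baseline.
    baseline = {}
    for field in fields:
        counts = Counter(r.get(field) for r in reports.values())
        baseline[field] = counts.most_common(1)[0][0]

    # Record every field on every node that differs from the baseline.
    drift = {}
    for node, r in reports.items():
        diffs = {
            field: (r.get(field), baseline[field])
            for field in fields
            if r.get(field) != baseline[field]
        }
        if diffs:
            drift[node] = diffs
    return baseline, drift


baseline, drift = find_config_drift(reports)
for node, diffs in sorted(drift.items()):
    for field, (found, expected) in sorted(diffs.items()):
        print(f"{node}: {field} is {found}, fleet majority is {expected}")
```

Using the fleet majority as the baseline is only one possible design choice. An operator could equally pin an explicit approved version per field and compare every node against that instead.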

By providing this tooling as a standard, no-cost service, NVIDIA is addressing a growing pain point for its largest customers and cloud partners such as Lambda and IREN. This move not only helps customers maximize the return on their substantial hardware investments but also provides NVIDIA with anonymized operational data that can be used to develop future predictive failure models. The initiative signals a strategic shift from simply supplying hardware to providing the foundational operational software required to run AI factories effectively, further cementing NVIDIA's role within the enterprise AI ecosystem.

With Fleet Intelligence, NVIDIA is moving beyond selling powerful chips to providing the essential operational software needed to manage them at scale, effectively lowering the total cost of ownership and solidifying its dominant position in the AI infrastructure stack.
