Building Blocks for Foundation Model Training and Inference on AWS
By Jakub Antkiewicz
2026-05-12
AWS Prepares for Next-Generation AI with NVIDIA Blackwell and UltraServers
Amazon Web Services (AWS) has detailed its infrastructure roadmap for foundation model development, centered on the new Amazon EC2 P6 instance family featuring NVIDIA's Blackwell architecture. The announcement underscores a systems-level approach to scaling that moves beyond raw compute, focusing instead on tightly coupled hardware for the entire AI lifecycle, from pre-training to inference. This shift addresses the bottlenecks in memory bandwidth and interconnect speed that increasingly dictate performance for frontier models.
Technical Specifications of the P6 Platform
The new P6 instances and related offerings introduce significant upgrades in compute, memory, and networking designed to minimize data movement overhead. More critically, AWS is introducing EC2 UltraServers, which extend the high-bandwidth NVLink fabric beyond a single machine. The P6e-GB200 UltraServer, built on the NVIDIA GB200 NVL72 platform, connects up to 72 Blackwell GPUs in one NVLink domain. This architecture is engineered to reduce inter-node communication latency, a primary constraint for complex models such as Mixture-of-Experts (MoE).
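A rough back-of-envelope calculation shows why keeping MoE all-to-all traffic inside one NVLink domain matters. The sketch below compares an idealized intra-domain transfer against a cross-node path over the EFA network; the payload size and the per-GPU EFA bandwidth share are illustrative assumptions, not measured figures.

```python
# Back-of-envelope: time to move one MoE all-to-all payload per layer
# over an intra-domain NVLink path vs. a cross-node EFA path.
# All figures below are illustrative assumptions, not benchmarks.

def transfer_time_ms(payload_gb: float, bandwidth_gb_s: float) -> float:
    """Idealized transfer time in milliseconds (ignores latency and protocol overhead)."""
    return payload_gb / bandwidth_gb_s * 1000

# Assumed per-GPU all-to-all payload for one MoE layer (tokens routed off-GPU).
payload_gb = 0.5

nvlink_bw_gb_s = 900  # ~1.8 TB/s bidirectional per GPU -> ~900 GB/s one direction (assumption)
efa_bw_gb_s = 50      # assumed per-GPU share of cross-node network bandwidth

t_nvlink = transfer_time_ms(payload_gb, nvlink_bw_gb_s)
t_efa = transfer_time_ms(payload_gb, efa_bw_gb_s)

print(f"NVLink path: {t_nvlink:.2f} ms, EFA path: {t_efa:.2f} ms, ratio: {t_efa / t_nvlink:.0f}x")
```

Even under these idealized assumptions the cross-node path is roughly an order of magnitude slower per transfer, which is why widening the NVLink domain to 72 GPUs changes the economics of communication-heavy models.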
- P6 Instances: Feature NVIDIA B200 and B300 GPUs, with the p6-b300.48xlarge offering up to 2,100 GB of HBM3e memory per 8-GPU instance.
- Enhanced Interconnect: Incorporate 5th-generation NVLink with 14.4 TB/s of aggregate bandwidth and Elastic Fabric Adapter version 4 (EFAv4) for faster cross-node communication.
- GB200 UltraServers: Expose up to 72 Blackwell GPUs and 13.4 TB of aggregate HBM3e within a single, unified NVLink domain, reducing reliance on the EFA network for performance-critical collective operations.
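The headline memory figures above can be sanity-checked with simple arithmetic. The sketch below uses the aggregate HBM3e capacities quoted in the specs and a few hypothetical model sizes (the parameter counts are illustrative assumptions) to estimate whether a model's weights alone fit in memory:

```python
# Rough check: does a model's weight footprint fit in aggregate HBM3e?
# Capacities come from the instance specs above; model sizes are
# illustrative assumptions.

def weights_gb(num_params_b: float, bytes_per_param: int) -> float:
    """Weight memory in GB for num_params_b billion parameters."""
    return num_params_b * 1e9 * bytes_per_param / 1e9

HBM_CAPACITY_GB = {
    "p6-b300.48xlarge (8 GPUs)": 2100,
    "P6e-GB200 UltraServer (72 GPUs)": 13400,  # 13.4 TB aggregate
}

for name, capacity in HBM_CAPACITY_GB.items():
    for params_b in (70, 405, 1800):  # hypothetical model sizes, in billions
        need = weights_gb(params_b, bytes_per_param=2)  # BF16/FP16 weights
        verdict = "fits" if need <= capacity else "does not fit"
        print(f"{params_b}B params ({need:.0f} GB) {verdict} in {name} ({capacity} GB)")
```

Note that this counts weights only; activations, optimizer state, and KV cache add substantially more, which is one reason aggregate capacity across a unified domain matters as much as per-GPU capacity.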
Addressing System-Level Bottlenecks for the AI Lifecycle
This infrastructure refresh signals a mature understanding of AI workloads, where performance scaling now depends heavily on post-training and inference efficiency, not just pre-training FLOPS. By providing building blocks like UltraClusters and UltraServers, AWS is directly addressing the physical limits of data movement that govern model performance. The integrated approach combines accelerated compute with high-throughput storage such as Amazon FSx for Lustre and low-latency networking. For developers building on open-source stacks like PyTorch and Kubernetes, this provides a robust platform and solidifies AWS's position as a key enabler for organizations pushing the boundaries of model scale.
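For teams running PyTorch distributed jobs on such instances, steering NCCL traffic over EFA is typically a matter of environment configuration. The sketch below collects commonly used libfabric/NCCL knobs; the variable names are real, but the values shown are illustrative assumptions, so consult the AWS EFA and NCCL documentation before relying on them.

```python
# Sketch: environment settings commonly used to steer NCCL over EFA when
# launching distributed training on EC2 GPU instances. Values here are
# illustrative assumptions, not a validated configuration.
import os

efa_env = {
    "FI_PROVIDER": "efa",                # select the EFA libfabric provider
    "FI_EFA_USE_DEVICE_RDMA": "1",       # enable GPUDirect RDMA where supported
    "NCCL_DEBUG": "INFO",                # log NCCL transport selection at startup
    "NCCL_SOCKET_IFNAME": "^lo,docker",  # skip loopback/docker interfaces
}

def apply_env(env: dict) -> None:
    """Export the settings so child training processes inherit them,
    without overriding values already set in the environment."""
    for key, value in env.items():
        os.environ.setdefault(key, value)

apply_env(efa_env)
print(sorted(k for k in efa_env if k in os.environ))
```

Using `setdefault` keeps the sketch composable with launchers (such as `torchrun` or Kubernetes pod specs) that may already inject their own values for these variables.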
The AWS strategy is less about headline FLOPS and more about holistic system design. By engineering solutions like UltraServers that expand the high-bandwidth NVLink domain and pairing them with EFAv4 networking, AWS is tackling the primary scaling inhibitors in modern AI: data movement and communication latency. This focus on system-level efficiency is critical for supporting the next generation of large, communication-intensive models.