Introducing Storage Buckets on the Hugging Face Hub
By Jakub Antkiewicz
2026-03-11T08:41:11Z
Hugging Face has launched Storage Buckets, a new object storage service integrated into its Hub platform. The feature addresses a long-standing challenge in machine learning operations: managing the large volume of intermediate files like model checkpoints, processed data shards, and logs. These artifacts, which are frequently modified and generated by distributed jobs, are often ill-suited for the version-controlled Git repositories traditionally used for final models and datasets. Storage Buckets provide a mutable, S3-like environment for this 'in-motion' data, aiming to streamline the development workflow within a single platform.
Underpinning the new service is Xet, Hugging Face's proprietary chunk-based storage backend. Unlike traditional file storage that treats files as monolithic blobs, Xet breaks content into smaller chunks and deduplicates them across all files in a bucket. This architecture is particularly efficient for ML workloads where successive files, such as model checkpoints, share significant amounts of unchanged data. The company states this reduces bandwidth usage, accelerates transfer speeds, and, for Enterprise customers, lowers costs by billing based on the deduplicated storage footprint. The service also includes a 'pre-warming' capability, allowing users to stage data closer to their compute resources on partners like AWS and GCP to improve data access throughput for large-scale training.
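The deduplication idea described above can be illustrated with a small toy model. The sketch below is not Hugging Face's implementation — Xet uses content-defined chunk boundaries rather than the fixed-size chunks and in-memory store assumed here — but it shows why two successive checkpoints that share most of their bytes cost far less than their combined logical size:

```python
# Toy content-addressed chunk store illustrating chunk-based deduplication.
# Assumptions (not from the article): fixed-size 64 KiB chunks, SHA-256
# chunk addressing, and an in-memory dict as the backing store.
import hashlib

CHUNK_SIZE = 64 * 1024  # toy boundary; Xet picks boundaries from content


class ChunkStore:
    """Identical chunks are stored once, no matter how many files use them."""

    def __init__(self):
        self.chunks = {}  # chunk hash -> chunk bytes (each unique chunk once)
        self.files = {}   # file name -> ordered list of chunk hashes

    def put(self, name, data):
        hashes = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            h = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(h, chunk)  # dedup: skip already-seen chunks
            hashes.append(h)
        self.files[name] = hashes

    def get(self, name):
        # Reassemble a file from its chunk references.
        return b"".join(self.chunks[h] for h in self.files[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


# Two successive "checkpoints" that differ only in their final chunk:
step_1 = bytes(1024 * 1024)                           # 1 MiB of zeros
step_2 = step_1[:-CHUNK_SIZE] + b"\x01" * CHUNK_SIZE  # last 64 KiB changed

store = ChunkStore()
store.put("ckpt-1.bin", step_1)
store.put("ckpt-2.bin", step_2)

assert store.get("ckpt-2.bin") == step_2
# Logical size is 2 MiB (2,097,152 bytes); the store holds only the
# unique chunks: the all-zeros chunk plus the one changed chunk.
print(store.stored_bytes())  # → 131072
```

Under a billing model based on the deduplicated footprint, as described for Enterprise customers, the second checkpoint here would add only the cost of its one changed chunk.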
The introduction of Storage Buckets signals a strategic expansion for Hugging Face, moving its platform beyond a publishing destination to become a more comprehensive hub for the entire ML lifecycle. By offering a native solution for the messy, high-throughput storage needs of active development, the company is positioning itself to capture workflows that currently rely on separate cloud storage services. This creates a more cohesive path for developers, from raw data and iterative experimentation in Buckets to a polished, versioned artifact in a model or dataset repo. The feature was privately tested with launch partners including Jasper, Arcee, IBM, and PixAI, indicating enterprise-level interest in a more integrated MLOps toolchain.
The takeaway: by supplying a native storage layer for the intermediate stages of development, Hugging Face is moving to centralize the entire MLOps workflow on its platform, competing directly with generic cloud object storage for AI workloads.