
Granite 4.1 LLMs: How They’re Built

By Jakub Antkiewicz

2026-04-30T10:07:05Z

IBM Releases Open-Source Granite 4.1 LLMs with 15T Token Training Regimen

IBM has released its Granite 4.1 family of dense, decoder-only large language models, making the 3B, 8B, and 30B parameter models available under the permissive Apache 2.0 license. The release is significant not just for its open nature, but for the detailed methodology IBM disclosed, which includes a five-phase pre-training pipeline using approximately 15 trillion tokens. Notably, the 8B instruct model reportedly achieves performance on par with or exceeding the company's previous 32B Mixture-of-Experts (MoE) model, signaling a strong focus on training efficiency and data quality over raw parameter scale.

A Five-Phase Approach to Pre-Training and Long Context

The foundation of Granite 4.1 is a meticulously structured five-phase pre-training process that progressively refines the data mixture. The initial phases focus on broad language understanding using web-scale data, while later phases pivot to higher-quality, domain-specific sources, including curated math, code, and synthetic instruction data. The final phase extends the context window up to 512K tokens for the 8B and 30B models. The models all share a common architectural foundation, differing only in scale.
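A phased curriculum like this is typically driven by a mixture schedule that maps cumulative token counts to data-source weights. The sketch below is purely illustrative: the phase names, token budgets, and mixture weights are assumptions for demonstration, not IBM's published recipe; only the five-phase structure and the ~15T total come from the article.

```python
# Hypothetical five-phase data-mixture schedule. All names, budgets, and
# weights are illustrative assumptions; only the phase count and the
# ~15T-token total follow the article.
PHASES = [
    {"name": "web_general",    "tokens_bn": 6000, "mix": {"web": 0.85, "code": 0.10, "math": 0.05}},
    {"name": "quality_web",    "tokens_bn": 4000, "mix": {"web": 0.70, "code": 0.20, "math": 0.10}},
    {"name": "domain_focus",   "tokens_bn": 3000, "mix": {"web": 0.40, "code": 0.35, "math": 0.25}},
    {"name": "synthetic_inst", "tokens_bn": 1500, "mix": {"curated": 0.60, "code": 0.25, "math": 0.15}},
    {"name": "long_context",   "tokens_bn": 500,  "mix": {"long_docs": 0.80, "curated": 0.20}},
]

# Sanity check: budgets sum to 15,000 billion tokens, i.e. ~15T.
assert sum(p["tokens_bn"] for p in PHASES) == 15000


def phase_for_token(tokens_seen_bn: float) -> str:
    """Return the name of the phase a given cumulative token count falls in."""
    cumulative = 0.0
    for phase in PHASES:
        cumulative += phase["tokens_bn"]
        if tokens_seen_bn < cumulative:
            return phase["name"]
    return PHASES[-1]["name"]
```

The key property of such a schedule is that later phases shrink in token count while rising in per-token quality, so the model's final gradient updates come from the most carefully curated data.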

  • Architecture: Decoder-only dense transformer
  • Attention: Grouped Query Attention (GQA)
  • Activations: SwiGLU
  • Embeddings: Rotary Position Embeddings (RoPE) and shared input/output embeddings
  • Normalization: RMSNorm
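The components above can be sketched in a few lines of numpy each. These are minimal reference implementations of the generic techniques (RoPE, GQA score computation, SwiGLU, RMSNorm), not Granite's actual code; head counts and shapes are illustrative.

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotary Position Embeddings: rotate dimension pairs of x (seq, head_dim)
    by position-dependent angles. Position 0 is left unchanged."""
    seq_len, head_dim = x.shape
    half = head_dim // 2
    inv_freq = base ** (-np.arange(half) / half)            # (half,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None]   # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def gqa_scores(q: np.ndarray, k: np.ndarray, n_q_heads: int, n_kv_heads: int) -> np.ndarray:
    """Grouped Query Attention scores: each KV head serves a group of query
    heads, shrinking the KV cache. q: (n_q_heads, seq, d), k: (n_kv_heads, seq, d)."""
    group = n_q_heads // n_kv_heads
    k_rep = np.repeat(k, group, axis=0)  # share each KV head across its query group
    return q @ k_rep.transpose(0, 2, 1) / np.sqrt(q.shape[-1])

def swiglu(x: np.ndarray, W: np.ndarray, V: np.ndarray) -> np.ndarray:
    """SwiGLU feed-forward gate: SiLU(xW) elementwise-multiplied by xV."""
    a = x @ W
    return (a / (1.0 + np.exp(-a))) * (x @ V)

def rmsnorm(x: np.ndarray, g: float = 1.0, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm: rescale by root-mean-square instead of mean/variance (no centering)."""
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps) * g
```

GQA's appeal at 512K-token contexts is direct: the KV cache scales with the number of KV heads, so serving memory drops by the ratio of query heads to KV heads relative to standard multi-head attention.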

Post-Training Refinement Through LLM-as-Judge and Sequential RL

Following pre-training, the models undergo a rigorous supervised fine-tuning (SFT) stage on ~4.1 million samples curated by an LLM-as-Judge framework designed to enforce high standards for correctness and instruction following. IBM then applies a four-stage reinforcement learning (RL) pipeline using on-policy GRPO with DAPO loss. This sequential process systematically enhances capabilities in distinct domains—starting with a multi-domain mixture before targeting chat (RLHF), self-identification, and finally recovering mathematical reasoning, a capability often diminished during chat-focused tuning. This structured approach aims to maximize performance across multiple domains while minimizing the catastrophic forgetting often seen with monolithic RL training.
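GRPO's defining trick is that it needs no learned value network: for each prompt, a group of responses is sampled and each reward is normalized against the group's own mean and standard deviation. A minimal sketch of that advantage computation, assuming scalar rewards per response (the stage names mirror the article's description; the implementation details are generic GRPO, not IBM's):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages as in GRPO: rewards for responses sampled
    from the SAME prompt are normalized by the group mean and std, replacing
    the per-token value estimates a PPO critic would provide."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# The four sequential stages described in the article; each stage runs RL on
# its own reward/domain before handing the policy to the next stage.
RL_STAGES = ["multi_domain", "chat_rlhf", "self_identification", "math_recovery"]
```

Because advantages sum to roughly zero within each group, above-average responses are reinforced exactly as much as below-average ones are suppressed, which keeps updates stable across the four stages without a separate critic to retrain per domain.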

IBM's detailed disclosure of its five-phase pre-training and sequential reinforcement learning pipeline for Granite 4.1 reflects a broader shift in open-source model development, where data curation and methodical, multi-phase skill enhancement are becoming stronger competitive differentiators than raw parameter count.