Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains
By Jakub Antkiewicz
2026-06-02T12:04:19Z
JetBrains Releases Mellum2 for Efficient AI Workloads
JetBrains has released Mellum2, an open-source 12-billion-parameter model designed for high-throughput text and code tasks. The release targets a growing industry need for efficient, low-latency components within larger AI systems. Its Mixture-of-Experts architecture gives Mellum2 the capacity of a 12B model while keeping latency low enough for real-time applications such as routing, retrieval-augmented generation (RAG), and agentic sub-tasks.
The model is released under the permissive Apache 2.0 license. Its key distinction is that it activates only 2.5 billion of its 12 billion parameters (roughly 21%) for any given token, a design choice that enables more than a 2x inference speedup compared to similarly sized dense models. That efficiency makes it a practical option for developers building complex AI workflows.
- Architecture: Mixture-of-Experts (MoE)
- Total Parameters: 12B
- Active Parameters: 2.5B per token
- Modality: Text and Code
- License: Apache 2.0
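To make the "active parameters" figure concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind Mixture-of-Experts layers. It is not Mellum2's actual implementation: the expert count (8), the top-k value (2), and the toy scaling "experts" are all assumptions chosen for illustration, since the article does not state how Mellum2 configures its router. Only the 12B and 2.5B figures come from the article.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(token, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by router weight."""
    chosen = route_top_k(router_logits, k)
    output = sum(w * experts[i](token) for i, w in chosen)
    return output, [i for i, _ in chosen]


# Hypothetical toy setup: 8 "experts", each just scales its input.
NUM_EXPERTS = 8  # assumption, not a Mellum2 figure
TOP_K = 2        # assumption, not a Mellum2 figure
experts = [lambda x, s=i + 1: x * s for i in range(NUM_EXPERTS)]

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
output, used = moe_layer(1.0, experts, logits, TOP_K)
print(f"experts run for this token: {len(used)} of {NUM_EXPERTS}")

# Figures stated in the article.
TOTAL_PARAMS = 12e9
ACTIVE_PARAMS = 2.5e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active fraction per token: {active_fraction:.1%}")  # 20.8%
```

Because the unselected experts are never evaluated, per-token compute tracks the active parameter count rather than the total, which is where the speedup over a dense model of the same size comes from.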
The release of Mellum2 signals a broader architectural trend in production AI systems: a move away from relying on a single, monolithic model. Instead, developers are assembling systems from multiple specialized components. JetBrains positions Mellum2 as a 'focal' model for these stacks, capable of handling high-frequency operations like prompt classification, tool selection, and context preparation. Its efficiency and open license also make it well-suited for private, self-hosted deployments where data security and proprietary code are primary concerns.
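The component-based design described above can be sketched as a small dispatcher: a fast classifier labels each prompt, and the label selects a specialized handler. Everything in this sketch is hypothetical. `classify_prompt` is a keyword stub standing in for a call to a small, fast model (it is not a Mellum2 API), and the handler names and labels are invented for illustration.

```python
from typing import Callable, Dict


def classify_prompt(prompt: str) -> str:
    """Stand-in for a small-model call; keyword rules are used only so the sketch runs offline."""
    text = prompt.lower()
    if any(w in text for w in ("def ", "class ", "traceback", "compile")):
        return "code"
    if any(w in text for w in ("find", "search", "look up")):
        return "retrieval"
    return "general"


# Hypothetical downstream components; in a real stack these would be
# separate models, tools, or services.
def handle_code(prompt: str) -> str:
    return f"[code-assistant] {prompt}"


def handle_retrieval(prompt: str) -> str:
    return f"[retriever] {prompt}"


def handle_general(prompt: str) -> str:
    return f"[general-model] {prompt}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "code": handle_code,
    "retrieval": handle_retrieval,
    "general": handle_general,
}


def dispatch(prompt: str) -> str:
    """Classify the prompt, then forward it to the matching component."""
    label = classify_prompt(prompt)
    return HANDLERS[label](prompt)


if __name__ == "__main__":
    for p in (
        "Why does this traceback mention KeyError?",
        "Search the docs for the retry policy",
        "Summarize the meeting notes",
    ):
        print(dispatch(p))
```

The classification step runs on every request, so its latency and cost are paid at the highest frequency in the system; that is the slot where a small, fast model is most useful.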
Mellum2 reflects the industry's shift toward operational efficiency: smaller, specialized open-source models are becoming core components of cost-effective, performant multi-agent systems, replacing reliance on a single monolithic API.