What makes the MiniMax M2.7 model efficient despite its large 230B parameter count?

MiniMax M2.7 uses a sparse Mixture-of-Experts (MoE) architecture. For each token, the model activates only 8 of its 256 experts, which amounts to about 10B active parameters, or roughly 4% of the total. This significantly reduces computational cost during inference compared to a dense model of similar size, while still retaining the knowledge capacity of the full 230B parameters.
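To make the ratio concrete, the short sketch below works through the arithmetic using only the figures quoted above (230B total, 10B active, 8 of 256 experts). The variable names are illustrative and not taken from any MiniMax or NVIDIA code.

```python
# Figures quoted in the text; names are illustrative only.
TOTAL_PARAMS_B = 230    # total parameters, in billions
ACTIVE_PARAMS_B = 10    # parameters active per token, in billions
NUM_EXPERTS = 256       # experts available
ACTIVE_EXPERTS = 8      # experts activated per token

# Share of the parameters that participates in processing one token.
active_param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Share of the experts that participates in processing one token.
active_expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS

print(f"active parameters: {active_param_fraction:.2%} of total")
print(f"active experts:    {active_expert_fraction:.2%} of total")
```

The active-parameter share comes out somewhat higher than the active-expert share. A plausible reason, not stated in the source, is that non-expert components such as attention layers and embeddings run for every token regardless of routing.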

MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms for Complex AI Applications

NVIDIA and MiniMax AI have announced the open-weights release of MiniMax M2.7, an enhanced model aimed at developers building complex agentic systems. The model is now available across NVIDIA's platform and the broader open-source ecosystem. The release matters because it gives developers a capable yet computationally efficient model for demanding work in software engineering, machine learning research, and automated reasoning, backed by the full software stack of a major hardware provider.

Technically, MiniMax M2.7 is a 230-billion-parameter sparse Mixture-of-Experts (MoE) model that keeps inference costs manageable by routing each token to 8 of its 256 experts, so that only 10 billion parameters are active per token. It supports a 200K-token input context. To maximize performance, NVIDIA integrated specialized kernels into the vLLM and SGLang inference frameworks, including an optimized QK RMS Norm kernel and FP8 MoE support. In internal tests, these integrations yielded throughput improvements of up to 2.7x on NVIDIA Blackwell Ultra GPUs.
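The expert selection described above can be illustrated with a toy top-k router. This is a minimal sketch of generic sparse MoE routing, not MiniMax's implementation: the source does not describe the actual gating function, so the softmax over the selected logits, the scalar "experts", and every function name here are assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 256  # from the text
TOP_K = 8          # from the text


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and normalize their weights.

    Assumption: weights are a softmax over the selected logits only,
    which is one common choice among several.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(token, router_logits, experts, top_k=TOP_K):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(weight * experts[idx](token)
               for idx, weight in route_token(router_logits, top_k))


# Toy setup: random router scores and hypothetical scalar experts.
rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
experts = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]

selected = route_token(router_logits)
output = moe_layer(2.0, router_logits, experts)
print(f"{len(selected)} of {NUM_EXPERTS} experts evaluated for this token")
```

Only the 8 selected experts are ever called, which is where the inference savings come from: the other 248 experts contribute capacity to the model but no compute for this token.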

The model's integration into the NVIDIA ecosystem provides developers with multiple pathways for adoption. For building autonomous agents, the open-source NVIDIA NemoClaw stack offers a secure runtime environment. For production deployment, MiniMax M2.7 is available as an NVIDIA NIM microservice, and for customization, the NVIDIA NeMo Framework provides recipes for fine-tuning and reinforcement learning. This end-to-end support system lowers the barrier for developers to both experiment with and deploy sophisticated AI agents at scale.

NVIDIA's extensive, full-stack support for the MiniMax M2.7 release highlights a critical industry trend: hardware and platform providers are no longer just enabling AI, but actively curating and optimizing the open-source ecosystem to drive adoption of their specific technologies.