Welcome Gemma 4: Frontier multimodal intelligence on device
By Jakub Antkiewicz
2026-04-08T09:00:22Z
Google DeepMind has released its Gemma 4 family of models and made them publicly available on Hugging Face under the permissive Apache 2.0 license. The release is significant because it provides a series of open, multimodal models designed to run effectively on a wide spectrum of hardware, from powerful servers to on-device applications. The family handles text, image, audio, and even video inputs, giving developers a versatile toolset; developers who tried the models before release noted that they performed impressively well without extensive fine-tuning.
The Gemma 4 series comes in four sizes, ranging from a model with 2.3 billion effective parameters to a 31-billion-parameter dense model, and includes a 26-billion-parameter mixture-of-experts (MoE) variant. Key architectural features target efficiency and long-context performance: layers alternate between sliding-window and global attention, and a Shared KV Cache lets the final layers reuse key-value states computed by earlier layers rather than computing their own, reducing memory and compute load. The smaller models also incorporate Per-Layer Embeddings (PLE), a technique that feeds each layer a parallel, low-dimensional conditioning signal, enabling more specialized processing at a modest parameter cost.
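To make the attention and cache-sharing ideas concrete, here is a minimal pure-Python sketch under stated assumptions. It builds causal masks for global and sliding-window attention, lays out a hypothetical stack of layers, and counts how many key/value positions must be cached. The layer count, the "one global layer every N" pattern, the window size, and the rule that each shared layer borrows the cache of the last non-shared layer with the same attention type are illustrative assumptions, not Gemma 4's published configuration; the function names `causal_mask`, `layer_schedule`, and `kv_cache_entries` are hypothetical.

```python
from typing import List, Optional, Tuple


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    Global attention (window=None): every key at or before q.
    Sliding-window attention: only the `window` most recent keys, q - window < k <= q.
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def layer_schedule(
    num_layers: int, global_every: int, num_shared_final: int
) -> Tuple[List[str], List[int]]:
    """Hypothetical layer layout (assumption, not Gemma 4's actual config).

    Every `global_every`-th layer uses global attention; the others use a sliding
    window. The last `num_shared_final` layers compute no K/V of their own: each
    reads the cache of the last non-shared layer with the same attention type.
    Returns (attention type per layer, index of the layer whose K/V cache it reads).
    """
    types = ["global" if (i + 1) % global_every == 0 else "local" for i in range(num_layers)]
    first_shared = num_layers - num_shared_final
    kv_source: List[int] = []
    for i, kind in enumerate(types):
        if i < first_shared:
            kv_source.append(i)  # this layer computes and stores its own K/V
            continue
        candidates = [j for j in range(first_shared) if types[j] == kind]
        if not candidates:
            raise ValueError(f"no earlier {kind} layer to share a KV cache with")
        kv_source.append(candidates[-1])  # reuse the most recent matching cache
    return types, kv_source


def kv_cache_entries(
    types: List[str], kv_source: List[int], seq_len: int, window: int
) -> int:
    """Cached K/V positions per sequence, counting each distinct cache once.

    Sliding-window caches hold at most `window` positions; global caches hold all
    `seq_len`. Layers that borrow another layer's cache add nothing. Counts
    positions only, ignoring heads, head dimension, and dtype.
    """
    total = 0
    for layer in set(kv_source):
        total += min(window, seq_len) if types[layer] == "local" else seq_len
    return total


if __name__ == "__main__":
    # Toy numbers chosen for illustration only.
    print(causal_mask(4, window=2)[3])  # [False, False, True, True]
    types, src = layer_schedule(num_layers=12, global_every=6, num_shared_final=4)
    print(src)  # [0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 5]
    base_types, base_src = layer_schedule(12, 6, 0)
    print(kv_cache_entries(base_types, base_src, 1024, 128))  # 3328
    print(kv_cache_entries(types, src, 1024, 128))  # 1920
```

With these toy values, letting the last four layers borrow existing caches drops the cached positions for a 1,024-token sequence from 3,328 to 1,920, while the sliding-window layers already keep their caches bounded regardless of context length. Real savings depend on the model's actual layer count, window size, and sharing configuration.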
This release has broad implications for the AI ecosystem because it offers a high-quality, open alternative for building complex multimodal applications. Immediate support across a wide range of inference engines and libraries, including transformers, llama.cpp, MLX, and WebGPU, lowers the barrier to adoption. By focusing on models that can run efficiently on local hardware, Google is directly addressing market demand for more private, responsive, and cost-effective AI solutions, potentially accelerating the development of sophisticated agentic systems that operate at the edge.
Google's strategy with Gemma 4 is to arm the open-source community with tools that rival closed models in multimodal capability while explicitly engineering for efficiency on consumer hardware. This dual focus on frontier performance and on-device accessibility is a direct effort to capture developer mindshare in the rapidly growing edge AI market.