What are the hardware requirements for running the Reachy Mini's local speech pipeline?

The requirements are flexible due to the modular architecture. Users on Apple Silicon can run the entire stack, including the LLM, using the MLX backend. For those with NVIDIA GPUs, the Transformers backend is a suitable option. For maximum performance or to run on less powerful machines, the LLM can be offloaded to a separate, more powerful server using llama.cpp or vLLM, which then communicates with the robot over the local network.

Reachy Mini goes fully local

Reachy Mini Gains Local-Only Voice Stack

Pollen Robotics has released an update for its Reachy Mini robot that allows its entire conversational AI pipeline to run completely offline. The new local stack eliminates the need to send audio data to external servers, providing users with full data privacy, zero API costs, and complete control over the AI models. This is powered by the company's open-source speech-to-speech library, which implements a cascaded pipeline and exposes a WebSocket API for the robot to connect to.

The technical foundation is a four-stage, modular voice pipeline that users can customize. While developers can swap components, the recommended default stack offers a strong balance of performance and quality on consumer hardware. The architecture also decouples the LLM from the primary voice loop, allowing the most computationally intensive part to run as a separate process via backends like llama.cpp, vLLM, or on-device with MLX for Apple Silicon and Transformers for CUDA and CPU.

The Recommended Local Stack

VAD (Voice Activity Detection): Silero VAD, a tiny and accurate model that runs on CPU.
STT (Speech-to-Text): Parakeet-TDT 0.6B v3, chosen for its speed and streaming-friendly nature.
LLM (Large Language Model): Gemma 4, served via llama.cpp with optimizations like flash attention.
TTS (Text-to-Speech): Qwen3-TTS, an expressive, low-latency, and multilingual model.

This move reflects a significant trend toward edge AI and on-device processing within the robotics and agent development communities. By providing a clear, open-source path to a fully local voice agent, Pollen Robotics is lowering the barrier for developers and hobbyists to experiment with sophisticated AI interactions without being locked into the ecosystems of large cloud providers like OpenAI or Hugging Face. The modularity of the speech-to-speech library ensures that as new, more efficient models are released, users can easily upgrade their robot's capabilities.

The maturation of open-source models for every stage of the speech pipeline—from VAD to TTS—marks a critical inflection point. It makes fully private, cost-effective, and highly customizable AI agents practical on consumer hardware, shifting the focus from API consumption to on-device model orchestration.

>> Verify Original Transmission at Hugging Face