Fluid, natural voice translation with Gemini 3.5 Live Translate
By Jakub Antkiewicz
2026-06-10T11:39:25Z
Google Launches Continuous Speech-to-Speech Translation
Google has released Gemini 3.5 Live Translate, an audio model for near real-time speech-to-speech translation. Its main distinction is that it generates translated speech continuously, staying only a few seconds behind the live speaker. Conventional turn-by-turn systems, by contrast, wait for the speaker to pause before translation begins. The continuous approach aims for more fluid and natural multilingual conversations, in part by preserving the original speaker’s intonation, pacing, and pitch.
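The difference between the two approaches can be illustrated with a toy timing model. The sketch below does not call the Gemini Live API and says nothing about how the model works internally. It only compares how long a listener waits to hear each phrase under two simplified policies. The function names, the 3-second lag (standing in for "a few seconds"), the 1-second processing time, and the assumption that translated audio lasts as long as the source phrase are all illustrative choices, not figures from Google.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # second at which the speaker begins the phrase
    end: float    # second at which the speaker finishes the phrase


def turn_based_delays(segments, processing=1.0):
    """Wait for the whole turn to end, then play translations back to back.

    `processing` is an assumed fixed translation time after the pause.
    """
    playback = segments[-1].end + processing
    delays = []
    for seg in segments:
        delays.append(playback - seg.start)
        playback += seg.end - seg.start  # assume same duration as the source
    return delays


def continuous_delays(segments, lag=3.0):
    """Start each translated phrase a fixed lag after the phrase begins.

    A phrase never starts before the previous translated phrase has finished.
    `lag` is an assumed value for "a few seconds behind".
    """
    delays = []
    playback_free_at = 0.0
    for seg in segments:
        playback = max(seg.start + lag, playback_free_at)
        delays.append(playback - seg.start)
        playback_free_at = playback + (seg.end - seg.start)
    return delays


if __name__ == "__main__":
    # One 12-second turn made of three phrases.
    turn = [Segment(0.0, 4.0), Segment(4.0, 9.0), Segment(9.0, 12.0)]
    print("turn-based:", turn_based_delays(turn))
    print("continuous:", continuous_delays(turn))
```

In this model the turn-based delay grows with the length of the speaker's turn, while the continuous delay stays at the chosen lag regardless of how long the speaker talks.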
Technical Details and Phased Rollout
The Gemini 3.5 Live Translate model is being deployed across Google’s product ecosystem, targeting developers, enterprises, and consumers at the same time. For developers, it is available in public preview via the Gemini Live API and Google AI Studio. Enterprise customers can access it in a private preview within Google Meet, where language support will expand from five languages to more than 70. The model is also being integrated into the Google Translate app on Android and iOS. Key partners, including Grab and CJ ENM, have tested the model and cited its accuracy and low latency. The generated audio is watermarked with SynthID so that AI-generated content remains detectable.
- Supported Languages: 70+ with automatic detection
- Core Functionality: Continuous speech generation, preserving speaker tone
- Availability: Gemini Live API and Google AI Studio (public preview), Google Meet (private preview), Google Translate app (Android and iOS)
- Key Integrations: API support through platforms like Agora, Fishjam, LiveKit, and Pipecat
Ecosystem Impact and Competitive Landscape
By releasing the model as an API, Google is positioning Gemini 3.5 Live Translate as a foundational technology for the broader communications industry. This strategy allows third-party developers and platforms to build sophisticated, real-time multilingual capabilities directly into their own applications. The early adoption by real-time media platforms like LiveKit and Agora indicates a clear push to establish this technology as a core utility for any service involving live voice communication, from business meetings to live broadcasts and customer support. This move intensifies competition in the communication-platform-as-a-service (CPaaS) market by offering a powerful, easily integrated AI feature.
Google's release of Gemini 3.5 Live Translate through the Gemini Live API is less about enhancing its own translation apps than about commoditizing real-time audio translation as a core developer utility. The aim is to embed Google's AI infrastructure across the entire communication technology stack.