How does DeepL's new voice translation technology work?

Currently, the system uses a three-stage process: it first converts speech to text, then applies its established translation engine to the text, and finally converts the translated text back into speech. The company aims to develop a direct, end-to-end voice translation model in the future.

DeepL, known for text translation, now wants to translate your voice

DeepL, a company widely recognized for its high-quality text translation services, has officially entered the real-time voice translation market. Today the company released a new suite of voice-to-voice tools for applications ranging from enterprise meetings to mobile conversations. The move extends DeepL beyond its core text-based offerings into the complex and competitive field of real-time speech AI, building on its reputation for translation accuracy.

According to CEO Jarek Kutylowski, the primary technical hurdle was balancing low latency with translation accuracy. The initial product suite takes a multi-pronged approach to market entry and is backed by a new API that lets third-party developers integrate DeepL's voice technology into custom applications, such as call-center software. The company also confirmed that its technology stack currently runs as a pipeline of speech-to-text, translation, and text-to-speech, with plans to develop a more efficient end-to-end voice translation model in the future.
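To make the pipeline concrete, here is a minimal Python sketch of a cascaded design of this kind. It is an illustration under stated assumptions, not DeepL's implementation or API: the class name, the stage signatures, and the toy stand-in functions are all hypothetical, and each stage would be a real speech or translation model in practice.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures. None of these names come from DeepL's API.
SpeechToText = Callable[[bytes], str]
Translate = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class CascadedVoiceTranslator:
    """Chains three independent stages: speech -> text -> translated text -> speech."""

    transcribe: SpeechToText
    translate: Translate
    synthesize: TextToSpeech

    def run(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)        # stage 1: speech-to-text
        translated = self.translate(text)    # stage 2: text translation
        return self.synthesize(translated)   # stage 3: text-to-speech


# Toy stand-ins so the sketch runs without any models (assumptions for illustration).
def fake_transcribe(audio: bytes) -> str:
    # Pretend the audio bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Word-for-word lookup in a tiny English-to-German lexicon.
    lexicon = {"hello": "hallo", "world": "welt"}
    return " ".join(lexicon.get(word, word) for word in text.lower().split())


def fake_synthesize(text: str) -> bytes:
    # Pretend the encoded text is the synthesized audio.
    return text.encode("utf-8")


pipeline = CascadedVoiceTranslator(fake_transcribe, fake_translate, fake_synthesize)
result = pipeline.run(b"Hello world")
print(result)
```

Because the stages run one after another, each adds its own delay and any transcription error carries into the translation. An end-to-end model would replace all three stages with a single audio-to-audio step.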

DeepL's Voice Product Suite

Meeting Add-ons: Early access add-ons for platforms like Zoom and Microsoft Teams provide real-time audio translation or live translated captions.
Mobile & Web: A dedicated tool for in-person or remote conversations on mobile and web platforms.
Group Conversations: A feature for settings like workshops or training, allowing participants to join a translated conversation via a QR code.
Custom Vocabulary: The system can be trained to recognize and adapt to industry-specific terminology and proper names.

DeepL is not entering an empty field. The voice AI space includes well-funded competitors targeting specific niches. For instance, Sanas focuses on real-time accent modification for call centers, while Camb.AI provides AI-powered dubbing for the media industry. More direct competition comes from companies like Palabra, which is developing a translation engine designed to preserve the speaker’s original voice. DeepL's entry intensifies the competition, pitting its deep experience in text translation against specialists focused purely on novel voice synthesis and modification techniques.

By building its voice suite on its existing text translation core, DeepL is using a mature, trusted product to enter an adjacent high-growth market. This pragmatic approach (shipping a speech-to-text-to-speech product now while developing a direct end-to-end model for the future) lets the company compete for market share immediately rather than waiting for a research breakthrough.

Source: TechCrunch AI