Google AI ‘Translatotron’ Can Make Anyone a Real-Time Polyglot

Synced
May 16, 2019 · 3 min read

Google AI yesterday released its latest research result in speech-to-speech translation, the futuristic-sounding “Translatotron.” Billed as the world’s first end-to-end speech-to-speech translation model, Translatotron promises the potential for real-time cross-linguistic conversations with low latency and high accuracy.

People have long dreamed of a voice-based device that would let them simply leap over language barriers. While advances in deep learning have greatly improved the accuracy of speech recognition and machine translation, smooth conversation between speakers of different languages is still hampered by unnatural pauses during machine processing.

Google’s Pixel Buds wireless earbuds, released in 2017, boasted real-time speech translation, but users found the practical experience less than satisfying. Delivering an English-language prompt such as “Help me speak Russian” would connect the earbuds to the Google Translate app on the user’s smartphone. The app would then convert the user’s English speech into English text, translate that into Russian text, and read the content aloud in Russian. The steps in this speech-text-text-speech pipeline, however, caused a few seconds of latency, and Google strove to speed that up.
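The speech-text-text-speech cascade described above can be sketched in a few lines of Python. This is only an illustration of why the stages add up: the `Stage` class, the three stand-in functions, the one-word phrasebook and the latency figures are all hypothetical and do not come from Google's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[str], str]
    latency_s: float  # made-up figure, for illustration only


# Hypothetical one-entry phrasebook (transliterated Russian).
PHRASEBOOK = {"hello": "privet"}


def recognize(audio: str) -> str:
    # Stand-in for speech recognition: "audio" is just a tagged string here.
    return audio[len("audio:"):]


def translate(text: str) -> str:
    # Stand-in for text-to-text machine translation.
    return PHRASEBOOK.get(text, text)


def synthesize(text: str) -> str:
    # Stand-in for text-to-speech.
    return "audio:" + text


def run_cascade(stages: List[Stage], payload: str) -> Tuple[str, float]:
    # Each stage must finish before the next starts, so latencies add up.
    total_latency = 0.0
    for stage in stages:
        payload = stage.run(payload)
        total_latency += stage.latency_s
    return payload, total_latency


cascade = [
    Stage("speech recognition", recognize, 0.5),
    Stage("machine translation", translate, 0.25),
    Stage("speech synthesis", synthesize, 0.75),
]

output, latency = run_cascade(cascade, "audio:hello")
print(output, latency)
```

Because every stage waits for the previous one, the total delay is the sum of the parts, and an error made in an early stage is passed on to the later ones.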

In 2017, Google researchers introduced a deep neural network architecture that could directly translate speech in one language into text in another. Their experiments showed the end-to-end approach outperformed previous cascade models, which combine separate speech recognition and machine translation models, in Spanish-English speech translation tasks. The research laid the foundation for Google Assistant Interpreter Mode, introduced earlier this year, which translates a user’s speech into target-language text on a Google smart display.

Google has now taken another leap forward with Translatotron. The new model comprises an attention-based sequence-to-sequence network, trained on voice spectrograms, which generates spectrograms of the target-language translation; a neural vocoder that converts the output spectrograms to time-domain waveforms; and a pretrained speaker encoder to preserve a user’s vocal characteristics. Voice transcripts are still needed during training, but not for inference.
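To make the data flow between the three components concrete, here is a toy NumPy sketch. It is untrained and shows only shapes and wiring: random matrices stand in for the trained networks, the vocoder is a placeholder that returns silence, and every size, name and the choice to feed the speaker embedding into each decoder step are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

N_MELS = 80    # assumed number of spectrogram channels
HIDDEN = 32    # assumed hidden size
SPK_DIM = 16   # assumed speaker-embedding size
HOP = 256      # assumed waveform samples per spectrogram frame

rng = np.random.default_rng(0)
# Random weights stand in for trained parameters.
w_enc = rng.normal(scale=0.1, size=(N_MELS, HIDDEN))
w_dec = rng.normal(scale=0.1, size=(2 * HIDDEN + SPK_DIM, HIDDEN))
w_out = rng.normal(scale=0.1, size=(HIDDEN, N_MELS))


def encode(src_spec):
    # Source spectrogram (T_in, N_MELS) -> encoder memory (T_in, HIDDEN).
    return np.tanh(src_spec @ w_enc)


def attend(query, memory):
    # Dot-product attention: softmax over source frames, then a weighted sum.
    scores = memory @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ memory


def decode(memory, speaker_emb, n_frames):
    # Emit target spectrogram frames one at a time. A real decoder would
    # decide when to stop; here the frame count is fixed for simplicity.
    state = np.zeros(HIDDEN)
    frames = []
    for _ in range(n_frames):
        context = attend(state, memory)
        state = np.tanh(np.concatenate([context, state, speaker_emb]) @ w_dec)
        frames.append(state @ w_out)
    return np.stack(frames)  # (n_frames, N_MELS)


def toy_vocoder(spec):
    # Placeholder for a neural vocoder: silence with the right sample count.
    return np.zeros(spec.shape[0] * HOP)


src_spec = rng.normal(size=(120, N_MELS))   # fake source-language speech
speaker_emb = rng.normal(size=SPK_DIM)      # fake speaker-encoder output
tgt_spec = decode(encode(src_spec), speaker_emb, n_frames=100)
waveform = toy_vocoder(tgt_spec)
print(tgt_spec.shape, waveform.shape)
```

Note that no text appears anywhere in this path: spectrogram frames go in and spectrogram frames come out, which is what distinguishes the direct approach from a cascade.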

Translatotron demonstrated impressive translation accuracy in Spanish-to-English tasks. However, the model did not beat the baseline ST (speech-to-text) → TTS (text-to-speech) cascade model in experiments, remaining 6 BLEU points below the baseline on the Conversational Spanish-to-English dataset and 9.3 BLEU points shy on the Fisher Spanish-English dataset (target speech synthesized by Parallel WaveNet in a female English speaker’s voice).
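For readers unfamiliar with the metric, BLEU scores a candidate translation by its n-gram overlap with a reference translation, on a scale commonly reported from 0 to 100. The function below is a simplified, single-sentence sketch that uses only unigrams and bigrams and one reference; the name `simple_bleu` and the example sentences are made up, and published results are normally computed over a whole corpus with a standard tool rather than with code like this.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # Count every run of n consecutive tokens.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalised.
    if len(cand) >= len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled to 0-100.
    return 100 * brevity * math.exp(sum(log_precisions) / max_n)


perfect = simple_bleu("the cat sat", "the cat sat")
partial = simple_bleu("the cat sat down", "the cat sat")
print(perfect, round(partial, 2))
```

A gap of several BLEU points therefore means the direct model's output shares noticeably fewer word sequences with the reference translations than the cascade's output does.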

In two additional speech quality tasks, Translatotron using WaveRNN vocoders scored over 4.0 (a “very good range”) in the evaluation of speech naturalness, and managed to preserve speakers’ vocal characteristics in cross-language voice transfer tasks, although not as well as conventional TTS models.

The researchers concluded that further work is required to improve the Translatotron model, but they believe their experiments open up new possibilities for faster and more efficient Google Translate applications.

The paper Direct speech-to-speech translation with a sequence-to-sequence model is on arXiv.

Journalist: Tony Peng | Editor: Michael Sarazen
