Google’s Zero-Shot Cross-Lingual Voice Transfer for Dysarthric Speakers

By Synced | Published in SyncedReview | 3 min read | Sep 30, 2024


In recent years, Voice Transfer (VT) technology has made notable strides, particularly in applications such as Text-to-Speech (TTS), Voice Conversion (VC), and Speech-to-Speech Translation. However, achieving high-quality zero-shot or one-shot voice transfer, especially for unseen speakers, remains a significant challenge.

In a new paper, Zero-shot Cross-lingual Voice Transfer for TTS, a Google research team presents a VT module that integrates into a multilingual TTS system and enables voice transfer across languages.

The team summarizes their main contributions as follows:

  • The team presents a zero-shot VT module that can easily be incorporated into advanced TTS systems. This module enables voice transfer from a previously unseen speaker using just a short reference speech sample, while maintaining high quality and fidelity.
  • The VT module allows voice transfer even when the language of the input speech sample differs from the target language, showcasing its cross-lingual capabilities.
  • Novel bottleneck layers are proposed, which significantly…


AI Technology & Industry Review — syncedreview.com | Newsletter: http://bit.ly/2IYL6Y2 | Share My Research http://bit.ly/2TrUPMI | Twitter: @Synced_Global