Google’s Zero-Shot Cross-Lingual Voice Transfer for Dysarthric Speakers
In recent years, Voice Transfer (VT) technology has made notable strides, particularly in applications such as Text-to-Speech (TTS), Voice Conversion (VC), and Speech-to-Speech Translation. However, achieving high-quality zero-shot or one-shot voice transfer, especially for unseen speakers, remains a significant challenge.
In a new paper, "Zero-shot Cross-lingual Voice Transfer for TTS," a Google research team presents a VT module that integrates into a multilingual TTS system and enables voice transfer across languages.
The team summarizes their main contributions as follows:
- The team presents a zero-shot VT module that can easily be incorporated into advanced TTS systems. This module enables voice transfer from a previously unseen speaker using just a short reference speech sample, while maintaining high quality and fidelity.
- The VT module allows voice transfer even when the language of the input speech sample differs from the target language, showcasing its cross-lingual capabilities.
- Novel bottleneck layers are proposed, which significantly…