Speak a Foreign Language in Your Own Voice? Microsoft’s VALL-E X Enables Zero-Shot Cross-Lingual Speech Synthesis
4 min read · Mar 13
robotic. In recent years, deep neural networks have dramatically transformed TTS, enabling conditioning on factors such as stress and intonation to achieve higher-quality and much more humanlike results. Contemporary TTS models, however, still perform best when dealing with a specific…