IBM Watson Text to Speech: Neural Voices Generally Available
Enable better customer interaction and conversation with clear, crisp, more natural-sounding voice technology.
We are pleased to announce that the IBM Watson Text to Speech (TTS) service has introduced a new set of voices, built on the latest neural techniques, that produce more human-sounding synthesized speech. These new voices are now generally available in both our public cloud and private cloud offerings.
Previously, speech synthesis was based on copying short speech segments from a recorded voice data set and then concatenating them together.
Today, our speech synthesis is built on three deep neural networks (DNNs), which learn various aspects of speech during the training process.
At the time of speech synthesis, the DNNs predict the pitch and phoneme durations (prosody), spectral structure, and waveform of the speech, making the voice output crisper, clearer and much more natural-sounding.
The advantage of this modular approach is that it enables fast and easy training, as well as independent control of each component. Once the base networks are trained, they can then be adapted to a new speaking style or voice for branding and personalization purposes.
— Ron Hoory, Senior Technical Manager for Speech Technologies, IBM Research
Learn more about our approach here.
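To make the modular idea concrete, here is a toy sketch of a three-stage pipeline in Python. It is not IBM's implementation: each stage is a hand-written stand-in for a trained network, and every name and constant in it (predict_prosody, predict_spectrum, generate_waveform, the frame size, the pitch values) is a hypothetical choice made for illustration. What it does show is the hand-off described above: prosody (pitch and duration) first, then a per-frame acoustic description, then a waveform.

```python
import math
from dataclasses import dataclass
from typing import List

SAMPLE_RATE = 8000     # Hz; kept low because this is only a toy
FRAME_SECONDS = 0.01   # one acoustic frame per 10 ms (an assumption)


@dataclass
class Prosody:
    phoneme: str
    pitch_hz: float
    duration_s: float


def predict_prosody(phonemes: List[str]) -> List[Prosody]:
    """Stage 1 stand-in: give each phoneme a pitch and a duration."""
    vowels = set("aeiou")
    result = []
    for i, ph in enumerate(phonemes):
        duration = 0.08 if ph in vowels else 0.04  # hold vowels longer
        pitch = 120.0 - 2.0 * i                     # gently falling pitch
        result.append(Prosody(ph, pitch, duration))
    return result


def predict_spectrum(prosody: List[Prosody]) -> List[float]:
    """Stage 2 stand-in: expand phonemes into per-frame features.

    A real model would predict a spectral envelope per frame; here the
    only feature carried per frame is the pitch.
    """
    frames: List[float] = []
    for p in prosody:
        n_frames = round(p.duration_s / FRAME_SECONDS)
        frames.extend([p.pitch_hz] * n_frames)
    return frames


def generate_waveform(frames: List[float]) -> List[float]:
    """Stage 3 stand-in: render each frame as a sine at the frame pitch."""
    samples_per_frame = round(SAMPLE_RATE * FRAME_SECONDS)
    samples: List[float] = []
    phase = 0.0
    for pitch in frames:
        for _ in range(samples_per_frame):
            phase += 2.0 * math.pi * pitch / SAMPLE_RATE
            samples.append(math.sin(phase))
    return samples


def synthesize(phonemes: List[str]) -> List[float]:
    """Chain the three stages; each one can be swapped independently."""
    return generate_waveform(predict_spectrum(predict_prosody(phonemes)))
```

Because each stage only depends on the output of the one before it, replacing predict_prosody with a different speaking style, for example, would leave the other two functions untouched. That is the kind of independent control the modular design is meant to allow.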
14 New Voices
Our new neural text-to-speech voices are now available for the following languages:
Brazilian Portuguese, English (US), English (UK), French, German, Italian, Japanese*, Spanish (Castilian), Spanish (North American), and Spanish (Latin American). (*Japanese will be available in early Q3.)
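As a rough sketch of how one of these voices might be requested over the service's REST interface, the Python snippet below builds (but does not send) a synthesis request using only the standard library. Several things here are assumptions to check against the documentation: the voice name en-US_AllisonV3Voice, the /v1/synthesize path, and basic authentication with the literal user name "apikey", which applies to public cloud credentials and may differ on private cloud. The helper name build_synthesize_request is hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_synthesize_request(service_url, api_key, text,
                             voice="en-US_AllisonV3Voice",  # assumed voice name
                             audio_format="audio/wav"):
    """Build a POST request asking the service to speak `text`.

    `service_url` and `api_key` come from your own service credentials.
    """
    query = urllib.parse.urlencode({"voice": voice})
    endpoint = f"{service_url.rstrip('/')}/v1/synthesize?{query}"
    # Basic auth with the fixed user name "apikey" (public cloud assumption).
    credentials = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    return urllib.request.Request(
        endpoint,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": audio_format,
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )
```

To actually produce audio, pass the returned request to urllib.request.urlopen and write the response bytes to a file such as hello.wav. Switching languages should then only require a different voice value.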
Take a Listen
To hear for yourself how natural these voices sound, listen to the voice samples below.
“As this potentially dangerous situation unfolds, check back with weather.com and The Weather Channel for the latest information, as well as for current watches and warnings.”
“You’ve requested next-day shipping for your package. Please note that someone will need to sign for it upon delivery.”
“Si quieres ser sabio, aprende a interrogar razonablemente, a escuchar con atención, a responder serenamente y a callar cuando no tengas nada que decir.”
“Abdicar de seus mandatos neste momento é mais uma artimanha daqueles sobre os quais recaem gravíssimas suspeitas.”
“Neben Blitz, Donner und starkem Regen seien am Wochenende Hagel und Sturmböen möglich, teilte der Deutsche Wetterdienst am Freitag mit.”
Documentation: Read about our TTS capabilities, languages and voice technologies. In addition to our neural voices, our current standard concatenative voices will remain available and supported.
Demo: Try out our TTS languages and voice technologies for yourself.
Whitepaper: Read about the science behind our new neural voices in our whitepaper, “High quality, lightweight and adaptable TTS using LPCNet”.
Ron Hoory, a Senior Technical Manager for Speech Technologies at IBM Research, is a contributing author to the whitepaper noted above and this blog.