IBM Watson Text to Speech: Neural Voices Generally Available

Enable better customer interaction and conversation with clear, crisp, more natural-sounding voice technology.

Kati Venturato
Jun 28, 2019 · 3 min read
Photo by Oleg Laptev on Unsplash

We are pleased to announce that the IBM Watson Text to Speech (TTS) service has introduced a new set of voices, built on the latest neural techniques, that produce more human-sounding synthesized speech. These new voices are now generally available in both our public cloud and private cloud offerings.

Previously, speech synthesis was based on copying short speech segments from a recorded voice data set and then concatenating them together.

Today, our speech synthesis is based on the latest voice technologies in three (3) deep neural networks (DNNs), which learn various aspects of speech during the training process.

At the time of speech synthesis, the DNNs predict the pitch and phoneme durations (prosody), spectral structure, and waveform of the speech, making the voice output crisper, clearer and much more natural-sounding.

The advantage of this modular approach is that it enables fast and easy training, as well as independent control of each component. Once the base networks are trained, they can then be adapted to a new speaking style or voice for branding and personalization purposes.

— Ron Hoory, Senior Technical Manager for Speech Technologies, IBM Research

Learn more about our approach here.
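To make the modular idea concrete, here is a minimal Python sketch of a three-stage pipeline in which each stage can be swapped independently. It is an illustration only, not IBM's implementation: the names `Prosody`, `predict_prosody`, `predict_spectrum`, `generate_waveform` and `synthesize` are hypothetical, and the "networks" are toy functions standing in for trained DNNs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Prosody:
    """Per-phoneme prosody: how long it lasts and at what pitch."""
    phoneme: str
    duration_frames: int
    pitch_hz: float


Frame = Tuple[str, float]  # (phoneme, pitch) for one frame; a toy "spectral" feature


def predict_prosody(phonemes: List[str]) -> List[Prosody]:
    """Stage 1 stand-in: assign a duration and pitch to each phoneme."""
    return [
        Prosody(p, duration_frames=2 + (i % 2), pitch_hz=120.0 + 5.0 * i)
        for i, p in enumerate(phonemes)
    ]


def predict_spectrum(prosody: List[Prosody]) -> List[Frame]:
    """Stage 2 stand-in: expand each phoneme into one feature per frame."""
    frames: List[Frame] = []
    for item in prosody:
        frames.extend([(item.phoneme, item.pitch_hz)] * item.duration_frames)
    return frames


def generate_waveform(frames: List[Frame], samples_per_frame: int = 4) -> List[float]:
    """Stage 3 stand-in: emit a fixed number of toy samples per frame."""
    return [pitch / 1000.0 for _, pitch in frames for _ in range(samples_per_frame)]


def synthesize(
    phonemes: List[str],
    prosody_model: Callable[[List[str]], List[Prosody]] = predict_prosody,
    spectral_model: Callable[[List[Prosody]], List[Frame]] = predict_spectrum,
    vocoder: Callable[[List[Frame]], List[float]] = generate_waveform,
) -> List[float]:
    """Chain the three stages; any stage can be replaced without touching the others."""
    return vocoder(spectral_model(prosody_model(phonemes)))


def slow_style(phonemes: List[str]) -> List[Prosody]:
    """A hypothetical 'adapted' prosody model: same pitch, doubled durations."""
    return [
        Prosody(p.phoneme, p.duration_frames * 2, p.pitch_hz)
        for p in predict_prosody(phonemes)
    ]


default_audio = synthesize(["h", "e", "l", "o"])
slow_audio = synthesize(["h", "e", "l", "o"], prosody_model=slow_style)
```

Swapping only `prosody_model` changes the speaking style while the spectral and waveform stages stay untouched, which is the kind of independent control the modular design is meant to allow.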

14 New Voices

Our new neural text-to-speech voices are now available for the following languages:

Brazilian Portuguese, US English, UK English, French, German, Italian, Japanese*, Castilian Spanish, North American Spanish, and Latin American Spanish. (*Japanese will be available in early Q3.)

Watson Text to Speech Neural Voices

Click & Take A Listen!

To hear for yourself how natural these voices sound, listen to the voice samples below.

US English - Lisa

As this potentially dangerous situation unfolds, check back with weather.com and The Weather Channel for the latest information, as well as for current watches and warnings.

US English - Michael

You’ve requested next-day shipping for your package. Please note that someone will need to sign for it upon delivery.

North American Spanish - Sofia

Si quieres ser sabio, aprende a interrogar razonablemente, a escuchar con atención, a responder serenamente y a callar cuando no tengas nada que decir.

Brazilian Portuguese - Isabela

Abdicar de seus mandatos neste momento é mais uma artimanha daqueles sobre os quais recaem gravíssimas suspeitas.

German - Dieter

Neben Blitz, Donner und starkem Regen seien am Wochenende Hagel und Sturmböen möglich, teilte der Deutsche Wetterdienst am Freitag mit.

Additional Resources:

Documentation: Read all about our TTS capabilities, languages and voice technologies. In addition to our neural voices, our current standard concatenative voices will remain available and supported.

Demo: Try out our TTS languages and voice technologies for yourself.

Whitepaper: Read about the science behind our new neural voices in our whitepaper, “High quality, lightweight and adaptable TTS using LPCNet”.


Ron Hoory, a Senior Technical Manager for Speech Technologies at IBM Research, is a contributing author to the whitepaper noted above and this blog.

IBM Watson

AI Platform for the Enterprise

Written by Kati Venturato, Product Manager @IBMWatson

