IBM Introduces Neural Voices for Arabic, Dutch, Korean, Australian English, and Mandarin Chinese

Improve customer experience with natural, clear, crisp voices powered by deep neural networks

Rachel Liddell
IBM Watson Speech Services
3 min read · Dec 3, 2020


Meet our newest Neural Voices! They join the Watson Text to Speech portfolio to strengthen your ability to serve customers across the globe. Neural Voices improve customer interaction with a clear, crisp, natural sound.

In this release, we have updated eight existing voices for Arabic, Dutch, Korean, and Mandarin Chinese. We also added two new voices for Korean and two new voices for Australian English. All voices are available in the IBM public cloud.

Selection of Voice Samples

Here are samples from each newly added language. We’ve released seven other new Neural Voices as well, which you can listen to in our catalog.

Neural Technology

These new voices sound natural because they are built on deep neural networks (DNNs), the same technology already in use for other languages in the IBM portfolio. Previous training techniques synthesized speech by concatenating small segments of speech, called phones, to form words, a method that can lead to choppiness. Neural Voices instead employ DNNs in their prosody and acoustic models, producing more natural-sounding speech even with less training data. Neural Voices also require less customization, since they sound smooth out of the box. With this release, IBM now offers Neural Voices or Enhanced Neural Voices for all supported languages.

Below is a diagram of the steps of neural synthesis at runtime. Your text is analyzed, then deep neural networks predict the pitch and phoneme durations (prosody), spectral structure, and waveform of the speech.
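The runtime flow above can be sketched as a pipeline of three stages: a prosody model, an acoustic model, and a waveform generator. The sketch below is purely illustrative; the function names, shapes, and placeholder numbers are assumptions, not IBM's implementation.

```python
# Illustrative sketch of a neural TTS runtime pipeline.
# The three stages mirror the diagram: prosody prediction,
# spectral (acoustic) prediction, and waveform generation.
# All models here are placeholders, not IBM's actual networks.

def predict_prosody(phonemes):
    """Prosody model: predict pitch (Hz) and duration (frames) per phoneme."""
    return [{"phoneme": p, "pitch_hz": 120.0, "duration_frames": 5} for p in phonemes]

def predict_spectra(prosody):
    """Acoustic model: expand each phoneme into per-frame spectral vectors."""
    frames = []
    for unit in prosody:
        # Dummy 8-dimensional spectral vector repeated for each frame.
        frames.extend([[unit["pitch_hz"]] * 8] * unit["duration_frames"])
    return frames

def generate_waveform(frames, samples_per_frame=200):
    """Vocoder: turn spectral frames into audio samples (silence here)."""
    return [0.0] * (len(frames) * samples_per_frame)

def synthesize(text):
    phonemes = list(text.replace(" ", ""))  # stand-in for real text analysis
    prosody = predict_prosody(phonemes)
    frames = predict_spectra(prosody)
    return generate_waveform(frames)

audio = synthesize("hello")
print(len(audio))  # 5 phonemes * 5 frames * 200 samples = 5000
```

In a real deployment, each placeholder would be a trained network, but the data flow, text analysis feeding prosody, prosody feeding the acoustic model, and spectra feeding the vocoder, is the same.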

Customization Support

You can apply all of the customization tools available for Neural Voices to the voices in this release. That includes pronunciation dictionaries, which adjust the articulation of domain-specific terms or brand names. You can also use Speech Synthesis Markup Language (SSML) to tailor the pitch, speaking rate, and pauses of any Neural Voice. See our documentation on customization here.
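As a quick illustration, here is how an SSML string might be assembled before being sent to the service. The `<prosody>`, `<break>`, and `<say-as>` elements are standard SSML; the exact attribute values shown are examples, so check the service documentation for the options each voice supports.

```python
# Build an SSML string that slows the speaking rate, lowers the pitch,
# and inserts a pause before reading an account number digit by digit.
# Pass the result as the text of a Text to Speech synthesize request.
ssml = (
    "<speak>"
    '<prosody rate="-10%" pitch="-5%">Thank you for calling.</prosody>'
    '<break time="500ms"/>'
    'Your account number is <say-as interpret-as="digits">4217</say-as>.'
    "</speak>"
)
print(ssml)
```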

Use Cases

You can use these Neural Voices for any use case, but virtual agents are the most common implementation. Watson Assistant offers a voice channel integration, so you can serve your customers on the phone, using your favorite Watson Text to Speech voice. Your users will immediately get the information they need, without waiting for a human to respond.

Watson Text to Speech enables personalized customer experiences. The service synthesizes words faster than real time, so the text can change for each customer: reading an account balance, confirming the spelling of a name, or even reporting the weather, all in a crisp, natural fashion.

Documentation

Now that you’ve learned about these new voices, go and give them a try! Here are some helpful links to get you started.

Stay tuned for a new Canadian French voice coming out in January 2021!

Note: The ar-AR_OmarVoice has been renamed to ar-MS_OmarVoice. If you currently access the Arabic voice by its previous name, you will need to update the voice name in your API calls to access customization.
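One lightweight way to handle the rename is to normalize voice names before each call, so older configuration keeps working. A sketch (only the rename noted above is included; the helper name is our own):

```python
# Map the deprecated Arabic voice name to its replacement
# before making API calls.
RENAMED_VOICES = {"ar-AR_OmarVoice": "ar-MS_OmarVoice"}

def current_voice_name(voice):
    """Return the up-to-date name for a possibly renamed voice."""
    return RENAMED_VOICES.get(voice, voice)

print(current_voice_name("ar-AR_OmarVoice"))   # ar-MS_OmarVoice
print(current_voice_name("ko-KR_JinV3Voice"))  # unchanged
```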


Rachel is a Product Manager for Watson Assistant. She focuses on channels and integrations.