Watson Speech improvements for British English, German, and French

Rachel Liddell
Aug 19, 2020 · 3 min read

Speech recognition quality is key to answering your customers’ questions, reducing average call handling time, and delivering the best possible customer experience. To help our clients meet these goals, the IBM Watson Speech team has upgraded our British English, French, and German speech-to-text models. Using deep learning and neural networks, we improved the models’ performance for customer care, focusing on the telecommunications, banking, retail, and insurance domains. The upgrade delivered substantial improvements in accuracy: word error rates decreased by as much as 50% relative to previous model versions. These upgrades will help you communicate with your customers, support your employees, and caption media to increase accessibility.

Alphanumerics and Accents

Beyond general accuracy, we targeted a few specific transcription tasks. We improved the models’ performance on letters and numbers, expanded their vocabularies, and strengthened their robustness to background noise and varied speaking styles. We also made the British English models more versatile so they better handle regional accents. Accuracy increased for Irish, Scottish, Welsh, Midland, and Northern English accents, as well as accents across Greater London.

Model Types

All of our languages support both telephony (narrowband) and high-quality (broadband) audio. Telephony audio has a sampling frequency of 8 kHz and typically flows through call centers and over landlines. The narrowband model is designed for telephony, but you can use it for audio from other sources as well. We design our broadband models to transcribe audio with a sampling frequency of 16 kHz. You’ll find this audio in meeting recordings, video conferencing applications, mobile applications, and rich media sources. The German and British English updates apply to both the broadband and narrowband models. For French, we’ve released the broadband model, and the narrowband model is on its way!
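
If you route audio from several sources, it can help to choose the model type from the sampling rate. Here is a minimal Python sketch of that idea. The helper name `pick_model`, the locale codes, and the identifier pattern `<locale>_BroadbandModel` / `<locale>_NarrowbandModel` are assumptions made for illustration, so check the documentation for the exact model names before relying on them. The availability table reflects only what this post describes.

```python
# Upgraded model types per language, as described in this post.
# The locale codes and the identifier pattern built below are assumptions.
UPGRADED_MODELS = {
    "en-GB": {"broadband", "narrowband"},
    "de-DE": {"broadband", "narrowband"},
    "fr-FR": {"broadband"},  # narrowband upgrade not yet released
}


def pick_model(locale: str, sample_rate_hz: int) -> str:
    """Return a hypothetical model identifier for the given audio."""
    if locale not in UPGRADED_MODELS:
        raise ValueError(f"no upgraded model listed for {locale!r}")
    if sample_rate_hz < 8000:
        raise ValueError("audio below 8 kHz is not covered by either model type")

    # 16 kHz and above suits broadband; 8 kHz telephony audio suits narrowband.
    band = "broadband" if sample_rate_hz >= 16000 else "narrowband"
    if band not in UPGRADED_MODELS[locale]:
        raise ValueError(f"{band} model for {locale!r} is not part of this upgrade")

    return f"{locale}_{band.capitalize()}Model"


if __name__ == "__main__":
    print(pick_model("en-GB", 16000))
    print(pick_model("de-DE", 8000))
```

The sketch raises an error for French telephony audio instead of silently falling back to another model, so a mismatch between audio and model shows up early.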

Access the upgrade

You can start using these upgraded models here. If you need guidance, check out our documentation. If you’re an existing user of British English, French, or German, we will automatically upgrade you to the newest model. That way, you can immediately use the update. If you currently use custom models, we won’t automatically switch you over to the new version. That way, your performance stays consistent. When you’re ready to benefit from the release, upgrade your custom model using these instructions.

If you want to learn how to assess these new models, check out our article “How to Properly Evaluate Speech Models.”
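
To make the word error rate figure quoted earlier concrete, here is a small, self-contained Python sketch of how the metric is commonly computed: the word-level edit distance between a reference transcript and the model’s output, divided by the number of reference words. The function names and sample sentences are illustrative assumptions, not part of any Watson tooling, and a real evaluation would also normalize punctuation and number formats and run over a much larger test set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative WER improvement, e.g. 0.5 means the error rate was halved."""
    if old_wer <= 0:
        raise ValueError("old_wer must be positive")
    return (old_wer - new_wer) / old_wer


if __name__ == "__main__":
    reference = "please check my account balance"
    print(word_error_rate(reference, "please check my count balance"))
    print(relative_reduction(0.2, 0.1))
```

A relative figure compares two error rates: a model that goes from a 20% WER to a 10% WER has improved by 50% relative, even though the absolute change is 10 percentage points.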

New Voices

Also, don’t forget to listen to our new and improved neural voices. We improved the naturalness of Kate, a British English voice, and added two new neural British English voices, James and Charlotte. We also added Nicolas, a new neural voice for French. You can listen to them here. These additions broaden our support for French and British English.

Feel free to reach out with any questions or concerns. Happy transcribing. :)
