IBM’s New Watson Large Speech Model Gives Generative AI a Voice

Marco Noel
IBM Watson Speech Services
Dec 12, 2023

Authors and Key Contributors: George Saon, Gakuto Kurata, Daniel Bolanos, Gretchen Tietge, Peter Keller, Kelly To

We are announcing our new Large Speech Models (LSMs) for Watson Speech to Text, available in closed beta only. To apply for our closed beta user sponsor program, fill out this Monday form.

Photo by Andreas Klassen on Unsplash

By now, almost everyone has heard of large language models (LLMs). Generative AI has entered our daily lexicon through its amazing text- and image-generation capabilities and its promise to revolutionize how enterprises handle any number of core business functions and use cases.

More than ever before, talking to AI through a chat interface, or having it perform specific tasks for you the way a human would, is a tangible reality, and enormous strides are being made to expand this technology's footprint across our daily lives as individuals and consumers.

Well, what about the world of voice?

So much attention has gone to LLMs as a catalyst for better chat capabilities that hardly anyone is thinking about one of the areas where customer care stands to gain the most: the modern contact center. Today it is dominated by rigid conversational experiences (yes, IVR is still the norm) and by human agents who are ever fewer and increasingly overwhelmed. Enter large speech models, or LSMs. LLMs have a more vocal cousin that brings all the same benefits and possibilities you expect from generative AI, except this time your interface is the phone.

Get Ready for the New Watson Large Speech Model
Photo by aboodi vesakaran on Unsplash


Over the past few months, IBM watsonx and IBM Research have been hard at work developing new, state-of-the-art Large Speech Models (LSMs). Based on transformer technology, LSMs draw on vast amounts of training data and large numbers of model parameters to deliver near-human speech recognition accuracy. Purpose-built for customer care use cases such as self-service phone assistants and real-time call transcription, our LSMs deliver highly accurate transcriptions out of the box, creating a seamless customer experience.

We are really excited to announce that IBM deployed new LSMs at the end of November for English, Japanese, and French, exclusively in closed beta for Watson Speech to Text and watsonx Assistant phone customers.

But there’s more. The LSMs also showed comparable performance on long-form use cases (e.g., call analytics, call summarization), since they were trained on this type of data, which contains longer utterances.

How can you get started with these models?

Apply for our closed beta user sponsor program through this Monday form, and our Product Management team will reach out to schedule a call. Because the Watson LSMs are in closed beta, not all features and functionality are available yet, but we are continuously developing and improving them.



Sr Product Manager, IBM Watson Speech / Language Translator. Very enthusiastic and passionate about AI technologies and methodologies. All views are my own.