Voice Tech Trends in 2021

Anna Prist
6 min read · Dec 22, 2020

Another year, another trend forecast for Conversational AI! Predicting the next year’s buzz

2020 has been a real “annus horribilis” for the entire global economy and for each one of us. The COVID-19 pandemic has affected every aspect of our lives, crashed businesses, and had a significant impact on Conversational AI.

AI-powered chatbots and virtual assistants were at the forefront of the fight against COVID-19: they helped screen and triage patients, conducted surveys, provided vital COVID-related information, and more. Quite naturally, we saw more conversational agents in telemedicine, from FAQ chatbots and virtual consultants to chatbot therapists, that made health services more accessible to people who couldn’t leave their homes (and there were billions of them in the spring of 2020). Needless to say, conversational agents combating the virus are becoming a long-term trend, and next year we will see more solutions complying with the ‘less touching, more talking’ rule.

Despite the difficulties, consumer data show that hearables ownership by US adults rose by about 23% between 2018 and 2020, while voice assistant use through hearables grew by 103%, from 21.5 million users in 2018 to 43.7 million in 2020. The data show that hearables and voice assistant adoption are complementary technology trends.
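The 103% figure follows directly from the two user counts quoted above. As a quick sanity check, here is a minimal Python sketch; the helper name percent_growth is purely illustrative and does not come from the underlying survey.

```python
def percent_growth(start: float, end: float) -> float:
    """Return the percentage change from start to end."""
    return (end - start) / start * 100.0


# Voice assistant use through hearables, in millions (figures quoted above).
users_2018 = 21.5
users_2020 = 43.7

growth = percent_growth(users_2018, users_2020)
print(f"{growth:.1f}%")  # prints "103.3%"
```

In other words, usage slightly more than doubled in two years, which is what a growth rate just over 100% means.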

The voice and speech recognition market is expected to grow at a compound annual growth rate (CAGR) of 17.2% to reach $26.8 billion by 2025. From this perspective, we can conclude that voice UX has become a real pragmatic innovation.

In the coming year, we expect to see more technically advanced smart devices and more support for users. Assistants are expected to become proactive and to distinguish between users in order to deliver personalized content. This especially applies to kids’ content, where certain restrictions are necessary.

In 2021 we will definitely see:

Voice assistant in a mobile app

Voice in mobile apps is the hottest trend right now, and it will stay so because voice is a natural interface.

Natural interfaces are about to become ubiquitous and may even displace swiping and typing. Voice-powered apps increase functionality, saving us from complicated navigation, form-filling, overlaid menus, support, etc. They make it far easier for end users to submit a request, even if they don’t know the exact name of the item they’re looking for or where to find it in the app’s menu. Pretty soon, users won’t just appreciate the greater functionality and friendliness of a voice-powered mobile app, they’ll expect it.

Outbound calls and smart IVR powered with an NLU

It’s not about hollow-hearted cold calls. These are smart solutions that will replace call center agents pretty soon because they are effective, performant, and easy to customize. More and more companies offer such services, and it seems a reasonable guess that this is where phone sales are heading.

Voice Cloning

Also known as voice replication technology. Advances in machine learning and GPU power are commoditizing custom voice creation and making synthetic speech more emotional, to the point where a computer-generated voice can be hard to distinguish from a real one. You just provide recorded speech, and voice conversion technology transforms your voice into another. Voice cloning is becoming an indispensable tool for advertisers, filmmakers, game developers, and other content creators.

Voice assistants in smart TVs

Smart TV is an obvious placement for a voice assistant: you don’t really want to look for that clicker and spend more time clicking when you can use your voice to navigate. All you need to do is press and hold the microphone button and speak normally. There’s no active listening, and no need to shout your commands from across the room. With a smart assistant on your TV, you can easily browse channels, search for content, launch apps, change the sound mode, look up information, and much more, depending on the TV model.

Smart displays

Last year, we said that smart displays were on the march because they expanded voice tech’s functionality. The demand for these devices remains high: smart displays improved greatly over the last year, and more customers preferred them over regular smart speakers. In the third quarter of 2020, sales of smart displays hit 9.5 million units, a 21% year-on-year increase. As a result, the market share of this product category rose to 26% from 22% last year. We therefore expect more customized, more technologically advanced devices in 2021. Smart displays, like the Chinese smart screen Xiaodu, are already equipped with a suite of upgraded AI-powered functions, including far-field voice interaction, facial recognition, hand gesture control, and eye gesture detection. What’s next?
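To put the smart display numbers above in perspective, the year-on-year growth rate can be inverted to estimate the prior-year figure. The short Python sketch below does that; the implied Q3 2019 value is derived only from the two numbers quoted here, not taken from a separate source, and the helper name prior_year_value is illustrative.

```python
def prior_year_value(current: float, yoy_growth_pct: float) -> float:
    """Back out last year's value from this year's value and the YoY growth rate."""
    return current / (1.0 + yoy_growth_pct / 100.0)


# Figures quoted above: Q3 2020 smart display sales and their YoY growth.
q3_2020_units = 9.5   # millions of units
yoy_growth = 21.0     # percent

# Implied Q3 2019 sales (a derived estimate, not a reported number).
q3_2019_units = prior_year_value(q3_2020_units, yoy_growth)
print(f"{q3_2019_units:.2f} million units")  # prints "7.85 million units"

# Market share moved from 22% to 26%: a gain of 4 percentage points.
share_gain_points = 26.0 - 22.0
print(f"{share_gain_points:.0f} percentage points")  # prints "4 percentage points"
```

Note that the share change is measured in percentage points, which is not the same thing as the 21% growth in unit sales.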

Voice for business

In 2021 we will definitely see more solutions that improve business processes, such as voice in meetings and voice for business intelligence. Voice assistants will be highly customized to business challenges and integrated with internal systems like CRM and ERP and with business processes. Furthermore, as more SMBs see enterprises profiting from voice, there will definitely be more companies looking for voice-first solutions in 2021.

Games and content

More games, learning, and entertainment content are expected, since tech companies like Amazon and Google, along with other developers of voice-first tools, are pushing their builders to the market. Advertising via smart speakers and displays is a great chance to promote a product, so communications and entertainment majors like Disney Plus or Netflix partner up with new tech platforms to become first movers. 2021 will bring us more partnerships like these, as well as more games and educational skills from third-party developers.

Voice in the gaming industry

When talking about Conversational AI and gaming, one cannot fail to mention text-to-speech (TTS), synthetic voices, and generative neural networks that help developers create spoken and dynamic dialogue. Today, it takes a lot of time and effort to record spoken dialogue for each character in a game. In the upcoming year, developers will be able to use sophisticated neural networks to mimic human voices. In fact, looking a little further ahead, neural networks will even be able to generate appropriate NPC responses.

Multimodal approach

More and more developers are coming to the conclusion that a device ecosystem and a multimodal approach are much needed. A voice assistant may simultaneously live on your mobile phone, smartwatch, smart home devices, and smart TV. Obviously, ‘one assistant, several devices’ is the right approach here.

Interoperability

At the same time, another idea could gain ground in 2021: bundling multiple virtual assistants in a single device, if that’s practical for the user. A year ago, Amazon launched a project that could allow Alexa to be bundled together with other virtual assistants in a single device. It made sense, since customers could choose which voice service would best support a particular interaction. And although this could be a great marketing moment for Amazon, which could rally smaller market players around it, some of the largest players in the space, like Google, Apple, and Samsung, passed on the idea for obvious reasons. Still, we’ll see where this idea goes pretty soon.

Beyond this, work on security and privacy advancements, as well as on skill discovery and monetization, will continue through the coming year. Before the technology is scaled up to widespread use, users must be convinced of its safety and correct operation.

More and more developers are diving into conversational AI, which means we will see some really great things, and voice tech will do a world of good!

Let this New Year bring us more voice experiences!

Happy holidays!

Anna Prist

I write of great minds and smart machines that change the world for a better future