The Current Challenges Of Speech Recognition 💬

Onlim
Nov 30 · 4 min read

During the last few years, speech recognition has improved a lot.


This can mainly be attributed to advances in graphics processing and cloud computing, which have made large data sets widely available for training.

With recent developments, it’s going to be interesting to see how the momentum of rapid growth can be maintained and how the current challenges of speech recognition will be dealt with.

The last 5–10 years in Automatic Speech Recognition (ASR) have been mainly focused on minimizing mistakes while decoding voice inputs. That’s what made widely known systems like Siri, Alexa, and Google Assistant possible. It’s through these popular voice assistants that voice recognition has made its way into our everyday lives.
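"Minimizing mistakes while decoding voice inputs" is usually quantified as word error rate (WER): the substitutions, insertions, and deletions needed to turn the recognized transcript into the reference, divided by the reference length. A minimal sketch in Python (illustrative only, not tied to any particular ASR system):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed as word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

# Two of four words misrecognized -> WER of 0.5
print(word_error_rate("turn on the lights", "turn of the light"))  # 0.5
```

Modern assistants achieve single-digit WER on clean, close-microphone speech; the challenges below are exactly the conditions that drive that number back up.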

In this article, we’ll look at the current challenges of speech recognition and what developments we can expect in the future.

The current challenges of speech recognition are diverse

The current challenges of speech recognition stem from two major factors: reach and noisy environments. These call for even more precise systems that can tackle the most ambitious ASR use cases. Think of live interviews, speech recognition at a loud family dinner, or meetings with many participants. These are the challenges that next-gen voice recognition has to solve.

Beyond this, speech recognition needs to become available in more languages and cover a wider range of topics. As of now, ASR needs a lot of data to work well, and for certain languages and topics that data simply hasn't been collected yet. Until it is, ASR systems will remain noticeably limited.

The use case for voice assistants and Voice User Interfaces (VUIs) is simple: they allow humans to give voice commands to machines, which translate them into actions. As clear as the use case appears to be, the best method for human-machine interaction is still being shaped. Naturally, this comes with challenges for speech recognition.

Speech recognition software isn't always able to interpret spoken words correctly. Computers are not yet on par with humans in understanding the contextual relations of words and sentences, which causes misinterpretations of what the speaker meant to say or achieve.

Compared to humans, speech recognition systems lack millennia of contextual experience, and VUIs still struggle to understand the semantics of a sentence.

We'd usually assume that computerizing a process speeds it up. Unfortunately, this is not always the case with voice recognition systems: in many cases, using a voice app takes more time than a traditional text-based interface.

This is mainly due to the diverse voice patterns of humans, which VUIs are still learning to adapt to. Hence, users often need to slow down or pronounce words more precisely than they normally would.

VUIs are often challenged when voice inputs deviate too much from the average. Accents in particular can pose a big challenge: while systems are getting better, there is still a big difference in their ability to understand, for example, American versus Scottish English. Even a simple cold can make voice commands work less reliably than usual.

Background noise and loud environments

To make the most of VUIs, a quiet environment helps a lot. Whenever there is too much background noise, speech recognition will struggle, which makes it especially hard to use VUIs effectively outdoors in cities or in large public spaces and offices. Dedicated microphones or headsets can reduce these limitations, but they require an additional device, which is rarely desirable.
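The impact of background noise is commonly expressed as a signal-to-noise ratio (SNR) in decibels; the lower the SNR, the harder recognition becomes. A rough sketch of estimating SNR from raw audio samples (pure Python with made-up sample values, purely illustrative):

```python
import math

def snr_db(signal: list, noise: list) -> float:
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise),
    where P is the mean squared amplitude (power) of each sample list."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

# Clean speech against quiet-room noise yields a comfortably high SNR;
# a loud office or street drives this value down toward (or below) zero.
print(snr_db([0.5, -0.4, 0.6], [0.01, -0.02, 0.015]))
```

Close-talking microphones and headsets help precisely because they raise this ratio: the speech signal arrives at the microphone much louder relative to the ambient noise.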

For a voice assistant to learn, data inputs are needed. These can be generated through paid research or studies, but that is a very limiting approach compared to the sheer endless amount of data created through everyday usage of voice systems. Yet the use of this data must undergo well-placed scrutiny, as the thought of having all their voice inputs collected doesn't sit well with many people. Especially when these data sets are controlled by large companies that want to make a profit, keeping user data safe can easily become a conflict of interest. A great challenge of voice recognition therefore lies in making data available for AI while still acknowledging the need for data privacy and security.


We grow with our challenges, and the same holds true for VUIs. With ongoing input, they will be able to learn more consistently, allowing for even more unique use cases and speech recognition capabilities.

---

This article was originally published at onlim.com


Voice Tech Podcast
Written by

Onlim

Automating customer communication through chatbots and voice assistants. 👉 www.onlim.com/en
