6 problems AI faces in speech recognition
Large companies everywhere are investing in voice recognition, and the world is slowly but steadily adjusting to the new technology of Artificial Intelligence (AI). So why is it taking so long? Why isn’t it part of our day-to-day lives yet? Here are six reasons why.
You go to a store looking for a particular colour and brand of a product. You ask an employee whether the product you want is available. The employee goes to the warehouse, checks the inventory, and comes back a while later, only to tell you that your product is no longer available.
Now imagine this: you enter the same store and tell a tiny device which product you want to buy. Within a second, a voice tells you the exact availability of your product and, if it is unavailable, gives you details of the outlets where it is in stock.
The AI device does this by internally scanning all the digital inventory systems. With so many benefits in terms of cost, logistics and, more importantly, convenience, why hasn’t the art of speech recognition and personal assistants been perfected yet?
With science making huge strides in sound wave recognition, we take a look at some of the main problems researchers face when decoding speech to text.
Voice recording devices detect the sound waves generated by speech. Background noise makes it hard for a system to distinguish the speaker’s voice from everything else in the room. This blurs the sound picked up by the device, confusing it and limiting its processing ability.
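As a rough illustration of how noise degrades a recording (a toy sketch, not how real recognisers model noise), we can mix random values into a clean signal: the larger the noise level, the further each sample drifts from the speech it should represent.

```python
import random

def mix_noise(signal, noise_level, seed=0):
    """Add bounded uniform random noise to each sample of a clean
    signal; a higher noise_level drowns out more of the speech."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [s + rng.uniform(-noise_level, noise_level) for s in signal]

clean = [0.2, 0.9, -0.4, 0.1]
noisy = mix_noise(clean, noise_level=0.5)
```

Every value in `noisy` stays within `noise_level` of the original sample, which is why quiet background chatter is survivable while loud noise is not.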
Echoes are sound waves reflected off surfaces such as walls, tables and other furniture. The reflections arrive back at the receptors out of step with the direct sound, reducing clarity.
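A single reflection can be sketched as the original signal plus a delayed, attenuated copy of itself (a simplified model; real rooms produce many overlapping reflections):

```python
def add_echo(signal, delay, attenuation=0.5):
    """Superimpose a delayed, attenuated copy of the signal on
    itself, simulating one reflection off a hard surface."""
    out = [0.0] * (len(signal) + delay)
    for i, s in enumerate(signal):
        out[i] += s                        # direct path
        out[i + delay] += s * attenuation  # reflected path
    return out

clean = [1.0, 0.0, 0.0, 0.0]
echoed = add_echo(clean, delay=2, attenuation=0.5)
print(echoed)  # [1.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```

The delayed copy lands on top of later samples, which is exactly the “disorganised return” the device has to untangle.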
The wide range of accents within every language is another factor that makes speech recognition difficult. When the same word can be pronounced in a number of different ways, its syllables and phonetics vary, making it harder for the machine to process.
Similar-sounding words and phrases can prevent proper encoding and decoding of the voice message. For example, “Let’s wreck a nice beach” and “Let’s recognise speech” are phonetically very similar and can easily confuse the device.
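Real recognisers compare sequences of phonemes produced by an acoustic model, but a letter-level edit distance gives a crude feel for why these two phrases are confusable: with the spaces removed, only a handful of character edits separate them.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

a = "lets wreck a nice beach".replace(" ", "")
b = "lets recognise speech".replace(" ", "")
print(levenshtein(a, b), "edits across", len(a), "letters")
```

The distance is well under the length of either phrase, so once word boundaries are lost, the two utterances look far more alike than their meanings suggest.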
Voice detection still has high error rates: machines get roughly 8–12% of words wrong, more than twice as many errors as humans make in day-to-day speech. Accuracy at this encoding stage is crucial to performance, because transcription is the first step the device acts upon.
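The standard way to measure those errors is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the machine’s transcript into the reference, divided by the reference length. A minimal sketch (the example sentences are invented for illustration):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference
    length, computed with a word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, rw in enumerate(ref, 1):
        cur = [i]
        for j, hw in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (rw != hw)))
        prev = cur
    return prev[-1] / len(ref)

wer = word_error_rate("let us recognise speech today",
                      "let us wreck a nice beach today")
print(round(wer, 2))  # 0.8
```

Note that insertions mean WER can exceed 100%, which is why it is reported as an error rate rather than “accuracy”.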
In everyday conversation we run words together, so many words and phrases merge into one another. This is a poor fit for voice-to-text recognition, as it makes it harder to pick out the specific words or phrases that determine the device’s response and actions.
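One way to see why merged speech is hard: once the boundaries are gone, a single stream of sounds can often be split into valid words in more than one way. A toy segmenter over a hypothetical mini-vocabulary (real systems score candidate splits with a language model rather than enumerating them):

```python
def segmentations(text, vocab):
    """Return every way to split a spaceless string into words from
    vocab — a toy model of segmenting run-together speech."""
    if not text:
        return [[]]
    results = []
    for i in range(1, len(text) + 1):
        word = text[:i]
        if word in vocab:
            for rest in segmentations(text[i:], vocab):
                results.append([word] + rest)
    return results

vocab = {"wreck", "a", "nice", "beach", "an", "ice"}
print(segmentations("wreckanicebeach", vocab))
# [['wreck', 'a', 'nice', 'beach'], ['wreck', 'an', 'ice', 'beach']]
```

Both splits are legitimate word sequences, so the recogniser needs context, not just acoustics, to choose between “a nice beach” and “an ice beach”.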
All in all, no matter how advanced these machines become, the factors above will continue to be a hindrance to the development of AI assistants. However, given the speed at which science and technology are developing, and with all the big companies focused on creating the optimum voice recognition devices, sooner or later the creases will be ironed out and we will all have a voice-enabled assistant running our homes as well as our lives.