The ever-changing landscape of voice technology

Published in

Kainos Applied Innovation

6 min read · Oct 10, 2019

“Hey Siri, show me an amazing speech recognition article”
“Ok, here’s what I found on the web…”

The year is 2007. I’m in high school, and it’s lunchtime. I’ve got my hands on a slick new Nokia E62. My friends are gathered, watching me in awe as I call my mum, and get this: without touching my phone! I just used my voice.

Well, the story kinda goes like that. My phone did call mum, but it took 3–4 attempts, and it just seemed to be a bit of a gimmick.

Voice recognition technology has come a long way since then. It’s everywhere now: our homes, our cars, our workplaces. Even McDonald’s is getting in on the act! The industry has shifted for both businesses and consumers, and it continues to grow. The global market is predicted to be worth £25.8 billion by 2025. What was once considered a gimmick is now the future.

Timeline of the history of speech recognition — Source: https://codeburst.io/html5-speech-recognition-api-670846a50e92

Believe it or not, the first developments began in 1769, when Wolfgang von Kempelen created an “Acoustic Mechanical Speech Machine”. It performs speech synthesis (speech produced by a computer or other device), and you can hear words like “ma-ma” and “pa-pa”. To me, it often sounds like someone strangling a goose.

Before Thomas Edison invented the lightbulb, he had another lightbulb moment when, in 1879, he invented the first dictation machine. Then, throughout the 20th century, innovations by IBM, Bell Labs and Carnegie Mellon University formed the foundations of voice recognition as we know it today.

The late 2000s brought massive strides. Google launched a voice search app, Apple acquired Siri, and Alexa became more than just an office conversation at Amazon.

Speech recognition truly made its mark on the consumer market from 2014 onwards with the release of Amazon’s Alexa, which inspired an extremely competitive home assistant market and rapid innovation in voice technology.

In recent years, Artificial Intelligence has driven explosive growth in voice recognition technology. Google is reporting a 30% reduction in its error rate since 2012, a drastic improvement that is almost all down to AI or, more precisely, an application of AI called Deep Learning.

Artificial Intelligence subsets - Source: https://nvidia.com/content/dam/en-zz/Solutions/deep-learning/home/DeepLearning_eBook_FINAL.pdf

Deep what?

Deep Learning is a subset of machine learning. With speech recognition, the technique involves using hours and hours of audio to train neural networks, which enables the system to make better predictions or, in other words, “understand” us better.
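If you are wondering how an “error rate” for speech recognition might be measured, one widely used metric is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the system’s transcript into the correct one, divided by the number of words in the correct one. The snippet below is a minimal sketch of that idea, not the method behind any figure quoted in this post. The function name `word_error_rate` and the example phrases are my own, and the simple lowercase-and-split normalisation is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercasing and splitting on whitespace is enough normalisation.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of three gives a WER of one third.
    print(word_error_rate("delete browser history", "tweeting browser history"))
```

A lower score is better, and a score of 0 means the transcript matched word for word. Because insertions count as errors too, WER can exceed 1 when the system outputs far more words than were actually spoken.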

If you are curious who the current leader in voice is (Amazon, Google or Microsoft), check out my blog post below, where I discover how each fared against a diverse range of audio, like accents from around the world and Monty Python clips.

Find the Best Solution for Your Use Case — Speech to Text Testing Tool

Find out who the current leader of voice is: Amazon, Google or Microsoft. Discover how each fared against a diverse range of audio, like Monty Python clips…

medium.com

Less screen, more voice

As voice technology continues to improve, innovative opportunities for voice present themselves in areas that we may not have expected or thought possible before:

Healthcare

Virtual Health Assistant | Aiva Health

It's the foundation of everything you do as care providers in an industry dedicated to making sure patients go home…

aivahealth.com

Shopping/Fashion

How Voice Technologies Are Shaping The Future Of Fashion Industry?

When I ask people to imagine what the world will be in 2030, I hear a variety of answers. From self-driving bus to…

upsidelab.io

Education

Students and Faculty love Sonix - Convert your audio and video interviews to text - Sonix is the…

From lectures to thesis interviews, both students and faculty rely on Sonix for their transcription needs. Whether it…

sonix.ai

Gaming

One Hand Clapping

An incredibly unique 2D puzzle platformer that invites you to sing into a microphone to solve musical puzzles. Discover your voice…

baddreams.itch.io

Chicken Scream

Chicken scream is a fun and interactive non-tapping addictive game where your voice controls the chicken. Make a lot…

apps.apple.com

One crucial impact of voice is on accessibility. With Apple’s 2019 software updates, you can easily navigate your devices using just your voice. Windows 10 has its own voice control system too, and Google is working on similar controls for Android.

Apple voice control feature on Mac

This opens a world of opportunities for users who previously were unable to utilise the full potential of computers.

With future innovations, tasks like video editing, engineering and coding could become accessible to everyone, opening up more jobs to people with disabilities.

Although, let’s hope it’s all as reliable as intended…

Me: Hey Siri, delete browser history
Siri: Tweeting browser history
Me: Nooooo!!!

What’s the hold up then?

As voice has made its way into a multitude of sectors, it has sparked a gradual shift from screen-first to voice-first as our primary way of interacting with devices.

But we can’t argue with facts, and the facts show that, well, we just aren’t adopting voice the way many past predictions suggested. I’m not a time traveller, but the ever-popular prediction that “50% of all searches will be voice searches by 2020” is just not going to happen, in my opinion.

A study by CPE outlines the issue with user retention rates:

“Alexa skill user retention after two weeks is a dismal 3% (compared to 10–11% for a typical mobile app on iOS or Android)”

Voice assistant use frequency by device — Source: https://voicebot.ai/wp-content/uploads/2018/11/voice-assistant-consumer-adoption-report-2018-voicebot.pdf

Voicebot.ai says that 35% of us rarely use voice assistants on our smartphones, something we have access to at almost every point in the day. It’s clear that voice assistants have not yet been woven into the fabric of society.

Privacy is also a widespread issue. Many reports have shown that customers don’t use or own voice assistants because they are afraid of being listened to. Big tech companies don’t help themselves either when there is evidence backing up these concerns.

If we forget about the technology, or even the services behind voice tech, a big caveat to all this is, well, us. It’s human error. Even if voice technology were perfect, I would still frequently mumble, stutter and get my words mixed up. Voice interfaces will one day have to become truly conversational and natural to take the reins as our primary medium of interacting with devices.

We are just getting started

Think of the iPhone: in 2007 it was just a screen capable of calls, internet and email. Years of technological advancements and over 2.1 million apps later, you suddenly have companies like Snapchat and revolutionary breakthroughs like Flappy Bird.

We are only beginning to realise the true potential of voice technology. Businesses are still experimenting and developing proofs of concept, developers are restricted by their current tools, UI/UX designers have to shift to a voice-first, screen-second mindset, and consumers still need time to learn and embrace this new era of technology.

Written by Matthew Leyburn