The ever-changing landscape of voice technology

Matthew Leyburn
Kainos Applied Innovation
6 min read · Oct 10, 2019

“Hey Siri, show me an amazing speech recognition article”
“Ok, here’s what I found on the web…”

An illustration of a voice assistant

The year is 2007. I’m in high school, and it’s lunchtime. I’ve got my hands on a slick new Nokia E62. My friends are gathered, watching me in awe as I call my mum, but get this, without touching my phone! I just used my voice.

Well, the story kinda goes like that. My phone did call mum, but it took 3–4 attempts, and it just seemed to be a bit of a gimmick.

Voice recognition technology has come a long way since then. It’s everywhere now: our homes, cars and workplaces. Even McDonald’s is getting in on the act! The industry has shifted for both businesses and consumers and only continues to grow. The global market is predicted to be worth £25.8 billion by 2025. What was once considered a gimmick is now the future.

Believe it or not, the first developments began in 1769, when Wolfgang von Kempelen created an “Acoustic Mechanical Speech Machine”. It performs speech synthesis (speech produced by a computer or other device). You can hear words like “ma-ma” and “pa-pa”. To me, it often sounds like someone strangling a goose.

Before Thomas Edison invented the lightbulb, he had another lightbulb moment in 1879, when he invented the first dictation machine. Then, throughout the 20th century, innovations by IBM, Bell Labs and Carnegie Mellon University formed the foundations of voice recognition as we know it today.

The late 2000s brought massive strides. Google launched a voice search app, Apple acquired Siri and Alexa became more than just an office conversation at Amazon.

Speech recognition truly took hold in the consumer market from 2014 onwards with the release of Amazon’s Alexa, which inspired an extremely competitive home assistant market and rapid innovations in voice technology.

In recent years, Artificial Intelligence has accelerated the growth of voice recognition technology. Google reports that its error rate has fallen by 30% since 2012, a drastic improvement that is almost all down to AI or, more precisely, an application of AI called Deep Learning.

Artificial Intelligence subsets
Source: https://nvidia.com/content/dam/en-zz/Solutions/deep-learning/home/DeepLearning_eBook_FINAL.pdf
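
If you are wondering how an “error rate” for speech recognition is even measured, a common metric is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the recognised text into what was actually said, divided by the number of words that were said. Below is a minimal Python sketch of that idea. The function name and the sample phrases are my own illustrations, and I am assuming WER here purely as an example; the figure reported above may have been measured differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words.

    `reference` is what was actually said, `hypothesis` is what the
    recogniser heard. Both names are illustrative, not from any library.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic Levenshtein distance, computed over words instead of characters.
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion (word was missed)
                curr[j - 1] + 1,                  # insertion (extra word heard)
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of five spoken words.
said = "hey siri delete browser history"
heard = "hey siri tweeting browser history"
print(word_error_rate(said, heard))
```

A lower number is better: 0 means every word was recognised correctly, and one wrong word in a five-word sentence gives 0.2.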

Deep what?

Deep Learning is a subset of machine learning. In speech recognition, the technique involves training neural networks on hours and hours of audio, which enables the system to make better predictions or, in other words, “understand” us better.

If you are curious to find out who the current leader in voice is (Amazon, Google or Microsoft), check out my blog below, where I discover how each fared against a diverse range of audio, like accents from around the world and Monty Python clips.

Less screen, more voice

As voice technology continues to improve, innovative opportunities for voice present themselves in areas that we may not have expected or thought possible before:

Healthcare

Shopping/Fashion

Education

Gaming

A crucial impact of voice is on accessibility. With Apple’s new 2019 software updates, you can easily navigate your devices using just your voice. Windows 10 has its own voice control system too, and Google is working on similar control for Android.

Apple voice control feature on Mac

This opens a world of opportunities to users who previously were unable to utilise the full potential of computers.

With future innovations, tasks like video editing, engineering and coding will be accessible to everyone, opening up more jobs to users with disabilities.

Although, let’s hope it’s all as reliable as intended…

Me: Hey Siri, delete browser history
Siri: Tweeting browser history
Me: Nooooo!!!

What’s the hold up then?

As voice has made its way into a multitude of sectors, it has sparked a slight shift from screen-first to voice-first as our primary way of interacting with devices.

But we can’t argue with facts, and the facts show that, well, we just aren’t adopting voice the way many past predictions suggested. I’m not a time traveller, but the ever-popular prediction that “50% of all searches will be voice searches by 2020” is just not going to happen, in my opinion.

A study by CPE outlines issues with user retention:

“Alexa skill user retention after two weeks is a dismal 3% (compared to 10–11% for a typical mobile app on iOS or Android)”

Voice assistant use frequency by device
Source: https://voicebot.ai/wp-content/uploads/2018/11/voice-assistant-consumer-adoption-report-2018-voicebot.pdf

Voicebot.ai says that 35% of us rarely use voice assistants on our smartphones, something we have access to at almost every point in the day. It’s clear that voice assistants have not yet been woven into the fabric of society.

Privacy is also a widespread issue. Many reports have shown that customers don’t use/own voice assistants because they are too afraid of being listened to. Big tech companies also don’t help themselves when there is evidence backing up these concerns.

If we forget about the technology or even the services behind voice tech, a big caveat to all this is, well, us. It’s human error. Even if voice technology were perfect, I frequently mumble and stutter, and get my words mixed up. Voice interfaces will one day have to become truly conversational and natural to take the reins as our primary medium of interacting with devices.

We are just getting started

Think of the iPhone. In 2007 it was just a screen capable of calls, internet and email. Then, years of technological advancements and over 2.1 million apps later, you suddenly have companies like Snapchat and revolutionary breakthroughs like Flappy Bird.

We are only beginning to realise the true potential of voice technology. Businesses are still experimenting and developing proofs of concept, developers are restricted by their current tools, UI/UX designers have to shift to a voice-first, screen-second mindset, and consumers still need time to learn and embrace this new era of technology.
