Building Consumer Trust in Voice-Activated Assistants

Anya Gooch · Published in Chirp · Mar 1, 2017

Here at Chirp, we’re passionate about sound: how it works, how it’s used every day in nature and by people, and the endless applications technology continues to find for it.

When asked recently what some of the blockers to IoT adoption within the Connected Home may be, Chirp’s CTO, James Nesfield, noted the importance of the ‘things’ coming into our shared spaces being sympathetic to the way we humans share those spaces with each other.

In this post, we will look at one particularly sympathetic medium, sound, and specifically at speech-recognition technologies and the challenges that could prevent their mass consumer adoption.

Within the Connected Home space, Amazon, Microsoft, Apple and Google are all heavily invested in voice-activated assistants underpinned by Natural Language Processing (NLP) algorithms — cue Alexa, Cortana, Siri and Google Assistant respectively.

Voice recognition and NLP, two of the most complex areas of computer science, are now being widely adopted by industry. Indeed, the advantages to businesses are clear: richer consumer behaviour data, process automation and, of course, cost efficiencies, to name but a few.

In 2015, 1.7 million voice-first devices were shipped; in 2016, 6.5 million. For 2017, VoiceLabs predicts 24.5 million devices shipped, leading to a total footprint of some 33 million voice-first devices in circulation (roughly the three years’ shipments combined). This is certainly getting closer to mainstream adoption in 2017, but it is still an evolution rather than a revolution.

But what are some of the blockers to consumer adoption? Statistics for voice recognition uptake on mobile devices offer a few clues. Whilst 39% of smartphone owners in the US are believed to use voice recognition software, usage peaks at 48% amongst users aged 18–24.

The older generations seem less keen, which could be down to the unreliability of earlier iterations of this technology. In 1995, the lowest error rate in speech recognition (achieved by IBM) was 43%. Nine years later, IBM had cut that error rate to 15.2%. Microsoft now claims a world-record rate of 6.3% under an industry-standard evaluation. Yet despite the great achievements of recent years (most tech giants can now proudly state an error rate of under 10%), human-level accuracy, which IBM estimates to be at about 4%, still significantly wins out.
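Figures like these are word error rates: the proportion of word-level mistakes a recogniser makes against a human reference transcript. As a rough illustration, here is a minimal sketch of the calculation in Python; the published benchmarks, of course, use standardised test sets with far larger transcripts.

    # A minimal sketch of word error rate (WER): the edit distance between
    # a reference transcript and the recogniser's hypothesis, divided by
    # the number of words in the reference.
    def word_error_rate(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edits to turn the first i reference words
        # into the first j hypothesis words (Levenshtein distance)
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                               dp[i][j - 1] + 1,         # insertion
                               dp[i - 1][j - 1] + cost)  # substitution
        return dp[len(ref)][len(hyp)] / len(ref)

    # One substitution ("at" for "the") in a five-word reference: WER = 20%.
    print(word_error_rate("turn on the kitchen lights",
                          "turn on at kitchen lights"))  # 0.2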

For those slightly older consumers who struggled through multiple-choice cinema telephone booking systems during the many years the technology was developing, it is unsurprising that speech-recognition products seem an unwise investment.

As part of the generation more hesitant to adopt, I can draw on my own needs and experiences to sympathise with those who are unwilling to do so. Aside from the historical reliability issues, there are two far more human factors that could play into the adoption challenges:

  1. The very human need to be listened to and responded to
  2. How the sound of a human voice can evoke physical and emotional responses in other humans

The need to be listened to

It can be incredibly frustrating for anyone who cannot be heard, or is not being listened to or understood. This is why freedom of speech, debate and democracy are so widely considered fundamental to a successful human society.

To pay for software and/or hardware sold on the basis of its ability to interact with you, only to find that it does not consistently do so, is enough to test the patience of the most reasonable consumer. To increase brand advocacy and consumer adoption, it is therefore important for providers to communicate that these systems are learning machines: the more you use them, the more accurate they become. Expectations must be set in order to build consumer trust and confidence in speech recognition technology.

How the sound of a human voice can evoke responses

An article published in American Scientist explores some of the ways in which listeners are affected not only by the words we say, but also by how we say them.

Inflection is of course a factor, but the article looks more closely at the impact of pitch, and specifically at how it influences our selection of societal leaders: those in whom we place the most trust. Whilst the findings were detailed and varied, the overarching suggestion is that lower voices generally create perceptions of strength and competence, and that electoral candidates with lower voices are significantly more likely to win elections.

The actual sound of the voices used by Alexa, Siri, Cortana and Google Home is a detail that has not been missed by the big players, and there certainly don’t seem to be any voice assistants responding in unwittingly high pitches.
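For the curious, pitch here means the fundamental frequency of the voice, and a rough estimate is straightforward to compute. Below is a minimal sketch of a pitch estimator using plain autocorrelation with numpy; it assumes a clean, voiced audio frame, whereas production systems use far more robust methods.

    # A simple sketch (not any vendor's actual method) of estimating the
    # pitch of a voice sample via autocorrelation.
    import numpy as np

    def estimate_pitch(samples: np.ndarray, sample_rate: int) -> float:
        """Return the fundamental frequency in Hz of a voiced audio frame."""
        samples = samples - samples.mean()            # remove any DC offset
        corr = np.correlate(samples, samples, mode="full")
        corr = corr[len(corr) // 2:]                  # keep non-negative lags
        # Search lags corresponding to roughly 50-500 Hz, the span of speech.
        lo, hi = sample_rate // 500, sample_rate // 50
        peak = lo + int(np.argmax(corr[lo:hi]))
        return sample_rate / peak

    # Sanity check with a synthetic 110 Hz tone (a typical low speaking pitch).
    rate = 16000
    t = np.arange(4000) / rate                        # a quarter-second frame
    print(estimate_pitch(np.sin(2 * np.pi * 110 * t), rate))  # ~110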

Also working in the field of data-over-sound, and thus providing what we believe to be a sympathetic and natural means of communication for IoT, Chirp understand how important consumer trust is to increasing the use of these game-changing technologies. We pride ourselves on our reliability in the most challenging of acoustic environments and are incredibly excited to see speech-recognition technologies continue to become even more reliable, hopefully one day reaching human-level accuracy.
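For readers wondering what data-over-sound looks like in principle, here is a deliberately simplified sketch: mapping bytes to audible tones with numpy. To be clear, this is an illustrative toy, not Chirp’s actual protocol; the frequencies and symbol timings are arbitrary assumptions.

    # An illustrative sketch of the data-over-sound idea (NOT Chirp's
    # actual protocol): each 4-bit symbol maps to one of 16 audible tones.
    import numpy as np

    SAMPLE_RATE = 44100
    SYMBOL_SECS = 0.05                 # 50 ms per symbol (assumed)
    BASE_HZ, STEP_HZ = 1000.0, 100.0   # 16 tones: 1000, 1100, ... 2500 Hz

    def encode(data: bytes) -> np.ndarray:
        """Turn a byte string into a tone sequence, two symbols per byte."""
        t = np.arange(int(SAMPLE_RATE * SYMBOL_SECS)) / SAMPLE_RATE
        frames = []
        for byte in data:
            for nibble in (byte >> 4, byte & 0x0F):   # high then low 4 bits
                freq = BASE_HZ + STEP_HZ * nibble
                frames.append(np.sin(2 * np.pi * freq * t))
        return np.concatenate(frames)

    signal = encode(b"hi")   # ~0.2 s of audio, ready to play over a speaker
    print(signal.shape)      # (8820,)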

To learn more about Chirp’s data-over-sound solutions, please visit us at chirp.io or get in touch at contact@chirp.io
