Bringing Clarity to Voice Technology

Voice assistants: what’s the deal with them anyway?

Aurelia Harjanto
55 Minutes
7 min read · Feb 15, 2021

I felt like Alice in Wonderland, falling down a rabbit hole of voice technology — a familiar, yet extensive subject I was drawn to learning more about.

In an attempt to satisfy my hunger for social interaction (and yes, learning of course!), I attended countless webinars during Singapore’s partial lockdown. I’m not sure how much they helped me feel sociable, but I certainly learned a lot from them. I remember one particular webinar about conversational AI, which sparked my interest and led me down an exciting rabbit hole of research surrounding conversational UX and artificial intelligence.

Around the same time, I discovered that amongst other things, our team and company had a particular interest in Voice User Experiences (VUX), and also that growth and development was a key value I shared with 55 Minutes. This inspired me not only to continue down that rabbit hole to learn more about VUX, but also to start reflecting and turning my learnings into a more organized journey that I could then share with others.

So here we are! Brace yourselves and I hope you enjoy this introductory journey to VUX, exploring buzzwords and discussing virtual assistants (also known as voice assistants).

All these buzzwords — what do they really mean?

It seems fitting to start by thinking back to the webinar that inspired this article, which actually turned out to be a half-hour introduction to a virtual advisor called Crystal. I had never heard of a “virtual advisor” back then, but I remember being totally intrigued and engaged by the speaker, despite the webinar happening at 11 pm in my time zone.

As I went to find out more, I came across mentions of other familiar and similar-sounding buzzwords like “voice UI” and “conversational UI”. Perhaps it was just my non-caffeinated brain struggling at night, but the meanings of all those words and acronyms started to blur. I found myself wondering, “What’s the difference between VUX, VUI, CUI, and CAI?”

It’s so easy to get lost in all the jargon when talking about AI and UX, so I set out to learn more about what these words mean, how they overlap and differ, as well as to map out how they relate to each other.

I made this diagram as a little map to refer back to if we ever feel lost in the VUX rabbit hole.

A simplified list of definitions to serve as an additional guide to the above diagram.

Okay, by now you might be thinking, “Why should I care about all this?” or perhaps “How does any of this affect me?”

Well, all the aforementioned processes and technologies can (and probably will) impact all of us. They can affect us personally in our daily lives, or even professionally. “The Future is Voice Activated,” a research project that investigates voice adoption in six different markets, found that “62% in Asia Pacific used voice activated technology in the last six months, with India (82%) and China (77%) emerging as leaders in voice adoption.” So, chances are that even if you’re not using voice-related technology, someone you know is!

As Voice UX is a relatively new field, there are so many opportunities to innovate. And since there is so much to talk about in VUX, let’s start small by looking at a concrete example of how VUX can benefit us in our day-to-day through virtual assistants.

“Hi virtual assistant, tell me about you”

When you hear the term ‘virtual assistant’, do you usually think of a real person? Or do you think of those AI-powered agents like Siri and Alexa? I definitely think of the latter, but a virtual assistant is actually defined as a “self-employed worker who specializes in offering administrative services from a remote location”. Yes, a real person whose job is to help you do things like schedule appointments and manage emails is by definition a virtual assistant.

It’s strange to think that these AI bots have taken over the meaning of a term associated with a person’s job. This led me to ponder whether some time in the future, I might think of a robot doctor when someone says “doctor”. But let’s get back to virtual assistants, which as I mentioned earlier are (confusingly) also known as voice assistants. For consistency and to differentiate from the human assistants, from here onwards I will refer to the AI-powered virtual or voice assistants as “VAs”.

VAs are software-based agents that understand natural language, and can perform tasks for the user based on their spoken commands. These tasks include making phone calls, typing text messages, and finding recommendations on the best sushi restaurants around you.
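
To make that description a little more concrete, here is a toy sketch of the “understand the command, then pick a task” step. It assumes the spoken command has already been transcribed to text, and it relies on simple keyword matching, which is far cruder than the language understanding real VAs use. The intent names and keywords are hypothetical, made up purely for illustration.

```python
# Toy sketch: map an already-transcribed command to a task ("intent").
# The intents and keywords below are hypothetical, chosen for illustration;
# real VAs rely on trained language models, not keyword lists.
INTENT_KEYWORDS = {
    "make_call": ["call", "phone", "dial"],
    "send_text": ["text", "message"],
    "find_restaurant": ["restaurant", "sushi", "eat"],
}


def detect_intent(command: str) -> str:
    """Return the first intent that has a keyword among the command's words."""
    words = command.lower().replace(",", " ").replace("?", " ").split()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return "unknown"


if __name__ == "__main__":
    for spoken in ["Call my mum", "Find the best sushi restaurants around me", "Sing a song"]:
        print(spoken, "->", detect_intent(spoken))
```

A command that matches none of the keywords falls through to "unknown", which is roughly the moment a VA tells you it didn’t understand.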

As VAs are sometimes confused with “robots,” “AI bots,” or “chatbots,” I’d like to make the distinction between them. “Robots” is an umbrella term for machines, programmed by a computer, that are able to carry out specific functions. “AI bots” refers to robots powered by AI, which include chatbots and VAs. Finally, chatbots can be thought of as a type of virtual assistant, but not all VAs are chatbots. One main difference is that chatbots commonly communicate through text, whereas VAs are usually voice-activated.

These are some other ways in which VAs and chatbots differ:

  • VAs are designed for personal or individual use, while chatbots tend to be used by organizations as first-line customer service agents
  • VAs have the ability to remember the context of your interactions even if there are breaks in between your conversations, while chatbots don’t
  • Chatbots tend to lack any understanding of human emotions, while VAs can interact in a more human-like manner (by using advanced natural language processing, or NLP, to analyze nuances in what we say to them)

Chatbots are also incredibly helpful, but at the moment they would more accurately fall under Conversational UX, not Voice UX. Let’s not get them mixed up with voice assistants!

“So where do I find these voice assistants? And who uses them?”

According to a study by Juniper Research, there were about 3.25 billion VAs already in use in 2019. They also found that the fastest growing VA devices over the next five years will be smart TVs, smart speakers, and wearables, with Amazon’s Alexa leading the pack. Additionally, market research from strategy consultants OC&C predicted that voice commerce (shopping using voice devices) would grow 20x, from $2 billion in 2018 to over $40 billion by 2022!
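
As a quick sanity check on that last forecast, the implied yearly growth can be worked out from the two figures quoted above. This sketch assumes steady compound growth over the four years from 2018 to 2022, which is my own simplification rather than something the forecast states.

```python
def implied_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate, as a fraction (1.0 means +100% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article: $2 billion (2018) to $40 billion (2022).
rate = implied_annual_growth(start_value=2e9, end_value=40e9, years=2022 - 2018)
print(f"Implied growth: about {rate:.0%} per year")
```

In other words, under that assumption voice commerce would need to slightly more than double every year to hit the forecast.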

Now, try to look around you and see how many gadgets equipped with VAs you can spot! VAs are still most commonly used in smartphones, with Google Assistant and Siri in the lead. However, other brands and companies have also followed suit and developed their own VAs for their newer smartphone models, such as Samsung’s Bixby, Huawei’s Celia, or Xiaomi’s Xiao Ai.

In my case, I found these two VAs: Siri on my MacBook and Google Assistant on my Google Home. I was quite surprised to find that my phone wasn’t equipped with one. VAs in smartphones are ideal as they can be used on the go, which means that the chances of interaction with the user are maximized. This is important because through continued use, your VA will become more “intelligent,” and better able to understand you (thanks, machine learning!).

“Will I find them helpful?”

VAs are particularly useful when other traditional input options are not available. As Google’s Beth Tsai (Google Assistant’s Policy Lead) said, “VAs reduce barriers to interacting with technology and make online content accessible and available to a wide audience.” For example, VAs can be especially helpful for people with visual or motor impairments, letting them perform tasks like texting that might otherwise be very difficult.

VAs also come in handy for others, when the usual input options are inconvenient or unsafe. For example, you could order a coffee during your morning run, or ask to set a reminder while driving, both without the need to pause what you’re doing. Personally, I love using Google Assistant to set multiple alarms when I’m baking up a storm at home!

“Hey Google, set an alarm for 20 minutes and 45 minutes”: no time wasted washing my hands every time I need to set a new timer. Photo by Theme Photos on Unsplash

As we talk about the many potential benefits of VAs, it’s also worth acknowledging that they may not always be the most user-friendly at this stage. It’s true that using a VA requires some patience in going through the initial learning curve. There will be times when they misunderstand you, or even outright say that they can’t understand your command, which could be frustrating.

However, I’ll end our journey with good news! The technologies that power VAs are continuously advancing, so it’s good to keep ourselves up to date. An example is how Natural Language Processing is evolving into Natural Language Understanding (NLU), which goes beyond understanding the structure of a language. Though still in its early stages, NLU is designed to understand the intended meaning and nuance of what we say (or write) in different contexts. Sounds promising, right? I hope you’re excited to join me in exploring your VAs more!

— For those wanting to learn more about or dig deeper into voice assistants and voice UX… stay tuned for 55 Minutes’ next article about the cultural aspects of designing voice assistants!

Aurelia is a UX Researcher at 55 Minutes. She studied Medicine and Psychology in London, but her creative side led her to search for a different path! Now, she is in Singapore in a role that utilizes both her critical and lateral thinking skills. Outside of work, you can find her baking, cooking, or hanging out at a cafe, dreaming of future travels.
