Are Voice Assistants becoming Human-like?

Spoiler alert, they aren’t.

Sahiba Johar
The Design Perspective
3 min read · Apr 7, 2017


Many of you may have heard of the exceedingly popular voice assistant devices such as the Amazon Echo and the Google Home. These novelty items seem life-changing when you realize you can use your voice to get your questions answered instead of having to type. Mind-blowing, right? Well, kind of. I’ve read many articles claiming that talking to either of these devices feels like talking to a human, and is therefore natural. My experience leads me to think otherwise.

The Home and Echo are great until you have to learn a new way of communicating just to get the device to understand you. For example, I can’t just say “OK Google, play Orange is the New Black”; I have to say “OK Google, play Orange is the New Black on Netflix on Livingroom Chromecast.” If you’re like me, you probably think that mouthful of a sentence is nothing close to how you actually communicate in daily life. A user has to articulate their sentence in a way the device will understand. That means the user needs to think twice, maybe even three times, about sentence structure before asking either of the devices a question or giving them a command. I don’t put in that much thought when I’m asking someone to play a show on the TV, so why do I need to think so hard when asking a voice assistant to do the same thing? In my everyday communication, there are many defaults that are understood without me having to say them explicitly every time; maybe these defaults need to be built into these devices for a more human-like, natural, everyday conversation.
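To make the idea of “built-in defaults” concrete, here is a minimal sketch of what that could look like. This is purely hypothetical Python, not any vendor’s real API; the default values (“Netflix”, “Livingroom Chromecast”) and the `handle_play_command` helper are assumptions for illustration.

```python
# Hypothetical sketch: a tiny command handler that fills in household defaults
# so the user can skip the boilerplate in every voice command.

# Defaults the household would state once, not in every single command.
USER_DEFAULTS = {
    "video_service": "Netflix",
    "playback_device": "Livingroom Chromecast",
}

def handle_play_command(utterance: str, defaults: dict = USER_DEFAULTS) -> str:
    """Turn 'play <title> [on <service>] [on <device>]' into a full request,
    falling back to stored defaults for anything the user left unsaid."""
    text = utterance.strip()
    if not text.lower().startswith("play "):
        return "Sorry, this sketch only understands 'play ...' commands."

    rest = text[5:]
    # Split out the optional "on <service>" / "on <device>" phrases, if present.
    parts = [p.strip() for p in rest.split(" on ")]
    title = parts[0]
    service = parts[1] if len(parts) > 1 else defaults["video_service"]
    device = parts[2] if len(parts) > 2 else defaults["playback_device"]

    return f"Playing {title} on {service} on {device}."

# The short, natural command gives the same result as the long-winded one:
print(handle_play_command("play Orange is the New Black"))
# -> Playing Orange is the New Black on Netflix on Livingroom Chromecast.
print(handle_play_command("play Orange is the New Black on Hulu on Bedroom TV"))
# -> Playing Orange is the New Black on Hulu on Bedroom TV.
```

The point isn’t the parsing itself; it’s that remembering a user’s usual service and device lets the short, human phrasing just work.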

Let’s take a few steps back and just think about the fact that users are now going to be talking to a somewhat cylinder-shaped device. This device has no face and no screen. It’s like talking to one of those suspicious movie characters whose face is hidden under a hoodie, so all you hear is a voice. In this case, the voice is coming out of a speaker. Not having a “face” makes the device seem even further from actually being human-like.

Not having a screen poses another problem. There is no easy way for a user to get quick confirmation that what they said is what the device heard. When using the Google Assistant on the Google Pixel, a speech bubble shows what you’re saying, giving you instant feedback, and another speech bubble appears with the assistant’s response. Because the Echo and Home don’t have a screen, their companion apps have a section where a user can see what they said, or better yet, what the device heard. However, it feels like a burden to have to whip out your phone, open the app, and navigate to that section just to check. A user now needs two devices to get the job done.

To be fair, there is one aspect of these devices that does indeed seem human: they get confused just like you and me. Once, I asked Alexa to play Coldplay. “She” said she couldn’t find any Coldplay music in my library, but the second time I gave her the same command, she played Coldplay right away. I didn’t change the structure of my sentence; all I did was repeat myself. One of my co-workers jokes that she’s just like a teenager: she listens and responds to you when she feels like it. I guess if that’s the type of human feeling you’re going for, then they’ve definitely hit the nail on the head with this one! Jokes aside, someone once told me that human error is far greater than machine error. These devices seem to be the exception to that rule. They get confused, they don’t understand commands, and sometimes they just don’t want to listen.

This brings me back to my main argument. When people tout these AI-based devices as being human-like, I challenge what their definition of a human actually is.
