At Elastique we’re always looking for new and innovative ways to reach users and create a unique experience. Interaction through a voice user interface (UI) is not new, but its current state has never been more promising, and the possibilities keep growing. Today we’re looking into the hows, whats and whys of Voice UI (in our opinion).
- Voice UI is the new kid on the block, but it isn’t actually that new
- The accessibility of the technology allows voice-driven systems to be available to the public
- Voice UI goes hand in hand with the increasing popularity of podcasts and audiobooks
- Netflix, Amazon, and YouTube are exploring new ways to tell stories by experimenting with interactivity
- Designing for voice means establishing a new relationship with your user and understanding their context
In the industry, there’s a lot of buzz about Voice UI. It’s supposed to be the next step in human-computer interaction (I mean, the numbers don’t lie: according to Comscore, 50% of our internet searches will be done using voice by 2020), but, seriously, what is it? Is it a replacement for the smartphone, and will we only interact using our voices from now on? Should we start every conversation we have with a product with “Hey X”? Or is it just a gimmicky (and slightly fancy) way to access the current weather information? And more importantly, what could it mean for service X or brand Y?
The truth is, we don’t really know (yet). Voice UI and the way we interact with it are still in their infancy. For us, that means there’s no right or wrong way to approach voice-based interactions yet. There’s no specific rule set for how it “should” work, which makes it very, very, very exciting (yes, 3 x very is fitting!). It slightly reminds us of the beginning of the internet as we know it, when it was the World Wild West, with room to experiment and no paved roads we had to travel yet.
But is this really new?
One of the first occurrences of voice-based interaction, as is the case with a lot of new technology, was in science fiction. The famous line by HAL 9000, “I’m sorry, Dave. I’m afraid I can’t do that,” in 2001: A Space Odyssey back in 1968 already showed us what Voice UI and the artificial intelligence behind it could do. Of course, in A Space Odyssey HAL ends up disabling life support functions, and hopefully that’s a feature that won’t be included in any smart system in the near future.
“Hey Elon, play a track from SoundCloud”
Years later, Voice UI slowly became a reality that came closer to what the science fiction films had presented. It first became widespread in car kits, where we could ask our phones via a Bluetooth device to call a specific contact, and more recently it has shown up in almost any device. Microwaves, fridges, cars and in some cases even light bulbs are voice-enabled nowadays. But today we’ll focus on the introduction of smart assistants like the Google Assistant, Amazon’s Alexa and Apple’s Siri.
A Siri is born
In 2011 Apple introduced Siri as a core feature of its then-new iPhone 4S. With a single press of a button, users were suddenly able to ask Siri for anything. From the weather to IMDb ratings, the internet was suddenly accessible just by using your voice. Now, 11 iPhones later, Google and Amazon are competing with Apple for a share of the smart assistant market, each with its own features, tone of voice and ecosystem.
Smart assistants are mostly marketed for home automation: connecting devices and allowing them to communicate with each other. As Google’s hilarious “Home Alone Again” ad explains: just ask for anything, and Google will take care of the rest. Users can ask for the weather, play YouTube videos on their connected TVs, set the temperature of their Nest or even get groceries using just the power of their voice.
As the Google ad also shows, interacting with a Voice UI means (surprise, surprise) using just our voices. There’s no way of displaying visual feedback because there is no screen. We can only rely on the power of voice to present our content. This goes hand in hand with the increasing popularity of podcasts and audiobooks. But what makes the podcast so popular? The BBC explains this with 3 factors. It’s accessible: a lot of media platforms offer the ability to listen to podcasts. There’s a sense of community and intimacy with the host. And lastly, it’s fragmented: there’s a podcast for every niche. I mean, there’s even a podcast about gel pens. What more can we say?
Audible, Amazon’s subscription-based audiobook service, is taking its audiobook game even further and has launched its first choose-your-own-adventure style audiobooks, powered by Amazon’s own Alexa. As digital storytellers, we find that this trend of (interactive) audio storytelling really tickles our ears. The technology that goes with it is evolving rapidly, and the possibilities are endless. Text-to-speech services like Google Text-to-Speech or IBM Watson are getting more and more realistic and even go as far as programming emotion into their voices. Creating podcasts is as easy as downloading an app, and more importantly, there is an audience that wants to listen.
YouTube and Netflix are jumping on the interactive storytelling train as well: Netflix with Bandersnatch, for which it created a whole new branching system, and YouTube with its recent announcement that it is investing in a new choose-your-own-adventure style series. Of course, choose-your-own-adventure style books have been around for decades but never really took off.
But let’s take a look at some of the interesting cases we’ve come across over the last few years:
Waking up with the NOS
The Dutch Broadcasting Foundation (NOS) started experimenting with the Google Home right away. Just saying “Hey Google, talk with the NOS” opens up the application, and from 7 in the morning an audio version of the wake-up service is available to early users. This audio version is prerecorded, so no robot voices were heard in the process.
Everybody needs an assistant
For $1 a minute, Fin would take care of every chore you don’t want to do. This human/A.I. assistant took the best of both worlds by combining artificial intelligence and machine learning with actual humans performing a chore, so it could learn from them. The cool thing about Fin was the human touch. You would actually have a conversation with a real human, even building a relationship, instead of “demanding” actions from a virtual one. Instead of waiting for the technology to catch up, Fin got a head start by launching this hybrid assistant.
When Westworld’s season finale hit, it seemed like the doors to this magical western world would be closed for a very long time. Luckily for the hardcore fans, Westworld’s creators built their story-world on a new platform: Amazon Alexa. Using just their voice, users could navigate through this world in a Get Lamp-type interaction, as if they were actually there, and go on their own quest for consciousness.
So, where do we start designing for voice? There are (at least) 3 key components that stand out and can make or break your experience: emotion, context, and information.
An enjoyable experience via emotion
Giving human-like traits to voice interaction allows you to create a lasting relationship with your user. It’s no surprise that Google hired writers from Pixar to make its assistant seem more human. They wanted the Google Assistant to be a part of a user’s daily life, and the best way to do that is to establish a relationship with the user.
Just as a human has a personality, with their own jokes, quirks and habits, a virtual assistant should have one too in order to relate to the user. Create a voice users will want to talk to, not one they need to talk to.
And of course, keep the conversation going in a natural way. As with any blind date, we don’t want the interaction with a user to end up in some awkward silence. This means creating a context for a natural next step for the user and always allowing a way out.
Using voice-based interfaces, we’ll be able to connect with our users in an entirely different context than before. Voice assistants are mostly used in the car, where we need to keep our focus on the road. It’s also less embarrassing to say “Hey Siri” out loud in the car, since we’ve already gotten used to this after years of using car kits. It’s important to take into account what the user’s context is at the moment of interaction.
Knowing this, we probably need to adjust the way we usually communicate with our users, or even our core service. Designing for Voice UI is not just delivering the exact same content using voice, just like publishing an ad for Instagram isn’t just rotating your television commercial to fit on the screen. It’s an entirely new experience.
As enthusiastic about voice as we are, we do know that there are some scenarios where you really shouldn’t use a Voice UI. For example, requiring the user to shout out their password or social security number is probably not such a great idea. The same goes for situations where you have to tell your user a lot of specific information they might want to write down. Because let’s face it, you probably already forgot the restaurant’s specials by the time the waitress was finished presenting them. It’s important to give users a choice. For example, when reading out the news, it’s probably better to first give a short summary of the article and let the user decide if they want to continue, rather than just blasting out a 2,000-word article and requiring them to shout “Stop, Google, stop!!”
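To make the “summary first, then let the user choose” idea a bit more concrete, here is a minimal sketch in Python. It is not tied to any real assistant platform: the `Article` type, the `opening_prompt` and `respond` functions, and the lists of accepted replies are all hypothetical names we made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    summary: str
    body: str


# Hypothetical replies we treat as "keep reading" or "stop" (an assumption,
# a real assistant would use its platform's intent recognition instead).
CONTINUE_WORDS = {"yes", "continue", "go on", "more"}
STOP_WORDS = {"no", "stop", "cancel"}


def opening_prompt(article: Article) -> str:
    """First thing the assistant says: headline, short summary, and a choice."""
    return f"{article.title}. {article.summary} Would you like to hear the full article?"


def respond(article: Article, user_reply: str) -> str:
    """Decide what to say next based on the user's spoken reply."""
    reply = user_reply.strip().lower()
    if reply in CONTINUE_WORDS:
        # The user opted in, so reading the long text is now welcome.
        return article.body
    if reply in STOP_WORDS:
        # Always offer a way out.
        return "Okay, stopping here."
    # Unclear reply: re-prompt instead of guessing.
    return "Sorry, I didn't catch that. Say 'yes' for the full article or 'stop' to end."
```

The point of the sketch is the order of events: the long content is only read out after the user has explicitly asked for it, and an unclear answer leads to a short re-prompt rather than a 2,000-word monologue.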
Where do we go now?
Voice UI still has a long way to go, but the development is going fast. Just recently, Google’s crazy realistic Google Duplex started its roll-out in the United States, with an expansion to other iOS and Android devices planned for the future. Apple’s new AirPods can trigger Siri, and Amazon’s Alexa now handles patient information 🤔.
And it won’t stop there. We foresee a future with a lot more audio-based experiences, powered by techniques that bring us closer to those science fiction movies. Honestly, we couldn’t be more excited about these developments, as they give us new ways and a new medium to tell stories that enhance the story-world of the user. They will allow us to create almost seamless experiences in which the technology becomes invisible to the user, which is for us the ultimate goal. Because when the technology becomes invisible, the user will forget about it, and that really does blur the line between fiction and reality.
Did you like this article? And do you want to keep up to date on our next posts? Follow us on LinkedIn and see directly when a new post is live!
Elastique is a dynamic digital boutique agency. A collective of tech wizards, concept gurus and design rockstars dreaming up award-winning digital experiences. We work with leading brands and media companies to create innovative solutions that truly stand out. This is what Elastique stands for, and we are always looking for people who want to stretch the paradigm of storytelling and interactivity. Get in touch at https://elastique.nl or email@example.com. We look forward to hearing from you!