Ok, Google. What do we do now?
UX reflections after using Google Home for a week.
A few weeks ago, Google announced a wide range of new devices and software, all with the commonality of embedding its Google Assistant. The pillar product among the Google range is the Home, a connected speaker and Echo competitor that wants to be the mastermind of your smart home.
For a while at Designit Tokyo we’ve been curious about conversation-like interactions (our website talks to you) and the possibilities that voice command can offer. We didn’t really need a Google Home at the office, but for the sake of experimentation, and because it’s always fun to play with a shiny new toy, our Design Director was kind enough to order one.
After Ok-Google-ing for more than a week, there are a few things I wanted to share regarding the product itself and its core interaction form: voice. The current Google Home and all its competitors out there are the first-born of a new generation of devices that you can speak to and that can talk back. What can we learn from this first iteration of voice-commanded home assistants, and what are the challenges ahead for UX designers crafting these experiences?
Product designers have known it for a while: onboarding is a key moment in a successful user experience. How users feel while interacting with your product for the first time doesn’t only define how well they understand it, but also shapes their expectations and emotional attachment to it.
After unboxing the beautiful product, plugging it in, and admiring the colorful circles spinning around while it gets its stuff together, we couldn’t wait to finally say the magic words: “Ok, Google”. But no. The setup process isn’t done through voice, but through the dedicated smartphone app.
We also had quite a hard time configuring multi-account settings, which required a phone call to Google’s customer support.
Long story short, the first interactions with our Google Home were somewhat disappointing. Being forced to rely on a smartphone app and a call center to get the device started doesn’t fulfill the promise of natural interaction and Google intelligence within voice’s reach. A seamless product should have a seamless onboarding.
Then, using the device is supposed to be intuitive — just speak to it, right? Yet, even if speaking is a natural behavior for people, commanding a machine through voice isn’t quite that intuitive. Can I ask whatever I want? Are there specific keywords I should be using? Once again, users have to rely on the app to figure this out.
Teaching people how to use their smart devices will be critical to making sure they have a smooth experience with them. How can this be achieved through voice interaction? While we have the luxury of relying on tutorial screens in GUIs, teaching people how to interact with a voice device, and what its capabilities are, will have to be done in a much more intuitive way.
Commands and feedback
Once the machine is all set up and has been trained to recognize each user’s voice, you can start asking a whole bunch of questions. Preceding every request, users have to hail the smart assistant by saying the now famous “Ok, Google”. Then comes a little moment of awkwardness when you’re trying to make sure you have the machine’s full attention — what we would call user feedback, or simply “how do I know that this thing is even listening”.
The device signifies that it’s actively listening by spinning the colorful circles on its top surface — a signifier that is unfortunately both inexplicit and unnatural. First, in most cases when we hail our Google Home, the top surface is not in plain sight, making the feedback simply invisible. Second, because voice is the primary interaction form, it would seem natural for the feedback to be given orally. The connection between a visual signifier and the confirmation that the conversational channel is open is not obvious enough.
In some cases, it’s actually quite the opposite: the feedback is way too obvious, even obtrusive. The main use case we’ve found for our Google Home is playing music out loud in the office. It’s a pretty good speaker after all… When the music is on and you want to interact with the assistant, say, to raise the volume, the speaker has to stop the music in order to take your request.
Google Home playing “In the Air” by Phil Collins. Approaching 3 min 30 sec.
Me: “Oh, it’s my favorite part. Ok, Google. Play louder.”
Google Home stops the music to take the request. Spins its little circles around for a second. Plays the music again, barely louder. Excitement over. Moment ruined. Back to work.
Digital designers have learned the importance of micro-interactions in shaping enjoyable and usable experiences. Those tiny feedback animations, hover effects, and color changes play a huge role in guiding users seamlessly through a product. This stays true with voice, and there’s a whole lot of work that needs to be done to provide a friendly and intuitive experience.
Just like Google’s Material Design or Apple’s iOS guidelines intend to standardize GUI interactions, we should work on standards for the voice interface. How do you know that a device is listening? How do you input a command? (I seriously doubt the usability of the “Ok Google” command.)
How do you know that it properly understood your request? (I asked for “punchy” music, not “punky”.) And how do you recover from mistakes? This is especially important given that voice is a very approximate interaction model, with a lot of room for interpretation. How much is “a bit louder”?
Managing users’ expectations
Another issue when voice becomes the primary mode of interaction is that it creates wrong expectations about what the device can do. Just because you can speak to your device doesn’t mean it’s actually smart. Or that it can hold a conversation. Or that it’s even meant to do so. If we want to avoid disappointing our users, it will be crucial to manage their expectations and to create user experiences that are not misleading about the device’s actual capabilities.
The way we use the Home in the office is somewhat limited, since we mainly use it as a simple speaker. To be honest, we still haven’t found any good use case for it. Or maybe we haven’t tried hard enough.
Anyhow, it seems that everything you would want to ask your Home could be done more easily by taking out your phone. Need to find a restaurant nearby? It’s easier to visualize it on a map than to have Google Home read you the address out loud. Ordering an Uber might seem easier through voice, but core features of the service, such as selecting your car type or reviewing your driver’s profile, aren’t easily accessible.
The probable strength of a smart home assistant will come from connecting with other devices, becoming the centralized point of control for everything in the house. Then the number of use cases will grow, and using voice to control pretty much anything around you will become attractive, provided the interaction between you and the machine is smooth enough.
Despite this rather negative review, we should give the good people at Google a lot of credit for the admirable work they have been doing to turn what has long been science-fiction fantasy into a reality available for a hundred bucks.
Pioneering new technologies with radically new interaction models is always a series of trials, errors, and experiments. The technology is getting there, quickly. But when it comes to how people can interact with this new technology, there still seems to be a long way to go. The design of voice interactions, and more broadly the design of how we interact with machines that seem to have some form of humanity, is creating exciting new challenges for designers.