I talked to a speaker for two weeks and here’s the scoop.

Thoughts on Alexa and more broadly, on voice interfaces.

Alexa is the service behind the Amazon Echo devices. Echo refers to the hardware: speakers, mics and a simple computer packed into a metal tube. When we talk to Echo, the things we actually say are processed by a system called Alexa on Amazon servers somewhere far, far away. Alexa kept me company for two weeks from its spot on top of the fridge in the kitchen.

Interaction and Skills

Alexa tunes into what you’re saying only after you say “Alexa.” That’s when a light comes on to indicate the direction from which Alexa was called. It’s a bit like when a person turns their head towards the sound of their name. As long as the light is on, Alexa is listening.

Alexa’s light ring, which makes you feel like she truly listens to you. (Photo on CC license from https://www.flickr.com/photos/michaeljzealot/)

A normal conversation with Alexa goes something like this:

Me: Alexa, what time is it?
Alexa: It’s 9:30 am.

Alexa’s like a smartphone. There are only a few standard apps to start with, like a calculator or a calendar. It’s on you to load it up with all the Pokemons et al. Alexa manages your schedule, tells time, has an alarm clock and can be your weather girl. You can also use it to play music from Amazon — that is if you have an account there. In “Alexa-speak,” apps are called Skills. You can install Spotify, Uber and other more or less useful Skills. If you want to use a Skill, you have to refer to it by name.

Me: Alexa, ask Spotify to play Outkast.
Alexa: Playing Outkast — So fresh, so clean.

The international complication

Right now (Q3 2016), Alexa is in its infancy. It is available only in the U.S.; I was lucky enough to be one of the chosen few to test this product outside the States. But looking at the tremendous resources invested in Alexa’s development, it’s not hard to figure that Amazon is planning a global rollout, with the States simply being the first market. The Skill Store is still a work in progress, but the number of Skills available is constantly growing.

Besides serving only locations in the U.S., Alexa is also available only in English. So if you know English, but live in a country with a different language, you’re stuck like Chuck. You’re not going to get the weather or the time or a whole lot more.

As long as English is the system’s sole language, Alexa’s usefulness in other cultures will be seriously limited, if not moot. Examples? Here you go:

You won’t ask Alexa to read back a Wikipedia entry for Lech Walesa because you wouldn’t know how to pronounce Lech Walesa in English. Alexa, on the other hand, doesn’t understand Polish pronunciation.

Also, good luck playing Mariza — Meu Fado, 5Nizza — Soldat or Hechizeros Band — El Sonidito on Spotify.

Voice — consumption, not creation

Speaking is often more convenient than using your mobile or a computer, but generally, it’s much slower. That makes it a good fit for simple tasks and a poor one for anything long or detailed. One can assume that Alexa will be used way more for consuming content than for creating it. I can’t imagine dictating an email to Alexa without being able to scan it visually as I go.

There are times, though, that for various reasons we want to keep our hands free and still do the things we normally do on our phones. For example:

  • I’m elbows-deep in flour and would like to add something to the shopping list: Alexa, add milk to my shopping list.
  • My wife and I are trying to figure out if we should drive or bike to work the next day: Alexa, what’s the weather for tomorrow?
  • It’s morning, and I’m making breakfast but need a heads-up on my day: Alexa, what’s my schedule for today?
  • Playing with my kids when all of a sudden it becomes critical they dance to Pharrell’s Happy: Alexa, ask Spotify to play Happy by Pharrell Williams.
  • Need to let my wife know I’ll be home late: Alexa, write this text message to Ana: “I’ll be home around 8pm. Love ya.”

And other rather simple tasks:

  • Turn on the radio. (Polish radio)
  • Get the latest Tok FM (local radio station) news or a briefing from The Economist.
  • Order an Uber or a cab.

Things I probably won’t use Alexa for:

  • Reading emails: I scan my emails more than I read them, and the interaction here would be a nuisance. I need a screen for that, so Alexa’s out.
  • Listening to podcasts: I generally listen alone on my headset.
  • Ordering food, unless it’s something really quick and simple. I’m a bit picky when it comes to food, so I don’t really see myself listening to Alexa reading out and repeating various menus on a delivery site. It would have to be really simplified, something along the lines of: Pizza, burgers, Chinese or subs? Ok, pizza. Pizza Hut, Domino’s, Papa John’s or Little Caesars? Domino’s, two large pepperoni and cheese pizzas. Home or another address? Home.
  • Buying stuff. There are very few things I buy routinely in exactly the same form, so shopping would not be on Alexa’s to-do list. Imagine picking out a suit just by voice.

Voice — autosuggestions and a plethora of choices

With Alexa, you will develop a whole new appreciation for the autocomplete function on your phone and other “hinting” features on screens. You see nothing with Alexa, which makes some things really frustrating. Example: I wanted to play the Alicia Keys song about New York and was pretty sure it was called “New York State of Mind.” A visual interface would’ve caught my error and steered me the right way in a couple of seconds. It was a bit more complicated with Alexa. After being told there is no such song, I decided to have Alexa play Alicia Keys songs until we got to the one I wanted. Luckily, after about two minutes of browsing through Alicia’s entire repertoire, I remembered the correct title.

The popularity of voice interfaces heralds a significant change for UX/UI designers. You may have Alexa and still not know what you can do with it. Think about it: how do you order food without having to listen to 30 minutes’ worth of menus? If you’re into UX/UI, take a look at Nielsen’s 10 heuristics or some other golden rules of design. It turns out that with voice, some of them are unattainable and others extremely difficult to satisfy. Changes are definitely coming, and it seems like we’ll be increasingly reliant on the suggestions of our hopefully more intuitive voice interfaces.

Voice — Listen a little longer

A simple conversation with Alexa looks like this:

Me: Alexa, do something.
Alexa: Did it.

It’d be great if, after letting me know the task is done, Alexa could just hang on a tad longer to see if there is something I want to say or ask as a follow-up. Maybe I would just say “thanks,” or maybe I would say nothing at all, but I would like to have that option. Maybe, after listening to one song on Spotify, I would like to play another. This function would come in really handy, especially when:

  • We have a misunderstanding
  • I do something that involves several commands

It would also be great if, in the case of the latter, Alexa remembered what Skill to use for which command.

Me: Alexa, ask Listonic to add milk to the shopping list.
Alexa: I added milk.
Me: What’s on the list now? (Instead of: Alexa, ask Listonic to read my list out loud.)
Alexa: You have milk, bread and sour cream.
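
To make that wish a bit more concrete, here is a minimal sketch of how such a follow-up could be routed. This is not how Alexa works, and none of it comes from Amazon: the Session class, the route method, the "built-in" label and the 8-second window are all hypothetical names and values I made up for illustration. The idea is simply to remember the last Skill and to accept a command without the wake word if it arrives shortly after the previous one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical value: how long to keep listening after a reply.
FOLLOW_UP_WINDOW_SECONDS = 8.0


@dataclass
class Session:
    """Remembers the last Skill used and when the last command was handled."""
    last_skill: Optional[str] = None
    last_reply_at: Optional[float] = None

    def route(self, utterance: str, now: float) -> Optional[Tuple[str, str]]:
        """Return (skill, request), or None if the utterance is ignored."""
        text = utterance.strip()
        if text.lower().startswith("alexa,"):
            command = text[len("alexa,"):].strip()
            # "ask <Skill> to <request>" names the Skill explicitly.
            if command.lower().startswith("ask ") and " to " in command:
                skill, request = command[4:].split(" to ", 1)
                self.last_skill = skill.strip()
                self.last_reply_at = now
                return (self.last_skill, request.strip())
            # Anything else goes to the built-in features (assumption).
            self.last_skill = None
            self.last_reply_at = now
            return ("built-in", command)
        # No wake word: accept only inside the follow-up window,
        # and reuse the Skill from the previous command.
        if (self.last_reply_at is not None
                and now - self.last_reply_at <= FOLLOW_UP_WINDOW_SECONDS):
            self.last_reply_at = now
            return (self.last_skill or "built-in", text)
        return None
```

With a session like this, "Alexa, ask Listonic to add milk to the shopping list." followed a few seconds later by "What’s on the list now?" would both land in Listonic, while the same question asked a minute later, without the wake word, would be ignored.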

Voice — recognizing our voices

This function doesn’t exist yet, but I’d be surprised if Amazon weren’t already working on it. When asked “What’s on my schedule today?”, Alexa should know what to tell me, versus what to tell my wife. It’s pretty clear that voice recognition will open a whole range of possibilities for Skill creators: banking, communication, etc.

Voice — the human touch

When I come home, it’s almost natural to ask Alexa “How’s it going?” You know, a human thing to do. During the small talk that would follow, Alexa could learn some things about me that could be helpful in the future.

Summary

Voice interfaces, in my opinion, are close, closer than self-driving cars, for example. Amazon is not the only one exploring this area; others are working on similar products, Google Home for instance. The hardware is cheap, voice recognition works fine, and the early user feedback is very promising.

At this time, the system doesn’t yet wow with all the things it can do, but the growing lineup of Skills shows that app developers have noticed Alexa’s potential. The Skills available now remind me of the first Android apps: they work, but still need a lot of perfecting. They’re often hastily designed, and you can tell their authors prefer the iPhone. No worries, though, a little time will solve all these issues.

Some innovations take tens, hundreds or even thousands of years to develop: the wheel, the automobile or the computer. I think that always-listening and ever-present voice interfaces fall into that category. There is nothing you can do but respect the process and admit that only time will reveal all the possible applications. To me, however, after my first experiences with it, it’s now evident that Alexa will become a standard feature in our homes, cars and gardens.