Amazon Echo: My review as a Mozillian

(Note: This is my personal review. I just happen to be a Mozillian.)

Amazon Echo launched last year, but it is getting a lot of buzz lately thanks to its Black Friday sales record and, more recently, its Super Bowl ad blitz. I have been using this product for some time and thought of writing a review through the eyes of a Mozillian. Mozilla firmly stands for user choice and privacy, so this is also a thought experiment on how Mozilla would have gone about building a product like this. I would like to hear what other Mozillians think. So here we go.

Amazon Echo is an interesting product not only for what it does today but, more importantly, for what it could do tomorrow. There is a good chance that the next generation will experience the web via connected products like these rather than by sitting at a desktop or pulling out a smartphone.

Making Tech less geeky for regular users

The Echo is marketed as a connected music player if you go by the first couple of bullet points of its description on Amazon. It looks and feels like a speaker. It does not look geeky at all and has no buttons except two small ones on top. An average user will feel comfortable placing it centrally in the house (which is exactly where Amazon wants them to place it!). However, it is actually a virtual assistant for the home: a coffee table gadget that you talk with. It gets you answers (a la Siri), and in addition, it can get things done around the house via IoT protocols. It is compatible with WeMo, Philips Hue, SmartThings etc. In other words, it is all set to become a smart home hub. Amazon did not make a big deal of this product when it was first offered (unlike what they did with their Fire Phones, which did not end well). Still, Echo has made a quiet entry into tons of houses around the US while the tech press was busy speculating about the next IoT moves of Apple (with TV/HomeKit) and Google (with Nest/Weave).

(Almost) always listening. So Privacy?

It has no camera (in this version, though one is probably coming in the next revision). So, it can’t see yet, but boy can it hear! It has 7 microphones so it can listen to a command from any corner of the house. However, it does not listen to you all the time (unlike the Samsung TVs that caused a media uproar last year and triggered the viral 1984 meme); it starts listening only when you use a wake-word. You can argue it does listen for the wake-word all the time, but my guess is that wake-word recognition happens offline. That wake-word can be “Alexa”, “Amazon” or “Echo”, and you can select it via their companion app. Echo lights up its “visual ring” while listening, so users can see it. But the notion of having something in your living room with not 1 or 2, but 7 always-perked-up ears is a scary one. In particular, when you happen to use the wake-word in another conversation at home, it “perks up” its ears and lights that visual ring. If you are in another room, you will not even see the visual ring to be aware that it has “opened” its ears. Clearly, there is a dilemma between convenience and privacy here. Echo does not have a screen of any size, so Amazon is trying to push a behavior (and paradigm) change for users.

How do we use it? (to the extent allowed by Amazon)

Echo/Alexa does a pretty good job of recognizing the accents in my home and interpreting the intent. It works for the most part; occasionally, however, it can’t understand some (in our view) simple questions. Interestingly, that limitation (that it does not do full NLU) is not stated anywhere in the marketing materials, perhaps in the hope that such technologies get better over time. We use Echo to play songs from Pandora or iHeartRadio. I ask “What’s the news today?” and it starts the NPR news highlights from TuneIn, along with a local weather report. San Diego weather does not change much :), but still it makes the news brief feel a bit personalized. I have set up a few room lights to be controlled through Alexa and we use that a lot. It works 99% of the time (with occasional “not responding” errors). The caveat here is that I need to set up the Philips Hue lights separately in their own app (name them or create “scenes”) and then use Echo only to control them after the fact. This is a one-time exercise (but it explains why vendors try to sell everything from the same company/standard). We also use Alexa for setting timers for cooking, kids’ practice tests etc. The intriguing fact is that Echo can hear correctly even while it is playing loud music itself. (Nice job by the engineering team on the cancellation.) You can add things to the shopping list and buy with 1-click via, you guessed it, Amazon. Echo can also reach some other web services, such as booking rides with Uber or ordering pizza from Domino’s. However…

All of these services and partnerships are determined and curated by Amazon. They are gate-keeping (and negotiating the rev share agreements) here. (Silos again.) What if the user wanted to know the prices of X in their local area stores? What if the user wanted to add something to their, say, Flipkart shopping list? Or order pizza from a nearby Pizza Hut store? Tough luck. But again, this is designed by Amazon the way they saw fit.

Gatekeeper advantage

I have not looked into this yet, but Amazon could collect (or may already be collecting) competitive data. For example, which tracks users play through Spotify or Pandora (popularity stats etc.) can provide insights for improving Amazon’s music store. Also, there are a lot of insights you can gain about the user based on their shopping lists, local weather, traffic, ambient conditions and recent questions. “So John just asked me about the weather (which I know is rainy), he has an umbrella in his shopping list, let’s offer him kids’ raincoat ads on desktop next time he logs in.” On top of that, Amazon can use and monetize the “power of defaults”.

Newer “Skills” (add-ons, anyone?)

Echo did not launch with its own app store, but has taken a cue from the add-ons that their community builds (where have we heard this before?). Every day, several “skills” are being added and users can look them up to use. However, there is no easy discovery (or surfacing) of the skills here. This part is all manual (I need to read their emails about new skills). I talk to Echo every day to test it out, so I am probably using it more than an average user, but I am unsure how much of this a regular user would discover without a bit of tinkering or reading up. I feel like Echo should tell me about its new skills once in a while (an interesting problem for UX). Perhaps it could answer questions like “Alexa, what new skills have you learned recently?”. I can see Amazon figuring this out in future updates.

Kids (and “teachable moments”)

Being a novelty, this product is a big hit with our kids. Both my kids bombard it with song requests, spelling checks, questions about historical facts, requests for one-liner jokes (the jokes all seem kid-friendly) etc. At times, they get their friends together and just fire questions at Alexa. (Alexa can also play games like Jeopardy, and there are lots of “Easter eggs” too.) What they didn’t know was that the app records all the queries and keeps a log. The group of kids was surprised when I asked them a day later, “Which one of you asked Echo these funny questions?” Instantly, our conversation turned to how someone can track us on the Internet if we are not careful (with a plug thrown in for how my employer fights against it). A teachable moment indeed! :)

Some gaps

There are a few more gaps with the Echo. For example, it does not support multiple calendars yet. As a family that keeps kids’ activities in 2 calendars, we still have to rely on IFTTT for that and for adding to-dos, which is not friendly to regular users. It does not have speaker (or talker) identification yet (which is a harder problem to solve than using imaging). There are other products, like Silk Labs’ Sense, that rely on cameras for telling cats apart from humans, which is a great use case to build upon. But with speaker identification, a lot more personalization use cases could be added.

In Summary

In general, I see these kinds of products finding much wider acceptance some day soon. A lot of people think talking to machines feels a bit silly (yes, it does today), but paradigms can change fast. (We did not have touch devices 10 years ago and it did feel a bit funny to “pinch and zoom” screens for a while. But now my kids wonder why ATMs are not touch-enabled everywhere.) Also, carrying your phone with you at home feels forced if there is something that lets you just “access” the technology from anywhere in the house. However, as people will have something like this in their living room 24x7, a product that gives them more control and privacy will be sought after. Or so I hope.
