Will the Next Alexa Have a Camera?

Most ‘smart’ appliances on the market today are essentially the same appliances companies have been selling for decades, with a thin veneer of networked tech slapped on. While ‘smart’ appliances are increasingly connected to the internet, they’re still not very interconnected. Each product comes bundled with a single-purpose mobile app: Philips Hue lights, GE refrigerators, Wemo switches, the Nest products, August smart locks, Canary cameras, etc. People have been talking about smart homes for years, but our homes are still pretty dumb.

This is a common theme in many classes of device: you start with a product that has a few electronic functions added, and then those functions are delivered with chips, and perhaps they gain an interface and then a screen, and more and more functions (and probably multi-function buttons) — and then, somehow, you’ve built a little weird custom computer without actually meaning to, and all the little silos of features and functions become unmanageable, both at an interface level and also at a fundamental engineering level, and the whole thing gets replaced by a real computer with a real software platform. And this new computer is almost certainly made by a different company.

Benedict Evans above was talking about car interfaces, but his observation is equally applicable to homes. Instead of a collection of home appliances, each with their own closed-loop interfaces, we need a new software layer that sits above and stitches together the various devices in our homes.

Take, for example, the humble doorbell. Increasingly, homeowners are replacing their doorbells or voice intercoms with video systems. These new systems allow you to see who’s at the front door before buzzing them in, thanks to a wall-mounted intercom screen. But why not check who’s at the door on whichever screen is presently most convenient? Odds are the most convenient screen is your phone, not the single-purpose screen in the hallway. And in many cases, you shouldn’t need to look at the camera feed at all. What if the platform could notify you that “the USPS delivery person is here” by detecting their blue uniform? Or what if the platform could recognize you and your family members, and automatically let you in? Face recognition combined with other signals (phone geolocation, etc.) would likely be every bit as secure as a traditional metal key, and would require less fumbling in your pockets (so long as you don’t have a power outage, anyway).

The point is that more and more these are software challenges, not hardware challenges. The model is inverted — the various appliances and sensors report their inputs and outputs to the platform via APIs, and the ‘intelligence’ will increasingly live one level up, at the platform layer.
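To make the inversion concrete, here is a minimal Python sketch of a platform layer. Everything in it is hypothetical: the `HomePlatform` class, the `Event` fields, and names like `front_door_camera` and `person_detected` are invented for illustration and do not correspond to any real Alexa, HomeKit, or Google API. The devices only report what they sense; the rule that decides what to do lives one level up.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    device: str  # hypothetical device id, e.g. "front_door_camera"
    kind: str    # hypothetical event type, e.g. "person_detected"
    data: dict = field(default_factory=dict)


class HomePlatform:
    """Toy platform layer: devices report events, the rules live here."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)
        self.notifications: List[str] = []

    def on(self, kind: str, rule: Callable[[Event], None]) -> None:
        # Register a rule to run whenever an event of this kind is reported.
        self._rules[kind].append(rule)

    def report(self, event: Event) -> None:
        # Devices call this; they know nothing about what happens next.
        for rule in self._rules[event.kind]:
            rule(event)


platform = HomePlatform()


def announce_visitor(event: Event) -> None:
    # The "intelligence" sits in the platform, not in the doorbell.
    label = event.data.get("label", "Someone")
    platform.notifications.append(f"{label} is at the front door")


platform.on("person_detected", announce_visitor)
platform.report(
    Event("front_door_camera", "person_detected",
          {"label": "The USPS delivery person"})
)
print(platform.notifications)
```

Swapping the doorbell for a different brand would not touch the rule, which is the whole appeal of putting the logic at the platform layer.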

The race to build that platform layer is on, and Amazon’s Alexa is leading the pack (with Apple’s HomeKit and Google Assistant close behind). It turns out that voice is a great interface for homes. But as powerful as voice input is, there’s one other key input mode that’s still missing: video input. Just as the camera is a crucial component of iOS and Android, a camera will likely become a crucial component of whichever software platform dominates the home.

A security camera from D-Link — a relatively standard security camera, but the form factor would be a natural extension for Alexa

A camera would enable a whole new class of interactions. Here are a few:

Security Cams / Baby Monitors / Doorbell Intercoms

Let’s start with the obvious. Security cameras and baby monitors are big consumer electronics categories. Just as smartphone cameras have pretty much eliminated the market for standalone point-and-shoot cameras, adding a camera to Alexa may very well put a sizable dent in the existing security cam / baby monitor markets. Why buy a single-purpose device when you can buy an Amazon Echo that’s a security camera, plus 100 other things?

Video Chat

“Call grandma”: the most wholesome use case. You can video chat from your phone, and pass your phone around (Apple’s FaceTime, etc.), but having a fixed-position camera is often more convenient. Nearly every office conference room has a camera-equipped screen; why not in your home as well?

Eye Contact + Wake Word

The Echo is triggered by the wake word “Alexa.” While this works just fine most of the time, it’s occasionally a very unnatural interaction (“Alexa, Alexa, Alexa…”). In normal (human) conversation, it’s usually implicit who we’re addressing based on eye contact. A camera would allow Alexa to use eye-contact / ‘gaze estimation’ as an alternative wake trigger. You’d look at the device, it would light up indicating it was listening, and you could ask your question directly, skipping the “Alexa” prompt.
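As a rough illustration, the wake logic might look something like the Python sketch below. It assumes some upstream vision model already produces a per-frame "is the person looking at the device" flag; the `Frame` and `WakeDetector` names and the half-second hold threshold are made up for this example and are not how the Echo actually works. Requiring the gaze to be held briefly is one way to avoid waking on a passing glance.

```python
from dataclasses import dataclass
from typing import Optional

GAZE_HOLD_SECONDS = 0.5  # assumed threshold, chosen arbitrarily


@dataclass
class Frame:
    timestamp: float          # seconds
    looking_at_device: bool   # output of a hypothetical gaze estimator
    heard_wake_word: bool = False


class WakeDetector:
    """Wake on the spoken wake word OR on sustained eye contact."""

    def __init__(self, hold_seconds: float = GAZE_HOLD_SECONDS) -> None:
        self.hold_seconds = hold_seconds
        self._gaze_started: Optional[float] = None

    def update(self, frame: Frame) -> bool:
        # Track when the current stretch of eye contact began.
        if frame.looking_at_device:
            if self._gaze_started is None:
                self._gaze_started = frame.timestamp
        else:
            self._gaze_started = None  # a glance away resets the timer

        if frame.heard_wake_word:
            return True
        return (
            self._gaze_started is not None
            and frame.timestamp - self._gaze_started >= self.hold_seconds
        )


detector = WakeDetector()
frames = [
    Frame(0.0, True),
    Frame(0.25, True),
    Frame(0.5, True),    # gaze held for 0.5 s: wake
    Frame(0.75, False),  # looked away: back to idle
]
print([detector.update(f) for f in frames])
```

The wake word still works as before; eye contact is simply a second way in.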

Face Recognition

When Alexa responds to requests, it does so without any knowledge of who is making the request. It’d be handy if Alexa could distinguish between different members of a household. This would enable purchase permissioning; you might want to prevent your kids from ordering a Domino’s pizza, for example. Or imagine if Alexa could play your own Spotify Discover Weekly playlist, rather than that of whoever set up the account. Alexa could get partway there with voice identification, but a camera would increase the fidelity.
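Purchase permissioning could be sketched along these lines in Python. The household table, the `authorize_purchase` function, and the 0.9 confidence cutoff are all assumptions for illustration; the recognizer that produces `recognized_id` and `confidence` (from a face, a voice, or both) is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Member:
    name: str
    can_purchase: bool


# Hypothetical household: one adult, one kid.
HOUSEHOLD = {
    "alex": Member("Alex", can_purchase=True),
    "sam": Member("Sam", can_purchase=False),
}

MIN_CONFIDENCE = 0.9  # assumed cutoff for trusting the recognizer


def authorize_purchase(recognized_id: Optional[str], confidence: float) -> str:
    """Decide whether the person making the request may place an order."""
    member = HOUSEHOLD.get(recognized_id) if recognized_id else None
    if member is None or confidence < MIN_CONFIDENCE:
        return "denied: speaker not recognized"
    if not member.can_purchase:
        return f"denied: {member.name} is not allowed to place orders"
    return f"approved: order placed for {member.name}"


print(authorize_purchase("alex", 0.97))
print(authorize_purchase("sam", 0.95))
print(authorize_purchase(None, 0.0))
```

The same lookup could just as easily pick whose playlist to play; the hard part is the recognition, not the permission check.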


Presence Detection

A camera would enable Alexa to track when people are home and when they’re away (at work or on vacation). This information could have a big impact on your electricity / gas bill, amongst other things. Alexa could shut the lights off when you leave a room, or turn down the heat and AC when you’re away (the Nest Learning Thermostat was one of the first products to leverage presence info in this way).


Gesture Control

Imagine you could control volume with a hand gesture, instead of announcing to the room “Alexa: volume 4.” Same for ‘pause,’ ‘next track,’ etc. The Queen of England signals to her staff with the position of her handbag (i.e. placing her handbag on the table means ‘rescue me from this conversation’). Maybe your Alexa commands could be similarly discreet and queenly.

Are people going to want this stuff? In public we’re pretty accustomed to being recorded by security cameras, but do we want a camera in our homes? Will we really be comfortable walking around half-naked in our living rooms, streaming directly to Amazon’s servers?

It’s possible these concerns will prevent smart home products with cameras from shipping. But I doubt it. If history is any guide, most people are willing to trade privacy for convenience. Google, after all, has a copy of nearly all of my written communication on their servers (and I’m mostly fine with it).

Perhaps this time will be different. If not, here’s to greater convenience and easier living in 2017 and beyond, under the watchful eyes of Alexa and friends.