Why Voice Isn’t Selling…Yet

Originally published in VentureBeat.

Personal digital assistants have taken the world by storm. We have Siri and Google Assistant on our phones, and many of us have Amazon Echo and Google Home, as well. Voice user interfaces, which have been available to consumers since the 1990s, have finally gone mainstream as a way to play music, set timers, and control smart homes.

But voice hasn’t really taken off for ecommerce, despite Amazon’s leadership in the smart speaker market. When asked about this a few months ago, Amazon CEO Jeff Bezos said, “Voice interface is only going to take you so far on shopping. It’s good for reordering consumables, where you don’t have to make a lot of choices, but most online shopping is going to be facilitated by having a display.”

Amazon has since released the Echo Show, which does have a display. But even on the Echo Show, the shopping experience is marginal compared to shopping on laptops and phones. The main shopping use case on Echo Show is still reordering, rather than finding something new and buying it.

Nevertheless, given our strong preference for voice as a natural interface, it seems inevitable that we’ll use voice interfaces for shopping. But what will it take to get us there?

Bezos is right that most online shopping is facilitated by a display. But requiring a display defeats many of the advantages of a voice interface — most significantly, not needing to be close enough to a screen to see or touch it. What will it take to get people to move beyond reordering and shop for everything using voice?

What voice does best.

Voice interfaces work well for products that people describe by brand name, e.g., an iPhone 8. There may still be a few product variations to choose from (e.g., color), but it’s not hard for a voice interface to address these through a simple dialog. Shoppers who use voice interfaces to play music, for example, are already familiar with this kind of interaction.

Beyond brand-name merchandise, voice search can work for products shoppers aren’t very particular about, like milk. In fact, some retailers, like Trader Joe’s, have limited their catalogs so that many shopping decisions are already made.

Thus, retailers can apply sensible defaults to category queries, such as assuming the searcher wants the best-selling product in a category. Retailers can also use personalization to limit choices, especially in domains like apparel, using information like the shopper’s gender and size.
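To make the idea concrete, here is a minimal sketch of how a retailer might resolve a category query to a single default. It is an illustration, not a description of any real retailer’s system: the `Product` and `ShopperProfile` types, the `default_product` function, and the sample catalog are all hypothetical, and “best-selling” is simplified to a unit-sales count.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    name: str
    category: str
    units_sold: int
    gender: Optional[str] = None  # None means the product is not gender-specific
    size: Optional[str] = None    # None means the product is not sized


@dataclass
class ShopperProfile:
    # Hypothetical personalization data a retailer might hold.
    gender: Optional[str] = None
    size: Optional[str] = None


def default_product(category: str,
                    catalog: List[Product],
                    profile: Optional[ShopperProfile] = None) -> Optional[Product]:
    """Pick one sensible default for a category query."""
    candidates = [p for p in catalog if p.category == category]
    if profile is not None:
        # Personalization: drop products that conflict with what we know.
        if profile.gender:
            candidates = [p for p in candidates
                          if p.gender in (None, profile.gender)]
        if profile.size:
            candidates = [p for p in candidates
                          if p.size in (None, profile.size)]
    # Sensible default: the best seller among what remains (None if empty).
    return max(candidates, key=lambda p: p.units_sold, default=None)


# Made-up catalog, for illustration only.
CATALOG = [
    Product("Whole milk, 1 gal", "milk", 900),
    Product("Oat milk, 32 oz", "milk", 400),
    Product("Women's running shoe", "running shoes", 700, gender="women", size="8"),
    Product("Men's running shoe", "running shoes", 650, gender="men", size="10"),
]
```

In this sketch, a request for “milk” resolves to the best seller, while a request for “running shoes” from a shopper known to wear a men’s size 10 skips the overall best seller in favor of the one that fits.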

Voice recognition is only the first step.

Voice recognition isn’t perfect, though it has gotten quite good. But while voice may change how we shop, it doesn’t change the goal of shopping or the essential challenge faced by retailers: recognizing the shopper’s intent and connecting them to products that satisfy that intent. Pairing voice recognition with a bad search engine is simply putting lipstick on a pig.

Search engines still often fail to understand the shopper’s intent, but at least a shopper can scroll down the page to find what they want or refine the search query. When it comes to query understanding, voice is a far less forgiving medium. Shoppers can try repeating or rephrasing their search queries, but they quickly become frustrated and give up.

It’s essential that a voice-based shopping interface understands the shopper’s search query. It’s easy to see where this goes wrong today. For example, a search on the Echo Show for “girls’ shoes without laces” returns no-tie shoelaces as a top result. While the result is related to the search query, it’s not what the shopper asked for. Until query understanding improves, shopping by voice will struggle to succeed.

Navigating choices requires a conversation.

Reproducing an in-store experience for shoppers using a voice interface requires providing a way for them to navigate among choices. Voice interfaces need to support context so that “show me men’s shoes” followed by “show me black ones” refines the original search to men’s black shoes. The same goes for refining by size, style, and other attributes.
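One way to picture this kind of context is as a small piece of state that each utterance updates. The sketch below is a simplification under stated assumptions: `SearchContext` and `apply_utterance` are hypothetical names, and the `parsed` dictionary stands in for the output of a query-understanding step that is not shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SearchContext:
    category: Optional[str] = None
    filters: Dict[str, str] = field(default_factory=dict)


def apply_utterance(context: SearchContext, parsed: dict) -> SearchContext:
    """Fold one parsed utterance into the running search context.

    `parsed` is assumed to look like
    {"category": "shoes", "filters": {"gender": "men"}},
    with either key optional.
    """
    if "category" in parsed:
        # Naming a category starts a fresh search.
        return SearchContext(category=parsed["category"],
                             filters=dict(parsed.get("filters", {})))
    # Otherwise treat the utterance as a refinement of the current search.
    merged = {**context.filters, **parsed.get("filters", {})}
    return SearchContext(category=context.category, filters=merged)


# "show me men's shoes" followed by "show me black ones"
ctx = apply_utterance(SearchContext(),
                      {"category": "shoes", "filters": {"gender": "men"}})
ctx = apply_utterance(ctx, {"filters": {"color": "black"}})
```

After the second utterance, `ctx` still refers to shoes and carries both the gender and color filters. A real system would also need to decide when the shopper has changed the subject; resetting whenever a category is named is just the simplest rule to illustrate.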

A voice interface should offer sensible refinement options when the shopper’s original search query is too broad. That might mean suggesting different categories (e.g., boots, loafers) or prompting the shopper to specify an attribute like price (e.g., “How much were you planning to spend?”) that significantly narrows down the options.
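As a rough sketch of that logic, the function below decides what to ask when a search returns more results than can reasonably be read aloud. Everything here is an assumption made for illustration: the `next_prompt` name, the `subcategory` field on each result, and the threshold of three spoken results.

```python
from collections import Counter
from typing import List, Optional

MAX_SPOKEN_RESULTS = 3  # assumed limit on what is comfortable to read aloud


def next_prompt(results: List[dict],
                max_results: int = MAX_SPOKEN_RESULTS) -> Optional[str]:
    """Return a clarifying question, or None if the results are narrow enough."""
    if len(results) <= max_results:
        return None
    subcategories = Counter(r["subcategory"] for r in results)
    if len(subcategories) > 1:
        # Offer the most common categories as choices.
        top = [name for name, _ in subcategories.most_common(3)]
        return "Are you looking for " + ", ".join(top[:-1]) + " or " + top[-1] + "?"
    # Only one category left: ask about an attribute that narrows the set.
    return "How much were you planning to spend?"
```

The ordering of the questions is a design choice: asking about category first tends to cut the result set the most, and price is used here only as a fallback once the category is settled.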

A conversational interface allows shoppers to clarify and refine their intent, much like a conversation with a helpful sales associate.

We’re just entering an age of voice interfaces for mainstream consumer applications. It will take some time to work out the kinks, improve query understanding, and figure out the design of conversational interfaces. But we’ll get there. Someday, our grandchildren will even marvel that anyone ever used a keyboard for shopping. Of course, they’ll only find keyboards in a museum.