Building a conversational assistant platform for voice-enabled shopping

Shankar Bhargava
Walmart Global Tech Blog
Oct 21, 2019
https://corporate.walmart.com/newsroom/2019/04/02/want-walmart-to-help-you-grocery-shop-with-our-new-voice-capabilities-just-say-the-word

At Walmart Labs, we strive to help our customers save money and time, and we are always exploring cutting-edge solutions to achieve this goal. Voice assistants for shopping, and voice-enabled shopping experiences in general, have immense potential to reduce friction and save time for our customers. Friction in shopping refers to the annoyances that slow customers down and cause them to drop off from the shopping journey, such as a slow-loading website or a complicated menu. With more than 270 million customers shopping for groceries and other products at Walmart every week, the impact these technologies will have is huge. That is why Walmart introduced Walmart Voice Order in April.

A peek into our NLP efforts — Converse Platform:

Building a true voice assistant for shopping requires solving many problems in Natural Language Understanding (NLU), product knowledge bases, search and personalization. The ideal shopping experience varies greatly across product categories like groceries, electronics, and health and beauty, and the problems that need to be solved for each of these areas are also very different. The design considerations also vary between low-consideration purchases, such as reordering milk, and high-consideration purchases, such as buying a new flat-screen TV for your home. For reordering milk, the customer will likely want the shopping assistant to pick the exact same brand and type without any prompts, but for a flat-screen TV, the customer may prefer to do more research on specifications, ratings and price.

To address these problems, and to quickly scale voice and natural language interfaces for a broad range of retail-related use-cases, we started by building a conversational AI platform called “Converse.” To build out this platform we started with a few key goals:

1. Support for multi-turn dialogue

The platform we built provides the base NLU capabilities to truly understand the meaning of customer queries and to map them to products or commerce actions. At the core of a voice commerce platform is the ability to understand users’ natural language queries and take the appropriate action. For example, when the user says, “add organic strawberries to my cart,” the system should be able to understand the intent of the user, identify the entities in question, understand the context and take the appropriate action. In addition, the system was built from the ground up to support “multi-turn conversations” that allow the customer to have back-and-forth exchanges with the assistant. This requires the platform to understand conversation context, i.e., whether the interaction is part of the existing query or the start of a new one, and to support dialog-state tracking and dialog policy. The components that form the core of end-to-end retail dialogue systems are listed below, followed by a small sketch of how they fit together:

  1. Intent classification
  2. Named entity recognition
  3. Dialog manager
  4. Question-answer system
  5. Task executor
  6. A retail knowledge base

All of these components form the core of the Converse platform. A tutorial on end-to-end question-answering systems was presented at this year’s AAAI conference.
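To make the flow of a single utterance through these components more concrete, here is a minimal, illustrative sketch in Python. The class names, the rule-based intent classifier and entity extractor, and the cart logic are hypothetical stand-ins for the trained models and services a production system like Converse would use; they only show how intent classification, named entity recognition, dialog-state tracking and task execution fit together.

```python
from dataclasses import dataclass, field

# Hypothetical, rule-based stand-ins for trained intent and NER models.

@dataclass
class Entity:
    label: str   # e.g. "product"
    value: str

@dataclass
class DialogState:
    """Tracks context across turns so follow-ups can reuse earlier entities."""
    last_intent: str = ""
    last_entities: list = field(default_factory=list)

def classify_intent(utterance: str) -> str:
    """Toy intent classifier; a real system would use a trained model."""
    text = utterance.lower()
    if "add" in text and "cart" in text:
        return "add_to_cart"
    if text.startswith(("what", "when", "how")):
        return "question_answering"
    return "search"

def extract_entities(utterance: str) -> list:
    """Toy named-entity recognizer keyed on a tiny product lexicon."""
    lexicon = {"organic strawberries", "milk", "flat-screen tv"}
    return [Entity("product", p) for p in lexicon if p in utterance.lower()]

def handle_turn(utterance: str, state: DialogState) -> str:
    """Dialog manager: combine intent, entities and prior context, then act."""
    intent = classify_intent(utterance)
    entities = extract_entities(utterance) or state.last_entities  # fall back to context
    state.last_intent, state.last_entities = intent, entities
    if intent == "add_to_cart" and entities:
        # The task executor would call the cart service here.
        return f"Added {entities[0].value} to your cart."
    return "Could you tell me a bit more about what you're looking for?"

state = DialogState()
print(handle_turn("add organic strawberries to my cart", state))
```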

2. Differentiated and customizable user flows

A true shopping assistant needs to support a wide variety of user needs. A customer shopping at Walmart might want to reorder their weekly groceries, explore fresh styles in women’s fashion, schedule an order pickup or price-compare the latest electronics gadget. Each of these use cases requires a unique conversation flow. To address this, we built the platform to be completely customizable, so conversation flows can be tailored to each of these needs, as sketched below.
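As a rough illustration of what customizable flows could look like, the snippet below sketches a declarative flow definition. The structure, field names and defaults are hypothetical, not Converse’s actual configuration format; the point is only that a low-consideration reorder flow and a high-consideration discovery flow can diverge while running on the same platform.

```python
# Hypothetical declarative flow definitions; field names are illustrative only.
FLOWS = {
    "grocery_reorder": {
        "trigger_intents": ["add_to_cart", "reorder"],
        "confirm_each_item": False,   # low-consideration: pick the usual item silently
        "use_purchase_history": True,
        "steps": ["resolve_item", "add_to_cart", "read_back_cart_total"],
    },
    "electronics_discovery": {
        "trigger_intents": ["browse", "compare"],
        "confirm_each_item": True,    # high-consideration: surface specs and prices
        "use_purchase_history": False,
        "steps": ["collect_preferences", "show_options_on_screen", "compare_prices"],
    },
}

def pick_flow(intent: str) -> str:
    """Route an incoming intent to whichever flow declares it as a trigger."""
    for name, flow in FLOWS.items():
        if intent in flow["trigger_intents"]:
            return name
    return "grocery_reorder"  # illustrative default

print(pick_flow("compare"))  # -> electronics_discovery
```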

3. Reducing shopping friction for customers

A key goal in developing the Converse platform was to reduce friction for customers. Voice as a medium is great at accomplishing simple tasks with the least amount of friction, and it is especially helpful for low-consideration purchases. Converse deeply integrates with our search and personalization stack at Walmart Labs, which plays a key role in returning the right products for the user’s voice queries. With voice, users can’t see multiple search results at the same time in order to make product choices, and studies have shown that users typically don’t like to hear a long list of search results. Because of this, Walmart Voice Order search results are optimized for “Precision at 1.” Personalization plays a key role not only in understanding the user’s query context but also in customizing the search results and interactive responses.
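The “Precision at 1” idea can be illustrated with a small reranking sketch: instead of reading back a list, blend a query-relevance score with a personalization signal (for example, the customer’s purchase history) and return only the single best match. The candidate data, fields and weights below are assumptions for illustration, not the production ranking function.

```python
# Hypothetical candidates returned by the search stack for the query "milk".
candidates = [
    {"name": "Great Value Whole Milk, 1 gal", "relevance": 0.91, "times_purchased": 14},
    {"name": "Organic 2% Milk, half gal",     "relevance": 0.89, "times_purchased": 0},
    {"name": "Almond Milk, unsweetened",      "relevance": 0.80, "times_purchased": 2},
]

def score(product: dict, history_weight: float = 0.3) -> float:
    """Blend query relevance with a personalization signal; weights are illustrative."""
    personal = min(product["times_purchased"] / 10, 1.0)  # cap the history boost
    return (1 - history_weight) * product["relevance"] + history_weight * personal

# Optimize for "Precision at 1": speak back only the single highest-scoring product.
best = max(candidates, key=score)
print(f"Adding {best['name']} to your cart.")
```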

4. Built from the ground up to be multi-modal

Voice by itself is a low-bandwidth medium and might not be ideal for high-bandwidth use cases like shopping discovery, e.g. when a customer does not know exactly what brand, size or other characteristics of a product they want. To truly reduce friction for a broad range of use cases, we have built the Converse platform to be truly multi-modal. This means that if Converse detects that the customer is unsure of what they wish to order, we can seamlessly push the conversation to a more visual medium, such as the video screens on the latest smart assistant devices or the Walmart app on the customer’s phone, to complete the transaction, and then return them to the voice medium.
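As a rough sketch of that handoff decision, the snippet below checks whether a query is specific enough to finish by voice alone and otherwise routes it to a visual surface. The threshold, surface names and helper functions are assumptions for illustration, not Converse’s actual routing logic.

```python
# Hypothetical handoff logic; the threshold and surface names are illustrative.
def is_specific_enough(entities: list, confidence: float, threshold: float = 0.8) -> bool:
    """A query is voice-completable if we resolved a product with high confidence."""
    return bool(entities) and confidence >= threshold

def route_turn(entities: list, confidence: float) -> str:
    if is_specific_enough(entities, confidence):
        return "voice"  # finish the task hands-free
    # Ambiguous discovery query: push to a screen (smart display or the Walmart app),
    # then hand control back to voice once the customer has picked an item.
    return "visual_handoff"

print(route_turn(entities=["flat-screen tv"], confidence=0.45))  # -> visual_handoff
```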

Groceries and beyond

Currently, Walmart Voice Order is only available for ordering groceries via Google Home assistants and Android mobile devices, but stay tuned as we work on expanding voice-enabled shopping capabilities to other product categories and more platforms. In future posts, we will share more about the unique technical challenges and solutions as we dive deeper into building a robust retail voice assistant for our customers.
