Joint Intent Classification and Entity Recognition for Conversational Commerce

Deepa Mohan
Walmart Global Tech Blog
Dec 12, 2019

Previously, we introduced our end-to-end e-commerce conversational AI system, which is now live on Google Assistant as well as Apple’s Siri. The system enables a complete end-to-end voice shopping experience for Walmart Grocery: users can search for Walmart products, add them to the cart, edit the cart before checkout, ask about the status of an order, and much more, all by voice. Intent classification and named entity recognition (NER) form a key component of our NLU pipeline and underpin all of the above features and many others. In the following sections, we dive deep into the intent and entity component of our NLU system. The picture below summarizes the overall conversation flow of our dialog system.

Walmart Grocery Conversation Flow

Understanding the user query

The Walmart catalog has millions of products categorized across several departments. Voice queries are more natural and complete than traditional keyword searches on the website. To make sure our system is optimized for voice search, we must understand the user’s intent and identify the entities in the query. This is a key aspect of goal-oriented dialog systems: it helps retrieve the right products from the catalog and generate the right responses back to the user. Since a product can have various attributes, including brand, type, size, quantity, and unit, we need to carefully identify such associations in the queries. Below is a sample query in which the user is looking to order chocolate milk.

Query: I’m looking for a pack of horizon chocolate milk

Intent: Search

Entities: pack <unit>, horizon <brand>, chocolate milk <product>
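The annotation above can be sketched as a small data structure. The snippet below is a minimal illustration, assuming a token-level BIO tagging scheme (B- marks the first token of an entity, I- a continuation, O a token outside any entity). The article does not state which tagging scheme the production system uses, and the names NLUResult and decode_bio are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NLUResult:
    """Hypothetical container for one parsed query."""
    intent: str
    entities: List[Tuple[str, str]]  # (span text, entity label)


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (span text, label) pairs."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any open span first.
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continuation of the currently open entity.
            current.append(token)
        else:
            # "O" (or a stray I- tag) ends any open span.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


tokens = "i'm looking for a pack of horizon chocolate milk".split()
tags = ["O", "O", "O", "O", "B-unit", "O", "B-brand", "B-product", "I-product"]
result = NLUResult(intent="Search", entities=decode_bio(tokens, tags))
print(result)
```

Under this scheme, "chocolate milk" is recovered as a single product span because its second token carries a continuation tag.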

Intent and Entity Model Architecture

We use state-of-the-art BiLSTM and BiLSTM-CRF deep learning models in production for intent classification and entity recognition, respectively. Recently, there have been various efforts toward generating contextual word embeddings, resulting in models such as ELMo, InferSent, and BERT. BERT stands out, as it has been observed to perform well on a variety of NLU tasks, as seen on the SuperGLUE benchmark leaderboard. Considering this, we have recently developed a BERT-based joint intent classification and NER model. With the joint model, we exploit the dependencies between the two tasks. The BERT model takes into account the entire context of a word, enabling it to understand the queries better.
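The article does not describe how the two tasks are combined during training. One common pattern for joint models, assumed here purely for illustration, is a shared encoder with two output heads: a sentence-level head that scores intents and a token-level head that scores entity tags, trained on the sum of the two cross-entropy losses. The sketch below takes the head outputs (logits) as given and only shows that combined loss; joint_loss and entity_weight are hypothetical names, not part of any library.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(logits)[target])


def joint_loss(
    intent_logits: List[float],
    intent_target: int,
    token_logits: List[List[float]],
    token_targets: List[int],
    entity_weight: float = 1.0,  # assumed knob for balancing the two tasks
) -> float:
    """Intent loss plus the mean per-token entity-tag loss."""
    intent_term = cross_entropy(intent_logits, intent_target)
    token_terms = [cross_entropy(l, t) for l, t in zip(token_logits, token_targets)]
    entity_term = sum(token_terms) / len(token_terms)
    return intent_term + entity_weight * entity_term


# Toy example: 3 intents, 2 tokens, 2 entity tags.
loss = joint_loss(
    intent_logits=[2.0, 0.1, -1.0],
    intent_target=0,
    token_logits=[[1.5, -0.5], [-0.2, 0.9]],
    token_targets=[0, 1],
)
print(round(loss, 4))
```

Because both terms are backpropagated through the same encoder in such a setup, errors on either task can shape the shared representation, which is one way a joint model can exploit the dependencies between intents and entities.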

Comparison of results with the previous models

Query: “Search for Head and Shoulders Shampoo”, Intent: Search

Previous model: Head and Shoulders <product>, shampoo <product>

BERT model: Head and Shoulders <brand>, shampoo <product>

The previous model tagged Head and Shoulders as a product, while the BERT model correctly tags it as a brand. The BERT model learns these attribute associations of the product without the need for additional features that have to be periodically scaled and maintained.

Query: “Add multimeter to cart”, Intent: Add to cart

Previous model: multimeter <brand>

BERT model: multimeter <product>

The BERT model correctly tags products it has not previously seen in the training data. This helps us tremendously as we scale to support millions of products in the broad spectrum of our catalog.

Query: “Book a time for pickup tonight”, Intent: Set pickup slot

Previous model: tonight <Out of domain>

BERT model: tonight <timereference>

Correctly tagging the above query helps our users seamlessly set a pickup slot at a Walmart store at a time convenient for them.

Improvements with BERT

We attribute the improved performance on these queries to the WordPiece tokenization and positional embeddings that BERT uses, which help it generalize better to new products and use cases. We have observed an improvement of about 7% in the F1 scores of intent classification and entity recognition compared to the previous models. The ‘add to cart’ intent is very important for an e-commerce NLU system because it shows the user’s real intent to purchase a product. With the BERT model, we have observed an improvement of 4% in the F1 score for the ‘add to cart’ intent.
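To make the tokenization point concrete, here is a minimal sketch of the greedy longest-match-first procedure that WordPiece tokenizers apply to a single word. The tiny vocabulary is invented for illustration (real BERT vocabularies contain tens of thousands of entries), and wordpiece_tokenize is a hypothetical helper rather than a library function.

```python
from typing import List, Optional, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match: Optional[str] = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, assumed for this example only.
toy_vocab = {"multi", "##meter", "##s", "shampoo"}

print(wordpiece_tokenize("multimeter", toy_vocab))
print(wordpiece_tokenize("multimeters", toy_vocab))
print(wordpiece_tokenize("shampoo", toy_vocab))
```

With this toy vocabulary, a word such as "multimeter" that never appears as a whole token is still represented by known pieces, which illustrates how subword tokenization can help with product names absent from the training data.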

Future Work

The latency of our inference pipeline is key to generating a timely response back to the user, keeping our overall conversation latency low and hence our users more engaged. As we have seen higher inference latencies with BERT than with our previous models, we are working to reduce BERT inference latency using knowledge distillation and other techniques.
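As a rough illustration of the idea behind knowledge distillation, a smaller and faster "student" model can be trained to match the softened output distribution of a larger "teacher" model. The sketch below shows one widely used form of the distillation objective, a temperature-scaled KL divergence between the two distributions. The article does not say which distillation recipe is planned, so the function names, the temperature value, and the logits here are all assumptions.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Softmax over logits divided by a temperature (higher means softer)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[float],
    student_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax_with_temperature(teacher_logits, temperature)
    q = softmax_with_temperature(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor is a common convention that keeps gradient magnitudes
    # comparable when the temperature changes.
    return temperature ** 2 * kl


# Made-up intent logits for one query.
teacher = [4.0, 1.0, -2.0]
student = [2.5, 1.5, -1.0]
print(distillation_loss(teacher, student))
print(distillation_loss(teacher, teacher))  # identical outputs give zero loss
```

In practice, this term is often combined with the ordinary supervised loss on the labeled data, so the student learns from both the gold labels and the teacher's predictions.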

Our models power several other voice shopping use cases, such as question answering for Walmart’s frequently asked questions and product discovery, to name a few. Stay tuned for future blog posts and some exciting upcoming features that exploit other capabilities of BERT, such as Next Sentence Prediction and segment embeddings.
