Natural Language Understanding for Task Oriented Chatbots

Bernardt Duvenhage
Feersum Engine
Sep 9, 2018

A goal of chatbots is to make our interactions with services, products or companies more natural and convenient. One is able to ‘talk’ to services and companies in natural language over a channel like WhatsApp or Messenger. The chatbot that resides on the other side of the messaging channel in turn uses Natural Language Understanding (NLU) to comprehend and speak Human.

Based on the nature of the interaction, chatbots may be classified into two types. Some chatbots help one accomplish clearly defined tasks like checking one’s account balance, making a reservation or finding the right recipe. Other chatbots are intended to coach, build a relationship or entertain and hence carry on longer conversations. The first type of chatbot that helps users accomplish specific tasks may be referred to as task oriented dialogue agents [Jurafsky and Martin 2018] or task oriented chatbots.

A task oriented chatbot is restricted in the variety of tasks that it can help a user with. However, the level of machine comprehension and real-world context that it needs to possess is similarly limited. This post is related to a talk we recently gave at the AI Expo in Cape Town, South Africa. It is the first in a series of posts that will go into more detail on the NLU needed to build task oriented chatbots.

Task Oriented Chatbots

As with many services, products and websites meant to be used by humans, a user journey map may be used to describe how one should be able to interact with a task oriented chatbot [Lewis 2017][Mears 2013]. As mentioned earlier, a goal of including natural language interactions within user journeys should be to create a user experience that is more natural and convenient.

The Merit of NLU

One use of NLU is to allow the user to navigate the user journeys using natural language statements of her navigation intent. This is typically called intent detection. For example, if a user says “I don’t understand my bill, how much is my current account balance?” then the chatbot would detect that she would like to go to the show_balance user journey. If the user says something similar to “I would like to pay R300 …” then the chatbot would detect that she would like to go on the pay_account journey.

Extracting information from unstructured text typically happens after intent detection. Text processing is used to extract entities such as the rand amount to be paid, the colour of the item a person would like to buy, the toppings the user wants on her pizza, etc. A later section on Implementing the NLU will give two examples of information extraction.

What happened here? Did a cheeseburger stab another cheeseburger?

The Problem of Context

The image above illustrates the problem of context. Looking at the image, one might rightly ask what exactly the ‘Breaking News’ is. Did a cheeseburger stab another cheeseburger? Did a person stab a cheeseburger? Or did a person perhaps stab another over a cheeseburger?

The context here is that the TV reporters are very seriously giving us this news. It must therefore be both plausible and newsworthy. Considering the context, one may conclude that, although perhaps unexpected, a person stabbing another over a cheeseburger is the most likely scenario.

The chatbot may similarly use what it knows about the user and where in the journey she is as the context for comprehension. For example, if a registered user is busy with a transaction then her statements are more likely to be on how to transact than on how to register. Indeterminate questions like “How long will it take?” indeed require the context of where in the journey the user is. One way of taking the context into account is to build various specialised NLU models each serving a specific portion of the user journey map.
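As a rough illustration of the idea, the sketch below keeps one intent model per section of the journey map and picks the model based on where the user currently is. This is not the FeersumNLU implementation; the keyword-based stand-in models and the detect_intent helper are purely hypothetical.

# Minimal, hypothetical sketch of context-dependent NLU: one intent model per
# journey section, selected by the user's position in the journey map. The
# keyword lookups below are trivial stand-ins for real trained classifiers.

TRANSACTION_MODEL = {"pay": "pay_account", "balance": "show_balance", "long": "transaction_eta"}
REGISTRATION_MODEL = {"register": "register", "long": "registration_eta"}

JOURNEY_MODELS = {
    "transaction": TRANSACTION_MODEL,
    "registration": REGISTRATION_MODEL,
}

def detect_intent(journey_section, utterance):
    """Route the utterance to the NLU model for the user's current journey section."""
    model = JOURNEY_MODELS[journey_section]
    for keyword, intent in model.items():
        if keyword in utterance.lower():
            return intent
    return "unknown"

# "How long will it take?" means something different depending on the context.
print(detect_intent("transaction", "How long will it take?"))   # -> transaction_eta
print(detect_intent("registration", "How long will it take?"))  # -> registration_eta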

Template Based Chatbot Responses

Once the chatbot understands a user’s intent and has extracted the required information, one way of replying to the user is to use a response template. Since the user’s position on her journey is known, the chatbot can take the journey context into account when selecting a template. Below is an example response template and the resulting live response from the chatbot.

Typical template: You’ve asked to pay R<amount> to <recipient>. That ok?

Live response: You’ve asked to pay R300.0 to Mel’s Skates. That ok?
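Concretely, filling a template amounts to substituting the extracted entities into the template string. Below is a minimal Python sketch; the entity names and values are assumed for illustration and mirror the example above.

# Minimal sketch of filling a response template with extracted entities.
template = "You've asked to pay R{amount} to {recipient}. That ok?"

extracted_entities = {"amount": 300.0, "recipient": "Mel's Skates"}

response = template.format(**extracted_entities)
print(response)  # You've asked to pay R300.0 to Mel's Skates. That ok?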

One benefit of using response templates is that one retains control over the response copy and the user experience. Another is that well designed prompts and responses can guide the user to stay within the context that the bot understands.

An alternative to using response templates is to use Natural Language Generation (NLG). NLG uses machine learning to ‘generate’ responses based on training examples and the current context. However, using the appropriate context to generate a good response is still a difficult research problem.

Implementing the NLU

This post will introduce the basic concepts to implement intent detection and information extraction for task oriented chatbots. Future posts will focus on more advanced NLU algorithms and models. The interested reader can also have a look at one of our earlier posts on using more advanced techniques to build natural language FAQs.

Intent Detection

Intent detection may be implemented as a text classification task. One may, for example, use a Naive Bayesian text classifier, which applies Bayes’ theorem to estimate the probability of each intent given prior knowledge (i.e. the training data). An intent text classifier is trained using many examples of user utterances representing the various navigation intents.

Our FeersumNLU service has a Naive Bayesian text classifier that I’ll be using in the example. However, you could also use the Naive Bayesian classifier from Python’s NLTK or Scikit-Learn.

The gist below shows how to train an intent classifier by providing it with labelled training samples. For example, the first user utterance “I would like to fill in a claim form” is a training example for the claim intent. The complete source code for the example may be found in our GitHub repository.

Example of how to train the intent classifier.
Example of how to do intent classification.

Once trained, one can start using the model as shown above. A text input of “I would please like to fill in a claim form.” would result in {‘label’: ‘claim’, ‘probability’: 0.941}. A text input of “How long does it take to get a quote?” would result in {‘label’: ‘quote’, ‘probability’: 0.986}.
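The FeersumNLU gists are not reproduced here, but the same idea can be sketched with Scikit-Learn’s Naive Bayes classifier, mentioned above as an alternative. The training utterances and intent labels below are illustrative, and the probabilities will differ from the FeersumNLU output.

# Sketch of intent detection as Naive Bayes text classification with scikit-learn.
# The utterances and labels are a tiny illustrative training set, not a real dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_utterances = [
    "I would like to fill in a claim form",
    "How do I submit a claim?",
    "Please give me a quote for car insurance",
    "How much would it cost to insure my car?",
]
train_intents = ["claim", "claim", "quote", "quote"]

# Bag-of-words features feeding a multinomial Naive Bayes classifier.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(train_utterances, train_intents)

utterance = "I would please like to fill in a claim form."
label = classifier.predict([utterance])[0]
probability = classifier.predict_proba([utterance]).max()
print({"label": label, "probability": round(float(probability), 3)})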

A Bayesian classifier doesn’t make use of a pre-trained language model. It therefore doesn’t know when words are likely to be synonyms. The drawback of not having a language model is that, to generalise well, the intent models often require relatively large amounts of task specific training data that includes many different example utterances for each navigation intent.

Information Extraction

Regular expressions are often used to extract structured information such as dates, email addresses and telephone numbers from text. I’ll use our FeersumNLU service to demonstrate regular expressions. Internally the service uses Python’s re module.

Example of how to create a regular expression entity extractor.
Example of how to extract a number plate.

A text input of “My car is a 2007 Jeep Wrangler with plate AB 34 EF GP” would result in an extracted entity {‘license’: ‘AB 34 EF GP’}. Note that the regular expression supports two types of number plates.
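The regular expression itself is not shown in the gist captions above, but a plain re sketch captures the idea. The pattern below is an assumption: it matches Gauteng-style plates like ‘AB 34 EF GP’ plus an older Cape-style letters-and-digits alternative, and may differ from the pattern used by FeersumNLU.

# Sketch of extracting a vehicle number plate with a regular expression.
# The two alternatives (Gauteng-style 'AB 34 EF GP' and an older Cape-style
# 'CA 123456') are assumed patterns, not FeersumNLU's exact regex.
import re

plate_regex = re.compile(
    r"\b([A-Z]{2}\s?\d{2}\s?[A-Z]{2}\s?GP|C[A-Z]{1,2}\s?\d{1,6})\b"
)

text = "My car is a 2007 Jeep Wrangler with plate AB 34 EF GP"
match = plate_regex.search(text)
if match:
    print({"license": match.group(1)})  # {'license': 'AB 34 EF GP'}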

One can use word embeddings like Stanford’s Global Vectors for Word Representation (GloVe) to extract things like colours, pizza toppings, animals, etc. For the purpose of this example it is adequate to know that, given two words, the word embedding indicates whether they are typically used in the same context and are therefore likely to be semantically related in some way.

Example of how to create a similar word extractor.
Example of how to extract words similar to the listed colours.

The above gist shows how to extract words similar to the listed colours. Note that given the user utterance “I have an orange car with pink stripes.”, both the colour orange and the colour pink are extracted without the model having seen those colours before. The extracted colours are shown in the comments in the above gist. The use of the word embedding therefore allows the model to ‘generalise’ to unseen colours.
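The gist is again not shown inline, but the mechanism can be sketched with pre-trained GloVe vectors loaded via gensim. This is an assumption for illustration; FeersumNLU’s own embedding, reference colours and similarity threshold may differ. Each word in the utterance is compared against the listed colours by cosine similarity, and anything close enough is extracted.

# Sketch of a 'similar word' extractor using pre-trained GloVe vectors via gensim.
# The model name, reference colours and 0.6 threshold are illustrative choices.
import gensim.downloader

glove = gensim.downloader.load("glove-wiki-gigaword-100")

reference_colours = ["red", "green", "blue", "yellow"]
threshold = 0.6  # cosine similarity above which a word counts as a colour

def extract_colours(utterance):
    """Return words from the utterance that are embedded close to a known colour."""
    colours = []
    for word in utterance.lower().strip(".").split():
        if word in glove and any(
            glove.similarity(word, colour) > threshold for colour in reference_colours
        ):
            colours.append(word)
    return colours

print(extract_colours("I have an orange car with pink stripes."))
# expected to include 'orange' and 'pink', even though neither is in the listed colours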

The complete source code for the regex example as well as the source code for the similar word example may be found in our GitHub repository. The repository’s README contains details on how to get access to a ‘playground’ instance of the FeersumNLU service.

Conclusion

The text classification presented here offers an acceptable baseline for doing intent detection, but doesn’t make use of a language model. Our earlier post on building FAQs looked at using shallow language models (specifically, word embeddings) to do semantic text classification. Future posts will review the use of language models, including more recent ‘deeper’ models like ULMFiT and the OpenAI Transformer. Deeper language models contain more pre-trained language knowledge and often require much less task specific training data and development.

The ‘similar’ word and regular expression information extractors shown in this post are adequate for many chatbot tasks. Future posts will look at topics like parsing sentences to find relationships between their parts, extracting unstructured dates and durations, named entity recognition and sentiment detection.

References

[Jurafsky and Martin 2018]: Speech and Language Processing (3rd ed. draft), Chapter 29 by Dan Jurafsky and James H. Martin.

[Lewis 2017]: Chatbots — Mapping the User Journey by Belinda Ann Lewis.

[Mears 2013]: User Journeys — The Beginner’s Guide by Chris Mears.
