Using the DIET classifier for intent classification in dialogue

Via the Rasa stack

Mohit Saini
The Research Nest
8 min read · Jul 28, 2020


A few days ago, I was trying to develop an intent classifier for a bot whose backend was almost complete. Intent classification comes under Natural Language Understanding (NLU), and the output of the intent classifier acts as the final interpretation of the query. The task is pretty straightforward and can be addressed using various techniques. But, as is the case with most machine learning projects, the biggest bottleneck was the lack of training data. I tried a whole variety of approaches, from basic techniques like bag-of-words to heavy language models like BERT. In the end, the models struggled to generalize, and some were too slow to put into production. The intent classifier needs to be as accurate as possible because the bot's response depends largely on its output: no matter how well the rest of your bot performs, a weak classifier can still make or break it.

To get some advice, I messaged Philip Vollet (great guy; I follow him for the latest updates on NLP) on LinkedIn, and a few moments later he replied, advising me to use Rasa's DIET classifier. I had previously used Rasa to build a pretty basic chatbot in my free time, but I wasn't aware that Rasa's NLU pipeline can be used on its own.

So, I ended up using the DIET classifier, and honestly, I am pretty impressed by its performance. It also does not require you to write any code; all the preprocessing and implementation are handled in the background. All you have to do is curate training samples carefully and experiment with various NLU pipelines to get the desired results. In this post, I will show you how to prepare an NLU pipeline with the DIET classifier and spin up an NLU server to use it as an API.

What is Rasa?

If you don't know already, Rasa is an open-source machine learning framework to automate text- and voice-based conversations. Using Rasa, one can build contextual assistants capable of having layered conversations with lots of back-and-forth. Rasa is divided into two major components:

  1. Rasa NLU: used to perform NLU tasks like intent classification and entity recognition on user queries. Basically, its job is to interpret messages.
  2. Rasa Core: used to design the conversation. It handles the conversation flow, utterances, and actions based on the previous user inputs.

Since this post is about using the DIET Classifier, a component of Rasa NLU, we are going to ignore Rasa Core here entirely.

DIET: A Quick Introduction

Dual Intent and Entity Transformer (DIET), as its name suggests, is a transformer-based architecture that can handle both intent classification and entity recognition together. It was released in early 2020 with Rasa 1.8. The best thing about DIET is its flexibility: it lets you plug and play various pre-trained embeddings like BERT, GloVe, and ConveRT. So, based on your data and the number of training examples, you can experiment with various SOTA NLU pipelines without writing a single line of code.

[GIF: DIET architecture diagram]

If you look at the top-right part of the diagram above, the total loss is calculated as the sum of the entity loss, mask loss, and intent loss. This is because DIET can be trained for these three NLP tasks simultaneously. The mask loss is turned off by default; use it only if you have a very large training dataset, so that the model can adapt to become more domain-specific.
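For reference, the mask loss can be toggled through the classifier's configuration. A minimal sketch, assuming Rasa 1.10's use_masked_language_model option for DIETClassifier:

- name: DIETClassifier
  epochs: 100
  # turns on the mask loss; off by default, useful only with large datasets
  use_masked_language_model: True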

If you want to use DIET only for intent classification, you have the option to turn off entity recognition, and vice versa. But even if you need only one of the two tasks, I would suggest training DIET for both: in that case, the final loss is the sum of the entity loss and the intent loss, so there is a chance that one task improves the performance of the other. I raised this same query in the Rasa forums, and this is the answer that I got: https://forum.rasa.com/t/does-setting-entity-recognition-false-affects-the-performance-of-intent-classification-task-of-diet/30447/2.

So, go through this link and, based on your use case, decide how you want to train your model. In this tutorial, for example, we will train the DIET classifier for both entity recognition and intent classification. If you want to know more about DIET in detail, you can watch Rasa's Algorithm Whiteboard videos on it.

Setting up the Project:

First and foremost, you are required to install Rasa using pip on a machine with Python (≥3.6). I would suggest you build a separate Python virtual environment for Rasa. Use the following command to install Rasa:

pip install rasa
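For example, to create and activate a fresh virtual environment first and pin the exact version used in this post (the environment name rasa-env is just illustrative):

# create and activate an isolated environment for Rasa
python3 -m venv rasa-env
source rasa-env/bin/activate

# install the version this post was written against
pip install rasa==1.10.2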

For this post, I am using Rasa 1.10.2. Now, to initialize a new Rasa project, use the following command and follow the subsequent instructions:

rasa init

After the initial setup, your project directory will look something like this.

.
├── actions.py
├── config.yml
├── credentials.yml
├── data
│   ├── nlu.md
│   └── stories.md
├── domain.yml
├── endpoints.yml
├── __init__.py
├── models
│   └── 20200711-160818.tar.gz
├── __pycache__
│   ├── actions.cpython-36.pyc
│   └── __init__.cpython-36.pyc
└── tests
    └── conversation_tests.md

The only files that we care about for this tutorial are:

  • data/nlu.md: this markdown file will hold our training data.
  • config.yml: this file will be used to define our NLU pipeline.

Training Data:

Now open up nlu.md to prepare your training data. You might notice that some initial data is already present in this file; let me break down its format for you. For example, the NLU training data for a bot that can book flights might look something like this:

## intent:inform_quota_code
- Quota is General
- Its General Quota
- TQ
- General
## intent:inform_seating_class
- E class
- Its B
- Seating class is E
- Class is E
- Class B
- B class
## intent:inform_source_airport
- Its Mumbai
- Its from Hyderabad
- From Ahmedabad
- Source is Bangalore
- From Berlin
- Source is New Delhi

The training samples of each intent come under a ## intent:INTENT_NAME heading; whatever follows ## intent: is the name of the intent. Each training example should be written on a separate line and must start with -.

To train the model for the entity recognition task, you have to mark the entities in the training dataset. To mark a word or group of words as an entity, enclose the words in [] followed by the name of the entity enclosed in (). Example:

## intent:inform_quota_code
- Quota is [General](quota_code)
- Its [General](quota_code) Quota
- Its [General](quota_code) Quota
- [TQ](quota_code)
- [General](quota_code)
- [General](quota_code)
## intent:inform_seating_class
- [E](seating_class) class
- Its [B](seating_class)
- Class is [E](seating_class)conomy
- Seating class is [E](seating_class)
- Its [B](seating_class)
- Seating class is [E](seating_class)
- Class is [E](seating_class)
- Class is [B](seating_class)
- Class [B](seating_class)
- [B](seating_class) class
- [E](seating_class)
## intent:inform_source_airport
- Its [Mumbai](city)
- Its [Pune](city)
- Its from [Hyderabad](city)
- From [Ahmedabad](city)
- Source is [Bangalore](city)
- Source is [Chennai](city)
- From [Berlin](city)
- Source is [New Delhi](city)

From what I have learned so far, if you are just starting to build your model and don't have any training dataset yet, then 15–20 examples for each intent is a good starting point. The training data should have a rich set of examples to cover the variety of ways real users may phrase their queries.

If any of the entities has a finite set of values, then you can also add a lookup table for that entity. For example, the entity country name has a finite set of values:

## lookup:countries   <!-- values listed inline -->
- India
- Nepal
- China

## lookup:additional_countries <!-- instead of an inline list, point to a lookup table file -->
path/to/countries.txt
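The referenced file itself is just a plain-text list with one value per line; a hypothetical path/to/countries.txt might contain:

Japan
Brazil
South Africa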

NLU Pipeline:

The components of the NLU pipeline are defined in config.yml. This file is divided into two parts: the first one defines the NLU pipeline, and the second one defines the policies for Rasa Core. The project is initialized with the following pipeline by default:

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: "char_wb"
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100

The behavior and performance of the classifier depend largely on the NLU pipeline, so you have to curate your pipeline according to your training data. For example, if your dataset has fewer training examples, you might have to use pre-trained tokenizers like SpacyTokenizer or ConveRTTokenizer. The choice of tokenizer might also affect the choice of featurizers. Components like RegexFeaturizer can be used to extract certain regex patterns and lookup table values. Similarly, DucklingHTTPExtractor can be used to extract entities beyond the ones you have marked in your dataset, such as dates, amounts of money, and distances. Neither Duckling nor spaCy requires you to annotate your dataset. Other training parameters, like the number of epochs, the number of transformer layers, turning entity recognition on/off, the embedding dimension, etc., can be configured under the DIETClassifier component.
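To make this concrete, here is one illustrative alternative pipeline built on pre-trained ConveRT embeddings, with a few DIET options tweaked and a Duckling extractor added. This is a sketch, not a recommendation: the parameter values are arbitrary, and the url assumes a Duckling server running locally on port 8000:

pipeline:
- name: ConveRTTokenizer
- name: ConveRTFeaturizer
- name: RegexFeaturizer
- name: DIETClassifier
  epochs: 200
  number_of_transformer_layers: 2
  embedding_dimension: 30
  intent_classification: True
  entity_recognition: True
- name: DucklingHTTPExtractor
  url: http://localhost:8000
  dimensions: ["time", "amount-of-money"]
- name: EntitySynonymMapper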

The ordering of the components in the NLU pipeline is also very important. For example, you can't define featurizers before tokenizers, because the output of each component acts as the input of the next.

In short, you have to plug and play with different components and tweak their configurations to find the optimal pipeline for your dataset. It is also possible that, as new data is added over time, you will have to tweak the NLU pipeline again. You can follow the Rasa docs and Rasa forums for more details and examples of NLU pipelines.

Training the Classifier:

Now that our data is in place, it's time to train our classifier. To do so, run the following command from the root directory of the project:

rasa train nlu --config PATH/TO/CONFIG/FILE --out OUTPUT/PATH
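For example, with the default project layout created by rasa init, the command becomes:

rasa train nlu --config config.yml --out models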

After training, your model will be saved in the ./models folder.

Testing the NLU Model:

You can quickly try out the trained model in the following two ways.

  1. Rasa shell: use the following command to load the trained model. This will start the Rasa shell and ask you to type in a message to test; you can keep typing in as many messages as you like.

rasa shell nlu -m PATH/TO/MODEL

  2. NLU server: to start a server with your NLU model, pass in the model path at runtime:

rasa run --enable-api -m PATH/TO/MODEL

The default port of the NLU server is 5005; it can be changed with the -p flag in the above command. You can then request predictions from your model at the /model/parse endpoint. To do this, run:

curl localhost:5005/model/parse -d '{"text":"I am mohit saini"}'

The response from the API will look something like this:

{"intent":{"name":"enter_data","confidence":0.9997693300247192},
"entities":[{"entity":"name","start":5,"end":16,"value":"mohit saini","extractor":"DIETClassifier"}],
"intent_ranking":[{"name":"enter_data","confidence":0.9997693300247192},{"name":"ask_how_contribute","confidence":0.0001140582753578201},{"name":"switch","confidence":0.00005040861287852749},{"name":"ask_which_events","confidence":0.00004123761027585715},{"name":"restart","confidence":0.000025004552298923954}],
"text":"I am mohit saini"}

As you can see, the DIET classifier identified my name and classified the intent as enter_data with high confidence.
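If you would rather query the server from code than from curl, here is a minimal Python sketch using the requests library, assuming the server is running on the default localhost:5005:

import requests

# send the user message to the NLU server's parse endpoint
response = requests.post(
    "http://localhost:5005/model/parse",
    json={"text": "I am mohit saini"},
)
result = response.json()

# print the top intent and any extracted entities
print(result["intent"]["name"], result["intent"]["confidence"])
for entity in result.get("entities", []):
    print(entity["entity"], "->", entity["value"])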

Dockerfile for the NLU Server:

You can use the following simple Dockerfile to containerize the NLU server. Make sure the model path in the Dockerfile matches the name of your trained model.

FROM python:3.6
RUN useradd -m dockeruser
ADD . /home/dockeruser
WORKDIR /home/dockeruser
RUN pip install --trusted-host pypi.python.org -r requirements.txt && chown dockeruser.dockeruser /home/dockeruser -R
USER dockeruser
EXPOSE 5005
CMD ["rasa", "run", "--enable-api", "-m", "PATH/TO/MODEL"]
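With the Dockerfile in place, building and running the container looks like this (the image tag rasa-nlu-server is just illustrative):

# build the image and map the NLU server's default port
docker build -t rasa-nlu-server .
docker run -p 5005:5005 rasa-nlu-server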

For reference, I have created this repository with a trained Rasa NLU model: https://github.com/sainimohit23/rasa-demo. The training data is taken from Sara, an official Rasa demo bot.
