Intent Detection using Sequence Models
Holler AI Lab at Holler Technologies
Introduction
Text classification in Natural Language Processing (NLP) is the task of assigning a class from a set of predefined classes to a given piece of text (a document, paragraph, or sentence). Popular examples include spam classification (classifying an email as spam or non-spam), sentiment classification (classifying the polarity of a text as positive, negative, or neutral), intent detection (identifying what the user wants to do, such as play music, book a restaurant, or book a flight), and news categorization (assigning news articles to categories such as business, sports, politics, science, and tech). If there are two classes, the task is called binary text classification; if there are more than two, it is called multi-class text classification.
In this article, we focus on the problem of intent detection. We first introduce the task of intent detection and the dataset used. Next, we explain the motivation for using sequence models and show how they can be used to solve the task. We develop our solution in Python using the pandas, TensorFlow Keras, and scikit-learn libraries. In the interest of time, we won’t delve into the theory behind the machine learning concepts involved; instead, we focus on how to use them, with plenty of code.
Intent Detection
Intent detection aims to recognize the intention behind a user query, i.e., the action the user wants to perform. For example, in “Play music on YouTube Music” the intent is “play music”. The user query is a single sentence and can originate from a written or spoken utterance. Intent detection is a crucial task in Natural Language Understanding and is usually modeled as a classification problem: given an input sentence, the objective is to predict its intent from a set of predefined intents.
Dataset
To solve this problem, we need a dataset of utterances labeled with their intents. For this article, we use the Snips dataset (Coucke et al., 2018), which is widely used for benchmarking intent detection. It consists of 14,484 utterances across 7 intents. The dataset is stored in a JSON file; we start by loading it into a pandas DataFrame and plotting the class distribution.
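A minimal sketch of the loading step is shown below. The file name `snips.json` and the record layout (a JSON array of objects with `text` and `intent` keys) are assumptions for illustration, since the article does not describe how its copy of Snips is stored.

```python
import pandas as pd

# Hypothetical path: the article does not give the file name or its schema.
DATA_PATH = "snips.json"


def load_snips(path):
    """Load utterances and intent labels into a DataFrame.

    Assumes the JSON file is an array of objects such as
    {"text": "Play music on YouTube Music", "intent": "play_music"}.
    """
    df = pd.read_json(path, orient="records")
    return df[["text", "intent"]]


def class_distribution(df):
    """Number of utterances per intent, most frequent first."""
    return df["intent"].value_counts()
```

Calling `class_distribution(load_snips(DATA_PATH)).plot.bar()` would draw a per-intent bar chart similar to Figure 1; pandas plotting requires matplotlib to be installed.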
Data Preparation
As shown in Figure 1, the class distribution is balanced. If you’re working with a dataset that has severe class imbalance, it might help to balance it first; the various techniques for tackling class imbalance are out of scope for this article. Next, we split the dataset into train and test sets so that we can train a machine learning model and evaluate its performance. We use scikit-learn to split the data into 80% train and 20% test, and we set the RANDOM_STATE variable so that the results can be replicated in the future.
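The split can be sketched with scikit-learn’s `train_test_split`, assuming a DataFrame with hypothetical `text` and `intent` columns. The article does not give the value of RANDOM_STATE, so 42 below is a placeholder.

```python
from sklearn.model_selection import train_test_split

# Placeholder seed: the value used in the article is not stated.
RANDOM_STATE = 42
TEST_SIZE = 0.2  # 80% train / 20% test


def split_dataset(df):
    """Split utterances (X) and intent labels (y) into train and test sets."""
    X_train, X_test, y_train, y_test = train_test_split(
        df["text"],
        df["intent"],
        test_size=TEST_SIZE,
        random_state=RANDOM_STATE,
    )
    return X_train, X_test, y_train, y_test
```

scikit-learn rounds a fractional test size up, so 20% of 14,484 utterances (2,896.8) becomes 2,897 test examples. Passing `stratify=df["intent"]` would additionally keep the intent proportions identical in both splits; the article does not say whether this was done.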
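The resulting text and label arrays for the train and test sets have shapes (11587,), (2897,), (11587,), and (2897,): 11,587 training and 2,897 test utterances.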
Sequence Models
Sequence models deal with supervised learning tasks in which the model input, the model output, or both are sequences. Sequences can be text, audio, video, temporal data, or any other sequential data. Examples of sequence modeling tasks include sentiment classification, music generation, and machine translation. Recent advances in deep learning, particularly in sequence models, have revolutionized NLP, and sequence models have become a dominant paradigm for training models on language understanding tasks. Sequence models such as recurrent neural networks (RNNs) and transformers have consistently achieved state-of-the-art performance on many benchmark NLP tasks. In this article, we use long short-term memory (LSTM), a type of RNN, to solve the task of intent detection.
First, we need to prepare the input text for training. We tokenize the text and convert each utterance into a sequence of integers using the Tokenizer from TensorFlow Keras. Then, we pad the sequences to the same length, since the model expects every input to have the same shape.
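To make these two steps concrete, here is a from-scratch sketch of what they do, written in plain Python and NumPy rather than with Keras itself. It approximates the default behaviour of the Keras Tokenizer (lowercasing, replacing punctuation with spaces, indexing words by frequency starting at 1 so that 0 is free for padding, and dropping unseen words) and of pad_sequences (padding and truncating at the front with zeros). The function names are hypothetical, and details may differ slightly from the library.

```python
from collections import Counter

import numpy as np

# Punctuation set that the Keras Tokenizer strips by default.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
_STRIP = str.maketrans({c: " " for c in FILTERS})


def to_words(text):
    """Lowercase, replace punctuation with spaces, split on whitespace."""
    return text.lower().translate(_STRIP).split()


def build_word_index(texts):
    """Map each word to an integer id, most frequent word first.

    Ids start at 1 so that 0 can be reserved for padding; ties keep
    first-appearance order because Python's sort is stable.
    """
    counts = Counter()
    for text in texts:
        counts.update(to_words(text))
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index):
    """Replace words with ids; words missing from the index are dropped."""
    return [[word_index[w] for w in to_words(t) if w in word_index] for t in texts]


def pad(sequences, maxlen=None):
    """Left-pad (and left-truncate) sequences with zeros to a common length."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    out = np.zeros((len(sequences), maxlen), dtype=np.int32)
    for row, seq in enumerate(sequences):
        seq = seq[-maxlen:]  # keep the last maxlen tokens
        if seq:
            out[row, -len(seq):] = seq
    return out
```

In practice, the padded length (35 in this article) should be fixed from the training data and reused for the test set and for inference, so that every input has the same width.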
After padding, every sequence has length 35, giving arrays of shape (11587, 35) for training and (2897, 35) for testing.
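Next, we convert the intent labels into one-hot vectors: scikit-learn’s LabelEncoder maps each intent name to an integer, and the Keras to_categorical function turns those integers into one-hot vectors.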
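A sketch of this step is shown below, assuming the label arrays come from the train/test split. To keep the example independent of TensorFlow, the one-hot step uses NumPy indexing in place of to_categorical; for integer labels 0 to n-1, both produce the same float32 matrix.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder


def encode_labels(y_train, y_test):
    """Integer-encode intent names, then one-hot encode them.

    The encoder is fitted on the training labels only and reused for the
    test labels, so both share the same class-to-column mapping.
    """
    encoder = LabelEncoder()
    train_ids = encoder.fit_transform(y_train)
    test_ids = encoder.transform(y_test)
    n_classes = len(encoder.classes_)
    # Row i of the identity matrix is the one-hot vector for class i,
    # matching what keras' to_categorical(ids, n_classes) returns.
    identity = np.eye(n_classes, dtype=np.float32)
    return identity[train_ids], identity[test_ids], encoder
```

Keeping the fitted encoder around matters later: its `inverse_transform` method maps predicted class indices back to intent names.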
The resulting label arrays have shapes (11587, 7) and (2897, 7), with one column per intent.
Training a Sequence Model
Now, let’s define and train our model.
LSTM
We define a TensorFlow Keras Sequential model and add layers to it. The first layer is an Embedding layer that represents each word as a vector of fixed length 16. The next layer is an LSTM layer with 16 units and a relu activation. It is followed by a Dense layer with 7 units (one per intent) and a softmax activation for classification. We then compile the model with the adam optimizer and categorical cross-entropy loss, tracking precision, recall, and accuracy as metrics. Finally, we fit the model on the training dataset with a batch size of 32 for 7 epochs. We also hold out 10% of the training data for validation, on which the metrics are evaluated at the end of each epoch; this fraction is specified by VAL_SPLIT.
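Put together, the model described above could be written as the following sketch. The layer sizes, optimizer, loss, metrics, batch size, epochs, and validation split follow the text; VOCAB_SIZE is a placeholder (in practice, the tokenizer’s vocabulary size plus one for the padding id), and the explicit Input layer is an addition so that the model is built before training. It assumes TensorFlow 2.x with the tf.keras API.

```python
import tensorflow as tf

VOCAB_SIZE = 10000  # placeholder: use len(word_index) + 1 from your tokenizer
MAX_LEN = 35        # padded sequence length
NUM_INTENTS = 7     # Snips has 7 intents
EMBED_DIM = 16
LSTM_UNITS = 16
BATCH_SIZE = 32
EPOCHS = 7
VAL_SPLIT = 0.1     # 10% of the training data held out for validation


def build_model(vocab_size=VOCAB_SIZE, max_len=MAX_LEN):
    """Embedding -> LSTM -> softmax classifier, as described in the text."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_len,), dtype="int32"),
        tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=EMBED_DIM),
        tf.keras.layers.LSTM(LSTM_UNITS, activation="relu"),
        tf.keras.layers.Dense(NUM_INTENTS, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=[
            tf.keras.metrics.Precision(name="precision"),
            tf.keras.metrics.Recall(name="recall"),
            "accuracy",
        ],
    )
    return model


def train_model(model, X_train, y_train):
    """Fit on padded sequences and one-hot labels; returns the History object."""
    return model.fit(
        X_train,
        y_train,
        batch_size=BATCH_SIZE,
        epochs=EPOCHS,
        validation_split=VAL_SPLIT,
    )
```

Note that relu inside the LSTM is the article’s choice; the Keras default for the LSTM activation is tanh, which is also one of the conditions for using the fast cuDNN kernel on a GPU.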
As training progresses, precision, recall, and accuracy on the training set increase, and accuracy on the validation data improves in the same way.
Plot Learning Curves
Once we fit the model, we look at the learning curves by plotting the loss function for the training data and the validation data.
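A plotting helper for these curves might look like the following sketch. It assumes `history` is the `History.history` dictionary returned by `model.fit`, which contains `loss` and `val_loss` entries when a validation split is used; the function name is hypothetical.

```python
import matplotlib.pyplot as plt


def plot_learning_curves(history):
    """Plot training vs. validation loss per epoch and return the figure."""
    epochs = range(1, len(history["loss"]) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, history["loss"], label="train")
    ax.plot(epochs, history["val_loss"], label="validation")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.set_title("Learning curves")
    ax.legend()
    return fig
```

With the training sketch above, this would be called as `plot_learning_curves(train_model(model, X_train, y_train).history)`.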
In Figure 2, the training and validation losses decrease steadily and then stabilize, with only a small gap between them. This suggests that the model is neither overfitting nor underfitting and fits the data well.
Evaluation
Performance on Test Data
Now, we evaluate the performance of our model on the test dataset.
On the test set, the model reaches a loss of 0.1192, a precision of 0.9884, a recall of 0.9724, and an accuracy of 0.9821.
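The model achieves an accuracy of 98.21% on our held-out test set, close to the state-of-the-art results reported for Snips, although those are measured on the standard Snips test split rather than on a random split like ours.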
Classification Metrics
We also compute the performance metrics for each class to get a more detailed picture of model performance.
As shown in Figure 3, the model performs well on all intent classes.
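Per-class precision, recall, and F1 can be computed with scikit-learn’s `classification_report`. The sketch below takes the one-hot test labels and the model’s softmax outputs (e.g. from `model.predict`) as inputs; the article does not state exactly how Figure 3 was produced, so treat this as one possible way to obtain those numbers.

```python
import numpy as np
from sklearn.metrics import classification_report


def per_class_report(y_true_onehot, y_prob, class_names, as_dict=False):
    """Per-intent precision, recall, and F1 from one-hot labels and softmax outputs."""
    y_true = np.argmax(y_true_onehot, axis=1)  # one-hot -> class index
    y_pred = np.argmax(y_prob, axis=1)         # most probable class
    return classification_report(
        y_true,
        y_pred,
        labels=list(range(len(class_names))),
        target_names=list(class_names),
        output_dict=as_dict,
        zero_division=0,
    )
```

By default the function returns a printable text table; `as_dict=True` returns the same numbers as a nested dictionary, which is easier to plot. The class names would typically come from the fitted encoder’s `classes_` attribute.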
Inference using a Sequence Model
Finally, we can use our trained model to perform inference.
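Inference has to repeat the training-time preprocessing before calling the model. The sketch below assumes a trained Keras model, the fitted LabelEncoder, and a hypothetical `to_padded` helper that applies the same tokenizer and padding length used during training.

```python
import numpy as np


def predict_intent(query, model, to_padded, label_encoder):
    """Return the intent name predicted for a single user query.

    to_padded: hypothetical callable that turns a list of strings into the
        padded integer array the model was trained on (same tokenizer and
        same padded length).
    label_encoder: the fitted LabelEncoder used to build the training labels.
    """
    x = to_padded([query])                # shape (1, max_len)
    probs = model.predict(x, verbose=0)   # shape (1, n_intents)
    class_id = int(np.argmax(probs, axis=1)[0])
    return label_encoder.inverse_transform([class_id])[0]
```

With Keras, `to_padded` could be `lambda texts: pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=35)`, assuming `tokenizer` is the fitted Tokenizer and 35 is the padded length used in training.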
The model correctly predicts the intent “play_music” for the user query “Play music on YouTube Music”.
Summary
In this article, we learned how to use sequence models for intent detection. We introduced text classification through the example of intent detection, explained the motivation for using sequence models on sequence classification tasks, and demonstrated how to solve intent detection by implementing an LSTM model in Python with TensorFlow Keras.
A Jupyter Notebook containing the code can be found here.
Resources
If you’re interested in learning more, here are some resources:
- https://towardsdatascience.com/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9
- https://machinelearningmastery.com/gentle-introduction-long-short-term-memory-networks-experts/
- https://colah.github.io/posts/2015-08-Understanding-LSTMs/
References
- Coucke, A., Saade, A., Ball, A., Bluche, T., Caulier, A., Leroy, D., Doumouro, C., Gisselbrecht, T., Caltagirone, F., Lavril, T. and Primet, M., 2018. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. arXiv preprint arXiv:1805.10190.
- https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
Thanks for reading! If you have questions or would like us to write about a particular topic, please drop a comment or reach out on my website.
About Holler
Holler is here to make your texts, posts, payments, and DMs more expressive. How? By suggesting the most relevant content, such as animated Stickers and GIFs, right when you need it most in chat.