Deep understanding of written user query

Enhancing chatbots with empathy: emotion and intention extraction.

Samuel Torche
Empathic Labs
7 min read · Aug 26, 2020


This project is part of a collaboration between the HumanTech Institute and Kare Knowledgeware. Kare's product is an automated knowledge-retrieval conversational tool, and the goal of the collaboration is to enhance the customer experience by adding empathy to the system. Empathy involves two steps: first, understanding the user's intention, and second, responding accordingly.

The main objective is to develop a tool that extracts useful information from a user query: its intent (informational vs. emotional) and its sentiment (e.g. happy, upset). A user query enters the system, and the same query exits it with annotations.

Basic principle of the project
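To make the "query in, annotated query out" idea concrete, here is a minimal sketch of what the system's output could look like. The field names and schema are illustrative assumptions; the article does not specify the actual output format.

```python
# Hypothetical annotation schema: a raw query enters, and the same query
# exits wrapped with the intent and emotion the pipeline extracted.
def annotate(query: str, intent: str, emotion: str) -> dict:
    """Attach the extracted labels to a user query."""
    return {
        "text": query,
        "intent": intent,    # "informational" or "emotional"
        "emotion": emotion,  # e.g. "joy", "anger", "sadness", "fear"
    }

result = annotate("The app keeps crashing on startup!", "emotional", "anger")
print(result["intent"])  # emotional
```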

Intention extraction

In the scope of this project, we will limit ourselves to two intentions:

  1. Informational: also called Social Q&A, these represent about 60% of all user requests.
  2. Emotional: mainly composed of attitudes and opinions toward a product or a brand, these represent about 40% of all user requests.

Technologies

The intention extraction module will be developed with the help of Rasa NLU, a Python framework that allows us to easily build chatbots, and in our case, to detect the user’s intention.

Dataset

The dataset for intention extraction should contain samples classified as either informational or emotional. No existing dataset meets these requirements, so we had to create our own from customer support data and classify it manually. The final dataset contains 100 samples for each of the two classes.

Example of informational samples
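Rasa NLU expects its training data in a specific YAML layout, one intent with a block of examples. As a sketch of how the manually labeled samples could be converted, here is a small helper that renders a labeled dictionary into that layout (the sample sentences below are invented for illustration):

```python
def to_rasa_nlu_yaml(samples_by_intent: dict) -> str:
    """Render labeled samples as a Rasa NLU training-data YAML string."""
    lines = ['version: "3.1"', "nlu:"]
    for intent, samples in samples_by_intent.items():
        lines.append(f"- intent: {intent}")
        lines.append("  examples: |")
        for sample in samples:
            lines.append(f"    - {sample}")
    return "\n".join(lines)

data = {
    "informational": ["How do I reset my password?", "Where is my order?"],
    "emotional": ["I am really disappointed with this service."],
}
print(to_rasa_nlu_yaml(data))
```

The resulting string can be written to a file such as `data/nlu.yml` inside the project structure Rasa creates at initialization.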

Implementation

Implementing the intention extraction module with Rasa NLU is straightforward. Once the library is installed, we initialize Rasa so it creates the necessary project structure, then define the intents we want to detect. For each intent, we provide a dataset of relevant examples. The pipeline can then be customized; in our case, the default pipeline is already good to go. Afterward, Rasa builds and saves its model, which we can run as an API.

Intent classification module pipeline using Rasa

Evaluation

The model was evaluated using an 80/20 train/test split: 80% of the 200 samples were used for training and 20% for testing. The model reached 85% accuracy, with 34 of the 40 test samples correctly predicted.
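The split and the accuracy computation can be sketched in a few lines of plain Python (the seeded shuffle is an illustrative choice, not the article's exact procedure):

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle and split samples into train and test partitions."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def accuracy(n_correct, n_total):
    return n_correct / n_total

train, test = train_test_split(list(range(200)))
print(len(train), len(test))  # 160 40
print(accuracy(34, 40))       # 0.85
```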

Adding more samples to the intent classification dataset should improve its performance; 500 samples per class might be a good start.

Some of the model's mistakes concerned ambiguous samples, such as "app doesn't work ever, it won't let me order food. Keeps saying I need to confirm my phone number" being classified as informational when it was labeled emotional. This suggests that a sample can sometimes be both informational and emotional, and that adding a third intention may make sense.

Emotion extraction

First, we can ask ourselves: what is an emotion? There is no scientific consensus on a definition, but we can simplify and define an emotion as an affective state associated with the nervous system. There are multiple classification models; Robert Plutchik's "Wheel of Emotions" and Paul Ekman's six basic emotions are two of the most popular.

Deciphering the emotion in a text is particularly difficult: the lack of voice modulation and facial expressions is a challenge to overcome. Cues such as the number of words, the punctuation, negation, emojis, and the lexicon of the words used (see DepecheMood for more information) can help extract the emotion.
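The cues above can each be turned into a simple feature. Here is a minimal sketch; the negation word list and the emoji character range are simplified assumptions, not the article's exact lexicons:

```python
import re

# Illustrative resources: a tiny negation list and the basic emoticon
# Unicode block. A real pipeline would use much fuller lexicons.
NEGATIONS = {"not", "no", "never", "won't", "doesn't", "can't"}
EMOJI_PATTERN = re.compile("[\U0001F600-\U0001F64F]")

def extract_features(text: str) -> dict:
    """Compute simple surface cues from a text."""
    tokens = text.lower().split()
    return {
        "n_words": len(tokens),
        "n_exclamations": text.count("!"),
        "has_negation": any(t.strip(".,!?") in NEGATIONS for t in tokens),
        "n_emojis": len(EMOJI_PATTERN.findall(text)),
    }

print(extract_features("This app never works!! 😠"))
```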

Technologies

Machine learning libraries such as Keras, TensorFlow, and scikit-learn will be used to develop the emotion extraction module, along with techniques such as word embeddings (see GloVe for more information) and POS tagging.

Dataset

The dataset for emotion extraction should contain samples labeled with their emotion. There are multiple existing datasets for this task; we will use the WASSA 2017 dataset, which contains 7,103 samples classified into four emotions: joy, anger, sadness, and fear.

Examples of samples labeled with their emotion

The last value in the figure above corresponds to the emotional intensity. Sometimes this value is so low that the indicated emotion makes no sense, so we filter these samples and keep only those with an emotional intensity of at least 0.6 (an arbitrary threshold). This leaves us with 2,245 samples: 696 fear, 474 anger, 562 joy, and 513 sadness.

Emotional intensity is too low
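The filtering step is a one-liner. The rows below mimic the WASSA 2017 `(text, emotion, intensity)` format but are invented for illustration:

```python
# Toy rows in the style of WASSA 2017: (text, emotion, intensity).
rows = [
    ("I am shaking with fear", "fear", 0.88),
    ("mildly annoyed I guess", "anger", 0.31),
    ("best day of my life!", "joy", 0.95),
]

MIN_INTENSITY = 0.6  # arbitrary threshold, as discussed above

# Keep only samples whose emotion is expressed strongly enough.
kept = [(text, emotion) for text, emotion, intensity in rows
        if intensity >= MIN_INTENSITY]
print(len(kept))  # 2
```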

Implementation

For emotion extraction, two modules will be built. The first will use deep learning and word embeddings. The second will use a simpler machine learning algorithm with hand-crafted features extracted from the text, such as the number of words, the emojis, and the lexicon of the words used. Both approaches follow the classical machine learning pipeline.

Typical machine learning pipeline

The deep learning approach requires preprocessing of the samples: remove punctuation, convert the text to lower case, remove hashtags and mentions (since the data comes from Twitter), remove stop words, and finally tokenize the remaining text. A bidirectional LSTM will be the main layer, since this architecture can retain context over long sequences using memory cells.
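The preprocessing steps can be sketched in plain Python. Note that hashtags and mentions are stripped before punctuation so the `#` and `@` markers are still present to match on; the stop-word list here is a tiny illustrative stand-in for a full one (e.g. NLTK's):

```python
import re
import string

# Small illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "it", "is", "to", "and", "i", "my"}

def preprocess(tweet: str) -> list:
    """Prepare a tweet for the deep learning model: strip hashtags and
    mentions, lower-case, remove punctuation and stop words, tokenize."""
    tweet = re.sub(r"[#@]\w+", "", tweet)  # hashtags and mentions
    tweet = tweet.lower()
    tweet = tweet.translate(str.maketrans("", "", string.punctuation))
    return [t for t in tweet.split() if t not in STOP_WORDS]

print(preprocess("@support the app crashed AGAIN!! #angry"))
# ['app', 'crashed', 'again']
```

The resulting token lists would then be mapped to embedding indices and fed to the bidirectional LSTM.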

Evaluation

Both approaches were evaluated using an 80/20 train/test split, i.e. 1,796 samples for training and 449 for testing.

The deep learning approach reached an overall accuracy of 80.78%. Joy is the best-predicted class, with 88.36% precision and 93.48% recall. The other three classes show similar results, with a tendency toward better precision than recall. The initial dataset was not perfectly balanced across its four classes, which might explain why sadness and anger are often mispredicted as fear. Balancing the dataset and adding more samples might improve the model's performance.

The hand-crafted features approach used the Random Forest algorithm and reached 59.47% accuracy. Joy is again the best-predicted class, with 72.41% precision and 70.59% recall. The model often predicts fear instead of anger, and anger or sadness instead of fear; if fear and anger were merged into a single emotion, the model would perform significantly better. The same remarks about balancing and enlarging the dataset apply. Of all the hand-crafted features, the lexicon-based features are the most important, followed by the number of words, the punctuation, and the emojis.

Hand-crafted features importances
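As a sketch of this approach, here is a Random Forest trained on a toy feature matrix and queried for its per-feature importances via scikit-learn's `feature_importances_` attribute. The four feature columns and their values are invented for illustration:

```python
from sklearn.ensemble import RandomForestClassifier

# Toy feature matrix; columns are hypothetical hand-crafted features:
# [lexicon score, word count, exclamation count, emoji count].
X = [
    [0.9, 5, 2, 1],   # short, exclamatory, positive lexicon
    [0.1, 25, 0, 0],  # long, flat, negative lexicon
    [0.8, 6, 3, 1],
    [0.2, 22, 1, 0],
]
y = ["joy", "sadness", "joy", "sadness"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

print(clf.predict([[0.85, 4, 2, 1]]))     # likely "joy" on this toy data
print(clf.feature_importances_.round(2))  # one importance per feature column
```

Ranking the entries of `feature_importances_` is how a chart like the one above can be produced.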

It is interesting to note that joy is expressed in fewer words than anger. Fear seems unaffected by the number of words, and sadness decreases steadily up to 20–25 words but spikes suddenly beyond 25 words.

Distribution of emotions according to the number of words, expressed with percentage

If the text contains a negation, it is less likely to be a happy text; if it contains an affirmation, it is less likely to be an angry or sad text. Fear is unaffected by the presence of negation.

Distribution of emotions according to the presence of negation or affirmation, expressed with percentage

There is a strong correlation between exclamation points and joy, but overall punctuation use has only a small impact on the emotion: the differences between global punctuation use and the underlying emotion are small. Exclamation points also correlate with sadness, negatively: they seem to decrease its probability.

Distribution of emotions according to the punctuation, expressed with percentage

Conclusion

The current solution is far from perfect. Two intentions, emotional and informational, are not sufficient: some user queries are ambiguous, and adding a third class for them seems to be the way to go.

The current emotion extraction module also has flaws. The four emotions used (joy, sadness, fear, and anger) lack a neutral class for queries that carry no particular emotion. In addition, the deep learning and hand-crafted features approaches should be merged.

Detecting emotion is particularly important in the context of a chatbot. Nowadays, chatbots feel too much like robots; they lack emotional intelligence (EQ). Correctly understanding the user's emotional state allows the chatbot to adapt itself.

Thanks for reading!

If you want to read more about related topics, I suggest the article explaining Natural Language Understanding for chatbots, or the one by my colleague Charles Perriard about his project aiming at differentiating hateful messages from friendly ones with machine learning techniques.
