What do I do with my emotions?

Emotion Detection with Python

Konstantina Andronikou

Published in Cmotions, Nov 1, 2022

This notebook is adapted from a notebook created by Piek Vossen.

Introduction to Emotion Detection

Our daily decisions are strongly influenced by our emotions. These strong feelings are a crucial element in the human experience of our daily lives. When something makes us happy, we often repeat it, but when something makes us feel anger or sadness, we avoid it (García, 2019). We can identify the emotions by extracting subjective information from textual sources like reviews, recommendations, posts on social media, transcribed conversations, etc. This can be done with the help of Natural Language Processing (NLP) tools. NLP uses computational linguistic techniques to help machines understand and generate human languages in the form of texts or speech.

What is Emotion Detection?

With the help of Natural Language Processing and Machine Learning, we can observe the emotional behavior of an individual in a text. Textual indicators and verbal signals point to small behavioral cues that together represent the overall emotion. The primary goal of this task is to analyze and understand the emotional state of an author. Emotion Detection can be considered a broader form of sentiment analysis. Sentiment analysis characterizes a text in terms of polarity (positive, negative, or neutral), whereas emotion detection is a more complex procedure that identifies a wider spectrum of human emotions, such as happiness, fear, and anger.

How does Emotion Detection work?

Brief History Moment

Textual data can contain several associated emotions. The first approach to detecting emotional states was introduced as ‘affective computing’ by Picard in 1997. This concept proposed a role for emotions in human-computer interaction (Shivhare and Khethawat, 2012).

Approaches

There are multiple approaches to analyzing a text in terms of emotion. If you would like a better overview of how to implement an emotion detection task, the paper by Canales and Martínez-Barco (2014) is the way to go. For this task, the techniques can be divided into lexicon-based and machine learning based. Some of these methods are the following:

Keyword spotting: This is a simple lexicon-based technique in which specific keywords within the data are associated with an emotional state. Take the sentence “The dog at the park looked scary”: the keyword ‘scary’ hints at the emotional state, which is ‘fear’. Keywords are grouped under emotion labels, and each text is classified by the emotion of the keywords it contains.
Machine Learning Based: With a pre-trained emotion classifier model, we can easily implement an emotion detection task. Machine learning approaches learn from labelled training data which linguistic features of a text signal which emotion, and then use those features to detect emotions in new texts.
Hybrid approach: Both of the previous methods can be combined to generate a hybrid approach.
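
To make the keyword spotting idea concrete, here is a minimal sketch. The lexicon and the function name spot_emotion are hypothetical and only serve as an illustration; a real lexicon would be far larger.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (keyword -> emotion label), for illustration only.
EMOTION_LEXICON = {
    "scary": "fear",
    "afraid": "fear",
    "happy": "joy",
    "delighted": "joy",
    "furious": "anger",
    "crying": "sadness",
}


def spot_emotion(text, lexicon=EMOTION_LEXICON):
    """Return the emotion whose keywords occur most often, or None if none match."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Count one vote per keyword occurrence for the emotion it belongs to.
    votes = Counter(lexicon[token] for token in tokens if token in lexicon)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(spot_emotion("The dog at the park looked scary"))  # fear
```

A text without any keyword from the lexicon gets no label at all, which is the main weakness of this technique and one reason to look at machine learning based approaches.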

Implementation

For this notebook we implement a machine learning based approach: we train a Support Vector Machine (SVM) classifier on labelled data.

Support Vector Machines (SVMs)
An SVM is a type of large-margin classifier whose objective is to find a decision boundary between two classes that is as far away as possible from any point in the training data. It uses a function called a kernel to map a space of data points in which the data is not linearly separable onto a new space in which it is (Mullen and Collier, 2004).

If you would like a better understanding of this classifier, a great explanation is provided by Burges (1998).

Step 0: Loading the data and relevant packages

The first step of the emotion detection task is to load the data and the packages needed for the pre-processing steps. The data used for this notebook is an open-source dataset from Hugging Face. This dataset contains English Twitter messages labelled with six basic emotions: anger, fear, joy, love, sadness, and surprise.

If this is the first time you are using the datasets package, you need to install it first.
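
A minimal sketch of the installation and loading step. The dataset id dair-ai/emotion and the helper name load_emotion_splits are assumptions; replace the id with the dataset you actually use.

```python
# Install the package once if it is missing:  pip install datasets


def load_emotion_splits(dataset_id="dair-ai/emotion"):
    """Download the dataset and return its train, test and validation splits.

    The default dataset id is an assumption, not taken from the article.
    """
    # Imported inside the function so that defining it does not require the package.
    from datasets import load_dataset

    dataset = load_dataset(dataset_id)
    return dataset["train"], dataset["test"], dataset["validation"]


# Example call (downloads the data on first use):
# train_data, test_data, validation_data = load_emotion_splits()
```

The call is commented out because it needs an internet connection the first time it runs.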

Step 1: Reviewing and preparing the data

Text data is unstructured, and we need to process it into structured representations if we want to extract meaningful information. The common idea behind all NLP tools is that they transform text in some meaningful way. The given data is already split into three parts: train, test, and validation. The following steps prepare the data for the task.

1.1 Separating the data

The following step separates the data into three different variables: train, test and validation. For training our classifier we are only using the train_data.

1.2 Dataframe

A DataFrame is a two-dimensional, table-like data structure with rows and columns. Creating a DataFrame makes the data easier to inspect and understand. Moreover, the DataFrame constructor accepts many different structures as input. The package used to create a DataFrame is pandas. This procedure is done for both the train and test data.
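
As an illustration, the sketch below builds DataFrames from a few made-up records. The column names text and label are assumed to match the dataset; the records themselves are hypothetical.

```python
import pandas as pd

# Hypothetical records shaped like the dataset rows: a text and a numeric label.
train_records = [
    {"text": "i feel so lonely today", "label": 0},
    {"text": "i am thrilled about the trip", "label": 1},
]
test_records = [
    {"text": "i was terrified of the storm", "label": 4},
]

# A list of dictionaries is one of the many inputs the DataFrame constructor accepts.
train_df = pd.DataFrame(train_records)
test_df = pd.DataFrame(test_records)

print(train_df.head())
print(train_df.shape)
```

If the data comes from the Hugging Face datasets package, each split can also be converted directly with its to_pandas() method.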

1.3 Adjusting Labels

The emotion labels provided in the original dataset are numeric values. For example, the number '0' represents the emotion 'sadness'. For a better understanding of the data, we change the emotion labels from numbers to strings, for example from '0' to 'sadness'. This procedure is done for both the train and test data.
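
The relabelling can be sketched with a dictionary and the pandas map method. Only the pair 0 and 'sadness' is given above; the rest of the mapping below is an assumption and should be checked against the label names that ship with the dataset.

```python
import pandas as pd

# Assumed id-to-name mapping; only 0 -> 'sadness' is stated in the text.
ID_TO_EMOTION = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise",
}

# Hypothetical rows standing in for the real training data.
train_df = pd.DataFrame(
    {
        "text": ["i feel so lonely", "i am thrilled", "i was terrified"],
        "label": [0, 1, 4],
    }
)

# Replace every numeric label with its emotion name.
train_df["label"] = train_df["label"].map(ID_TO_EMOTION)
print(train_df)
```

The same line can be applied to the test DataFrame.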

1.4 Statistics of the data

By reviewing the data used for our task we have a better understanding of the dataset as well as the labels used. The following cell provides some information concerning the data we are using.
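
A few pandas calls already give a useful overview. The sketch below uses hypothetical rows; on the real data the same calls show how unevenly the emotions are distributed.

```python
import pandas as pd

# Hypothetical rows standing in for the real training data.
train_df = pd.DataFrame(
    {
        "text": [
            "i feel great today",
            "this made my day",
            "i miss my old friends so much",
            "i am scared of the dark",
            "what a lovely surprise party",
        ],
        "label": ["joy", "joy", "sadness", "fear", "joy"],
    }
)

# Number of texts per emotion, and the same as a share of the whole set.
label_counts = train_df["label"].value_counts()
label_share = train_df["label"].value_counts(normalize=True).round(2)

# Number of whitespace-separated tokens in each text.
tokens_per_text = train_df["text"].str.split().str.len()

print(label_counts)
print(label_share)
print(tokens_per_text.describe())
```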

Step 2: Input preparation

2.1 Instances

The following step creates a list of instances (the texts) and a list of labels for both the train and test data, so that the classifier can be trained and evaluated.
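
A minimal sketch of this step, assuming DataFrames with the columns text and label; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical DataFrames standing in for the real train and test data.
train_df = pd.DataFrame(
    {"text": ["i feel great", "i feel awful"], "label": ["joy", "sadness"]}
)
test_df = pd.DataFrame({"text": ["i am so scared"], "label": ["fear"]})

# Instances are the raw texts; labels are the emotions to predict.
train_instances = train_df["text"].tolist()
train_labels = train_df["label"].tolist()
test_instances = test_df["text"].tolist()
test_labels = test_df["label"].tolist()

print(len(train_instances), len(train_labels))
print(len(test_instances), len(test_labels))
```

Each list of instances must be exactly as long as its list of labels, since the classifier pairs them by position.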

2.2 Converting the text to numerical values

This step builds a vocabulary of all unique tokenized words and represents each text as a vector whose length equals the size of that vocabulary. In other words, it converts the list of texts to a bag-of-words vector representation.

Bag-of-words (BOW): This approach counts how often each word appears in each text and stores that frequency in the text's vector.

What does TF-IDF stand for?

This abbreviation stands for: term frequency-inverse document frequency.

What is the function of this formula?

This formula measures how important a word is to a document within a collection of documents. A word scores high when it appears often in one document but in few of the others.
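
Both representations can be sketched with scikit-learn. The use of CountVectorizer and TfidfTransformer is an assumption about the tooling, and the texts are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical texts standing in for the real instances.
train_instances = [
    "i feel happy today",
    "i feel sad and lonely",
    "happy happy day",
]
test_instances = ["i feel lonely"]

# Bag-of-words: learn the vocabulary on the training texts only,
# then count the words of every text against that vocabulary.
count_vectorizer = CountVectorizer()
train_counts = count_vectorizer.fit_transform(train_instances)
test_counts = count_vectorizer.transform(test_instances)

# TF-IDF: reweight the raw counts so that words occurring in many texts count less.
tfidf_transformer = TfidfTransformer()
train_tfidf = tfidf_transformer.fit_transform(train_counts)
test_tfidf = tfidf_transformer.transform(test_counts)

# With default settings, single-character tokens such as "i" are ignored.
print(sorted(count_vectorizer.vocabulary_))
print(train_counts.toarray())
print(train_tfidf.toarray().round(2))
```

The vectorizer is fitted on the training texts only and merely applied to the test texts, so that the test data does not influence the vocabulary.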

Step 3: Training the model
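
A minimal sketch of the training step, assuming scikit-learn's LinearSVC as the SVM implementation and TF-IDF vectors as input. The texts, labels, and parameter choices are hypothetical and not necessarily those of the original notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical training data standing in for the real instances and labels.
train_instances = [
    "i am so happy today",
    "what a happy happy day",
    "feeling happy and cheerful",
    "i am so sad today",
    "what a sad sad day",
    "feeling sad and gloomy",
]
train_labels = ["joy", "joy", "joy", "sadness", "sadness", "sadness"]

# Convert the texts to TF-IDF vectors.
vectorizer = TfidfVectorizer()
train_vectors = vectorizer.fit_transform(train_instances)

# Fit a linear SVM on the vectors and their labels.
classifier = LinearSVC(random_state=0)
classifier.fit(train_vectors, train_labels)

# New texts must be transformed with the same, already fitted vectorizer.
new_vectors = vectorizer.transform(["she looks happy"])
predictions = classifier.predict(new_vectors)
print(predictions)
```

With more than two emotions, LinearSVC trains one classifier per emotion against all others, so the same code works on the six-label data.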

Step 4: Results — Evaluation

The classifier is evaluated with the following measures:

Precision: the proportion of instances predicted as a given class that actually belong to that class.
Recall: the ratio of the number of positive samples correctly classified as positive to the total number of positive samples.
F-score: the harmonic mean of precision and recall, which summarizes both in a single number.
Confusion Matrix: a table of true labels against predicted labels that shows, for a given set of test data, which classes the classifier confuses with each other.
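
These measures can be computed with scikit-learn, as sketched below. The true and predicted labels are made up for illustration and are not the results of this notebook.

```python
from sklearn.metrics import classification_report, confusion_matrix

LABELS = ["joy", "sadness", "surprise"]

# Hypothetical true and predicted labels for six test texts.
test_labels = ["joy", "joy", "sadness", "sadness", "surprise", "joy"]
predictions = ["joy", "joy", "joy", "sadness", "joy", "joy"]

# Precision, recall and F-score per label; zero_division=0 avoids a warning
# for labels that are never predicted.
report = classification_report(
    test_labels, predictions, labels=LABELS, zero_division=0
)

# Rows are the true labels, columns the predicted labels, both in LABELS order.
matrix = confusion_matrix(test_labels, predictions, labels=LABELS)

print(report)
print(matrix)
```

In the matrix, correct predictions sit on the diagonal, and every other cell shows how often one emotion was mistaken for another.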

Discussion

If we take a closer look at the results, we can see that the label the classifier picked up most often was ‘joy’, which was identified correctly 262 times. ‘Sadness’, on the other hand, was wrongly labelled as ‘joy’ 202 times. This is an interesting case, as these two emotions are complete opposites. Another interesting result is ‘surprise’, which was never identified correctly.

If you are still unsure whether your classifier is working, you can always test its performance on different, unseen data.

Closing Notes

This notebook aimed to introduce an emotion detection task and to highlight the importance of identifying emotions in textual data. By extracting this subjective information from textual sources, we can identify the emotions expressed in an author’s text and respond appropriately. This task can be beneficial in many fields, such as education, politics, customer experience, and employee satisfaction. With this blog we show the value of an NLP task such as emotion detection.

If you would like to read more about Emotion Detection, please have a look at our article ‘What do I do with my emotions?’. If you are interested in NLP tasks in general, then you are in the right place! Take a look at our series Natural Language Processing.

Want to read more about the cool stuff we do at Cmotions and The Analytics Lab? Check out our blogs, projects and videos! Also check out our Medium page for more interesting blogs!