Predicting Emotions Using EEG Data with Recurrent Neural Networks

An explanation of how RNNs work and how I coded a program that can predict emotional states using EEG data

Mir Ali Zain
Geek Culture
11 min read · May 6, 2021


Source: JHU Hub

What is EEG?

Electroencephalography (EEG) is a non-invasive method that allows us to collect and record the electrical activity of our brain.

An electroencephalogram is collected by placing electrodes on the surface of the scalp to measure the electric potentials generated by neurons, more specifically the postsynaptic potentials. However, it is only when clusters of neurons fire together that they provide enough signal to be detected from the scalp using an EEG.

But what is actually being recorded is the difference in voltage between the placements of at least two electrodes. These voltage differences should be recorded simultaneously so that we can better understand and interpret an event-related potential, which is the measured brain response that is the direct result of a specific sensory, cognitive, or motor event.

The EEG data collected consists of the rhythmic activity of the brain, which reflects the neural oscillations that take place within it. These neural oscillations are driven by interactions between neurons and occur within specific frequency bands, which include delta, theta, alpha, beta, and gamma. Studies have found associations between these rhythms and different brain states.

The goal for this project was to find out if we could use machine learning to discern a positive, neutral, or negative emotional state based on one’s brainwaves.

A Muse EEG headband was used to collect the data, which took the readings from the TP9, AF7, AF8, and TP10 placements via dry electrodes from two subjects (1 male, 1 female) for 3 minutes per state: positive, neutral (resting state), negative. The following movie scenes were used as a stimulus to evoke positive and negative emotions:

  1. Marley and Me — Negative (Twentieth Century Fox)
    Death Scene
  2. Up — Negative (Walt Disney Pictures)
    Opening Death Scene
  3. My Girl — Negative (Imagine Entertainment)
    Funeral Scene
  4. La La Land — Positive (Summit Entertainment)
    Opening musical number
  5. Slow Life — Positive (BioQuest Studios)
    Nature timelapse
  6. Funny Dogs — Positive (MashupZone)
    Funny dog clips

What is an RNN?

Although you may not know exactly what a recurrent neural network (RNN) is, it is highly probable that you use applications that leverage RNNs constantly. Examples include Siri, which uses RNNs for speech recognition, the predictive-text feature on your phone’s keyboard, and even stock price prediction tools.

But what is a recurrent neural network? A recurrent neural network, or RNN, is a type of artificial neural network designed to recognize the sequential characteristics and patterns in a dataset, which it can then use to predict the next likely scenario.

To understand the concept behind RNNs, let’s take a look at a simple example. Picture a snapshot of a ball flying through the air, with a reference to time (t = 3). Can you predict the direction in which the ball will continue to fly using only that single snapshot? You can take a guess, but it would be a completely random guess, no better than any other.

However, suppose I gave you a few more snapshots in succession, each with a reference to time. Do you think you’d be able to make a better guess? Hopefully, your answer is yes, and that’s because now you have knowledge of where the ball was and where it has been going, and consequently, you also have enough data to predict where it will go next. This is what we call sequential data, where the order of the data matters, i.e. the current position of the ball depends on the ball’s previous positions. This is just one example of sequential data; others include audio, strings of text, and even EEG data.

So let’s take a look at the more technical side: how are RNNs structured and designed to be able to do this? Let’s compare them to a regular neural network, otherwise known as a feed-forward neural network. Feed-forward neural networks are composed of an input, hidden, and output layer, and information is passed from each layer to the next. But how can we redesign this so that it’s able to account for previous information as well as the current information? This is achieved by adding a loop to the neural network to pass previous information forward, and that’s essentially what an RNN does. An RNN has a looping mechanism that allows information to re-enter the hidden layer; this information is called the hidden state, and it is a representation of all the previous inputs.

RNN (left) vs feed-forward NN (right). Source: IBM
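To make the looping idea concrete, here is a minimal sketch (toy NumPy code with made-up sizes, not the project’s model) of what a vanilla RNN computes at each time step: the new hidden state depends on both the current input and the previous hidden state.

```python
import numpy as np

def rnn_forward(inputs, hidden_size=8):
    """Run a toy vanilla RNN over a sequence and return the final hidden state."""
    input_size = inputs.shape[1]
    rng = np.random.default_rng(0)
    W_x = rng.normal(size=(hidden_size, input_size)) * 0.1  # input-to-hidden weights
    W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # hidden-to-hidden (the "loop")
    b = np.zeros(hidden_size)

    h = np.zeros(hidden_size)   # hidden state: a summary of everything seen so far
    for x_t in inputs:          # step through the sequence in order
        h = np.tanh(W_x @ x_t + W_h @ h + b)  # new state depends on input AND previous state
    return h

sequence = np.random.default_rng(1).normal(size=(5, 3))  # 5 time steps, 3 features each
print(rnn_forward(sequence))
```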

However, RNNs suffer from short-term memory, caused by the infamous vanishing gradient problem, which is predominant in neural networks that use gradient-based learning methods and backpropagation. Let’s take a look at how this happens:

The first step to training a feed-forward neural network is a forward pass to make a prediction. This prediction is then compared to the ground truth using a loss function, which returns an error value. This error value is an indication of how badly the neural network has performed, and it is used to perform backpropagation, which calculates the gradients of each node of the neural network, starting from the nodes of the layers closest to the output layer and working its way back to the first hidden layer.

A gradient is a value that measures how much the error changes with respect to a change in a weight; it is used to adjust the network’s internal weights, allowing the network to learn. The greater the gradient, the bigger the adjustment and the faster the model can learn, and here is where the vanishing gradient problem comes in.

During backpropagation, each node calculates its gradient using the gradients of the nodes that came before it in the backward pass, i.e. those closer to the output layer. So if those gradients are small, the current gradient will be even smaller. This has a domino effect on the rest of the network and causes the gradients to vanish, hence the name: the vanishing gradient problem. And as the gradient gets smaller and smaller, the changes to the weights also get smaller and smaller, meaning the model learns slower and slower, doing little to reduce the error value produced by the loss function.

Now, bringing it back to RNNs: RNNs use a form of backpropagation called ‘backpropagation through time’, which is essentially backpropagation tailored for RNNs by treating each loop through time as a hidden layer. Here, the gradients shrink exponentially as they propagate back through each loop. Due to the vanishing gradients, the RNN is unable to learn dependencies from loops further back, meaning that if the sequence is long enough, there is a chance the earlier segments of the EEG data are not considered when making a prediction, giving our model short-term memory.
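As a rough back-of-the-envelope illustration (toy numbers only, not the actual model), here is how quickly a gradient shrinks when every step of backpropagation through time scales it by a factor smaller than one:

```python
# Toy illustration: if each of 20 time steps scales the gradient by ~0.5,
# the signal reaching the earliest steps is vanishingly small.
gradient = 1.0
local_factor = 0.5  # pretend every step contributes this factor during backprop
for step in range(20):
    gradient *= local_factor
print(f"gradient after 20 steps: {gradient:.2e}")  # ~9.5e-07
```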

Solution: GRU — What is a GRU?

So what can we do about the RNN’s short-term memory? To address this problem, two types of specialized RNNs were created: LSTM (Long Short Term Memory) and GRU (Gated Recurrent Unit).

These work just like RNNs; however, they employ mechanisms called ‘gates’ which allow them to learn long-term dependencies that a regular RNN cannot. These ‘gates’ are essentially different tensor operations that can learn which information to add to or remove from the hidden state, and because of this ability, short-term memory is much less of an issue for them.
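To give a feel for what these gates actually compute, here is a minimal NumPy sketch of a single GRU step, following the common update-gate/reset-gate formulation (the weight names and sizes are made up for illustration, and biases are omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step: the gates decide what to keep from h_prev and what to add."""
    W_z, U_z, W_r, U_r, W_h, U_h = params
    z = sigmoid(W_z @ x_t + U_z @ h_prev)             # update gate: how much to refresh
    r = sigmoid(W_r @ x_t + U_r @ h_prev)             # reset gate: how much past to use
    h_cand = np.tanh(W_h @ x_t + U_h @ (r * h_prev))  # candidate hidden state
    return (1 - z) * h_prev + z * h_cand              # blend old state with new candidate

# Tiny example with random weights: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
params = [rng.normal(size=s) * 0.1
          for s in [(4, 3), (4, 4), (4, 3), (4, 4), (4, 3), (4, 4)]]
h = gru_step(rng.normal(size=3), np.zeros(4), params)
print(h)
```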

Building the RNN to predict emotions using EEG

Importing Libraries

Alright so now that you know what EEG and RNNs are, let’s take a look at the code to understand how we can put these two together so that we’re able to predict the emotional states of a subject given their EEG readings.

Importing libraries

First off, we’re going to import the libraries that we’re going to need for this program; a sketch of the import block follows the list below.

  • Numpy and Pandas: Used for data manipulation and handling
  • Matplotlib and Seaborn: Used for data visualization so that we can better understand the data and familiarize ourselves with what we’re working with
  • Tensorflow and train_test_split: Used for the machine learning itself and preparing the data
  • confusion_matrix and classification_report: Used to check the model’s performance
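Something like this:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
```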

Understanding the data

Loading and displaying the data

So if we have a look at the data here, we can see a label column towards the very end. This represents the emotional state of the subject and is given as either neutral, negative, or positive, and it is what we’ll be trying to predict with our recurrent neural network.
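Loading and displaying the data might look roughly like this (the CSV filename here is a placeholder; use whatever your copy of the dataset is called):

```python
# Load the EEG dataset; 'emotions.csv' is a placeholder filename
data = pd.read_csv('emotions.csv')
print(data.head())   # the 'label' column sits at the very end
print(data.shape)
```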

Finding value counts of the label column

Then we call the value_counts function on our label column. This is used to look at the class distribution of our label to ensure that there isn’t an imbalance in the number of samples per class, which could affect the training of our neural network. As you can see, the classes are all fairly balanced, ranging between 708 and 716 samples each.
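A quick sketch of that check:

```python
# Look at the class distribution of the target column
print(data['label'].value_counts())
# The three classes should be roughly balanced (around 708-716 samples each)
```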

Preprocessing the data

Enumerating classes

Next up we have the preprocessing function; this is where we prepare the data to be passed through our neural network.

The neutral, negative, and positive classes we have in the label column are an example of what we call categorical data, which is defined as ‘types of data which may be divided into groups’.

So what we have to do is enumerate each of the classes so that the computer is able to understand them. This is done using the dictionary ‘label_mapping’, which takes each of the values from the label column and replaces it with the corresponding number defined in the dictionary. Essentially, every negative will be replaced with 0, every neutral with 1, and every positive with 2.
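A sketch of that step (the exact spelling of the label strings should match your copy of the dataset):

```python
# Map the text labels to integers the network can work with
label_mapping = {'NEGATIVE': 0, 'NEUTRAL': 1, 'POSITIVE': 2}  # adjust to your label spelling
data['label'] = data['label'].replace(label_mapping)
```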

Preprocessing

Now we’ll be detaching the label column from the rest of the data, as the values of the label column are what we’re trying to predict with our RNN. The label column will be stored in y, and the remaining data in X. Think of the label column (y) as the answer sheet to a quiz (X): you wouldn’t give a student the answer sheet while they’re taking the quiz, or else they’d just memorize the answers rather than learning how to work them out. The answer sheet is used after the quiz to compare the student’s answers with the actual answers and make improvements.

Our next move is to split the data into training and testing groups. With train_size set to 0.7, our training group will consist of 70% of all the data. This function also automatically shuffles the data for you, and that’s why we set a random_state of 123, to ensure that the data is shuffled in exactly the same way every time we run this. Sticking with the quiz analogy, X_train and y_train are the practice quizzes and their answers, whereas X_test and y_test are the actual quiz and its answers.
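Putting the last two steps together, the preprocessing might look roughly like this:

```python
# Separate the "answer sheet" (y) from the "quiz" (X)
y = data['label']
X = data.drop(columns=['label'])

# 70/30 train/test split; random_state keeps the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=123)
```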

Modeling the RNN

And here comes the part where we model our neural network. As previously mentioned, we’re gonna be using a recurrent neural network because we have time-series data, and it’s important that we use an RNN because the values in the data aren’t necessarily independent, but rather tied in with previous and following values.

Structure of RNN

So instead of using dense layers as you would in a feed-forward neural network, we’re gonna be using a GRU. We’ll be using 256 units and have return_sequences set to True. Because return_sequences is turned on, the layer gives us more data, outputting the hidden state at every time step as a 2-dimensional array rather than just the final state, and this often improves performance simply because there is more information to work with. However, because it now outputs a 2-dimensional array, we’re gonna have to flatten it into one long 1-dimensional vector.

One more thing I need to mention is that the GRU layer requires its input to be 3-dimensional. The reason for that is that each piece of data that comes in is often encoded as a vector itself, so a third dimension is needed. To account for that, we’re going to use an expand_dims layer, which takes the inputs and expands their dimensions along axis 2, giving them a third dimension. This expand_dims layer is passed into the GRU, then the output from the GRU is passed to flatten, and the output from flatten goes into outputs, which is the output layer of our RNN.
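A sketch of the model under those assumptions (the Lambda layer performing expand_dims and the softmax activation on the output layer are my assumptions; they pair with the loss used below):

```python
inputs = tf.keras.Input(shape=(X_train.shape[1],))  # one row of EEG features per sample

# GRU expects 3-D input (batch, time steps, features), so expand along axis 2
expand_dims = tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=2))(inputs)

gru = tf.keras.layers.GRU(256, return_sequences=True)(expand_dims)
flatten = tf.keras.layers.Flatten()(gru)
outputs = tf.keras.layers.Dense(3, activation='softmax')(flatten)  # 3 emotional states

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.summary()
```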

Compiling the model

This is where we compile the model. We use an adam optimizer, along with sparse_categorical_crossentropy, which is best suited for multi-class classification problems like the one we have. Once we train the model, we store its result in history. We pass through X_train and y_train, with a validation split of 20%, a batch size of 32, which is the number of samples to work through before updating the internal model parameters, and 50 epochs, which is the number of times the entire training dataset will be passed through the model. We won’t actually need this many epochs, because we’ll be employing the early stopping callback. This allows the model to look at the validation loss after each epoch and evaluate whether the loss is improving; if it doesn’t improve for a specified number of epochs, which we’ve set to 5 here, training is stopped and the weights from the best epoch are restored.
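Roughly, the compile-and-fit step looks like this (tracking accuracy as a metric is my addition):

```python
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)

# Stop training once validation loss hasn't improved for 5 epochs
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    validation_split=0.2,
    batch_size=32,
    epochs=50,
    callbacks=[early_stopping],
)
```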

Results

Code for the confusion matrix and classification report
Confusion matrix and classification report
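A sketch of how the test-set predictions can be turned into a classification report and a confusion-matrix heatmap:

```python
# Predict the class with the highest probability for each test sample
y_pred = np.argmax(model.predict(X_test), axis=1)

print(classification_report(y_test, y_pred))

# Visualize the confusion matrix as a heatmap
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d',
            xticklabels=list(label_mapping), yticklabels=list(label_mapping))
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
```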

That’s all folks! Hope you all enjoyed, and if you did leave a few claps and follow! Feel free to send in any questions you may have. The code and the dataset can be found on my GitHub!

Hi, I’m Mir Ali, I’m a 17-year old ML and neurotech developer and innovator at The Knowledge Society (TKS) — leveraging emerging technologies to solve the world’s biggest problems. Join me on my journey as a writer as I develop my knowledge and skills by working on fascinating projects to impact billions in the future.
