How to do Facial Emotion Recognition Using A CNN?

Published in

the ML blog

8 min read · Dec 24, 2018

Hola everyone!

It’s been a long time. So let me start by giving a recap of what I was doing during this time. I moved to the beautiful town of Boulder, Colorado, USA to pursue a Master of Science (MS) degree from the University of Colorado Boulder!

I started working on a project involving human-robot interaction at the Collaborative AI and Robotics Laboratory at my university and have met lots of amazing people in a very short time!

Alright, enough of that, let’s get to the task at hand. This post is meant to answer the question:

How to do Facial Emotion Recognition Using a Convolutional Neural Network?

Before we start with the specifics, let’s start with some basics!

What the f is a convolutional neural network?

Right now, all you need to know is that a Convolutional Neural Network, or CNN as it is popularly called, is made up of two parts:

1. The hidden layers / Feature Extraction Part

convolutions
pooling

2. The classifier part

Architecture of a CNN. — Source: https://www.mathworks.com/videos/introduction-to-deep-learning-what-are-convolutional-neural-networks--1489512765771.html

Alright, but what the f is a convolution?

Convolution is a mathematical operation that combines two functions to produce a third. In a CNN, a small filter (also called a kernel) slides over the input data, and at each position the element-wise products of the filter and the patch underneath it are summed to give one value of the feature map.
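To make the sliding-filter idea concrete, here is a minimal NumPy sketch. The function name, the toy image and the vertical-edge filter are all made up for illustration, and it assumes no padding and a stride of 1. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Patch of the image currently under the filter
            window = image[i:i + kh, j:j + kw]
            # Element-wise product, then sum: one value of the feature map
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

# Toy 4x4 "image": dark on the left, bright on the right
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

feature_map = convolve2d_valid(image, vertical_edge)
print(feature_map.shape)  # (2, 2)
```

A 3x3 filter on a 4x4 input without padding leaves a 2x2 feature map. In a real CNN the filter values are not hand-picked like this; they are learned during training.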

But, you mentioned something called pooling too?

A pooling layer is added after a convolution layer. It progressively shrinks the feature maps, which reduces the number of parameters in the layers that follow and the amount of computation, thereby shortening training time and helping to control overfitting. One such pooling technique is max-pooling, which takes the maximum value in each window. This decreases the feature map size while keeping the most significant information.

Max pooling takes the largest values. — Source: http://cs231n.github.io/convolutional-networks/
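Here is a small NumPy sketch of 2x2 max-pooling with a stride of 2, in case seeing the numbers helps. The helper name and the input values are just an illustration.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max-pooling with stride 2 on a single 2D feature map."""
    h, w = feature_map.shape
    # Drop a trailing row/column if the size is odd
    h, w = h - h % 2, w - w % 2
    # Split the map into non-overlapping 2x2 blocks
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    # Keep only the largest value in each block
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]])

pooled = max_pool_2x2(fm)
print(pooled)
# [[6 8]
#  [3 4]]
```

The 4x4 map becomes 2x2, so every later layer has a quarter as many values to process.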

Now let’s move to the last thing we need to know before we get our hands dirty: Dropout, a technique where randomly selected neurons are ignored (“dropped out”) during training. It is a great way to reduce overfitting in our model and to get well-generalized results.
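As a rough sketch of what happens under the hood, here is dropout written in NumPy. It assumes the common "inverted dropout" variant, where the surviving activations are scaled up during training so that nothing needs to change at test time. The function name and arguments are made up for this example.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero out a random fraction `rate` of activations during training."""
    if not training or rate == 0.0:
        # At test time every neuron is kept
        return activations
    keep_prob = 1.0 - rate
    # True where a neuron survives, False where it is dropped
    mask = rng.random(activations.shape) < keep_prob
    # Rescale survivors so the expected activation stays the same
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 5))

dropped = dropout(a, rate=0.5, rng=rng)            # entries are 0.0 or 2.0
unchanged = dropout(a, rate=0.5, rng=rng, training=False)
```

Because a different random mask is drawn on every call, the network cannot rely on any single neuron, which is what pushes it towards more general features.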

Still confused?

Don’t you worry, just read this awesome post by Daphne Cornelisse and you will get the hang of things.

Now let’s start coding!

We will be working with Kaggle’s FER2013 dataset. Download it by clicking the link and extract the CSV file.

I will follow a line by line approach so that it’s easier to understand. Let’s start with preprocessing. You can fork the repository for this code if you wish to follow along.

Preprocessing

This is a fairly simple step which involves getting the data and storing it in a way that would be easier for us to use.

Line 1–7 - Importing the libraries and reading the CSV file.

Line 8–3 - Getting the training features X and labels y from the pixels and emotion columns of the CSV respectively and converting them into numpy arrays. We also add an extra dimension to our feature array using the np.expand_dims() function, which makes the input suitable for the CNN we will design later. Both features and labels are stored as .npy files to be used later.
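If you don't have the repository open, the step described above might look roughly like this. It is a sketch, not the repository's code: it assumes the FER2013 CSV has an emotion column with integer labels and a pixels column holding 48x48 space-separated grey values, it assumes the labels are one-hot encoded, and the function name and the CSV path in the comments are placeholders.

```python
import numpy as np
import pandas as pd

def preprocess(data, width=48, height=48, num_labels=7):
    """Turn the FER2013 dataframe into a feature array X and one-hot labels y."""
    faces = []
    for pixel_string in data["pixels"]:
        # Each row stores one image as space-separated pixel values
        values = [float(v) for v in pixel_string.split()]
        faces.append(np.array(values, dtype="float32").reshape(height, width))
    X = np.array(faces)
    # Add a trailing channel dimension: (n, 48, 48) -> (n, 48, 48, 1)
    X = np.expand_dims(X, -1)
    # One-hot encode the integer emotion labels (assumption, see above)
    y = np.eye(num_labels, dtype="float32")[data["emotion"].to_numpy()]
    return X, y

# Typical use (the CSV path is an assumption about where you extracted the file):
# data = pd.read_csv("fer2013.csv")
# X, y = preprocess(data)
# np.save("fdataX", X)
# np.save("flabels", y)
```

The trailing dimension is the channel axis. The images are greyscale, so there is just one channel, but Keras convolution layers still expect it to be there.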

After we execute the code above, our output would look something like this-

Preprocessing Done
Number of Features: 48
Number of Labels: 7
Number of examples in dataset:35887
X,y stored in fdataX.npy and flabels.npy respectively

Now let’s start developing our model. I will divide the process into multiple steps so that it’s not too overwhelming.

Starting

Line 1–11 - Importing the required libraries for our CNN.

Line 12–23 - Okay, there is a lot going on here. First we declare the variables we will need for training our CNN. The images have a resolution of 48x48 pixels, so width and height are both 48. We are predicting 7 emotions (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral), so we have 7 labels. We will process our inputs with a batch size of 64.

Next, we load the features and labels into x and y respectively and standardize x by subtracting the mean and dividing by the standard deviation.
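A minimal sketch of that standardization step is below. It assumes the mean and standard deviation are computed per pixel position across all images (axis 0), adds a tiny constant to avoid dividing by zero, and uses random numbers in place of the real features.

```python
import numpy as np

def standardize(x):
    """Subtract the per-pixel mean and divide by the per-pixel standard deviation."""
    mean = np.mean(x, axis=0)
    std = np.std(x, axis=0)
    # Small constant guards against pixels that never change
    return (x - mean) / (std + 1e-8)

# Stand-in for the real features: 100 random 48x48 greyscale images
rng = np.random.default_rng(0)
x = rng.uniform(0, 255, size=(100, 48, 48, 1))

z = standardize(x)
print(z.shape)  # (100, 48, 48, 1)
```

After this step every pixel position has a mean of about 0 and a standard deviation of about 1 across the dataset, which usually makes training more stable.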

Line 24–35 - The first four lines just display the images using the pixel values. After that we divide the data into training and testing sets using sklearn’s train_test_split() function and save the test features and labels to be used later. We then split the training data once more to obtain the validation data, which is used later in the code.
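The two splits can be sketched as follows. The 10% split sizes are an assumption on my part, but two successive 10% splits of 35887 examples give exactly the 29068 training and 3230 validation samples that show up in the training log further down. The zero arrays are stand-ins for the real features and labels, and the random_state values are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_examples = 35887
# Stand-ins for the real data; only the number of rows matters here
X = np.zeros((n_examples, 1), dtype="float32")
y = np.zeros((n_examples, 7), dtype="float32")

# First split: hold out 10% as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42)

# Second split: hold out 10% of what is left as the validation set
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train, y_train, test_size=0.1, random_state=41)

print(len(X_train), len(X_valid), len(X_test))  # 29068 3230 3589
```

The test set is never touched during training, while the validation set is what Keras reports on after every epoch.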

Now let’s move to the next chunk of code.

Designing the CNN

This step is the most important part of the entire process: here we design the CNN through which we will pass our features to train the model and eventually test it using the test features. We use a combination of several different functions to construct the CNN, which we will discuss one by one.

Sequential() - A sequential model is just a linear stack of layers, i.e. layers placed one on top of the other as we progress from the input layer to the output layer. You can read more about this here.
model.add(Conv2D()) - This is a 2D convolutional layer which performs the convolution operation described at the beginning of this post. To quote the Keras documentation, “This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs.” Here we are using a 3x3 kernel size and Rectified Linear Unit (ReLU) as our activation function.
model.add(BatchNormalization()) - It normalizes the inputs to the next layer over each batch so that they have a mean close to 0 and a standard deviation close to 1, instead of being scattered all over the place.
model.add(MaxPooling2D()) - This function performs the pooling operation on the data as explained at the beginning of the post. We are taking a pooling window of 2x2 with 2x2 strides in this model. If you want to read more about MaxPooling you can refer to the Keras documentation or the post mentioned above.
model.add(Dropout()) - As explained above, Dropout is a technique where randomly selected neurons are ignored during training. They are “dropped out” randomly. This reduces overfitting.
model.add(Flatten()) - This just flattens the input from ND to 1D and does not affect the batch size.
model.add(Dense()) - According to the Keras documentation, Dense implements the operation output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer and bias is a bias vector created by the layer. In simple words, it is the final nail in the coffin: it takes the features learned by the earlier layers and maps them to the label. During testing, this layer is responsible for creating the final label for the image being processed.
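To show how these pieces fit together, here is a deliberately small sketch using tensorflow.keras. It is not the architecture from the repository: the number of convolution blocks, the filter counts, the dropout rates, the size of the hidden Dense layer and the build_model name are all assumptions chosen to keep the example short.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Conv2D, BatchNormalization, MaxPooling2D, Dropout, Flatten, Dense)

def build_model(width=48, height=48, num_labels=7, num_features=32):
    """A small CNN for 48x48 greyscale faces (illustrative sizes only)."""
    model = Sequential()
    model.add(Input(shape=(height, width, 1)))

    # Feature extraction block 1: 48x48 -> 24x24
    model.add(Conv2D(num_features, kernel_size=(3, 3), activation="relu", padding="same"))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Dropout(0.5))

    # Feature extraction block 2: 24x24 -> 12x12
    model.add(Conv2D(2 * num_features, kernel_size=(3, 3), activation="relu", padding="same"))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Dropout(0.5))

    # Classifier: flatten the feature maps and map them to the 7 emotions
    model.add(Flatten())
    model.add(Dense(128, activation="relu"))
    model.add(Dropout(0.5))
    model.add(Dense(num_labels, activation="softmax"))
    return model

model = build_model()
model.summary()
```

The softmax on the last Dense layer turns the 7 outputs into probabilities that sum to 1, one per emotion.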

After the model.summary() function is executed, the output looks something like this -

On to the next chunk!

Training the CNN

This is a fairly simple chunk of code where first the model is compiled with categorical_crossentropy as the loss function and Adam as the optimizer. We are using accuracy as the metric for validation.
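If you are wondering what categorical cross-entropy actually measures, this NumPy sketch may help. It is a plain re-implementation for illustration, not Keras's own code, and the clipping constant is an assumption to avoid taking log(0).

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot labels and predicted probabilities."""
    # Keep probabilities away from exactly 0 or 1 so the log stays finite
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_example = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.mean(per_example))

# Two one-hot labels: Happy (3) and Angry (0)
y_true = np.eye(7)[[3, 0]]

# A model that has learned nothing and guesses every emotion equally
y_uniform = np.full((2, 7), 1.0 / 7.0)

uniform_loss = categorical_crossentropy(y_true, y_uniform)
print(uniform_loss)  # ln(7), about 1.946
```

A model guessing uniformly over 7 classes scores ln(7), which is roughly the first-epoch loss in the training log further down. The loss only drops below that once the model starts putting more probability on the correct emotion.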

Next, we are fitting the model with the fixed batch size (64 here), epochs (100 here) and validation data which we obtained by splitting the training data earlier. And finally, we are saving the model for some custom tests which I will explain later.

After we run the code above (fertrain.py) we will get an output which would look something like this -

Train on 29068 samples, validate on 3230 samples
Epoch 1/100
29068/29068 [==============================] — 34s 1ms/step — loss: 2.0047 — acc: 0.2124 — val_loss: 1.8123 — val_acc: 0.2817
Epoch 2/100
29068/29068 [==============================] — 31s 1ms/step — loss: 1.7918 — acc: 0.2692 — val_loss: 1.6796 — val_acc: 0.3195
Epoch 3/100
29068/29068 [==============================] — 31s 1ms/step — loss: 1.7021 — acc: 0.3148 — val_loss: 1.5516 — val_acc: 0.3957
...
Epoch 100/100
29068/29068 [==============================] — 31s 1ms/step — loss: 0.3083 — acc: 0.9049 — val_loss: 1.3855 — val_acc: 0.6666
Saved model to disk

We can see we got a validation accuracy of 66.6% which is quite good actually! Let’s go a step ahead and test the model on the testing data which we saved earlier by running the fertest.py file. We will get an output like this-

Loaded model from disk
Accuracy on test set :66.3694622458
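A number like this is typically computed by picking the most probable emotion for each test image and comparing it with the true label. Here is a minimal sketch of that calculation; the function name and the three made-up predictions are for illustration only and this is not the code from fertest.py.

```python
import numpy as np

def accuracy_percent(y_true_onehot, y_prob):
    """Percentage of examples whose most probable class matches the label."""
    true_labels = np.argmax(y_true_onehot, axis=1)
    predicted_labels = np.argmax(y_prob, axis=1)
    return 100.0 * float(np.mean(true_labels == predicted_labels))

# Three made-up examples with true labels Happy (3), Sad (4), Neutral (6)
y_true = np.eye(7)[[3, 4, 6]]
y_prob = np.array([
    [0.05, 0.05, 0.05, 0.60, 0.10, 0.05, 0.10],  # predicts Happy   (correct)
    [0.10, 0.05, 0.50, 0.05, 0.20, 0.05, 0.05],  # predicts Fear    (wrong)
    [0.05, 0.05, 0.05, 0.05, 0.10, 0.10, 0.60],  # predicts Neutral (correct)
])

acc = accuracy_percent(y_true, y_prob)
```

With a trained Keras model, y_prob would be the output of model.predict() on the saved test features.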

This is an exciting result: the model which won the competition had 71.1% accuracy, and our result would put us in 5th place! Isn’t that awesome!

I also created a confusion matrix to find out which emotions get confused with each other most often, and it looked something like this-

See how Anger and Disgust were confused with each other as they are very similar negative emotions. Something similar happened with Fear and Sadness.

Building on this result I am dividing the emotions into 3 categories (Positive, Neutral and Negative) for my next project which involves giving facial emotion recognition capabilities to a robot during navigation!

You can generate your own confusion matrix by running the confmatrix.py program from the repository.
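If you just want to see how such a matrix is built, here is a short sketch using sklearn's confusion_matrix. It is not the confmatrix.py script from the repository, and the eight true and predicted labels are invented toy values.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

# Toy labels (invented): integer class ids, e.g. from np.argmax on one-hot arrays
y_true = np.array([0, 0, 1, 1, 3, 3, 4, 2])
y_pred = np.array([0, 1, 0, 1, 3, 3, 2, 2])

# Rows are the true emotions, columns are the predicted emotions
cm = confusion_matrix(y_true, y_pred, labels=list(range(len(EMOTIONS))))

for name, row in zip(EMOTIONS, cm):
    print(f"{name:>8}: {row}")
```

Off-diagonal entries are the mistakes. In this toy data cm[0, 1] and cm[1, 0] are both 1, which is the kind of Angry/Disgust mix-up described above.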

To make things more fun, I tested the model on faces of the cast from a popular TV Series F.R.I.E.N.D.S and results were pretty good!

Facial Emotion Recognition on F.R.I.E.N.D.S

Mind you, these are real predicted emotions. You can do the same on your custom test image or use this model in your own project by forking and cloning the repository and running the fertestcustom.py file!

I think that wraps it up real good. It has been a great ride as always.

Adios!


Written by Nishank Sharma