Facial Emotion Recognition (FER) using Keras

Gaurav Sharma · Published in Analytics Vidhya · 8 min read · Sep 18, 2020

This story will walk you through FER, its applications and, more importantly, how we can create our own FER system using TensorFlow-Keras.

This story is divided into the following sections:

  • Introduction
  • Inspecting and manipulating the data
  • Creating our own custom FER model from scratch
  • Applications of FER systems

Introduction

Facial Emotion Recognition (commonly known as FER) is one of the most researched fields of computer vision to date, and it is still under continuous evaluation and improvement. The idea is: can we detect the emotion a person is feeling from his/her facial expressions? Of course, as humans we can do that very easily, and in fact we do it thousands of times every day. But can we make our so-called dumb machines intelligent enough to achieve human-level performance on this emotion-recognition task? Because we humans have emotions, we can easily detect others' emotions, but that's not the case with machines.

With the advancements in the field of computer vision skyrocketing, these tasks are no longer that difficult; at the very least, we can achieve good performance with very little effort. We will see how that can be done with a few lines of Python code.

Note: To get the most out of this story, you should have a basic understanding of Python and some basics of neural networks, specifically CNNs.

Inspecting and manipulating the data

As we all know, adding intelligence to machines is mostly about letting them learn from data via some algorithm, and of course for that we need DATA. Data is the most important part of any machine learning/deep learning project, because after all our trained model is a product of the data it is trained on. The better our data represents the real world, the better our model will behave and perform in the real world. Remember one thing: "GARBAGE IN, GARBAGE OUT". If we train on data containing lots of garbage, then in production our model will also throw garbage. So DATA is the most important building block for any ML/DL task.

So, we need data for this FER task as well. We will train our model on it and then test its performance both on held-out data and on a real-time video stream. Note that this is a supervised learning problem, i.e., the model learns a mapping y = f(x) from inputs x to labels y.

For this task I am going to use a very popular dataset available on Kaggle. Its name is the same as the task, because it was collected for exactly this purpose. You can use other datasets as well; a few more are publicly available, or you can create your own.

We will get more insights into this data as we proceed, so stay tuned…

Now we will get our hands wet with some real Python code. First, import all the needed libraries.
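Something along these lines should cover everything used below (a sketch; TensorFlow 2.x is assumed):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Dropout, Flatten, Conv2D,
                                     MaxPooling2D, BatchNormalization)
from tensorflow.keras.optimizers import Nadam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical
```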

Let’s inspect the data,
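A minimal sketch of loading it (the file name fer2013.csv and its location are assumptions; point it at wherever you saved the Kaggle download):

```python
# load the Kaggle CSV into a pandas data-frame
df = pd.read_csv('fer2013.csv')
print(df.shape)
df.head()
```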


This data is not in image format; it comes as a data-frame. The pixels column of the data-frame contains all the pixel values. There are 2304 pixel values associated with each image, because each image is grey-scaled and of resolution 48x48 (48 × 48 = 2304).

Now, we will check the number of emotion categories we have and the number of images in each of those categories.
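One way to do this (assuming the label column is called emotion, as in the Kaggle CSV):

```python
# count how many images belong to each emotion label
print(df.emotion.value_counts())
```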


So, there are 7 categories of emotion in this dataset, and the emotion disgust has by far the fewest images, around 5–10% of the other classes.

Let’s visualize the images of each emotion category.
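A sketch of one way to plot a sample image per category (the label-to-name mapping below is the standard FER-2013 one):

```python
emotion_names = {0: 'Angry', 1: 'Disgust', 2: 'Fear', 3: 'Happy',
                 4: 'Sad', 5: 'Surprise', 6: 'Neutral'}

fig, axes = plt.subplots(1, 7, figsize=(20, 3))
for ax, (label, name) in zip(axes, emotion_names.items()):
    # take the first image of each category and reshape the flat pixel string
    pixels = df[df.emotion == label].pixels.iloc[0]
    img = np.array(pixels.split(' '), dtype='uint8').reshape(48, 48)
    ax.imshow(img, cmap='gray')
    ax.set_title(name)
    ax.axis('off')
plt.show()
```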


We can conclude the following points from these images:

  • The data contains a wide range of subjects: male and female, kids and the elderly, people of different skin colours, etc.
  • It contains some non-human images, like cartoons (first row, last column), and in fact contains images with no face at all, which are either blank or contain some text.
  • The dataset contains images in wild settings. By wild settings I mean they were not taken in a laboratory under proper lighting conditions (like CK+48); rather, they were collected from the web through web scraping.

For simplicity, we will train our upcoming model on the top three classes only, i.e., 3: Happy, 4: Sad and 6: Neutral.
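Filtering the data-frame down to those three labels is a one-liner:

```python
# keep only Happy (3), Sad (4) and Neutral (6)
df = df[df.emotion.isin([3, 4, 6])]
print(df.emotion.value_counts())
```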

Creating our own custom FER Model

We will create a Convolutional Neural Network (CNN) for this task and feed it batches of 48x48x1 grey-scaled images. But the data we currently have is not in that format, so we need to make it compatible; otherwise the model will crash even before it starts learning.

Below is the code that makes the data compatible with our upcoming model; I will explain it line by line.
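A sketch of that code, numbered to match the walkthrough (the variable names img_array and img_labels are my own; le_name_mapping is referenced further below):

```python
img_array = df.pixels.apply(lambda x: np.array(x.split(' '), dtype='float32').reshape(48, 48, 1))  # line 1
img_array = np.stack(img_array, axis=0)                              # line 2
print(img_array.shape)                                               # line 3

le = LabelEncoder()                                                  # line 5
img_labels = le.fit_transform(df.emotion)                            # line 6
img_labels = to_categorical(img_labels)                              # line 7
le_name_mapping = dict(zip(le.classes_, le.transform(le.classes_)))  # line 8
```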

In line-1, I converted each flattened image into a square 3-dimensional image of size 48x48x1 (note that, as this is a grey-scaled image, there is only a single channel).

In line-2, I stacked all those images into a single 4-dimensional array (one extra batch dimension), because we feed data to our model in batches rather than one image at a time; we will be using mini-batch gradient descent for optimization. In line-3 you can verify the shape.

Now our images (X) are ready, but we also need to make our labels compatible with the model. In lines 5–8, I label-encode the categories.

le_name_mapping is the mapping from the original class labels to the new ones. For example, the emotion Happy, which was originally 3, is now labelled 0.

Now we will split the data into a training set and a validation set. We will train on the training data and validate the model on the validation data.
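A sketch using scikit-learn (the 10% validation split and the random seed are assumptions):

```python
# stratified split so all three classes appear in both sets
X_train, X_valid, y_train, y_valid = train_test_split(
    img_array, img_labels, test_size=0.1, shuffle=True,
    stratify=img_labels, random_state=42)
print(X_train.shape, X_valid.shape)
```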

Now we will normalize the image arrays. This is done because neural networks are highly sensitive to non-normalized data. We will use min-max normalization.

For these grey-scaled images min = 0 and max = 255, so we simply divide the array by 255, because

X_norm = (X - min) / (max - min) = (X - 0) / (255 - 0) = X / 255
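In code this is simply:

```python
# min-max normalization: scale pixel values from [0, 255] to [0, 1]
X_train = X_train / 255.
X_valid = X_valid / 255.
```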

Below is the Convolutional Neural Network (CNN) I used, with the following settings (see the sketch after this list):

  • For generalization, dropout is used at regular intervals.
  • ELU is used as the activation function: it avoids the dying-ReLU problem, and it also performed better than LeakyReLU, at least in this case.
  • he_normal is used as the kernel initializer, as it suits ELU.
  • Batch Normalization is also used for better results.
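Here is a sketch of such a network; the exact depth, filter counts and dropout rates are my own choices, not necessarily the ones behind the reported results:

```python
model = Sequential([
    Conv2D(64, (5, 5), padding='same', activation='elu',
           kernel_initializer='he_normal', input_shape=(48, 48, 1)),
    BatchNormalization(),
    Conv2D(64, (5, 5), padding='same', activation='elu',
           kernel_initializer='he_normal'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Dropout(0.4),

    Conv2D(128, (3, 3), padding='same', activation='elu',
           kernel_initializer='he_normal'),
    BatchNormalization(),
    Conv2D(128, (3, 3), padding='same', activation='elu',
           kernel_initializer='he_normal'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Dropout(0.4),

    Flatten(),
    Dense(128, activation='elu', kernel_initializer='he_normal'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(3, activation='softmax'),  # 3 classes: Happy, Sad, Neutral
])
model.summary()
```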

I used two callbacks: early stopping, to avoid over-fitting the training data, and ReduceLROnPlateau, to reduce the learning rate whenever the validation accuracy plateaus.
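Something like the following (the patience values are assumptions):

```python
early_stopping = EarlyStopping(monitor='val_accuracy', patience=11,
                               restore_best_weights=True)
lr_scheduler = ReduceLROnPlateau(monitor='val_accuracy', factor=0.5,
                                 patience=5, min_lr=1e-7)
callbacks = [early_stopping, lr_scheduler]
```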

ImageDataGenerator is also used, as it helps model performance: we augment the images with changes we may encounter a lot in the real world, like shearing, rotations, etc.
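For example (the exact augmentation ranges here are a matter of taste):

```python
train_datagen = ImageDataGenerator(rotation_range=15,
                                   width_shift_range=0.15,
                                   height_shift_range=0.15,
                                   shear_range=0.15,
                                   zoom_range=0.15,
                                   horizontal_flip=True)
```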

I tried both the Nadam and Adam optimizers and achieved similar performance. A batch size of 32 is used, and the model is trained for a maximum of 100 epochs.

Let’s now train the model and log the training performance.
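Putting it together, roughly (the Nadam learning rate is an assumption):

```python
batch_size = 32
epochs = 100

model.compile(loss='categorical_crossentropy',
              optimizer=Nadam(learning_rate=0.001),
              metrics=['accuracy'])

# feed augmented batches from the generator, validate on the untouched set
history = model.fit(
    train_datagen.flow(X_train, y_train, batch_size=batch_size),
    validation_data=(X_valid, y_valid),
    steps_per_epoch=len(X_train) // batch_size,
    epochs=epochs,
    callbacks=callbacks)
```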

Let’s plot the training and validation metrics,
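A minimal sketch:

```python
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(history.history['accuracy'], label='train')
ax1.plot(history.history['val_accuracy'], label='valid')
ax1.set_title('Accuracy')
ax1.legend()
ax2.plot(history.history['loss'], label='train')
ax2.plot(history.history['val_loss'], label='valid')
ax2.set_title('Loss')
ax2.legend()
plt.show()
```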


The epoch history shows that accuracy gradually increased, reaching above 83% on both the training and validation sets; towards the end the model started over-fitting the training data, and training stopped automatically because we enabled early stopping. Also, ReduceLROnPlateau kicked in whenever the validation accuracy plateaued.

We should also save this model using the save function, for later use.
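For example (the file name here is an assumption):

```python
model.save('fer_model.h5')  # hypothetical file name; pick your own
```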

Let’s plot the distribution of training and validation metrics.


We will now visualize what is called a confusion matrix. It is one of the most widely used evaluation tools for multi-class classification and gives us a good glance at the model's performance on all classes.
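A sketch using scikit-learn and seaborn (the tick labels follow the encoded order Happy → 0, Sad → 1, Neutral → 2 from the label encoding above):

```python
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(model.predict(X_valid), axis=1)
y_true = np.argmax(y_valid, axis=1)

cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, fmt='d',
            xticklabels=['Happy', 'Sad', 'Neutral'],
            yticklabels=['Happy', 'Sad', 'Neutral'])
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
```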


The confusion matrix clearly shows that our model is doing a good job on the class Happy, but its performance is lower on the other two classes. One reason for this could be that those two classes have less data. But when I looked at the images, I found that for some images from these two classes it is hard even for a human to tell whether the person is sad or neutral. Facial expression also depends on the individual; some people's neutral face simply looks more like sad.

Let’s inspect the errors,
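Using the y_true / y_pred arrays from the confusion-matrix step, one way to plot a grid of misclassified images:

```python
names = ['Happy', 'Sad', 'Neutral']
errors = np.where(y_pred != y_true)[0]  # indices of misclassified images

fig, axes = plt.subplots(2, 8, figsize=(16, 4))
for ax, idx in zip(axes.ravel(), errors[:16]):
    ax.imshow(X_valid[idx].reshape(48, 48), cmap='gray')
    ax.set_title(f'true: {names[y_true[idx]]}\npred: {names[y_pred[idx]]}')
    ax.axis('off')
plt.show()
```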


See, the 7th image in the first row looks more like neutral than sad, and our model indeed predicted it as neutral. Likewise, the last image in the second row looks very much sad rather than neutral, which again is what our model predicted. So our model's error rate is around 17%, but some of its errors are due to errors in the data itself.

Now what next? Should we stop here? No, not at all. The purpose of any model is not just to be trained and validated, but to be tested and used in the real world. I went much further with this project, trying many different models and more emotion classes. In the end I integrated my model with OpenCV and tested it on videos and even on a real-time webcam stream, feeding the video to the model frame by frame. I achieved good fps, close to real-time predictions.

Here is a 2-minute demo video of the power of our model; in it I used many emotions and added some cool annotations as well.

Here is the full project hosted on Github.

Applications of FER systems

There are many applications, and some are still emerging; here are a few of them:

  • Measuring customer satisfaction with a product (visual sentiment analysis)
  • In medical diagnosis
  • Understanding human behaviour
  • For fun … :)

You can get the entire Jupyter notebook for this story from here; you just need to fork it. Also, if you like the notebook, then do up-vote it; that motivates me to create further quality content.

Here is the second part of this FER series, in which I explain how we can feed short video clips as input to the model. For that I used time-distributed convolutional layers followed by a bidirectional LSTM. It's a must-read; you will surely learn something new.

If you like this story then do clap, and share it with others as well.

Also, have a read of my other stories, which cover a variety of topics.

Thank you once again for reading my stories, my friends :)
