Ultimate Guide for Facial Emotion Recognition Using A CNN.

10 min read · Aug 17, 2020

A comprehensive guide to image preprocessing and implementing a CNN using Keras

Note: This is a long post because it tries to cover everything, so don’t get frustrated :)

Hello Folks,

As we know, emotions play a vital role in our lives. We need systems that customize their actions based on our behavior and emotions.

Major tech companies like Google, Apple, and Amazon are trying to make their virtual assistants (Google Assistant, Siri, and Alexa) appear more human. These companies are investing heavily in research and development to humanize the AI behind their virtual assistants. The idea is to equip digital assistants with a psychological machine learning model that is capable of detecting human facial emotions and acting accordingly.

That is what inspired me to do this project.

Getting started…

Step 1: Dataset Preparation

Download the dataset from the official Kaggle website using the link below:

Challenges in Representation Learning: Facial Expression Recognition Challenge (www.kaggle.com)

The data consists of 48x48-pixel grayscale images of faces.
The task is to categorize each face based on the emotion shown in the facial expression into one of seven categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral).

The data set contains two columns, “emotion” and “pixels”.
The “emotion” column contains a numeric code ranging from 0 to 6, inclusive, for the emotion that is present in the image.
The “pixels” column contains a string surrounded in quotes for each image. The contents of this string are space-separated pixel values in row-major order. test.csv contains only the “pixels” column, and your task is to predict the emotion column.
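To make the “pixels” format concrete, here is a minimal sketch of turning one such string into an image array. The function name pixels_to_image and the tiny 2x2 demo string are my own illustrations, not part of the dataset or the original code.

```python
import numpy as np

def pixels_to_image(pixel_string, size=48):
    """Convert a space-separated pixel string into a (size, size) uint8 array."""
    values = np.array([int(v) for v in pixel_string.split()], dtype=np.uint8)
    if values.size != size * size:
        raise ValueError(f"expected {size * size} pixel values, got {values.size}")
    # Row-major order: consecutive values fill one image row at a time.
    return values.reshape(size, size)

# Tiny 2x2 demonstration with made-up values (a real row holds 48 * 48 = 2304 values).
demo = pixels_to_image("0 255 128 64", size=2)
print(demo)
```

With the default size=48, the same function would be applied to every row of the “pixels” column.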

Step 2: Data preprocessing

Data preprocessing is one of the important steps in the machine learning pipeline.

Load the CSV file containing the pixels into a data frame.

Apply image preprocessing techniques such as resizing, reshaping, converting to grayscale, and normalization.

Use the power of vectorization by converting images into NumPy arrays and pandas data frames wherever necessary.

Convert the images into NumPy arrays using OpenCV and convert the output labels to categorical form using pandas.

Apply normalization to speed up convergence.
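The normalization and categorical steps can be sketched as follows. This is an illustration under my own assumptions: I use plain NumPy instead of OpenCV and pandas, scale by 255 (the usual choice for 8-bit images), and the helper names normalize and one_hot are hypothetical.

```python
import numpy as np

NUM_CLASSES = 7  # emotion codes 0..6, as listed in Step 1

def normalize(images):
    """Scale uint8 pixel values from [0, 255] to floats in [0, 1]."""
    return images.astype(np.float32) / 255.0

def one_hot(labels, num_classes=NUM_CLASSES):
    """Turn integer labels into one-hot rows, e.g. 3 -> [0, 0, 0, 1, 0, 0, 0]."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]

# Made-up batch of two 48x48 images: one all black, one all white.
images = np.zeros((2, 48, 48), dtype=np.uint8)
images[1] = 255

# Keras convolution layers expect (batch, height, width, channels).
x = normalize(images).reshape(-1, 48, 48, 1)
y = one_hot([3, 6])
print(x.shape, y.shape)
```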

Step 3: Splitting dataset

Split the dataset into a training set and a validation set, so that we can use the validation set to check whether the model has overfitted to the training set.
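A minimal sketch of such a split is shown here with NumPy only. The 20% validation fraction, the seed, and the function name train_val_split are assumptions of mine; scikit-learn’s train_test_split is a common alternative.

```python
import numpy as np

def train_val_split(x, y, val_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction of them for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return x[train_idx], x[val_idx], y[train_idx], y[val_idx]

# Made-up data: ten samples whose label equals their single feature.
x = np.arange(10).reshape(10, 1)
y = np.arange(10)
x_train, x_val, y_train, y_val = train_val_split(x, y)
print(len(x_train), len(x_val))
```

Shuffling before splitting matters because a CSV file may be ordered in a way that would otherwise leave some emotions out of the validation set.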

Step 4: Build the model using Convolutional Neural Networks (CNN):

A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and be able to differentiate one from the other. The pre-processing required in a ConvNet is much lower as compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets can learn these filters/characteristics.

For now, all you need to know is that a Convolutional Neural Network, or CNN as it is popularly called, is a collection of mainly two types of layers:

1. The hidden layers / feature extraction part

convolutions
pooling

2. The classifier part

Here I am using Keras with TensorFlow as the back end for building neural networks. More generally, the Keras layers we will add are:

Convolution layer
Pooling layer
Batch normalization
Activation Layer
Dropout Layer
Flatten Layer
Dense layer

First import all dependencies:

Keras is a powerful deep learning library that serves as a high-level API for TensorFlow, wrapping it in simpler, more abstract building blocks for neural networks.

Implementing a convolutional neural network using Keras involves the following intuitions, insights, and behind-the-scenes details.

Convolution: an element-wise multiplication of a filter with a patch of the input, summed into a single value -> the filter acts as a feature detector

It involves the following components:

Conv2D: used for filtering windows of 2-dimensional input -> if it is the first layer, input_shape must be provided
filters and kernel_size: the number of 2-dimensional filters and the size of the sliding window
the sliding window slides over each channel and summarizes the features.
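To see what the sliding window does, here is a small NumPy sketch for a single channel and a single filter, with stride 1 and no padding. It is only an illustration of the idea, not how Keras implements Conv2D; the function name conv2d_single and the toy image are made up. Note that deep learning libraries compute a cross-correlation (the filter is not flipped), which is what this sketch does too.

```python
import numpy as np

def conv2d_single(image, kernel):
    """Slide a 2-D kernel over a 2-D image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            # Element-wise multiply the window by the filter, then sum.
            out[i, j] = np.sum(window * kernel)
    return out

# A 4x4 image whose right half is bright, and a filter that responds to
# a dark-to-bright transition from left to right (a vertical edge).
image = np.array([[0, 0, 1, 1]] * 4, dtype=np.float32)
kernel = np.array([[-1, 1]] * 2, dtype=np.float32)
feature_map = conv2d_single(image, kernel)
print(feature_map)
```

The feature map is non-zero only in the middle column, exactly where the edge sits. That is what "feature detector" means.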

Batch normalization:

It is used to stabilize and often accelerate the learning process by standardizing layer inputs.

It does so by applying a transformation that keeps the mean activation close to 0 and the activation standard deviation (the square root of the variance, i.e. how far values spread from the mean) close to 1.

Below are the intuitive points and insights related to batch normalization:

normalization => rescaling values to zero mean and unit variance, the same scale as the standard normal (bell-shaped) distribution
backpropagation => weights are updated layer by layer, backward from the output to the input, using an estimate of error that assumes the weights in the layers before the current layer are fixed.
the gradient tells how to update each parameter under the assumption that other layers do not change.
in practice all layers change during an update → so the update procedure is forever chasing a moving target.
batch normalization => a technique to coordinate the update of multiple layers in the model => a reparametrization of the network
it’s all about standardizing the mean and variance of each unit
it’s all about standardizing inputs to layers for each mini-batch
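The standardization step can be sketched in a few lines of NumPy. This shows only the training-time computation on one mini-batch; at inference time, batch normalization layers use running averages of the mean and variance instead. The function name and the random demo batch are my own.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize each feature (column) over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # eps avoids division by zero
    # gamma and beta are the learnable scale and shift parameters.
    return gamma * x_hat + beta

# Made-up mini-batch: 64 samples, 4 features, far from zero mean / unit variance.
rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
out = batch_norm(batch)
print(out.mean(axis=0), out.std(axis=0))
```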

Activation (ReLU): activation layer (non-linear layer)

by convention, it is applied after the conv layer
it introduces non-linearity into a system that has so far computed only linear operations in the conv layers
The rectified linear unit is more widely used than other non-linear functions (sigmoid, tanh) because it trains faster without a significant loss of accuracy.
ReLU: max(0, x)
ReLU also alleviates the vanishing gradient problem (the lower layers of the network train very slowly because the gradient shrinks as it is propagated back through the layers).
without these non-linear (activation) functions, the network would be one large linear classifier that could be simplified by multiplying the weight matrices together (and accounting for the bias). It wouldn’t do anything interesting such as image classification.
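The last point is easy to check numerically. In this sketch (random made-up weights, biases left out for brevity), two stacked linear layers give exactly the same output as a single layer whose weight matrix is their product, while putting ReLU in between breaks that equivalence.

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x), applied element-wise."""
    return np.maximum(0, x)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))    # 5 samples, 3 input features
w1 = rng.normal(size=(3, 4))   # first layer weights
w2 = rng.normal(size=(4, 2))   # second layer weights

two_linear_layers = (x @ w1) @ w2
single_layer = x @ (w1 @ w2)   # the two layers collapsed into one
with_relu = relu(x @ w1) @ w2

print(np.allclose(two_linear_layers, single_layer))
print(np.allclose(with_relu, single_layer))
```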

Pooling: comes in the following types: max, average, global max, global average

order: conv > activation > pooling
it reduces the dimensions of the feature map, which reduces the number of parameters to learn and the amount of computation
it summarizes the features in a region of the feature map instead of keeping the precisely positioned features generated by the conv layer. This makes the model more robust to variations in the position of features in the image
when the network wants to detect higher-level features from low-level building blocks (e.g. detecting corners from edges), we don’t need to be rigid about the exact position. We need translational invariance at the feature level, so we insert pooling
it overcomes the problem of sensitivity to the location of the features
it gives local translation invariance
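Here is a small NumPy sketch of 2x2 max pooling with stride 2, to show both the size reduction and the local translation invariance. The function name and the toy feature maps are my own, and the sketch assumes the height and width are even.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D map with even height and width."""
    h, w = feature_map.shape
    # Group the map into non-overlapping 2x2 blocks, then take each block's max.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# The same "feature" at two slightly different positions.
a = np.zeros((4, 4))
a[0, 0] = 1.0
b = np.zeros((4, 4))
b[1, 1] = 1.0  # shifted by one pixel, still inside the same 2x2 block

print(max_pool_2x2(a))
print(max_pool_2x2(b))
```

Both inputs pool to the same 2x2 map, so a one-pixel shift inside a pooling window does not change what the next layer sees.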

Dropout: a regularization technique

neurons are randomly dropped during training
this makes the network less sensitive to the specific weights of individual neurons
better generalization → less overfitting
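As a rough sketch of the mechanics, here is "inverted" dropout in NumPy, the variant commonly used in practice: units are zeroed at random during training and the survivors are scaled up so the expected activation stays the same. The function name and demo values are my own.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x  # at inference time nothing is dropped
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))
dropped = dropout(activations, rate=0.5, rng=rng)
print(np.unique(dropped), dropped.mean())
```

With rate=0.5, every unit ends up as either 0 or 2, and the mean stays close to 1.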

model.summary(): gives the information about the architecture and configuration of the neural network.

Data Augmentation:

Neural networks are data hungry. It’s best practice to use a large dataset, since it reduces overfitting and improves the chance of generalization.

So we use data augmentation with ImageDataGenerator() in Keras.

Here is a brief summary of ImageDataGenerator(), its parameters, and its usage.

horizontal and vertical shift:

moves all pixels of an image in one direction, either vertically or horizontally

width_shift_range (horizontal shift)
height_shift_range (vertical shift)
a float in [0, 1] → fraction of the total width or height to shift by

horizontal and vertical flip augmentation (horizontal_flip, vertical_flip): reverses the columns or rows of pixels → True or False

Random rotation → rotation_range takes a value from 0 to 360 degrees

if rotation_range = 90 ==> the image is rotated by a random angle between -90 and 90 degrees

random brightness: it randomly darkens or brightens images. If brightness_range = [0.2, 1.0] → a brightness factor is picked between 0.2 and 1.0; values below 1.0 darken the image, and 1.0 leaves it unchanged

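random zoom: zooms the image in or out, which either interpolates pixel values or adds new pixels around the image.

a single float value is treated as the range [1-value, 1+value]
for example, if zoom_range = 0.3 → the range is [0.7, 1.3], or between 70% (zoom in) and 130% (zoom out)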
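To build some intuition for what these transformations do to the pixel array, here is a simplified NumPy sketch. It is not how ImageDataGenerator is implemented: the function names are hypothetical, shifts are in whole pixels, and the gap left by a shift is filled with zeros for simplicity.

```python
import numpy as np

def zoom_bounds(zoom_range):
    """A single float z is interpreted as the interval [1 - z, 1 + z]."""
    return (1.0 - zoom_range, 1.0 + zoom_range)

def horizontal_flip(image):
    """Reverse the order of the pixel columns."""
    return image[:, ::-1]

def horizontal_shift(image, pixels):
    """Shift right by `pixels` columns (left if negative), zero-filling the gap."""
    shifted = np.zeros_like(image)
    if pixels > 0:
        shifted[:, pixels:] = image[:, :-pixels]
    elif pixels < 0:
        shifted[:, :pixels] = image[:, -pixels:]
    else:
        shifted[:] = image
    return shifted

def adjust_brightness(image, factor):
    """Multiply pixel values by `factor`, keeping them in [0, 255]."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

# Made-up 2x3 grayscale image.
image = np.array([[10, 20, 30],
                  [40, 50, 60]], dtype=np.uint8)

print(zoom_bounds(0.3))
print(horizontal_flip(image))
print(horizontal_shift(image, 1))
print(adjust_brightness(image, 0.5))
```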

Once an ImageDataGenerator object has been created with the arguments above, it can produce an iterator over an image dataset.

to iterate through images held in memory → use obj.flow(X, y)
to iterate through images stored in subdirectories → use obj.flow_from_directory(directory, ...)
for training ==> fit_generator() (recent TensorFlow versions accept the iterator directly in fit())

It’s time to set the configuration for our neural network :)

Step 5: Training the model

Now it’s time to train our model. Training is nothing but a learning loop. Here we define hyperparameters such as the number of epochs, batch size, learning rate, etc. The only way to find the best values is by trying.

We use callbacks to record model performance while the model is learning.

Here I want to explain everything in the code and also mention some of the insights briefly.

A callback is an object that can perform actions at various stages of training:

1. write TensorBoard logs after every batch
2. periodically save the model to disk
3. do early stopping
4. get a view of internal states and statistics during training
* callbacks are passed to the fit() loop

CSVLogger(filename, separator=","):

is used to save epoch results to a CSV file
create the object and pass it to fit(callbacks=[csv_logger_obj])

EarlyStopping():

is used to stop training when a monitored metric has stopped improving
Below are the parameters used:

monitor = “val_loss” → the quantity to be monitored
min_delta → the minimum change that counts as an improvement (threshold)
patience → the number of epochs with no improvement after which training is stopped
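How patience and min_delta interact is easier to see in code. The function below is a simplified plain-Python sketch of the stopping rule for a monitored loss (lower is better). It is not the Keras source, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improved by more than min_delta: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses, one per epoch.
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.64, 0.66]

print(early_stopping_epoch(losses, patience=2))
print(early_stopping_epoch(losses, patience=3))
print(early_stopping_epoch(losses, patience=3, min_delta=0.02))
```

With these values, patience=2 stops at epoch index 4, while patience=3 survives long enough to see the improvement at index 5 and never stops. Adding min_delta=0.02 makes that small improvement not count, so training stops at index 5.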

ReduceLROnPlateau():

is used to reduce the learning rate when a metric has stopped improving
Below are the parameters:

monitor, patience, min_delta
factor = 0.1 ==> the learning rate is reduced to 10% of its value (lr * 0.1)
verbose ==> 0: quiet, 1: update messages
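The same plateau logic can be sketched for the learning rate. Again, this is a simplified plain-Python illustration rather than the Keras implementation: it leaves out min_delta and the cooldown period, and the starting learning rate and loss values are made up.

```python
def reduce_lr_schedule(val_losses, lr=0.001, factor=0.1, patience=2, min_lr=0.0):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the learning rate, but not below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Made-up validation losses: improvement stalls after the second epoch.
schedule = reduce_lr_schedule([0.90, 0.80, 0.81, 0.82, 0.79])
print(schedule)
```

Here the learning rate drops from 0.001 to about 0.0001 at the fourth epoch, after two epochs in a row without improvement.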

ModelCheckpoint():

is used to save the Keras model or model weights at some frequency
Below are the parameters used:

filepath
monitor → val_acc or val_loss (the accuracy metric is named val_accuracy in recent Keras versions)
save_best_only = True

Step 6: Evaluating the model.

It’s time to check how well the model learned the patterns from the training dataset.

Let’s see how the accuracy changes with epochs, using the history object:

History is a default callback that is registered when training a model.

It records training metrics for each epoch.
The history object is returned from calls to the fit() function used to train the model.
Metrics are stored in a dictionary in the history member of the object returned.

For demo purposes, below I use the accuracy record for 20 epochs.

Visualizing model training history:

Using matplotlib, let’s visualize the learning curve of the model.

Below is the code snippet to plot loss vs. number of epochs.

It visualizes how the loss decreases as the number of epochs increases.

Result:

Below is the code snippet to plot accuracy vs. number of epochs.

It visualizes how the accuracy increases as the number of epochs increases.

accuracy vs number of epochs

Result:

To give the results some feeling, I add emojis to them.

I define the emojis relevant to each emotion in a dictionary. You can see the image below.

Step 7: Testing the model

Now it’s time for predictions. Let’s test the model with some images.

Before testing, we need to preprocess the testing image.

testing code snippet

I want to add a few things related to OpenCV here for better understanding.

With the help of OpenCV, we can easily preprocess images.

cv2 has many functions:

imread() → reads an image from its path
cvtColor() → converts the color space of an image. Some of the parameters are: img, cv2.COLOR_BGR2GRAY
CascadeClassifier() → provides the method detectMultiScale(), which is used to detect multiple faces in an image
rectangle() → draws a box on an image. Some important parameters are → img, start point, end point, color, thickness
putText() → writes text on an image. Parameters → img, text, coordinates, font, fontScale, color, thickness, lineType

Things that need to be observed in the above code for better intuition are: