Emotion Detection Using OpenCV and Keras

Karan Sethi
Published in The Startup · Jun 23, 2020

Emotion Detection, or Facial Expression Classification, is a widely researched topic in today’s Deep Learning arena. Classifying your emotions in real time using just your camera and some lines of code is actually a big step towards advanced Human-Computer Interaction.

Objective

In this post I will explain how to create a Convolutional Neural Network model from scratch using Keras, the Python deep learning API, for emotion detection using the live camera feed of your system. For detailed descriptions, I have embedded links to the documentation pages in the text written in bold.

Prerequisites

To understand this thoroughly you should have some basic knowledge of:

  • Python
  • OpenCV
  • Convolutional Neural Networks (CNNs) and the various layers used to build them
  • NumPy

( NOTE: I have used TensorFlow version 1.13.1 and Keras version 2.3.1. The syntax is slightly different in TensorFlow 2.0 and later versions. )

Model Creation

Firstly, I will explain and walk you through the code for model creation.

I have divided this part into 5 tasks to make it easier to understand.

Task 1:

Import the modules that are needed in this project. These are the wheels of the project.
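A minimal sketch of the imports this part relies on (assuming the standalone Keras package from the versions noted above):

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D
from keras.preprocessing.image import ImageDataGenerator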

Now let’s define some variables that will save the time of writing the values manually again and again.

num_classes=5
img_rows,img_cols=48,48
batch_size=32

The descriptions of the above variables are as follows:

  • num_classes=5 : This variable defines the number of classes, or the emotions, that we will be dealing with in training our model.
  • img_rows,img_cols=48,48 : These variables define the size of the image array that we will be feeding to our neural network.
  • batch_size=32 : This variable defines the batch size. The batch size is the number of samples processed before the model is updated. The number of epochs is the number of complete passes through the training dataset. The size of a batch must be more than or equal to one and less than or equal to the number of samples in the training dataset.

Task 2:

Now it’s time to take out the big guns!!

Now it’s time for loading the blueprint of the model, i.e. our dataset. It’s the dataset that makes a Deep Learning model what it is. Here I am using the fer2013 dataset, which is an open-source dataset hosted on Kaggle. The dataset contains a total of 7 classes, namely Angry, Disgust, Fear, Happy, Sad, Surprise and Neutral, and the training set consists of a total of 28,709 examples. Unfortunately the dataset has been removed from the website, but no worries: I have added the dataset along with all the code and explanations in my GitHub repository.

The Dataset in my Repository

I have segregated the data into different folders containing images pertaining to the folder name. For example, the Angry folder contains pics with angry faces, etc. Here we are using 5 classes, which include Angry, Happy, Sad, Surprise and Neutral. So in total I am using 24,256 images as training data and 3,006 images as validation data.

Now let’s load the data in some variables.

train_data_dir='fer2013/train'
validation_data_dir='fer2013/validation'

The above two lines set the paths to the training and validation data. The model is trained on the training dataset, while the validation dataset is a part of the original dataset that is kept aside to check the performance of the model on data it has never seen before.

Task 3:

Now we will be using Image Augmentation techniques on our dataset. Image Data Augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of the images in the dataset. The Keras deep learning library provides the capability to fit models using image data augmentation via the ImageDataGenerator class.

The train_datagen variable will artificially expand the dataset using the following:

  • rotation_range: Degree range for random rotations. Here I am using 30 degrees.
  • shear_range: Shear intensity (shear angle in counter-clockwise direction in degrees). Here I am using 0.3 as the shear range.
  • zoom_range: Range for random zoom. Here I am using 0.3 as the zoom range.
  • width_shift_range: This shifts the images by a value across their width.
  • height_shift_range: This shifts the images by a value across their height.
  • horizontal_flip: This flips the images horizontally.
  • fill_mode: This is used to fill in the pixels after changing the orientation of the images with the methods above. Here I am using 'nearest' as the fill mode, as I am instructing it to fill the missing pixels with the nearby pixels.

Here I am just rescaling the validation data and not performing any other augmentations, as I want to check the model on raw data that is different from the data used in the training of the model.
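A minimal sketch of this generator setup, based on the options described above (the width and height shift values, and rescaling the training images as well, are assumptions not spelled out in the text):

train_datagen = ImageDataGenerator(
    rescale=1./255,           # assumed, so training and validation pixels share the same scale
    rotation_range=30,
    shear_range=0.3,
    zoom_range=0.3,
    width_shift_range=0.4,    # assumed shift value
    height_shift_range=0.4,   # assumed shift value
    horizontal_flip=True,
    fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    color_mode='grayscale',
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True)

validation_generator = validation_datagen.flow_from_directory(
    validation_data_dir,
    color_mode='grayscale',
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True)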

The output of the above code will be:

Found 24256 images belonging to 5 classes.
Found 3006 images belonging to 5 classes.

Now in the above code I am using the flow_from_directory() method to load our dataset from the directory; the augmented data is stored in the train_generator and validation_generator variables. flow_from_directory() takes the path to a directory and generates batches of augmented data. So here I am giving the method some options to automatically resize the images and divide them into classes so that it is easier to feed them into the model.

The options given are:

  • directory: The directory of the dataset.
  • color_mode: Here I am converting the images to grayscale, as I am not interested in the color of the images but only in the expressions.
  • target_size: Convert the images to a uniform size.
  • batch_size: To make batches of data for training.
  • class_mode: Here I am using 'categorical' as the class mode, as I am categorizing my images into 5 classes.
  • shuffle: To shuffle the dataset for better training.

Task 4:

The dataset modifications are complete, and now it’s time to make the brain of the model, or you could say the engine of our system, i.e. the CNN network.

So firstly I will define the type of model that I will be using. Here I am using a Sequential model, which means that all the layers in the network are stacked one after the other, and I am storing it in a variable named model.

model = Sequential()

The network consists of 7 blocks:
(Note: I will explain each layer one by one at the end.)
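A minimal sketch of the seven blocks, reconstructed from the block descriptions further below (filter counts, elu activations, (2,2) pooling, 0.5 dropout and he_normal initializers as described there; the grayscale input shape of (img_rows, img_cols, 1) is an assumption):

# Block-1: two 32-filter convolutional layers, each followed by elu
# activation and batch normalization, then max pooling and 50% dropout
model.add(Conv2D(32,(3,3),padding='same',kernel_initializer='he_normal',input_shape=(img_rows,img_cols,1)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(32,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.5))

# Block-2: same structure with 64 filters
model.add(Conv2D(64,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(64,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.5))

# Block-3: same structure with 128 filters
model.add(Conv2D(128,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(128,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.5))

# Block-4: same structure with 256 filters
model.add(Conv2D(256,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(256,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.5))

# Block-5: flatten, then a 64-unit dense layer with elu, batch norm and dropout
model.add(Flatten())
model.add(Dense(64,kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

# Block-6: another 64-unit dense layer with elu, batch norm and dropout
model.add(Dense(64,kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

# Block-7: output layer with one unit per class and softmax activation
model.add(Dense(num_classes,kernel_initializer='he_normal'))
model.add(Activation('softmax'))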

In the output after running the above code, you may get some warnings if you are using an older version of TensorFlow.

This seems like a lot, and actually it is a lot.

Here I have used 7 types of layers, all of which are present in keras.layers.

The layers are:

  • Conv2D(
    filters, kernel_size, strides=(1, 1), padding='valid', data_format=None,
    dilation_rate=(1, 1), activation=None, use_bias=True,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
    kernel_constraint=None, bias_constraint=None, **kwargs
    )
  • Activation(activation_type)
  • BatchNormalization()
  • MaxPooling2D(pool_size, strides, padding, data_format, **kwargs)
  • Dropout(dropout_value)
  • Flatten()
  • Dense(
    units,
    activation=None,
    use_bias=True,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs)

Block-1 layers in the order of occurrence are as follows :

  • Conv2D layer — This layer creates a convolutional layer for the network. Here I am creating a layer with 32 filters and a filter size of (3,3) with padding='same' to pad the image, using the kernel initializer he_normal. I have added 2 convolutional layers, each followed by an activation layer and a batch normalization layer.
  • Activation layer — I am using an elu activation.
  • BatchNormalization — Normalizes the activations of the previous layer at each batch, i.e. applies a transformation that keeps the mean activation close to 0 and the activation standard deviation close to 1.
  • MaxPooling2D layer — Downsamples the input representation by taking the maximum value over the window defined by pool_size for each dimension along the features axis. Here I have used a pool_size of (2,2).
  • Dropout — Dropout is a technique where randomly selected neurons are ignored during training. Here I am using a dropout of 0.5, which means that it will ignore half of the neurons.

Block-2 layers in the order of occurrence are as follows :

  • Same layers as block-1 but the convolutional layers have 64 filters.

Block-3 layers in the order of occurrence are as follows :

  • Same layers as block-1 but the convolutional layers have 128 filters.

Block-4 layers in the order of occurrence are as follows :

  • Same layers as block-1 but the convolutional layers have 256 filters.

Block-5 layers in the order of occurrence are as follows :

  • Flatten layer — Flattens the output of the previous layers into a flat layer, in other words into the form of a vector.
  • Dense layer — A densely connected layer where each neuron is connected to every neuron in the previous layer. Here I am using 64 units or 64 neurons with the kernel initializer he_normal.
  • These layers are followed by an activation layer with elu activation, batch normalization and finally a dropout layer with 50% dropout.

Block-6 layers in the order of occurrence are as follows :

  • Same layers as block 5 but without flatten layer as the input for this block is already flattened.

Block-7 layers in the order of occurrence are as follows :

  • Dense layer — Finally, in the last block of the network, I am using num_classes to create a dense layer having units = number of classes, with a he_normal initializer.
  • Activation layer — Here I am using a softmax activation, which is used for multi-class classification.

Too many layers, but finally it’s over!!

Now to check the overall structure of the model:

print(model.summary())

The output will be:

The above output shows all the layers used in this network. This is a big network which consists of 1,328,037 parameters.

Task 5:

Compile and Train: The Final Chapter!!

Now the only thing left is to compile and train the model. But first let’s import some more things.
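A minimal sketch of these extra imports (assuming the optimizer and callbacks described below come from the standalone Keras package):

from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau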

Now let’s do the magic.

Before compiling, I will create 3 things using the keras.callbacks class (a sketch putting all three together follows after their descriptions):

Checkpoint( Function — ModelCheckpoint() )

It will monitor the validation loss and will try to minimize the loss using the mode='min' property. When the checkpoint is reached it will save the best trained weights. verbose=1 just prints a message when the code creates a checkpoint. Here I am using the following parameters:

  • filepath: Path to save the model file. Here I am saving the model file with the name EmotionDetectionModel.h5.
  • monitor: Quantity to monitor. Here I am monitoring the validation loss.
  • mode: One of {auto, min, max}. If save_best_only=True, the decision to overwrite the current save file is made based on either the maximization or the minimization of the monitored quantity.
  • save_best_only: If save_best_only=True, the latest best model according to the quantity monitored will not be overwritten.
  • verbose: int. 0: quiet, 1: update messages.

Early Stopping ( Function — EarlyStopping() )

This will stop the execution early by checking the following properties.

  • monitor: Quantity to monitor. Here I am monitoring the validation loss.
  • min_delta: Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement. Here I have given it 0.
  • patience: Number of epochs with no improvement after which training will be stopped. Here I have given it 3.
  • restore_best_weights: Whether to restore model weights from the epoch with the best value of the monitored quantity. If False, the model weights obtained at the last step of training are used. Here I have given it True.
  • verbose: int. 0: quiet, 1: update messages.

Reduce Learning Rate ( Function — ReduceLROnPlateau() )

Models often benefit from reducing the learning rate by a factor of 2–10 once learning stagnates. This callback monitors a quantity and if no improvement is seen for a ‘patience’ number of epochs, the learning rate is reduced. I have used the following properties for this.

  • monitor: To monitor a particular loss. Here I am monitoring the validation loss.
  • factor: Factor by which the learning rate will be reduced (new_lr = lr * factor). Here I am using 0.2 as the factor.
  • patience: Number of epochs with no improvement after which the learning rate will be reduced. Here I am using 3.
  • min_delta: Threshold for measuring the new optimum, to only focus on significant changes.
  • verbose: int. 0: quiet, 1: update messages.
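Putting the three callbacks together, a sketch based on the parameter values listed above (the min_delta for ReduceLROnPlateau is left at the Keras default of 0.0001, as its value is not stated above):

checkpoint = ModelCheckpoint('EmotionDetectionModel.h5',
                             monitor='val_loss',
                             mode='min',
                             save_best_only=True,
                             verbose=1)

earlystop = EarlyStopping(monitor='val_loss',
                          min_delta=0,
                          patience=3,
                          verbose=1,
                          restore_best_weights=True)

reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                              factor=0.2,
                              patience=3,
                              verbose=1,
                              min_delta=0.0001)   # assumed (Keras default)

callbacks = [earlystop, checkpoint, reduce_lr]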

Now it’s time to finally compile the model using model.compile() and fit or train the model on the dataset using model.fit_generator(). A sketch of both calls is shown after the argument descriptions below.

model.compile()

It has the following arguments:

  • loss: This value determines the type of loss function to use in your code. Here I have categorical data in 5 categories or classes, so I have used the 'categorical_crossentropy' loss.
  • optimizer: This value determines the type of optimizer function to use in your code. Here I have used the Adam optimizer with a learning rate of 0.001, as it is a solid default choice for this kind of task.
  • metrics: The metrics argument should be a list; your model can have any number of metrics. It is the list of metrics to be evaluated by the model during training and testing. Here I have used accuracy as the metric, so the model reports accuracy while training.

model.fit_generator()

Fits the model on data yielded batch-by-batch by a Python generator.

It has the following arguments:

  • generator: The train_generator object that we created earlier.
  • steps_per_epoch: The number of steps to take on the training data in one epoch.
  • epochs: The total number of epochs (each epoch is one complete pass through the whole dataset).
  • callbacks: The list containing all the callbacks that we created earlier.
  • validation_data: The validation_generator object that we created earlier.
  • validation_steps: The steps to take on the validation data in one epoch.
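A sketch of both calls based on the arguments described above (the number of epochs is an assumed value; the sample counts come from the dataset split mentioned earlier):

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.001),
              metrics=['accuracy'])

nb_train_samples = 24256        # training images, as noted earlier
nb_validation_samples = 3006    # validation images, as noted earlier
epochs = 25                     # assumed value; tune as needed

history = model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples//batch_size,
    epochs=epochs,
    callbacks=callbacks,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples//batch_size)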

DONE!!

The model generation is complete; now you can use this model to create the emotion detector.

Driver Code

Now I’ll explain the code for Emotion Detection using the model that I created in the above section.

Now we have the engine and the only thing left to make is the body so that our software is complete.

First, let’s once more import some modules that are needed to run the code.
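A minimal sketch of these imports (img_to_array is assumed here, used later for converting the face crop before prediction):

from keras.models import load_model
from keras.preprocessing.image import img_to_array
import cv2
import numpy as np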

Now let’s load the model and also load a classifier that I have used to detect the face of a person in front of the camera. I have used the haarcascade_frontalface_default classifier. Haar Cascade is a machine learning object detection algorithm used to identify objects in an image or video; it is based on the concept of features proposed by Paul Viola and Michael Jones in their 2001 paper “Rapid Object Detection using a Boosted Cascade of Simple Features”. The haarcascade_frontalface_default classifier detects the front face of a person in an image or a continuous video feed.

face_classifier=cv2.CascadeClassifier('/haarcascade_frontalface_default.xml')
classifier = load_model('/EmotionDetectionModel.h5')

Now I’ll define a variable class_labels to store the names of the classes or the types of emotions we are going to predict, and also a variable cap to store the value returned by the cv2.VideoCapture method. Here the value 0 in VideoCapture instructs the method to use the primary webcam of the laptop.

class_labels=['Angry','Happy','Neutral','Sad','Surprise']
cap=cv2.VideoCapture(0)

Now let the magic begin!!
Now I’ll explain the code that makes boxes around the faces detected by the classifier in the camera feed. I will explain the code line by line, and at the end I will sum up everything.

while True:
    ret,frame=cap.read()

This while loop runs infinitely till someone presses the ‘q’ key on the keyboard. The first line after while reads a frame from the running camera feed and stores it in the frame variable.

gray=cv2.cvtColor(frame,cv2.COLOR_BGR2GRAY)
faces=face_classifier.detectMultiScale(gray,1.3,5)

The variable gray stores the image or frame converted to grayscale using the cvtColor() function of OpenCV, passing the cv2.COLOR_BGR2GRAY attribute to the function, which converts BGR color to grayscale. The detectMultiScale function uses the Haar cascade classifier to detect the faces in the image and returns a list containing the coordinates of the rectangle around each face, which is stored in the variable faces.

An image may contain the faces of many people, so now let’s loop over the elements of the list to get the rectangle values for each face.

The cv2.rectangle method creates a rectangle using the (x,y,w,h) of one face in one iteration of the loop.

The roi_gray variable stores only the face from the gray image array that has the rectangle around it in the current iteration, and then resizes that array to a size of (48,48) with the interpolation INTER_AREA. Resizing an image needs a way to calculate pixel values for the new image from the original one. The five such interpolation methods provided with OpenCV are INTER_NEAREST, INTER_LINEAR, INTER_AREA, INTER_CUBIC and INTER_LANCZOS4.

INTER_AREA calculates the pixels using the pixel-area relation.

The if condition is true when the total sum of values in the roi_gray array is not equal to 0, which means that a face is detected and the classifier has not mistakenly detected something else as a face. Then the roi_gray array is converted to float and the pixel values in it are divided by 255.0 to get values between 0 and 1, so that it is easier for the model to predict the results; this is stored in the roi variable. The value in roi is then converted to an array to be used for making the prediction with the model.

Finally, the cv2.putText method is used to display the detected emotion on the screen next to the face, and the cv2.imshow method displays the resulting frame.

The final if condition is to close the window when q is pressed on the keyboard.

Then,

cap.release()
cv2.destroyAllWindows()

These lines release the camera and close the window.

So, finally, I’ll put together all the code from the while loop till the last line.
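Here is a minimal sketch of that loop, reconstructed from the explanation above (the exact text position, colors and thicknesses passed to cv2.rectangle and cv2.putText are assumptions):

while True:
    ret, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_classifier.detectMultiScale(gray, 1.3, 5)

    # loop over every face found in the frame
    for (x, y, w, h) in faces:
        # draw a rectangle around the face
        cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)
        # crop the face region and resize it to the model's input size
        roi_gray = gray[y:y+h, x:x+w]
        roi_gray = cv2.resize(roi_gray, (48, 48), interpolation=cv2.INTER_AREA)

        # only predict if the crop actually contains something
        if np.sum([roi_gray]) != 0:
            # scale pixel values to [0, 1] and add the batch dimension
            roi = roi_gray.astype('float')/255.0
            roi = img_to_array(roi)
            roi = np.expand_dims(roi, axis=0)

            # predict the emotion and pick the class with the highest score
            preds = classifier.predict(roi)[0]
            label = class_labels[preds.argmax()]
            # show the predicted emotion next to the face (position assumed)
            cv2.putText(frame, label, (x, y-10),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 3)

    cv2.imshow('Emotion Detector', frame)
    # press 'q' to close the window
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()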

DONE!!

Finally, all the tasks are done and our Emotion Detection using Keras and OpenCV is complete.

Conclusion

So here I have explained the process of creating Emotion Detection using OpenCV and Keras. You can see the full code along with the dataset in my GitHub repository. I have also made a script to run the code on a Flask server and display the result in the browser. My output is as follows.

Output with my own face

I hope that you enjoyed learning through my post. You can check out my other code on my GitHub and also check out my LinkedIn profile. Feel free to post your doubts and feedback in the responses section. Thanks a lot for reading!!


Karan Sethi
The Startup

Deep Learning enthusiast,always trying to learn something new.