Facial expression detection using Machine Learning in Python

Aaditya Singhal
Published in Analytics Vidhya
14 min read · Jan 5, 2021

The goal is to detect faces and predict the facial expression shown, using a machine learning algorithm such as a CNN.

Example :

As the clip above shows, a blue box highlights the woman's face. Above the box, in cyan, is the expression predicted by our machine learning model. The larger grey text is the actual expression.

Solution/ Approach :

Introduction (why it's needed) ->

First of all, emotion detection is an important task for many companies that want to understand how their consumers react to the products they launch. It can also be used to gauge whether employees are satisfied with the facilities provided to them. There are many other use cases, such as checking a person's mood from a distance, since detection is done through a camera. With a little modification, the same approach can be applied to related problems such as face detection, attendance systems, mask detection and many more.

Step #1 ->

Import the required Python libraries such as numpy, seaborn, matplotlib and tensorflow.


import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import utils
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense, Input, Dropout,Flatten, Conv2D
from tensorflow.keras.layers import BatchNormalization, Activation, MaxPooling2D
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.utils import plot_model
from IPython.display import SVG, Image
from livelossplot.inputs.tf_keras import PlotLossesCallback
import tensorflow as tf
print("Tensorflow version:", tf.__version__)

In the above code we have imported several modules from Keras, the high-level API that ships with TensorFlow. In particular, we have imported the modules that will help us build a CNN model. The last line prints the version of TensorFlow installed on the system.

Please note -> The TensorFlow version must be greater than or equal to 2.0.

Step #2 ->

Now we will get the dataset on which our model will be trained and validated, so that we can see how well the model performs and improve its accuracy. I have uploaded the dataset I used to Kaggle so that anyone can access it. The link to the dataset is: https://www.kaggle.com/aadityasinghal/facial-expression-dataset

The dataset has seven categories of expressions.

Step #3 ->

Now that we have the dataset, let us get some information about the train and test folders.


for expression in os.listdir("PATH OF TRAIN FOLDER"):
    # join the folder path and the class name so a missing trailing slash does not break the path
    class_dir = os.path.join("PATH OF TRAIN FOLDER", expression)
    print(str(len(os.listdir(class_dir))) + " " + expression + " images")

The above code uses the os library to list the expression folders inside the train folder and prints the number of images in each one along with the expression name.
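
The same count can be wrapped in a small helper so that the class balance is easy to inspect. This is only a sketch: the function names `count_images_per_class` and `class_shares` are made up for illustration, and it assumes the folder layout described above, with one sub-folder per expression.

```python
import os


def count_images_per_class(root_dir):
    """Return {class_name: number_of_files} for each sub-folder of root_dir."""
    counts = {}
    for name in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, name)
        if os.path.isdir(class_dir):  # ignore stray files next to the class folders
            counts[name] = len(os.listdir(class_dir))
    return counts


def class_shares(counts):
    """Convert raw counts into fractions of the whole dataset."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Usage (the path is a placeholder, as in the article):
# counts = count_images_per_class("PATH OF TRAIN FOLDER")
# print(class_shares(counts))
```

A very uneven split between expressions is worth knowing about before training, because plain accuracy can then look better than the model really is.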

Step #4 ->

Now we will generate training and testing (validation) batches so that our model can be trained on the training data and evaluated on the test data. This is a very important step: without training data the model cannot learn what to look for, and without validation data we cannot tell how accurate it is.


img_size = 48
batch_size = 64
datagen_train = ImageDataGenerator(horizontal_flip=True)
train_generator = datagen_train.flow_from_directory(
    "PATH OF TRAIN FOLDER", target_size=(img_size, img_size), color_mode="grayscale",
    batch_size=batch_size, class_mode="categorical", shuffle=False)
datagen_validation = ImageDataGenerator(horizontal_flip=True)
validation_generator = datagen_validation.flow_from_directory(
    "PATH OF TEST FOLDER", target_size=(img_size, img_size), color_mode="grayscale",
    batch_size=batch_size, class_mode="categorical", shuffle=False)

In the above code we have set the image size to 48, so each image will be resized to 48x48. After that we have set the batch size to 64, which means the model processes 64 images at each training step. One epoch is a full pass over the training dataset, so in every epoch the model takes the first 64 images, then the next 64, and so on until all the images have been seen. This repeats until all the epochs are completed.

After that we have used ImageDataGenerator from the Keras module, which generates batches of tensor image data with real-time data augmentation. Here we have set horizontal_flip to True, which means it will randomly flip input images horizontally.
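
To make the idea of a random horizontal flip concrete, here is a minimal NumPy sketch. It is not the Keras implementation; the function name `random_horizontal_flip` and the probability argument `p` are assumptions made for illustration.

```python
import numpy as np


def random_horizontal_flip(image, rng, p=0.5):
    """Flip an (H, W) or (H, W, C) image left-to-right with probability p."""
    if rng.random() < p:
        return image[:, ::-1]  # reverse the width axis
    return image


image = np.arange(6).reshape(2, 3)   # tiny stand-in for a face image
flipped = random_horizontal_flip(image, np.random.default_rng(0), p=1.0)
print(flipped)
```

A mirrored face still shows the same expression, which is why this augmentation is a safe way to give the model more varied training images.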

Now we will perform the main and most important step, i.e., generating the train and test image batches.

Firstly we will go for the train data. Here we have used the flow_from_directory method of datagen_train (our ImageDataGenerator), which takes a few parameters: the path of the dataset, target_size (the size of the output images), color_mode (the color of the output images; we have set grayscale, which gives grey images), batch_size, class_mode (which determines the type of label arrays that are returned; we have specified categorical) and shuffle, which we have set to False.
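
With class_mode set to categorical, each label comes back as a one-hot vector, and the class indices follow the alphanumeric order of the folder names. The sketch below imitates that with NumPy; the three folder names are hypothetical and the helper `one_hot` is written here only for illustration.

```python
import numpy as np

# Hypothetical class folders; indices follow the sorted folder names.
class_names = sorted(["happy", "angry", "sad"])
class_indices = {name: i for i, name in enumerate(class_names)}


def one_hot(label, num_classes):
    """Return a vector of zeros with a single 1.0 at position `label`."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[label] = 1.0
    return vec


encoded = one_hot(class_indices["happy"], len(class_names))
print(class_indices, encoded)
```

On a real generator the same mapping is available as `train_generator.class_indices`.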

Now we will do the same to generate the test dataset images. The only change is the path, which points to the test folder.

Step #5 ->

So far we have imported the libraries, got the dataset, and created the train and test image generators. Now it's time to move to another important step, which is building the CNN model.


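# Initialising the CNN
model = Sequential()
# Note: the text also describes a Dropout layer after each pooling step; its rate is not given, so it is left out here
# 1 - Convolution
model.add(Conv2D(64,(3,3), padding='same', input_shape=(48, 48,1)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# 2nd Convolution layer
model.add(Conv2D(128,(5,5), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# 3rd Convolution layer
model.add(Conv2D(512,(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# 4th Convolution layer
model.add(Conv2D(512,(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Flattening
model.add(Flatten())
# Fully connected layer 1st layer
# Fully connected layer 2nd layer
# (the two Dense -> BatchNormalization -> ReLU -> Dropout blocks described in the text are missing from this listing; their sizes are not given)
model.add(Dense(7, activation='softmax'))
opt = Adam(learning_rate=0.0005)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()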

In the above code we have created the basic structure of our neural network. For this we have taken help from the Keras module, which is a high-level wrapper around the TensorFlow library and reduces our work. First of all we have initialized the CNN model using the Sequential() class. After that we have created the first four blocks of the neural network, which are convolution layers.

The convolutional neural network, or CNN for short, is a specialized type of neural network model designed for working with two-dimensional image data, although it can also be used with one-dimensional and three-dimensional data. A convolution is a linear operation that involves the multiplication of a set of weights with the input, much like a traditional neural network. The multiplication is performed between an array of input data and a two-dimensional array of weights, called a filter or a kernel. The visualization for this is given below:
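
The sliding multiplication can also be written out by hand. The following NumPy sketch applies one 2x2 kernel to a small made-up image with no padding and a stride of 1. Like deep learning libraries in general, it does not flip the kernel, so strictly speaking it computes a cross-correlation. The function name `conv2d_valid` and the example values are chosen only for illustration.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out


image = np.array([[1, 2, 3, 0],
                  [4, 5, 6, 0],
                  [7, 8, 9, 0]], dtype=float)
# This kernel responds strongly where brightness drops from left to right.
vertical_edge = np.array([[1, -1],
                          [1, -1]], dtype=float)
feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)
```

The largest values appear in the last column of the result, where the image falls from bright pixels to zeros. A Conv2D layer learns many such kernels instead of having them written by hand.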

A convolutional layer is added to our model with model.add(). We pass it a Conv2D object, which specifies that the layer to be added is a convolutional layer. Conv2D itself takes two main parameters: the number of filters and the kernel size. After that we apply Batch Normalization, which applies a transformation that keeps the mean output close to 0 and the output standard deviation close to 1. Next an activation function is added, which applies a non-linear transformation to the input signal; the transformed output is then sent to the next layer of neurons as input. After that we apply MaxPooling2D, which keeps only the maximum value from each pooling window. Finally we add dropout, which helps prevent the model from overfitting. That completes one convolution block.
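
Two of these steps are easy to imitate in NumPy. The sketch below is a simplification, not what Keras does internally: `batch_norm` leaves out the learned scale and shift and the moving averages used at inference time, and `max_pool_2x2` handles a single 2D feature map. Both function names are made up for this example.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Normalise each feature (column) to mean 0 and std 1 over the batch."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]  # drop an odd last row/column
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 4, 6]], dtype=float)
pooled = max_pool_2x2(fm)
normalised = batch_norm(np.array([[1.0, 2.0], [3.0, 6.0]]))
print(pooled)
print(normalised)
```

Pooling halves the height and width of the feature map, which is why four pooling steps shrink a 48x48 input so quickly.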

For all four convolution blocks, every parameter apart from the Conv2D arguments (the number of filters and the kernel size) is kept the same. For the first Conv2D we have also specified input_shape, the size of the images the network will receive.

After that we have added a Flatten layer, which converts the data into a 1-dimensional array for input to the next layer. We flatten the output of the convolutional layers to create a single long feature vector, which is then connected to the final classification model, called a fully connected layer.
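
It can help to work out how long that feature vector is. The arithmetic below assumes what the listing shows: a 48x48 input, padding='same' on every convolution (so the convolution keeps height and width), a 2x2 max pool after each one, and filter counts of 64, 128, 512 and 512. The function name `trace_shapes` is made up for this sketch.

```python
def trace_shapes(size=48, filters=(64, 128, 512, 512)):
    """Follow the feature-map shape through 'same' convolutions and 2x2 max pools."""
    shapes = []
    for n_filters in filters:
        size = size // 2  # the convolution keeps the size, the pooling halves it
        shapes.append((size, size, n_filters))
    return shapes


shapes = trace_shapes()
height, width, channels = shapes[-1]
flattened_length = height * width * channels
print(shapes)            # [(24, 24, 64), (12, 12, 128), (6, 6, 512), (3, 3, 512)]
print(flattened_length)  # 4608
```

These are the same shapes that model.summary() reports for the pooling layers, so the sketch doubles as a quick sanity check.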

Finally we have added two fully connected layers to the model. To create a fully connected block we first add a Dense layer, which is a regular densely connected neural network layer and the most commonly used layer type. After that we add a Batch Normalization layer, and lastly the activation and dropout layers.

For all the layers so far we have used the ReLU activation function. ReLU is half rectified (from the bottom): f(z) is zero when z is less than zero, and f(z) is equal to z when z is greater than or equal to zero.
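
In code, that definition is a single line. A minimal NumPy version, with input values made up for illustration:

```python
import numpy as np


def relu(z):
    """Return z where z is positive and 0 everywhere else."""
    return np.maximum(0, z)


activations = relu(np.array([-2.0, -0.5, 0.0, 1.5]))
print(activations)
```

Negative inputs are clipped to zero while positive inputs pass through unchanged.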

After all the above layers we finally add a Dense layer with its activation set to softmax, which turns numbers (aka logits) into probabilities that sum to one. The softmax function outputs a vector that represents the probability distribution over a list of potential outcomes.

Thus the output of the above model is a set of probabilities in the range 0 to 1, which makes it easy for us to classify the expression.
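
Softmax itself is short enough to write by hand. In this sketch the seven logits are made-up numbers standing in for the raw scores of the last Dense layer.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()


logits = np.array([2.0, 1.0, 0.1, -1.0, 0.0, 0.5, 3.0])  # made-up scores for 7 classes
probs = softmax(logits)
print(probs, probs.sum())
```

The largest logit always gets the largest probability, so picking the most likely class gives the same answer before and after softmax.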

After that we have used model.compile to compile the model. It takes parameters such as the optimizer, which updates the model weights to reduce the loss computed from the predictions. We have set the optimizer to Adam with the learning rate specified. After that we have set loss to categorical_crossentropy and metrics (used to evaluate the performance of the model) to accuracy. Finally we have printed the model summary using model.summary().
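
To see what categorical cross-entropy measures, here is a small NumPy sketch for one sample. It follows the textbook formula, the negative log of the probability given to the true class, and is not the Keras source. The probability vectors are made up.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one-hot y_true and predicted probabilities y_pred."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


y_true = np.array([0, 0, 1, 0, 0, 0, 0], dtype=float)  # the true class is index 2
confident = np.array([0.01, 0.01, 0.94, 0.01, 0.01, 0.01, 0.01])
unsure = np.full(7, 1 / 7)                             # a uniform guess

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

A uniform guess over seven classes gives a loss of ln 7, about 1.95, so a loss near that value means the model has learned very little so far.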

Step #6 ->

We have now compiled our model. Let us visualize the model architecture using the code given below.


plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True)
Image('model.png',width=400, height=200)

In the above code we have used plot_model, which is also a function of the Keras library. We have given it some parameters: the model, the name of the output file (to_file), show_shapes, which displays the shape information, and show_layer_names, which displays the layer names.

After that we have used the IPython function Image to display the model architecture in the output. It takes a few arguments such as the file name and the width and height of the image to be displayed. The output of this (the image) is shown below.

Step #7 ->

Now that we have successfully built the model architecture, it's time to train the model and evaluate the results.


%%time
epochs = 15
steps_per_epoch = train_generator.n//train_generator.batch_size
validation_steps = validation_generator.n//validation_generator.batch_size
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1,
                              patience=2, min_lr=0.00001, mode='auto')
checkpoint = ModelCheckpoint("model_weights.h5", monitor='val_accuracy',
                             save_weights_only=True, mode='max', verbose=1)
callbacks = [PlotLossesCallback(), checkpoint, reduce_lr]
history = model.fit(
    x = train_generator,
    steps_per_epoch = steps_per_epoch,
    epochs = epochs,
    validation_data = validation_generator,
    validation_steps = validation_steps,
    callbacks = callbacks
)

The above code trains our model on the training dataset and at the same time validates it on the test/validation dataset.

First of all we have set the number of epochs. One epoch is when the ENTIRE dataset is passed forward and backward through the neural network ONCE. Since one epoch is too big to feed to the computer at once, we divide it into several smaller batches. As the number of epochs increases, the weights in the neural network are updated more times, and the fit goes from underfitting to optimal to overfitting.

Important Note:- I have kept the number of epochs at 15. You can increase it to try for higher accuracy, keeping an eye on the validation loss so the model does not overfit.

After that we have set the steps per epoch and the validation steps to the integer result of dividing the total number of images by the batch size.
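
The integer division is easy to check by hand. The progress bar in the training output reports 448 steps per epoch, and with a batch size of 64 that pins down the rough size of the training set. The helper names below are made up for this sketch.

```python
def steps_per_epoch(num_images, batch_size):
    """Number of full batches in one pass over the data."""
    return num_images // batch_size


def image_count_range(steps, batch_size):
    """Smallest and largest dataset sizes that give exactly `steps` full batches."""
    return steps * batch_size, (steps + 1) * batch_size - 1


low, high = image_count_range(448, 64)
print(low, high)  # 28672 28735
```

Because of the floor division, any images left over after the last full batch are not counted in the step total.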

After that we have set ReduceLROnPlateau, which reduces the learning rate when a plateau in model performance is detected, e.g. no improvement for a given number of training epochs. Then we have set ModelCheckpoint, which lets you define where to checkpoint the model weights, how the file should be named and under what circumstances to make a checkpoint of the model. After this we have set the callbacks list with PlotLossesCallback(), which gives a live report of how the training is going, and we have added checkpoint and reduce_lr to it as well.
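
The effect of these settings can be previewed with a simplified simulation. This is a sketch of the plateau rule as I understand it, not the Keras source: it ignores the cooldown option, assumes a min_delta of 1e-4, and runs on made-up validation losses. The starting rate, factor, patience and min_lr are the values used in this article.

```python
def simulate_reduce_lr(val_losses, lr=0.0005, factor=0.1, patience=2,
                       min_lr=0.00001, min_delta=1e-4):
    """Return the learning rate used in each epoch, plus the final rate."""
    best = float("inf")
    wait = 0
    used = []
    for loss in val_losses:
        used.append(lr)                  # rate in effect during this epoch
        if loss < best - min_delta:      # improvement: remember it and reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:         # plateau: shrink the rate, but not below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
    return used, lr


made_up_val_losses = [1.70, 1.40, 1.20, 1.21, 1.22, 1.10, 1.11, 1.12]
rates_per_epoch, final_rate = simulate_reduce_lr(made_up_val_losses)
print(rates_per_epoch, final_rate)
```

In this run the rate drops from 0.0005 to 0.00005 after two epochs without improvement, and the second drop is capped at min_lr instead of going down to 0.000005.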

Finally we have called model.fit(), which starts the training and validation of our model. It takes some parameters as input: x, the training dataset, which is train_generator, followed by steps per epoch, the number of epochs, validation data, validation steps and callbacks. The output after running the above code is given below.

Log-loss (cost function):
training (min: 0.866, max: 1.786, cur: 0.866)
validation (min: 0.970, max: 1.705, cur: 0.970)
Accuracy:
training (min: 0.313, max: 0.675, cur: 0.675)
validation (min: 0.381, max: 0.643, cur: 0.643)
Epoch 00015: saving model to model_weights.h5
448/448 [==============================] - 27s 60ms/step - loss: 0.8659 - accuracy: 0.6748 - val_loss: 0.9700 - val_accuracy: 0.6426
CPU times: user 6min 50s, sys: 57.4 s, total: 7min 47s
Wall time: 6min 46s

We got a training accuracy of nearly 68% (about 64% on the validation set) by running only 15 epochs. You can run for more epochs and change some other parameters to try for a higher accuracy.

Step #8 ->

Now that we have a trained model, let's save the model in JSON format, with its weights saved as well.


model_json = model.to_json()
model.save_weights("model_weights.h5")
with open("model.json", "w") as json_file:
    json_file.write(model_json)

In the above code we have first converted the model to JSON format. After that we have saved the weights of the model in .h5 format. Then we have opened a model.json file in write mode and written the JSON form of the model to it. We now have two files, model.json and model_weights.h5, holding the model and its weights respectively, which can be used anywhere to make predictions.

Step #9 ->

Now we will write Python code for loading the model and weights and making predictions.


import numpy as np
from tensorflow.keras.models import model_from_json

class FacialExpressionModel(object):
    # seven classes; the last entry was cut off in the listing and is assumed to be "Surprise"
    EMOTIONS_LIST = ["Angry", "Disgust",
                     "Fear", "Happy",
                     "Neutral", "Sad",
                     "Surprise"]

    def __init__(self, model_json_file, model_weights_file):
        # load model from JSON file
        with open(model_json_file, "r") as json_file:
            loaded_model_json = json_file.read()
            self.loaded_model = model_from_json(loaded_model_json)
        # load weights into the new model
        self.loaded_model.load_weights(model_weights_file)

    def predict_emotion(self, img):
        self.preds = self.loaded_model.predict(img)
        return FacialExpressionModel.EMOTIONS_LIST[np.argmax(self.preds)]

In the above code we have first imported the model_from_json function, which helps us load the model from a JSON file. After that we have written a Python class that first holds a list of the emotions our dataset contains. Then we have defined the init method, which takes the model.json file and the model weights file in .h5 format. Inside it we read the JSON file and use the model_from_json function to load the model, and then we load the weights into the model.

After this, in the class we have defined a method named predict_emotion, which gives the prediction for an image. It first uses the .predict method to get the prediction, then uses numpy's argmax to get an integer between 0 and 6 representing the corresponding emotion in the list. Finally we return that emotion name.
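
The argmax step can be tried without a trained model by feeding it a made-up probability vector. In this sketch the label list is assumed to have seven entries ending in "Surprise", and `decode_prediction` is a hypothetical helper that also returns the confidence, which the class above does not do.

```python
import numpy as np

# Assumed label order: seven classes, alphabetical.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]


def decode_prediction(preds):
    """Map a (1, 7) array of probabilities to (label, confidence)."""
    index = int(np.argmax(preds))  # flat index; equals the class index for a batch of one
    return EMOTIONS[index], float(np.max(preds))


fake_preds = np.array([[0.05, 0.01, 0.04, 0.70, 0.10, 0.07, 0.03]])
label, confidence = decode_prediction(fake_preds)
print(label, confidence)
```

Note that np.argmax without an axis works on the flattened array, so this shortcut is only valid when a single image is predicted at a time. For a batch, np.argmax(preds, axis=1) would be the safer choice.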

Step #10 ->

Now that we have made the code to load the weights into the model, we will get the frames of the video and perform predictions on them.


import cv2

facec = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
model = FacialExpressionModel("model.json", "model_weights.h5")
font = cv2.FONT_HERSHEY_SIMPLEX  # font for the label; this particular font is an assumption

class VideoCamera(object):
    def __init__(self):
        self.video = cv2.VideoCapture(0)

    def __del__(self):
        self.video.release()

    # returns camera frames along with bounding boxes and predictions
    def get_frame(self):
        _, fr = self.video.read()
        gray_fr = cv2.cvtColor(fr, cv2.COLOR_BGR2GRAY)
        faces = facec.detectMultiScale(gray_fr, 1.3, 5)
        for (x, y, w, h) in faces:
            fc = gray_fr[y:y+h, x:x+w]
            roi = cv2.resize(fc, (48, 48))
            pred = model.predict_emotion(roi[np.newaxis, :, :, np.newaxis])
            cv2.putText(fr, pred, (x, y), font, 1, (255, 255, 0), 2)
            # blue (BGR) box around the detected face; the line thickness is an assumption
            cv2.rectangle(fr, (x, y), (x+w, y+h), (255, 0, 0), 2)
        return fr

In the above code we first import the OpenCV module. After this we create a CascadeClassifier from a Haar Cascade Classifier file, which detects features by superimposing predefined patterns over face segments and is distributed as an XML file.

The Haar Cascade Classifiers which we used can be found HERE.

After that we have created the model by passing the model.json file and the weights file. Then we have set the font for CV2. After that we have written a Python class. In the class we have first declared the init method, which uses the cv2 VideoCapture method to access the camera or the video file for which you want the predictions.

An Important Note :-

In the code above we have passed 0 as the argument to VideoCapture, which means that CV2 will get video from your PC's webcam. You can change the 0 to the path of a video file to make predictions on that file instead.

After that we have declared the destructor for the class, which releases the video when you want to quit.

After that we have declared a method called get_frame, which first reads a frame from the video. Then we have used the cv2.cvtColor() method, which converts an image from one color space to another. Here we have used cv2.COLOR_BGR2GRAY, which converts the image to grayscale, as our model was trained on grayscale images. After that we have used detectMultiScale(), which detects objects of different sizes in the input image. The detected objects are returned as a list of rectangles. We then loop over the coordinates of each detected rectangle, crop the face and resize it using the cv2.resize() function. Next we use model.predict_emotion to get the predicted emotion. After that we put the text showing the predicted emotion on the frame, and we also draw a rectangle around the area where the face was detected. Finally we return the frame along with the box and the text.
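
The indexing that turns a cropped face into a model input can be checked with plain NumPy. In this sketch the frame is a blank array standing in for a grayscale webcam image, and the box coordinates are hypothetical. The box is chosen to be 48x48 already so that no resize is needed for the demonstration.

```python
import numpy as np

gray_frame = np.zeros((480, 640), dtype=np.uint8)  # stand-in for a grayscale frame
x, y, w, h = 200, 100, 48, 48                      # hypothetical detection box

face = gray_frame[y:y + h, x:x + w]                # rows come first (y), then columns (x)
batch = face[np.newaxis, :, :, np.newaxis]         # add batch and channel axes

print(face.shape)   # (48, 48)
print(batch.shape)  # (1, 48, 48, 1)
```

The final shape matches the model's input: one image of 48x48 pixels with a single grey channel.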

Step #11 ->

We have now made all the important functions. Let us write the function that calls the above code and shows the output video.


def gen(camera):
    while True:
        frame = camera.get_frame()
        cv2.imshow('Facial Expression Recognization',frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cv2.destroyAllWindows()

In the code above we have declared a Python function named gen, which takes the camera as a parameter. In it we run a while True loop, which repeats continuously. Inside the loop we first call the get_frame method of VideoCamera. After that we use the imshow method of CV2 to show the video as output. Then we have written the code to stop the loop: an if condition checks whether the 'q' key is pressed, and if so the loop breaks and the output window is destroyed using the destroyAllWindows() function.

Step #12 ->

Now we will write the final line of code, which runs all the code above by calling the gen function.


gen(VideoCamera())



In the above code we have called the gen function and passed a VideoCamera object as the parameter. When we run this line, the output window opens and shows the live predictions, as given below.


Important Points:-

  • To Get the Dataset Click HERE
  • To Get the Haar Cascade Classifiers file, Click HERE
  • To Get all the code with model file like JSON and Weights and all other code , The GITHUB LINK is HERE
  • Link To The Kaggle Notebook is HERE



Aaditya Singhal

I am a ML Enthusiast, Student and a Developer. I love exploring new tech everyday