Real-time Emotion Detection using Deep Learning and Machine Learning Techniques

Berk Sudan
Yıldız Technical University - Sky Lab

--

8 emotions are detected in real time with an F1 score of roughly 77%. OpenCV, Python 3, Keras, data preprocessing, and deep learning & machine learning techniques are used in this project. This was the final project of the 1st Kodluyoruz Applied Data Science and Machine Learning Bootcamp, instructed by Engin Deniz Alpman. Project contributors are Berk Sudan and İrem Şahin.

About Cohn-Kanade Database

In the dataset, each folder is dedicated to one person who shows a particular facial expression over tens of frames. Each person starts with a neutral, emotionless look and then starts exhibiting an overt emotion. Apart from the neutral emotion, there are 7 emotions: “anger”, “contempt”, “disgust”, “fear”, “happy”, “sadness” and “surprise”. An example of a dataset folder containing an emotion transition is shown below:

Neutral to Surprise Emotion

As seen in the images, after a number of frames, the facial expression that demonstrates “Surprise” becomes apparent. The pie chart of emotion labels is shown below:

Emotion Label Portions Pie Chart

As seen in the chart, half of the labels are neutral. Although this seems unbalanced, it is assumed to represent the real world, where emotions are mostly neutral.

Collecting Paths of Relevant Face Images

First of all, with the help of our dataset preparer module, a clean, labeled and ready-to-use dataset was built from images collected from the Cohn-Kanade database, which consists of thousands of face images. As shown in the previous section (About Cohn-Kanade Database), every dataset folder contains a set of images, and only the first and the last ones are suitable for use. Using such images, which have an emotion label, the paths of 654 images were obtained.
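As a rough sketch of this step (the folder layout, the label-file naming and all paths below are assumptions for illustration, not the project's actual code), the first and last frames of each labeled sequence can be collected like this:

import glob
import os

# Assumed CK+ layout: one folder per subject/sequence with ordered frames,
# plus a matching *_emotion.txt file holding the label of the last frame.
DATASET_DIR = "cohn-kanade-images"   # hypothetical image root
LABELS_DIR = "emotion-labels"        # hypothetical label root

def collect_labeled_paths():
    samples = []  # (image_path, emotion_label) pairs
    for sequence_dir in sorted(glob.glob(os.path.join(DATASET_DIR, "*", "*"))):
        frames = sorted(glob.glob(os.path.join(sequence_dir, "*.png")))
        subject, sequence = sequence_dir.split(os.sep)[-2:]
        label_files = glob.glob(
            os.path.join(LABELS_DIR, subject, sequence, "*_emotion.txt"))
        if not frames or not label_files:
            continue  # skip sequences without frames or without a label
        with open(label_files[0]) as f:
            emotion = int(float(f.read().strip()))
        samples.append((frames[0], 0))         # first frame -> neutral (0)
        samples.append((frames[-1], emotion))  # last frame -> labeled emotion
    return samples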

Extracting Face Landmarks

After getting the paths of the images, we converted each image to grayscale, reshaped it, and extracted 68 landmark points from it using the pretrained VGG-Face Shape Predictor. An example of the 68 points can be seen in the picture below:

68 Landmark Points in Face

These 68 points mark the most distinguishable and representative spots on the face; however, by themselves they do not mean much.
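A minimal sketch of this step, assuming dlib's frontal face detector, a 68-point shape predictor file and a fixed 350×350 resize (the predictor path and the target size are assumptions):

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Path to the pretrained 68-point shape predictor file is assumed here.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_landmarks(image_path, size=350):
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # convert to grayscale
    gray = cv2.resize(gray, (size, size))           # reshape to a fixed size
    faces = detector(gray, 1)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    # 68 (x, y) pairs -> array of shape (68, 2)
    return np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])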

Vectorization

So far, we only had the 2 coordinate values of each point. Using these 136 parameters per face image, along with 1 categorical label, we trained a model and checked the accuracy. However, the accuracy scores were much lower than expected. Therefore, we applied vectorization as an intermediate step. In this method, we took the mean point on each axis as the starting point of the vectors.

For the iᵗʰ point, say (xᵢ, yᵢ), we drew two different vectors whose heads lie at the mean points on their axes. These vectors, colored in yellow, are then used for the next calculation. An example of vector extraction is shown in the figure below:

Vector Extractions from Landmarks

With the help of these vectors, we were able to calculate the rotation angle, which can be seen in the figure below:

The Rotation Angle of Vectors

In order to calculate the rotation angle, say α, we used the following formula:

Rotation Angle Formula
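Although the original formula image is not reproduced here, from the construction above (a vector between the iᵗʰ landmark (xᵢ, yᵢ) and the mean point (x̄, ȳ)), the angle is presumably computed as α = arctan( (yᵢ − ȳ) / (xᵢ − x̄) ).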

We could also easily calculate the magnitude of the vector (as the Euclidean distance) using the following formula:

Euclidean Distance Magnitude Formula
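Likewise, the magnitude is presumably D = √( (xᵢ − x̄)² + (yᵢ − ȳ)² ), i.e. the Euclidean distance between the landmark and the mean point.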

Consequently, we ended up with a rotation angle (α) and a magnitude (D) for each point. Thanks to this method, we obtained better indicators for the target emotion label.
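Putting the two together, a minimal sketch of the vectorization step (the function name and the use of arctan2 to keep the quadrant are illustrative choices, not taken from the project code):

import numpy as np

def vectorize_landmarks(points):
    # points: 68 (x, y) landmark coordinates, shape (68, 2)
    points = np.asarray(points, dtype=float)
    x_mean, y_mean = points.mean(axis=0)
    dx = points[:, 0] - x_mean
    dy = points[:, 1] - y_mean
    magnitudes = np.sqrt(dx ** 2 + dy ** 2)       # Euclidean distance D per point
    angles = np.degrees(np.arctan2(dy, dx))       # rotation angle alpha per point
    return np.concatenate([magnitudes, angles])   # 136-dimensional feature vector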

Applying Machine Learning Algorithms

By acquiring 2 columns from each point, along with the emotion label, we got 137 columns in total. Using the Scikit-learn module, we applied the SVM (Support Vector Machine), Random Forest, Gradient Boosting and Naïve Bayes algorithms. Among all the algorithms, SVM and Random Forest gave the most fulfilling evaluation results.
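A hedged sketch of the training step with Scikit-learn (the hyperparameters and the train/test split ratio are assumptions; X holds the 136 features per image and y the emotion labels):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_and_evaluate(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    models = {
        "SVM": SVC(kernel="linear"),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(name)
        print(classification_report(y_test, model.predict(X_test)))
    return models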

The Random Forest evaluation result is shown in the figure below:

Random Forest Evaluation Result

The Support Vector Machine evaluation result is shown in the figure below:

Support Vector Machines Evaluation Result

Real-time Image Classification

Using OpenCV and Dlib, images are captured periodically and classified in real time. Once an image is captured, the corresponding emotion label is predicted by feeding it into the machine learning model. The bounding box and the face landmarks are both displayed on the screen. An example of real-time image classification is shown in the figure below:

An Example of Real-time Image Classification
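A minimal sketch of the real-time loop, reusing the detector, predictor, vectorize_landmarks and trained model objects from the earlier sketches (the webcam index, the drawing details and the label order in EMOTIONS are assumptions):

import cv2

EMOTIONS = ["neutral", "anger", "contempt", "disgust",
            "fear", "happy", "sadness", "surprise"]  # label order assumed

def run_realtime(model, detector, predictor):
    capture = cv2.VideoCapture(0)  # default webcam
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for face in detector(gray, 1):
            shape = predictor(gray, face)
            points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
            features = vectorize_landmarks(points).reshape(1, -1)
            label = EMOTIONS[int(model.predict(features)[0])]
            # Draw the bounding box, the landmarks and the predicted label.
            cv2.rectangle(frame, (face.left(), face.top()),
                          (face.right(), face.bottom()), (0, 255, 0), 2)
            for (x, y) in points:
                cv2.circle(frame, (x, y), 1, (0, 255, 255), -1)
            cv2.putText(frame, label, (face.left(), face.top() - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        cv2.imshow("Real-time Emotion Detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
            break
    capture.release()
    cv2.destroyAllWindows()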

Repository

The repository is available on GitHub. Don't hesitate to contribute to the project! You can also contact me whenever you want to ask something about it.

