Real-time Emotion Detection using Deep Learning and Machine Learning Techniques

Berk Sudan
Yıldız Technical University - Sky Lab

--

8 emotions are detected in real time with an F1 score of roughly 77%. OpenCV, Python 3, Keras, data preprocessing, and deep learning & machine learning techniques are used in this project. This was the final project of the 1st Kodluyoruz Applied Data Science and Machine Learning Bootcamp, instructed by Engin Deniz Alpman. Project contributors are Berk Sudan and İrem Şahin.

About Cohn-Kanade Database

In the dataset, each folder is dedicated to one person who shows a particular facial expression over tens of frames. Each person starts with a neutral, emotionless look and then starts exhibiting an overt emotion. Apart from the neutral emotion, there are 7 emotions: “anger”, “contempt”, “disgust”, “fear”, “happy”, “sadness” and “surprise”. An example of a dataset folder containing an emotion transition is shown below:

Neutral to Surprise Emotion

As seen in the images, after a number of frames, the facial expression that demonstrates “Surprise” becomes apparent. The pie chart of emotion labels is shown below:

Emotion Label Portions Pie Chart

As seen in the chart, half of the labels are neutral. Although this seems unbalanced, it is assumed to represent the real world, where emotions are mostly neutral.

Collecting Paths of Relevant Face Images

First of all, with the help of our dataset preparer module, a clean, labeled and ready-to-use dataset was built from images collected from the Cohn-Kanade database, which consists of thousands of face images. As shown in the previous section (About Cohn-Kanade Database), every dataset folder contains a set of images, and only the first and the last ones are suitable for use. Using such images, which have an emotion label, the paths of 654 images were obtained.
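As a rough sketch of this step (the folder layout, the label-file naming and all paths below are assumptions for illustration, not the project's actual code), the first and last frames of each labeled sequence can be collected like this:

import glob
import os

# Assumed CK+ layout: one folder per subject/sequence with ordered frames,
# plus a matching *_emotion.txt file holding the label of the last frame.
DATASET_DIR = "cohn-kanade-images"   # hypothetical image root
LABELS_DIR = "emotion-labels"        # hypothetical label root

def collect_labeled_paths():
    samples = []  # (image_path, emotion_label) pairs
    for sequence_dir in sorted(glob.glob(os.path.join(DATASET_DIR, "*", "*"))):
        frames = sorted(glob.glob(os.path.join(sequence_dir, "*.png")))
        subject, sequence = sequence_dir.split(os.sep)[-2:]
        label_files = glob.glob(
            os.path.join(LABELS_DIR, subject, sequence, "*_emotion.txt"))
        if not frames or not label_files:
            continue  # skip sequences without frames or without a label
        with open(label_files[0]) as f:
            emotion = int(float(f.read().strip()))
        samples.append((frames[0], 0))         # first frame -> neutral (0)
        samples.append((frames[-1], emotion))  # last frame -> labeled emotion
    return samples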

Extracting Face Landmarks

After getting the paths of the images, we converted each image to grayscale, reshaped it, and extracted 68 landmark points from it using the pretrained VGG-Face Shape Predictor. An example of the 68 points can be seen in the picture below:

68 Landmark Points in Face

These 68 points mark the most distinguishable and representative spots on the face; however, by themselves they do not mean much.
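A minimal sketch of this step, assuming dlib's frontal face detector, a 68-point shape predictor file and a fixed 350×350 resize (the predictor path and the target size are assumptions):

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Path to the pretrained 68-point shape predictor file is assumed here.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_landmarks(image_path, size=350):
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # convert to grayscale
    gray = cv2.resize(gray, (size, size))           # reshape to a fixed size
    faces = detector(gray, 1)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    # 68 (x, y) pairs -> array of shape (68, 2)
    return np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])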

Vectorization

So far, we only had the 2 coordinate values of each point. Using these 136 parameters per face image, along with 1 categorical label, we trained a model and checked the accuracy. However, the accuracy scores were much lower than expected. Therefore, we applied vectorization as an intermediate step. In this method, we took the mean point on each axis as the starting point of the vectors.

For the iᵗʰ point, say (xᵢ, yᵢ), we drew two different vectors whose heads lie at the mean points on their axes. These vectors, colored in yellow, are then used for the next calculation. An example of vector extraction is shown in the figure below:

Vector Extractions from Landmarks

With the help of these vectors, we were able to calculate the rotation angle, which can be seen in the figure below:

The Rotation Angle of Vectors

In order to calculate the rotation angle, say α, we used the following formula:

Rotation Angle Formula
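Although the original formula image is not reproduced here, from the construction above (a vector between the iᵗʰ landmark (xᵢ, yᵢ) and the mean point (x̄, ȳ)), the angle is presumably computed as α = arctan( (yᵢ − ȳ) / (xᵢ − x̄) ).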

We could also easily calculate the magnitude of the vector (as the Euclidean distance) using the following formula:

Euclidean Distance Magnitude Formula
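Likewise, the magnitude is presumably D = √( (xᵢ − x̄)² + (yᵢ − ȳ)² ), i.e. the Euclidean distance between the landmark and the mean point.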

Consequently, we ended up with a rotation angle (α) and a magnitude (D) for each point. Thanks to this method, we obtained better indicators for the target emotion label.
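Putting the two together, a minimal sketch of the vectorization step (the function name and the use of arctan2 to keep the quadrant are illustrative choices, not taken from the project code):

import numpy as np

def vectorize_landmarks(points):
    # points: 68 (x, y) landmark coordinates, shape (68, 2)
    points = np.asarray(points, dtype=float)
    x_mean, y_mean = points.mean(axis=0)
    dx = points[:, 0] - x_mean
    dy = points[:, 1] - y_mean
    magnitudes = np.sqrt(dx ** 2 + dy ** 2)       # Euclidean distance D per point
    angles = np.degrees(np.arctan2(dy, dx))       # rotation angle alpha per point
    return np.concatenate([magnitudes, angles])   # 136-dimensional feature vector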

Applying Machine Learning Algorithms

By acquiring 2 columns from each point, along with the emotion label, we got 137 columns in total. Using the Scikit-learn module, we applied the SVM (Support Vector Machine), Random Forest, Gradient Boosting and Naïve Bayes algorithms. Among all the algorithms, SVM and Random Forest gave the most fulfilling evaluation results.
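A hedged sketch of the training step with Scikit-learn (the hyperparameters and the train/test split ratio are assumptions; X holds the 136 features per image and y the emotion labels):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_and_evaluate(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    models = {
        "SVM": SVC(kernel="linear"),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(name)
        print(classification_report(y_test, model.predict(X_test)))
    return models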

The Random Forest evaluation result is shown in the figure below:

Random Forest Evaluation Result

The Support Vector Machine evaluation result is shown in the figure below:

Support Vector Machines Evaluation Result

Real-time Image Classification

Using OpenCV and Dlib, images are captured periodically and classified in real time. Once an image is captured, the corresponding emotion label is predicted by feeding it into the machine learning model. The bounding box and the face landmarks are both displayed on the screen. An example of real-time image classification is shown in the figure below:

An Example of Real-time Image Classification
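A minimal sketch of the real-time loop, reusing the detector, predictor, vectorize_landmarks and trained model objects from the earlier sketches (the webcam index, the drawing details and the label order in EMOTIONS are assumptions):

import cv2

EMOTIONS = ["neutral", "anger", "contempt", "disgust",
            "fear", "happy", "sadness", "surprise"]  # label order assumed

def run_realtime(model, detector, predictor):
    capture = cv2.VideoCapture(0)  # default webcam
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for face in detector(gray, 1):
            shape = predictor(gray, face)
            points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
            features = vectorize_landmarks(points).reshape(1, -1)
            label = EMOTIONS[int(model.predict(features)[0])]
            # Draw the bounding box, the landmarks and the predicted label.
            cv2.rectangle(frame, (face.left(), face.top()),
                          (face.right(), face.bottom()), (0, 255, 0), 2)
            for (x, y) in points:
                cv2.circle(frame, (x, y), 1, (0, 255, 255), -1)
            cv2.putText(frame, label, (face.left(), face.top() - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        cv2.imshow("Real-time Emotion Detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
            break
    capture.release()
    cv2.destroyAllWindows()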

Repository

The repository is available on GitHub. Don't hesitate to contribute to the project! You can also contact me whenever you want to ask something about it.

