When we think about a particular person, we often imagine his or her face, smile, and unique facial features. We know how to recognize a person by their face, read their emotions, estimate their age, and tell their gender with near certainty. The human visual system does these and many other things with faces effortlessly. Can we do the same with modern artificial intelligence algorithms, in particular deep neural networks? In this article I would like to describe the most common tasks in facial analysis, along with references to existing datasets and deep learning based solutions. The latter will let you experiment on your own and try to apply these features to your business, or, better… you can contact us :)
Face detection

This problem is the most basic one: how do we find a face in an image? First, we would just like to locate the face inside a rectangle, as in the picture:
For years this problem has been solved more or less successfully with Haar cascades, which are implemented in the popular OpenCV package, but there are alternatives:
One of the biggest face datasets: it consists of 32,203 images with 393,703 labeled faces in varied conditions, so it's a good choice for training a detection model.
Another good but smaller dataset, with 5,171 faces; good for testing simple models.
What about algorithms and approaches to the detection problem? I suggest checking the following article for a comparison of different approaches, but a general and really good one is Faster R-CNN, which was designed as an object detection and classification pipeline and showed good results on the VOC and ImageNet datasets for general image recognition and detection.
Face identification

After we have successfully cropped a face from the full image, we most probably want to identify the person, for example, to match it with someone from our database.
A good starting point for the face identification problem is to play with the VGG Face dataset, which covers 2,622 identities. Moreover, the authors provide trained models and code that you can use as feature extractors in your own machine learning pipeline. They also introduce an important concept, the triplet loss, which you might use for large-scale face identification. In two words, it "pushes" similar faces toward similar representations and does the opposite for different faces.
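To make the "pushing" idea concrete, here is a minimal numpy sketch of the triplet loss on embedding vectors (the margin value 0.2 is an illustrative choice, not taken from the VGG Face paper):

```python
# Sketch of the triplet loss: pull an anchor embedding toward a
# "positive" (same identity) and push it away from a "negative".
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, ||a - p||^2 - ||a - n||^2 + margin), averaged over the batch."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.mean(np.maximum(0.0, pos_dist - neg_dist + margin)))

a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])   # close to the anchor (same person)
n = np.array([[1.0, 0.0]])   # far from the anchor (different person)
print(triplet_loss(a, p, n))  # → 0.0, the triplet is already well separated
```

When the negative sits closer to the anchor than the positive, the loss becomes positive and the network gets a gradient to fix the embedding.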
If you want to go deeper, both in terms of network complexity and scale of data, you might try Do We Really Need to Collect Millions of Faces for Effective Face Recognition. They use a deeper ResNet-101 network, with code and trained models available as well.
Let me remind you that you can use these models simply as facial feature extractors and apply the resulting representations to your own purposes, for example clustering or metric-based methods.
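A metric-based identification step on top of such embeddings can be as simple as a nearest-neighbour search. A minimal sketch, assuming you already have L2-normalizable embedding vectors from one of the models above (the `identify` function and the 2-D toy embeddings are mine, for illustration only):

```python
# Sketch: identify a face by nearest-neighbour search over stored embeddings.
import numpy as np

def identify(query, gallery, names, threshold=0.5):
    """Return the name of the closest gallery embedding by cosine
    similarity, or None if nobody is similar enough."""
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    sims = gallery @ query
    best = int(np.argmax(sims))
    return names[best] if sims[best] >= threshold else None

gallery = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy 2-D "embeddings"
names = ["alice", "bob"]
print(identify(np.array([0.9, 0.1]), gallery, names))  # → alice
```

Real embeddings have hundreds of dimensions, but the logic is identical; the threshold controls how eagerly unknown faces are rejected.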
Facial keypoints detection
Long story short, facial keypoints are the most important points, according to some metric (see, e.g., Face Recognition: A Literature Survey), that can be extracted from a given (picture of a) face to describe the associated emotional state, to support medical analysis, etc.
If you really want to train a neural network model for keypoint detection, you can check the Kaggle dataset with its corresponding tutorial, but a more convenient way to extract these points may be the open-source library dlib.
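Once the points are extracted, they become plain geometry you can compute features from. A sketch, assuming the common 68-point annotation scheme that dlib's pretrained shape predictor uses (the point indices and the `mouth_opening_ratio` helper are my assumptions, not a dlib API):

```python
# Sketch: turning 68-point facial landmarks into a simple
# scale-invariant feature (e.g. for emotion-related analysis).
import numpy as np

def mouth_opening_ratio(pts):
    """pts: (68, 2) array of (x, y) landmarks.
    Returns the vertical inner-lip gap divided by the inter-ocular
    distance, so the feature does not depend on face size in pixels."""
    interocular = np.linalg.norm(pts[36] - pts[45])  # outer eye corners
    gap = np.linalg.norm(pts[62] - pts[66])          # inner upper vs lower lip
    return gap / interocular

# Toy landmarks: only the four points we use are filled in.
pts = np.zeros((68, 2))
pts[36] = [0.0, 0.0]
pts[45] = [10.0, 0.0]
pts[62] = [5.0, 6.0]
pts[66] = [5.0, 8.0]
print(mouth_opening_ratio(pts))  # → 0.2
```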
Age estimation and gender recognition
Another interesting problem is recognizing the gender of a person in a photo and trying to estimate their age.
Unfiltered faces for gender and age classification is a good dataset that provides 26,580 photos labeled with gender and one of 8 age groups (0–2, 4–6, 8–13, 15–20, 25–32, 38–43, 48–53, 60+).
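Note that the dataset frames age estimation as classification over those 8 groups rather than regression, so to evaluate against ground-truth ages you need a mapping from a numeric age onto the buckets. A small sketch of one way to do that (the snapping-to-nearest-boundary rule for ages between groups is my own choice):

```python
# The 8 age groups from the dataset description above.
BUCKETS = [(0, 2), (4, 6), (8, 13), (15, 20), (25, 32), (38, 43), (48, 53), (60, 100)]

def age_to_bucket(age):
    """Index of the group containing the age, or the nearest group
    when the age falls in a gap between two of them."""
    def dist(bucket):
        lo, hi = bucket
        if lo <= age <= hi:
            return 0
        return min(abs(age - lo), abs(age - hi))
    return min(range(len(BUCKETS)), key=lambda i: dist(BUCKETS[i]))

print(age_to_bucket(30))  # → 4 (the 25–32 group)
```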
Emotion recognition

Emotion recognition from a photo is such a popular task that it is even implemented in some cameras, which aim to automatically detect when you're smiling. The same can be done by suitably training neural networks. There are two datasets: one from a Kaggle competition and the Radboud Faces Database, both containing photos and labels for seven basic emotional states (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral).
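A network trained on these labels typically ends in a 7-way softmax; turning its raw outputs into a human-readable label is then a one-liner. A minimal sketch (the logit values below are made up for illustration):

```python
# Sketch: map a 7-way network output onto the emotion labels above.
import numpy as np

EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def predict_emotion(logits):
    """Softmax over the logits, then the most probable label."""
    probs = np.exp(logits - np.max(logits))  # shift for numerical stability
    probs /= probs.sum()
    return EMOTIONS[int(np.argmax(probs))], float(probs.max())

label, p = predict_emotion(np.array([0.1, 0.0, 0.2, 2.5, 0.3, 0.1, 0.4]))
print(label)  # → Happy
```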
If you want a ready-made solution, you can try the Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns project; the authors provide code and models.
An interesting aspect of the latter project is that it converts the input image into local binary pattern space, then maps it into a 3D space and trains a convolutional neural network on it.
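For intuition, here is what the plain 8-neighbour local binary pattern transform does before any mapping step: each pixel is replaced by a byte whose bits record which neighbours are at least as bright as it. A minimal numpy sketch (the project itself uses a mapped variant; this shows only the basic LBP idea):

```python
# Sketch of the plain 8-neighbour local binary pattern (LBP) transform.
import numpy as np

def lbp(image):
    """image: 2-D uint8 array. Returns LBP codes for the interior pixels."""
    c = image[1:-1, 1:-1]
    codes = np.zeros_like(c, dtype=np.uint8)
    # Eight neighbours, clockwise from the top-left, one bit each.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = image[1 + dy : image.shape[0] - 1 + dy,
                      1 + dx : image.shape[1] - 1 + dx]
        codes |= ((neigh >= c).astype(np.uint8) << bit)
    return codes

# On a uniform patch every neighbour ties with the centre: all bits set.
print(lbp(np.full((3, 3), 5, dtype=np.uint8)))  # → [[255]]
```

The resulting codes are robust to monotonic lighting changes, which is why LBP features remain popular as a preprocessing step for faces.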
Facial action units detection
Facial emotion recognition datasets usually share one problem: they concentrate on an emotion at a single moment, and the emotion has to be clearly visible, while in real life our smile or sad blink can be really short and subtle. There is a special system for coding the different components of facial expressions into "action units": the Facial Action Coding System (FACS).
One of the most famous databases for action unit (AU) detection is the Cohn-Kanade AU-Coded Expression Database, with 486 sequences of emotional states. Another dataset is the RPI ISL Facial Expression Databases, which requires a license. When training a model to detect AUs, keep in mind that this is a multilabel problem (several AUs can appear in a single example), which brings its own difficulties.
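Concretely, multilabel here means each AU gets its own sigmoid and threshold instead of one softmax over all units, since several can fire at once. A small sketch (the three AU names and the logit values are illustrative only):

```python
# Sketch: multilabel AU detection head, one independent sigmoid per unit.
import numpy as np

def active_aus(logits, au_names, threshold=0.5):
    """Return the names of all units whose sigmoid probability passes
    the threshold; any subset, including several at once, is valid."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [name for name, p in zip(au_names, probs) if p >= threshold]

# Hypothetical outputs for AU6 (cheek raiser), AU12 (lip corner puller)
# and AU4 (brow lowerer) — AU6 + AU12 together roughly encode a smile.
print(active_aus([2.0, 1.5, -3.0], ["AU6", "AU12", "AU4"]))  # → ['AU6', 'AU12']
```

The matching loss during training would be a per-unit binary cross-entropy rather than the categorical cross-entropy used for emotion classes.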
Gaze tracking

Another interesting and challenging problem is so-called gaze tracking. Detecting eye movements can be part of an emotion detection pipeline or serve medical applications, but it is also an amazing way to build gaze-based human-computer interfaces and a tool for retail and customer engagement research.
A nice deep learning project for this is the University of Georgia's Eye Tracking for Everyone, where they publish both the dataset and a trained model.
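A gaze model typically outputs a 2-D point per frame, and raw per-frame predictions jitter; a common post-processing step, which I add here as an assumption rather than something from the cited project, is exponential smoothing over the predicted points:

```python
# Sketch: exponential moving average over a sequence of (x, y) gaze points,
# a simple way to stabilise jittery per-frame gaze predictions.
def smooth_gaze(points, alpha=0.3):
    """Smaller alpha = smoother but laggier trajectory."""
    sx, sy = points[0]
    out = [(sx, sy)]
    for x, y in points[1:]:
        sx = alpha * x + (1 - alpha) * sx
        sy = alpha * y + (1 - alpha) * sy
        out.append((sx, sy))
    return out

# A sudden jump is pulled back toward the previous estimate.
print(smooth_gaze([(0.0, 0.0), (10.0, 0.0)]))  # → [(0.0, 0.0), (3.0, 0.0)]
```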
Facial analysis is still a very young area, but as you can see, there are a lot of developments and published models that you can try and test as a starting point for your own research. New databases appear regularly, so you can use transfer learning and fine-tune existing models, exploiting new datasets for your own tasks.