Face Recognition Pipeline Clearly Explained


An introduction to the state-of-the-art algorithms used in the face recognition pipeline.


Face recognition is a technique of identification or verification of an individual using their face in a video or photo. This computer vision task captures, analyzes, and compares patterns based on the person’s facial details.

The 2011 book "Handbook of Face Recognition" describes the two main modes of face recognition:

A face recognition system is expected to identify faces present in images and videos automatically. It can operate in either or both of two modes: (1) face verification (or authentication), and (2) face identification (or recognition).

  • Face Verification. A one-to-one mapping of a given face against a known identity (e.g., is this the claimed person?).
  • Face Identification. A one-to-many mapping of a given face against a database of known faces (e.g., who is this person?). A sketch of both modes follows below.
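To make the distinction concrete, here is a minimal sketch of the two modes in Python, assuming faces have already been reduced to fixed-length embedding vectors (covered in the Feature Extraction section below); the gallery structure and the 0.6 threshold are illustrative assumptions, not fixed values:

```python
import numpy as np

def verify(probe, enrolled, threshold=0.6):
    """Face verification (1:1): does the probe match one claimed identity?

    probe/enrolled are face embedding vectors; the 0.6 threshold is an
    illustrative assumption and must be tuned per embedding model.
    """
    return np.linalg.norm(probe - enrolled) <= threshold

def identify(probe, gallery):
    """Face identification (1:N): which enrolled identity is closest?

    gallery is a hypothetical dict mapping name -> enrolled embedding.
    """
    return min(gallery, key=lambda name: np.linalg.norm(probe - gallery[name]))
```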

Use case of Face Recognition Technology

This technology is used by many companies and organizations, some of which you are probably aware of and some you may not be. Here are a few examples of face recognition technology in use:

  1. Access Control: Access control of personal computers, homes, cars, offices, and other premises is one of the most visible uses of face recognition, and Apple’s iPhone X is a well-known example of using FRT to unlock a smartphone.
  2. Shopping Online: Alibaba, the prominent Chinese e-commerce company, plans to let users pay with their face through its Alipay platform. As a first step, Alipay has already launched a ‘Smile to Pay’ facial recognition system at a KFC in Hangzhou: the system recognizes a face within two seconds and then verifies the scan by sending a mobile alert. ‘Smile to Pay’ can even identify people wearing make-up or wigs as a disguise.
  3. Helping addictive gamblers: Casinos use face recognition to compare the faces of people playing slot machines against lists of self-declared problem gamblers. When the system detects a match, it alerts the security team, which then discreetly approaches the gambler and escorts them off the premises.
  4. Tracking down criminals: And this one won’t come as a surprise. Facial recognition is a crime-fighting technology used by law enforcement and intelligence agencies to identify targets. For example, with MORIS (Mobile Offender Recognition and Information System), a portable biometric device attached to a smartphone, an officer only has to snap a picture and voila.
  5. Organizing photos: The most widespread use of this technology is in photo organization: Apple, Google, and even Facebook use their own face recognition systems to differentiate a portrait from a landscape, find a user in a frame, and sort photos into categories. And we all provide tremendous support for these algorithms every time we upload a picture and tag our friends in it.
  6. Taking attendance in school: Schools in the UK use FRT to take attendance. This has been going on for a while in the UK and will likely spread to other nations as well. Both students and teachers reportedly like this new technology, which scans faces with infrared light and matches them against archived images.

Face Recognition Pipeline

[Figure: the face recognition pipeline, from “Handbook of Face Recognition”, 2011]

A face recognition pipeline can be divided into four major stages:

  1. Face Detection
  2. Face Alignment
  3. Feature Extraction
  4. Feature Matching

In this article, I will not explain the internal architecture of each algorithm listed under each stage. Instead, I will give you an idea of their use cases so that you can dive deeper into the algorithms that fit your face recognition project.

Face Detection

A face detection method is used to find the faces present in a given image, extract them if they exist, and crop only the face region to create a compact input for further feature extraction. There are multiple algorithms to perform this task in a face detection/recognition system.

Methods used in Face Detection:

  1. Haar cascade Face Detection: The Haar cascade based face detector was the state-of-the-art in face detection for many years after 2001, when it was introduced by Viola and Jones in their paper “Rapid Object Detection using a Boosted Cascade of Simple Features”. There have been many improvements in recent years. This method has a simple architecture that works in near real-time on the CPU, and it can detect faces at different scales. But its major drawbacks are that it produces many false positives and it doesn’t work on non-frontal images.
  2. Dlib (HOG) Face Detection: This is a widely used face detection model based on HOG features and an SVM, published in 2005 in the paper “Histograms of Oriented Gradients for Human Detection”. HOG, or Histogram of Oriented Gradients, is a feature descriptor often used to extract features from image data. It is the fastest method on the CPU and works on frontal and slightly non-frontal images. But it is incapable of detecting small faces or handling occlusions, and it often cuts off parts of the chin and forehead during detection.
  3. Dlib (CNN) Face Detection: This method, first introduced in the 2016 paper “CNN based efficient face recognition technique using Dlib”, uses a Maximum-Margin Object Detector (MMOD) with CNN-based features. The training process is very simple, and you don’t need a large amount of data to train a custom object detector. It works very fast on a GPU and handles various face orientations in images; it can also handle occlusions. But the major disadvantage is that it is trained on a minimum face size of 80×80, so it can’t detect smaller faces in images. It is also very slow on the CPU.
  4. MTCNN Face Detection: Multi-task Cascaded Convolutional Networks (MTCNN) is a framework developed as a solution for both face detection and face alignment, first introduced in the 2016 paper “Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks”. This method gives the most accurate results of the four: it works on faces with various orientations, detects faces across various scales, and can even handle occlusions. It has no major drawback as such, but it is comparatively slower than the HOG and Haar cascade methods.

You can refer to references [2] and [3] below for a detailed comparison of the above methods.
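As a quick illustration, here is a minimal sketch of the first two detectors, assuming OpenCV and dlib are installed and that person.jpg is a hypothetical local image:

```python
import cv2
import dlib

img = cv2.imread("person.jpg")  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 1. Haar cascade (the XML configuration ships with OpenCV)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
haar_faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# 2. Dlib's HOG + SVM detector
hog_detector = dlib.get_frontal_face_detector()
hog_faces = hog_detector(gray, 1)  # 1 = upsample once to catch smaller faces

# Crop each detected face for the next pipeline stage
for (x, y, w, h) in haar_faces:
    face_crop = img[y:y + h, x:x + w]
```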

Face Alignment

Face alignment is an early phase of the modern face recognition pipeline. Google has reported that face alignment improves the accuracy of its FaceNet face recognition model from 98.87% to 99.63%, an increase of almost one percentage point.

We can easily apply 2D face alignment with OpenCV in Python. OpenCV ships Haar cascade configurations for both frontal face and eye detection, and extracting the eye locations is the key to aligning faces. Once OpenCV has found the eyes of a detected face with the conventional Haar cascade method, you could rotate the image one degree at a time until both eyes are horizontal, but this adds needless complexity; instead, you can compute the angle between the two eyes directly (using the cosine rule or an arctangent) and rotate the image once. MTCNN also finds facial landmarks such as eye, nose, and mouth locations, so if you use MTCNN in your face recognition pipeline, the detected face comes with the landmarks needed for alignment.
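Here is a minimal sketch of this angle-based alignment using MTCNN’s eye landmarks, assuming the open-source mtcnn package is installed and person.jpg is a hypothetical local image:

```python
import cv2
import numpy as np
from mtcnn import MTCNN  # pip install mtcnn (assumed available)

def align_face(img, left_eye, right_eye):
    """Rotate img so the line through the two eye centres becomes horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))  # angle of the inter-eye line

    # Rotate once about the midpoint between the eyes
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    h, w = img.shape[:2]
    return cv2.warpAffine(img, M, (w, h))

img = cv2.cvtColor(cv2.imread("person.jpg"), cv2.COLOR_BGR2RGB)  # hypothetical image
results = MTCNN().detect_faces(img)
if results:
    kp = results[0]["keypoints"]  # MTCNN returns eye, nose, and mouth landmarks
    aligned = align_face(img, kp["left_eye"], kp["right_eye"])
```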

Feature Extraction

Feature extraction is the central step of face recognition. It extracts the biometric components of your face: the features that differ from person to person. Various methods extract various combinations of these features, commonly known as nodal points. No two people share all their nodal points, except for identical twins.

[Figure: a 128-dimensional face embedding]

Methods used in Feature Extraction (Deep Approach):

There was a flurry of research and publications on deep learning methods for face recognition in 2014 and 2015. Capabilities rapidly approached and then surpassed human-level performance on a standard face recognition dataset over a three-year span. These advances have been powered by four milestone deep learning systems for face recognition: DeepFace, the DeepID series of systems, VGGFace, and FaceNet.

VGGFace: The VGG-Face CNN descriptors are computed with a CNN based on the VGG-Very-Deep-16 architecture and were evaluated on the Labeled Faces in the Wild and YouTube Faces datasets. There are several VGGFace variants (e.g., VGGFace and VGGFace2), with implementations available for Keras; the basic difference among these models is the number of layers in the architecture, which varies from model to model. These models achieve quite good accuracy.

FaceNet: FaceNet is a face recognition system developed by researchers at Google and described in their 2015 paper “FaceNet: A Unified Embedding for Face Recognition and Clustering”. It achieved then state-of-the-art results on a range of face recognition benchmark datasets and introduced the ‘triplet loss’, which allows faces to be encoded efficiently as feature vectors so that similarity can be computed rapidly via distance calculations. The FaceNet system can be used broadly thanks to multiple third-party open-source implementations and the availability of pre-trained models. It extracts high-quality features from faces, called face embeddings, which can then be used to train a face identification system.
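As a quick illustration of face embeddings, the sketch below uses the open-source face_recognition library, which wraps a dlib ResNet model rather than FaceNet itself but likewise produces 128-dimensional encodings; the image path is a hypothetical assumption:

```python
import face_recognition  # pip install face_recognition (dlib-based, not FaceNet itself)

# Load an image and compute one 128-d embedding per detected face
img = face_recognition.load_image_file("person.jpg")  # hypothetical image path
encodings = face_recognition.face_encodings(img)

if encodings:
    embedding = encodings[0]
    print(embedding.shape)  # (128,) — a compact representation for matching
```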

DeepFace: DeepFace is a system based on deep convolutional neural networks, described in the 2014 paper “DeepFace: Closing the Gap to Human-Level Performance in Face Verification”. It was perhaps the first major leap forward in using deep learning for face recognition, achieving near human-level performance on a standard benchmark dataset.

DeepID (Deep hidden IDentity features): DeepID is a series of systems (DeepID, DeepID2, etc.), first described by Yi Sun et al. in their 2014 paper “Deep Learning Face Representation from Predicting 10,000 Classes”. The first system resembled DeepFace, but subsequent publications expanded it to support both identification and verification tasks by training with a contrastive loss. The DeepID systems were among the first deep learning models to achieve better-than-human performance on the task.

Feature Matching

The final stage of a face recognition system is to decide whether the features of a new face sample match a face from the facial database. These template-based classifications can be done with various statistical approaches and usually take just seconds.

  1. Euclidean Distance: A distance-based matching method that computes the distance between facial nodal points; the database face with the minimum distance to the query is considered the match (see the sketch after this list). It is best suited to datasets with a small number of classes and low-dimensional features.
  2. Cosine Similarity: In cosine similarity, we compare the cosines of the angles between feature vectors; the closer the value is to 1, the greater the probability of a match. But it may give a false result if the test data’s features are incomplete.
  3. SVM (Support Vector Machine): SVM fits an optimal hyperplane that separates the classes of the training dataset based on the facial features. The dimensionality of the hyperplane is one less than the number of features. Different kernels can be tried to see which features the classifier relies on, and unhelpful features can then be removed, which can improve speed.
  4. KNN (K-Nearest Neighbors): KNN is all about the number of neighbors, i.e., the k value. With k = 3, we find the 3 enrolled samples closest to the query and predict the class held by the majority of them. KNN suffers from the curse of dimensionality, which can be mitigated by applying PCA before the KNN classifier.
  5. ANN (Artificial Neural Network): ANN-based systems use a more elaborate pipeline for face recognition: a multi-layer perceptron classifies local texture for face alignment, geometric features and independent component analysis are used for feature extraction, and multiple artificial neural networks perform the feature matching.
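Here is a minimal sketch of the distance-based matchers above, assuming 128-dimensional embeddings from the feature extraction stage; the enrolled names, the random stand-in vectors, and the 0.6 threshold are illustrative assumptions:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Stand-in 128-d embeddings for two enrolled people; real embeddings would
# come from the feature extraction stage above.
gallery = {"alice": rng.normal(size=128), "bob": rng.normal(size=128)}

def euclidean_match(query, gallery, threshold=0.6):
    """Nearest identity by L2 distance; None if nothing is close enough.

    0.6 is a commonly quoted cut-off for dlib's 128-d encodings — an
    assumption here; tune it for your own embedding model.
    """
    name = min(gallery, key=lambda n: np.linalg.norm(query - gallery[n]))
    return name if np.linalg.norm(query - gallery[name]) <= threshold else None

def cosine_similarity(a, b):
    """The closer the value is to 1, the more likely a match."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# KNN view: with several enrolled embeddings per person you could use k=3;
# here there is only one sample per class, so k must be 1.
X = np.stack(list(gallery.values()))
y = list(gallery.keys())
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.predict(rng.normal(size=(1, 128))))
```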

Happy learning.

REFERENCES:

[1] https://www.bayometric.com/identification-verification-segmented-identification/

[2] https://www.learnopencv.com/face-detection-opencv-dlib-and-deep-learning-c-python/

[3] https://maelfabien.github.io/tutorials/face-detection/#4-which-one-to-choose-

[4] https://medium.com/clique-org/how-to-create-a-face-recognition-model-using-facenet-keras-fd65c0b092f1

[5] https://machinelearningmastery.com/introduction-to-deep-learning-for-face-recognition/
