Enhancing a Facial Recognition System via a Deep Learning Model as a Similarity Metric

Chikeluba Okorji
Seamfix Engineering
May 27, 2019

Facial recognition (FR) is the task of identifying or verifying an individual from a digital image or a video, and today it is most commonly tackled with deep learning. Quite a number of approaches exist for identifying faces, but here we will focus on improving an FR system by replacing a traditional similarity metric with an intelligent model that learns similarity. We will also evaluate the characteristics of some traditional similarity metrics.

Facial recognition, verification and clustering are all ways in which faces are compared, each for a different purpose. These purposes vary with the application, ranging from attendance checks for employees to forensics in criminal investigation. Compared to other biometric traits (such as fingerprint and iris), FR has evolved considerably over time. FR systems can capture an image without the user's consent and use it for security applications, such as surveillance systems and security checks at airports. In addition, companies that build virtual reality systems, like VironIT and Oculus VR, use FR for face tracking in their applications.

For our pipeline, Google's pre-trained FaceNet model is responsible for generating an embedding vector for each face. FaceNet seemed the most suitable of the models available to the community because of its ability to generate representations of unique facial features, a result of the large dataset it was trained on. A point to note is that our focus is on improving the similarity learning metric, not on FaceNet itself. A significant idea we adopted from FaceNet is the one-shot learning approach. The existing FR model to be improved uses a traditional similarity metric for classification, so a different approach was needed to improve it.
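As a rough sketch of this embedding step (not our exact code), assuming a Keras port of the pre-trained FaceNet model saved locally as facenet_keras.h5 (the file name is an assumption for illustration):

```
# Minimal sketch: generate an embedding for one aligned face image with a
# pre-trained FaceNet model. The model file name is a placeholder.
import numpy as np
from keras.models import load_model

facenet = load_model('facenet_keras.h5')  # hypothetical Keras FaceNet port

def get_embedding(face_pixels):
    """Return an L2-normalised embedding for a single aligned face image."""
    face_pixels = face_pixels.astype('float32')
    # standardise pixel values, as is typical for FaceNet-style preprocessing
    mean, std = face_pixels.mean(), face_pixels.std()
    face_pixels = (face_pixels - mean) / std
    sample = np.expand_dims(face_pixels, axis=0)   # add a batch dimension
    embedding = facenet.predict(sample)[0]
    return embedding / np.linalg.norm(embedding)   # L2-normalise
```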

Introduction

Facial biometrics, such as the distance between the eyes and the positioning of the nose tip, vary across all humans. These unique features are therefore commonly used for identification.

Face verification has received considerable attention from researchers over the past decade. The process involves capturing a face image and comparing it against a previously captured image or against images stored in a database. Face recognition is a demanding field of research in artificial intelligence, with various challenges an intelligent system must overcome to recognize a face: variations in head pose, changes in lighting, facial expression, ageing, occlusion due to accessories, and so on. Considerable research has gone into correcting these effects, with significant progress made.

FaceNet as an embedding generator

The FaceNet model, as proposed by Schroff et al., solves the face verification problem. It takes face images in batches and trains on them using the triplet loss function. A batch contains images arranged as anchor, positive and negative examples. While computing the loss, the function minimises the distance between an anchor and a positive (images of the same identity) and maximises the distance between the anchor and a negative (images of different identities); a minimal sketch of this loss appears after the list below. FaceNet trains a single deep CNN that transforms a face image into an embedding. The embedding can be used to compare faces in three ways:

  1. Face verification considers two faces and decides whether they belong to the same person. This can be done by computing a distance metric between their embeddings.
  2. Face recognition is a classification problem that labels a face with a name. The embedding vector can be used as input for training the final classifier.
  3. Face clustering groups similar faces together, just as photo applications cluster photos of the same person. A clustering algorithm such as k-means can be used to group the embeddings.
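For illustration, here is a minimal sketch of the triplet loss written against the Keras backend. The inputs are assumed to be batches of embedding tensors, and the margin value alpha = 0.2 follows the FaceNet paper:

```
# Minimal triplet loss sketch: pull the anchor towards the positive and push
# it away from the negative by at least a margin alpha.
from keras import backend as K

def triplet_loss(anchor, positive, negative, alpha=0.2):
    pos_dist = K.sum(K.square(anchor - positive), axis=-1)  # ||a - p||^2
    neg_dist = K.sum(K.square(anchor - negative), axis=-1)  # ||a - n||^2
    return K.mean(K.maximum(pos_dist - neg_dist + alpha, 0.0))
```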

Traditional algorithms for similarity learning

Similarity learning, also known as metric learning, is the process of training a mathematical function or metric to measure the degree of relationship between elements. The vector embeddings generated by our FaceNet model can therefore be compared against each other to measure similarity.

Traditional similarity learning metrics fall into a few categories:

  • Distance-based similarity metrics include the Euclidean distance, Manhattan distance, Minkowski distance, and so on. The idea common to these metrics is that they aggregate the element-wise differences between the two vectors under comparison. For two vectors x and y of dimension n:
    Euclidean distance: d(x, y) = √( Σᵢ (xᵢ − yᵢ)² )
    Manhattan distance: d(x, y) = Σᵢ |xᵢ − yᵢ|
    Minkowski distance: d(x, y) = ( Σᵢ |xᵢ − yᵢ|ᵖ )^(1/p)
  • Cardinality-based similarity metrics leverage the union and intersection of the sets being compared. The Jaccard similarity uses cardinality to determine the relation between two sets A and B:
    Jaccard similarity: J(A, B) = |A ∩ B| / |A ∪ B|
  • Orientation-based similarity metrics leverage the angle between two vectors in their vector space. An example is the cosine similarity, which measures the cosine of the angle between two vectors:
    Cosine similarity: cos(θ) = (x · y) / (‖x‖ ‖y‖)
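For reference, here are minimal NumPy sketches of these metrics applied to a pair of embedding vectors x and y (the Jaccard example assumes the inputs have first been quantised into sets):

```
# Minimal NumPy implementations of the similarity metrics listed above,
# applied to two vectors x and y of equal length.
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def minkowski(x, y, p=3):
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def jaccard(a, b):
    # Jaccard works on sets, so embeddings would first be binarised/quantised
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def cosine_similarity(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
```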

A deep learning model as a similarity learning metric

Similarity metric as a Deep Learning Model

The need to improve our model's accuracy led us to research and experiment with better approaches that could make our FR system more robust. First, we experimented with the traditional similarity metrics discussed above. This seemed viable initially, but the results showed little or no improvement over our previous implementation; swapping in other similarity metrics was not enough to scale up the model's accuracy.

Before deciding to experiment with a deep learning model, we first considered statistical machine learning models: the support vector machine (SVM) and logistic regression. In our experiments the SVM classifier proved to be a viable approach when the training data consists of high-dimensional vectors, so it could have served this purpose; however, since the breakthroughs in deep learning, deep models have generally outperformed statistical models in both accuracy and performance. For a clearer picture, the previous FR pipeline and the improved one are represented diagrammatically below.

Previous and Improved FR model pipeline
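For context, here is a minimal scikit-learn sketch of the kind of SVM baseline we considered, trained on the element-wise absolute difference of each embedding pair. The feature construction is illustrative, not our exact pipeline:

```
# SVM baseline sketch: each training example is the element-wise absolute
# difference of an embedding pair, labelled 1 for 'match' and 0 otherwise.
import numpy as np
from sklearn.svm import SVC

def pair_features(emb_a, emb_b):
    return np.abs(emb_a - emb_b)

def train_svm_baseline(X_pairs, y):
    # X_pairs: iterable of (embedding_a, embedding_b); y: match labels
    X = np.array([pair_features(a, b) for a, b in X_pairs])
    clf = SVC(kernel='rbf', probability=True)
    return clf.fit(X, y)
```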

Comparison function

A comparison function is a mathematical function that quantifies a metric between pairs of elements from two or more sets. In choosing a comparison function, it should satisfy the following properties for all x, y, z belonging to the set:

  • Non-negativity: f(x,y) ≥ 0
  • Identity of indiscernibles: f(x,y) = 0 <=> x = y
  • Symmetry: f(x,y) = f(y,x)
  • Triangle Inequality: f(x,z) ≤ f(x,y) + f(y,z)
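As a quick sanity check, these properties can be verified numerically for the Euclidean distance using a few random vectors standing in for embeddings:

```
# Numeric check of the four metric properties for Euclidean distance.
import numpy as np

np.random.seed(0)
x, y, z = np.random.randn(3, 128)
d = lambda a, b: np.linalg.norm(a - b)

assert d(x, y) >= 0                          # non-negativity
assert np.isclose(d(x, x), 0)                # identity of indiscernibles
assert np.isclose(d(x, y), d(y, x))          # symmetry
assert d(x, z) <= d(x, y) + d(y, z) + 1e-9   # triangle inequality
```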

Below are results of experiments carried out on some comparison functions with respect to our implementation. Do note that these results may differ for other projects, depending on the dataset and implementation.

Data Constraint

One of the major constraints in deep learning today is data availability. For training, we needed pairs of images for each example: a large number of similar pairs belonging to one class (the 'match' class) and many dissimilar pairs belonging to another class (the 'not a match' class).

The solution was to augment the available dataset. Data augmentation has proven useful in deep learning problems where data is insufficient, and the available methods range from sophisticated GANs to simple image transformations. To keep things simple, we applied augmentation steps ranging from adding distributed noise to flipping images, as sketched below.
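Since the exact snippet will vary from project to project, here is a minimal sketch assuming face images stored as NumPy uint8 arrays; the noise scale is an illustrative choice:

```
# Simple augmentation sketch: Gaussian noise addition and horizontal flipping.
import numpy as np

def add_gaussian_noise(image, mean=0.0, std=10.0):
    noise = np.random.normal(mean, std, image.shape)
    noisy = image.astype('float32') + noise
    return np.clip(noisy, 0, 255).astype('uint8')

def horizontal_flip(image):
    return np.fliplr(image)

def augment(image):
    """Yield the original image plus simple augmented variants."""
    yield image
    yield add_gaussian_noise(image)
    yield horizontal_flip(image)
```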

Embedding Generator and Storage

For an efficient training pipeline, a decision has to be made about how to handle the generated data: it can either be stored and accessed later, or trained on on-the-go as it is generated.

To store the generated embeddings, the storage format has to be lightweight. The .npz format is a zipped archive of files named after the variables they contain; it saves NumPy arrays in a single, uncompressed file. For on-the-go training, on the other hand, a generator function yields a chosen number of embedding features produced by the chosen comparison function.
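As a rough illustration (array names and the batch size are assumptions, not our exact code), the two options could look like this:

```
# Option 1: persist embedding pairs and labels in an uncompressed .npz archive.
# Option 2: yield batches on the fly for training.
import numpy as np

def save_pairs(path, pairs, labels):
    np.savez(path, pairs=pairs, labels=labels)

def load_pairs(path):
    data = np.load(path)
    return data['pairs'], data['labels']

def pair_generator(pairs, labels, batch_size=32):
    # pairs: array of shape (n, 2, embedding_dim); labels: array of shape (n,)
    while True:  # Keras generators are expected to loop forever
        idx = np.random.permutation(len(labels))
        for start in range(0, len(labels), batch_size):
            batch = idx[start:start + batch_size]
            # element-wise absolute difference as the comparison feature
            features = np.abs(pairs[batch, 0] - pairs[batch, 1])
            yield features, labels[batch]
```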

In building the network architecture of our deep learning model, the input dimension, the logits and the labels were taken into consideration. We implemented a number of dense layers and trained with the Adam optimizer and binary cross-entropy as the loss function.
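A minimal sketch of this kind of network, assuming the 128-dimensional embedding feature as input (the layer sizes here are illustrative, not our exact architecture):

```
# Dense network that scores an embedding-pair feature as match / not a match.
from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_similarity_model(input_dim=128):
    model = Sequential([
        Dense(128, activation='relu', input_dim=input_dim),
        Dropout(0.3),
        Dense(64, activation='relu'),
        Dense(1, activation='sigmoid'),  # probability that the pair is a match
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```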

Finally, the Keras fit_generator method fits the model on the data yielded batch by batch by our embedding generator. The generator runs in parallel with the model for efficiency: it enables real-time data augmentation on images in parallel with the model training handled by the GPU.
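Putting the pieces together, here is a hedged training sketch that reuses the hypothetical pair_generator and build_similarity_model helpers sketched above; train_pairs, train_labels and the epoch count are placeholders:

```
# Train the similarity model on batches yielded by the generator.
model = build_similarity_model(input_dim=128)
train_gen = pair_generator(train_pairs, train_labels, batch_size=32)

model.fit_generator(train_gen,
                    steps_per_epoch=len(train_labels) // 32,
                    epochs=20,
                    use_multiprocessing=True,  # run the generator in parallel
                    workers=4)
```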

Conclusion and future work

This work presented an overview of how we enhanced a facial recognition system by using a deep learning model, in place of a traditional distance metric, as the similarity learning metric. In future work we plan to explore the possibility of using a deep network trained with the triplet loss to generate a scalar score on which to base the decision.

Is this paper sufficient? I would love to hear your opinions, suggestions and comments. Please leave them below. Thanks.

References

Florian Schroff, Dmitry Kalenichenko, James Philbin. FaceNet: A Unified Embedding for Face Recognition and Clustering. 2015.

Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, Lior Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. 2014.

Rajalingappaa Shanmugamani. Deep Learning for Computer Vision: Expert Techniques to Train Advanced Neural Networks Using TensorFlow and Keras. Packt Publishing, 2018.
