Face Verify — Part 1

Tanmay Thakur · Published in DataSeries · May 20, 2020

Facial recognition is a term we hear thrown around about 10 times a day in a technology-dominated era, but did you know it’s actually split into three subtasks?

Facial detection, recognition, and verification are actually three separate tasks contributing to the same goal: using your facial signature as a virtual key to anything you'd rather not secure with a password.

Facial detection involves determining whether a face is present in an image, and nothing more. It is a fairly cheap computational process, requiring only cascade filters such as the Haar cascade face detector. Look here for a detailed explanation: http://www.willberger.org/cascade-haar-explained/.
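As a quick illustration of how cheap detection alone can be, here is a minimal sketch of Haar-cascade face detection using OpenCV in Python; the image filename is a placeholder, and the cascade XML ships with the opencv-python package.

```python
import cv2

# Load the frontal-face Haar cascade bundled with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

image = cv2.imread('photo.jpg')  # placeholder filename
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Returns one (x, y, w, h) rectangle per detected face.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(f'Found {len(faces)} face(s)')
```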

Facial recognition and verification may sound similar, but unlike facial recognition, which performs a 1:n match against a database of known faces, facial verification is a 1:1 match. The user authenticates using their face as the credential securing access to their online account. To authenticate, the user simply takes a selfie (which is converted into a 3D face map) that is then compared, one-to-one, with the stored biometric template. A proper match, based on an accuracy score, completes the secure authentication process in the background.

In this tutorial I will endeavour to explain the algorithms used by my friend SphericalKat in his app https://github.com/ATechnoHazard/faceverify-lib/ , which aims to provide a simple API for cropping faces from a bitmap and comparing two faces to verify that they're from the same person. He has done so using the TFLite models of MTCNN and MobileFaceNet, which we'll delve deeper into as we move forward.

FaceNet is a face recognition system developed in 2015 by researchers at Google that achieved then state-of-the-art results on a range of face recognition benchmark datasets. The FaceNet system can be used broadly thanks to multiple third-party open source implementations of the model and the availability of pre-trained models.

The FaceNet system can be used to extract high-quality features from faces, called face embeddings, that can then be used to train a face identification system.

A demo from the app, the test subject being SphericalKat

FaceNet is a face recognition system that was described by Florian Schroff et al. at Google in their 2015 paper titled “FaceNet: A Unified Embedding for Face Recognition and Clustering.”

It is a system that, given a picture of a face, will extract high-quality features from the face and predict a 128-element vector representation of these features, called a face embedding.

The model is a deep convolutional neural network trained via a triplet loss function that encourages vectors for the same identity to become more similar (smaller distance), whereas vectors for different identities are expected to become less similar (larger distance). The focus on training a model to create embeddings directly (rather than extracting them from an intermediate layer of a model) was an important innovation in this work.

Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches.

- FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.
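To make the triplet loss concrete, here is a minimal sketch in plain NumPy; the margin value of 0.2 matches the paper's setting, but the function itself is illustrative rather than the exact training code.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # anchor and positive are embeddings of the same identity;
    # negative is an embedding of a different identity.
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    # The loss is zero once the negative is at least `margin`
    # farther from the anchor than the positive is.
    return max(0.0, pos_dist - neg_dist + margin)
```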

These face embeddings were then used as the basis for training classifier systems on standard face recognition benchmark datasets, achieving then-state-of-the-art results.

The paper also explores other uses of the embeddings, such as clustering to group similar faces based on their extracted features.

It is a robust and effective face recognition system, and the general nature of the extracted face embeddings lends the approach to a range of applications.

Before we can perform face recognition, we need to detect faces.

We will use the Multi-Task Cascaded Convolutional Neural Network, or MTCNN, for face detection, e.g. finding and extracting faces from photos. This is a state-of-the-art deep learning model for face detection, described in the 2016 paper titled “Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks.”

Face detection and alignment in unconstrained environments is challenging due to varying poses, illumination, and occlusion. Recent studies show that deep learning approaches can achieve impressive performance on both tasks. In this paper, the authors proposed a deep cascaded multi-task framework that exploits the inherent correlation between detection and alignment to boost the performance of both. In particular, the framework leverages a cascaded architecture with three stages of carefully designed deep convolutional networks to predict face and landmark locations in a coarse-to-fine manner. They also proposed an online hard sample mining strategy that further improves performance in practice. The method achieves superior accuracy over state-of-the-art techniques on the challenging FDDB and WIDER FACE benchmarks for face detection and the AFLW benchmark for face alignment, while maintaining real-time performance.

We can use the mtcnn library to create a face detector and extract faces for use with the FaceNet face recognition model in subsequent sections.

The first step is to load an image. We will also convert the image to RGB, just in case the image has an alpha channel or is black and white.
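A minimal sketch of this step with Pillow; the filename is a placeholder.

```python
from PIL import Image
import numpy as np

# Load the photo and force an RGB representation, in case the
# file has an alpha channel or is greyscale.
image = Image.open('photo.jpg').convert('RGB')  # placeholder filename
pixels = np.asarray(image)
```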

Next, we can create an MTCNN face detector class and use it to detect all faces in the loaded photograph.
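With the pixel array in hand, detection takes two lines with the mtcnn package (using its default weights):

```python
from mtcnn.mtcnn import MTCNN

detector = MTCNN()
# Each result is a dict with 'box', 'confidence' and 'keypoints'.
results = detector.detect_faces(pixels)
```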

The result is a list of bounding boxes, where each bounding box gives the x and y pixel coordinates of its top-left corner, along with its width and height.

If we assume there is only one face in the photo for our experiments, we can determine the pixel coordinates of the bounding box. Sometimes the library will return a negative pixel index, and I think this is a bug. We can fix this by taking the absolute value of the coordinates.
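Something like the following, continuing from the detection result above:

```python
# Assume a single face: take the first detection.
x1, y1, width, height = results[0]['box']
# Clamp the occasional negative coordinates returned by the library.
x1, y1 = abs(x1), abs(y1)
x2, y2 = x1 + width, y1 + height
# Crop the face out of the pixel array (rows are y, columns are x).
face = pixels[y1:y2, x1:x2]
```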

We can then use standard image-processing functions to resize this small image of the face to the required size; specifically, the model expects square input faces.
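For example, with Pillow; 160x160 is the input size used by common FaceNet implementations, while MobileFaceNet variants typically expect 112x112, so check the model you actually deploy:

```python
# Resize the cropped face to the model's square input size.
face_image = Image.fromarray(face).resize((160, 160))
face_array = np.asarray(face_image)
```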

We can wrap these steps into a single helper function, sketched below, and use it to extract faces as needed in the next section as input to the FaceNet model.
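Putting the steps together, such a helper might look like this (the default size and the single-face assumption are illustrative):

```python
from PIL import Image
from mtcnn.mtcnn import MTCNN
import numpy as np

def extract_face(filename, required_size=(160, 160)):
    """Load an image, detect the first face, and return it
    resized to the model's expected square input size."""
    image = Image.open(filename).convert('RGB')
    pixels = np.asarray(image)
    results = MTCNN().detect_faces(pixels)
    x1, y1, width, height = results[0]['box']
    x1, y1 = abs(x1), abs(y1)
    face = pixels[y1:y1 + height, x1:x1 + width]
    face_image = Image.fromarray(face).resize(required_size)
    return np.asarray(face_image)
```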

Here we use the mobile version of FaceNet, MobileFaceNets, which use fewer than 1 million parameters and are specifically tailored for high-accuracy, real-time face verification on mobile and embedded devices. The authors first analysed the weaknesses of common mobile networks for face verification, then designed MobileFaceNets to overcome them. Under the same experimental conditions, MobileFaceNets achieve significantly higher accuracy as well as more than a 2x actual speedup over MobileNetV2. After training with ArcFace loss on the refined MS-Celeb-1M dataset, a single 4.0 MB MobileFaceNet achieves 99.55% accuracy on LFW and 92.59% TAR@FAR=1e-6 on MegaFace, which is comparable to state-of-the-art CNN models hundreds of megabytes in size. The fastest of the MobileFaceNets has an actual inference time of 18 milliseconds on a mobile phone. For face verification, MobileFaceNets achieve significantly improved efficiency over previous state-of-the-art mobile CNNs.

We will now create a face embedding.

A face embedding is a vector that represents the features extracted from the face. This can then be compared with the vectors generated for other faces. For example, another vector that is close (by some measure) may be the same person, whereas another vector that is far (by some measure) may be a different person. The FaceNet model will generate this embedding for a given image of a face.
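As a sketch of how an embedding is generated, assume `model` is a pre-trained FaceNet-style network already loaded (for example via Keras); the standardisation step mirrors common FaceNet preprocessing, and `extract_face` is the helper from the previous section:

```python
import numpy as np

# `model` is assumed to be a pre-trained FaceNet-style network
# mapping a face image to a 128-element embedding.
face = extract_face('photo.jpg')                    # placeholder filename
sample = np.expand_dims(face.astype('float32'), 0)  # add a batch dimension
sample = (sample - sample.mean()) / sample.std()    # standardise pixel values
embedding = model.predict(sample)[0]                # shape: (128,)
```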

The FaceNet model pre-processes a face to create an embedding that could be stored and authenticated against later. In our implementation, however, we save memory by not storing embeddings at all: we generate them afresh for each image and compare them directly.
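A minimal comparison sketch; the threshold is illustrative and in practice is tuned on a validation set for the specific model:

```python
import numpy as np

def is_same_person(embedding_a, embedding_b, threshold=1.0):
    # Euclidean distance between the two embeddings; smaller
    # means the faces are more likely the same identity.
    distance = np.linalg.norm(embedding_a - embedding_b)
    return distance < threshold
```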

With that, we are done with the theoretical stage. My friend and co-worker at Smoketrees (https://smoketrees.dev/), SphericalKat, will take over for Part 2, where you'll learn how to use the TFLite versions of these models in an Android application for face verification.
