Building a real-time face recognition system using a pre-trained FaceNet model

Vinayak Arannil
Nov 8, 2017

Code at: https://github.com/vinayakkailas/face_recognition

Face recognition is becoming a new trend in security and authentication systems. Modern FR systems can even detect whether the person in front of the camera is real (live) while doing face recognition, preventing the system from being spoofed with a picture of a real person. I am sure everyone was amazed when Facebook introduced its auto-tagging feature: it identifies people and tags them whenever you upload a picture. It is so effective that it tags accurately even when a person's face is partially occluded or the picture was taken in the dark. All these successful face recognition systems are the result of recent advances in computer vision, backed by powerful deep learning algorithms. Let us explore one such algorithm and see how we can implement a real-time face recognition system.

Face recognition can be done in two ways. Imagine you are building a face recognition system for an enterprise. One way is to train a neural network model (preferably a ConvNet) to classify faces directly. As you know, for a classifier to be trained well it needs millions of input images, and collecting that many images of employees is not feasible, so this method seldom works. The better approach is to opt for a one-shot learning technique. One-shot learning aims to learn information about object categories from one, or only a few, training images. The model still needs to be trained on millions of images, but they can come from any dataset in the same domain. Let me explain this more clearly: with one-shot learning, you can train a model on any face dataset and then use it on your own data, which may be very small. There are many publicly available face datasets, such as CASIA-WebFace, MS-Celeb-1M and the AT&T face database, the largest of which contain millions of face images. Now the question is: what kind of model is required for building a face recognition system?

One-shot learning can be implemented using a Siamese network. As the name indicates, it is nothing but two identical neural networks with exactly the same weights, taking two distinct inputs. These networks are optimised using a contrastive loss on their outputs: the loss is small when the inputs are similar and large when they differ. Optimised this way, a Siamese network learns to differentiate between its inputs.

Siamese network
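To make this concrete, here is a minimal sketch of the contrastive loss in TensorFlow. The margin of 1.0 is an illustrative choice, not a value prescribed by any particular paper:

```python
import tensorflow as tf

def contrastive_loss(y_true, distance, margin=1.0):
    """Contrastive loss for a Siamese pair.

    y_true: 1 for matching pairs, 0 for non-matching pairs.
    distance: Euclidean distance between the two embeddings.
    """
    y_true = tf.cast(y_true, distance.dtype)
    match_term = y_true * tf.square(distance)        # pull matching pairs together
    mismatch_term = (1.0 - y_true) * tf.square(
        tf.maximum(margin - distance, 0.0))          # push non-matching pairs apart
    return tf.reduce_mean(match_term + mismatch_term)
```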

FaceNet is a one-shot model that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors (from the original paper). To train it, the authors used triplets of roughly aligned matching/non-matching face patches. A triplet is nothing but a collection of one anchor image, one image matching the anchor (the positive) and one image not matching the anchor (the negative). The triplet loss minimises the distance between the anchor and the positive, which share the same identity, and maximises the distance between the anchor and the negative, which has a different identity.

Triplet loss before and after training (taken from the FaceNet paper)
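The triplet loss itself is short to write down. Here is a minimal TensorFlow sketch, with the margin alpha = 0.2 that the FaceNet paper reports using:

```python
import tensorflow as tf

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss over a batch of (anchor, positive, negative) embeddings."""
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)  # ||f(a) - f(p)||^2
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)  # ||f(a) - f(n)||^2
    # Hinge: the positive must be closer to the anchor than the negative, by at least alpha
    return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + alpha, 0.0))
```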

There are many pre-trained FaceNet models available, trained with different deep learning frameworks. One such implementation is https://github.com/davidsandberg/facenet by David Sandberg, built on TensorFlow. He has published the weights at https://drive.google.com/file/d/0B5MzpY9kBtDVZ2RpVDYwWmxoSUk. Let us see how we can use a pre-trained model for our use case. Here are the steps:

  1. Collect the images of all employees.
  2. Align the faces using MTCNN (Multi-task Cascaded Convolutional Neural Networks), dlib or OpenCV. These methods detect and align the faces so that the eyes and bottom lip appear in the same location in every image.
  3. Use the pre-trained FaceNet model to represent (or embed) the faces of all employees on a 128-dimensional unit hypersphere.
  4. Store the embeddings with the corresponding employee names on disk (a rough end-to-end sketch of steps 2–4 follows this list).
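Here is a rough sketch of that enrollment pipeline in Python, assuming the `mtcnn` pip package for detection and a frozen TensorFlow graph downloaded from the repository above. The tensor names (input:0, embeddings:0, phase_train:0) follow the davidsandberg/facenet convention, but the checkpoint file name, the crop-only "alignment" and the one-image-per-employee layout are illustrative assumptions; verify them against your own setup:

```python
import os
import pickle

import numpy as np
from mtcnn import MTCNN            # pip install mtcnn
from PIL import Image
import tensorflow.compat.v1 as tf  # TF1-style API, as used by the repo

tf.disable_eager_execution()

# --- Load the frozen FaceNet graph (file name is illustrative) ---
with tf.gfile.GFile("20170512-110547.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name="")

sess = tf.Session()
images_ph = sess.graph.get_tensor_by_name("input:0")
embeddings_t = sess.graph.get_tensor_by_name("embeddings:0")
phase_train_ph = sess.graph.get_tensor_by_name("phase_train:0")

detector = MTCNN()

def align_face(image_path, size=160):
    """Detect the largest face and crop/resize it -- a simplified stand-in
    for the full MTCNN alignment described in step 2."""
    img = np.asarray(Image.open(image_path).convert("RGB"))
    faces = detector.detect_faces(img)
    if not faces:
        return None
    x, y, w, h = max(faces, key=lambda d: d["box"][2] * d["box"][3])["box"]
    x, y = max(x, 0), max(y, 0)
    crop = Image.fromarray(img[y:y + h, x:x + w]).resize((size, size))
    return np.asarray(crop, dtype=np.float32)

def prewhiten(x):
    # Per-image standardisation, as done in common FaceNet implementations
    return (x - x.mean()) / max(x.std(), 1.0 / np.sqrt(x.size))

def embed(face):
    """Map one aligned face crop to its 128-d embedding (step 3)."""
    emb = sess.run(embeddings_t,
                   feed_dict={images_ph: prewhiten(face)[np.newaxis],
                              phase_train_ph: False})[0]
    return emb / np.linalg.norm(emb)   # point on the 128-d unit hypersphere

def enroll(employee_dir, out_path="embeddings.pkl"):
    """Steps 1-4, assuming one image per employee named <employee>.jpg."""
    db = {}
    for fname in os.listdir(employee_dir):
        face = align_face(os.path.join(employee_dir, fname))
        if face is not None:
            db[os.path.splitext(fname)[0]] = embed(face)
    with open(out_path, "wb") as f:   # step 4: store embeddings on disk
        pickle.dump(db, f)
    return db
```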

Now your face recognition system is ready! Let us see how to recognise faces with what we have built above. You now have a corpus of 128-dimensional embeddings with the corresponding employee names. Whenever an employee faces your camera, the captured image is run through the pre-trained network to create a 128-dimensional embedding, which is then compared to the stored embeddings using Euclidean (L2) distance. If the lowest distance between the captured embedding and any stored embedding is below a threshold value, the system recognises that person as the employee whose stored embedding is closest.
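A minimal sketch of that matching step, reusing the align_face and embed helpers from the enrollment sketch above. The threshold of 1.0 is only an illustrative starting point and should be tuned on your own data:

```python
import numpy as np

def recognize(db, image_path, threshold=1.0):
    """Compare a captured face against the stored embeddings via L2 distance."""
    face = align_face(image_path)
    if face is None:
        return "no face detected"
    query = embed(face)
    # Find the enrolled employee whose embedding is closest to the query
    name, dist = min(((n, np.linalg.norm(query - e)) for n, e in db.items()),
                     key=lambda t: t[1])
    return name if dist < threshold else "unknown"
```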

That's all! You have built a simple but efficient face recognition system. Have a nice day!

Note: If you have thousands of classes with a good number of images per class, it is better to train a classifier on top of the features extracted by the pre-trained FaceNet model than to compare embedding distances.
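For example, a linear SVM trained on the embeddings, in the spirit of the classifier recipe linked in reference 4 below (the file names and hyperparameters here are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Assume an (n_samples, 128) array of FaceNet embeddings computed as in the
# sketches above, with matching employee labels (file names are illustrative).
X = np.load("train_embeddings.npy")
y = np.load("train_labels.npy")

clf = SVC(kernel="linear", probability=True)
clf.fit(X, y)

# Classify new faces by their embeddings instead of comparing distances
predictions = clf.predict(np.load("new_embeddings.npy"))
```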

References:

  1. DeepFace paper: https://www.cs.toronto.edu/~ranzato/publications/taigman_cvpr14.pdf
  2. FaceNet paper: https://arxiv.org/pdf/1503.03832.pdf
  3. Pre-trained FaceNet model: https://github.com/davidsandberg/facenet
  4. Training a classifier on extracted features: https://github.com/davidsandberg/facenet/wiki/Train-a-classifier-on-own-images
