Face Recognition using Deep learning

Amit Kumar
Published in Kuzok
4 min read · Sep 12, 2019

Let’s start with the most basic question.

What is deep learning?

Deep learning is a sub-field of Machine Learning, which is itself a significant branch of Artificial Intelligence. It aims to infer high-level abstractions from raw data by using a deep graph with multiple processing layers composed of multiple linear and non-linear transformations. Several deep learning architectures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have been applied to computer vision, speech recognition, natural language processing, and audio recognition to produce state-of-the-art results on various tasks.

Now, coming to face recognition: it is a sequence of steps that starts with face detection, followed by extraction of facial features. Below is a rough pipeline of a complete face recognition API system, where you input an image and get back the recognized face.

face recognition API pipeline

In this implementation, I will be using FaceNet, a face recognition system developed in 2015 by researchers at Google that achieved then state-of-the-art results on a range of face recognition benchmark datasets.

Let’s explore face embeddings a little. So, what exactly is a face embedding? Each face is compactly represented by a 128-dimensional vector (the FaceNet paper notes that it can be quantized to 128 bytes). Now, why 128 and not 256 or 64? According to the [FaceNet] paper, the authors experimented with several embedding dimensions, and the result is shown below.

Source:FaceNet (https://arxiv.org/abs/1503.03832)

The differences in performance reported above are statistically insignificant, so the authors selected 128. One might expect larger embeddings to do at least as well, but it is possible that a higher dimension requires more training to achieve the same accuracy. What kind of values are present in the 128-D embedding array? It contains floating-point values, for example [0.03861399, -0.04976186, …, 0.09530648, -0.05199577].

Let’s check what the numbers actually look like.

Each image above is 8x16 pixels, so it has 128 pixels in total, each holding the value of one embedding dimension. You can somewhat tell the difference between FaceID0 and FaceID1. Below is a projection of the 128-D (high-dimensional) embeddings into 2-D using PCA (Principal Component Analysis), a technique for reducing the dimensionality of a dataset. PCA uses the eigenvalues and eigenvectors of the data's covariance matrix. These eigenvectors have the property that they point along the major directions of variation in the data.
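The PCA step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used to produce the figure: the embeddings are synthetic stand-ins (two random cluster centers plus small noise), and the function name `pca_project` is hypothetical.

```python
import numpy as np


def pca_project(embeddings, n_components=2):
    """Project row vectors onto their top principal components."""
    X = np.asarray(embeddings, dtype=float)
    centered = X - X.mean(axis=0)                 # PCA works on mean-centered data
    cov = np.cov(centered, rowvar=False)          # (d, d) covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]           # directions of largest variance
    return centered @ components


# Synthetic stand-ins for embeddings of two identities (assumption: not real FaceNet output).
rng = np.random.default_rng(0)
center_a = rng.normal(size=128)
center_b = rng.normal(size=128)
face_id0 = center_a + 0.05 * rng.normal(size=(10, 128))
face_id1 = center_b + 0.05 * rng.normal(size=(10, 128))

points_2d = pca_project(np.vstack([face_id0, face_id1]))
print(points_2d.shape)  # (20, 2)
```

Each row of `points_2d` is one face reduced to two coordinates, which is what a 2-D scatter plot of the embeddings would draw.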

PCA

You can see that similar face embeddings lie close to each other and far from the embeddings of different faces. The difference is large here because the plot shows the embeddings of a woman's face and a man's face.
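The idea that "close" embeddings belong to the same face can be made concrete with a distance check. The sketch below compares L2-normalized vectors by Euclidean distance. The 4-D vectors are made-up toy values standing in for 128-D embeddings, and the `threshold` of 1.0 is a placeholder assumption that would need tuning on real data.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_same_person(a, b, threshold=1.0):
    """Return True when two embeddings are closer than the (hypothetical) threshold."""
    return euclidean_distance(l2_normalize(a), l2_normalize(b)) < threshold


# Toy 4-D vectors for illustration only; real embeddings would have 128 values.
anchor = [0.9, 0.1, 0.0, 0.4]
same_face = [0.8, 0.2, 0.1, 0.5]
other_face = [-0.3, 0.9, 0.2, -0.1]

print(is_same_person(anchor, same_face))   # True
print(is_same_person(anchor, other_face))  # False
```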

Let’s also look at a 3-D representation of the same embeddings using t-SNE (t-Distributed Stochastic Neighbor Embedding). Unlike PCA, which is a linear projection, t-SNE is a non-linear, probabilistic method that tries to keep similar points close together in the low-dimensional space.

t-SNE

Having said so much about face embeddings, let’s dive into creating APIs.

An application programming interface (API) is a set of routines, protocols, and tools for building software applications. Basically, an API specifies how software components should interact. A RESTful API is an API that uses HTTP requests to GET, PUT, POST and DELETE data. REST, which stands for REpresentational State Transfer, is an architectural style. It means that when a RESTful API is called, the server transfers to the client a representation of the state of the requested resource.

There are a few key HTTP methods for a REST API request:

  • GET — The most common option, returns some data from the API based on the endpoint you visit and any parameters you provide
  • POST — Creates a new record that gets appended to the database
  • PUT — Looks for a record at the given URI you provide. If it exists, update the existing record. If not, create a new record
  • DELETE — Deletes the record at the given URI
  • PATCH — Update individual fields of a record
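To make the difference between these methods concrete, here is a minimal in-memory sketch in plain Python. It is not Django REST Framework code and is not taken from the project. The class name `InMemoryRecords` and the record fields are hypothetical, and the status codes follow common REST conventions (201 for created, 204 for a successful delete, 404 for a missing record).

```python
class InMemoryRecords:
    """Hypothetical in-memory collection that mimics REST method semantics."""

    def __init__(self):
        self._records = {}

    def get(self, record_id):
        # GET: return the record, or 404 if it does not exist.
        if record_id not in self._records:
            return 404, None
        return 200, dict(self._records[record_id])

    def post(self, data):
        # POST: create a new record under a server-chosen id.
        record_id = max(self._records, default=0) + 1
        self._records[record_id] = dict(data)
        return 201, {"id": record_id, **data}

    def put(self, record_id, data):
        # PUT: replace the record at this id, or create it if missing.
        status = 200 if record_id in self._records else 201
        self._records[record_id] = dict(data)
        return status, dict(data)

    def patch(self, record_id, fields):
        # PATCH: update only the given fields of an existing record.
        if record_id not in self._records:
            return 404, None
        self._records[record_id].update(fields)
        return 200, dict(self._records[record_id])

    def delete(self, record_id):
        # DELETE: remove the record at this id.
        if self._records.pop(record_id, None) is None:
            return 404, None
        return 204, None


records = InMemoryRecords()
created_status, created = records.post({"name": "FaceID0", "label": "unknown"})
patched_status, patched = records.patch(created["id"], {"label": "verified"})
put_status, _ = records.put(2, {"name": "FaceID1", "label": "unknown"})
deleted_status, _ = records.delete(created["id"])
missing_status, _ = records.get(created["id"])
print(created_status, patched_status, put_status, deleted_status, missing_status)
# 201 200 201 204 404
```

Note how PUT replaces the whole record (and creates it when the id is unused), while PATCH touches only the fields it is given and fails on a missing record.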

Why did I select Django REST Framework?

Because creating APIs in Django is so easy! DRF makes serialization very easy!

Here is a small glimpse of the view inheritance in DRF:

Image Source: reddit.com/u/sheldon392

For more details, check out the project on GitHub. Project link: https://github.com/pymit/Rekognition
