DeepLearning series: Face Recognition

Michele Cavaioni
Machine Learning bites
4 min read · Feb 23, 2018

In this write-up, I will describe techniques for building a face recognition system using neural networks.

FACE RECOGNITION:

There are two aspects in the field of face recognition that we should separate:

  • Verification.
  • Recognition.

In terms of “face verification” we are dealing with a 1-to-1 problem where we have:

  • Input: an image together with a claimed name or ID.
  • Output: whether the image is that of the claimed person.

“Face recognition”, instead, is a 1-to-K problem where we have:

  • A database of K persons.
  • Input: a single image.
  • Output: the name (or ID) of the person, if the input image shows any of the K persons.

Implementing a face recognition system means solving a “one-shot learning” problem: we have to recognize a person from just one example image. Let’s see different ways to do that:

One-shot learning problem:

One approach is to train a ConvNet whose softmax output layer has one unit per person in the database.

This doesn’t work well: with only one picture per person there is far too little data to train on, and every time a new person is added we need to change the output layer and re-train the CNN from scratch.

Another approach is to learn a “similarity” function d(img1, img2), which measures the degree of difference between two images.

The value of this function will be small if img1 and img2 are pictures of the same person, and large otherwise. This is a good way of thinking… but how do we implement d(img1, img2)?

Answer: with the Siamese network, of course!

SIAMESE NETWORK

We create a ConvNet that, instead of ending in a softmax layer, outputs a feature vector f(x(1)).

f(x(1)) is the “encoding” of x(1): it represents the picture as a vector of numbers (in this case, 128 of them).

Now we feed a second picture, x(2), to a network with the same parameters and get a vector f(x(2)) that encodes that image.

We define the similarity function as the squared distance between the two encodings:

d(x(1), x(2)) = ‖ f(x(1)) − f(x(2)) ‖²
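As a concrete illustration, here is a minimal NumPy sketch. A made-up linear map stands in for the trained ConvNet (in practice f(x) would be the network’s 128-dimensional output), and d is the squared L2 distance between encodings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the ConvNet: a fixed linear map into a 128-d encoding.
# In a real system, f(x) is computed by the trained network.
W = rng.standard_normal((128, 32 * 32))

def f(image):
    """Encode a flattened 32x32 image into a 128-dimensional vector."""
    v = W @ image.ravel()
    return v / np.linalg.norm(v)  # L2-normalize the encoding

def d(img1, img2):
    """Similarity function: squared L2 distance between the encodings."""
    diff = f(img1) - f(img2)
    return float(diff @ diff)

a = rng.standard_normal((32, 32))
b = rng.standard_normal((32, 32))
print(d(a, a))  # 0.0 — identical images have identical encodings
print(d(a, b))  # > 0 for different images
```

Because both images pass through the same weights W, the two branches form a Siamese pair: the “two networks” are really one network applied twice.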

Ok, we have found a way to define our similarity function, but we have created a new hurdle: how do we get a good encoding?

Read further!

Triplet loss

This is a good objective function for learning the parameters of the neural network so that it produces good encodings for pictures of faces.

What we do is always look at three images at a time.

So now we have an “anchor” image (A), a positive picture (P), which is another picture of the same person as the anchor, and finally a negative picture (N) of a completely different person.

What we want to achieve is:

‖ f(A) − f(P) ‖² + α ≤ ‖ f(A) − f(N) ‖²

(α is a margin that rules out a trivial solution: without it, a network that outputs the same encoding for every image — say, all zeros — would satisfy the inequality with zero on both sides. The margin forces the network to push the negative pair at least α further apart than the positive pair.)

The loss function for a single triplet is then computed as:

L(A, P, N) = max( ‖ f(A) − f(P) ‖² − ‖ f(A) − f(N) ‖² + α , 0 )

and the overall cost function over a training set of m triplets is:

J = Σ L(A(i), P(i), N(i)),  summed over i = 1, …, m.
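Under this standard formulation (squared L2 distances between encodings), the triplet loss and the cost can be sketched in NumPy; the example encodings below are made-up for illustration:

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Loss for one (anchor, positive, negative) triplet of encodings."""
    pos = np.sum((f_a - f_p) ** 2)  # d(A, P)
    neg = np.sum((f_a - f_n) ** 2)  # d(A, N)
    return max(pos - neg + alpha, 0.0)

def cost(triplets, alpha=0.2):
    """Overall cost: sum of the triplet losses over the training set."""
    return sum(triplet_loss(a, p, n, alpha) for a, p, n in triplets)

# A triplet that already satisfies the margin contributes zero loss:
f_a = np.array([1.0, 0.0])
f_p = np.array([1.0, 0.1])   # close to the anchor
f_n = np.array([-1.0, 0.0])  # far from the anchor
print(triplet_loss(f_a, f_p, f_n))  # 0.0
```

Swapping the positive and negative encodings violates the margin, so the same function returns a positive loss that gradient descent can reduce.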

Finally we train the neural network using gradient descent on this cost function.

To get better training results it’s important to choose triplets that are “hard” to train on, i.e. ones where d(A, P) is close to d(A, N).
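A triplet is informative exactly when the margin is still violated; a sketch of that selection test, with made-up 2-d encodings:

```python
import numpy as np

def is_hard(f_a, f_p, f_n, alpha=0.2):
    """A triplet is 'hard' (informative) when d(A, P) + alpha > d(A, N):
    the margin is violated, the loss is non-zero, and gradient descent
    still has work to do on it."""
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    return d_ap + alpha > d_an

anchor   = np.array([0.0, 0.0])
positive = np.array([0.0, 0.1])
easy_neg = np.array([3.0, 0.0])  # already far from the anchor
hard_neg = np.array([0.2, 0.0])  # almost as close as the positive

print(is_hard(anchor, positive, easy_neg))  # False — contributes no gradient
print(is_hard(anchor, positive, hard_neg))  # True  — worth training on
```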

After having trained the system (using multiple pictures per person), we can apply it to the one-shot learning problem, where we may have only a single picture of the person we want to recognize.

There is also another way, besides the “triplet loss”, to learn the parameters of the neural network. It is done by looking at face verification as a binary classification problem.

Binary classification

We create two Siamese networks that share the same parameters. Both output an encoding of their input image, and those encodings are fed into a logistic regression unit. The output will be 1 if the two input images show the same person, and 0 otherwise.
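A minimal sketch of that final unit, assuming the logistic regression takes the element-wise absolute differences of the two encodings as features; the weights and encodings below are made-up for illustration (in practice they would be learned):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def same_person_probability(f_1, f_2, w, b):
    """Logistic regression on the element-wise absolute differences of
    the two encodings: y_hat = sigmoid(sum_k w_k * |f_1[k] - f_2[k]| + b)."""
    features = np.abs(f_1 - f_2)
    return sigmoid(w @ features + b)

# Hypothetical learned parameters for a 4-d encoding:
w = np.array([-2.0, -2.0, -2.0, -2.0])  # big differences push y_hat toward 0
b = 3.0                                 # identical encodings give sigmoid(3) ~ 0.95

f_same  = np.array([0.5, 0.1, 0.3, 0.9])
f_other = np.array([2.0, -1.0, 1.5, -0.5])
print(same_person_probability(f_same, f_same, w, b))   # high  -> same person
print(same_person_probability(f_same, f_other, w, b))  # low   -> different person
```

Training then reduces to ordinary binary classification on pairs of images labeled “same” (1) or “different” (0), instead of sampling triplets.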

This blog is based on Andrew Ng’s lectures at DeepLearning.ai
