AISaturdays Ogbomosho Week 13 - Face Recognition
It's been a long break! It feels good to be back! Exams at the school took about a month, so we paused our classes temporarily and hit the ground running afterwards. To round off our final weeks, we are taking consecutive classes to catch up.
This is a review of what we learnt today, 27th September 2018, using Andrew Ng's Deep Learning Specialization course. To accelerate our learning, we watch the videos during the week and walk through the code while solving the programming assignments at weekends.
Today's task was learning how to build a face recognition system. Face recognition problems commonly fall into two categories:
- Face Verification: "Is this the claimed person?" An example is a mobile phone that unlocks using your face. This is a 1:1 matching problem.
- Face Recognition: "Who is this person?" An example is company employees entering the office without needing to otherwise identify themselves. This is a 1:K (one-against-all) matching problem.
Naive Face Verification
The simplest way to do face verification is to compare the two images pixel by pixel. If the distance between the raw images is less than a chosen threshold, it may be the same person!
This method is unlikely to produce promising results, since raw pixel values change dramatically with variations in lighting, the orientation of the person's face, even minor changes in head position, and so on. Instead, comparing encodings of the raw images gives a far more accurate judgement of whether two pictures are of the same person.
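To see why the naive approach is fragile, here is a minimal sketch (the function name, the mean-absolute-difference metric, and the threshold are our illustrative choices, not part of the assignment). Two images of the same face that differ only in brightness already look far apart pixel-wise:

```python
import numpy as np

def naive_verify(img_a, img_b, threshold=20.0):
    """Naively compare two raw images pixel by pixel.

    img_a, img_b: arrays of identical shape, e.g. (96, 96, 3).
    Returns (is_match, distance), where is_match is True when the
    mean absolute pixel difference falls below the chosen threshold.
    """
    dist = np.mean(np.abs(img_a.astype(float) - img_b.astype(float)))
    return dist < threshold, dist

# The "same person" photographed under brighter lighting fails the check:
face = np.full((96, 96, 3), 100.0)
same_face_brighter = face + 50.0
same, dist = naive_verify(face, same_face_brighter)
```

Here `dist` comes out at 50.0, well above the threshold, so the naive check wrongly rejects the match.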
Encoding face images into a 128-dimensional vector
We use a pre-trained model called FaceNet: the network takes 96x96 RGB images as input and outputs a matrix of shape (m, 128), encoding each of the m input face images as a 128-dimensional vector.
An encoding is a good one if:
- the distance between the encodings of two images of the same person is small (below a chosen threshold)
- the distance between the encodings of two images of different persons is large (above that threshold)
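The two criteria above reduce to a single threshold test on the distance between encodings. A minimal sketch, assuming L2 distance and a threshold of 0.7 (the function name and threshold are our assumptions):

```python
import numpy as np

def is_same_person(enc_a, enc_b, threshold=0.7):
    """Treat two 128-d encodings closer than the threshold as the same person."""
    return np.linalg.norm(enc_a - enc_b) < threshold
```

A well-trained encoder makes this simple test work: encodings of the same face cluster tightly, while different faces land far apart.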
The Triplet Loss
The idea behind the triplet loss function is to minimize the distance between the encodings of two images of the same person (an anchor and a positive example) while maximizing the distance between the encodings of two images of different persons (the anchor and a negative example).
Applying the model
Problem statement: Using images of friends in the neighborhood, we are to build a face verification system that only lets people on a specified list come in. To be admitted, each person has to swipe an ID card (identification card) to identify themselves at the door. First,
- build a database containing one encoding vector for each person allowed to enter the Happy House.
- verify whether the person at the door is who they claim to be before opening the door.
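The two steps above can be sketched as follows. In the assignment, each encoding comes from the pre-trained FaceNet model; here we stub the database entries with fixed random vectors, and the threshold 0.7 is the one used in the course exercise (names and structure are otherwise our assumptions):

```python
import numpy as np

# Hypothetical database: one 128-d encoding per authorized person.
# In the real system these would be FaceNet encodings of each face image.
database = {
    "younes": np.random.RandomState(0).randn(128),
    "kian":   np.random.RandomState(1).randn(128),
}

def verify(image_encoding, identity, database, threshold=0.7):
    """Face verification (1:1): check the claimed identity only.

    Returns (is_match, distance) between the camera image's encoding
    and the stored encoding for the identity on the swiped ID card.
    """
    dist = np.linalg.norm(image_encoding - database[identity])
    return dist < threshold, dist
```

Because verification compares against a single stored encoding, it stays fast no matter how many people are in the database.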
Test the algorithm on pictures taken from the camera: for example, an impersonator who stole Younes's ID card and tries to get in.
Face Recognition
In the case where an ID card gets lost or stolen, an authorized person would be denied entry by the face verification system. To prevent such cases, we'd like to change our face verification system into a face recognition system. This way, no one has to carry an ID card anymore.
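Recognition replaces the single lookup with a scan over every stored encoding, returning the closest match if it is near enough. A minimal sketch under the same assumptions as before (stubbed encodings, L2 distance, threshold 0.7):

```python
import numpy as np

def who_is_it(image_encoding, database, threshold=0.7):
    """Face recognition (1:K): find the closest encoding in the database.

    Returns (name, distance), or (None, distance) when even the best
    match is too far away to be anyone in the database.
    """
    best_name, best_dist = None, float("inf")
    for name, enc in database.items():
        dist = np.linalg.norm(image_encoding - enc)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist  # stranger: not in the database
    return best_name, best_dist

# Hypothetical database of known encodings (stand-ins for FaceNet outputs).
database = {
    "younes": np.random.RandomState(0).randn(128),
    "kian":   np.random.RandomState(1).randn(128),
}
```

This is why recognition is the harder problem: the system must distinguish among all K people at once, and a single loose threshold can now confuse any two of them.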
Lessons learnt:
- Face verification solves an easier 1:1 matching problem; face recognition addresses a harder 1:K matching problem.
- The triplet loss is an effective loss function for training a neural network to learn an encoding of a face image.
- The same encoding can be used for verification and recognition: measuring the distance between two images' encodings tells you whether they are pictures of the same person.
AISaturdayOgbomosho wouldn't have happened without fellow ambassadors Temiloluwa Ruth Afape, Adegoke Toluwani, Mhiz Adeola and our loving Partner Intel.
Thanks to our Ambassador Daniel Ajisafe for the write up.
A big thanks to Nurture.AI for this amazing opportunity and to Andrew Ng for his amazing course.
Follow us on Twitter
More resources