How Facial Analytics is empowering Urban Company

By — Aditya Jaiswal, Resham Wadhwa (Data Science, Urban Company)

UC Blogger · Urban Company – Engineering · 8 min read · Sep 20, 2021


Isn’t it amazing how Artificial Intelligence can be used to automate tasks that would be very difficult to perform manually? Just imagine Tony Stark without F.R.I.D.A.Y.

Trust and Safety are the main pillars of UC’s business: we ensure that a smooth and safe service is delivered to our customers. In this post, we will see how UC built an AI pipeline that performs face detection, verification and identification, and look at the different processes at UC that are empowered by this technology.

Processes fuelled by Facial Analytics

Face Detection

Face detection is used to find and localise all the faces present in an image, i.e., it identifies which parts of an image contain faces. We use this technique for the following use-cases:

  1. Registered image validity: While onboarding or updating the registered image of a service professional, face detection ensures that only one face is present in the image.
  2. Headcount for group jobs: Helps in getting a daily headcount of service professionals on jobs where a minimum number of professionals is required, ensuring smooth and speedy service.

Face Verification

Face verification is used to detect faces present in an image and match them against a benchmark image of a person.

  1. Identity verification before every job: Before starting a new job, every service professional clicks a selfie, uploads it and gets verified through our Face Verification pipeline.

Image 1: Face Verification system

Face Identification

Face Identification is the process of determining the identity of a given face. In layman’s terms, it finds out who the person in the image is.

· Group Attendance: For group jobs, a group photo is uploaded in which all the faces are detected and matched against a pool of service professional / helper images present in our database. This helps identify each and every service professional / helper delivering the group job.

· Stopping onboarding of blacklisted service professionals: While onboarding a service professional, their images are matched against the images of blocked service professionals so that no blocked professional is onboarded again.

Image 2: Face Identification System

Face — Detection, Verification & Identification Pipeline

Image 3: Face — Detection, Verification & Identification Pipeline

Image Quality Check

The images uploaded by the service professionals are sometimes of poor quality. We expect the image to be well lit and clear, but we often come across noisy images that need to be filtered out before being processed through the pipeline.

This can happen for the following reasons:

  1. Bad Lighting: The selfie taken by the service professional is in low-light conditions. We use the mean pixel value of the image to decide whether the image is dark or bright. If the value is below the threshold (tuned on our dataset), we call it a dark image and ask the service professional to upload the selfie in a well-lit area. A code sketch of the brightness and blur checks follows this list.

Image 4: Response for dark image

2. Blurry Image: Sometimes the phone is not stable while taking the selfie, or the camera lens is dirty, which makes the image blurry. The variance of the Laplacian of the image is used to measure its sharpness. If this score is below the threshold (tuned on our dataset), we call it a blurry image and ask the service professional to upload a clear selfie.

Image 5: Response for blurry image

3. Mask on Mouth: If a mask is covering the mouth, it is very difficult to identify the person in the image; any obstruction to facial features might lead to incorrect recognition. Hence, it is important that the mask be brought down to chin level or removed to get accurate results. To detect the mask, we use our in-house mask classifier, which checks whether a mask is present and obtains its position on the face. If the mask is present and blocking the face, we ask the service professional to re-upload a selfie with the mask brought down to chin level.

Image 6: Response for image with mask on mouth
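Below is a minimal sketch of the brightness and blur checks using OpenCV. The function name and threshold values are illustrative placeholders, not the actual values tuned on our dataset, and the in-house mask classifier is not shown.

```python
import cv2

# Illustrative thresholds; the real values are tuned on UC's dataset.
DARK_THRESHOLD = 60.0   # mean pixel intensity below this => dark image
BLUR_THRESHOLD = 100.0  # variance of Laplacian below this => blurry image

def quality_check(image_path: str) -> str:
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # 1. Bad lighting: mean pixel value of the grayscale image
    if gray.mean() < DARK_THRESHOLD:
        return "dark_image"      # ask for a selfie in a well-lit area

    # 2. Blurry image: variance of the Laplacian as a sharpness score
    if cv2.Laplacian(gray, cv2.CV_64F).var() < BLUR_THRESHOLD:
        return "blurry_image"    # ask for a clear selfie

    # 3. Mask on mouth: handled by an in-house classifier (not shown here)
    return "ok"
```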

Face Detection

To detect the face in the selfie uploaded by the service professional, we use the Multi-Task Cascaded Convolutional Neural Network or MTCNN. The model consists of three stages or three convolutional neural networks (CNN): P-Net (Proposal Network), R-Net (Refine Network) and O-Net (Output Network).

Image 7: MTCNN pipeline (Source: MTCNN paper)

The first step in the MTCNN is to create an image pyramid so that faces of different sizes can be detected. The network creates multiple scaled copies of the same image and searches each copy for faces.

Image 8: Detection of faces of different size

The image is first passed through the P-Net, which proposes candidate bounding boxes along with a confidence level for each box containing a face. These candidate boxes are then fed to a second CNN, the R-Net, which reduces the number of false candidates by rejecting boxes with low confidence and predicts more accurate bounding boxes. The refined bounding boxes are fed to the O-Net, the final stage of face detection; at this point, only one bounding box is left for each face in the image. The output of the O-Net includes the coordinates of the bounding box, the coordinates of the facial landmarks (eyes, nose & mouth) and the confidence level for every detected face in the image.
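For illustration, the open-source mtcnn Python package implements this three-stage pipeline end to end (our production service may differ in details); the file name below is a placeholder:

```python
import cv2
from mtcnn import MTCNN  # open-source MTCNN implementation (pip install mtcnn)

detector = MTCNN()

# MTCNN expects an RGB image; OpenCV loads images as BGR
image = cv2.cvtColor(cv2.imread("selfie.jpg"), cv2.COLOR_BGR2RGB)
faces = detector.detect_faces(image)

for face in faces:
    print(face["box"])         # bounding box: [x, y, width, height]
    print(face["confidence"])  # confidence level for this face
    print(face["keypoints"])   # landmarks: eyes, nose, mouth corners

# For registered-image validity, we would additionally require len(faces) == 1.
```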

Feature Extraction

After we localise the face in the image, we extract key features from it. A feature is a measurable part of the image that is unique to the face (or to any object), and a good feature helps distinguish objects from one another. For example, if you are given an image of a leg and asked to guess whether it belongs to an animal or an aeroplane, your immediate response would be animal, because a leg is a strong feature for distinguishing animals from aeroplanes. But if you had to guess which animal it is, you might ask for extra features, as a leg alone is not a strong feature here. For the face, the key features are the eyes, nose, mouth and their positions relative to each other. These features are encoded in a vector (an embedding) which is used to represent the face. In this manner, we obtain a unique identifier in the form of an embedding for each service professional.

Image 9: Feature Extraction

To generate face embeddings, we utilise the face_encodings method of the face_recognition API created by Adam Geitgey. It generates embeddings using a ResNet network with 29 convolutional layers, essentially a version of the ResNet-34 network with a few layers removed and the number of filters per layer reduced by half. The underlying approach for generating these face embeddings was invented by researchers at Google in 2015.

The benefit of this method is representational efficiency: it achieves face-recognition performance using only a 128-dimensional vector per face, instead of working with images of varying sizes as uploaded by the service professional.
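A minimal sketch of this step with the face_recognition library (the file name is a placeholder) looks like this:

```python
import face_recognition

# Load the selfie; in our pipeline the face location comes from MTCNN,
# but face_recognition can also locate the face itself if none is supplied.
image = face_recognition.load_image_file("selfie.jpg")

# One 128-dimensional embedding per detected face
encodings = face_recognition.face_encodings(image)
if encodings:
    embedding = encodings[0]  # numpy array of shape (128,)
```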

Face Verification

Comparison with benchmark images

After the selfie has cleared the quality checks, it is passed through our face verification pipeline. The selfie is first passed to the MTCNN model for face detection. If a face is located, we use the face_recognition API to generate the embeddings for the uploaded selfie. Now comes the interesting part: we compare the generated embeddings with the embeddings of the same person already present in our database. We use the 5 most recent images of each service professional to generate embeddings for the comparison; these are called the benchmark embeddings. Finally, we use the cosine similarity metric to calculate a similarity score between the uploaded selfie and each benchmark image.

Match or Mismatch?

We now have 5 scores, one for each pair. After thorough evaluation, we decided to use the mean of these five scores as our final score. If the score is greater than the threshold (tuned on our dataset), we classify it as a match; otherwise, it is a mismatch.

Image 10: Comparison with benchmark images
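A sketch of this scoring logic (the threshold value below is illustrative, not the one tuned on our dataset) might look like:

```python
import numpy as np

MATCH_THRESHOLD = 0.92  # illustrative placeholder

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(selfie_embedding: np.ndarray, benchmark_embeddings: list) -> bool:
    # One similarity score per benchmark image (5 recent images per professional)
    scores = [cosine_similarity(selfie_embedding, b) for b in benchmark_embeddings]
    # Aggregate the five scores using their mean and apply the threshold
    return float(np.mean(scores)) >= MATCH_THRESHOLD
```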

Face Identification

After an image has passed the quality checks, face detection is carried out, followed by the generation of face embeddings. But unlike face verification, the embedding generated for every face detected in the image is compared with a pool of registered face embeddings. Since the number of registered embeddings would explode with scale, we narrow down the eligible embedding search pool by applying some hard filters like gender, state, category, etc. Cosine similarity is used to calculate the similarity score between the faces detected in the uploaded image and the registered images.

Once we have the similarity scores between the uploaded image and the registered images, we apply the threshold to decide on a face match. If the best similarity score for a face is greater than the threshold, we return the matched id with the maximum score; if no such id is found for the given face, it is marked as unidentified. A sketch of this matching logic follows the list below. The final output is:

· Total face count in the image

· Count and details of identified faces

· Number of unidentified faces and their position in the image
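A simplified sketch of the identification loop (the threshold is an illustrative placeholder, and the position of unidentified faces, which comes from their bounding boxes, is omitted) could look like this:

```python
import numpy as np

ID_THRESHOLD = 0.92  # illustrative placeholder; the real value is tuned on our dataset

def identify_faces(face_embeddings, registered_pool):
    # face_embeddings: one embedding per face detected in the group photo
    # registered_pool: dict of professional_id -> registered embedding, already
    #                  narrowed down by hard filters (gender, state, category, ...)
    identified, unidentified = [], 0
    for i, emb in enumerate(face_embeddings):
        scores = {
            pid: float(np.dot(emb, reg) / (np.linalg.norm(emb) * np.linalg.norm(reg)))
            for pid, reg in registered_pool.items()
        }
        best_id, best_score = max(scores.items(), key=lambda kv: kv[1])
        if best_score >= ID_THRESHOLD:
            identified.append({"face_index": i, "id": best_id, "score": best_score})
        else:
            unidentified += 1  # position would be reported from the face's bounding box
    return {
        "total_faces": len(face_embeddings),
        "identified": identified,
        "unidentified_count": unidentified,
    }
```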

Impact

Utilising the face-detection power of MTCNN and the representational efficiency of the face_recognition API, we are able to build a face-verification system with 99.4% confidence that the person delivering the service is the registered service professional, with the majority of mismatches being due to poor-quality photos. We are also able to ensure speedy, quality outputs on jobs requiring more than one service professional.

This aligns with the core theme of UC: building trust with every job delivered while maintaining a high level of quality and efficiency.

About the authors

Aditya Jaiswal

A data analyst by profession, a cricketer by passion. Aditya works in the data science team and has trained multiple ML models to help ensure safety compliance. He is also a big Marvel fan.

Resham Wadhwa

Resham currently leads multiple data science projects at Urban Company. She is a bibliophile, hobbyist photographer, data nerd and a travel enthusiast with a knack for yoga.

Sounds like fun?
If you enjoyed this blog post, please clap 👏 (as many times as you like) and follow us (@UC Blogger). Help us build a community by sharing on your favourite social networks (Twitter, LinkedIn, Facebook, etc.).

You can read up more about us on our publications —
https://medium.com/uc-design
https://medium.com/uc-engineering
https://medium.com/uc-culture

If you are interested in finding out about opportunities, visit us at http://careers.urbancompany.com
