Video KYC — Part III

Elina Maliarsky
10 min read · Nov 17, 2022


Face recognition

This article is the third part of the “Video KYC” series. Part I demonstrated how to access a webcam and capture the video stream from it. Part II covered face detection — the process of examining a photo or a video in order to distinguish faces from any other objects in the background. Part III is dedicated to face recognition: computer technology that can identify people. We will explore some deep learning approaches that can be easily integrated into video KYC projects to achieve state-of-the-art face recognition results.

Face recognition with dlib

A high-quality face recognition algorithm built with deep learning is offered by the Dlib library, introduced in Part II. The implementation achieves state-of-the-art accuracy (99.38%) on the Labeled Faces in the Wild benchmark. The network is a version of the ResNet-34 architecture proposed in the paper Deep Residual Learning for Image Recognition (2016), with a few layers removed and the number of filters per layer reduced by half. The trained model can be downloaded from https://github.com/davisking/dlib-models/blob/master/dlib_face_recognition_resnet_model_v1.dat.bz2.

The network was trained from scratch on a dataset of about 3 million faces. This dataset is derived from a number of sources: the FaceScrub dataset, the VGG dataset, and a large number of images scraped from the internet by Davis King, the author of Dlib.

The training was performed using triplets — sets of three images, where two of them correspond to the same person and the third to a different person. The network generated a 128D face descriptor (embedding) for each image, and its weights were adjusted slightly so that the two vectors corresponding to the same person moved closer together while the vector of the other person moved further away.

After repeating this step millions of times for millions of images of thousands of different people, the neural network learned to reliably generate 128D embeddings for each person. Different photos of the same people should give roughly the same measurements. We can use these embeddings to perform face recognition.

Now let's code.

First of all, import all necessary libraries:

import matplotlib.pyplot as plt
import dlib
import cv2
import numpy as np

A face recognition pipeline consists of the following steps: detection, alignment, representation (or encoding), and verification (face recognition itself).

Detection

The following line of code initializes Dlib's HOG + linear SVM face detector. You can read about this method, as well as other methods and packages, in Part II.

detector = dlib.get_frontal_face_detector()

Alignment

Now we have to make sure face recognition works accurately on faces turned in different directions (for example, during video calls people naturally turn their heads toward speakers). We are going to use an algorithm called face landmark estimation; more precisely, its Dlib implementation, based on the 2014 paper by Vahid Kazemi and Josephine Sullivan. The basic idea is to find specific points (called landmarks) that exist on every face — the top of the chin, the outside edge of each eye, the inner edge of each eyebrow, etc. This is done with a trained machine learning model. Once those points are found, the face can be aligned according to them via linear transformations.

Dlib offers a number of landmark models:

  • shape_predictor_5_face_landmarks.dat.bz2, which identifies the corners of the eyes and the bottom of the nose
  • shape_predictor_68_face_landmarks.dat.bz2 and shape_predictor_68_face_landmarks_GTX.dat.bz2, which identify 68 points on the face.

The following line of code creates a so-called pose predictor object based on the 5-face-landmarks model. It locates the landmarks on a detected face; those landmarks are later used to align the face and calculate its embedding.

pose_predictor_5_point = dlib.shape_predictor("data/shape_predictor_5_face_landmarks.dat")

Encoding

The following line of code creates a face-encoder object:

face_encoder = dlib.face_recognition_model_v1("data/dlib_face_recognition_resnet_model_v1.dat")

Now we’ll create a function that takes an image as input and returns its embedding.

Face detection is done with the next line of code:

face_locations = detector(face_image, number_of_times_to_upsample)

Then we generate 5-point landmarks for each located face:

raw_landmarks = [pose_predictor_5_point(face_image, face_location) for face_location in face_locations]

And calculate the face encoding for every detected face using the landmarks:

embeddings = [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters))
              for raw_landmark_set in raw_landmarks]

Let's wrap these three lines of code into a function:

def face_encodings(face_image, number_of_times_to_upsample=1, num_jitters=1):
    # Detect faces:
    face_locations = detector(face_image, number_of_times_to_upsample)
    # Detect landmarks for each located face:
    raw_landmarks = [pose_predictor_5_point(face_image, face_location) for face_location in face_locations]
    # Calculate the face encoding for every detected face using the detected landmarks for each one:
    embeddings = [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters))
                  for raw_landmark_set in raw_landmarks]
    return embeddings

Face recognition

Every call to face_encodings will return a list of (128,) vectors. As we learned, close vectors mean the photos most likely belong to the same person. The function calculate_distances returns the euclidian distances between a list of face encodings and a candidate to check (actually it can be any similarity distance measure). In order to make a conclusion about “the same/ different person”, we just compare those distances to some pre-defined threshold.

def calculate_distances(face_encodings, encoding_to_check):
    return list(np.linalg.norm(face_encodings - encoding_to_check, axis=1))

Let’s see how it works:

The photos below belong to the same actor (Matt Smith): the first as the Eleventh Doctor (Doctor Who) and the second as Prince Daemon Targaryen (House of the Dragon).

Matt Smith as the Eleventh Doctor (Doctor Who)
Matt Smith as Daemon Targaryen (House of the Dragon)

The code below reads the image, calculates its embedding, and displays the image:

def read_and_calculate(path):
    # Convert image from BGR (OpenCV format) to RGB (dlib format):
    known_image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    encodings = face_encodings(known_image)
    plt.imshow(known_image)
    return encodings

Now let's call calculate_distances:

known_encodings = read_and_calculate("data/dw_11_happy.jpg")
unknown_encodings = read_and_calculate("data/DT2.jpg")
result = calculate_distances(known_encodings, unknown_encodings[0])

The result is 0.44. Let's assume that our threshold for "same/different" is 0.6 (the default value in many packages). The calculated distance is smaller than the threshold, so we can conclude that both photos belong to the same person.
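For completeness, here is what that decision looks like in code, as a minimal sketch assuming the 0.6 threshold mentioned above:

THRESHOLD = 0.6
# result holds one distance per face encoding in the known image
same_person = result[0] < THRESHOLD  # True here, since 0.44 < 0.6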

In order to implement a face recognition system, we can build a "faces" database by storing each person's face embeddings. When we receive an unknown photo, we compare the embeddings of the faces detected in it to the embeddings in the database; the records with the closest embeddings are the matches we are looking for.
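A minimal sketch of such a lookup, reusing the functions defined above (the names and image paths here are hypothetical placeholders):

faces_db = {
    "person_a": read_and_calculate("data/person_a.jpg")[0],
    "person_b": read_and_calculate("data/person_b.jpg")[0],
}

def identify(unknown_encoding, db, threshold=0.6):
    names = list(db.keys())
    # Distance from the unknown face to every stored embedding
    distances = calculate_distances(np.array(list(db.values())), unknown_encoding)
    best = int(np.argmin(distances))
    # If even the closest stored face is too far away, treat the person as unknown
    return names[best] if distances[best] < threshold else "unknown"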

Face recognition with face_recognition

In 2017 Adam Geitgey released a face recognition library called face_recognition; it is based on dlib functionality and provides an easy-to-use API.

The installation is done via pip install, but you have to install dlib first. The library comes with all required pre-trained models, so you don't have to download them separately. According to the author, you need macOS or Linux (Windows is not officially supported), but I've used it on Windows 10 and it works perfectly. If you don't want to deal with the installation, you can download a pre-configured virtual machine or Docker image with face_recognition, OpenCV, TensorFlow, and lots of other deep learning tools pre-installed.

Now let's see the code.

Import face_recognition library:

import face_recognition

You can still use OpenCV's imread, but if you don't want to handle color-space conversions, you can use the load_image_file method offered by face_recognition:

known_image = face_recognition.load_image_file("data/dw_10_1.jpg")

The face_encodings method returns the embeddings of all the faces found in the image. Yes, with just this single line of code. You don't have to worry about face detection and face alignment; face_encodings does it for you, and the face landmarks model it uses internally can be overridden via an optional argument.

known_encoding = face_recognition.face_encodings(known_image)
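If you need more control, optional keyword arguments let you, for example, increase the number of re-sampling jitters or switch the landmark model. The argument names below (num_jitters, model) are the ones documented by the library; check them against your installed version:

# Sketch of passing optional arguments; "large" selects the 68-point landmark model
known_encoding = face_recognition.face_encodings(known_image, num_jitters=2, model="large")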

After you get all your face embeddings, you can start to compare faces:

face_recognition.compare_faces(known_encoding, unknown_encoding[0])

Faces are compared by calculating the Euclidean distance between their embeddings; if the distance is below the threshold (0.6 by default), the faces are considered to belong to the same person.
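If you need a stricter match, the threshold can be tightened via the tolerance argument of compare_faces (0.6 is the documented default):

# Lower tolerance means stricter matching
results = face_recognition.compare_faces(known_encoding, unknown_encoding[0], tolerance=0.5)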

If you want to implement your own comparison algorithm, you can call the face_distance method explicitly. Note that the first argument should be a list of encodings (which is exactly what face_encodings returns) and not a single vector.

distances = face_recognition.face_distance(known_encoding, unknown_encoding[0])

Face recognition with deepface

Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion, and race) framework for Python. It is a hybrid face recognition framework wrapping state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, ArcFace, Dlib, and SFace.

Installation is very easy:

pip install deepface

So is importing:

from deepface import DeepFace

And even face comparison:

DeepFace.verify(img1_path = "data/dean_face2.png", img2_path = "data/dean_2.png", model_name='Dlib')

This single line of code reads two images, calculates their embeddings, and compares them using the cosine distance metric (which can be overridden).

The output of this method is a JSON record containing the result of the comparison, the normalized distance between the two vectors, the threshold, the face recognition and face detection model names, and the type of similarity metric used.

{'verified': True, 'distance': 0.04013044376471997, 'threshold': 0.07, 'model': 'Dlib', 'detector_backend': 'opencv', 'similarity_metric': 'cosine'}
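For example, the similarity metric can be overridden through the distance_metric argument; a quick sketch using the Euclidean (L2-normalized) option that DeepFace documents:

# Same comparison as above, but with a different distance metric
verification = DeepFace.verify(img1_path="data/dean_face2.png", img2_path="data/dean_2.png",
                               model_name='Dlib', distance_metric='euclidean_l2')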

Another very useful method is supplied: we can search through the images on the drive to find the most similar one with this single line of code:

#face recognition
df = DeepFace.find(img_path = "img1.jpg", db_path = "C:/workspace/my_db", model_name = 'Dlib')

If you want to implement your own face finder/comparator, you can use the represent method, which just returns embeddings.

#face embeddings
df = DeepFace.represent(img_path = "img1.jpg", model_name = 'Dlib')

As you'll see in the "Live face recognition on the video stream" section, a more "low-level" API is also provided.

What to choose

Face_recognition has a nice, easy-to-use API, and if you want to use the dlib face recognition model, it is preferable over the pure dlib API. If you want access to a variety of state-of-the-art models and aren't afraid of possible bugs (the library is rather new and is still updated from time to time), you'll probably stick with deepface, especially since it offers not only face recognition functionality but also facial attribute analysis. For now, I'll also choose deepface with the Dlib detector and recognizer backend, mostly because I want to keep the code as simple as possible, and (spoilers, again!!) facial attribute analysis is the topic of my next article.

Live face recognition on the video stream

Now let's see how face recognition can be implemented on a live video stream. For this demo, I've created a toy database that stores the name and the face embedding of two people: Rose Tyler and myself. The code for database generation is given below; as you can see, I call the represent method of the deepface framework to get the face embeddings.

import pandas as pd

db = {"id": [1, 2],
      "name": ["Elina Maliarsky", "Rose Tyler"],
      "path": ["data/me_fear.png", "data/rose_1.jpg"]}
df = pd.DataFrame(db)
df["embedding"] = df.apply(lambda r: np.array(DeepFace.represent(r["path"], model_name='Dlib')), axis=1)

Then I create and initialize a FaceDetector object with the dlib HOG + linear SVM detector backend. I do it before the live face recognition loop — building the model can take a while, and that is not something we want during a live session.

from deepface.detectors import FaceDetector

face_detector = FaceDetector.build_model('dlib')

The code responsible for capturing the video stream, calling the appropriate handling method, and saving the changed video is the same code we've seen in Part II. What is new is the "face-handling" code: now we need not only to detect the faces and surround them with a rectangular border, but also to find out who the people in the frame are and write their names on it. First, we detect the faces using the face detection model, and then apply the following procedure to each detected face:

  • draw the surrounding rectangular border
  • calculate embedding using a face recognition model
  • find the closest embedding in the database and retrieve the person’s data we want to display
  • display the person’s data (for example, name)

def recognize(frame, face_detector):
    img = frame
    frameHeight = img.shape[0]
    frameWidth = img.shape[1]
    faces = FaceDetector.detect_faces(face_detector, 'dlib', img)
    for face, (x, y, w, h) in faces:
        # Draw the surrounding rectangular border
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), int(round(frameHeight / 150)), 8)
        # Calculate the embedding of the detected face region
        embedding = DeepFace.represent(img[y:y + h, x:x + w], model_name="Dlib", enforce_detection=False)
        # Find the closest embedding in the database and display the person's name
        row = get_closest_row(embedding, df)
        cv2.putText(img, f'{row["name"]}', (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.0006 * img.shape[1],
                    (255, 255, 255), int(0.002 * img.shape[1]), cv2.LINE_AA)
    return img

In order to find the database row with the closest embedding, we calculate the Euclidean distance between the embedding of the face from the frame and the embeddings in the database, and pick the row for which the distance is minimal.

def get_closest_row(embedding, df2):
    # Calculate the Euclidean distance to every stored embedding
    dist = np.linalg.norm(list(df2["embedding"]) - np.array(embedding), axis=1)
    # Get the index of the minimum distance (i.e., the maximum similarity)
    index = np.argmin(dist)
    # Get the corresponding row from the dataframe
    row = df2.iloc[index]
    return row

Putting everything together:

# Create and initialize the face detector object; dlib HOG + linear SVM is chosen as the backend
face_detector = FaceDetector.build_model('dlib')
# Create the video capture and video writer objects
cap = cv2.VideoCapture(0)
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('demo_fr.avi', fourcc, 10.0, (640, 480))
while True:
    result, frame = cap.read()
    frameHeight = frame.shape[0]
    frameWidth = frame.shape[1]
    framec = recognize(frame, face_detector)
    out.write(framec)
    cv2.imshow("frame", framec)  # This will open an independent window

    if cv2.waitKey(1) & 0xFF == ord('q'):  # quit when 'q' is pressed
        break
# Close the already opened camera
cap.release()
out.release()
# Destroy all the windows
cv2.destroyAllWindows()

And here is the recording of the live demo. For smoother flow, we can consider n-sampling: applying face recognition only to every nth frame.

Face recognition on live stream demo
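A rough sketch of the n-sampling idea for the loop above (the choice of n = 5 is arbitrary; frames in between are passed through unannotated, so caching the last detected boxes and names would be a possible refinement):

n = 5
frame_count = 0
while True:
    result, frame = cap.read()
    if not result:
        break
    # Run the (expensive) recognition only on every nth frame
    if frame_count % n == 0:
        frame = recognize(frame, face_detector)
    out.write(frame)
    cv2.imshow("frame", frame)
    frame_count += 1
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break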

Summary

In this chapter, we've seen state-of-the-art deep-learning-based algorithms and techniques for face recognition. We reviewed the methods and the main Python frameworks/packages offered for face processing; more specifically, dlib, face_recognition, and deepface were introduced in the context of face processing.

In the next part, we will see how to analyze facial attributes in order to implicitly retrieve more information from the photos/videos.

Actimize

NICE Actimize leverages machine learning and AI to detect and prevent financial crimes across the financial services industry, including some of the largest global FIs. Our AI and analytics teams create models to detect anomalous activities associated with AML, Fraud, and market abuse.

The NICE Actimize KYC/CDD solution uses the latest technological innovations to provide complete customer lifecycle risk coverage — accounting for customer onboarding, ongoing due diligence, and enhanced due diligence (EDD) processes.

