Video KYC — Part IV

Advanced facial attributes analysis

Elina Maliarsky
11 min read · Jan 14, 2023

This article is the fourth part of the “Video KYC” series. Part I demonstrated how to access a webcam and capture the video stream from it. Part II is about face detection: the process of examining a photo or a video in order to distinguish faces from other objects in the background. Part III is dedicated to face recognition: computer technology that can identify people. Part IV is about facial attribute analysis: algorithms that analyze and extract detailed information about a person’s face, covering physical attributes, such as age, gender, and ethnicity, as well as behavioral characteristics, such as emotions. We will explore some deep-learning approaches that can be used for advanced facial attribute analysis and see how to integrate them into video KYC projects.

Age and Gender classification, Adience benchmark

In 2015 Gil Levi and Tal Hassner published the paper “Age and Gender Classification Using Convolutional Neural Networks”. They proposed the same architecture for both gender and age classification: three convolutional layers, each followed by a ReLU activation and a pooling layer, and then two fully connected layers. The outputs of the first two convolutional layers are additionally normalized with local response normalization.

Illustration of the CNN architecture proposed by Gil Levi and Tal Hassner for age and gender classification, taken from the original paper.
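For illustration, here is a rough Keras sketch of that architecture. The layer sizes (96/256/384 filters, 7x7/5x5/3x3 kernels, two 512-unit fully connected layers with dropout) follow the description in the paper, but treat the exact strides, padding, and normalization parameters as approximations rather than a faithful reproduction of the released Caffe model:

import tensorflow as tf
from tensorflow.keras import layers, models

def lrn(x):
    # local response normalization, applied after the first two conv blocks
    return tf.nn.local_response_normalization(x, depth_radius=5, alpha=1e-4, beta=0.75)

def build_levi_hassner(num_classes):  # 8 for age groups, 2 for gender
    return models.Sequential([
        layers.Input(shape=(227, 227, 3)),
        layers.Conv2D(96, 7, strides=4, activation="relu"),
        layers.MaxPooling2D(3, strides=2),
        layers.Lambda(lrn),
        layers.Conv2D(256, 5, padding="same", activation="relu"),
        layers.MaxPooling2D(3, strides=2),
        layers.Lambda(lrn),
        layers.Conv2D(384, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(3, strides=2),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])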

The network was trained on the then newly released Adience benchmark for age and gender classification of unfiltered face images. These images represent some of the challenges of age and gender estimation from real-world, unconstrained photos. The dataset has gender labels (male/female) and age-group labels (0–2, 4–6, 8–13, 15–20, 25–32, 38–43, 48–53, 60+).

The researchers provided the CNN models for age and gender classification used in the paper, together with the corresponding deploy prototxt definitions and mean image files, via a shareable Google Drive link.

In the same year, Tal Hassner and Gil Levi published the paper “Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns”. They proposed novel transformations of image intensities to 3D spaces, designed to be invariant to monotonic photometric transformations, and applied them to CASIA WebFace images, which were then used to train an ensemble of CNNs with multiple architectures on multiple representations. Each model was then fine-tuned with limited emotion-labeled training data to obtain the final classification models. Their method was tested on the Static Facial Expression Recognition sub-challenge (SFEW) of the Emotion Recognition in the Wild Challenge (EmotiW 2015) and was shown to provide a substantial 15.36% improvement over baseline results (a 40% gain in performance).

Now we will see how to run pre-built age, gender, and emotion Caffe models within OpenCV in Python.

I’ve downloaded the age, gender, and emotion models, together with their prototxt and mean files, from the Google Drive links referenced above and put them in a local folder. Then I initialized all the model objects and set up the lists of age ranges, genders, and emotions. As you’ve seen in previous posts, OpenCV can load models built with different frameworks, including Caffe, with its dnn module.

ageProto="age_deploy.prototxt"
ageModel="age_net.caffemodel"
genderProto="gender_deploy.prototxt"
genderModel="gender_net.caffemodel"
emoProto ="deploy.prototxt"
emoModel="EmotiW_VGG_S.caffemodel"

emoNet =cv2.dnn.readNet(emoModel,emoProto)
ageNet=cv2.dnn.readNet(ageModel,ageProto)
genderNet=cv2.dnn.readNet(genderModel,genderProto)

ageList=[‘(0–2)’, ‘(4–6)’, ‘(8–12)’, ‘(15–20)’, ‘(25–32)’, ‘(38–43)’, ‘(48–53)’, ‘(60–100)’]
genderList=[‘Male’,’Female’]
emoList = [ 'Angry' , 'Disgust' , 'Fear' , 'Happy' , 'Neutral' , 'Sad' , 'Surprise']

The following function detects faces and then applies age, gender, and emotion detection models to the found faces.

def detectAgeGender(frame, faceDetector, ageNet, genderNet, emoNet):
    # detect and highlight faces
    resultImg, faceBoxes = faceDetector(frame)

    if not faceBoxes:
        print("No face detected")

    padding = 20

    for faceBox in faceBoxes:
        faceBox = list(faceBox)
        # crop the face region with a small padding around the box
        face = frame[max(0, faceBox[1] - padding):
                     min(faceBox[3] + padding, frame.shape[0] - 1),
                     max(0, faceBox[0] - padding):
                     min(faceBox[2] + padding, frame.shape[1] - 1)]

        # gender prediction
        blob = cv2.dnn.blobFromImage(face, 1.0, (227, 227), MODEL_MEAN_VALUES, swapRB=False)
        genderNet.setInput(blob)
        genderPreds = genderNet.forward()
        gender = genderList[genderPreds[0].argmax()]

        # age prediction
        blob = cv2.dnn.blobFromImage(face, 1.0, (227, 227), MODEL_MEAN_VALUES, swapRB=False, crop=False)
        ageNet.setInput(blob)
        agePreds = ageNet.forward()
        age = ageList[agePreds[0].argmax()]

        # emotion prediction
        blob = cv2.dnn.blobFromImage(face, 1.0, (227, 227), MODEL_MEAN_VALUES, swapRB=False, crop=False)
        emoNet.setInput(blob)
        emoPreds = emoNet.forward()
        emo = emoList[emoPreds[0].argmax()]

        cv2.putText(resultImg, f'{gender}, {age}, {emo}', (faceBox[0], faceBox[1] - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.0009 * resultImg.shape[1], (255, 0, 255),
                    int(0.004 * resultImg.shape[1]), cv2.LINE_AA)

    plt.imshow(cv2.cvtColor(resultImg, cv2.COLOR_BGR2RGB))

    return resultImg

Now let's see the results with dlib (HOG and Linear SVM) as a face detector:

The results look reasonable (in real life both people were about 35 at that point in time; gender and emotion were predicted correctly).

Now let’s see what happens if we use another face detector, for example OpenCV’s Haar cascade:

This time the age-gender detector was right about the woman but did not work correctly at all on the man.

Finally, let’s check the same flow with OpenCV’s DNN-based detector:

Now the gender detector works correctly. As you can see, the age and gender detectors are very sensitive to their input.

And what about an older person?

It worked well, except maybe for the facial expression (the 12th Doctor is grumpy by default, but I’m not sure he is angry in this scene).

Age and Gender classification, IMDB-WIKI

In the same year, 2015, researchers from the Swiss Federal Institute of Technology in Zurich (ETH Zurich) built their own state-of-the-art age and gender prediction models and shared the structure and pre-trained weights of the production models for the Caffe framework. Their networks use the VGG-16 architecture, pre-trained on ImageNet for image classification and fine-tuned on IMDB-WIKI, the largest publicly available dataset of face images with gender and age labels.

I’ve downloaded the age and gender models with their prototxt files from the supplied shareable link and put them in a local folder. The initialization of the model objects and the detectAgeGender function are more or less the same as in the previous example; the main differences are the input shape (224x224 vs 227x227) and the output shape (the age output is a 101-dimensional vector instead of an 8-dimensional one).
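Because the age output is a softmax over 101 classes (one per year from 0 to 100), the usual way to turn it into a single number, and the one used in the function below, is to take the expected value of that distribution. A minimal sketch:

import numpy as np

def expected_age(agePreds):
    # agePreds has shape (1, 101): one probability per age from 0 to 100
    return int(np.sum(agePreds[0] * np.arange(101)))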

Below is the code for detection and the results of applying it to images with faces:

ageProto1="data/age1.prototxt"
ageModel1="data/age1.caffemodel"
genderProto1="data/gender1.prototxt"
genderModel1="data/gender1.caffemodel"
ageNet1=cv2.dnn.readNet(ageModel1,ageProto1)
genderNet1=cv2.dnn.readNet(genderModel1,genderProto1)

def detectAgeGenderWiki(frame, faceDetector, ageNet, genderNet):

#detect and highlight faces
resultImg,faceBoxes = faceDetector(frame)

if not faceBoxes:
print("No face detected")

padding=20



for faceBox in faceBoxes:
faceBox = list(faceBox)
#print(faceBox)
face=frame[max(0,faceBox[1]-padding):
min(faceBox[3]+padding,frame.shape[0]-1),max(0,faceBox[0]-padding)
:min(faceBox[2]+padding, frame.shape[1]-1)]

blob=cv2.dnn.blobFromImage(face, 1.0, (224,224), MODEL_MEAN_VALUES, swapRB=False)

genderNet1.setInput(blob)
genderPreds=genderNet1.forward()



gender='Female' if np.argmax(genderPreds) == 0 else 'Male'
print(f'Gender: {gender}')

ageNet1.setInput(blob)
agePreds=ageNet1.forward()
#age=ageList[agePreds[0].argmax()]

output_indexes = np.array([i for i in range(0, 101)])
age = int(np.sum(agePreds * output_indexes))
print(f'Age: {age} years')

cv2.putText(resultImg, f'{gender}, {age}', (faceBox[0], faceBox[1]-10), cv2.FONT_HERSHEY_SIMPLEX, 0.0009*img.shape[1], (255,0,255), int(0.004*img.shape[1]), cv2.LINE_AA)

plt.imshow(cv2.cvtColor(resultImg, cv2.COLOR_BGR2RGB))

return resultImg

Let’s see how it works with dlib as the backend:

As we can see, the age-prediction model really flatters people, and it flatters them to some extent regardless of which face detector is used. There is, however, something positive in this: these models are not as sensitive to the choice of face detector.

Let’s see how it works on an older man:

Age, Gender, Emotions and Race classification, deepface

Sefik Ilkin Serengil offers age and gender prediction as part of his deepface package. The age and gender prediction models are based on the paper by the ETH Zurich researchers discussed in the previous section, but they are trained in Keras and use VGG-Face, trained on the same facial data, as the base model.

In addition, deepface supports facial expression recognition and race and ethnicity prediction functionality.

The facial expression recognizer is a CNN built with Keras on the TensorFlow backend, trained from scratch on the FER2013 dataset.

The race and ethnicity predictor is a VGG-Face-based model, trained with transfer learning on a merged dataset. One source is FairFace, a large-scale dataset consisting of 86K training and 11K test instances; its labels are East Asian, Southeast Asian, Indian, Black, White, Middle Eastern, and Latino-Hispanic. The second is UTKFace, a smaller dataset with 10K instances; its labels are Asian, Indian, Black, White, and Others (Latino and Middle Eastern).

The code is simple: we just call the appropriate method (it accepts either a path to an image file or the image itself) and get back a dictionary with all the requested information, including the coordinates of the detected face (if face detection is enabled). A minimal call is sketched after the notes below.

Some important notes:

  • By default, the analyze function performs face detection.
  • The default detection backend is OpenCV’s Haar cascade classifier.
  • There is a limitation: the analyzer detects and analyzes only one face, so if you need a multi-face detector and analyzer, you should run face detection yourself and then pass each preprocessed face region to the analyze function in a loop.
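For reference, a minimal call looks roughly like this with the deepface version used here, which returns a single dictionary (newer releases return a list of dictionaries, one per detected face); the file name is just a placeholder:

from deepface import DeepFace

# "face.jpg" is a hypothetical input file; analyze also accepts a numpy image
demography = DeepFace.analyze("face.jpg", actions=["age", "gender", "emotion", "race"])
print(demography["age"], demography["gender"], demography["dominant_emotion"], demography["dominant_race"])
print(demography["region"])  # bounding box of the detected face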

Here is the code for face analysis with the built-in face detection:

def faceAnalyzer(path, features):
    # read the image; deepface performs the face detection itself
    resultImg = cv2.imread(path)
    demography = DeepFace.analyze(resultImg, actions=features, enforce_detection=True)
    print(demography)
    age = demography.get("age", "")
    gender = demography.get("gender", "")
    emotion = demography.get("dominant_emotion", "")
    race = demography.get("dominant_race", "")

    output_str = ",".join([i for i in [str(age), gender, emotion, race] if i != ''])

    # coordinates of the single detected face
    x = demography['region']['x']
    y = demography['region']['y']
    w = demography['region']['w']
    h = demography['region']['h']

    cv2.rectangle(resultImg, (x, y), (x + w, y + h), (0, 255, 0),
                  int(round(resultImg.shape[1] / 150)), 8)

    cv2.putText(resultImg, output_str, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX,
                0.0009 * resultImg.shape[1], (255, 0, 255),
                int(0.004 * resultImg.shape[1]), cv2.LINE_AA)
    plt.imshow(cv2.cvtColor(resultImg, cv2.COLOR_BGR2RGB))

    return resultImg

And here are the results:

As you can see, only one face is handled; age, gender, and emotion are predicted correctly, while the race prediction is wrong.

Let’s see the function with a custom face detector (OpenCV’s Haar cascade in this example):

def faceAnalyzerWithCustomDetector(path, faceDetector, features):
    # detect and highlight faces with the custom detector
    resultImg = cv2.imread(path)
    resultImg, faceBoxes = faceDetector(resultImg)
    padding = 20
    for faceBox in faceBoxes:
        # crop the face region with a small padding around the box
        face = resultImg[max(0, faceBox[1] - padding):
                         min(faceBox[3] + padding, resultImg.shape[0] - 1),
                         max(0, faceBox[0] - padding):
                         min(faceBox[2] + padding, resultImg.shape[1] - 1)]

        # detection is skipped: the cropped face is passed directly to deepface
        demography = DeepFace.analyze(face, actions=features,
                                      enforce_detection=False, detector_backend="skip")
        age = demography.get("age", "")
        gender = demography.get("gender", "")
        emotion = demography.get("dominant_emotion", "")
        race = demography.get("dominant_race", "")

        output_str = ",".join([i for i in [str(age), gender, emotion, race] if i != ''])

        # the region returned by deepface refers to the cropped face,
        # so the original faceBox coordinates are used for drawing the label
        cv2.putText(resultImg, output_str, (faceBox[0], faceBox[1] - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.0009 * resultImg.shape[1], (255, 0, 255),
                    int(0.004 * resultImg.shape[1]), cv2.LINE_AA)
    plt.imshow(cv2.cvtColor(resultImg, cv2.COLOR_BGR2RGB))

    return resultImg

Here both faces are analyzed. Although we used the same Haar cascade detector as the built-in one, the results are still different, probably because deepface additionally preprocesses the faces (different padding, color conversion, etc.).

The results are correct; the emotion is detected as “surprise” rather than “fear”, but even a human who doesn’t know the context could not say exactly which one it is.

And regarding the older man: the age detector flattered him, but the other detectors are correct, even the emotion detector. As I’ve mentioned before, the 12th Doctor is grumpy by default; it is his neutral facial expression.

Live facial attribute analysis on the video stream

Now let’s see how facial attribute analysis can be implemented on a live video stream.

Let’s start with the models developed by the Israeli researchers, Levi and Hassner; below is the code that applies those models to the webcam stream and records the annotated video:

# create and initialize a face detector object; dlib HOG + Linear SVM is chosen as the backend
# (not used below: OpenCV_DNN_highlight_face is passed to the analysis function instead)
face_detector = FaceDetector.build_model('dlib')
# create video capture and writer objects
cap = cv2.VideoCapture(0)
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('demo_eagr.avi', fourcc, 10.0, (640, 480))

frameCount = 0
previousFrame = None
while True:
    result, frame = cap.read()
    if not result:
        break
    frameHeight = frame.shape[0]
    frameWidth = frame.shape[1]
    # run the (relatively slow) analysis only on every 10th frame,
    # otherwise reuse the previously annotated frame
    framec = (detectAgeGender(frame, OpenCV_DNN_highlight_face, ageNet, genderNet, emoNet)
              if frameCount % 10 == 0 else previousFrame)
    previousFrame = framec
    out.write(framec)
    cv2.imshow("frame", framec)  # this opens an independent window
    frameCount += 1
    if cv2.waitKey(1) & 0xFF == ord('q'):  # quit when 'q' is pressed
        break

# close the already opened camera
cap.release()
out.release()
# destroy all the windows
cv2.destroyAllWindows()

Below is the code and the video for facial attribute analysis on a live video stream with deepface:

# create and initialize a face detector object; dlib HOG + Linear SVM is chosen as the backend
# (not used below: OpenCV_DNN_highlight_face is passed to the analysis function instead)
face_detector = FaceDetector.build_model('dlib')
# create video capture and writer objects
cap = cv2.VideoCapture(0)
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('demo_eagr_deepface.avi', fourcc, 10.0, (640, 480))

frameCount = 0
previousFrame = None
while True:
    result, frame = cap.read()
    if not result:
        break
    frameHeight = frame.shape[0]
    frameWidth = frame.shape[1]
    # run the deepface analysis only on every 10th frame; note that here the
    # function receives a frame (numpy array) rather than a file path, so the
    # cv2.imread call inside faceAnalyzerWithCustomDetector must be skipped
    framec = (faceAnalyzerWithCustomDetector(frame, OpenCV_DNN_highlight_face,
                                             ["age", "gender", "emotion", "race"])
              if frameCount % 10 == 0 else previousFrame)
    previousFrame = framec
    out.write(framec)
    cv2.imshow("frame", framec)  # this opens an independent window
    frameCount += 1
    if cv2.waitKey(1) & 0xFF == ord('q'):  # quit when 'q' is pressed
        break

# close the already opened camera
cap.release()
out.release()
# destroy all the windows
cv2.destroyAllWindows()

The code is very simple and is essentially the same in both cases, except for the analysis function being called.

In addition, deepface comes with an API for face recognition and facial attribute analysis on live video streams.

The stream function enables the webcam and starts the streaming and analysis process. Face recognition and analysis are performed once a face is clearly and frontally positioned toward the camera.

The code (yes, this single line of code):

DeepFace.stream('data/db')

And the screenshot: you can see that if we supply a folder with reference images, facial recognition is performed as well.

What to choose

First of all, let’s see when and why we may need facial attributes.

Age, gender, maybe race:

  • the face recognition step failed or returned unclear results, and we need more information to identify or verify the person

Emotions:

  • if the person is angry or afraid of something, there may be a potential scam
  • they can be used in the liveness-detection stage (if the emotion is identical in every frame, a flag can be raised); a minimal sketch of this idea follows the list
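As an illustration of the second point, here is a minimal sketch of such a check. It assumes we already collected the dominant emotion per sampled frame (for example from DeepFace.analyze); the function name and the threshold are hypothetical choices, not part of any library:

from collections import Counter

def emotions_look_static(frame_emotions, min_frames=30, max_share=0.95):
    # frame_emotions: list of dominant-emotion labels, one per sampled frame
    if len(frame_emotions) < min_frames:
        return False  # not enough data to judge
    most_common_count = Counter(frame_emotions).most_common(1)[0][1]
    # if (almost) every frame shows exactly the same emotion, raise a flag:
    # a static, never-changing expression may indicate a photo or a replayed video
    return most_common_count / len(frame_emotions) >= max_share

# usage (hypothetical):
# if emotions_look_static(collected_emotions):
#     print("Liveness red flag: facial expression never changes")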

We’ve seen in the previous sections that the performance of facial attribute analysis is more or less the same whichever models we use. In addition, facial attribute analysis is too slow to be run online on every frame. The stream function of deepface is great, but we need access to lower-level functionality. We can decide to work explicitly with the deepface facial attribute analysis models, but apply them on demand, only if we really need additional information (or under some other condition). Another option: if red flags have been raised during the video call, we can analyze the recorded video later, offline, with all the models we have, and apply a majority vote in case of uncertainty. A sketch of such a vote is shown below.
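For completeness, here is a minimal sketch of such a majority vote over, say, the gender predictions of several models; the predictor callables are assumptions, for example thin wrappers around the detectors built earlier in this series:

from collections import Counter

def majority_vote(face, predictors):
    # predictors: list of callables, each returning a label (e.g. 'Male'/'Female') for a face crop
    votes = [predict(face) for predict in predictors]
    label, count = Counter(votes).most_common(1)[0]
    confident = count > len(votes) / 2  # strict majority
    return label, confident

# usage (hypothetical wrappers around the models from this article):
# label, confident = majority_vote(face, [caffe_gender, wiki_gender, deepface_gender])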

Summary

In this chapter, we’ve seen deep-learning-based state-of-the-art algorithms and techniques for facial attribute analysis. We reviewed the methods and the main Python libraries, frameworks, and packages offered for face processing.

In the next part, we will learn how to extract data from ID documents, with a special emphasis on text extraction and processing.

Actimize

NICE Actimize leverages machine learning and AI to detect and prevent financial crimes across the financial services industry, including some of the largest global FIs. Our AI and analytics teams create models to detect anomalous activities associated with AML, Fraud, and market abuse.

The NICE Actimize KYC/CDD solution uses the latest technological innovations to provide complete customer lifecycle risk coverage — accounting for customer onboarding, ongoing due diligence, and enhanced due diligence (EDD) processes.
