Exploring Other Face Recognition Approaches (Part 3) — DREAM

Published in Analytics Vidhya, Sep 11, 2020

In this series of articles, we explore approaches to face recognition beyond the most common ones. In the previous articles (Part 1 and Part 2), we discussed CosFace and ArcFace.
In this final part, we discuss DREAM.

The series covers three face recognition approaches:
1. CosFace
2. ArcFace
3. DREAM: Deep Residual Equivariant Mapping

Introduction

Many face recognition algorithms still perform poorly on profile faces compared to frontal faces. A major reason is that the numbers of frontal and profile training images are highly imbalanced: datasets contain many frontal faces and few profile faces.
The authors of DREAM hypothesize that there is an inherent mapping between frontal and profile faces, so that their discrepancy in the deep representation space can be bridged by an equivariant mapping. They propose a Deep Residual Equivariant Mapping (DREAM) block that adaptively adds residuals to the deep representation, transforming a profile face representation to a canonical pose, which simplifies recognition.

The above figure illustrates a deep representation embedding of faces belonging to the same subject. Given an input image of arbitrary pose, we can map its feature to the frontal space through a mapping function that adds a residual.
Major highlights of the DREAM block are:
1. Simple to implement. It can be integrated into existing CNN architectures by stitching the block onto the base network. It does not alter the original dimensionality of the face embedding and can be trained end-to-end.
2. Lightweight. It adds only a small number of parameters and little computation to the base model.
3. Helps the base network further improve its recognition of profile faces.

Deep Residual Equivariant Mapping

Let's first discuss feature equivariance and then move on to the DREAM block.

Feature Equivariance

It has been observed that the representations in most layers of deep neural networks change in an easily predictable manner when the input is transformed, and that such transformations can be learned from data.
Formally, a convolutional neural network can be regarded as a function φ that maps an image x ∈ X to a vector φ(x) ∈ R^d, where d is the dimension of the feature representation. The representation φ is said to be equivariant with a transformation g of the input image if the transformation can be transferred to the representation output. That is, equivariance with g is obtained when there exists a map M_g : R^d → R^d such that

φ(g x) = M_g φ(x), for all x ∈ X.

For the mapping function M_g to work for any image, it has to capture intrinsic geometric properties of the representations. In this problem, the transformation g involves 3D geometric changes from profile to frontal faces.
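
To make the definition concrete, here is a minimal toy sketch in Python of the condition φ(g x) = M_g φ(x). It is not the DREAM model: the feature extractor `phi` (column sums), the transformation `g` (horizontal flip), and the map `M_g` (vector reversal) are hypothetical choices made only because their equivariance is easy to check by hand.

```python
# Toy illustration of feature equivariance. All three functions are
# hypothetical stand-ins, not part of DREAM.

def phi(image):
    """Hypothetical feature extractor: the column sums of a 2-D image."""
    return [sum(column) for column in zip(*image)]

def g(image):
    """Input transformation: horizontal flip of the image."""
    return [row[::-1] for row in image]

def M_g(feature):
    """Feature-space map that corresponds to g: reverse the feature vector."""
    return feature[::-1]

image = [
    [1, 2, 3],
    [4, 5, 6],
]

lhs = phi(g(image))    # transform the input, then extract the feature
rhs = M_g(phi(image))  # extract the feature, then transform it

print(lhs, rhs, lhs == rhs)
```

Both orders of operation give the same vector, which is what equivariance with g means. For faces, g is a 3D pose change and M_g cannot be written down by hand, so it has to be learned from data.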

DREAM Block

Let's begin by defining the problem statement.
We denote a CNN as a function φ and the representation it produces for an image x as φ(x). We call the network a stem CNN or base network. Assume that we are given two types of face images: a frontal face image, denoted x_f, and a profile face image, denoted x_p.
We want to obtain a transformed representation of a profile image x_p through a mapping function M_g such that M_g φ(x_p) ≈ φ(x_f). To make M_g φ(x_p) easy to incorporate into a stem CNN, we formulate it as the sum of the original profile feature φ(x_p) and a residual given by a residual function R(φ(x_p)), weighted by a yaw coefficient Y(x_p):

M_g φ(x_p) = φ(x_p) + Y(x_p) · R(φ(x_p)) ≈ φ(x_f)

The yaw coefficient handles input images of arbitrary pose.
Y(x) ∈ [0, 1] gives a larger residual to a face that deviates more from the frontal pose: for a frontal face Y(x) is close to 0 and the feature is left nearly unchanged, while for a full profile face Y(x) is close to 1.
Note: roll and pitch angles are not considered. The effect of roll is eliminated by face alignment, while face images with large pitch angles are rare; pitch could be addressed by adding another branch to the DREAM block.
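
The weighted sum described above, φ(x_p) + Y(x_p) · R(φ(x_p)), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `dream_map` is hypothetical, and the feature and residual values are invented for the example rather than produced by a trained network.

```python
def dream_map(feature, residual, yaw_coefficient):
    """Return feature + yaw_coefficient * residual, element by element.

    feature         : phi(x), the embedding from the stem CNN
    residual        : R(phi(x)), the output of the residual branch
    yaw_coefficient : Y(x), a scalar in [0, 1]
    """
    if not 0.0 <= yaw_coefficient <= 1.0:
        raise ValueError("yaw_coefficient must lie in [0, 1]")
    if len(feature) != len(residual):
        raise ValueError("feature and residual must have the same length")
    return [f + yaw_coefficient * r for f, r in zip(feature, residual)]

# Invented values, for illustration only.
profile_feature = [0.5, -1.0, 2.0]
residual = [1.0, 2.0, -2.0]

unchanged = dream_map(profile_feature, residual, 0.0)  # frontal face
halfway = dream_map(profile_feature, residual, 0.5)    # intermediate pose
full = dream_map(profile_feature, residual, 1.0)       # full profile

print(unchanged, halfway, full)
```

The output has the same length as the input feature, so the block can sit between the stem CNN and whatever consumes the embedding without changing its dimensionality.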

Architecture

The stem CNN can be any existing face recognition model. A fully connected layer is then used to extract the initial representation φ(x) (the face recognition feature vector), which is subsequently 'fixed' by the DREAM block. The DREAM block consists of two branches:

1. Residual Branch

It generates the residuals R(φ(x)). It has two fully-connected layers with Parametric Rectified Linear Unit (PReLU) as the activation function. This branch can be learned separately from the stem CNN. It is trained with stochastic gradient descent by minimizing the Euclidean distance between the mapped profile feature and its corresponding frontal feature:

min over Θ_R of ‖ φ(x_f) − (φ(x_p) + Y(x_p) · R(φ(x_p))) ‖_2

where Θ_R denotes the parameters of R(·). The parameters of the Y(·) branch are kept fixed.
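
As a rough illustration of the residual branch and its training objective, the following pure-Python sketch runs a feature through two fully connected layers with PReLU and computes the Euclidean distance to a frontal feature. Several things here are assumptions and not taken from the article: applying PReLU after each of the two layers, the 2-dimensional feature size, the identity weights, the PReLU slope of 0.25, and all helper names. It only evaluates the loss; the gradient descent update of Θ_R is left out.

```python
import math

def linear(weights, bias, x):
    """Fully connected layer: weights @ x + bias, with weights as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def prelu(values, slope):
    """PReLU: identity for positive inputs, slope * v otherwise."""
    return [v if v > 0 else slope * v for v in values]

def residual_branch(feature, params):
    """R(phi(x)): two fully connected layers, each followed by PReLU (assumed)."""
    hidden = prelu(linear(params["w1"], params["b1"], feature), params["slope1"])
    return prelu(linear(params["w2"], params["b2"], hidden), params["slope2"])

def dream_loss(frontal_feature, profile_feature, yaw_coefficient, params):
    """Euclidean distance between the mapped profile feature and the frontal one."""
    residual = residual_branch(profile_feature, params)
    mapped = [p + yaw_coefficient * r
              for p, r in zip(profile_feature, residual)]
    return math.sqrt(sum((m - f) ** 2
                         for m, f in zip(mapped, frontal_feature)))

# Hypothetical parameters Theta_R: identity weights, zero biases.
identity = [[1.0, 0.0], [0.0, 1.0]]
params = {"w1": identity, "b1": [0.0, 0.0], "slope1": 0.25,
          "w2": identity, "b2": [0.0, 0.0], "slope2": 0.25}

profile_feature = [1.0, -2.0]
frontal_feature = [2.0, -2.125]

residual = residual_branch(profile_feature, params)
loss_with_residual = dream_loss(frontal_feature, profile_feature, 1.0, params)
loss_without_residual = dream_loss(frontal_feature, profile_feature, 0.0, params)

print(residual, loss_with_residual, loss_without_residual)
```

In this contrived example the residual exactly closes the gap to the frontal feature when the yaw coefficient is 1, so the loss drops to zero, while with a coefficient of 0 the profile feature is left as it is and the distance stays positive. In training, only the parameters of the residual branch would be updated to reduce this distance.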

2. Head Rotation Estimator

The second branch produces the yaw coefficient Y(x). This branch takes 21 facial landmarks as input. This requirement adds no extra burden to the stem CNN, since face alignment is a standard preprocessing step in many face recognition pipelines.
For more details, please refer to the paper listed in the References.
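
To give a feel for how a yaw angle could be turned into a coefficient in [0, 1], here is a small Python sketch. It assumes the yaw angle has already been estimated (the landmark-based estimation itself is not shown), and it uses a sigmoid as the squashing function. The function name, the `midpoint` of 45 degrees, and the `steepness` of 10 are illustrative assumptions, not values stated in this article.

```python
import math

def yaw_coefficient(yaw_degrees, midpoint=45.0, steepness=10.0):
    """Map a yaw angle in degrees to a soft gate between 0 and 1.

    midpoint and steepness are illustrative assumptions: the gate equals
    0.5 at |yaw| == midpoint and saturates faster for larger steepness.
    """
    z = steepness * (abs(yaw_degrees) / midpoint - 1.0)
    return 1.0 / (1.0 + math.exp(-z))

for angle in (0, 30, 45, 60, 90):
    print(angle, round(yaw_coefficient(angle), 4))
```

With these assumed settings, a frontal face gets a coefficient near 0, a full profile gets a coefficient near 1, and left and right turns of the same magnitude are treated identically because only the absolute yaw is used.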

Visualization of deep features. The first and third rows show the reconstructed original features of profile faces. The second and fourth rows show the reconstructed features after mapping by the DREAM block.

Conclusion

We learned a way to improve recognition performance for profile faces without making major changes to the existing architecture and without adding much computational cost.

References

Code: http://mmlab.ie.cuhk.edu.hk/projects/DREAM/

Paper: https://arxiv.org/abs/1803.00839