PatchNet: A Simple Face Anti-Spoofing Framework

Abhay Garg
7 min read · Feb 4, 2023


Face spoofing refers to attempts to fool a face recognition system by presenting image cut-outs, videos, or card masks instead of a live feed of a person, in order to gain illegitimate access to the system.

Why do we need Face Anti-Spoofing?

Face Recognition is widely used in biometric authentication systems for recognizing and validating an individual, owing to its ease of use. The list of places where face recognition is deployed is growing rapidly, including eKYC, surveillance, electronic gadgets, online assessments and interviews, attendance systems, and many more.

Source: http://www.freepik.com

Thus, there has been an increasing number of attempts to create methods that can bypass face recognition systems by presenting some spoof medium in front of the camera instead of the actual person.

Some of the prevalent spoofing attacks are mentioned below:

  • Printed Photo: A printed photo of the victim is presented in front of the camera.
  • Card Mask: The attacker holds up a mask bearing the victim’s face.
  • Replay Attack: A video of the victim is played in front of the camera.
Fig.1 Types of Face Presentation Attacks [1].

Now we can easily imagine scenarios where Face-Spoofing can become really dangerous:

  • Unlocking a victim’s phone using a recent video from their social media.
  • An imposter taking an online test or interview wearing a face mask.
  • Opening a fake bank account online by compromising eKYC systems.

So, before running the face recognition algorithm, the images captured by the camera need to be fed to a Face Anti-Spoofing module that filters out the fraudulent images.

Anti Spoofing using Deep Learning:

The general idea in the deep learning literature is to feed the input image to a CNN backbone to get a feature representation of the image, which is then fed to a binary classification layer that classifies the image as Live or Spoof. The loss function generally used is binary cross-entropy.
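As a purely illustrative sketch of this baseline, assuming PyTorch and a made-up toy backbone (not the architecture of any particular paper):

```python
import torch
import torch.nn as nn

class SpoofClassifier(nn.Module):
    """Toy baseline: CNN backbone -> feature vector -> single live/spoof logit."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (N, 32) features
        )
        self.head = nn.Linear(32, 1)                 # binary classification layer

    def forward(self, x):
        return self.head(self.backbone(x)).squeeze(1)  # (N,) logits

model = SpoofClassifier()
images = torch.randn(4, 3, 224, 224)         # a batch of cropped face images
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])  # 1 = spoof, 0 = live
loss = nn.functional.binary_cross_entropy_with_logits(model(images), labels)
```

Training then simply minimizes `loss` with any standard optimizer.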

Fig. 2 Simple Spoof/Real Classification.

As seen in Fig. 2, after the image is captured, unnecessary background is cropped out before feeding it to the network. This lets the network focus on the facial features. The major issue here is that the face occupies a different proportion of each image, so the cropped faces have different sizes and must be resized to common dimensions. Resizing either loses information (downsampling) or fills the image with interpolated pixels (upsampling), degrading image quality either way.

In this article we are going to understand “PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition”.

Fig. 3 PatchNet Framework [2].

Working principles of PatchNet:

Fine-grained patch-type recognition: From each cropped face, two equal-sized patches are extracted. Thus, regardless of differences in image size, no resizing distortion is needed. This also helps during training: in every epoch, new patches are cropped from the same face, acting as additional augmentation alongside random horizontal flips and random rotations. During testing, nine patches are sampled from the face and the classification probabilities from all patches are aggregated.
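A minimal sketch of this patch sampling, assuming NumPy and a hypothetical 160×160 patch size (the exact size used in the paper may differ):

```python
import numpy as np

def sample_patches(face, patch_size=160, n_patches=2, rng=None):
    """Randomly crop fixed-size patches from a face image of shape (H, W, C).

    No resizing is involved, so the pixel statistics of the face are
    preserved regardless of the original crop's resolution.
    """
    rng = rng if rng is not None else np.random.default_rng()
    h, w = face.shape[:2]
    assert h >= patch_size and w >= patch_size, "face crop smaller than patch"
    patches = []
    for _ in range(n_patches):
        top = rng.integers(0, h - patch_size + 1)
        left = rng.integers(0, w - patch_size + 1)
        patches.append(face[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)
```

At train time `n_patches=2` matches the two-patch scheme above; at test time one would call it with `n_patches=9` and average the resulting class probabilities.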

Asymmetric margin-based softmax loss: Instead of plain cross-entropy, this loss lets the authors specify a margin of separation between the classification boundaries, reducing the chances of misclassification.

Self-supervised similarity loss: This additional loss makes the network focus on the features that are relevant to the task, and helps regularize the features to be location- and rotation-invariant.
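One plausible form of such a loss — a sketch, not necessarily the paper’s exact formulation — pulls the embeddings of two patches from the same face toward each other via cosine similarity:

```python
import numpy as np

def patch_similarity_loss(f1, f2):
    """Similarity loss sketch: encourage paired patch embeddings to agree.

    f1, f2: (N, D) feature batches of two patches cropped from the same faces.
    Returns 0 when paired embeddings point in the same direction.
    """
    f1 = f1 / np.linalg.norm(f1, axis=1, keepdims=True)
    f2 = f2 / np.linalg.norm(f2, axis=1, keepdims=True)
    cos = (f1 * f2).sum(axis=1)   # per-pair cosine similarity in [-1, 1]
    return (1.0 - cos).mean()
```

Since both patches come from the same face, their spoof-relevant content should match even though their locations (and any applied rotations) differ — which is exactly the invariance this term rewards.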

The authors’ experiments, discussed below, show the effect of these design choices on the results.

Demystifying the loss functions:

Let’s begin the fun part and try to understand the loss function introduced in the paper. It may seem intimidating, but trust me, it’s just a matter of a few simple tricks and we’ll get there.

Fig. 4 Loss Functions
Fig. 5 Few Simple Tricks

Equation (1) in Fig. 4 is our well-known softmax loss function. The numerator is the exponential of the dot product between the feature representation (fi) and the weights (W) of the final classification layer corresponding to the correct class label (yi) of the i-th data sample. The denominator is the sum of exponentials of such dot products between the feature representation and the weight vectors of all classes. The loss decreases as the numerator term increases.

Equation (2) expands the same dot product, leading to equation (3) after applying weight and feature normalization (||W|| = ||f|| = 1) and splitting the denominator into two parts: the exponential of the correct class (same as the numerator) and the sum of exponentials over all other classes.

Now a slight modification is made: a margin m is subtracted inside the exponent of the correct class only (Fig. 5 (2)), making the loss stricter [7]. This forces the model to learn even better feature representations.
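To summarize the derivation so far in one place (a reconstruction from the description above, so the notation may differ slightly from the figures — N samples, C classes, fi the feature of sample i, Wj the weight vector of class j, yi the correct label):

```latex
% Plain softmax loss (Eq. 1 in Fig. 4):
\[
\mathcal{L}_{\mathrm{softmax}}
  = -\frac{1}{N}\sum_{i=1}^{N}
    \log\frac{e^{W_{y_i}^{\top} f_i}}{\sum_{j=1}^{C} e^{W_j^{\top} f_i}}
\]

% After normalization (||W_j|| = ||f_i|| = 1) the dot product becomes
% cos(theta_j), and the margin m is subtracted from the correct class only:
\[
\mathcal{L}_{\mathrm{margin}}
  = -\frac{1}{N}\sum_{i=1}^{N}
    \log\frac{e^{\cos\theta_{y_i} - m}}
             {e^{\cos\theta_{y_i} - m} + \sum_{j \ne y_i} e^{\cos\theta_j}}
\]
```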

Fig. 6 Loss Functions

Finally, a scaling term (s) multiplies all the exponents, giving the loss function in Fig. 6 (1). AMS stands for Angular Margin Softmax loss. The authors of PatchNet use different margins (ml, ms) for the Live and Spoof classes, which makes the loss asymmetric; the final loss function, Fig. 6 (2), is therefore called the Asymmetric Angular Margin Softmax Loss.
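A NumPy sketch of this asymmetric loss; the margin and scale values below are illustrative placeholders, not the paper’s hyperparameters, and the live class is assumed to have index 0:

```python
import numpy as np

def asymmetric_am_softmax(features, weights, labels,
                          m_live=0.4, m_spoof=0.1, s=30.0, live_class=0):
    """Asymmetric angular-margin softmax loss (sketch).

    features: (N, D) unnormalized feature vectors
    weights:  (C, D) classifier weight vectors
    labels:   (N,) integer class ids; the live class gets margin m_live,
              all other (spoof) classes get margin m_spoof.
    """
    # L2-normalize so the dot product equals cos(theta)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                   # (N, C) cosines

    # subtract the class-dependent margin from the target logit only
    margins = np.where(labels == live_class, m_live, m_spoof)
    idx = np.arange(len(labels))
    logits = s * cos
    logits[idx, labels] = s * (cos[idx, labels] - margins)

    # standard cross-entropy over the margin-adjusted, scaled logits
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, labels].mean()
```

Setting both margins to 0 recovers the normalized softmax loss; positive margins shrink the target logit, so the network must push the correct-class cosine higher to achieve the same loss.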

Metrics of Success:

Since this is a spoof classification task, Spoof is considered the positive class.

TP: #Spoof samples classified as Spoof.
TN: #Live samples classified as Live.
FP: #Live samples classified as Spoof.
FN: #Spoof samples classified as Live.

  • Attack Presentation Classification Error Rate (APCER) = FN / (TP + FN)
    Proportion of Attack samples classified as Bonafide.
  • Bonafide Presentation Classification Error Rate (BPCER) = FP / (FP + TN)
    Proportion of Bonafide samples classified as Attack.
  • Average Classification Error Rate (ACER) = (APCER + BPCER) / 2
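These metrics follow directly from the confusion-matrix counts; a quick sketch:

```python
def spoof_metrics(tp, tn, fp, fn):
    """APCER, BPCER, ACER from confusion-matrix counts (spoof = positive class)."""
    apcer = fn / (tp + fn)        # attack samples accepted as bonafide
    bpcer = fp / (fp + tn)        # bonafide samples rejected as attack
    acer = (apcer + bpcer) / 2    # average of the two error rates
    return apcer, bpcer, acer
```

For example, with 90 spoof samples caught, 10 missed, 95 live samples accepted, and 5 rejected, APCER = 0.10, BPCER = 0.05, and ACER = 0.075.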

Results Section:

Fig. 7 Results on Intra-dataset testing (a) SiW dataset [3], (b) OULU-NPU dataset [4]

Results are evaluated under two schemes to validate the generalization ability of the proposed model.

(a) Intra-Dataset testing: In this scheme, the model is trained and tested on the same dataset. However, there are several protocols within each dataset to test the robustness of the model, such as using samples from unseen environments, unseen spoof mediums, and unseen capture devices in the test set. PatchNet’s performance is the best, or comparable to the best, in all protocols of both the SiW and OULU-NPU datasets.

Fig. 8 Results on Cross-Dataset Testing: O (OULU-NPU), C (CASIA-FASD), I (Replay-Attack) [5], M (MSU-MFSD) [6]

(b) Cross-Dataset testing: In this scheme, the model is trained on one or more datasets but tested on a completely different dataset. This experiment establishes PatchNet’s capability to learn generalizable features.

References:

  1. Zhang, Zhiwei, et al. “A face antispoofing database with diverse attacks.” 2012 5th IAPR international conference on Biometrics (ICB). IEEE, 2012.
  2. Wang, Chien-Yi, et al. “PatchNet: A simple face anti-spoofing framework via fine-grained patch recognition.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
  3. Liu, Yaojie, Amin Jourabloo, and Xiaoming Liu. “Learning deep models for face anti-spoofing: Binary or auxiliary supervision.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
  4. Boulkenafet, Zinelabidine, et al. “OULU-NPU: A mobile face presentation attack database with real-world variations.” 2017 12th IEEE international conference on automatic face & gesture recognition (FG 2017). IEEE, 2017.
  5. Chingovska, Ivana, André Anjos, and Sébastien Marcel. “On the effectiveness of local binary patterns in face anti-spoofing.” 2012 BIOSIG-proceedings of the international conference of biometrics special interest group (BIOSIG). IEEE, 2012.
  6. Wen, Di, Hu Han, and Anil K. Jain. “Face spoof detection with image distortion analysis.” IEEE Transactions on Information Forensics and Security 10.4 (2015): 746–761.
  7. Wang, Feng, et al. “Additive margin softmax for face verification.” IEEE Signal Processing Letters 25.7 (2018): 926–930.
