FM2u-Net: Face Morphological Multi-branch Network for Makeup-invariant Face Verification
Wang, W., Fu, Y., Qian, X., Jiang, Y., Tian, Q., & Xue, X. (2020). FM2u-Net: Face Morphological Multi-Branch Network for Makeup-Invariant Face Verification. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5729–5739.
The same identity can look drastically different with and without facial cosmetics. Heavy makeup, especially in the eye and mouth regions, can significantly change facial characteristics, making it challenging to learn a makeup-invariant face verification model.
Some previous works synthesize non-makeup faces with a makeup-removal model rather than directly learning a makeup-invariant facial representation. This paper highlights three main challenges:
- insufficient paired makeup/non-makeup faces
- lack of makeup faces with diverse facial regions
- huge visual differences caused by cosmetics (especially in the eyes and mouth regions)
To address these challenges, the paper proposes FM2u-Net, which is composed of a Face Morphology Network (FM-Net) and an Attention-based Multi-branch Network (AttM-Net).
FM-Net (Challenges 1 and 2)
This work focuses on local parts of the face, such as the eyes and mouth, because these regions are often covered by heavy cosmetics. FM-Net stacks two autoencoders to generate high-quality faces covered by diverse cosmetics.
AttM-Net (Challenge 3)
AttM-Net contains four subnetworks that learn the features of the whole face and three facial regions (two eyes and one mouth).
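The three local branches operate on cropped eye and mouth regions. A minimal sketch of this cropping step, assuming landmark centers are already known (the landmark names, patch size, and coordinates below are illustrative, not from the paper):

```python
import numpy as np

def crop_local_patches(face, landmarks, patch_size=40):
    """Crop the three local regions (two eyes, one mouth) around
    assumed landmark centers given as (row, col) pairs."""
    half = patch_size // 2
    patches = {}
    for name in ("left_eye", "right_eye", "mouth"):
        cy, cx = landmarks[name]
        y0, x0 = max(0, cy - half), max(0, cx - half)
        patches[name] = face[y0:y0 + patch_size, x0:x0 + patch_size]
    return patches

# Toy usage: a synthetic 112x112 "face" with hypothetical landmark centers.
face = np.zeros((112, 112, 3), dtype=np.float32)
landmarks = {"left_eye": (40, 35), "right_eye": (40, 77), "mouth": (85, 56)}
patches = crop_local_patches(face, landmarks)
```

Each of the four subnetworks would then embed its crop (or the whole face) independently before fusion.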
- proposes an end-to-end face morphological multi-branch network (FM2u-Net) that simultaneously synthesizes diverse makeup faces using FM-Net and effectively learns cosmetics-robust face representations through the attention-based multi-branch learning network (AttM-Net).
- FM-Net contains two stacked weight-sharing autoencoders to synthesize realistic makeup faces.
- AttM-Net combines three local and one global face representation to capture detailed information.
- brings new datasets to the community, derived from several existing datasets.
Face Morphological Multi-branch Network
Face Morphology Network
It is observed that three local facial regions (two eyes and one mouth) are usually covered by cosmetics. This motivates the authors to synthesize abundant, diverse, and realistic makeup faces by transferring makeup patches between two similar faces. The goal is to keep the identity information of most facial regions while increasing the diversity of makeup information by introducing local patches with cosmetics. The original images and facial patches serve as supervision, and a cycle-consistency loss guides the generation process.
FM-Net stacks two weight-sharing autoencoders. The flow of FM-Net is as follows:
- compute the top-K most similar facial images by comparing features extracted from a face recognition model.
- given an input image, its corresponding top-K images, and their patch locations, the first autoencoder learns the mapping as a face morphology operation.
- the synthetic result is generated from the original images by swapping the cosmetic patches.
- motivated by cycle-consistency loss, the original images are reconstructed by the second autoencoder via the same projection.
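The flow above can be sketched in an idealized form: transfer one makeup patch from a similar face, then swap it back and penalize the difference from the original. This stand-in uses a hard rectangular copy where the paper uses learned weight-sharing autoencoders; the patch box and loss are assumptions for illustration:

```python
import numpy as np

def swap_patch(face_a, face_b, box):
    """Transfer one rectangular makeup patch from face_b into face_a
    (a stand-in for the learned face-morphology operation)."""
    y0, y1, x0, x1 = box
    out = face_a.copy()
    out[y0:y1, x0:x1] = face_b[y0:y1, x0:x1]
    return out

def cycle_l1(original, reconstructed):
    """Cycle-consistency penalty between the input face and the face
    recovered by swapping the same patch back."""
    return float(np.abs(original - reconstructed).mean())

rng = np.random.default_rng(0)
face_a = rng.random((112, 112, 3))      # non-makeup face
face_b = rng.random((112, 112, 3))      # a top-K similar makeup face
mouth_box = (80, 104, 40, 72)           # assumed mouth patch location

synthetic = swap_patch(face_a, face_b, mouth_box)     # makeup transferred in
recovered = swap_patch(synthetic, face_a, mouth_box)  # second pass undoes it
loss = cycle_l1(face_a, recovered)
```

With the learned autoencoders the reconstruction is only approximate, so this loss term stays in the training objective rather than being exactly zero as in this toy copy-based version.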
Each intermediate generated image is a synthetic face with one makeup patch transferred. This enlarges the training data and provides more diverse makeup changes.
Attention-based Multi-branch Network
Cosmetics can degrade the performance of makeup-invariant face verification. This motivates the authors to learn cosmetics-robust identity features on the local patches that are usually covered by cosmetics. They propose AttM-Net, which consists of an attention-based multi-branch recognition module (AttM-RM) and a feature fusion module (AttM-FM).
AttM-RM contains four networks that extract one global and three local (two eyes and one mouth) features. AttM-FM fuses these features under guidance learned from the global one, dynamically weighting the contributions of the four features to the final decision.
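A minimal sketch of attention-guided fusion in this spirit: score the four branches from the global feature, turn the scores into softmax weights, and take the weighted sum. The projection matrix `W` and all shapes are assumptions, not the paper's architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attm_fuse(global_feat, local_feats, W):
    """Fuse one global and three local features using attention weights
    derived from the global feature (W is an assumed 4 x d projection
    that scores the four branches)."""
    scores = W @ global_feat               # one score per branch, shape (4,)
    weights = softmax(scores)              # normalized contribution weights
    branches = [global_feat] + local_feats
    fused = sum(w * f for w, f in zip(weights, branches))
    return fused, weights

rng = np.random.default_rng(1)
d = 128
global_feat = rng.standard_normal(d)
local_feats = [rng.standard_normal(d) for _ in range(3)]  # two eyes + mouth
W = rng.standard_normal((4, d)) * 0.1
fused, weights = attm_fuse(global_feat, local_feats, W)
```

Because the weights come from the global feature, a face whose eye regions are heavily made up can shift weight toward the branches that remain discriminative.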
The work is evaluated on four datasets: 1) M-501, 2) M-203, 3) FAM, and 4) an extended makeup dataset collected from the internet, and on two main tasks: 1) makeup face verification and 2) general face verification.
The table below shows the results on the makeup face recognition datasets.
From the table above, we can observe that FM2u-Net shows a remarkable improvement in verification accuracy, indicating that AttM-Net is a more effective way to learn makeup-invariant face features.
Compared to state-of-the-art methods, FM2u-Net significantly outperforms BLAN (which uses a GAN to remove makeup). FM2u-Net augments the makeup training data by swapping local regions and lets the network learn discriminative features from the parts that are often covered by heavy cosmetics.
The table below shows the results on the general face recognition datasets.
They compare FM2u-Net with mainstream models for general face recognition, and it significantly outperforms all existing methods. This result also demonstrates the generalization capability and usefulness of the work for real-world applications.