ArcFace: Additive Angular Margin Loss for Deep Face Recognition

Retrieved from http://arxiv.org/abs/1801.07698

Mostafa Gazar
1-minute papers
Jan 31, 2019

--

Summary

The main goal of this paper is to maximise face class separability by introducing a new loss function that yields highly discriminative features for face recognition.

According to the authors, ArcFace outperformed other loss functions commonly used for face recognition, such as triplet loss, intra-loss and inter-loss. I only compared ArcFace loss with Softmax loss myself, and the improvement was quite noticeable.

Softmax loss has some drawbacks. One of them is that it does not explicitly optimize the feature embedding to enforce higher similarity for intra-class samples and diversity for inter-class samples. This results in a performance gap for deep face recognition under large intra-class appearance variations such as pose variations and age gaps.

Using a loss function that matches the structure of the data we want to classify can substantially improve overall model performance. The dataset also matters: a dataset like VGGFace2, which has good variation in subject pose, age, etc., can deliver better results than a bigger but biased dataset like MS1M.

Strengths

ArcFace is easy to implement, adds little computational overhead and converges quickly.

Weaknesses

It would have been interesting if the authors had experimented with pretrained model weights, to explore whether replacing the head of such a model and fine-tuning it can achieve similar results. They could also have explored the impact of the dataset that model was originally trained on.

Notes

Despite the numerical similarity between ArcFace, CosFace and SphereFace, ArcFace has a better geometric interpretation, because its angular margin corresponds exactly to the geodesic distance on the hypersphere.
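
The similarity is easiest to see in the combined form the paper uses to compare the three margins. As a sketch in my own transcription: θ_j is the angle between the normalised feature x_i and the normalised class weight W_j, y_i is the true class, s is the feature scale, N is the batch size and n is the number of classes.

```latex
L = -\frac{1}{N}\sum_{i=1}^{N}\log
\frac{e^{s\left(\cos(m_1\theta_{y_i}+m_2)-m_3\right)}}
     {e^{s\left(\cos(m_1\theta_{y_i}+m_2)-m_3\right)}
      +\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}
```

Here m_1 is the multiplicative angular margin of SphereFace, m_2 is the additive angular margin of ArcFace and m_3 is the additive cosine margin of CosFace. ArcFace alone corresponds to m_1 = 1 and m_3 = 0, so the margin is added directly to the angle, and on a unit hypersphere an angle is an arc length.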

Code

The source code is released under the MIT license and implemented using MXNet: https://github.com/deepinsight/insightface.

You can also find reimplementations in TensorFlow, PyTorch and Caffe.
