Mostafa Gazar
Jan 31 · 2 min read


The main goal of this paper is to maximise face class separability by introducing a new loss function that yields highly discriminative features for face recognition.

According to the authors, their method outperformed other loss functions that work well for face recognition, such as triplet loss, intra-loss and inter-loss. I only compared ArcFace loss with Softmax loss, and the improvement was quite noticeable.

Softmax loss has some drawbacks. One of them is that it does not explicitly optimize the feature embedding to enforce higher similarity for intra-class samples and diversity for inter-class samples. This results in a performance gap for deep face recognition under large intra-class appearance variations such as pose variations and age gaps.
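
For reference, here is the standard Softmax loss that this paragraph refers to, written in the paper's notation as I understand it: x_i is the embedding of the i-th sample with class y_i, W_j and b_j are the weights and bias for class j, N is the batch size and n is the number of classes. Nothing in this expression directly pulls same-class embeddings together or pushes different classes apart.

```latex
L_{\text{softmax}} = -\frac{1}{N} \sum_{i=1}^{N}
  \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}
            {\sum_{j=1}^{n} e^{W_{j}^{T} x_i + b_{j}}}
```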

Using a loss function suited to the structure of the data we want to classify can have a large positive effect on overall model performance. The dataset is also quite important: using a dataset like VGGFace2, which has good variation in subject pose, age, etc., can deliver better results than using a bigger but biased dataset like MS1M.


ArcFace is easy to implement, does not require much extra computational overhead and is able to converge quickly.
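
To give a feel for how little is involved, here is a minimal sketch of the ArcFace logits for a single sample in plain Python. It is not the authors' MXNet code: the function names (`l2_normalize`, `arcface_logits`, `cross_entropy`) are my own, and the defaults s=64 and m=0.5 are the values I recall from the paper. It also skips the special handling real implementations add when the angle plus the margin exceeds pi.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding, class_weights, label, s=64.0, m=0.5):
    """Return scaled cosine logits with an additive angular margin on the target class.

    embedding:     feature vector for one sample
    class_weights: one weight vector per class
    label:         index of the ground-truth class
    s, m:          feature scale and angular margin in radians (assumed defaults)
    """
    x = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        w = l2_normalize(w)
        # With both vectors normalised, the dot product is cos(theta_j).
        cos_theta = sum(a * b for a, b in zip(x, w))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if j == label:
            # Add the margin to the angle of the ground-truth class only.
            # Note: this sketch ignores the case theta + m > pi.
            cos_theta = math.cos(math.acos(cos_theta) + m)
        logits.append(s * cos_theta)
    return logits


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[label]


# Toy example: a 2-D embedding sitting exactly between two class centres.
weights = [[1.0, 0.0], [0.0, 1.0]]
plain = arcface_logits([1.0, 1.0], weights, label=0, m=0.0)
margin = arcface_logits([1.0, 1.0], weights, label=0, m=0.5)
print(cross_entropy(plain, 0), cross_entropy(margin, 0))
```

With the margin switched on, the target logit drops while the other logits stay the same, so the loss for this borderline sample goes up. That is the pressure that pushes embeddings closer to their own class centre during training.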


It would have been interesting if they had experimented with pretrained model weights, to explore whether changing the head of such a model and fine-tuning it can achieve similar results, and what impact the dataset that model was originally trained on has.


Despite the numerical similarity between ArcFace, CosFace and SphereFace, ArcFace has a better geometric interpretation, because its angular margin corresponds exactly to the geodesic distance on the hypersphere.
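
The similarity is easiest to see when the three margins are written side by side. As far as I recall from the paper, all three fit into one combined form: SphereFace uses a multiplicative angular margin m_1, ArcFace an additive angular margin m_2 and CosFace an additive cosine margin m_3, with s the feature scale and \theta_j the angle between the normalised embedding and the normalised weight of class j.

```latex
% Target-class logit under each method
\text{SphereFace: } s\cos(m_1 \theta_{y_i}) \qquad
\text{ArcFace: } s\cos(\theta_{y_i} + m_2) \qquad
\text{CosFace: } s(\cos\theta_{y_i} - m_3)

% Combined form
L = -\frac{1}{N} \sum_{i=1}^{N}
  \log \frac{e^{s(\cos(m_1 \theta_{y_i} + m_2) - m_3)}}
            {e^{s(\cos(m_1 \theta_{y_i} + m_2) - m_3)}
             + \sum_{j=1,\, j \neq y_i}^{n} e^{s\cos\theta_j}}
```

Only the ArcFace margin is added directly to the angle itself, which is why it maps onto arc length on the unit hypersphere.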


The source code is released under the MIT license and implemented using MXNet.

You can also find reimplementations in TensorFlow, PyTorch and Caffe.

1-minute papers

Random deep learning papers summary.

