RetinaFace: A Face Detection Model for High Resolution Images
This is an introduction to「RetinaFace」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.
Overview
RetinaFace is a high-precision face detection model released in May 2019, developed by Imperial College London in collaboration with InsightFace, which is well known for its face recognition library.
The model computes the bounding boxes of faces as well as five facial keypoints (eyes, nose, and mouth corners). It runs on high-resolution images without resizing and performs detection hierarchically, which makes the detection of small faces within the image robust.
Architecture
RetinaFace enables the detection of small faces through hierarchical processing using a feature pyramid. It uses ResNet50 as its backbone, supplying feature vectors from multiple layers of ResNet50 to the detection stage.
Training was performed on the WIDER FACE dataset, extended with 5-point facial landmark annotations.
The input images are processed by subtracting the mean RGB values (104, 117, 123) from the 0–255 range before being supplied to the AI model. The output consists of three components: loc = (1, 16800, 4), conf = (1, 16800, 2), and landms = (1, 16800, 10). The value 16800 represents the number of anchors, which varies based on the input resolution.
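The preprocessing described above can be sketched as follows. This is a minimal illustration, not the SDK's own code: the function name `preprocess` is hypothetical, and the channels-first (1, 3, H, W) layout is an assumption that is not stated in the text. The mean values are applied in the channel order in which the image was loaded.

```python
import numpy as np

def preprocess(image):
    """Subtract the per-channel mean from an HxWx3 uint8 image.

    Returns a float32 array of shape (1, 3, H, W). The channels-first
    layout is an assumption, not something the article specifies.
    """
    mean = np.array([104, 117, 123], dtype=np.float32)
    x = image.astype(np.float32) - mean   # values stay in the 0-255 scale, only shifted
    x = x.transpose(2, 0, 1)              # HWC -> CHW (assumed layout)
    return x[np.newaxis]                  # add the batch dimension

# Usage: the image is not resized, so the output keeps the input resolution.
dummy = np.zeros((480, 640, 3), dtype=np.uint8)
batch = preprocess(dummy)                 # shape (1, 3, 480, 640)
```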
The shape of the PriorBox (a.k.a. anchor box), which serves as the anchor, is (16800, 4), storing the center points (cx, cy) and sizes (cw, ch). These values are uniquely determined by the input image size, which allows RetinaFace to run on images of any size and to process high-resolution images directly. The PriorBox is organized into three layers, containing 12800 + 3200 + 800 = 16800 anchors in total.
[[0.00195312 0.00347826 0.0078125 0.01391304]
[0.00195312 0.00347826 0.015625 0.02782609]
[0.00585938 0.00347826 0.0078125 0.01391304]
...
[0.9765625 0.98782609 0.25 0.44521739]
[0.9921875 0.98782609 0.125 0.2226087 ]
[0.9921875 0.98782609 0.25 0.44521739]]
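To make the PriorBox layout concrete, here is a minimal sketch of how such a table could be generated from the image size alone. The strides (8, 16, 32) and anchor sizes in pixels (16/32, 64/128, 256/512) are assumptions that do not appear in the text, and `make_priors` is a hypothetical name. With these assumed settings, a 640 x 640 input yields 12800 + 3200 + 800 = 16800 anchors, and a 2048 x 1150 input reproduces the rows printed above.

```python
import math
import numpy as np

# Assumed settings (not stated in the article): one stride and two anchor
# sizes in pixels for each of the three pyramid levels.
STRIDES = [8, 16, 32]
MIN_SIZES = [[16, 32], [64, 128], [256, 512]]

def make_priors(image_height, image_width):
    """Return an (N, 4) array of normalized (cx, cy, cw, ch) anchors."""
    rows = []
    for stride, sizes in zip(STRIDES, MIN_SIZES):
        feat_h = math.ceil(image_height / stride)
        feat_w = math.ceil(image_width / stride)
        for i in range(feat_h):
            for j in range(feat_w):
                for size in sizes:
                    cx = (j + 0.5) * stride / image_width    # cell center, normalized
                    cy = (i + 0.5) * stride / image_height
                    rows.append([cx, cy, size / image_width, size / image_height])
    return np.array(rows, dtype=np.float32)

priors_640 = make_priors(640, 640)   # 80*80*2 + 40*40*2 + 20*20*2 rows
```

Because the anchor count grows with the feature map area, doubling both image dimensions roughly quadruples the number of anchors, which is consistent with the processing time increasing with resolution.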
To calculate the coordinates of the bounding box, the model output loc, which contains x, y, w, h, is used. The x, y values of loc are multiplied by the variance of 0.1 and by the prior's cw, ch, then added to the prior's cx, cy. The prior's cw, ch are multiplied by the exponential of loc's w, h (scaled by the variance of 0.2). This gives the bounding box's cx, cy, w, h, which is then converted to corner coordinates (x1, y1, x2, y2). Thus, the bounding box is computed by adjusting the prior's coordinates with the model output loc.
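import numpy as np

def decode(loc, priors, variances):
    # Centers: prior center plus offset scaled by variance and prior size.
    # Sizes: prior size times the exponential of the scaled prediction.
    boxes = np.concatenate((
        priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
        priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])), axis=1)
    # Convert (cx, cy, w, h) to (x1, y1, x2, y2).
    boxes[:, :2] -= boxes[:, 2:] / 2
    boxes[:, 2:] += boxes[:, :2]
    return boxes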
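A small worked example may help to check the decoding arithmetic. The sketch below restates the same computation in a self-contained form and applies it to a single hand-made prior. The prior values, the default variances argument, and the final scaling to a 640 x 480 image are illustrative assumptions, not values from the model.

```python
import numpy as np

def decode(loc, priors, variances=(0.1, 0.2)):
    """Same arithmetic as above, returning (x1, y1, x2, y2) in normalized units."""
    centers = priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:]
    sizes = priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])
    return np.concatenate((centers - sizes / 2, centers + sizes / 2), axis=1)

# One hypothetical prior centered in the image: cx, cy, cw, ch.
priors = np.array([[0.5, 0.5, 0.2, 0.4]])

# A zero offset returns the prior itself in corner form: about [0.4, 0.3, 0.6, 0.7].
zero = decode(np.zeros((1, 4)), priors)

# An x offset of 1.0 moves the center by 1.0 * 0.1 * 0.2 = 0.02:
# about [0.42, 0.3, 0.62, 0.7].
shifted = decode(np.array([[1.0, 0.0, 0.0, 0.0]]), priors)

# Normalized corners can be scaled to pixels (assumed image of 640 x 480).
width, height = 640, 480
pixels = zero * np.array([width, height, width, height])
```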
To calculate the landmarks, the model output landms, which contains the x and y coordinates of each of the five points, is used and a similar computation logic is applied.
import numpy as np

def decode_landm(pre, priors, variances):
    # Each (x, y) pair is an offset from the prior center,
    # scaled by the variance and the prior size.
    landms = np.concatenate((
        priors[:, :2] + pre[:, :2] * variances[0] * priors[:, 2:],
        priors[:, :2] + pre[:, 2:4] * variances[0] * priors[:, 2:],
        priors[:, :2] + pre[:, 4:6] * variances[0] * priors[:, 2:],
        priors[:, :2] + pre[:, 6:8] * variances[0] * priors[:, 2:],
        priors[:, :2] + pre[:, 8:10] * variances[0] * priors[:, 2:],
    ), axis=1)
    return landms
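Since the five landmark points all follow the same formula, the computation can also be written with a reshape and broadcasting. This is an equivalent restatement offered as a sketch, with the hypothetical name `decode_landm_compact`. It is not the code used by the model's reference implementation.

```python
import numpy as np

def decode_landm_compact(pre, priors, variances=(0.1, 0.2)):
    """Decode (N, 10) landmark offsets into (N, 10) normalized coordinates."""
    points = pre.reshape(-1, 5, 2)     # five (x, y) offsets per anchor
    centers = priors[:, None, :2]      # (N, 1, 2), broadcast over the five points
    sizes = priors[:, None, 2:]        # (N, 1, 2)
    return (centers + points * variances[0] * sizes).reshape(-1, 10)

# With zero offsets, every landmark falls on the prior center.
priors = np.array([[0.5, 0.5, 0.2, 0.4]])
at_center = decode_landm_compact(np.zeros((1, 10)), priors)
```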
Finally, the bounding boxes are filtered based on conf.
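The filtering step can be sketched as below. Several details are assumptions rather than facts from the text: that column 1 of conf holds the face probability, the threshold values, and the greedy non-maximum suppression pass, which is a common follow-up step in detectors of this kind. The function name `filter_detections` is hypothetical.

```python
import numpy as np

def filter_detections(boxes, conf, score_threshold=0.5, iou_threshold=0.4):
    """Keep confident boxes, then drop overlapping duplicates (greedy NMS).

    boxes: (N, 4) array of (x1, y1, x2, y2); conf: (N, 2) array of scores.
    Assumes conf[:, 1] is the face score; thresholds are illustrative.
    """
    scores = conf[:, 1]
    keep = scores > score_threshold
    boxes, scores = boxes[keep], scores[keep]
    order = np.argsort(-scores)                   # highest score first
    boxes, scores = boxes[order], scores[order]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    suppressed = np.zeros(len(boxes), dtype=bool)
    selected = []
    for i in range(len(boxes)):
        if suppressed[i]:
            continue
        selected.append(i)
        rest = slice(i + 1, None)
        # Intersection of box i with every lower-scored box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        suppressed[rest] |= iou > iou_threshold
    return boxes[selected], scores[selected]
```

The landmarks would be filtered with the same indices in a full pipeline. They are omitted here to keep the sketch short.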
Precision
RetinaFace achieved an mAP (mean Average Precision) of 52.318 in face detection. Note that the backbone used for the numerical evaluation in the paper is ResNet-152, not ResNet50.
Application to face authentication
ArcFace is widely used as a face authentication algorithm but it does not define a method for face detection. By introducing RetinaFace as preprocessing and using aligned images for training and inference, it has been demonstrated that face authentication accuracy can be improved from 98.37% to 99.49%.
Usage
RetinaFace can be used with ailia SDK with the following command. By default, it uses ResNet50 as the backbone. Since RetinaFace uses input images without resizing, the processing time increases with the resolution of the image.
$ python3 retinaface.py --input input.jpg --savepath output.jpg
The backbone can be changed to MobileNet using the --arch option. This can speed up the processing of high-resolution images.
$ python3 retinaface.py --input input.jpg --savepath output.jpg --arch mobile0.25
ax Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.
ax Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.