How I made a mask detection algorithm

Galileo Parise

Published in

Analytics Vidhya

4 min read · Dec 11, 2020

Turning a simple classifier into an object detection model while practicing deep learning during quarantine

As the coronavirus spread, Italy entered its second lockdown in autumn, which gave me a lot of free time.

I wanted to build a system able to detect whether people are wearing masks, both in still images and in a live video stream. I mainly wanted to challenge myself with some computer vision tasks, but the system could easily be extended for real-life security purposes (e.g. shops, airports, streets).

The problem

What I just described is an object detection task, so ideally one would use a large dataset labeled not only with “with_mask”/“without_mask” but also with the position of each face. Following this idea, I ran into the first problem: I could not find any such dataset. All I found was this Kaggle dataset of roughly 7,600 images, 50% “with mask” and 50% “without mask”; another approach would have been to take a face dataset and artificially add masks to people’s faces.
Furthermore, real-time detection adds another difficulty: the model has to be fast enough to process images on the fly as the stream arrives. The ideal architecture would have been MobileNet-SSD or a similar end-to-end network, but the lack of large datasets annotated with both labels and positions ruled them out.

In this scenario, I had two possible solutions:

consider anchor boxes of different shapes and slide windows over the image at multiple scales, looking for masked/unmasked faces, then apply non-maximum suppression;
use a pre-existing face detection algorithm and then classify each box it returns.
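The last step of the first option, non-maximum suppression, is what turns many overlapping window hits into one box per face. The post does not show an implementation; the following is a minimal pure-Python sketch of the standard greedy variant, assuming boxes given as `(x1, y1, x2, y2)` corners and an IoU threshold of 0.5 (a hypothetical default).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, discard every remaining box
    that overlaps it by more than iou_threshold, and repeat.
    Returns the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For two windows firing on the same face and one on another face, only the higher-scoring window of the first pair survives.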

I chose the second option because it is much cheaper computationally, which was crucial for real-time detection: the first option could require classifying thousands of sub-images for every frame!
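To put a rough number on "thousands of sub-images", the count can be worked out directly. This is a back-of-the-envelope sketch with hypothetical settings (a 640×480 frame, a 16-pixel stride, square windows of 64, 128 and 256 pixels, no extra aspect ratios); none of these values come from the post.

```python
def count_windows(width, height, window, stride):
    """Number of positions a square window of side `window` takes when slid
    over a width x height image with the given stride (no padding)."""
    if window > width or window > height:
        return 0
    return ((width - window) // stride + 1) * ((height - window) // stride + 1)


def count_multiscale(width, height, windows, stride):
    """Total sub-images to classify when the sliding window is repeated at several sizes."""
    return sum(count_windows(width, height, w, stride) for w in windows)


# Hypothetical settings: a 640x480 frame, 16-pixel stride, three window sizes.
print(count_windows(640, 480, 64, 16))                 # 999
print(count_multiscale(640, 480, [64, 128, 256], 16))  # 2133
```

Every extra anchor shape multiplies this figure, whereas a face detector hands the classifier only one crop per detected face.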

Once I had the dataset, I chose MobileNetV2 to perform binary classification (“with mask” / “without mask”) on those sub-images; this model is quite accurate yet less complex than many other models: exactly the mix I was looking for!

In this way, I turned a simple classifier into an object detection model: the classifier could even have been a k-NN, a Random Forest, or another classical statistical model.
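The point that any classifier can be plugged in is easiest to see in a generic two-stage pipeline. This is a minimal sketch, not the author's code: `detect_and_classify` is a hypothetical helper, the detector is assumed to return `(x, y, w, h)` boxes (the convention OpenCV's cascade detector uses), and the image is a plain 2-D list of pixel rows (with a NumPy array you would slice `image[y:y + h, x:x + w]` instead).

```python
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def detect_and_classify(
    image: Sequence[Sequence[Any]],
    detect_faces: Callable[[Sequence[Sequence[Any]]], List[Box]],
    classify_crop: Callable[[List[List[Any]]], str],
) -> List[Tuple[Box, str]]:
    """Stage 1 proposes face boxes; stage 2 labels each cropped box independently."""
    results = []
    for (x, y, w, h) in detect_faces(image):
        # Cut the detected region out of the image (rows y..y+h, columns x..x+w).
        crop = [list(row[x:x + w]) for row in image[y:y + h]]
        results.append(((x, y, w, h), classify_crop(crop)))
    return results
```

Swapping MobileNetV2 for a k-NN or a Random Forest only changes the `classify_crop` argument; the detection stage stays the same.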

Model training

Keras provides MobileNetV2 weights pre-trained on ImageNet, so I added an average pooling layer and two dense layers (with dropout) on top of the base network to achieve my goal, obtaining the architecture depicted in the figure.

The architecture of the original MobileNetV2 with a few additional layers.
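The post does not list the exact layer sizes, so this is only a minimal Keras sketch of the architecture described above. It assumes a 224×224 RGB input, a frozen base, a 7×7 average pooling, a 128-unit hidden layer, a dropout rate of 0.5 and the Adam optimizer; all of these, and the name `build_mask_classifier`, are hypothetical choices.

```python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D, Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Model


def build_mask_classifier(weights="imagenet", input_shape=(224, 224, 3)):
    # MobileNetV2 without its ImageNet classification head.
    base = MobileNetV2(weights=weights, include_top=False,
                       input_tensor=Input(shape=input_shape))
    base.trainable = False  # keep the pre-trained feature extractor frozen

    # New head: average pooling plus two dense layers with dropout in between.
    x = AveragePooling2D(pool_size=(7, 7))(base.output)  # 7x7x1280 -> 1x1x1280
    x = Flatten()(x)
    x = Dense(128, activation="relu")(x)
    x = Dropout(0.5)(x)
    outputs = Dense(2, activation="softmax")(x)  # with_mask / without_mask

    model = Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Freezing the base means only the small head is trained at first, which suits a dataset of a few thousand images; unfreezing the top MobileNetV2 blocks later for fine-tuning is a common variation.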

To improve the model’s generalization, I added image augmentation with the ImageDataGenerator class provided by Keras, applied to both the training set (70% of the images) and the validation set (20%), using the following code:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# x_train, y_train (images and labels) and BS (batch size)
# are assumed to be defined earlier in the script.
aug = ImageDataGenerator(rotation_range=20,
                         zoom_range=0.15,
                         width_shift_range=0.2,
                         height_shift_range=0.2,
                         shear_range=0.15,
                         horizontal_flip=True,
                         fill_mode="nearest",
                         validation_split=0.2)

# data augmentation on training set
train_generator = aug.flow(x_train, y_train, batch_size=BS, subset="training")

# data augmentation on validation set
val_generator = aug.flow(x_train, y_train, batch_size=BS, subset="validation")

The model surprised me with very good results. Early stopping prevented the model from overfitting the training data while letting it train for as many epochs as it needed, and it achieved these results on the test set (10% of the images):

Metrics of the model evaluated on the test set

And this learning curve:

Accuracy and loss of the model during training
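The post does not give the early-stopping settings. A minimal sketch with Keras’s `EarlyStopping` callback, assuming the validation loss is monitored with a patience of 5 epochs and an upper bound of 100 epochs (hypothetical values), could look like this; `make_early_stopping` and `train` are illustrative names.

```python
from tensorflow.keras.callbacks import EarlyStopping


def make_early_stopping(patience=5):
    # Stop once the validation loss has not improved for `patience` epochs,
    # then roll back to the weights of the best epoch seen so far.
    return EarlyStopping(monitor="val_loss", patience=patience,
                         restore_best_weights=True)


def train(model, train_data, val_data, max_epochs=100):
    # max_epochs is only an upper bound; the callback decides when to stop.
    return model.fit(train_data, validation_data=val_data, epochs=max_epochs,
                     callbacks=[make_early_stopping()])
```

With the generators above, a call would be `history = train(model, train_generator, val_generator)`; `history.history` then holds the per-epoch accuracy and loss values that a learning curve like the one in the figure is drawn from.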

For the face detection part, I tried both OpenCV’s CascadeClassifier and the MTCNN library, and both gave good results; I went with the first, but the second would also have been a great choice. CascadeClassifier detects faces and their locations, so I could crop a portion of the original image around every face and pass it to MobileNetV2 as an independent image to be classified.

Flask web application

To make all of this more accessible, I built a simple web app with Flask, a handy Python framework, plus a little jQuery and Ajax magic.
It consists of two pages: on the home page you can try the real-time detection; on the second one you can upload your own image and run mask detection on it. Real-time streaming obviously needs access to your camera, so that the system can process the images frame by frame and draw a bounding box around every face (multiple detections are supported).
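For the upload page, one simple design is a Flask endpoint that receives the image posted by the Ajax call and returns the detections as JSON. This is a sketch under assumptions, not the author’s code: the `/detect` route, the `image` form field and the `analyze` callable (image bytes in, list of detections out) are all hypothetical.

```python
from flask import Flask, jsonify, request


def create_app(analyze):
    """`analyze` is a hypothetical callable: raw image bytes -> list of detections."""
    app = Flask(__name__)

    @app.route("/detect", methods=["POST"])
    def detect():
        file = request.files.get("image")  # form field sent by the front end
        if file is None:
            return jsonify(error="no image uploaded"), 400
        return jsonify(detections=analyze(file.read()))

    return app
```

Passing `analyze` in keeps the web layer independent of the model; in a real app it would decode the bytes (e.g. with `cv2.imdecode`), run face detection and classification, and return boxes and labels for the page to draw.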

The real-time processing part was not easy, but I followed a great resource: if you are interested in web applications, I suggest taking a look at Miguel Grinberg’s site.
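One streaming technique popularized by Miguel Grinberg serves frames as a `multipart/x-mixed-replace` response, which browsers render as a continuously updating image. The sketch below shows that pattern in Flask; the post does not say exactly how its real-time page is wired, so the `/video_feed` route and `frame_source` (a hypothetical callable returning an iterable of JPEG-encoded frames) are assumptions.

```python
from flask import Flask, Response


def mjpeg_stream(frames):
    """Wrap each JPEG-encoded frame (bytes) as one part of a multipart response."""
    for jpeg in frames:
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + jpeg + b"\r\n")


def create_stream_app(frame_source):
    app = Flask(__name__)

    @app.route("/video_feed")
    def video_feed():
        # Each new part replaces the previous image in the browser.
        return Response(mjpeg_stream(frame_source()),
                        mimetype="multipart/x-mixed-replace; boundary=frame")

    return app
```

In practice `frame_source` could read frames with `cv2.VideoCapture(0)`, annotate each one with the detector and classifier, and encode it with `cv2.imencode(".jpg", frame)`. Note that this server-side pattern reads the camera of the machine running Flask, which is the user’s own camera only when the app runs locally.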

This is how I developed a real-time face mask detector: if you want a deeper look at the code, here is the link to the GitHub repository.

Enjoy it and wear a mask!
