Building a COVID-19 mask detector with OpenCV, Keras and TensorFlow

Fernando Contreras
Aug 2, 2020 · 9 min read

During the COVID-19 quarantine I decided, just for fun, to build my own mask detector: a model able to tell whether a person in an image or video is wearing a mask.

Like every Machine Learning project, the very first step is to collect the necessary data. We’re trying to build a mask detector model that outputs “mask” or “no mask” given an input face image, so we need to collect images of people wearing and not wearing a mask.

Collecting the data

I simply asked all my friends to send me one selfie where they were wearing a mask and another where they were not. I was able to collect around 200 images, which is very little for training an accurate Machine Learning model; however, the results were quite acceptable.

Structuring the solution

To build a mask detector let’s first split the problem into 2 main steps:

1. Given an input image, we need to detect the faces in it. This task is called “Object Detection” in the world of Computer Vision: detecting the positions and types of objects in images, as in the example below:

In our problem we need to detect only faces and output their bounding boxes delimiting their positions, so we can pass them to the next step:

2. Given one or more face images, we need to classify them into “mask” or “no mask”. In the Machine Learning vocabulary this is called “binary classification”, where we need to classify some input data into 2 possible classes (in this case [“mask”, “no mask”]).
Our input data will be the RGB image representation of human faces.

So, given the 2 steps mentioned above, we’re building a processing pipeline: the first step takes an input image and outputs the bounding boxes of the human faces found in that image, and the second step takes the cropped face images delimited by those bounding boxes and classifies them into “mask” or “nomask”.
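The two steps can be sketched as a small pipeline. This is only an illustration: `detect_faces` and `classify_face` are hypothetical stand-ins for the face detector and the mask classifier discussed in this article, and the fake implementations below exist only so the sketch runs.

```python
import numpy as np

LABELS = ["mask", "nomask"]

def run_pipeline(image, detect_faces, classify_face):
    """Step 1: find face boxes. Step 2: classify each cropped face."""
    results = []
    for (x, y, w, h) in detect_faces(image):
        face = image[y:y + h, x:x + w]  # crop rows (y) first, then columns (x)
        results.append(((x, y, w, h), classify_face(face)))
    return results

# Hypothetical stand-ins so the sketch is runnable
def fake_detector(image):
    return [(2, 1, 4, 3)]  # one box: x=2, y=1, width=4, height=3

def fake_classifier(face):
    # Pretend that bright crops are masked faces
    return LABELS[0] if face.mean() > 127 else LABELS[1]

image = np.full((8, 8, 3), 200, dtype=np.uint8)
predictions = run_pipeline(image, fake_detector, fake_classifier)
print(predictions)
```

Keeping the two steps behind plain functions means either one can be swapped (a different face detector, a retrained classifier) without touching the other.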

Let’s start by talking about the second step: “The classification problem” as it is the main focus of this article.

Key Concepts

Transfer Learning: Transfer learning is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks. (Source: Wikipedia)

Data Augmentation: Data augmentation is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data. Data augmentation techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks.

Preprocessing Face Images

To build the mask detector model, which takes faces as input and detects masks, I needed to crop the faces from the collected images. I’m not pasting the code here to keep this article from getting too long, but you can find it here: labelling_images.ipynb

Training the model

Let’s code the model training step by step. I used Python and Jupyter notebooks in a Google Colab environment with a GPU, but you can run the code in whatever Python environment you prefer.

If you prefer to see the entire code directly, the training notebook can be found here in my GitHub repository:

Let’s import some dependencies:

I’m using Google Drive to store the training images, but feel free to use your local machine if you want to run the code locally.
Let’s mount the Google Drive storage into the notebook, set base_path to the path of the collected images, and get a directory object with pathlib as follows:

In my Google Drive the images are stored with the following folder structure:
A folder named “mask” containing all the images with masks
A folder named “nomask” containing all the images without masks

Let’s check that we are loading the images from the right path:

This should print: [‘nomask’ ‘mask’]
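As a rough sketch of that check, assuming the folder layout described above: the snippet below builds a throwaway directory with the two class folders (a stand-in for the real Drive path, which is not reproduced here) and lists the sub-folder names with pathlib. The file system does not guarantee any order, so the names are sorted here.

```python
import pathlib
import tempfile

# Temporary directory standing in for the real image folder (assumption)
base_path = pathlib.Path(tempfile.mkdtemp())
for name in ("mask", "nomask"):
    (base_path / name).mkdir()

# Each sub-folder name is used as a class label
class_names = sorted(p.name for p in base_path.iterdir() if p.is_dir())
print(class_names)
```

If a folder is misspelled or the path is wrong, the list will not contain exactly the two expected class names, which is a cheap thing to verify before training.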

Let’s define some constants and create the TensorFlow image generator that will load the images and feed them to the model training process:

Let’s show the images being loaded by the training data generator:

Let’s define some auxiliary functions that will be useful for data preprocessing:
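The original helpers are not reproduced here. As a minimal sketch of what such preprocessing typically looks like, the two hypothetical functions below scale pixel values to the [-1, 1] range (the range the Keras MobileNetV2 preprocessing uses) and one-hot encode the labels. The function names and the label order are assumptions made for illustration.

```python
import numpy as np

CLASS_NAMES = ["mask", "nomask"]  # assumed label order

def scale_pixels(images):
    """Map uint8 pixels from [0, 255] to floats in [-1, 1]."""
    return images.astype(np.float32) / 127.5 - 1.0

def one_hot(labels, class_names=CLASS_NAMES):
    """Turn label strings into one-hot rows, one column per class."""
    indices = np.array([class_names.index(label) for label in labels])
    return np.eye(len(class_names), dtype=np.float32)[indices]

images = np.array([[[[0, 127, 255]]]], dtype=np.uint8)  # one 1x1 RGB image
scaled = scale_pixels(images)
encoded = one_hot(["nomask", "mask"])
print(scaled.min(), scaled.max())
print(encoded)
```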

Let’s now load all the images from the storage, including the test images that we will use to evaluate the model:

Let’s now apply the preprocessing functions defined before to the loaded images:

Let’s define an ImageDataGenerator, a generator class that performs Data Augmentation over the loaded images. It applies operations such as zoom, rotation and horizontal flip, which lets us train the model over a wider distribution of data.

Data augmentation is useful when we don’t have enough training data or when the model is overfitting the training dataset. We’re training a model to predict whether people are wearing masks, so our training data can be augmented by applying transformations to every face picture. These transformations don’t change the resulting class (“mask”, “nomask”), so let’s go for it.
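To make the idea concrete, here is a minimal NumPy sketch of one such transformation, a horizontal flip. It is not the ImageDataGenerator itself, only an illustration that the pixels change while the label stays the same; the tiny 1x3 "image" is made up.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror an image of shape (height, width, channels) left to right."""
    return image[:, ::-1, :]

# Tiny 1x3 "image": three pixels with distinct colours
face = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
label = "mask"

flipped = horizontal_flip(face)
augmented_sample = (flipped, label)  # the label is carried over unchanged

print(flipped[0, 0], augmented_sample[1])
```

A mirrored face with a mask is still a face with a mask, which is why this kind of transformation is safe for this problem.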

Let’s generate a batch of images containing all the collected/training and validation images and fit the image generator with them.

Now we can start building our Deep Learning model.
As mentioned before, we’re using Transfer Learning: we use a pre-trained model as part of our final model. This lets us take advantage of the parameters learned by a general-purpose computer vision model and adapt them to our requirements.

In the code block below we’re loading MobileNetV2. You can find the research paper here if you want more details about the network architecture:

We’re setting the model property “trainable” to False because we don’t want to retrain that model.

So now that we have the base model let’s complete it by adding some layers that we’ll need for our prediction outputs:

You should see the model summary as follows:

We are adding a Global Average Pooling layer and a Dense layer with a “softmax” activation function to the base model’s output. You can find more details about these layers in the official Keras documentation:

The model’s output is “(None, 2)”, where “None” represents the batch size, which may vary, and “2” is the size of the softmax layer, corresponding to the number of classes. A softmax layer outputs a probability distribution over the possible output classes.
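The shapes involved can be followed with a small NumPy sketch. This is not the Keras model: the feature maps are random numbers standing in for the base model’s output, the dense weights are random and untrained, and the sizes (4 images, 7x7 maps, 1280 channels) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    """Row-wise softmax; subtracting the max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

batch_size, height, width, channels, num_classes = 4, 7, 7, 1280, 2

# Stand-in for the base model's output feature maps (random values)
features = rng.normal(size=(batch_size, height, width, channels))

# Global average pooling: average over the spatial axes -> (batch, channels)
pooled = features.mean(axis=(1, 2))

# Dense layer with random, untrained weights -> (batch, num_classes)
weights = rng.normal(size=(channels, num_classes)) * 0.01
bias = np.zeros(num_classes)
probabilities = softmax(pooled @ weights + bias)

print(pooled.shape, probabilities.shape)
print(probabilities.sum(axis=1))
```

Each row of `probabilities` has one value per class and sums to 1, which is what “a probability distribution over the possible output classes” means in practice.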

Now that we have the model, let’s proceed to the training. Let’s iterate through the training images 50 times to generate the images that we will then transform with our Data Augmentation generator:
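The effect of going through the training images several times can be sketched without Keras. In this hypothetical version a random horizontal flip stands in for the real augmentation generator, and four tiny random images stand in for the training set; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_flip(image, rng):
    """Flip the image horizontally with probability 0.5."""
    return image[:, ::-1, :] if rng.random() < 0.5 else image

def generate_augmented(images, labels, passes, rng):
    """Go through the dataset `passes` times, augmenting every image each time."""
    out_images, out_labels = [], []
    for _ in range(passes):
        for image, label in zip(images, labels):
            out_images.append(random_flip(image, rng))
            out_labels.append(label)
    return np.stack(out_images), np.array(out_labels)

# Four tiny stand-in images instead of the real training set
images = rng.integers(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)
labels = np.array([0, 1, 0, 1])

aug_images, aug_labels = generate_augmented(images, labels, passes=50, rng=rng)
print(aug_images.shape, aug_labels.shape)
```

Every pass yields a differently transformed copy of each image, so 50 passes over N images give the model 50 x N training samples.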

Now we can compile and fit our model:
Here we’re using “Adam” as the optimisation algorithm. We train the model for 10 epochs with early stopping, which halts training if the accuracy doesn’t improve for 6 consecutive epochs.
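Keras ships an EarlyStopping callback for this. The plain-Python sketch below only illustrates the patience logic behind it; the function name and the accuracy values are made up for the example.

```python
def train_with_early_stopping(accuracy_per_epoch, patience):
    """Return the number of epochs run before training stops.

    Training stops once the accuracy has not improved for `patience`
    consecutive epochs, or when the epochs run out.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, accuracy in enumerate(accuracy_per_epoch, start=1):
        if accuracy > best:
            best = accuracy
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(accuracy_per_epoch)

# Made-up accuracy curve: improves for 3 epochs, then plateaus
history = [0.80, 0.90, 0.95, 0.95, 0.94, 0.95, 0.95, 0.93, 0.95, 0.95]
stopped_at = train_with_early_stopping(history, patience=6)
print(stopped_at)
```

With a patience of 6 and only 10 epochs, training stops early only if the accuracy plateaus within the first few epochs; otherwise all 10 epochs run.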

At the end of the training I had the following results:
Epoch 10/10 98/97 [==============================] — 46s 466ms/step — loss: 0.0018 — accuracy: 1.0000 — val_loss: 6.2493e-04 — val_accuracy: 1.0000

An accuracy of 1.00 means the model classified the test dataset with 100% accuracy. That doesn’t mean the model performs equally well on every dataset. Remember that for this experiment I’m using only around 200 images for training, which isn’t enough for most Machine Learning problems.
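For reference, accuracy is simply the fraction of images whose most probable class matches the true label. A minimal sketch with made-up softmax outputs (the numbers are not from the trained model):

```python
import numpy as np

def accuracy(probabilities, true_labels):
    """Fraction of samples whose most probable class matches the true label."""
    predicted = probabilities.argmax(axis=1)
    return float((predicted == true_labels).mean())

# Made-up softmax outputs for four faces; one column per class
probabilities = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
true_labels = np.array([0, 1, 0, 1])

print(accuracy(probabilities, true_labels))
print(accuracy(probabilities, np.array([0, 1, 1, 1])))
```

On a small evaluation set a single misclassified image moves the score a lot, so a perfect score on a few images should be read with caution.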

To build a model that performs well and generalizes correctly in most cases we would probably need thousands of images. However, for a proof of concept it’s more than enough, and the model seems to work very well on the several people I tested it on in real time using the webcam video stream.

Now that we have the trained model let’s save it and see how we can use it:

Now you can load the saved model from another Python program like this:

Now that we have a mask detector model, we need the first part of our pipeline: a face detector. Object detection is one of the main tasks of Computer Vision, and you can find a lot of pretrained object detection models out there, some of them covering several thousand different classes. Here I used MTCNN, which stands for “Multi-task Cascaded Convolutional Networks”. You can find the GitHub repository here:

Let’s import MTCNN and create an instance of the face detector:

Let’s load an image with OpenCV for testing:

Let’s run the face detector on the image:

[{
    'box': [463, 187, 357, 449],
    'confidence': 0.9995754361152649,
    'keypoints': {
        'left_eye': (589, 346),
        'right_eye': (750, 357),
        'nose': (678, 442),
        'mouth_left': (597, 525),
        'mouth_right': (733, 537)
    }
}]

We can see that the face is detected and that we have all the relevant information, such as the bounding box and the positions of the points of interest. In this case we only need the bounding box, which lets us crop the part of the image delimiting the face.
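Cropping with the bounding box can be sketched in a few lines of NumPy. This assumes the box is given as [x, y, width, height], as in the detector output above; `crop_face` is a hypothetical helper (not the auxiliary function from the repository) and the blank image is a stand-in for a real photo.

```python
import numpy as np

def crop_face(image, box):
    """Crop a face from an image given a box as [x, y, width, height]."""
    x, y, width, height = box
    # Clamp to the image borders, since a detector may return
    # coordinates slightly outside the image
    x1, y1 = max(x, 0), max(y, 0)
    x2 = min(x + width, image.shape[1])
    y2 = min(y + height, image.shape[0])
    return image[y1:y2, x1:x2]

# Blank stand-in image (height 1000, width 1200) and the box printed above
image = np.zeros((1000, 1200, 3), dtype=np.uint8)
face = crop_face(image, [463, 187, 357, 449])
print(face.shape)
```

Note that NumPy images are indexed as [row, column], so the y range comes first and the x range second.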

And now let’s see how the model performs with the French president:
I’m using some auxiliary functions for cropping and drawing faces that you can find in

And voilà: the model says the French president Macron is wearing a mask.

You can try it yourself with your own images. You can find the whole code and the trained model saved in GitHub:

You can run it in real time using OpenCV and your webcam. For details on how to run the program, please find instructions in

That’s all for this tutorial. Remember this is just an experiment; it is not intended to be used in a real-life environment because of its limitations. One important limitation is that the face detector in many cases fails to detect masked faces, which breaks the first step of the pipeline, so the whole detector fails to work as intended.

I hope you enjoyed reading.

Keep calm and wear a mask to help stop #covid19

Written by Fernando Contreras, Software Engineer & Machine Learning Engineer.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem
