Face Mask Detection using Google Colab

Shibam Banerjee
Published in Analytics Vidhya · Jun 13, 2020


Bird’s-eye view of the project:

  1. Business Problem: Description of the problem from a business perspective.
  2. Problem Statement: Description of the problem from a Machine Learning perspective.
  3. Source of the Data.
  4. First Cut Approach: My take on the Problem.
  5. Conclusion.
  6. References.

Business Problem:

In this new era, where we are experiencing a pandemic and people everywhere are advised to wear masks, some people are not used to it and avoid wearing them. The motivation behind this project is that AI could be used to detect whether people in public places are wearing masks or not; if deployed correctly, such a mask detector could help improve everyone's safety.

Also, this is a depressing period to live through, with so much happening in the world, so I decided to make something out of it: convert a real-world problem, in which we humans have to wear masks to go out, into a Machine Learning problem.

Problem Statement:

The task here is to predict whether people are wearing masks or not, given an image or a video. It is an object detection and classification problem with 2 classes (Mask and Without Mask).

There are numerous methods for object detection, but for this project I have decided to use YOLO v3, as it is simple, fast, and one of the most accurate methods out there.

Source of the Data:

There are multiple sources of data that can be used for this problem. You can either download images from the Internet and annotate them yourself, or you can simply use a prepared dataset that is ready for training.

I have used an existing dataset that is available on Kaggle.

Link: https://www.kaggle.com/alexandralorenzo/maskdetection

After collecting the data, place all the images and their annotation files (in YOLO format) together in a folder named ‘obj’.

Each annotation file must contain, for every object, the class id and the box co-ordinates in YOLO format: <class> <x_center> <y_center> <width> <height>, where the co-ordinates are relative to the image dimensions (values between 0 and 1).
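For illustration, a single annotation line for an image with one masked face might look like this (class id 0 for Mask; the numbers are made up — x and y are the box centre, w and h the box size, all divided by the image width/height):

    0 0.512 0.438 0.210 0.305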

YOLO v3:

(Image credits: https://pjreddie.com/darknet/yolo/)

YOLO v3 is an object detection network, part of the YOLO family (after YOLO v1 and YOLO v2). It is a fairly simple object detection approach with very good performance. It takes an image as input, passes it through the neural network, and outputs a vector of bounding boxes and class predictions. YOLO v3 uses Darknet-53 for feature extraction, followed by additional convolutional layers for detection. Darknet-53 is a 53-layer convolutional neural network trained on ImageNet, mainly composed of 3×3 and 1×1 filters with skip connections, like the residual connections in ResNet.

For more information on the input format, output format, or the architecture, please go through this link.

First Cut Approach:

NOTE: If you are using Colab, link your Google Drive, as it makes it easy to store the backup weights and manage all the necessary files.
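Mounting Drive is a single cell at the start of the notebook (this is the standard Colab API):

    # Mount Google Drive so backup weights and config files persist across runtime resets
    from google.colab import drive
    drive.mount('/content/drive')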

  1. Since I have used YOLO v3, the first thing to do is clone the Darknet Github repository, as we will be using it to train our model (see the commands just after this list).
  2. Download the YOLO v3 config file and edit it based on our dataset: set ‘classes’ to 2 and ‘random’ to 0 in each of the Yolo blocks, and change ‘filters’ in the Convolution block just above each Yolo block to (Number of classes + 5)*3, which in our case is (2 + 5)*3 = 21.
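A rough sketch of the Colab cells for step 1 (I am assuming the widely used AlexeyAB fork of Darknet here; the original pjreddie repository works similarly, and the Makefile flags below assume a GPU runtime):

    # Clone and build Darknet with GPU, cuDNN and OpenCV support
    !git clone https://github.com/AlexeyAB/darknet
    %cd darknet
    !sed -i 's/GPU=0/GPU=1/; s/CUDNN=0/CUDNN=1/; s/OPENCV=0/OPENCV=1/' Makefile
    !make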

3. Also change the ‘max_batches’ value to 2000*Number of classes, which in our case is 4000, and set ‘steps’ to 80% and 90% of the ‘max_batches’ value (3200 and 3600). You can get a copy of the config file in my Github.
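The config edits from steps 2 and 3 can also be scripted. This is only a sketch: it assumes the default values in the stock yolov3.cfg, and the output file name yolov3_mask.cfg is my own choice, so verify the resulting file before training:

    import re

    # Adapt the stock yolov3.cfg for a 2-class detector
    cfg = open('cfg/yolov3.cfg').read()
    cfg = re.sub(r'classes\s*=\s*\d+', 'classes=2', cfg)              # in each [yolo] block
    cfg = re.sub(r'filters\s*=\s*255', 'filters=21', cfg)             # (2 + 5)*3, conv block above each [yolo] block
    cfg = re.sub(r'random\s*=\s*\d+', 'random=0', cfg)                # disable multi-scale training
    cfg = re.sub(r'max_batches\s*=\s*\d+', 'max_batches=4000', cfg)   # 2000 * number of classes
    cfg = re.sub(r'steps\s*=\s*[\d,]+', 'steps=3200,3600', cfg)       # 80% and 90% of max_batches
    open('cfg/yolov3_mask.cfg', 'w').write(cfg)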

4. Create a file named ‘obj.names’ that will contain the label names of the data, one per line. In our case we have two labels: Mask and Without Mask.
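For reference, obj.names is just the two labels, one per line; their order fixes the numeric class ids used in the annotations (Mask becomes 0, Without Mask becomes 1):

    Mask
    Without Mask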

5. Create another file named ‘obj.data’ that will contain the dataset configuration shown below. The ‘train’ and ‘valid’ entries point to the files containing the locations of the training and validation images.
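The standard Darknet layout of obj.data looks roughly like this (the paths are assumptions and depend on where you place the files; pointing ‘backup’ at a folder on your mounted Drive keeps the weight snapshots safe across Colab disconnects):

    classes = 2
    train = data/train.txt
    valid = data/valid.txt
    names = data/obj.names
    backup = backup/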

6. Create 2 txt files named ‘train.txt’ and ‘valid.txt’ based on your data; each must list the location of one image per line.

A short script can be used to create these txt files.
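Here is a minimal sketch of such a script (the folder path and the 90/10 split are assumptions; adjust them to your setup):

    import glob, random

    # Collect all image paths from the obj folder and write one path per line,
    # split 90/10 into training and validation lists
    images = glob.glob('data/obj/*.jpg')
    random.shuffle(images)
    split = int(0.9 * len(images))

    with open('data/train.txt', 'w') as f:
        f.write('\n'.join(images[:split]))
    with open('data/valid.txt', 'w') as f:
        f.write('\n'.join(images[split:]))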

7. After that, place the files and the data in the Darknet directory and start the training. For more information on the exact locations and commands, please visit my Github.
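Training is launched from the Darknet directory with the usual Darknet command, sketched here with my assumed file names; darknet53.conv.74 is the ImageNet-pretrained backbone, and -dont_show (available in the AlexeyAB fork) suppresses the OpenCV window that Colab cannot display:

    # Download the pretrained Darknet-53 backbone weights and start training
    !wget https://pjreddie.com/media/files/darknet53.conv.74
    !./darknet detector train data/obj.data cfg/yolov3_mask.cfg darknet53.conv.74 -dont_show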

8. Perform the training for 1000+ epochs. A loss value below 1 generally means the model will perform quite well.

9. Once you have achieved a low loss value, test the model on some images or videos.
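For example, a single test image can be run through the trained detector like this (the weight file name under backup/ is an assumption; Darknet names the snapshots after the cfg file):

    # Run detection on one image; the annotated result is saved as predictions.jpg
    !./darknet detector test data/obj.data cfg/yolov3_mask.cfg backup/yolov3_mask_last.weights test.jpg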

Results:

Example output

You can try it on various Images.

Epoch vs Loss Plot

We can see that the loss falls drastically after about 300 epochs. I trained for more than 1000 epochs, but the plot only goes up to about 700 epochs, as I had to retrain using my backup weights.

Conclusion:

  1. The model did fairly well with such a small dataset and only 1000 epochs of training.
  2. The model can be improved by using a larger dataset with more variation in the types of images. The dataset used for training in this implementation is very simple, and most of the images contain a single person. In a production scenario, a large dataset with varied images is a must.

Hope you have enjoyed this article!

References:

  1. The Latest Research Paper on Masked Face Recognition Dataset and Application
  2. YOLO v3: Darknet (https://pjreddie.com/darknet/yolo/)
  3. YOLO v3 using Colab

My Github: https://github.com/shibam-banerjee/Face_Mask_Detection

LinkedIn: https://www.linkedin.com/in/shibam-banerjee-b59727169/
