Mask Detection Using Deep Learning

Harsh Sharma
Oct 16 · 8 min read

Please Wear a Mask!

Hello readers! Like my previous article, this one is also related to our current dire situation of COVID-19. As the title indicates, I will explain how you can build a mask detection system on a video feed using Deep Learning. You will be able to detect whether someone is wearing a mask or not, and can use that result to generate a trigger.

This system can be used in a workplace to monitor whether employees are wearing masks. It can also be used in shopping malls, stations, etc., to make announcements from time to time pointing out people not following the mask rule.


Our final product of detecting masks in a frame involves two major steps: one, detecting the faces in the frame, and two, classifying each detected face as wearing a mask or not.
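A minimal sketch of that two-step flow is below. `detect_faces` and `classify_mask` are hypothetical placeholders standing in for the RetinaFace detector and the ResNet classifier discussed in this series; the bodies here are dummies so the sketch runs end to end.

```python
# Two-step mask-detection pipeline (sketch).
# detect_faces and classify_mask are hypothetical stand-ins for the
# RetinaFace detector and the ResNet mask/no-mask classifier.

def detect_faces(frame):
    """Return a list of (x1, y1, x2, y2) face boxes found in the frame."""
    # Placeholder: a real system would run RetinaFace here.
    return [(10, 10, 60, 60), (100, 20, 160, 90)]

def classify_mask(frame, box):
    """Return True if the face crop inside `box` appears to wear a mask."""
    # Placeholder: a real system would run a ResNet classifier on the crop.
    x1, y1, x2, y2 = box
    return x1 < 50  # dummy rule so the sketch is runnable

def process_frame(frame):
    """Detect faces, classify each one, and collect violations for a trigger."""
    violations = []
    for box in detect_faces(frame):
        if not classify_mask(frame, box):
            violations.append(box)  # e.g. raise an announcement/alert here
    return violations

print(process_frame(frame=None))  # → [(100, 20, 160, 90)]
```

Everything downstream (announcements, alerts) hangs off the `violations` list, which is why the two models can be developed and swapped independently.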

For face detection we will use an architecture called RetinaFace, a state-of-the-art model for detecting faces in a picture, and to classify each face as mask or no mask we will use a ResNet architecture. I believe that if you know what you are using, you will be more comfortable using it. So, I will explain these two architectures first, then discuss their implementation and provide the code.

Here I'll explain the RetinaFace architecture; in my next article, I will explain the ResNet architecture and discuss in detail how to combine and implement these two models using Python.

RetinaFace

Architecture

[Figure: the RetinaFace architecture, taken from the paper]

From here on I'll assume that you know about FPNs (Feature Pyramid Networks). RetinaFace uses the outputs of different layers of a ResNet, which have different receptive fields, making it possible to detect faces of different sizes. Instead of using those outputs directly to locate and adjust the boxes around faces, the authors add one more layer of computation on top of each output, which they call the Context Module.
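To make the multi-scale idea concrete, here is a tiny sketch. The strides, base size and scales below are illustrative assumptions, not the paper's exact values; the point is that deeper pyramid levels (larger stride, larger receptive field) carry larger anchors and so catch larger faces.

```python
# Sketch: anchor sizes per feature-pyramid level. Numbers are illustrative
# assumptions; the pattern (deeper level -> bigger anchors) is the point.

def anchor_sizes(strides=(8, 16, 32), scales=(1.0, 1.26)):
    """Map each pyramid stride to its anchor side lengths in pixels."""
    base = 4  # base anchor = 4x the stride (an assumption for illustration)
    return {s: [round(base * s * k, 1) for k in scales] for s in strides}

print(anchor_sizes())
# stride 8 covers small faces, stride 32 covers large ones
```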

Context Module
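The paper does not spell the module out in code, so the following is a rough sketch on my part, modeled on the SSH detector's context module that RetinaFace builds on: parallel convolution branches with growing effective receptive fields (3×3, 5×5, 7×7 via stacked 3×3 convs), concatenated so the channel count is preserved.

```python
import torch
import torch.nn as nn

class ContextModule(nn.Module):
    """SSH-style context module (simplified sketch, not the exact layer):
    three parallel branches with effective receptive fields of 3x3, 5x5
    and 7x7, concatenated back to the input channel count."""
    def __init__(self, in_ch=256):
        super().__init__()
        half, quarter = in_ch // 2, in_ch // 4
        self.branch3 = nn.Conv2d(in_ch, half, 3, padding=1)       # 3x3
        self.mid = nn.Conv2d(in_ch, quarter, 3, padding=1)
        self.branch5 = nn.Conv2d(quarter, quarter, 3, padding=1)  # two 3x3 ~ 5x5
        self.branch7 = nn.Sequential(                             # three 3x3 ~ 7x7
            nn.Conv2d(quarter, quarter, 3, padding=1),
            nn.Conv2d(quarter, quarter, 3, padding=1),
        )

    def forward(self, x):
        m = torch.relu(self.mid(x))
        out = torch.cat([self.branch3(x), self.branch5(m), self.branch7(m)], dim=1)
        return torch.relu(out)  # half + quarter + quarter = in_ch channels

x = torch.randn(1, 256, 20, 20)
print(ContextModule()(x).shape)  # same shape in, same shape out
```

Because the output shape matches the input, the module can be dropped on top of every pyramid level without changing the rest of the head.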

One more important component of the architecture is the mesh decoder. This part is too complex to dive deep into here, but I will give an overview of what it ultimately does.

Mesh Decoder

This part of the architecture, together with its loss function, incorporates information about the 3D structure of a face, which is very important since we want the model to locate faces accurately in a given image.

Now that we have an understanding of the architecture, we will look into the loss function, which is arguably the second most important part of any neural network.

Loss Function

Fig. 1 — the combined multi-task loss:

L = Lcls(pi, p*i) + λ1 · p*i · Lbox(ti, t*i) + λ2 · p*i · Lpts(li, l*i) + λ3 · p*i · Lpixel
  1. Classification loss: the face classification loss Lcls(pi, p*i), where pi is the predicted probability of anchor i being a face, and p*i is 1 for a positive anchor and 0 for a negative anchor. Lcls is the softmax loss over the two classes (face / not face).
  2. Box regression loss: Lbox(ti, t*i), where ti = {tx, ty, tw, th}i and t*i = {t*x, t*y, t*w, t*h}i represent the coordinates of the predicted box and the ground-truth box associated with a positive anchor, respectively. It is a smooth-L1 loss.
  3. Facial landmark regression loss: along with the location and shape of the boxes around faces, the model also predicts 5 facial landmarks (left eye, right eye, left mouth corner, right mouth corner and nose). These predicted landmarks are compared with the annotated landmarks for each face using a smooth-L1 loss (Lpts).
  4. Dense regression loss: the loss from the mesh decoder discussed above, which incorporates 3D information about the face by taking the pixel-wise difference of the rendered output. The actual function to calculate this loss is:
[Equation: the dense regression loss Lpixel — the pixel-wise L1 difference between the rendered face and the original image, taken from the paper]

In Fig. 1, which shows the combined loss function, we can see some lambda (λ) parameters. These are called loss-balancing parameters, and they control how much each loss contributes to the total.
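Here is a minimal NumPy sketch of how the four terms combine for one anchor. The individual losses are simplified stand-ins, and the λ defaults below are the values I recall the paper reporting (0.25, 0.1, 0.01) — treat them as illustrative, not authoritative.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1: quadratic near zero, linear for large errors."""
    d = np.abs(pred - target)
    return np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta).mean()

def softmax_ce(logits, label):
    """Binary softmax cross-entropy for one anchor (face / not face)."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def total_loss(cls_logits, label, t, t_gt, pts, pts_gt, pix, pix_gt,
               lam1=0.25, lam2=0.1, lam3=0.01):
    """Combined RetinaFace-style loss for one anchor (sketch).
    Box, landmark and dense terms only apply to positive anchors."""
    l_cls = softmax_ce(cls_logits, label)
    pos = float(label == 1)                  # p*i gates the other terms
    l_box = smooth_l1(t, t_gt)
    l_pts = smooth_l1(pts, pts_gt)
    l_pix = np.abs(pix - pix_gt).mean()      # pixel-wise L1 on the render
    return l_cls + pos * (lam1 * l_box + lam2 * l_pts + lam3 * l_pix)
```

Note how a negative anchor (label 0) contributes only the classification term — the regression targets are undefined when there is no face.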

So now we have all the pieces to combine. We have an architecture that produces predictions, which are compared against the true labels for an image using the loss function explained above. This loss is then optimized using a variant of SGD to train the whole network, which finally predicts a box around each face along with 5 landmarks. You can see the results of RetinaFace in the image below.
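As a sketch of that optimization step — using plain SGD with momentum as a stand-in for the paper's exact optimizer, and a toy linear model in place of the real network:

```python
import torch

# One optimization step (sketch): predictions -> loss -> SGD-with-momentum
# update. The linear layer is a toy stand-in for the detector.
model = torch.nn.Linear(16, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()   # backpropagate the (combined) loss
opt.step()        # update the weights
print(float(loss))
```

In the real system the `cross_entropy` call would be replaced by the combined multi-task loss from Fig. 1.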

[Image: RetinaFace detection results — bounding boxes and 5 landmarks on faces]

Summary

One more loss, calculated and added to the final loss, is the dense regression loss, which helps include the shape and texture information of the face in the model. All these losses are combined using the loss-balancing parameters, then the error is backpropagated and the weights are trained.

Combining all the losses ensures that different kinds of information are fed into the model to learn from as it adjusts its parameters.

We now understand how RetinaFace works. To do the mask detection, we will use RetinaFace to extract all the faces, then use a ResNet architecture to classify each detected face into two classes, i.e. mask/no_mask. In my next article I will explain a bit about the ResNet architecture, which we use almost everywhere in computer vision tasks, and then show how to implement all of this in code for an end-to-end system.

Till then, STAY SAFE! WEAR MASKS!

Additional Links

  1. How to Track People Using Deep Learning


Written by Harsh Sharma, in Analytics Vidhya — a community of Analytics and Data Science professionals building the next-gen data science ecosystem. https://www.analyticsvidhya.com
