Face Mask Detection Using a Convolutional Neural Network

Vignesh · Team Recon Subsea · Jun 15, 2020

After spending a frantic 60 days at home, I was more anxious than ever to step out and get some essentials from the nearby shop once the government announced a few relaxations. I grabbed my mask and shopping bag and headed straight to the shop, only to encounter a nightmare: it was crowded just like in the pre-pandemic days, and I threw a fit when I saw that a few people had summoned the courage to walk in without a mask. The scene at the billing counter was even more horrifying, where the person at the counter was sweating his life out asking people without masks to step out and urging the others to quickly get by with their purchases.
While washing up after this daring purchase, it occurred to me how great it would be if there were an automated system that triggered an alarm whenever someone entered the shop without a mask, just like the startling shoplifting alarms in a mall. That’s when I decided to use my machine learning skills to come up with a classification model to detect people not wearing a mask.

In this blog, we will be looking at how to train a simple classification model using Keras to classify between masked and unmasked faces.
So let's look at the code step by step. The Python notebook can be found on my GitHub: Github Link

I would suggest training the model on Google Colab. Google Colab is a free, browser-based notebook environment that runs entirely in the cloud, and it provides free GPU access to speed up the training process. It also saves you from tedious downloads and installations on your local machine. Make sure to change your runtime type to GPU.

The libraries

First, I started by importing the required libraries.
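A minimal set of imports along these lines should cover everything used below (a sketch; the exact list in my notebook may differ slightly):

```python
import os

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import RMSprop
import matplotlib.pyplot as plt
```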

I imported TensorFlow and used Keras for developing and training the model. Keras is an open-source neural network library written in Python, capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML. For more details on Keras, check out the documentation. Matplotlib is a plotting library that I used for plotting the accuracy curves of the model.

The DATA

Then I started working with the data required to train the model. I used this dataset available on Kaggle: https://www.kaggle.com/ahmetfurkandemr/mask-datasets-v1. The dataset contains 750 images for training and 350 for validation (including both classes).

I directly downloaded the data in Google Colab with the following lines of code. You can get the Kaggle API command for this particular dataset from the above link and your API credentials from your Kaggle account.
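One common pattern for this in Colab looks like the following (it assumes you upload the kaggle.json API token from your Kaggle account; the dataset slug comes from the link above):

```python
# Install the Kaggle CLI and register your API credentials
!pip install -q kaggle
from google.colab import files
files.upload()  # upload the kaggle.json downloaded from your Kaggle account

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Download the dataset using its slug from the URL above
!kaggle datasets download -d ahmetfurkandemr/mask-datasets-v1
```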

I then unzipped the downloaded zip folder.
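In Colab, that is a one-liner (the archive name here is an assumption based on the dataset slug):

```python
# Extract the downloaded archive quietly
!unzip -q mask-datasets-v1.zip
```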

After unzipping, you get a train directory and a validation directory. As the names suggest, the train directory consists of the data used for training, and the validation directory of the data used for testing the accuracy of the trained model.

I then defined the directories and the files.
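A sketch of those definitions; the folder names here (the base directory and the Mask/No_mask subfolders) are my assumptions, so adjust them to whatever you see after unzipping:

```python
import os

base_dir = 'Mask_Datasets'  # assumed name of the extracted folder
train_dir = os.path.join(base_dir, 'Train')
validation_dir = os.path.join(base_dir, 'Validation')

# One subdirectory per class inside each split
train_mask_dir = os.path.join(train_dir, 'Mask')
train_no_mask_dir = os.path.join(train_dir, 'No_mask')
validation_mask_dir = os.path.join(validation_dir, 'Mask')
validation_no_mask_dir = os.path.join(validation_dir, 'No_mask')
```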

Next, I worked a bit on data augmentation. Data augmentation is a very useful technique, especially when we do not have enough data. It generally includes cropping, padding, flipping, rotation, rescaling, etc., and it improves the diversity of the data available for training the model without actually collecting new data.

The ImageDataGenerator class in Keras makes data augmentation very easy. We just need to specify its parameters and the directories, and it's all done. Note that data augmentation is not used for the validation set, since it is used only for testing the model, so we don't need to apply any such transformations to it.

The ImageDataGenerator class has a flow_from_directory() method to read images from folders arranged by class (and a flow() method for data already loaded as Numpy arrays). The first parameter of the flow_from_directory method is the train/validation directory where the images are stored. The batch size is the number of images yielded by the generator per batch. The class mode here is binary because we are dealing with only two classes (mask and no mask). The target size is the size to which all the images are uniformly resized. You can check the other useful parameters in the official documentation.
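A sketch of both generators: the 150x150 target size, batch size of 32, and binary class mode come from the text, while the rescaling and the specific augmentation parameters are my assumptions. It reuses train_dir and validation_dir from the previous step.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation only for the training data; validation images are just rescaled
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

validation_datagen = ImageDataGenerator(rescale=1.0 / 255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
```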

The MODEL

After preparing the data for training, the next objective is to define the model. I used a convolutional neural network for this purpose. The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution, which is a specialized kind of linear operation. Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers. The details and the theory behind CNNs are beyond the scope of this blog.

This is the basic CNN model that I defined. Let's go through it line by line.
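Here is a sketch of such a model, consistent with the layer-by-layer description that follows. The filter counts (32, 64, 128) and the 512 units in the hidden dense layer are my assumptions; the layer types, the (150, 150, 3) input shape, the 0.5 dropout rate, and the sigmoid output come from the text.

```python
model = tf.keras.models.Sequential([
    # Three convolution + pooling blocks
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Regularization and the classification head
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.summary()
```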

tf.keras.models.Sequential is used to define the sequential model. A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. It is defined by passing a list of layers to it.

tf.keras.layers.Conv2D defines a convolutional layer. The first parameter is the number of filters in the layer. Then comes the shape of the filter (generally 3x3 or 5x5), followed by the activation function; I have used the ReLU activation function. The input shape is (150, 150, 3), which is the size of the images after being resized by the ImageDataGenerator. Note that the input shape needs to be specified only in the first convolutional layer of the Sequential model; the input shapes of the remaining layers are inferred automatically.

tf.keras.layers.MaxPooling2D: A pooling layer is another building block of a CNN. Its function is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network. It downsamples the input representation by taking the maximum value over the window defined by pool_size. We have defined the pool_size to be (2,2) here. By default, the stride equals the pool size, so the window moves two pixels at a time and the feature map is halved along each spatial dimension.

I have used 3 convolutional layers here, each followed by a pooling layer.

tf.keras.layers.Dropout: The Dropout layer randomly sets input units to 0 with a frequency equal to the specified rate at each step during training, which helps prevent overfitting. Here the rate is set to 0.5.

tf.keras.layers.Flatten: This layer is used to flatten the input so that it can be fed into the dense layer.

Flattening a matrix into a vector

tf.keras.layers.Dense: The dense layer is the regular, densely connected neural network layer. The first parameter we have passed here is the number of units in the dense layer, and the second one is the activation function (ReLU in this case). In the final dense layer, you may observe that the number of units is one. This is because it produces the final output of the neural network, which is the probability of the image belonging to one class. The activation function here is the sigmoid, which is used for binary classification; for multi-class classification, the softmax activation function is used.

model.summary(): This prints a summary of the defined model, showing the output shape and the number of parameters of each layer.

The next task is to compile the model. I used binary_crossentropy as the loss function since this is a binary classification problem. The optimizer selected is RMSprop with the learning rate set to 0.001. You can also experiment with different optimizers such as Adam or SGD and compare their performance. I wanted to keep a check on the accuracy of the model, so I set the metrics to ‘accuracy’ for model compilation.
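That compile step would look something like this (assuming default RMSprop settings apart from the learning rate):

```python
from tensorflow.keras.optimizers import RMSprop

model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['accuracy'])
```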

I then fitted the model and let it train for 100 epochs. I passed the train_generator and validation_generator that we created in the preprocessing stage, which serve as our training and validation data. Steps per epoch is the training data size divided by the batch size (750/32 ≈ 23), and validation steps is the validation data size divided by the validation batch size (350/32 ≈ 10). Verbose can be set to 0, 1, or 2 depending on how much information we want to see during the training process; a sketch of the full fit call follows the list below.

verbose=0 shows you nothing (silent).

verbose=1 shows you an animated progress bar for each epoch.

verbose=2 just prints one line per epoch with the loss and metric values.
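The fit call itself might look like this, using the step counts worked out above (model.fit accepts generators directly in TF 2.x):

```python
history = model.fit(
    train_generator,
    steps_per_epoch=23,        # ~750 training images / batch size 32
    epochs=100,
    validation_data=validation_generator,
    validation_steps=10,       # ~350 validation images / batch size 32
    verbose=1)
```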

The model took about an hour to finish its training, printing the loss and accuracy of each epoch as it went.

Both the training and the validation accuracy increased after almost every epoch, which shows that the training was heading in the right direction.

After 100 epochs I achieved around 98% accuracy on the training set and 97% accuracy on the validation set.

Testing

I then plotted the training and validation accuracy of the model after each epoch using matplotlib.
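Something like the following reproduces that plot, assuming the TF 2.x history keys ('accuracy' and 'val_accuracy'):

```python
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, label='Training accuracy')
plt.plot(epochs, val_acc, label='Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```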

On the resulting plot, the x-axis represents the number of epochs and the y-axis the corresponding accuracy; the blue curve represents the training accuracy and the orange one the validation accuracy.

I then tested the model on new images using the model.predict method. In Google Colab, the following code lets you upload multiple images and see the results.
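A sketch of that Colab snippet: which class corresponds to an output above 0.5 depends on the alphabetical order of your class folders (check train_generator.class_indices), so treat the labels printed below as an assumption.

```python
import numpy as np
from google.colab import files
from tensorflow.keras.preprocessing import image

uploaded = files.upload()  # pick one or more images from your machine

for fname in uploaded.keys():
    # Load and preprocess each image exactly like the training data
    img = image.load_img(fname, target_size=(150, 150))
    x = image.img_to_array(img) / 255.0
    x = np.expand_dims(x, axis=0)

    prediction = model.predict(x)[0][0]
    if prediction > 0.5:
        print(fname, '-> no mask')  # assumed class ordering
    else:
        print(fname, '-> mask')
```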

What's next?

This is a very basic model, and further improvements can be made to increase its accuracy. I would also suggest trying to tweak the parameters of the model, such as the number of layers, the number of units, and the activation functions, and checking how the performance changes.

I also tried developing a Flask web app that accesses a live camera feed to detect masked faces. This could also be integrated with the Internet of Things to create an automatic gate or alert system in public places such as offices, shops, etc.
