Implement your own Mask RCNN model

Eashan Kaushik · Published in Analytics Vidhya · 7 min read · Jun 26, 2021

In this post, I present a step-by-step guide to implementing and deploying your own Mask RCNN model. When I built my own model for deployment, I referred to many blogs: some used images annotated with bounding boxes and single-class classification, some used bounding-box-annotated images with multi-class classification, and others used polygon annotations with single-class classification. This post provides the code, with explanations, for all of these scenarios. The flow of the post is as follows:

  1. Introduction to Mask RCNN Model
  2. About my Mask RCNN Model
  3. Step 1: Data collection and cleaning
  4. Step 2: Image Annotation
  5. Step 3: Download requirements
  6. Step 4 a: Model Training (bounding box annotation and single-class classification)
  7. Step 4 b: Model Training (bounding box annotation and multi-class classification)
  8. Step 4 c: Model Training (polygon annotation and multi-class classification)
  9. Image Augmentation
  10. Step 5: Model Evaluation
  11. Step 6: Single Image Prediction
  12. Step 7: Website Deployment using Flask Locally

The Mask RCNN model has 63,749,552 total parameters: 63,638,064 trainable and 111,488 non-trainable. That's a lot of parameters, but don't worry: you won't have to understand everything about Mask RCNN to implement it. That said, I would highly recommend going through the Mask RCNN paper here. To implement the model, I used Matterport's Mask R-CNN.

Introduction to Mask RCNN Model

Mask RCNN is a Deep Learning model for image segmentation tasks. I visualize the Mask RCNN model as follows:

Backbone Network — implemented as ResNet 101 with a Feature Pyramid Network (FPN), this network extracts the initial feature maps, which are forward-propagated to the other components.

Region Proposal Network (RPN) — extracts Regions of Interest (ROIs) from images; non-max suppression is then applied to select the most appropriate bounding boxes (ROIs) generated by the RPN.

ROI Align — warps each Region of Interest (ROI) into fixed dimensions.

Fully Connected Layers — consist of two parallel heads: one uses softmax for classification and the other uses regression for bounding box prediction.

Mask Classifier — generates a binary mask for each instance in an image.

Mask RCNN Architecture, Image Source: Mask RCNN Paper

About my Mask RCNN Model

I have developed a Mask RCNN model to detect four types of exterior damages in a car, namely, scratch, dent, shatter, and dislocation. I have trained my model using Step 4 a, Step 4 b, and also Step 4 c.

I found that if you want a model that generates an accurate mask, you should use polygon-annotated images; if you only require an accurate bounding box, bounding box annotation is enough.

Step 1: Data collection and cleaning

The first step of any data science project is data collection. You can collect image data by scraping web pages (BeautifulSoup for static pages, Selenium for interactive ones), use an open-source dataset from Kaggle, or, for a small dataset, manually download images from Google Images. After collecting your dataset, remove images that are not adequate for your task. One important thing: name each image with a unique numeric ID, for example 0001.jpg, 0002.jpg, etc.
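For example, a small helper like the following (a hypothetical script, not part of the original project) renames every image in a folder to a zero-padded numeric ID:

import os

# Sketch: rename all images in a folder to 0001.jpg, 0002.jpg, ...
# Assumes the target names do not already exist in the folder.
def rename_images(folder):
    images = sorted(f for f in os.listdir(folder)
                    if f.lower().endswith(('.jpg', '.jpeg', '.png')))
    for i, name in enumerate(images, start=1):
        ext = os.path.splitext(name)[1]
        os.rename(os.path.join(folder, name),
                  os.path.join(folder, f'{i:04d}{ext}'))

rename_images('customImages/images')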

For bounding box annotation, the dataset directory structure should be:

customImages
|__ annots (all .xml annotation)
|__ images (all images)

For polygon annotation, the dataset directory structure should be:

customImages
|__ train (all training images and annotation file)
|__ val (all validation images and annotation file)

Step 2: Image Annotation

As you might know, neural networks are supervised learning algorithms, so you need to annotate your images with ground truth, i.e. draw a bounding box or polygon around each object you want to detect. For polygon annotation, I recommend the VGG Image Annotator; for bounding box annotation, use LabelImg.

Polygon Annotation with VGG Image Annotator
Bounding Box Annotation with LabelImg

Step 3: Download requirements

First, you will need to clone the matterport Mask RCNN repository.

git clone https://github.com/matterport/Mask_RCNN.git

Second, from the Matterport repository, you need to install the Mask RCNN library.

cd Mask_RCNN
python setup.py install

On Linux/macOS, prefix the command with sudo.

Third, you need to install the following packages to train the model.

numpy
scipy
Pillow
cython
matplotlib
scikit-image
tensorflow>=1.3.0
keras>=2.0.8
opencv-python
h5py
imgaug
IPython[all]

This can be done via the requirements.txt file available in the Matterport repository or in my own Mask RCNN project repository.

pip3 install -r requirements.txt

Finally, download the Mask RCNN weights for the MS COCO dataset here. You will train your custom dataset on these pre-trained weights and take advantage of transfer learning.

Now you are all set! Let's start training the model on your custom dataset.

Model Training

Dataset class — we inherit the functionality of the library's Dataset class (mrcnn.utils.Dataset) in our user-defined class CustomDataset. This lets us define our own functions to extract bounding boxes, load masks, and load the dataset.

load_dataset function — this function is responsible for adding classes and images to the dataset before training the model on it. This is done using self.add_image() for images and self.add_class() for classes.

self.add_image(source, image_id, path, annotation)  # bounding box
self.add_image(source, image_id, path, width, height, polygons, num_ids)  # polygon
self.add_class(source, class_id, class_name)

extract_boxes function — this function is only used in the case of bounding box annotation. We need to extract the four values that define each bounding box (xmin, ymin, xmax, and ymax), as well as the height and width of the image.
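A minimal sketch of this method, assuming Pascal VOC-style .xml files such as those produced by LabelImg:

from xml.etree import ElementTree

# Sketch (method of CustomDataset): parse a Pascal VOC .xml annotation
# and return the box coordinates plus the image dimensions.
def extract_boxes(self, filename):
    root = ElementTree.parse(filename).getroot()
    boxes = []
    for box in root.findall('.//bndbox'):
        xmin = int(box.find('xmin').text)
        ymin = int(box.find('ymin').text)
        xmax = int(box.find('xmax').text)
        ymax = int(box.find('ymax').text)
        boxes.append([xmin, ymin, xmax, ymax])
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return boxes, width, height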

load_mask function — this function is used for both bounding-box-annotated and polygon-annotated images and loads the mask. Note that in the case of bounding box annotation, the mask will always be a box.
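For bounding box annotation, load_mask can simply fill the box region of a zeroed mask. A sketch for the single-class case (the class name 'damage' and the 'annotation' key are assumptions that must match your load_dataset):

import numpy as np

# Sketch (method of CustomDataset): one binary mask channel per box;
# each mask is a filled rectangle because the annotation is only a box.
def load_mask(self, image_id):
    info = self.image_info[image_id]
    boxes, w, h = self.extract_boxes(info['annotation'])
    masks = np.zeros([h, w, len(boxes)], dtype='uint8')
    class_ids = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        masks[ymin:ymax, xmin:xmax, i] = 1
        class_ids.append(self.class_names.index('damage'))
    return masks, np.asarray(class_ids, dtype='int32')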

image_reference function — this function returns the path of an image given its image ID.
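This one is short; a sketch:

# Sketch (method of CustomDataset): look up the stored path by image id.
def image_reference(self, image_id):
    return self.image_info[image_id]['path']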

Config class — we inherit the Config class to initialize our configurations. We need separate configurations for training and testing. Use the following to display a configuration:

config = CustomConfig()
config.display()
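A minimal sketch of such configurations, assuming the four damage classes from this project; the names and values are illustrative and should be tuned to your dataset:

from mrcnn.config import Config

# Sketch: training configuration.
class CustomConfig(Config):
    NAME = 'damage_cfg'
    NUM_CLASSES = 1 + 4   # background + scratch, dent, shatter, dislocation
    STEPS_PER_EPOCH = 100

# Sketch: separate configuration for testing/inference, as noted above.
class InferenceConfig(CustomConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1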

evaluate_model function — this function is declared in the global namespace and is used to calculate the mean Average Precision (mAP) of your model.
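A sketch of evaluate_model built on the library's own helpers (mrcnn.utils.compute_ap, mrcnn.model.load_image_gt, and mrcnn.model.mold_image); the exact details may differ from my notebook:

import numpy as np
from mrcnn.utils import compute_ap
from mrcnn.model import load_image_gt, mold_image

# Sketch: mean Average Precision over every image in a dataset.
def evaluate_model(dataset, model, cfg):
    APs = []
    for image_id in dataset.image_ids:
        # Load the image with its ground-truth boxes, classes, and masks
        image, image_meta, gt_class_id, gt_bbox, gt_mask = \
            load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
        # Normalize the pixels and run detection on the single image
        sample = np.expand_dims(mold_image(image, cfg), 0)
        r = model.detect(sample, verbose=0)[0]
        # Compare predictions against ground truth
        AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask,
                                 r['rois'], r['class_ids'],
                                 r['scores'], r['masks'])
        APs.append(AP)
    return np.mean(APs)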

Import Libraries
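The embedded code is not reproduced here, but the imports typically look something like this (a sketch; adjust if your modules live elsewhere):

import os
import numpy as np
from xml.etree import ElementTree
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from mrcnn.utils import Dataset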

Note: You need to train the model using only one of the sub-steps (a, b, or c) in Step 4. However, please start reading the code from Step 4 a. You can find all the model training code in the model-training directory of my project repository.

Step 4 a: Model Training (bounding box annotation and single-class classification)
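The full notebook is in my repository; the skeleton below is a sketch of the single-class flow. The class name 'damage', the 80/20 split at ID 120, and the epoch counts are illustrative assumptions:

# Sketch: dataset for one foreground class with VOC .xml boxes.
class CustomDataset(Dataset):
    def load_dataset(self, dataset_dir, is_train=True):
        self.add_class('dataset', 1, 'damage')
        images_dir = os.path.join(dataset_dir, 'images')
        annots_dir = os.path.join(dataset_dir, 'annots')
        for filename in sorted(os.listdir(images_dir)):
            image_id = filename[:-4]   # relies on the numeric ids from Step 1
            # Illustrative train/validation split on the numeric id
            if is_train and int(image_id) >= 120:
                continue
            if not is_train and int(image_id) < 120:
                continue
            self.add_image('dataset', image_id=image_id,
                           path=os.path.join(images_dir, filename),
                           annotation=os.path.join(annots_dir, image_id + '.xml'))
    # extract_boxes, load_mask, and image_reference as sketched above

train_set = CustomDataset()
train_set.load_dataset('customImages', is_train=True)
train_set.prepare()

test_set = CustomDataset()
test_set.load_dataset('customImages', is_train=False)
test_set.prepare()

# Start from the MS COCO weights (transfer learning) and retrain the heads;
# the class-specific layers are excluded because NUM_CLASSES differs from COCO.
model = MaskRCNN(mode='training', model_dir='./', config=CustomConfig())
model.load_weights('mask_rcnn_coco.h5', by_name=True,
                   exclude=['mrcnn_class_logits', 'mrcnn_bbox_fc',
                            'mrcnn_bbox', 'mrcnn_mask'])
model.train(train_set, test_set,
            learning_rate=CustomConfig().LEARNING_RATE,
            epochs=5, layers='heads')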

Step 4 b: Model Training (bounding box annotation and multi-class classification)
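For multiple classes, the main change is registering every class in load_dataset and reading each object's label from the .xml instead of assuming one class. A sketch of the revised load_mask (the four class names mirror this project):

import numpy as np
from xml.etree import ElementTree

# In load_dataset, register all four classes:
#   self.add_class('dataset', 1, 'scratch')
#   self.add_class('dataset', 2, 'dent')
#   self.add_class('dataset', 3, 'shatter')
#   self.add_class('dataset', 4, 'dislocation')

# Sketch (method of CustomDataset): map each box to its own class.
def load_mask(self, image_id):
    info = self.image_info[image_id]
    root = ElementTree.parse(info['annotation']).getroot()
    w = int(root.find('.//size/width').text)
    h = int(root.find('.//size/height').text)
    objects = root.findall('.//object')
    masks = np.zeros([h, w, len(objects)], dtype='uint8')
    class_ids = []
    for i, obj in enumerate(objects):
        name = obj.find('name').text          # e.g. 'scratch'
        box = obj.find('bndbox')
        xmin, ymin = int(box.find('xmin').text), int(box.find('ymin').text)
        xmax, ymax = int(box.find('xmax').text), int(box.find('ymax').text)
        masks[ymin:ymax, xmin:xmax, i] = 1
        class_ids.append(self.class_names.index(name))
    return masks, np.asarray(class_ids, dtype='int32')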

Step 4 c: Model Training (polygon annotation and multi-class classification)
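With polygon annotations (the VGG Image Annotator JSON), load_dataset parses the JSON and passes width, height, polygons, and num_ids into self.add_image(), as shown earlier; load_mask then rasterizes each polygon instead of filling a box. A sketch using skimage.draw.polygon:

import numpy as np
import skimage.draw

# Sketch (method of CustomDataset): draw each annotated polygon into
# its own binary mask channel; num_ids was stored by load_dataset.
def load_mask(self, image_id):
    info = self.image_info[image_id]
    mask = np.zeros([info['height'], info['width'], len(info['polygons'])],
                    dtype=np.uint8)
    for i, p in enumerate(info['polygons']):
        rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
        mask[rr, cc, i] = 1
    return mask, np.array(info['num_ids'], dtype=np.int32)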

Now that you have understood the code for all three scenarios, can you change the above code for the following scenario yourself?

polygon annotation and single-class classification

Image Augmentation

Image data augmentation is a technique that artificially expands the size of a training dataset by creating modified versions of its images, for example by flipping an image upside-down or left-right, rotating it, or scaling it. It can be implemented as follows:

import imgaug.augmenters as iaa

model.train(train_set,
            test_set,
            learning_rate=config.LEARNING_RATE,
            epochs=15,
            layers='all',
            augmentation=iaa.Sometimes(5/6, iaa.OneOf([
                iaa.Fliplr(1),
                iaa.Flipud(1),
                iaa.Affine(rotate=(-45, 45)),
                iaa.Affine(rotate=(-90, 90)),
                iaa.Affine(scale=(0.5, 1.5))
            ])))

Step 5: Model Evaluation

If you are not able to complete the TODO tasks, refer to my training notebook.
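As a quick check, you can call evaluate_model (defined earlier) on both splits. An inference-mode model is assumed here, and 'mask_rcnn_damage.h5' is a hypothetical weights filename; use whatever your training run produced:

# Sketch: measure mAP on the training and test sets.
model = MaskRCNN(mode='inference', model_dir='./', config=InferenceConfig())
model.load_weights('mask_rcnn_damage.h5', by_name=True)  # hypothetical file
print('Train mAP:', evaluate_model(train_set, model, InferenceConfig()))
print('Test mAP:', evaluate_model(test_set, model, InferenceConfig()))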

Step 6: Single Image Prediction
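The prediction code is also in the repository; in outline, it loads the trained weights in inference mode, runs detect on one image, and draws the result. A sketch (the weights filename and image path are illustrative):

import skimage.io
from mrcnn.model import MaskRCNN
from mrcnn.visualize import display_instances

# Sketch: predict on a single image and overlay boxes and masks.
model = MaskRCNN(mode='inference', model_dir='./', config=InferenceConfig())
model.load_weights('mask_rcnn_damage.h5', by_name=True)  # hypothetical file

image = skimage.io.imread('customImages/images/0001.jpg')
r = model.detect([image], verbose=0)[0]
display_instances(image, r['rois'], r['masks'], r['class_ids'],
                  ['BG', 'scratch', 'dent', 'shatter', 'dislocation'],
                  r['scores'])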

Step 7: Website Deployment using Flask Locally

When you are satisfied with your model's performance, the next step is to develop a website so a user can interact with the model. I won't explain the code in this section, but I provide it for you to use.

  1. Put your your_trained_weights.h5 file in the model directory.
  2. Change line #22 in app/utils.py to the name of the weights of your model.
  3. Run the main.py file and the website will be hosted at http://127.0.0.1:5000/. Below are the URL rules I defined (a minimal main.py sketch follows the list); you can add or delete rules according to your preference.
app.add_url_rule('/base','base',views.base)
app.add_url_rule('/','index',views.index)
app.add_url_rule('/damageapp','damageapp',views.damageapp)
app.add_url_rule('/damageapp/damage','damage',views.damage,methods=['GET','POST'])
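For orientation, a minimal main.py wiring up these rules might look like the following sketch; it assumes the app/ package from the repository and the real files contain the full logic:

from flask import Flask
from app import views   # assumes the app/ package from the repository

app = Flask(__name__)

# The same URL rules listed above
app.add_url_rule('/base', 'base', views.base)
app.add_url_rule('/', 'index', views.index)
app.add_url_rule('/damageapp', 'damageapp', views.damageapp)
app.add_url_rule('/damageapp/damage', 'damage', views.damage,
                 methods=['GET', 'POST'])

if __name__ == '__main__':
    app.run(debug=True)   # serves on http://127.0.0.1:5000/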

If you don't want the cost assessment functionality, just set the cost_for_damage variable on line #45 of app/views.py to False. The cost assessment feature is purely for visual purposes and computes a cost from the ratio of mask size to image size.

Note: Use the mrcnn directory from my GitHub repository for deployment purposes, as I have made some changes to the original library.
