Instance Segmentation using Mask R-CNN on a custom dataset

Published in Analytics Vidhya, Feb 20, 2020

In this article, we will use Mask R-CNN for instance segmentation on a custom dataset.

Before getting into the details of implementation, what is segmentation exactly? What are the types of segmentation?

The picture gives a good explanation of the different computer vision techniques:

Source: https://engineering.matterport.com/splash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46

Classification: Tells whether a balloon is present in the image or not.
Object Detection: Locates every balloon in the image and generates a bounding box for each.
Semantic Segmentation: Highlights all the balloon pixels with the same color, without separating one balloon from another.
Instance Segmentation: Highlights each instance of a balloon with a different color.

Hence, semantic segmentation treats all the objects of a class as a single region, whereas instance segmentation assigns each object of the same class to its own instance.
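To make the difference concrete, here is a minimal sketch on a hand-made 4x8 grid. The grid and the names `semantic`, `instance` and `count_labels` are made up for illustration and are not part of Mask R-CNN.

```python
# Toy 4x8 "image" containing two balloons; 0 marks background pixels.
# Semantic segmentation: every balloon pixel gets the same class id (1).
semantic = [
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# Instance segmentation: each balloon gets its own id (1 and 2).
instance = [
    [0, 1, 1, 0, 0, 2, 2, 0],
    [0, 1, 1, 0, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def count_labels(label_map):
    """Count the distinct non-background labels in a 2D label map."""
    return len({value for row in label_map for value in row if value != 0})


print(count_labels(semantic))  # 1
print(count_labels(instance))  # 2
```

Both maps cover exactly the same pixels; only the instance map can tell you that there are two balloons.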

Why do we need segmentation if we already have object detection?

Source: https://medium.com/analytics-vidhya/image-classification-vs-object-detection-vs-image-segmentation-f36db85fe81

Object detection creates a bounding box around each object it finds in the image, but it gives no information about the shape of the object. We only get a set of bounding box coordinates.

Image segmentation creates a pixel-wise mask for each object in the image, and hence gives us a far more granular understanding of the objects in the image.
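One way to see that a mask carries more information than a box: a box can always be derived from a mask, but not the other way round. Below is a minimal sketch, assuming a mask stored as a list of 0/1 rows; `mask_to_bbox` is a hypothetical helper written for this article, not a library function.

```python
def mask_to_bbox(mask):
    """Return (y1, x1, y2, x2) of the tightest box around the non-zero pixels.

    The end coordinates are exclusive. Returns None for an empty mask.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


# A small triangular object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

box = mask_to_bbox(mask)
box_area = (box[2] - box[0]) * (box[3] - box[1])
mask_area = sum(sum(row) for row in mask)
print(box, box_area, mask_area)  # (1, 1, 3, 4) 6 4
```

The box covers 6 pixels but the object only occupies 4 of them; the mask tells us which ones.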

This article gives a good insight into the theoretical part, explaining what Mask R-CNN is, its architecture and how to train it on a custom dataset. But I still found it a little difficult to implement the actual code, so I am writing this one with detailed steps and some actual code snippets.

There is a pre-trained Mask R-CNN model here, trained on the COCO dataset, but it only covers 80 classes. Hence we will now see how to train on a custom class using transfer learning.

The custom object we will be training on is ‘bottle’. Here is the link to the dataset. So let’s dive into the implementation details:

Step 1: Preparing the dataset

The dataset I mentioned above consists of 100 images, out of which 76 are used for training and the remaining 24 for validation. I used the VGG Image Annotator (VIA) tool for labeling the images. It's a simple tool that lets you label all the images and export the annotations to a single JSON file.
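To get a feel for what that JSON file contains, here is a minimal sketch that reads polygon annotations from it. The sample export below is hand-written (the file name and coordinates are made up), and I am assuming polygon regions stored under `shape_attributes` with `all_points_x` / `all_points_y`; depending on the VIA version, `regions` may be a list or a dict, so the sketch handles both.

```python
import json

# Hypothetical export with a single image and a single polygon.
SAMPLE_EXPORT = """
{
  "bottle_001.jpg24680": {
    "filename": "bottle_001.jpg",
    "size": 24680,
    "regions": [
      {
        "shape_attributes": {
          "name": "polygon",
          "all_points_x": [10, 40, 40, 10],
          "all_points_y": [5, 5, 60, 60]
        },
        "region_attributes": {}
      }
    ]
  }
}
"""


def read_polygons(export_text):
    """Map each labeled file name to a list of polygons [(x, y), ...]."""
    annotations = json.loads(export_text)
    result = {}
    for entry in annotations.values():
        regions = entry.get("regions")
        if not regions:
            continue  # skip images that were never labeled
        if isinstance(regions, dict):
            regions = list(regions.values())
        result[entry["filename"]] = [
            list(zip(r["shape_attributes"]["all_points_x"],
                     r["shape_attributes"]["all_points_y"]))
            for r in regions
        ]
    return result


polygons = read_polygons(SAMPLE_EXPORT)
print(polygons)
# {'bottle_001.jpg': [[(10, 5), (40, 5), (40, 60), (10, 60)]]}
```

Running something like this on your own export is a quick sanity check that every image you labeled actually made it into the file.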

Step 2: Clone the repository

Use the command below to clone the repository:

git clone https://github.com/matterport/Mask_RCNN.git

After this, we need to install the dependencies required for Mask R-CNN.

Step 3: Install the dependencies

The repository contains a file named requirements.txt which lists all the dependencies. Use this command to install them:

pip install -r requirements.txt

Step 4: Download the pre-trained weights (trained on MS COCO)

Next, we need to download the pre-trained weights. You can use this link to download them. These weights are obtained from a model that was trained on the MS COCO dataset. Once you have downloaded the weights, place the file in the root folder of the Mask_RCNN repository that we cloned in Step 2.
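If you prefer to script the download, a minimal sketch could look like the one below. `ensure_weights` is a hypothetical helper of my own, and the URL is deliberately left as a parameter: pass in the download link mentioned above.

```python
import os
import urllib.request


def ensure_weights(path, url):
    """Download the weights file to `path` unless it already exists.

    `url` is whatever download link you are using for the COCO weights.
    Returns `path` so the result can be passed straight to the training code.
    """
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


# Usage, run from the root of the cloned repository
# (the file name below is an assumption):
# ensure_weights("mask_rcnn_coco.h5", url=WEIGHTS_URL)
```

The existence check means you can call it at the top of every run without downloading the file again.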

Step 5: Editing the code for a custom class

Inside the samples directory of the cloned repo, create a new folder for your class; in my case, it would be bottle. In this folder, create a dataset folder and paste the train and validation images inside it. Also, copy the file balloon.py from the balloon sample, rename it to bottle.py and edit it according to your needs. Here is the link to my bottle.py file. Now let’s see what we need to edit.
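The folder layout can be created by hand, or with a small script like the sketch below. I am assuming `train` and `val` as the sub-folder names inside dataset, which is what the balloon sample expects; `make_sample_layout` is a hypothetical helper.

```python
from pathlib import Path


def make_sample_layout(repo_root, class_name="bottle"):
    """Create samples/<class_name>/dataset/{train,val} under the repo root."""
    sample_dir = Path(repo_root) / "samples" / class_name
    for subset in ("train", "val"):
        (sample_dir / "dataset" / subset).mkdir(parents=True, exist_ok=True)
    return sample_dir


# Usage, assuming the repository was cloned into ./Mask_RCNN:
# make_sample_layout("Mask_RCNN")
```

After this, copy the images and the exported annotation JSON for each split into the matching folder.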

1. Dataset class

It consists of 3 functions which need to be changed:

def load_balloon(self, dataset_dir, subset):
def load_mask(self, image_id):
def image_reference(self, image_id):

Replace ‘balloon’ with your custom class name (bottle here) inside all the 3 functions.
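The heart of load_mask is turning each annotated polygon into a binary mask, one per object. The balloon sample relies on an image library for the rasterization; the sketch below swaps in a plain-Python even-odd ray casting test so that it runs without any dependencies. The helper names are hypothetical, and this is only meant to show the idea, not to replace the code in the sample.

```python
def point_in_polygon(x, y, xs, ys):
    """Even-odd ray casting: is point (x, y) inside the polygon (xs, ys)?"""
    inside = False
    j = len(xs) - 1
    for i in range(len(xs)):
        # Does the edge from vertex j to vertex i straddle the horizontal line at y?
        if (ys[i] > y) != (ys[j] > y):
            # x coordinate where that edge crosses the line.
            x_cross = xs[j] + (y - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j])
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def polygons_to_masks(height, width, polygons):
    """Return one height x width 0/1 mask per polygon, sampling pixel centers."""
    masks = []
    for xs, ys in polygons:
        masks.append([
            [1 if point_in_polygon(col + 0.5, row + 0.5, xs, ys) else 0
             for col in range(width)]
            for row in range(height)
        ])
    return masks


# One rectangular "bottle" with corners (1, 1) and (4, 3) in a 4x6 image.
masks = polygons_to_masks(4, 6, [([1, 4, 4, 1], [1, 1, 3, 3])])
for row in masks[0]:
    print(row)
# [0, 0, 0, 0, 0, 0]
# [0, 1, 1, 1, 0, 0]
# [0, 1, 1, 1, 0, 0]
# [0, 0, 0, 0, 0, 0]
```

With a single class, every mask produced this way simply gets class ID 1 (bottle).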

2. Configuration class

class BottleConfig(Config):
    NAME = "bottle"

    # Number of classes (including background)
    NUM_CLASSES = 1 + 1  # Background + bottle

    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100
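A quick note on STEPS_PER_EPOCH: each training step processes one batch, and the batch size is the number of images per GPU times the number of GPUs. The small calculation below relates that to the 76 training images. I am assuming 2 images per GPU on a single GPU here; adjust the numbers to your own configuration.

```python
import math


def steps_for_one_pass(num_images, images_per_gpu, gpu_count=1):
    """Steps needed for every training image to be seen once."""
    batch_size = images_per_gpu * gpu_count
    return math.ceil(num_images / batch_size)


def images_per_epoch(steps_per_epoch, images_per_gpu, gpu_count=1):
    """Number of images drawn during one epoch."""
    return steps_per_epoch * images_per_gpu * gpu_count


print(steps_for_one_pass(76, images_per_gpu=2))  # 38
print(images_per_epoch(100, images_per_gpu=2))   # 200
```

So under these assumptions, 100 steps per epoch means each training image is seen between two and three times per epoch.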

Step 6: Training

Training is a computationally heavy task and will require a GPU. If you don’t have one, don’t worry: I will show how you can use Google Colab for training. Run this command from inside the bottle directory to start training:

python bottle.py train --dataset=/samples/bottle/dataset --weights=coco
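For reference, here is roughly how a script can read those command-line arguments. This is a stripped-down sketch using argparse, not a copy of the sample's parser, and the help strings are mine.

```python
import argparse


def build_parser():
    """Build a parser for: <script> train --dataset=... --weights=..."""
    parser = argparse.ArgumentParser(
        description="Train Mask R-CNN on a custom class (sketch).")
    parser.add_argument("command", choices=["train"],
                        help="action to run")
    parser.add_argument("--dataset", required=True,
                        help="directory that holds the train and validation images")
    parser.add_argument("--weights", required=True,
                        help="path to a weights .h5 file, or 'coco'")
    return parser


# Parse the same arguments as the command above.
args = build_parser().parse_args(
    ["train", "--dataset=/samples/bottle/dataset", "--weights=coco"])
print(args.command, args.dataset, args.weights)
# train /samples/bottle/dataset coco
```

If training fails right at the start, a wrong --dataset path is the first thing to check.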

For those who do not have a GPU, here is the link to my Colab notebook.

The steps are as follows:

1. Upload the entire directory structure with all the files to Google Drive.
2. Create a new Colab notebook.
3. Refer to my Colab notebook for the code and start training.
4. All the models will be saved to your Google Drive inside the folder rootdir/logs/
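Once training has run for a few epochs, you will want to pick up the most recent checkpoint from the logs folder. Here is a minimal sketch for that. I am assuming one sub-folder per training run whose name starts with the config name followed by a timestamp, and checkpoint files named like mask_rcnn_bottle_0001.h5; check your own logs folder and adjust the prefixes if your names differ.

```python
import os


def find_latest_checkpoint(logs_dir, name="bottle"):
    """Return the path of the newest checkpoint under logs_dir, or None.

    Assumes run folders and checkpoint files sort chronologically by name.
    """
    runs = sorted(
        d for d in os.listdir(logs_dir)
        if d.startswith(name) and os.path.isdir(os.path.join(logs_dir, d))
    )
    # Walk the runs from newest to oldest and stop at the first with a checkpoint.
    for run in reversed(runs):
        run_dir = os.path.join(logs_dir, run)
        checkpoints = sorted(
            f for f in os.listdir(run_dir)
            if f.startswith("mask_rcnn_" + name) and f.endswith(".h5")
        )
        if checkpoints:
            return os.path.join(run_dir, checkpoints[-1])
    return None


# Usage, with the logs folder from the last step:
# print(find_latest_checkpoint("rootdir/logs"))
```

The returned path can then be passed to --weights to resume training or to run detection.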