Custom Mask RCNN using Tensorflow Object Detection API

Vijendra Singh
7 min read · Oct 21, 2018

This blog post takes you through a sample project for building a Mask RCNN model to detect custom objects using the TensorFlow Object Detection API. I have tried to make this post as explanatory as possible. In case you are stuck at any step, please comment for support. This post will point you to the project’s GitHub repository at every step. You can find the project’s GitHub repository HERE. Note: TensorFlow version 1.13.1 was used.

Folder Structure

folder structure

Create folders

Clone the GitHub repository or create the folders following the structure given above (you could use a different name for any of the folders).

Prepare train and test images

The project’s repository contains train and test images for detecting a blue Bluetooth speaker and a mug, but I highly recommend that you create your own dataset. Pick objects you want to detect and take some pictures of them with varying backgrounds, angles, and distances. Training images used in this sample project are shown below:

sample data

Once you have captured the images, transfer them to your PC and resize them to a smaller size (the images provided here are 512 x 384) so that training goes smoothly without running out of memory. Now rename the images (for better referencing later) and divide them into two chunks, one for training (80%) and one for testing (20%). Finally, move the training images into the dataset/train_images folder and the testing images into the dataset/test_images folder. A minimal script for this is sketched below.
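If you captured a lot of images, a short script can do the resizing, renaming, and splitting in one go. Below is a minimal sketch; the raw_images source folder, the 512 x 384 target size, and the output naming are assumptions you should adapt to your own setup.

import os
import random
from PIL import Image

SRC = 'raw_images'  # hypothetical folder holding the images from your camera
TRAIN_DIR = 'dataset/train_images'
TEST_DIR = 'dataset/test_images'
os.makedirs(TRAIN_DIR, exist_ok=True)
os.makedirs(TEST_DIR, exist_ok=True)

files = sorted(os.listdir(SRC))
random.seed(42)        # fixed seed so the split is reproducible
random.shuffle(files)
split = int(0.8 * len(files))  # 80% train / 20% test

for i, name in enumerate(files):
    img = Image.open(os.path.join(SRC, name)).convert('RGB')
    img = img.resize((512, 384))  # same size as the sample dataset
    out_dir = TRAIN_DIR if i < split else TEST_DIR
    img.save(os.path.join(out_dir, 'image{}.jpg'.format(i + 1)))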

Label the data

Now it’s time to label the training data. We will be doing this using the PixelAnnotationTool library.

You can refer to the following tutorial to learn how to use the tool.

This tool will generate three files in the image folder:

  • <image_name>_color_mask.png
  • <image_name>_mask.png
  • <image_name>_watershed_mask.png

You need to take every <image_name>_color_mask.png file, place it in dataset/train_masks, and rename it from <image_name>_color_mask.png to <image_name>.png. A minimal script for this is sketched below.
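Renaming every file by hand gets tedious, so here is a minimal sketch that copies each color mask into dataset/train_masks under its new name (it assumes PixelAnnotationTool wrote its output next to the images in dataset/train_images):

import glob
import os
import shutil

SRC = 'dataset/train_images'  # where PixelAnnotationTool wrote the masks (assumption)
DST = 'dataset/train_masks'
os.makedirs(DST, exist_ok=True)

for path in glob.glob(os.path.join(SRC, '*_color_mask.png')):
    # <image_name>_color_mask.png -> <image_name>.png
    new_name = os.path.basename(path).replace('_color_mask.png', '.png')
    shutil.copy(path, os.path.join(DST, new_name))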

The color mask will look something like this:

color mask

Setup Tensorflow models repository

Now it’s time to start using the TensorFlow Object Detection API, so go ahead and clone the models repository using the following command:

git clone https://github.com/tensorflow/models.git

Once you have cloned this repository, change your present working directory to models/research/ and add it to your Python path. If you want to add it permanently, you will have to make the change in your .bashrc file, or you can add it temporarily for the current session using the following command:

export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
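For the permanent option, you can append the same export to your .bashrc (replace /path/to/models with wherever you cloned the repository):

echo 'export PYTHONPATH=$PYTHONPATH:/path/to/models/research:/path/to/models/research/slim' >> ~/.bashrc
source ~/.bashrc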

You also need to run the following command in order to get rid of the string_int_label_map_pb2 issue (more details HERE):

protoc object_detection/protos/*.proto --python_out=.

Your environment is now all set to use the TensorFlow Object Detection API.

Convert the data to Tensorflow record format

In order to use the TensorFlow API, you need to feed the data in the TensorFlow record format. I have modified the script create_pet_tf_record.py provided by TensorFlow and placed it in the project repository inside the folder named supporting_scripts. The modified file is named create_mask_rcnn_tf_record.py. All you need to do is take this script and place it in models/research/object_detection/dataset_tools.

create_mask_rcnn_tf_record.py is modified in such a way that, given a mask image, it finds the bounding boxes around the objects on its own, so you don’t need to spend extra time annotating them. However, it produces wrong output if a mask image contains multiple objects of the same class: instead of finding a bounding box for each object, it will take a single bounding box encompassing all objects of that class. A sketch of the idea is given below.
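To see the idea behind this, here is a minimal sketch of deriving a bounding box from a mask with NumPy; it illustrates the approach, not the exact code of create_mask_rcnn_tf_record.py. It also shows why multiple objects of the same class break it: np.where pools the pixels of every object with that pixel value into one box.

import numpy as np
from PIL import Image

mask = np.array(Image.open('dataset/train_masks/image1.png').convert('L'))
class_pixel_value = 100  # hypothetical grayscale value of your class in the mask

# Collect the coordinates of every pixel belonging to the class ...
ys, xs = np.where(mask == class_pixel_value)
# ... and take one box around ALL of them; two separate objects of the
# same class would get merged into a single enclosing box.
xmin, xmax = xs.min(), xs.max()
ymin, ymax = ys.min(), ys.max()
print(xmin, ymin, xmax, ymax)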

If some of your images contain multiple objects of the same class, use the labelImg library to generate XML files with bounding boxes, and then place all the XML files generated by labelImg under the dataset/train_bboxes folder. If you use this method, you will have to set the bboxes_provided flag to True while running create_mask_rcnn_tf_record.py; otherwise set it to False. The bboxes_provided flag is mandatory in order to keep users from making mistakes.

To download the labelImg library along with its dependencies, go to THIS LINK. Once you have the labelImg library on your PC, run labelImg.py. Select the train_images directory by clicking on Open Dir and change the save directory to dataset/train_bboxes by clicking on Change Save Dir. Now all you need to do is draw rectangles around the objects you plan to detect: click on Create RectBox and you will get a cursor to label the objects. After drawing a rectangle around an object, give it a label name and save, and the annotation will be saved as an .xml file in the dataset/train_bboxes folder (an example of a generated file is shown below the screenshot).

Bounding box annotation using the labelImg
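labelImg writes the annotations in Pascal VOC XML format. A trimmed example of what one generated file might look like (file name, class name, and coordinates below are hypothetical):

<annotation>
  <folder>train_images</folder>
  <filename>image1.jpg</filename>
  <size>
    <width>512</width>
    <height>384</height>
    <depth>3</depth>
  </size>
  <object>
    <name>speaker</name>
    <bndbox>
      <xmin>120</xmin>
      <ymin>80</ymin>
      <xmax>300</xmax>
      <ymax>260</ymax>
    </bndbox>
  </object>
</annotation>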

After doing the above, one last thing is still remaining before we get our TensorFlow record file. You need to create a label map file; in the project repository, it’s given as label.pbtxt under the dataset subfolder. In the label map, you need to provide one item for each class. Each item holds the following information: class id, class name, and the pixel value of the color assigned to the class in the masks. Notice in the given sample label.pbtxt that the last three characters of the string assigned as the class name are taken to be the pixel value. You can find a mask’s pixel value by opening the mask image as a grayscale image and checking the pixel value in the area where your object is. A file named Check_pixel_values.ipynb is given under the supporting_scripts subfolder to help you with this task.
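For example, assuming one class whose mask pixels have grayscale value 100 and another with value 200 (the class names and values here are hypothetical), the label map would look like this:

item {
  id: 1
  name: 'speaker_100'
}
item {
  id: 2
  name: 'cup_200'
}

And a quick pixel check along the lines of Check_pixel_values.ipynb:

from PIL import Image

mask = Image.open('dataset/train_masks/image1.png').convert('L')  # open as grayscale
print(mask.getpixel((250, 200)))  # pick a point inside your object (coordinates are hypothetical)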

Now it’s time to create the tfrecord file. With models/research as the present working directory, run the following command to create the TensorFlow record (given that you are following the same folder structure as provided in the repository; otherwise, check all the flags the script accepts and pass the appropriate ones):

python object_detection/dataset_tools/create_mask_rcnn_tf_record.py --data_dir_path=<path to directory containing dataset> --bboxes_provided=<True if you are providing bounding box annotations as xml files>

There are more flags which can be passed to the script; for more help, run the following command:

python object_detection/dataset_tools/create_mask_rcnn_tf_record.py -h

An example if you are using bounding box annotations:

python object_detection/dataset_tools/create_mask_rcnn_tf_record.py --data_dir_path=/Users/xyz/Custom-Mask-RCNN-using-Tensorfow-Object-detection-API/dataset --bboxes_provided=True

Training

Now that we have the data in the right format, we can go ahead and train our model. The first thing you need to do is select the pre-trained model you would like to use. You can check and download a pre-trained model from the TensorFlow detection model zoo GitHub page. Once downloaded, extract all the files into the folder you created for saving the pre-trained model files.

Next, copy models/research/object_detection/samples/configs/<your_model_name.config> and paste it in the project repo. You need to configure 5 paths in this file: just open it, search for PATH_TO_BE_CONFIGURED, and replace each occurrence with the required path. I used a pre-trained Mask RCNN trained with Inception V2 as the feature extractor, and I have added the modified config file for it in this repo (with PATH_TO_BE_CONFIGURED as a comment above the lines that were modified). You could also play with other hyperparameters if you want. The five paths to configure are sketched below.
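The file names after PATH_TO_BE_CONFIGURED below are placeholders for your own checkpoint, record, and label map files; the layout follows the sample config shipped with the API:

fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"

train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/label.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/eval.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/label.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
  shuffle: false
  num_readers: 1
}

Now you are all set to train your model; just run the following command with models/research as the present working directory: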

python object_detection/legacy/train.py --train_dir=<path_to_the folder_for_saving_checkpoints> --pipeline_config_path=<path_to_config_file>

An example would be:

python object_detection/legacy/train.py --train_dir=/Users/vijendra1125/Documents/tensorflow/object_detection/multi_object_mask/CP --pipeline_config_path=/Users/vijendra1125/Documents/tensorflow/object_detection/multi_object_mask/mask_rcnn_inception_v2_coco.config

Let it train until the loss falls below 0.2, or even lower. Once you see that the loss is as low as you want, stop the training with a keyboard interrupt. Checkpoints will be saved in the CP folder. Now it’s time to generate the inference graph from the saved checkpoints:

python object_detection/export_inference_graph.py --input_type=image_tensor --pipeline_config_path=<path_to_config_file> --trained_checkpoint_prefix=<path to saved checkpoint> --output_directory=<path_to_the_folder_for_saving_inference_graph>

An example would be:

python object_detection/export_inference_graph.py --input_type=image_tensor --pipeline_config_path=/Users/vijendra1125/Documents/tensorflow/object_detection/multi_object_mask/mask_rcnn_inception_v2_coco.config --trained_checkpoint_prefix=/Users/vijendra1125/Documents/tensorflow/object_detection/multi_object_mask/CP/model.ckpt-2000 --output_directory=/Users/vijendra1125/Documents/tensorflow/object_detection/multi_object_mask/IG

Bonus: If you want to train your model using Google Colab, check out the train.ipynb file.

Test the trained model

Finally, it’s time to check the result of all the hard work you did. All you need to do is copy models/research/object_detection/object_detection_tutorial.ipynb and modify it to work with your inference graph. A modified file is already given as eval.ipynb in this repo; you just need to change the paths, the number of classes, and the number of test images. Below is the result of the model trained for detecting the “UE Roll” blue Bluetooth speaker and a cup.

test results
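If you prefer a script over the notebook, here is a minimal TF 1.x sketch of loading the frozen inference graph and running a single test image through it (the paths are placeholders; the tensor names are the standard outputs exported by the Object Detection API):

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the frozen inference graph produced by export_inference_graph.py.
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('IG/frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

# Read one test image as a uint8 array and add a batch dimension.
image = np.array(Image.open('dataset/test_images/image1.jpg'))

with tf.Session(graph=graph) as sess:
    fetches = {name: graph.get_tensor_by_name(name + ':0')
               for name in ['num_detections', 'detection_boxes',
                            'detection_scores', 'detection_classes',
                            'detection_masks']}
    output = sess.run(fetches,
                      feed_dict={'image_tensor:0': image[np.newaxis, ...]})
    print(output['detection_scores'][0][:5])  # top 5 confidence scores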
