Multi-Object Detection with the TensorFlow Object Detection API

Nnamaka
5 min read · May 13, 2022

This blog post discusses using the TFOD (TensorFlow Object Detection) API to detect custom objects in images on the Google Colab platform.

You are expected to have your image dataset already collected, annotated, and split into 'train' and 'test' folders. I uploaded my dataset to Google Drive for persistence, to survive the frequent Colab runtime disconnections.

Visual object detection seeks to discover objects of specific target classes in a given image with pinpoint accuracy and apply a class label to each object instance.

Photo by Jakob Owens on Unsplash

The method of locating specific items in photographs is known as object detection. We may deal with single-object or multi-object detection challenges, depending on the number of objects in the image.

Object Detection

Object Localization includes creating a bounding box around one or more objects in an image, whereas image classification involves assigning a class label to an image. Object detection combines these two tasks by drawing a bounding box around each object of interest in the image and assigning a class label to each of them.

Multi-object detection's main goal is to identify and locate multiple targets of interest in still images or video data.

The Tensorflow Object Detection API

The TensorFlow object detection API provides a platform for building deep learning models for object detection.

Developers can use the TFOD API to access a set of common operations without having to write code from scratch.

Object Detection Setup and Steps

To avoid compatibility issues with package versions and to take advantage of provisioned hardware resources for running an end-to-end object detection program, we will carry out this program on Google Colab.

Note: The full code for this tutorial is written in Python, is hosted here, and is meant to be followed hands-on alongside this post. The tutorial will not cover detection on a live webcam or a mobile device; nevertheless, the Jupyter notebook for this tutorial contains a commented-out code section for live webcam detection.

Step 1 - Creating our folder structure

We define some string variables to hold our custom model name, the pre-trained model name, the pre-trained model URL, and the name of the script that generates TensorFlow records.

Dictionary objects paths and files are created to hold the paths of our folder structure.
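A minimal sketch of those definitions is below. The folder layout and names here are illustrative (following the conventions of the accompanying notebook); substitute your own model and script names.

```python
import os

# Illustrative names; substitute your own
CUSTOM_MODEL_NAME = 'my_ssd_mobnet'
PRETRAINED_MODEL_NAME = 'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8'
TF_RECORD_SCRIPT_NAME = 'generate_tfrecord.py'
LABEL_MAP_NAME = 'label_map.pbtxt'

# Folder structure used throughout the tutorial
paths = {
    'SCRIPTS_PATH': os.path.join('Tensorflow', 'scripts'),
    'APIMODEL_PATH': os.path.join('Tensorflow', 'models'),
    'ANNOTATION_PATH': os.path.join('Tensorflow', 'workspace', 'annotations'),
    'IMAGE_PATH': os.path.join('Tensorflow', 'workspace', 'images'),
    'PRETRAINED_MODEL_PATH': os.path.join('Tensorflow', 'workspace', 'pre-trained-models'),
    'CHECKPOINT_PATH': os.path.join('Tensorflow', 'workspace', 'models', CUSTOM_MODEL_NAME),
}

# Individual files the later steps refer to
files = {
    'PIPELINE_CONFIG': os.path.join(paths['CHECKPOINT_PATH'], 'pipeline.config'),
    'TF_RECORD_SCRIPT': os.path.join(paths['SCRIPTS_PATH'], TF_RECORD_SCRIPT_NAME),
    'LABELMAP': os.path.join(paths['ANNOTATION_PATH'], LABEL_MAP_NAME),
}
```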

We then use a for loop to create the folder structure below.

for path in paths.values():
    if not os.path.exists(path):
        if os.name == 'posix':
            !mkdir -p {path}
        if os.name == 'nt':
            !mkdir {path}

Step 2 - Install the TFOD (TensorFlow Object Detection) API

Because we are doing this on Google Colab, most package dependencies have been taken care of for us. The steps to set up the TFOD API on other platforms are described in this documentation.

Now let's pull the repository of the TensorFlow Model Garden into our environment.

if not os.path.exists(os.path.join(paths['APIMODEL_PATH'], 'research', 'object_detection')):
    !git clone https://github.com/tensorflow/models {paths['APIMODEL_PATH']}

Next we run the provided verification script to sanity-check that we are on track.

VERIFICATION_SCRIPT = os.path.join(paths['APIMODEL_PATH'], 'research', 'object_detection', 'builders', 'model_builder_tf2_test.py')
!python {VERIFICATION_SCRIPT}

Next, we download the pre-trained model we have chosen to use. I chose 'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8'.

!wget {PRETRAINED_MODEL_URL}
!mv {PRETRAINED_MODEL_NAME+'.tar.gz'} {paths['PRETRAINED_MODEL_PATH']}

The PRETRAINED_MODEL_URL contains the direct web link for downloading the chosen pre-trained object detection model from the TensorFlow Model Zoo.

To get the model's direct web link, go here, right-click on the model of your choice, and choose "Copy link address". Assign the copied link string to PRETRAINED_MODEL_URL.
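As a sketch, the assignment looks like the following. The URL below matches the link format the TF2 Detection Model Zoo used at the time of writing, but you should verify it by copying the link from the zoo page as described above. The tarfile extraction is a cross-platform alternative to running !tar -zxvf in a cell.

```python
import os
import tarfile

PRETRAINED_MODEL_NAME = 'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8'
# Link format assumed from the TF2 Detection Model Zoo; verify against the zoo page
PRETRAINED_MODEL_URL = ('http://download.tensorflow.org/models/object_detection/'
                        'tf2/20200711/{}.tar.gz'.format(PRETRAINED_MODEL_NAME))

archive = PRETRAINED_MODEL_NAME + '.tar.gz'
if os.path.exists(archive):
    # Unpack the downloaded checkpoint into the pre-trained models folder
    with tarfile.open(archive) as tar:
        tar.extractall(os.path.join('Tensorflow', 'workspace', 'pre-trained-models'))
```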

Step 3 - Create the Label Map

with open(files['LABELMAP'], 'w') as f:
    for label in labels:
        f.write('item { \n')
        f.write('\tname:\'{}\'\n'.format(label['name']))
        f.write('\tid:{}\n'.format(label['id']))
        f.write('}\n')

The code above loops through labels, a list of dictionary objects, to create a label map.
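For reference, labels might look like this (the class names here are hypothetical; use the names and ids from your own annotations), and the same loop can be expressed as a small helper that returns the label map text:

```python
# Hypothetical classes; replace with the names and ids from your own annotations
labels = [{'name': 'thumbsup', 'id': 1}, {'name': 'thumbsdown', 'id': 2}]

def label_map_text(labels):
    """Render a list of {'name', 'id'} dicts in the pbtxt label map format."""
    out = []
    for label in labels:
        out.append('item {\n')
        out.append("\tname:'{}'\n".format(label['name']))
        out.append('\tid:{}\n'.format(label['id']))
        out.append('}\n')
    return ''.join(out)
```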

The label map file has a '.pbtxt' extension. The TFOD API references this file through the pipeline config for training execution.

Step 4 - Create TF Records

Let's download the TensorFlow record script that will help us convert our dataset to TFRecords.

The TFRecord format is a straightforward binary record storage format; reading TFRecords is faster than reading raw image files, which gives training a performance boost.

if not os.path.exists(files['TF_RECORD_SCRIPT']):
    !git clone https://github.com/nicknochnack/GenerateTFRecord {paths['SCRIPTS_PATH']}
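With the script in place, the conversion commands for each split can be built like this. The -x (image/XML directory), -l (label map), and -o (output record) flags follow that script's usage; double-check them against the GenerateTFRecord repo's README.

```python
import os

# Paths assumed from the step 1 folder structure
TF_RECORD_SCRIPT = os.path.join('Tensorflow', 'scripts', 'generate_tfrecord.py')
IMAGE_PATH = os.path.join('Tensorflow', 'workspace', 'images')
ANNOTATION_PATH = os.path.join('Tensorflow', 'workspace', 'annotations')
LABELMAP = os.path.join(ANNOTATION_PATH, 'label_map.pbtxt')

def tfrecord_command(split):
    """Build the conversion command for the 'train' or 'test' image folder."""
    return 'python {} -x {} -l {} -o {}'.format(
        TF_RECORD_SCRIPT,
        os.path.join(IMAGE_PATH, split),
        LABELMAP,
        os.path.join(ANNOTATION_PATH, split + '.record'))
```

In Colab, each command can then be executed with a shell magic, e.g. !{tfrecord_command('train')}.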

Step 5 - Copy the Model Config file to the training folder

In the code below we check the platform OS with os.name in order to use the appropriate 'copy' command: posix is for Linux-based systems, nt is for Windows.

Why are we copying the config file? Because when it's time to initiate training, the training script will require the pipeline config file as one of its arguments.

if os.name == 'posix':
    !cp {os.path.join(paths['PRETRAINED_MODEL_PATH'], PRETRAINED_MODEL_NAME, 'pipeline.config')} {os.path.join(paths['CHECKPOINT_PATH'])}
if os.name == 'nt':
    !copy {os.path.join(paths['PRETRAINED_MODEL_PATH'], PRETRAINED_MODEL_NAME, 'pipeline.config')} {os.path.join(paths['CHECKPOINT_PATH'])}

Step 6 - Update the Config for Transfer Learning

It is very important to update our configuration file. This config file is accessed by the TFOD API for training and validation operations, and in it we are required to change some variables for our custom training.

We need to change the number of classes, the pre-trained model checkpoint path, the label map path, the TFRecord paths, and a few other settings to meet the needs of our custom detection challenge.

Below is what programmatically adjusting these values looks like.

pipeline_config.model.ssd.num_classes = len(labels)
pipeline_config.train_config.batch_size = 4
pipeline_config.train_config.fine_tune_checkpoint_type = "detection"
.
.
.
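For orientation, the corresponding fields in the pipeline.config file itself look roughly like this (values and paths are illustrative; your file will contain many more fields):

```
model {
  ssd {
    num_classes: 2   # match len(labels)
  }
}
train_config {
  batch_size: 4
  fine_tune_checkpoint: "Tensorflow/workspace/pre-trained-models/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
}
```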

Step 7 - Train the Model

Finally, we train the model. The command to train is a bit lengthy, but we will look at its arguments shortly.

command = "python {} --model_dir={} --pipeline_config_path={} --num_train_steps=2000".format(TRAINING_SCRIPT, paths['CHECKPOINT_PATH'], files['PIPELINE_CONFIG'])

The command above runs the Training script and passes a few arguments to it.

--model_dir refers to the model directory; the training script will look for model files and write checkpoints there. --pipeline_config_path is the path to the pipeline config file. --num_train_steps sets the number of training iterations, here 2000.

The model's box predictions come back as normalized bounding-box coordinates (ymin, xmin, ymax, xmax), from which pixel values for xmin, ymin, xmax, ymax, width, and height can be derived.
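As a small sketch of working with those values (pure Python, no TensorFlow required), this hypothetical helper converts one normalized box in the TFOD [ymin, xmin, ymax, xmax] convention to pixel coordinates plus width and height:

```python
def box_to_pixels(box, image_width, image_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box (TFOD convention)
    to pixel coordinates plus width and height."""
    ymin, xmin, ymax, xmax = box
    left, right = xmin * image_width, xmax * image_width
    top, bottom = ymin * image_height, ymax * image_height
    return {'xmin': left, 'ymin': top, 'xmax': right, 'ymax': bottom,
            'width': right - left, 'height': bottom - top}
```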

Conclusion

We went over seven steps to train our custom object detection model with the TensorFlow Object Detection API. The code blocks here are not comprehensive; for brevity's sake, I hosted the full code on my GitHub here.

To increase your model's performance, you can train for longer, change the model architecture, or add more images to your dataset. You can do one or all of these.

Thanks for reading through. Enjoy!

Contact me

Twitter — https://twitter.com/DNnamaka

Github — https://github.com/Nnamaka

Email — nnamaka7@gmail.com
