Custom Object Detection Using TensorFlow in Google Colab

Matus Tanonwong
7 min read · Jun 17, 2020

--

Google Colab is a free cloud service that is extremely useful for sharpening our programming skills in languages such as Python, running code on a free GPU, and building deep learning applications with libraries such as Keras, PyTorch, OpenCV, and TensorFlow.

In this tutorial, we will write Python code in Google Colab to build and train a Totoro-and-Nekobus detector, using both the pre-trained SSD MobileNet V1 model and the pre-trained SSD MobileNet V2 model. Each pre-trained model has pros and cons, which we will discuss later.

By following the instructions below step by step, we can build and train our own object detector.

1. Installation

1.1. TensorFlow

First and foremost, we need to install some required libraries. Since Python and TensorFlow come pre-installed in Google Colab and a GPU is provided, we only need to import the necessary libraries and select the version of TensorFlow to train our model with. In this case, we will stick with TensorFlow version 1.

We run the following code to select the TensorFlow version and check which version we are currently using.

Note that tf_slim has been removed from recent versions of TensorFlow, so we need to install it ourselves.
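A minimal sketch of this step (the %tensorflow_version magic is specific to Google Colab):

```python
# Switch Colab to the TensorFlow 1.x runtime and confirm the version in use
%tensorflow_version 1.x
import tensorflow as tf
print(tf.__version__)  # e.g. 1.15.x

# tf_slim is no longer bundled with newer TensorFlow releases, so install it explicitly
!pip install tf_slim
```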

1.2. Clone the TensorFlow models repository

The TensorFlow models are available in a GitHub repository. We simply clone it to our Google Colab instance, since it would take forever to build these models from scratch.

First, change the directory to '/root' and then clone the GitHub repo into Google Colab.
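In Colab, two commands cover this step:

```python
# Move to /root and clone the TensorFlow models repository
%cd /root
!git clone https://github.com/tensorflow/models.git
```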

1.3. Install TensorBoard

We set up TensorBoard beforehand with the following code.
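The ngrok link in the next paragraph suggests TensorBoard is exposed through an ngrok tunnel; here is a sketch along those lines, assuming the training directory '/root/models/trained' used later in this tutorial:

```python
# Download ngrok, which exposes the local TensorBoard port as a public URL
!wget -q https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip -o -q ngrok-stable-linux-amd64.zip

# Start TensorBoard in the background, watching the training directory used later
LOG_DIR = '/root/models/trained'
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'.format(LOG_DIR))

# Tunnel port 6006 through ngrok and print the public link
get_ipython().system_raw('./ngrok http 6006 &')
!curl -s http://localhost:4040/api/tunnels | python3 -c \
  "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
```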

As a result, we will see a link, for example, http://05ec17a84e88.ngrok.io. Once we reach the training step, opening this link will show the training-loss graphs being plotted in real time; TensorBoard refreshes automatically every 30 seconds.

1.4. Set up the environment

Keep in mind that we must compile the Protobuf libraries and set our Python path every time we start a new session to run the Python code inside the TensorFlow models.
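The standard TF1 Object Detection API setup looks like this:

```python
# Compile the Protobuf definitions used by the Object Detection API
%cd /root/models/research
!protoc object_detection/protos/*.proto --python_out=.

# Make the research/ and slim/ packages importable
import os
os.environ['PYTHONPATH'] += ':/root/models/research/:/root/models/research/slim/'
```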

Let’s run a quick test to verify that the Object Detection API operates properly.
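The API ships with a test suite for exactly this purpose:

```python
# If the setup is correct, this test suite finishes with "OK"
!python /root/models/research/object_detection/builders/model_builder_test.py
```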

If no errors have occurred up to this point, we are ready to proceed to the next step.

2. Collect images, label data set, and create label map

First of all, we have to collect images of Totoro and Nekobus, either from the internet or from videos, and save them to a local drive on our PC.

Then we download and install the labelImg tool on the PC and open it.

Next, we click "Open" on the toolbar on the left and choose the images we want to annotate.

After that, we click "Create RectBox" on the toolbar, draw a box around each object, and label it either "Totoro" or "Nekobus".

Lastly, we click "Save" to save the annotation as an XML file, which contains the details of the labeled image.

After annotating, we can upload our files directly to Google Colab: at the top left corner of the screen there is an "Upload" button. Click it and manually upload the desired files.

For simplicity, we will instead clone pre-labeled images and the label map from a GitHub repo into our 'models' directory on Google Colab.
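A sketch of the clone, with a placeholder URL since the data set repository is not named here:

```python
# Clone the pre-labeled data set into /root/models/totoro
# (placeholder URL; substitute the repository that actually hosts the images and label map)
%cd /root/models
!git clone https://github.com/<username>/totoro-dataset.git totoro
```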

3. Convert XML files to a single CSV file

At this point, we have annotations for the pre-labeled images of Totoro and Nekobus in the form of .xml files. Since we prefer a single file for creating TFRecords, TensorFlow's native data format, we will convert all the XML files into one CSV file by running the code below.
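A self-contained sketch of the conversion; the input and output paths are assumptions, so adjust them to the layout of the cloned data set:

```python
import glob
import pandas as pd
import xml.etree.ElementTree as ET

def xml_to_csv(path):
    """Collect every bounding box from the Pascal VOC XML files into one DataFrame."""
    rows = []
    for xml_file in glob.glob(path + '/*.xml'):
        root = ET.parse(xml_file).getroot()
        size = root.find('size')
        for obj in root.findall('object'):
            box = obj.find('bndbox')
            rows.append((root.find('filename').text,
                         int(size.find('width').text),
                         int(size.find('height').text),
                         obj.find('name').text,
                         int(box.find('xmin').text),
                         int(box.find('ymin').text),
                         int(box.find('xmax').text),
                         int(box.find('ymax').text)))
    return pd.DataFrame(rows, columns=['filename', 'width', 'height', 'class',
                                       'xmin', 'ymin', 'xmax', 'ymax'])

# Assumed locations; adjust to where the annotations and tfrecord folder actually live
labels = xml_to_csv('/root/models/totoro/images')
labels.to_csv('/root/models/totoro/tfrecord/labels.csv', index=False)
```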

4. Create TFRecords

As mentioned above, TFRecord is TensorFlow's native data format. We must transform our data into TFRecord format before training our custom object detector.

The data set will be divided into two sets: a training set (train.record) and a validation set (test.record).

Note that the validation set is not the set of images we will use to test our models; it is the set we use to check whether our model is overfitting.

To transform our data into TFRecord format, we will use the Python script 'generate_tf_record.py' that we cloned from the GitHub repo in step 2. The script is stored under '/root/models/totoro/tfrecord/'.
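Invoking it might look like this (the exact flags depend on how generate_tf_record.py is written, so check its argument parser):

```python
%cd /root/models/totoro/tfrecord
# The script reads the labeled data and writes train.record and test.record
!python generate_tf_record.py
```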

After running the code above, the output will show "Successfully created the TFRecords: /root/models/totoro/tfrecord/train.record" and "Successfully created the TFRecords: /root/models/totoro/tfrecord/test.record" respectively.

5. Download pre-trained model

There are many pre-trained object detection models available in the model zoo. To fine-tune on our custom data set, we need a model's checkpoint files (.ckpt), which record the states of the pre-trained model.

To demonstrate this, we are going to download the 'ssd_mobilenet_v1_coco' pre-trained model from the model zoo.

For convenience's sake, we move our config file to the 'models' directory.
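A sketch of the download, assuming we start from the sample config that ships with the Object Detection API:

```python
# Download and unpack the SSD MobileNet V1 checkpoint from the TF1 model zoo
MODEL = 'ssd_mobilenet_v1_coco_2018_01_28'
%cd /root/models
!wget -q http://download.tensorflow.org/models/object_detection/{MODEL}.tar.gz
!tar -xzf {MODEL}.tar.gz
!mv {MODEL} pretrained_model

# Copy the matching sample config next to it for editing in the next step
!cp /root/models/research/object_detection/samples/configs/ssd_mobilenet_v1_coco.config /root/models/
```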

6. Modify the config file

To modify the config file, we will load it as a 'pipeline' config object and change its values programmatically.

Note that this 'pipeline' workflow is specific to TensorFlow version 1.

Here are the items we need to change:

  1. Since we want to detect Totoro and Nekobus, change "num_classes" to 2.
  2. "fine_tune_checkpoint" indicates which checkpoint file to use. Set it to '/root/models/pretrained_model/model.ckpt'.
  3. The model needs to know where the TFRecord files and label maps are for both the training and validation sets, so change their paths to the absolute paths where they are kept.
  4. "num_steps" is the total number of training steps. The more steps we run, the lower our model's loss is likely to be. A sketch applying all four changes follows this list.
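One way to apply all four changes is through the API's pipeline proto. The label-map path and step count below are assumptions; point them at your own files:

```python
import tensorflow as tf
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

CONFIG_PATH = '/root/models/ssd_mobilenet_v1_coco.config'

# Parse the config into a pipeline proto
pipeline = pipeline_pb2.TrainEvalPipelineConfig()
with tf.gfile.GFile(CONFIG_PATH, 'r') as f:
    text_format.Merge(f.read(), pipeline)

# 1. Two classes: totoro and nekobus
pipeline.model.ssd.num_classes = 2
# 2. Start from the downloaded checkpoint
pipeline.train_config.fine_tune_checkpoint = '/root/models/pretrained_model/model.ckpt'
# 3. Point both input readers at our TFRecords and label map (assumed paths);
#    in older API versions eval_input_reader is not repeated, so drop the [0] if needed
LABEL_MAP = '/root/models/totoro/tfrecord/label_map.pbtxt'
pipeline.train_input_reader.tf_record_input_reader.input_path[:] = [
    '/root/models/totoro/tfrecord/train.record']
pipeline.train_input_reader.label_map_path = LABEL_MAP
pipeline.eval_input_reader[0].tf_record_input_reader.input_path[:] = [
    '/root/models/totoro/tfrecord/test.record']
pipeline.eval_input_reader[0].label_map_path = LABEL_MAP
# 4. Total number of training steps (an example value; increase for lower loss)
pipeline.train_config.num_steps = 20000

# Write the modified config back
with tf.gfile.GFile(CONFIG_PATH, 'w') as f:
    f.write(text_format.MessageToString(pipeline))
```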

After the code above is run, the config file will be modified.

7. Train our model

At this point, we are ready to train our model.

Remember that we must compile the Protobuf libraries and set our Python path again whenever we have started a new session.

After running the following code, our checkpoints and results will be saved under '/root/models/trained'.
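A sketch using the TF1 legacy training script:

```python
# Recompile the protos and reset PYTHONPATH (needed again in a fresh session)
%cd /root/models/research
!protoc object_detection/protos/*.proto --python_out=.
import os
os.environ['PYTHONPATH'] += ':/root/models/research/:/root/models/research/slim/'

# Train, saving checkpoints and event logs to /root/models/trained
!python /root/models/research/object_detection/legacy/train.py \
    --logtostderr \
    --train_dir=/root/models/trained \
    --pipeline_config_path=/root/models/ssd_mobilenet_v1_coco.config
```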

8. Export trained model

Next, we export our trained model so it can be used for inference.

Since "num_steps" can vary, we need to write code that finds the latest checkpoint of our trained model.
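tf.train.latest_checkpoint handles the lookup, and the API's export_inference_graph.py freezes the graph:

```python
import tensorflow as tf

# Read the 'checkpoint' file and return the newest checkpoint prefix
last_ckpt = tf.train.latest_checkpoint('/root/models/trained')
print('Exporting', last_ckpt)

# Freeze the trained graph for inference
!python /root/models/research/object_detection/export_inference_graph.py \
    --input_type=image_tensor \
    --pipeline_config_path=/root/models/ssd_mobilenet_v1_coco.config \
    --trained_checkpoint_prefix={last_ckpt} \
    --output_directory=/root/models/fine_tuned_model
```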

With that, we have built and trained our custom object detector.

9. Classify images

We are now ready to test our detector by detecting Totoro and Nekobus in pictures.

Before going any further, there are some necessary libraries we need to import:
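These mirror the imports used by the standard Object Detection API demo notebook:

```python
import numpy as np
import tensorflow as tf
from PIL import Image
from matplotlib import pyplot as plt

# Utilities shipped with the Object Detection API
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
```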

Then we need to tell the program which model to use, the number of classes in our trained model, and the path to the frozen detection graph:
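The paths below follow the directories used in the earlier steps; adjust them if yours differ:

```python
PATH_TO_FROZEN_GRAPH = '/root/models/fine_tuned_model/frozen_inference_graph.pb'
PATH_TO_LABELS = '/root/models/totoro/tfrecord/label_map.pbtxt'  # assumed label map location
NUM_CLASSES = 2  # totoro and nekobus
```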

Next, run the following code to see if our category indexes are correct.
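This uses the label_map_util helpers from the Object Detection API:

```python
# Build the category index from the label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
print(category_index)
```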

The result should be {1: {'id': 1, 'name': 'totoro'}, 2: {'id': 2, 'name': 'nekobus'}}.

After that, we add the paths of the images we want to test to "TEST_IMAGE_PATHS" and run the code below.
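A sketch of the detection loop, condensed from the API's demo notebook; the image paths are hypothetical placeholders:

```python
# Hypothetical test images; replace with the files you uploaded
TEST_IMAGE_PATHS = ['/root/models/totoro/test/image1.jpg',
                    '/root/models/totoro/test/image2.jpg']

# Load the frozen graph exported in step 8
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

# Run detection on each test image and draw the predicted boxes
with detection_graph.as_default(), tf.Session() as sess:
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    output_tensors = [detection_graph.get_tensor_by_name(n) for n in
                      ('detection_boxes:0', 'detection_scores:0', 'detection_classes:0')]
    for image_path in TEST_IMAGE_PATHS:
        image_np = np.array(Image.open(image_path))
        boxes, scores, classes = sess.run(
            output_tensors, feed_dict={image_tensor: np.expand_dims(image_np, 0)})
        vis_util.visualize_boxes_and_labels_on_image_array(
            image_np,
            np.squeeze(boxes),
            np.squeeze(classes).astype(np.int32),
            np.squeeze(scores),
            category_index,
            use_normalized_coordinates=True,
            line_thickness=8)
        plt.figure(figsize=(12, 8))
        plt.imshow(image_np)
```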

Finally, our results should look like the following:

Results from Detector with ‘ssd_mobilenet_v1_coco’ Pre-trained Model

If we replace the 'ssd_mobilenet_v1_coco' pre-trained model with the 'ssd_mobilenet_v2_coco' pre-trained model and repeat all the steps above, we get the following results:

Results from Detector with ‘ssd_mobilenet_v2_coco’ Pre-trained Model

Comparing the two sets of results, the detector built on the 'ssd_mobilenet_v1_coco' pre-trained model detects nearby objects more accurately and precisely than the one built on 'ssd_mobilenet_v2_coco'. The V2-based detector is overly sensitive, which leads it to detect and frame the same object more than once.

On the other hand, the V1-based detector is less effective than the V2-based one at detecting distant objects. If we look closely at the bottom-right picture, we can see that the prediction confidence of the V1-based detector is drastically lower than that of the V2-based detector.

Here are the Python notebook and GitHub links for reference:

https://github.com/tensorflow/models
