Making a Custom Object Detector using a Pre-trained Model in Tensorflow

Ken Lee
Jul 11, 2018 · 7 min read


***The procedures in this article are not applicable to the most recent Tensorflow models repo. Expect an update.***

Recent developments in computer vision have enabled exciting new technologies like self-driving cars, gesture recognition, and machine vision. The processing power required to create computer vision models was a barrier to entry for those interested in exploring this technology. However, this is no longer the case with pre-trained models today.

Instead of training your own model from scratch, you can build on existing models and fine-tune them for your own purpose without requiring as much computing power.

In this tutorial, we’re going to get our hands dirty and train our own corgi detector using a pre-trained SSD MobileNet V2 model.

1. Installation

This tutorial is based on an Anaconda virtual environment with Python 3.6.

1.1 Tensorflow

Install Tensorflow using the following command:

$ pip install tensorflow

If you have a GPU that you can use with Tensorflow:

$ pip install tensorflow-gpu

1.2 Other dependencies

$ pip install pillow Cython lxml jupyter matplotlib

Install protobuf using Homebrew (you can learn more about Homebrew here):

$ brew install protobuf

For protobuf installation on other OS, follow the instructions here.

1.3 Clone the Tensorflow models repository

In this tutorial, we’re going to use resources in the Tensorflow models repository. Since it does not come with the Tensorflow installation, we need to clone it from their Github repo:

First change into the Tensorflow directory:

$ cd <path_to_your_tensorflow_installation>
# For example: ~/anaconda/envs/<your_env_name>/lib/python3.6/site-packages/tensorflow

Clone the Tensorflow models repository:

$ git clone https://github.com/tensorflow/models.git

From this point on, this directory will be referred to as the models directory.

1.4 Setting up the environment

Every time you start a new terminal window to work with the pre-trained models, it is important to compile Protobuf and change your PYTHONPATH.

Run the following from your terminal:

$ cd <path_to_your_tensorflow_installation>/models/research/
$ protoc object_detection/protos/*.proto --python_out=.
$ export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Run a quick test to confirm that the Object Detection API is working properly:

$ python object_detection/builders/model_builder_test.py

If the result looks like the following, you’re ready to proceed to the next steps!

...............
----------------------------------------------------------------------
Ran 15 tests in 0.123s

OK

1.5 Recommended folder structure

To make this tutorial easier to follow along, create the following folder structure within the models directory you just cloned:

models 
├── annotations
| └── xmls
├── images
├── checkpoints
├── tf_record
├── research
...

These folders will be used to store required components for our model as we proceed.
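A single command creates all of them:

# From the models directory
$ mkdir -p annotations/xmls images checkpoints tf_record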

2. Collect images

Data preparation is the most important part of training your own model. Since we’re going to train a corgi detector, we must collect pictures of corgis! About 200 of them would be sufficient.

I recommend using google-images-download to download images. It searches Google Images and downloads images based on the inputs you provide. In the inputs, you can specify search parameters such as keywords, number of images, image format, image size, and usage rights.

Since we’re downloading more than 100 images at a time, we need a chromedriver in the models directory (download here). Once you have the chromedriver ready, you can use this sample command to download images. Make sure all your images are in the jpg format:

# From the models directory
$ googleimagesdownload --keywords 'welsh corgi dog' \
--limit 200 \
--size medium \
--chromedriver ./chromedriver \
--format jpg

After downloading, save all images to models/images/. To make subsequent processes easier, let's rename the images as numbers (e.g. 1.jpg, 2.jpg) by running the following script:
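A minimal sketch of such a script, assuming it is run from the models directory and every downloaded file is a .jpg:

import os

# Collect the downloaded images and rename them to 1.jpg, 2.jpg, ...
image_dir = 'images'
jpgs = sorted(f for f in os.listdir(image_dir) if f.lower().endswith('.jpg'))
for index, filename in enumerate(jpgs, start=1):
    os.rename(os.path.join(image_dir, filename),
              os.path.join(image_dir, '{}.jpg'.format(index)))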

3. Label your data set

Once you’ve collected all the images you need, the next step is to label them manually. There are many packages that serve this purpose. labelImg is a popular choice.

labelImg provides a user-friendly GUI. Plus, it saves label files (.xml) in the popular Pascal VOC format. Here's what a labelled image looks like in labelImg:

Example of a labelled corgi in labelImg

Double check that every image has a corresponding .xml file and save them in models/annotations/xmls/.

4. Create Label Map (.pbtxt)

Classes need to be listed in the label map. Since we’re only detecting corgis, the label map should contain only one item like the following:

label_map.pbtxt
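Any class name works here as long as it matches the name your create_tf_record.py script writes into the records; assuming we call the class corgi, the whole file is:

item {
  id: 1
  name: 'corgi'
}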

Note that id must start from 1, because 0 is a reserved id.

Save this file as label_map.pbtxt in models/annotations/.

5. Create trainval.txt

trainval.txt is a list of image names without file extensions. Since we have sequential numbers for image names, the list should look like this:

trainval.txt
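For 200 images named 1.jpg through 200.jpg, that is simply:

1
2
3
...
200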

Save this file as trainval.txt in models/annotations/.

6. Create TFRecord (.record)

TFRecord is an important data format designed for Tensorflow. (Read more about it here). Before you can train your custom object detector, you must convert your data into the TFRecord format.

Since we need to train as well as validate our model, the data set will be split into a training set (train.record) and a validation set (val.record). The purpose of the training set is straightforward: it is the set of examples the model learns from. The validation set is a set of examples used during training to iteratively assess model accuracy.

We’re going to use create_tf_record.py to convert our data set into train.record and val.record. Download here and save it to models/research/object_detection/dataset_tools/.
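Under the hood, each image and its bounding box become a tf.train.Example. Here’s a rough, simplified sketch of the idea (the feature keys follow the Object Detection API’s conventions; the real script reads the .xml labels and writes a few more fields than shown):

import tensorflow as tf

def make_tf_example(jpg_path, xmin, ymin, xmax, ymax, width, height):
    with open(jpg_path, 'rb') as f:
        encoded_jpg = f.read()
    # Box coordinates are stored normalized to [0, 1].
    feature = {
        'image/encoded': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[encoded_jpg])),
        'image/format': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[b'jpg'])),
        'image/object/bbox/xmin': tf.train.Feature(
            float_list=tf.train.FloatList(value=[xmin / width])),
        'image/object/bbox/xmax': tf.train.Feature(
            float_list=tf.train.FloatList(value=[xmax / width])),
        'image/object/bbox/ymin': tf.train.Feature(
            float_list=tf.train.FloatList(value=[ymin / height])),
        'image/object/bbox/ymax': tf.train.Feature(
            float_list=tf.train.FloatList(value=[ymax / height])),
        'image/object/class/text': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[b'corgi'])),
        'image/object/class/label': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[1])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# One writer per split; each example is serialized into the .record file.
writer = tf.python_io.TFRecordWriter('tf_record/train.record')
writer.write(make_tf_example('images/1.jpg', 10, 20, 200, 180, 640, 480).SerializeToString())
writer.close()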

This script is preconfigured to do a 70–30 train-val split. Execute it by running:

# From the models directory
$ python research/object_detection/dataset_tools/create_tf_record.py

If the script is executed successfully, train.record and val.record should appear in your models/research/ directory. Move them into the models/tf_record/ directory.

7. Download pre-trained model

There are many pre-trained object detection models available in the model zoo. In order to train them using our custom data set, the models need to be restored in Tensorflow using their checkpoints (.ckpt files), which are records of previous model states.

For this tutorial, we’re going to download ssd_mobilenet_v2_coco here and save its model checkpoint files (model.ckpt.meta, model.ckpt.index, model.ckpt.data-00000-of-00001) to our models/checkpoints/ directory.
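If you prefer the command line, something like the following fetches and unpacks it (the URL matches the model zoo’s naming at the time of writing):

# From the models directory
$ wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
$ tar -xzf ssd_mobilenet_v2_coco_2018_03_29.tar.gz
$ cp ssd_mobilenet_v2_coco_2018_03_29/model.ckpt.* checkpoints/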

8. Modify Config (.config) File

Each of the pretrained models has a config file that contains details about the model. To detect our custom class, the config file needs to be modified accordingly.

The config files are included in the models directory you cloned in the very beginning. You can find them in:

models/research/object_detection/samples/configs

In our case, we’ll modify the config file for ssd_mobilenet_v2_coco. Make a copy of it first and save it in the models/ directory.

Here are the items we need to change:

  1. Since we’re only trying to detect corgis, change num_classes to 1
  2. fine_tune_checkpoint tells the model which checkpoint file to use. Set this to checkpoints/model.ckpt
  3. The model also needs to know where the TFRecord files and label maps are for both training and validation sets. Since our train.record and val.record are saved in tf_record folder, our config should reflect that:
train_input_reader: {
  tf_record_input_reader {
    input_path: "tf_record/train.record"
  }
  label_map_path: "annotations/label_map.pbtxt"
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "tf_record/val.record"
  }
  label_map_path: "annotations/label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
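For reference, the first two changes live in the model and train_config sections of the same file (the ellipses stand for the surrounding options, which stay as they came):

model {
  ssd {
    num_classes: 1
    ...
  }
}

train_config: {
  fine_tune_checkpoint: "checkpoints/model.ckpt"
  ...
}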

9. Train

At this point, your models directory should look like this:

models 
├── annotations
| ├── label_map.pbtxt
| ├── trainval.txt
| └── xmls
| ├── 1.xml
| ├── 2.xml
| ├── ...
|
├── images
| ├── 1.jpg
| ├── 2.jpg
| ├── ...
|
├── checkpoints
| ├── model.ckpt.data-00000-of-00001
| ├── model.ckpt.index
| └── model.ckpt.meta
|
├── tf_record
| ├── train.record
| └── val.record
|
├── research
| ├── ...
...

If you have successfully completed all previous steps, you’re ready to start training!

Follow the steps below:

# Change into the models directory
$ cd <path_to_your_tensorflow_installation>/models

# Make directory for storing training progress
$ mkdir train

# Make directory for storing validation results
$ mkdir eval

# Begin training
$ python research/object_detection/train.py \
--logtostderr \
--train_dir=train \
--pipeline_config_path=ssd_mobilenet_v2_coco.config

Training time varies depending on the computing power of your machine.

10. Evaluation

Evaluation can be run in parallel with training. The eval.py script checks the train directory for progress and evaluates the model based on the most recent checkpoint.

# From the models directory
$ python research/object_detection/eval.py \
--logtostderr \
--pipeline_config_path=ssd_mobilenet_v2_coco.config \
--checkpoint_dir=train \
--eval_dir=eval

You can visualize model training progress using Tensorboard:

# From the models directory
$ tensorboard --logdir=./

Based on the graphs output by Tensorboard, you may decide when you want to stop training. Usually, you may stop the process when the loss function is tapering off and no longer decreasing by a significant amount. In my case, I stopped at step 3258.

11. Model export

Once you finish training your model, you can export your model to be used for inference. If you’ve been following the folder structure, use the following command:

# From the models directory
$ mkdir fine_tuned_model
$ python research/object_detection/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path ssd_mobilenet_v2_coco.config \
--trained_checkpoint_prefix train/model.ckpt-<the_highest_checkpoint_number> \
--output_directory fine_tuned_model

12. Classify images

Now that you have a model, you can use it to detect corgis in pictures and videos! For the purpose of demonstration, we’re going to detect corgis in an image. Before you proceed, pick an image you want to test the model with.

The models directory comes with a notebook file (.ipynb) that we can use to run inference with a few tweaks. It is located at models/research/object_detection/object_detection_tutorial.ipynb. Follow the steps below to tweak the notebook:

  1. MODEL_NAME = 'ssd_mobilenet_v2_coco_2018_03_29'
  2. PATH_TO_CKPT = 'path/to/your/frozen_inference_graph.pb'
  3. PATH_TO_LABELS = 'models/annotations/label_map.pbtxt'
  4. NUM_CLASSES = 1
  5. Comment out cell #5 completely (just below Download Model)
  6. Since we’re only testing on one image, comment out PATH_TO_TEST_IMAGES_DIR and TEST_IMAGE_PATHS in cell #9 (just below Detection)
  7. In cell #11 (the last cell), remove the for-loop, unindent its content, and add path to your test image:

image_path = 'path/to/image_you_want_to_test.jpg'
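With that in place, the whole cell ends up looking roughly like this (load_image_into_numpy_array, run_inference_for_single_image, category_index, and IMAGE_SIZE are all defined in earlier cells; the names reflect the notebook as of mid-2018, so adjust if your copy differs):

image_path = 'path/to/image_you_want_to_test.jpg'
image = Image.open(image_path)
image_np = load_image_into_numpy_array(image)
output_dict = run_inference_for_single_image(image_np, detection_graph)
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    output_dict['detection_boxes'],
    output_dict['detection_classes'],
    output_dict['detection_scores'],
    category_index,
    instance_masks=output_dict.get('detection_masks'),
    use_normalized_coordinates=True,
    line_thickness=8)
plt.figure(figsize=IMAGE_SIZE)
plt.imshow(image_np)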

After making these tweaks, run the notebook and you should see the corgi in your test image highlighted by a bounding box!

Corgi found by our custom object detector

There you have your custom corgi detector! In the next tutorial, I’ll be walking you through the set up of real-time object detection on your webcam. Stay tuned!

More details

Tensorflow Object Detection Model Documentation
