Step by Step TensorFlow Object Detection API Tutorial — Part 4: Training the Model

Daniel Stang
Oct 24, 2017

At this point in the tutorial, you have selected a pre-trained model, found an existing dataset or created your own, and converted it into a TFRecord file. You are now ready to train your model.


Model Config File

If you have previous transfer learning experience, you likely have a question that has been lingering since Part 2 of this tutorial: how do I modify a pre-trained model that was designed to work on the 90 classes of the COCO dataset so that it works on the X classes of my new dataset? Before the Object Detection API, you would accomplish this by removing the final 90-neuron classification layer of the network and replacing it with a new layer. An example of this in TensorFlow is shown below.

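import tensorflow as tf  # TensorFlow 1.x API (tf.truncated_normal, tf.nn.xw_plus_b)
# Assume fc_2nd_last is the second-to-last fully connected layer in your network
# and nb_classes is the number of classes in your new dataset.
shape = (fc_2nd_last.get_shape().as_list()[-1], nb_classes)
# New weights and biases for the replacement classification layer.
fc_last_W = tf.Variable(tf.truncated_normal(shape, stddev=1e-2))
fc_last_b = tf.Variable(tf.zeros(nb_classes))
# Unnormalized class scores: fc_2nd_last @ fc_last_W + fc_last_b.
logits = tf.nn.xw_plus_b(fc_2nd_last, fc_last_W, fc_last_b)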

To accomplish this with the Object Detection API, all you need to do is modify one line in the model’s config file. In the directory where you cloned the TensorFlow models repository, navigate to object_detection/samples/configs. This folder contains config files for all of the pre-trained models.

Copy the config file for the model you selected into a new folder where you will perform all the training. In this new folder, create a folder called data and move your TFRecord file inside it. Create another folder called models and move the three .ckpt (checkpoint) files of your pre-trained model into it. Recall that detection_model_zoo.md contains download links for each of the pre-trained models; each download contains not only the .pb file (which we used in our Jupyter notebook in Part 1) but also the .ckpt files. Finally, inside the models folder, create another folder called train.
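If you prefer to script this setup, here is a minimal Python sketch that creates the folder layout described above. The helper name create_training_layout is hypothetical, and it only makes the folders; copying the config, TFRecord, and checkpoint files into place is still up to you.

```python
from pathlib import Path


def create_training_layout(root):
    """Create <root>/data, <root>/models and <root>/models/train.

    Hypothetical helper: it only makes the folders; copying the config,
    TFRecord and .ckpt files into them is still a manual step.
    """
    root = Path(root)
    layout = {
        "data": root / "data",      # TFRecord files and the label map go here
        "models": root / "models",  # the three pre-trained .ckpt files go here
        "train": root / "models" / "train",  # --train_dir for train.py
    }
    for folder in layout.values():
        # parents=True also creates root and models/; exist_ok makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
    return layout
```

For example, create_training_layout("training") would create training/data, training/models and training/models/train relative to the current directory, where "training" is just a placeholder name.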


Modifying the Config File

Open the config file you just copied in a text editor and, at the very top, change num_classes to the number of classes in your dataset. Next, change the fine_tune_checkpoint path to point to the model.ckpt file. If you followed the folder structure I suggested, this will be:

fine_tune_checkpoint: "models/model.ckpt"
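Edits like these can also be made from a script. The following is a minimal sketch that rewrites a field: value line in the config text with a regular expression. The helper name set_field is hypothetical, and it treats the file as plain text rather than parsing the protobuf, so it replaces every line that starts with the given field name. That is fine for num_classes and fine_tune_checkpoint, which typically appear once in the sample configs.

```python
import re


def set_field(config_text, field, value):
    """Return config_text with every `field: ...` line set to `value`.

    Hypothetical helper: a plain-text regex edit, not a protobuf parser.
    """
    # Strings are quoted in protobuf text format; numbers are written bare.
    rendered = f'"{value}"' if isinstance(value, str) else str(value)
    pattern = re.compile(
        rf"^([ \t]*{re.escape(field)}[ \t]*:[ \t]*).*$", re.MULTILINE
    )
    # A function replacement keeps backslashes in paths from being interpreted.
    new_text, n = pattern.subn(lambda m: m.group(1) + rendered, config_text)
    if n == 0:
        raise KeyError(f"{field!r} not found in config")
    return new_text
```

For example, after reading rfcn_resnet101_coco.config into a string text, set_field(text, "num_classes", 2) followed by set_field(text, "fine_tune_checkpoint", "models/model.ckpt") produces the two changes described above; write the result back to the file when you are done.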

The num_steps parameter determines how many training steps will run before training stops. The right value depends on the size of your dataset, among other factors (including how long you are willing to let the model train). Once training starts, check how long each step takes and adjust num_steps accordingly.
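As a rough starting point, you can translate a target number of passes over your data (epochs) into num_steps, assuming each training step processes one batch of batch_size examples as set in train_config. The sketch below does that arithmetic and turns a measured time per step into wall-clock hours; the helper names and the example numbers are hypothetical.

```python
import math


def steps_for_epochs(num_examples, batch_size, epochs):
    """Steps needed for `epochs` passes over `num_examples` images."""
    if num_examples <= 0 or batch_size <= 0 or epochs <= 0:
        raise ValueError("num_examples, batch_size and epochs must be positive")
    # Round up so the last partial batch of the final epoch is still covered.
    return math.ceil(epochs * num_examples / batch_size)


def hours_for_steps(num_steps, seconds_per_step):
    """Wall-clock estimate from the time per step you observe during training."""
    return num_steps * seconds_per_step / 3600
```

For example, 500 training images with batch_size: 1 and 40 epochs gives steps_for_epochs(500, 1, 40) == 20000, and at 0.5 seconds per step hours_for_steps(20000, 0.5) is about 2.8 hours.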

Next, change input_path and label_map_path for both the training and evaluation datasets. input_path simply points to your TFRecord file. Before we can set label_map_path, we need to create the file it points to: a .pbtxt file that contains the id and name of each label in your dataset. You can create it in any text editor by following the format below.

item {
id: 1
name: 'Green'
}
item {
id: 2
name: 'Red'
}
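For more than a handful of classes, generating this file avoids mistakes in the ids. Below is a minimal sketch that produces the format above with ids starting at 1; make_label_map is a hypothetical helper, and the class names are the ones from the example.

```python
def make_label_map(class_names):
    """Return label map text in the item { id, name } format.

    ids start at 1 because the Object Detection API reserves id 0 for the
    background class.
    """
    items = []
    for class_id, name in enumerate(class_names, start=1):
        # A single quote would end the quoted name early in this format.
        if "'" in name:
            raise ValueError(f"class name {name!r} contains a single quote")
        items.append(f"item {{\n  id: {class_id}\n  name: '{name}'\n}}\n")
    return "".join(items)
```

Writing the result to a file such as data/label_map.pbtxt (a placeholder name) is then a single Path(...).write_text(make_label_map(["Green", "Red"])) call with pathlib.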

Make sure the ids start at 1, not 0. I recommend placing this file in your data folder. Lastly, set num_examples in the eval_config section to the number of evaluation samples you have.
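If you are not sure how many evaluation samples ended up in your TFRecord file, you can count its records. The sketch below walks the TFRecord framing directly (an 8-byte little-endian length, a 4-byte length CRC, the payload, and a 4-byte payload CRC) so that it runs without TensorFlow; it does not verify the CRCs. count_tfrecords is a hypothetical helper, and its result equals the number of images only if you wrote one tf.train.Example per image.

```python
import os
import struct


def count_tfrecords(path):
    """Count the records in a TFRecord file without verifying checksums."""
    count = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated length header")
            (length,) = struct.unpack("<Q", header)
            # Skip the length CRC (4 bytes), the payload, and the payload CRC (4 bytes).
            f.seek(4 + length + 4, os.SEEK_CUR)
            if f.tell() > size:
                raise ValueError("truncated record payload")
            count += 1
    return count
```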


Training

Navigate to the object_detection folder and copy train.py into your newly created training folder. To begin training, open a terminal in this folder (making sure you have followed the install instructions in Part 1) and run:

python train.py --logtostderr --train_dir=./models/train --pipeline_config_path=rfcn_resnet101_coco.config

where pipeline_config_path points to your config file. Training will now begin. Depending on your system, training can take a few minutes to start, so as long as it hasn’t crashed or stopped, give it more time.

If training fails because you are running out of memory, there are several things you can try. First, try adding the parameters

batch_queue_capacity: 2
prefetch_queue_capacity: 2

to the train_config section of your config file. For example, placing the two lines between gradient_clipping_by_norm and fine_tune_checkpoint will work. The value 2 is only a starting point to get training running; the defaults for these parameters are 8 and 10, respectively, and increasing them should help speed up training.
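If you edit configs from a script, the two lines can be inserted programmatically. Field order inside a protobuf text-format message does not matter, so this minimal sketch simply adds them right after the opening train_config: { line rather than next to gradient_clipping_by_norm. add_train_config_fields is a hypothetical helper that works on the raw text and does not check whether the fields are already present.

```python
import re


def add_train_config_fields(config_text, fields):
    """Insert `key: value` lines right after the opening `train_config: {` line.

    Hypothetical helper: plain-text edit; it does not detect existing entries,
    so adding a field that is already set would produce a duplicate.
    """
    match = re.search(
        r"^([ \t]*)train_config[ \t]*:?[ \t]*\{[ \t]*\n", config_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'train_config: {' block found")
    # Indent the new lines one level deeper than the train_config line.
    indent = match.group(1) + "  "
    new_lines = "".join(f"{indent}{key}: {value}\n" for key, value in fields.items())
    return config_text[: match.end()] + new_lines + config_text[match.end():]
```

For example, add_train_config_fields(text, {"batch_queue_capacity": 2, "prefetch_queue_capacity": 2}) adds the two lines shown above.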

That’s it, you have now started the training that will fine-tune your model! If you want a better idea of how training is progressing, look into using TensorBoard.

In the next post I’ll show you how to save your trained model and deploy it in a project!