Training an Object Detection Model with TensorFlow API using Google COLAB

Nathaniel O Solomon · Published in Analytics Vidhya · 12 min read · Oct 29, 2019

Updated: 5:23 am 19th of April, 2020.
Click here to get the Notebook

The original image, without the anchor box, was taken from Google Images.

Colab offers free access to a virtual machine with a reasonable GPU, and even a TPU. It is a cloud service based on Jupyter Notebooks, and an internet connection is required for access.

A few things you should know about Colab

  • The virtual machine lets absolutely anyone develop deep learning applications using popular libraries such as PyTorch, TensorFlow, Keras, and OpenCV.
  • It supports Python 2.7 and 3.6.
  • There is a limit to your session length and storage size, but you can work around that if you’re creative and don’t mind occasionally re-uploading your files.
  • You have the instance for 12 hours. After that, everything on the assigned machine is wiped clean.
  • You can, and should, use it together with Google Drive for storage.

Let’s get started.

Note: Some of the steps can be done offline and the results uploaded to Google Drive, for example, image annotation and creating the Python scripts.

Step 1: Create a directory in your Google Drive where you can save all the files needed for training the model.

Step 2: Go to Colab, sign in with the same Google account used for Google Drive, and create a new notebook.

Step 3: In the notebook, go to Runtime > Change Runtime Type and make sure GPU is selected as the hardware accelerator.

Step 4: Run the code in the cells below

%tensorflow_version 1.x

Run the code above because Colab has upgraded its default TensorFlow version, and the Object Detection API had not yet been updated for TensorFlow 2.0 as of the time this article was last revised.

import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
print(tf.__version__)

You should see ‘Found GPU’ and a tf version of 1.x.

NB: TensorFlow 2.x is not supported as of the time of this review.

Step 5: Mount Google Drive with the code below and click on the link. Sign in with your Google Drive account and grant it access. You will be redirected to a page; copy the code on that page, paste it into the text box of the Colab session you are running, and hit the ENTER key.

from google.colab import drive
drive.mount('/content/gdrive')

Step 6: Change directory to the folder you created initially on your Google Drive. In my case, I named the folder Desktop.

%cd '/content/gdrive/My Drive/Desktop/'

Step 7: Clone the TensorFlow models repository. You may also clone the COCO repository and install the COCO object detection API for evaluation purposes.

!git clone https://github.com/tensorflow/models.git

Step 8: Install some needed tools and dependencies.

!apt-get install protobuf-compiler python-pil python-lxml python-tk
!pip install Cython

Compile the protobuf model definitions. Note all the directories, as yours might differ. In my case, research is inside models, inside the Desktop folder in My Drive.

%cd /content/gdrive/My Drive/Desktop/models/research/
!protoc object_detection/protos/*.proto --python_out=.

Set the environment variable:

import os
os.environ['PYTHONPATH'] += ':/content/gdrive/My Drive/Desktop/models/research/:/content/gdrive/My Drive/Desktop/models/research/slim'

Always run the code below after every session restart:

!python setup.py build
!python setup.py install

Note: if you wish to know how many hours remain in your Colab session, copy and run the code below.

import time, psutil

start = time.time() - psutil.boot_time()   # seconds since this VM booted
left = 12 * 3600 - start                   # sessions last roughly 12 hours
print('Time remaining for this session is: ', left / 3600)

Test with the code in the snippet below to see whether everything we need for training has been installed.

%cd /content/gdrive/My Drive/Desktop/models/research/object_detection/builders/
!python model_builder_test.py

You should see output similar to the one below.

Running tests under Python 3.6.9: /usr/bin/python3
[ RUN ]  ModelBuilderTest.test_create_experimental_model
[ OK ]   ModelBuilderTest.test_create_experimental_model
[ RUN ]  ModelBuilderTest.test_create_faster_rcnn_model_from_config_with_example_miner
[ OK ]   ModelBuilderTest.test_create_faster_rcnn_model_from_config_with_example_miner
……
[ RUN ]  ModelBuilderTest.test_unknown_meta_architecture
[ RUN ]  ModelBuilderTest.test_unknown_ssd_feature_extractor
[ OK ]   ModelBuilderTest.test_unknown_ssd_feature_extractor
----------------------------------------------------------------------
Ran 17 tests in 0.180s

OK (skipped=1)

At this point you should have a folder in the object detection directory that contains your train and test images, with a corresponding XML annotation file for each image. We need to create TensorFlow record files from these XML files. For this, I will adapt some of Dat Tran’s code for converting XML to CSV and generating TFRecords, with a few corrections to suit my needs.

Step 9: Copy and paste the code below and run the cell to perform the XML-to-CSV conversion.

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET


def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main(directory_list):
    for image_cat in directory_list:
        image_path = os.path.join(os.getcwd(), 'images/{}'.format(image_cat))
        xml_df = xml_to_csv(image_path)
        xml_df.to_csv('data/{}_labels.csv'.format(image_cat), index=None)
        print('Successfully converted xml to csv.')


main(['train', 'test'])

NOTE: Make sure you have folders named ‘training’, ‘data’ and ‘images’ in the object detection folder (the folder name must match the lowercase ‘images’ used in the code above). Put the datasets in the images folder, split into two sub-folders named train and test. The CSV files will be saved in the data folder.
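If these folders don’t exist yet, a quick cell like the sketch below (my own addition; the layout simply mirrors the paths used throughout this walkthrough) creates them all:

import os

# create the folder layout expected by the scripts in this walkthrough
for folder in ['training', 'data', 'images/train', 'images/test']:
    os.makedirs(folder, exist_ok=True)  # no error if a folder already exists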

Step 10: Create a Python file named generate_tfrecord.py: copy the code below into it, editing the necessary parts, then upload it into the object detection directory, or simply download the generate_tfrecord.py file if the same configuration applies to you.

NB: the “# TO-DO replace this with label map” section of the code below explains how to use the script with multiple labels.

"""
Usage:
# Create train data:
python generate_tfrecord.py --label=<LABEL> --csv_input=<PATH_TO_ANNOTATIONS_FOLDER>/train_labels.csv --output_path=<PATH_TO_ANNOTATIONS_FOLDER>/train.record
# Create test data:
python generate_tfrecord.py --label=<LABEL> --csv_input=<PATH_TO_ANNOTATIONS_FOLDER>/test_labels.csv --output_path=<PATH_TO_ANNOTATIONS_FOLDER>/test.record
"""
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import
import os
import io
import pandas as pd
import tensorflow as tf
import sys
sys.path.append("../../models/research")
from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict
flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('label', '', 'Name of class label')
# if your image has more labels input them as
# flags.DEFINE_string('label0', '', 'Name of class[0] label')
# flags.DEFINE_string('label1', '', 'Name of class[1] label')
# and so on.
flags.DEFINE_string('img_path', '', 'Path to images')
FLAGS = flags.FLAGS
# TO-DO replace this with label map
# for multiple labels add more else if statements
def class_text_to_int(row_label):
if row_label == FLAGS.label: # 'ship':
return 1
# comment upper if statement and uncomment these statements for multiple labelling
# if row_label == FLAGS.label0:
# return 1
# elif row_label == FLAGS.label1:
# return 0
else:
None
def split(df, group):
data = namedtuple('data', ['filename', 'object'])
gb = df.groupby(group)
return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]
def create_tf_example(group, path):
with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
encoded_jpg = fid.read()
encoded_jpg_io = io.BytesIO(encoded_jpg)
image = Image.open(encoded_jpg_io)
width, height = image.size
filename = group.filename.encode('utf8')
image_format = b'jpg'
# check if the image format is matching with your images.
xmins = []
xmaxs = []
ymins = []
ymaxs = []
classes_text = []
classes = []
for index, row in group.object.iterrows():
xmins.append(row['xmin'] / width)
xmaxs.append(row['xmax'] / width)
ymins.append(row['ymin'] / height)
ymaxs.append(row['ymax'] / height)
classes_text.append(row['class'].encode('utf8'))
classes.append(class_text_to_int(row['class']))
tf_example = tf.train.Example(features=tf.train.Features(feature={
'image/height': dataset_util.int64_feature(height),
'image/width': dataset_util.int64_feature(width),
'image/filename': dataset_util.bytes_feature(filename),
'image/source_id': dataset_util.bytes_feature(filename),
'image/encoded': dataset_util.bytes_feature(encoded_jpg),
'image/format': dataset_util.bytes_feature(image_format),
'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example
def main(_):
writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
path = os.path.join(os.getcwd(), FLAGS.img_path)
examples = pd.read_csv(FLAGS.csv_input)
grouped = split(examples, 'filename')
for group in grouped:
tf_example = create_tf_example(group, path)
writer.write(tf_example.SerializeToString())
writer.close()
output_path = os.path.join(os.getcwd(), FLAGS.output_path)
print('Successfully created the TFRecords: {}'.format(output_path))
if __name__ == '__main__':
tf.compat.v1.app.run()
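If you would rather not leave the notebook to create the file, one option (my own habit, not from the original workflow) is to paste the whole script into a cell that starts with Colab’s %%writefile cell magic, which writes the cell body to disk instead of executing it:

%%writefile generate_tfrecord.py
# Paste the full generate_tfrecord.py script below this line; running the
# cell saves it as generate_tfrecord.py in the current working directory.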

In the object detection directory, run the commands below to generate the records.

# Create train data:
!python generate_tfrecord.py --label=<LABEL> --csv_input=<PATH_TO_ANNOTATIONS_FOLDER>/train_labels.csv --img_path=<PATH_TO_IMAGES_FOLDER>/train --output_path=<PATH_TO_ANNOTATIONS_FOLDER>/train.record

# Create test data:
!python generate_tfrecord.py --label=<LABEL> --csv_input=<PATH_TO_ANNOTATIONS_FOLDER>/test_labels.csv --img_path=<PATH_TO_IMAGES_FOLDER>/test --output_path=<PATH_TO_ANNOTATIONS_FOLDER>/test.record

# For example:
!python generate_tfrecord.py --label='ARDUINO DEVICE' --csv_input=data/train_labels.csv --output_path=data/train.record --img_path=images/train
!python generate_tfrecord.py --label='ARDUINO DEVICE' --csv_input=data/test_labels.csv --output_path=data/test.record --img_path=images/test
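As an optional sanity check (my own addition, not part of the original workflow), you can count how many examples landed in each record file with the TF 1.x record iterator:

import tensorflow as tf

# count the serialized examples in each generated TFRecord file
for record_file in ['data/train.record', 'data/test.record']:
    count = sum(1 for _ in tf.python_io.tf_record_iterator(record_file))
    print('{}: {} examples'.format(record_file, count))

Each image produces one example, so the counts should match the number of unique filenames in the corresponding CSV files.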

Step 11: Get the pre-trained object detection model from TensorFlow with the code below.

!wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
!tar -xvf ssd_mobilenet_v1_coco_11_06_2017.tar.gz

Also, get the config file, which you may need to edit. Please note the directories. (Use the raw URL below; downloading the github.com blob page gives you HTML instead of the config file.)

!wget https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/samples/configs/ssd_mobilenet_v1_coco.config

Within the .config file, set each “PATH_TO_BE_CONFIGURED” placeholder to the proper value. The batch size here is 24; you can change this depending on how much memory your GPU can handle. We are using a Tesla GPU, so 24 is fine.

Some lines of the config file to edit, depending on what you are doing:

- num_classes: 1 # Set this to the number of different label classes
- type: 'ssd_mobilenet_v1' # Set to the name of your chosen pre-trained model
- batch_size: 12 # Increase/decrease this value depending on the available memory (higher values require more memory and vice versa)
- fine_tune_checkpoint: "pre-trained-model/model.ckpt" # Path to the extracted files of the pre-trained model
- input_path (training): "annotations/train.record" # Path to the training TFRecord file
- input_path (evaluation): "annotations/test.record" # Path to the test TFRecord file
- label_map_path: "annotations/label_map.pbtxt" # Path to the label map file

You should change num_classes, num_examples, and label_map_path. The config file should look like the one below; you can copy it and save it as the_name_you_want_to_call_it.config.

# SSD with Mobilenet v1 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 90  # number of classes to be trained; in my case, 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          anchorwise_output: true
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 24
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
  # Note: The line below limits the training process to 200K steps, which we
  # empirically found to be sufficient to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the line to train indefinitely.
  #num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "training/object-detection.pbtxt"
}

eval_config: {
  # (Optional): Uncomment the line below if you installed the COCO evaluation tools
  # and you want to also run evaluation
  # metrics_set: "coco_detection_metrics"
  # (Optional): Set this to the number of images in your <PATH_TO_IMAGES_FOLDER>/test
  # if you want to also run evaluation
  num_examples: 8000
  # Note: The line below limits the evaluation process to 10 evaluations.
  # Remove the line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record"
  }
  label_map_path: "training/object-detection.pbtxt"
  shuffle: false
  num_readers: 1
}

A label map file called object-detection.pbtxt must be created and saved in the ‘training’ folder. The label map will look like the code below.

item {
  id: 1
  name: 'your Label'  # the label from annotation; in my case, ARDUINO DEVICE
}

''' You can create more items, like:
item {
  id: 2
  name: 'Raspberry Devices'
}
and so on
'''
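If you prefer to create this file from the notebook instead of uploading it, a small cell like the sketch below (my own convenience; swap in your own label name) will write it:

# write the label map from the notebook; adjust the name to your own label
label_map = """item {
  id: 1
  name: 'ARDUINO DEVICE'
}
"""
with open('training/object-detection.pbtxt', 'w') as f:
    f.write(label_map)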

Step 12: To track your training checkpoints in the background, run the code cell below. This is the latest way to get TensorBoard running on Colab. NB: you can change the log directory.

%load_ext tensorboard
%tensorboard --logdir training/

Step 13: The TRAINING PROPER!

Open your Google Drive and go to the legacy folder in the object detection directory, copy or move the train.py file into the object detection folder, then go back to Colab and run the training with the code below.
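If you prefer to do the copy from the notebook rather than through the Drive UI, a one-liner does the same thing, assuming your working directory is already models/research/object_detection:

!cp legacy/train.py .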

!python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config

In the absence of errors, you should see an output like:

INFO:tensorflow:global step 1: loss = 25.45 (5.327 sec/step)........
........
INFO:tensorflow:global step 1350: loss = 0.6345 (0.231 sec/step)
INFO:tensorflow:global step 1351: loss = 0.5220 (0.332 sec/step)
INFO:tensorflow:global step 1352: loss = 0.6718 (0.133 sec/step)
INFO:tensorflow:global step 1353: loss = 0.6758 (0.432 sec/step)
INFO:tensorflow:global step 1354: loss = 0.7454 (0.452 sec/step)
INFO:tensorflow:global step 1355: loss = 0.8354 (0.323 sec/step)

The first step has a much higher loss than the others. Training might take some time. When your loss is consistently less than 1, you can stop the training with CTRL + C. Note that you might have to restart the runtime before the next step can execute.

Export your inference graph.

Create a folder named trained_inference_graph in the object detection folder, then run the code below. Also, open the training folder and check for the model.ckpt-XXX file with the highest number; in my case, model.ckpt-6602 is the highest. Make the necessary edits to the export command below, then run it.
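Rather than eyeballing the folder for the highest number, you can also let TensorFlow report the newest checkpoint; tf.train.latest_checkpoint is part of TF 1.x, and this is just a small convenience on top of the step above:

import tensorflow as tf

# prints something like 'training/model.ckpt-6602' for the newest checkpoint
print(tf.train.latest_checkpoint('training/'))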

!python export_inference_graph.py --input_type image_tensor --pipeline_config_path ssd_mobilenet_v1_coco.config --trained_checkpoint_prefix training/model.ckpt-6602 --output_directory trained_inference_graph/

Zip your inference graph:

!zip -r Arduino_exp_graph.zip trained_inference_graph
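If you also want the archive on your local machine (optional; this uses Colab’s files helper, which is not in the original workflow), you can download it directly:

from google.colab import files

# trigger a browser download of the zipped inference graph
files.download('Arduino_exp_graph.zip')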

Let's check what we just trained. Set the model preparation variables:

MODEL_NAME = 'trained_inference_graph'
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
PATH_TO_LABELS = 'training/object-detection.pbtxt'
NUM_CLASSES = 1  # remember the number of objects you are training? cool.

Then use the code below to test your model:

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from distutils.version import StrictVersion
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

### Model preparation variables
MODEL_NAME = 'trained_inference_graph'
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
PATH_TO_LABELS = 'training/object-detection.pbtxt'
NUM_CLASSES = 1  # remember the number of objects you are training? cool.

### Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

### Loading label map
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

### Load image into numpy function
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)

### STATING THE PATH TO IMAGES TO BE TESTED
PATH_TO_TEST_IMAGES_DIR = 'test_images/'
TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 4)]
IMAGE_SIZE = (12, 8)

### Function to run inference on a single image, used later in an iteration
def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in ['num_detections', 'detection_boxes', 'detection_scores',
                        'detection_classes', 'detection_masks']:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for a single image
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframe is required to translate the mask from box coordinates
                # to image coordinates and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[1], image.shape[2])
                detection_masks_reframed = tf.cast(tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension
                tensor_dict['detection_masks'] = tf.expand_dims(detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

            # Run inference
            output_dict = sess.run(tensor_dict, feed_dict={image_tensor: image})

            # All outputs are float32 numpy arrays, so convert types as appropriate
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict['detection_classes'][0].astype(np.int64)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict

### To iterate on each image in the test image path defined
### NB: set the range above so it matches the number of images in the test folder + 1
for image_path in TEST_IMAGE_PATHS:
    image = Image.open(image_path)
    # the array based representation of the image will be used later in order
    # to prepare the result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    output_dict = run_inference_for_single_image(image_np_expanded, detection_graph)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=1)
    display(Image.fromarray(image_np))

Note: Copy some images, say 9, into a folder named ‘test_images’ and rename them image1.jpg, image2.jpg, …, image9.jpg. Make sure the range in TEST_IMAGE_PATHS matches the number of images plus one (range(1, 10) for nine images), then run the code cell above.
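If your test images have arbitrary names, a quick rename loop like this sketch (my own helper; it assumes the files are already in test_images/) saves the manual work:

import os

# rename everything in test_images/ to image1.jpg, image2.jpg, ...
names = sorted(os.listdir('test_images/'))
for i, name in enumerate(names, start=1):
    os.rename(os.path.join('test_images', name),
              os.path.join('test_images', 'image{}.jpg'.format(i)))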

I hope you enjoyed the walkthrough — please comment and leave your feedback if you found it helpful or if you have any suggestions to make.

Your thoughts and feedback will encourage me. Smiles :D

Nathaniel O Solomon · Analytics Vidhya

I am basically an artificial intelligence enthusiast. Reach me on Twitter: @electronath