Operationalizing TensorFlow Object Detection on Azure — Part 1: Using Docker and Deep Learning VMs

Sertaç Özercan
Nov 20, 2017


In this series of blog posts, we are going to be learning about operationalizing TensorFlow Object Detection API on Microsoft Azure.

This part, Part 1, covers the TensorFlow Object Detection API and how to set up our training and evaluation workflow using Docker containers and virtual machines.

Part 2 will cover how to train and scale using Kubernetes and distributed TensorFlow.

Finally, Part 3 will cover how we can serve our trained model as a web service using TensorFlow Serving, and we will deploy a simple client to get results from our service.

You can find the project repository at https://github.com/sozercan/tensorflow-object-detection/

TensorFlow Object Detection API

Google recently released the TensorFlow Object Detection API, an open-source framework built on top of TensorFlow that makes it easy to build, train, and deploy object detection models.

In this guide, we will learn how to use the TensorFlow Object Detection API to build and train our model on a single virtual machine, and then how to train at scale on a Kubernetes cluster using distributed TensorFlow.

Using Docker and Deep Learning VMs

In this part of the tutorial, we are going to use Deep Learning VMs in Microsoft Azure to train, evaluate, and export our model, but the steps should work on any system with an NVIDIA GPU and with docker and nvidia-docker installed.

This is one of the reasons we are using Azure Deep Learning VMs: they make it straightforward to use GPU instances and come with NVIDIA drivers and nvidia-docker preinstalled, which makes setup much easier.

If you are interested in learning more about Azure Deep Learning VMs, check out the Deep Learning Virtual Machine documentation on Microsoft Docs.

To find out which Azure regions offer GPU instances, check the products-available-by-region page on the Azure website.

Let’s start by creating a VM:

NAME=[name of your vm]
RESOURCE_GROUP=[name of your resource group]
SSHKEY=[path to your public key]
LOCATION=[region of your choice. make sure that GPUs are supported in that region, eg. southcentralus]

az group create -n $RESOURCE_GROUP -l $LOCATION

az vm create --name $NAME --resource-group $RESOURCE_GROUP --image microsoft-ads:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --size Standard_NC6 --ssh-key-value $SSHKEY --admin-username $USER --public-ip-address-dns-name $NAME

Opening ports in the network security group (NSG) for TensorBoard and Jupyter notebook:

az network nsg rule create --resource-group $RESOURCE_GROUP --name Port_6006 --nsg-name ${NAME}NSG --priority 100 --destination-port-ranges 6006
az network nsg rule create --resource-group $RESOURCE_GROUP --name Port_8888 --nsg-name ${NAME}NSG --priority 200 --destination-port-ranges 8888

Once the deployment finishes, let's SSH into our newly created VM:

ssh $USER@$NAME.$LOCATION.cloudapp.azure.com

and then clone our project repo:

git clone https://github.com/sozercan/tensorflow-object-detection

Step 1 — Creating the Dockerfile

First, let's build our Docker image:

cd tensorflow-object-detection
nvidia-docker build -f docker/Dockerfile -t $USER/tensorflow-object-detection .

Dockerfile

The Dockerfile lives at docker/Dockerfile in the project repo.
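Here is a minimal sketch of what it contains, assuming the tensorflow/tensorflow GPU base image; the tensorflow/models fork URL below is illustrative:

# a sketch, not the exact file; see docker/Dockerfile in the project repo
FROM tensorflow/tensorflow:1.4.0-gpu

# dependencies needed by the Object Detection API
RUN apt-get update && apt-get install -y git protobuf-compiler python-pil python-lxml

# a clone of tensorflow/models that adds train_eval.py (illustrative URL)
RUN git clone https://github.com/sozercan/models.git /tensorflow/models

WORKDIR /tensorflow/models/research

# compile the protobuf definitions used by the API
RUN protoc object_detection/protos/*.proto --python_out=.

# make the object_detection and slim packages importable
ENV PYTHONPATH $PYTHONPATH:/tensorflow/models/research:/tensorflow/models/research/slim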

Note that this uses a clone of tensorflow/models that adds train_eval.py, so it can run training and evaluation at the same time. You can also pull the image from sozercan/tensorflow-object-detection; the tags are gpu and cpu, for GPU and CPU support respectively.

Step 2 — Downloading the pre-trained model

To speed up the training process, we'll start from a pre-trained model: the COCO pre-trained Faster R-CNN ResNet-101 model.

sudo mkdir -p /data/tensorflow
sudo chown -R $USER /data/tensorflow/
wget http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz -O /data/tensorflow/faster_rcnn_resnet101_coco_11_06_2017.tar.gz

Once the download finishes, let's extract it using:

tar -xvf /data/tensorflow/faster_rcnn_resnet101_coco_11_06_2017.tar.gz -C /data/tensorflow

Step 3a — Downloading the dataset

We will be using the Pascal VOC dataset, which includes images along with their bounding boxes and class labels. You can access the raw dataset at The PASCAL Visual Object Classes Homepage.

Downloading and extracting:

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar -O /data/tensorflow/VOCtrainval_06-Nov-2007.tar
tar -xvf /data/tensorflow/VOCtrainval_06-Nov-2007.tar -C /data/tensorflow

Converting dataset to TF Record

We built our container image with TensorFlow and TensorFlow Object Detection installed in the first step; let's start a container and jump inside to configure it.

PATH_TO_YOUR_VOC_DATASET=[path to where you saved and extracted the files above, eg. /data/tensorflow]
nvidia-docker run -it -d -p 0.0.0.0:6006:6006 -p 0.0.0.0:8888:8888 -v ${PATH_TO_YOUR_VOC_DATASET}:/data/ $USER/tensorflow-object-detection

The last command prints the ID of your new container. Using that ID, open a shell inside the container:

nvidia-docker exec -it [YOUR_CONTAINER_ID] bash

The TensorFlow Object Detection API expects our data in TFRecord format, so we will convert the dataset using the create_pascal_tf_record.py utility.

As arguments, we provide:

  • label_map_path, the file containing the labels of the objects we are detecting
  • data_dir, the directory where we downloaded and extracted the VOC dataset
  • year, which year of the dataset to use (in this case 2007, so it will use the VOC2007 subdirectory)
  • set, whether we are training (train), evaluating (val), doing both (trainval), or testing (test)
  • output_path, where the resulting .record file will be saved

Let’s export our training and validation set:

PATH_TO_LABEL_MAP=/tensorflow/models/research/object_detection/data/pascal_label_map.pbtxt
PATH_TO_VOC_DATA=/data/VOCdevkit
# from /tensorflow/models/research
python object_detection/create_pascal_tf_record.py \
--label_map_path=${PATH_TO_LABEL_MAP} \
--data_dir=${PATH_TO_VOC_DATA} \
--year=VOC2007 \
--set=trainval \
--output_path=/data/pascal_trainval.record
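Under the hood, each entry in the resulting .record file is a serialized tf.train.Example. Here is a rough sketch of the fields the converter writes for a single image, using the feature keys the Object Detection API expects (the values are made-up placeholders):

import tensorflow as tf

# made-up placeholder values for a single image with one bounding box
encoded_jpeg = b'...'          # raw JPEG bytes of the image would go here
height, width = 480, 640
xmins, xmaxs = [0.1], [0.5]    # box coordinates, normalized to [0, 1]
ymins, ymaxs = [0.2], [0.6]
class_names = [b'dog']         # label text from the label map
class_ids = [12]               # corresponding id in pascal_label_map.pbtxt

example = tf.train.Example(features=tf.train.Features(feature={
    'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_jpeg])),
    'image/format': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpeg'])),
    'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
    'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
    'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=xmins)),
    'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=xmaxs)),
    'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=ymins)),
    'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=ymaxs)),
    'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=class_names)),
    'image/object/class/label': tf.train.Feature(int64_list=tf.train.Int64List(value=class_ids)),
}))

serialized = example.SerializeToString()  # this byte string is what gets written to the .record file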

Step 3b — Making your own dataset

Instead of the Pascal VOC dataset, you can also bring your own images and construct your own dataset. To do this, we'll tag and label the images, export them from VoTT in TensorFlow format, and finally convert them to TFRecord format.

One of the labeling utilities I would recommend is the Visual Object Tagging Tool (VoTT). Using VoTT, we can easily tag and label images and videos. You can download VoTT for Windows and macOS from its GitHub releases page.

After you download it, you can open the image folder and start labeling.

labeling an image

Once you are done labeling, export it in TensorFlow format.

export to TensorFlow Pascal VOC format

Just like the step above, we will have to convert it to TFRecord format for TensorFlow Object Detection.

This time we will have to use a more generic converter, since the exported dataset is structured a little differently. The process is the same as in step 3a, but instead of create_pascal_tf_record.py, you have to use a more generic exporter. You can find generic_create_pascal_tf_record.py in the project repo to convert your own dataset exported with VoTT; a possible invocation is sketched below.
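Assuming the generic exporter takes the same flags as create_pascal_tf_record.py, an invocation might look like this (the label map and data paths are placeholders for your own export):

# from /tensorflow/models/research; paths below are placeholders for your own dataset
python object_detection/generic_create_pascal_tf_record.py \
--label_map_path=/data/my_label_map.pbtxt \
--data_dir=/data/my-vott-export \
--set=trainval \
--output_path=/data/my_trainval.record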

Step 4 — Configuring our environment

The next step is to tweak any parameters and set up the input and label paths. You can download an example from my repo (faster_rcnn_resnet101_voc07.config).

At a minimum, make sure to replace each PATH_TO_BE_CONFIGURED placeholder with your relevant paths for fine_tune_checkpoint, input_path, and label_map_path for both training and evaluation.

If you are following the guide as is:

  • fine_tune_checkpoint should be /data/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt (paths here are as seen inside the container, where the host's /data/tensorflow is mounted at /data)
  • input_path should be /data/pascal_trainval.record for both the train_input_reader and eval_input_reader sections
  • label_map_path should be /tensorflow/models/research/object_detection/data/pascal_label_map.pbtxt

You can also configure other options, such as data_augmentation_options, which augments your training data with operations like random horizontal flips or rotations. You can find all available options in models/preprocessor.proto. A sketch of the relevant config sections follows.
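For reference, here is a rough sketch of what the relevant sections of the pipeline config look like with the paths from this guide filled in (abbreviated; the full example is faster_rcnn_resnet101_voc07.config in the repo):

train_config {
  fine_tune_checkpoint: "/data/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 1000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader {
  tf_record_input_reader {
    input_path: "/data/pascal_trainval.record"
  }
  label_map_path: "/tensorflow/models/research/object_detection/data/pascal_label_map.pbtxt"
}

eval_input_reader {
  tf_record_input_reader {
    input_path: "/data/pascal_trainval.record"
  }
  label_map_path: "/tensorflow/models/research/object_detection/data/pascal_label_map.pbtxt"
}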

Step 5 — Training and Evaluation

In this step, we will start our training and evaluation. This process will take a while. The sample configuration in faster_rcnn_resnet101_voc07.config trains the model for 1,000 steps, stopping every 200 steps to run evaluation before continuing.

PATH_TO_YOUR_PIPELINE_CONFIG=/data/faster_rcnn_resnet101_voc07.config
PATH_TO_TRAIN_DIR=/data/train
PATH_TO_EVAL_DIR=/data/eval
# from /tensorflow/models/research
python object_detection/train_eval.py \
--logtostderr \
--pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
--train_dir=${PATH_TO_TRAIN_DIR} \
--eval_dir=${PATH_TO_EVAL_DIR}

Run TensorBoard

Either during training (in a new terminal tab or window) or after the above process finishes, we can monitor training and evaluation progress using TensorBoard. Since we opened port 6006 in the NSG earlier and published it from the container, TensorBoard will be reachable at http://$NAME.$LOCATION.cloudapp.azure.com:6006.

Run the following anywhere inside our container:

tensorboard --logdir=/data/

Run Jupyter Notebook

We can also run the sample Jupyter notebook to check that everything is working correctly; it is served on port 8888, which we also opened earlier.

Run the following anywhere inside our container:

jupyter notebook --allow-root

Export

To prepare for serving our model later, let's export our inference graph as a frozen model. Replace ##### in the command below with the step number of the checkpoint you want to export; the export writes frozen_inference_graph.pb and a saved_model directory into /data/export.

Run the following inside our container:

# from /tensorflow/models/research
python object_detection/export_inference_graph.py \
--input_type encoded_image_string_tensor \
--pipeline_config_path ${PATH_TO_YOUR_PIPELINE_CONFIG} \
--trained_checkpoint_prefix /data/train/model.ckpt-##### \
--output_directory /data/export

Conclusion

Even though Docker containers made the training process much simpler, this was still fairly manual work, and we only used a single GPU in a single VM, so our training could be made much faster and more effective.

In Part 2, we are not only going to look into automating this with Kubernetes, we are also going to learn how we can train and scale with distributed TensorFlow.

If you have any questions or comments, please leave a comment below or reach out to me on Twitter at @sozercan.
