Object Detection: Tensorflow and Google Cloud Platform

A quick note on running this object detection tutorial on Windows 10 and the Google Cloud Platform (GCP), after it caused me a lot of pain to set up and run.

The full tutorial can be found, and all credit goes to the guys and girls, here:

However, I have made some updates which might help with a couple of compatibility issues between GCP and tensorflow, mainly to do with the setup.py file!

The aim of the original tutorial is to get an object detection ML model for pets up and running. To do this we use the tensorflow object detection module, in conjunction with the Oxford-IIIT Pet image dataset. Hopefully you can walk into this as cold as I did and get a working version up and running a lot quicker.

Clone the object detection repository:

git clone https://github.com/tensorflow/models.git

Due to some recent changes to the object detection module, and GCP still being stuck with tensorflow 1.2, you will need to check out the following version of the module (run this from inside the cloned models folder):

git reset --hard a4944a57ad

You should now have a repo with a single models folder, with the module folders beneath it (e.g. adversarial_crypto, adversarial_text, ... object_detection ... slim, etc.).

Install tensorflow

Amended steps below, full original installation steps here

Add two paths to PYTHONPATH system variable:

  1. location of the models folder — Eg. C://location_to_models_folder/models
  2. location of the models/slim folder — Eg. C://location_to_models_folder/models/slim
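
If you want to double check the two entries made it into PYTHONPATH, here is a minimal sketch. The function names are my own (hypothetical, not part of the tutorial), and it only compares paths as strings; it does not check that the folders exist.

```python
import os


def required_pythonpath_entries(models_dir):
    """Return the two folders listed above: models and models/slim."""
    models_dir = os.path.abspath(models_dir)
    return [models_dir, os.path.join(models_dir, "slim")]


def missing_pythonpath_entries(models_dir, pythonpath=None):
    """Return the required entries that are not present in PYTHONPATH."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    # os.pathsep is ';' on Windows and ':' on Linux/macOS.
    present = {
        os.path.normcase(os.path.abspath(p))
        for p in pythonpath.split(os.pathsep)
        if p
    }
    return [
        entry
        for entry in required_pythonpath_entries(models_dir)
        if os.path.normcase(entry) not in present
    ]
```

Calling missing_pythonpath_entries with the location of your models folder should return an empty list once both entries are set (open a new terminal first so the updated variable is picked up).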

Install tensorflow

(I used an anaconda setup) — Full setup link here

Using anaconda or native python, run one of the following commands.

Install tensorflow command

#CPU version
pip install --ignore-installed --upgrade tensorflow
#GPU version -- needs an NVIDIA graphics card with more than 2 GB of memory
pip install --ignore-installed --upgrade tensorflow-gpu

Install dependencies

pip install pillow
pip install lxml
pip install jupyter
pip install matplotlib
# or in one
pip install pillow lxml jupyter matplotlib

Protobuf compilation

#From models directory
protoc object_detection/protos/*.proto --python_out=.
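
On Windows, cmd.exe does not expand the *.proto wildcard itself, so if protoc complains that it cannot find the files, one workaround is to call it once per file. This is a minimal sketch under that assumption: the helper names are hypothetical, and it assumes protoc is on your PATH and that you run it from the models directory.

```python
import glob
import os
import subprocess


def find_protos(models_dir="."):
    """List the .proto files under object_detection/protos, relative to models_dir."""
    pattern = os.path.join(models_dir, "object_detection", "protos", "*.proto")
    return sorted(os.path.relpath(p, models_dir) for p in glob.glob(pattern))


def protoc_command(proto_path):
    """Build the protoc call for a single file (same flag as the command above)."""
    # protoc accepts forward slashes on every platform.
    return ["protoc", proto_path.replace(os.sep, "/"), "--python_out=."]


def compile_protos(models_dir="."):
    """Run protoc once per .proto file; raises if any call fails."""
    for proto_path in find_protos(models_dir):
        subprocess.check_call(protoc_command(proto_path), cwd=models_dir)
```

Running compile_protos() from the models directory should leave a *_pb2.py file next to each .proto file, the same result as the wildcard command.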

Test installation

Run the following. When you run it with Python 3 or later, expect two errors: the tests call iteritems(), which was removed in Python 3. Don’t worry about these two errors; they are expected.

python object_detection/builders/model_builder_test.py
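
For anyone curious why those errors show up, here is a tiny illustration (my own example, not code from the module) of the Python 2 versus Python 3 difference:

```python
scores = {"cat": 0.9, "dog": 0.8}

# Python 2 code calls scores.iteritems(); that method does not exist
# on Python 3 dictionaries, so the call raises AttributeError there.
has_iteritems = hasattr(scores, "iteritems")

# The Python 3 equivalent is items(), which returns an iterable view.
pairs = sorted(scores.items())
```

Under Python 3, has_iteritems is False, which is exactly what trips up the two tests.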

Set up the object detection model

Download the dataset and its ground truth annotations from http://www.robots.ox.ac.uk/~vgg/data/pets/

These took forever to download (being in Australia and still stuck on ADSL!), so in the meantime:

  1. Set up the GCP project and bucket (name it something memorable). Signing up should create ‘your first project’; just enable storage.
  2. Enable the Machine Learning (ML) Engine APIs, which should be in the left-hand nav bar of your project
  3. Install the Google Cloud SDK

Once the datasets have been downloaded, move them to sit at the same folder level as the object_detection folder.

Extract the dataset folders.

Now we need to convert to something tensorflow can read.

# From models directory
python object_detection/create_pet_tf_record.py \
--label_map_path=object_detection/data/pet_label_map.pbtxt \
--data_dir=`pwd` \
--output_dir=`pwd`
# From the source itself - Warnings are normal when you run the script, so don't worry

You should have two new files in the models directory: pet_train.record and pet_val.record.

You will also need pet_label_map.pbtxt from the /models/object_detection/data/ directory (the same file passed to --label_map_path above).

In your GCP storage bucket, create a folder called data and copy these files into it.
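
You can do the copy through the web console, or with gsutil from the Cloud SDK. Here is a minimal sketch that just prints the gsutil cp commands to paste into your terminal; "my-bucket" is a placeholder for your own bucket name and the helper name is hypothetical.

```python
# Paths are relative to the models directory.
DATA_FILES = [
    "pet_train.record",
    "pet_val.record",
    "object_detection/data/pet_label_map.pbtxt",
]


def upload_commands(bucket, files=DATA_FILES):
    """Build one `gsutil cp` command per file, targeting the bucket's data folder."""
    destination = "gs://{}/data/".format(bucket)
    return [["gsutil", "cp", path, destination] for path in files]


# "my-bucket" is a placeholder; swap in your own bucket name.
for command in upload_commands("my-bucket"):
    print(" ".join(command))
```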

Download COCO-pretrained Model for Transfer Learning

Pre-trained model to assist in general object detection:

“Training a state of the art object detector from scratch can take days, even when using multiple GPUs! In order to speed up training, we’ll take an object detector trained on a different dataset (COCO), and reuse some of its parameters to initialize our new model.”

  1. Download the following: COCO-pretrained Faster R-CNN with Resnet-101 model
  2. Extract
  3. Upload contents with a /model.ckpt.* pattern to the GCP storage bucket, under the data folder

Configuring the Object Detection Pipeline

  1. In the object_detection/samples/configs folder, open the faster_rcnn_resnet101_pets.config
  2. Replace all instances of PATH_TO_BE_CONFIGURED with gs://${YOUR_GCS_BUCKET}/data (no trailing slash, as the placeholder is already followed by a / in the config)
  3. Upload this file to the GCP bucket under the data folder
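
If you would rather not find-and-replace by hand, here is a minimal sketch of step 2. It assumes the sample config writes the placeholder as PATH_TO_BE_CONFIGURED/some_file, which is why the replacement has no trailing slash; the function name and "my-bucket" are hypothetical.

```python
PLACEHOLDER = "PATH_TO_BE_CONFIGURED"


def configure_pipeline(config_text, bucket):
    """Point every placeholder path at the bucket's data folder."""
    data_prefix = "gs://{}/data".format(bucket)
    return config_text.replace(PLACEHOLDER, data_prefix)


def configure_pipeline_file(path, bucket):
    """Rewrite the config file in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(configure_pipeline(text, bucket))
```

From the models directory you would call configure_pipeline_file("object_detection/samples/configs/faster_rcnn_resnet101_pets.config", "my-bucket"), then open the file and check that no placeholder is left before uploading it.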

Checkpoint

You should have the following files in the GCP bucket, under the data folder

+ ${YOUR_GCS_BUCKET}/
  + data/
    - faster_rcnn_resnet101_pets.config
    - model.ckpt.index
    - model.ckpt.meta
    - model.ckpt.data-00000-of-00001
    - pet_label_map.pbtxt
    - pet_train.record
    - pet_val.record
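
A quick way to verify the checkpoint is to compare a bucket listing against the list above. This is a minimal sketch with a hypothetical helper: it takes the lines printed by gsutil ls gs://${YOUR_GCS_BUCKET}/data/ (one object URL per line) and reports anything missing.

```python
EXPECTED_DATA_FILES = [
    "faster_rcnn_resnet101_pets.config",
    "model.ckpt.index",
    "model.ckpt.meta",
    "model.ckpt.data-00000-of-00001",
    "pet_label_map.pbtxt",
    "pet_train.record",
    "pet_val.record",
]


def missing_data_files(object_urls):
    """Return the expected file names that do not appear in the listing."""
    # Keep only the part after the last '/', e.g. 'pet_val.record'.
    present = {url.strip().rsplit("/", 1)[-1] for url in object_urls}
    return [name for name in EXPECTED_DATA_FILES if name not in present]
```

An empty list back means everything is in place.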

Modify the setup.py file

In the models directory, modify the setup.py file to reflect the following. This installs some dependencies required by the object_detection module that aren’t prepackaged on the GCP ML environments, mainly python-tk and Matplotlib.

"""Setup script for object_detection."""
import logging
import subprocess

from setuptools import find_packages
from setuptools import setup
from setuptools.command.install import install


class CustomCommands(install):
    """A setuptools Command class able to run arbitrary commands."""

    def RunCustomCommand(self, command_list):
        p = subprocess.Popen(
            command_list,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT)
        # Can use communicate(input='y\n'.encode()) if the command run requires
        # some confirmation.
        stdout_data, _ = p.communicate()
        logging.info('Log command output: %s', stdout_data)
        if p.returncode != 0:
            raise RuntimeError('Command %s failed: exit code: %s' %
                               (command_list, p.returncode))

    def run(self):
        # Install python-tk on the training machine before the normal install.
        self.RunCustomCommand(['apt-get', 'update'])
        self.RunCustomCommand(
            ['apt-get', 'install', '-y', 'python-tk'])
        install.run(self)


REQUIRED_PACKAGES = ['Pillow>=1.0', 'Matplotlib>=2.1']

setup(
    name='object_detection',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    include_package_data=True,
    packages=[p for p in find_packages() if p.startswith('object_detection')],
    description='Tensorflow Object Detection Library',
    cmdclass={
        'install': CustomCommands,
    }
)

Modify cloud parameters settings

Edit the following file: object_detection/samples/cloud/cloud.yml

trainingInput:
  runtimeVersion: "1.2"
  scaleTier: CUSTOM
  masterType: standard_gpu
  workerCount: 5
  workerType: standard_gpu
  parameterServerCount: 3
  parameterServerType: standard

Start training

Having modified all the required parameter files, we can now build and ship it off to Google ML for processing:

Build the packages:

# From models directory
python setup.py sdist
(cd slim && python setup.py sdist)

Run the training job

Note: being in Australia, we can’t use ML Engine from our local australia-southeast1 data center, hence we are using us-central1.

Make sure to cd back into the models directory

# From models directory
gcloud ml-engine jobs submit training object_detection_${version_unique_ID} \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train \
--region us-central1 \
--config object_detection/samples/cloud/cloud.yml \
-- \
--train_dir=gs://${YOUR_GCS_BUCKET}/train \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
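
The ${version_unique_ID} part just needs to make the job name unique each time you submit. One option is a timestamp; here is a minimal sketch with a hypothetical helper. It sticks to letters, digits and underscores, which is my understanding of what ML Engine accepts in job names, so treat that as an assumption.

```python
import re
import time


def make_job_name(prefix="object_detection", now=None):
    """Return e.g. 'object_detection_20180101_120000' (UTC timestamp)."""
    # time.gmtime(None) uses the current time.
    stamp = time.strftime("%Y%m%d_%H%M%S", time.gmtime(now))
    return "{}_{}".format(prefix, stamp)


def looks_like_valid_job_name(name):
    """Assumed rule: starts with a letter, then letters, digits or underscores."""
    return re.match(r"^[A-Za-z][A-Za-z0-9_]*$", name) is not None


print(make_job_name())
```

Paste the printed name in place of object_detection_${version_unique_ID} in the command above.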

Run the eval job

The eval job can be run in conjunction with training. However, you will most likely have to increase the amount of resources allocated to your ML platform to run them in tandem.

# From models directory 
gcloud ml-engine jobs submit training object_detection_EVAL_${version_unique_ID} \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.eval \
--region us-central1 \
--scale-tier BASIC_GPU \
-- \
--checkpoint_dir=gs://${YOUR_GCS_BUCKET}/train \
--eval_dir=gs://${YOUR_GCS_BUCKET}/eval \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config

Monitoring the jobs using tensorboard:

Open the Google Cloud Shell from the top right-hand corner of the GCP console.

SDK Shell, in browser console

Run the following:

tensorboard --logdir=gs://${GCP_BUCKET_NAME} --port=8080

Open the tensorboard dashboard using the preview feature in the console

Web preview — tensorboard

You should be able to check in on your training and eval models as they are being processed.

If this is the first time you are running the model, it might take a while to populate, so be patient. Also, don’t forget to stop your training job!

Tensorboard performance monitoring

Conclusion

Anyways, hopefully some of this helps you run TF on a GCP trial via windows.

On another note, before you get disheartened by your results of the pets dataset make sure to read the paper.