Object Detection: Tensorflow and Google Cloud Platform
A quick note on running this object detection module/tutorial on Windows 10 and the Google Cloud Platform (GCP), after it caused me a lot of pain to set up and run.
The full tutorial, and the credit for it, belongs to the guys and girls here:
However, I have made some updates which might help with a couple of compatibility issues between GCP and tensorflow. Mainly to do with the setup.py file!
The aim of the tutorial is to get an object detection ML algorithm for pets up and running. To do this we use the tensorflow object detection model, in conjunction with the Oxford-IIIT pet image dataset. Hopefully you can walk into this as cold as I did and get a working version up and running a lot quicker.
Clone the object detection repository:
git clone https://github.com/tensorflow/models.git
Due to some recent changes to the object detection module, and GCP still being stuck on tensorflow 1.2, you will need to check out the following version of the module:
git reset --hard a4944a57ad
You should now have a repo with a single models folder, and the individual module folders beneath it (i.e. adversarial_crypto, adversarial_text, … object_detection, … slim, etc.)
Install tensorflow
Amended steps are below; the full original installation steps are here
Add two paths to PYTHONPATH system variable:
- location of the models folder, e.g. C:\location_to_models_folder\models
- location of the models/slim folder, e.g. C:\location_to_models_folder\models\slim
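If you'd rather not edit the system variables (or want to sanity-check them), the same effect can be had per session by appending to sys.path. A minimal sketch; the paths are placeholders for your own checkout location:

```python
import sys

# Placeholder paths -- substitute wherever you cloned the models repo.
MODELS_DIR = r"C:\location_to_models_folder\models"
SLIM_DIR = MODELS_DIR + r"\slim"

# Append both folders so `import object_detection` and the slim
# modules resolve, mirroring the PYTHONPATH entries above.
for path in (MODELS_DIR, SLIM_DIR):
    if path not in sys.path:
        sys.path.append(path)
```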
Install tensorflow
(I used an anaconda setup) — Full setup link here
Using anaconda or native python, run one of the following commands.
Install tensorflow command
#CPU version
pip install --ignore-installed --upgrade tensorflow

#GPU version -- needs an NVIDIA graphics card with more than 2GB of memory
pip install --ignore-installed --upgrade tensorflow-gpu
Install dependencies
pip install pillow
pip install lxml
pip install jupyter
pip install matplotlib

# or in one line:
pip install pillow lxml jupyter matplotlib
Protobuf compilation
#From models directory
protoc object_detection/protos/*.proto --python_out=.
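One Windows gotcha: cmd.exe does not expand the *.proto wildcard the way a POSIX shell does, so the one-liner above can fail there. A stdlib sketch that builds one protoc invocation per file instead (it assumes protoc is on your PATH; the actual compile loop is left commented):

```python
import glob


def protoc_commands(proto_dir="object_detection/protos"):
    """Build one protoc invocation per .proto file, since cmd.exe on
    Windows does not expand the *.proto wildcard like a POSIX shell."""
    return [["protoc", path, "--python_out=."]
            for path in sorted(glob.glob(proto_dir + "/*.proto"))]


# To actually compile (run from the models directory, protoc on PATH):
#   import subprocess
#   for command in protoc_commands():
#       subprocess.check_call(command)
```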
Test installation
Run the following. There should be two errors when you run it with a Python version greater than 3, because the tests use the iteritems() function, which was removed in Python 3. Don’t worry about the two errors; they are expected.
python object_detection/builders/model_builder_test.py
Set up the object detection model
Download the dataset’s images and ground-truth annotations from http://www.robots.ox.ac.uk/~vgg/data/pets/
Whilst these are downloading (being in Australia and still stuck on ADSL, these took forever!), in the meantime:
- Set up the GCP project and bucket (name it something memorable) — Sign up and it should create ‘your first project’, just enable storage.
- Enable Machine Learning (ML) Engine API’s — should be in the left hand nav bar of your project
- Install the Google Cloud SDK
Once the datasets have been downloaded, move them to sit at the same folder level as the object_detection folder.
Extract the dataset folders.
Now we need to convert the dataset into something tensorflow can read (the TFRecord format).
# From models directory
python object_detection/create_pet_tf_record.py \
--label_map_path=object_detection/data/pet_label_map.pbtxt \
--data_dir=`pwd` \
--output_dir=`pwd`

# From the source itself - warnings are normal when you run the script, so don't worry
You should have two new files in the models directory, pet_train.record and pet_val.record.
You should have one new file in the /models/object_detection/data/ directory called pet_label_map.pbtxt
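Before uploading anything, it’s worth confirming the conversion actually produced all three files. A minimal sketch, run from the models directory:

```python
import os


def missing_outputs(paths):
    """Return the subset of expected output paths that don't exist yet."""
    return [p for p in paths if not os.path.exists(p)]


# Expected outputs of create_pet_tf_record.py, relative to models/.
EXPECTED = [
    "pet_train.record",
    "pet_val.record",
    "object_detection/data/pet_label_map.pbtxt",
]

print(missing_outputs(EXPECTED) or "All conversion outputs present.")
```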
In your GCP storage bucket, create a folder called data and copy these files into it.
Download COCO-pretrained Model for Transfer Learning
Pre-trained model to assist in general object detection:
“Training a state of the art object detector from scratch can take days, even when using multiple GPUs! In order to speed up training, we’ll take an object detector trained on a different dataset (COCO), and reuse some of its parameters to initialize our new model.”
- Download the following: COCO-pretrained Faster R-CNN with ResNet-101 model
- Extract
- Upload the extracted files matching the model.ckpt.* pattern to the GCP storage bucket, under the data folder
Configuring the Object Detection Pipeline
- In the object_detection/samples/configs folder, open the faster_rcnn_resnet101_pets.config
- Replace all instances of PATH_TO_BE_CONFIGURED with gs://${YOUR_GCS_BUCKET}/data
- Upload this file to the GCP bucket under the data folder
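On Linux/macOS this substitution is a one-line sed; on Windows a short stdlib script does the same job. A sketch; the bucket name and the example config line are placeholders for illustration:

```python
# Substitute your real bucket name -- this one is a placeholder.
BUCKET = "your-gcp-bucket-name"


def configure_pipeline(config_text, bucket):
    """Point every PATH_TO_BE_CONFIGURED placeholder in the pipeline
    config at the bucket's data folder, as described above."""
    return config_text.replace("PATH_TO_BE_CONFIGURED",
                               "gs://%s/data" % bucket)


# Example round trip on one (illustrative) line of the config:
line = 'fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"'
print(configure_pipeline(line, BUCKET))
```

To rewrite the real file, read faster_rcnn_resnet101_pets.config, pass the contents through configure_pipeline, and write the result back out before uploading.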
Checkpoint
You should have the following files in the GCP bucket, under the data folder
+ ${YOUR_GCS_BUCKET}/
+ data/
- faster_rcnn_resnet101_pets.config
- model.ckpt.index
- model.ckpt.meta
- model.ckpt.data-00000-of-00001
- pet_label_map.pbtxt
- pet_train.record
- pet_val.record
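A quick way to verify this checkpoint is to run gsutil ls gs://${YOUR_GCS_BUCKET}/data/ and eyeball the output. The hypothetical helper below automates that eyeballing: paste the listing text in and it reports anything missing.

```python
# Required files in the bucket's data folder, per the checkpoint above.
REQUIRED = {
    "faster_rcnn_resnet101_pets.config",
    "model.ckpt.index",
    "model.ckpt.meta",
    "model.ckpt.data-00000-of-00001",
    "pet_label_map.pbtxt",
    "pet_train.record",
    "pet_val.record",
}


def missing_from_listing(listing, required=REQUIRED):
    """Given the text output of `gsutil ls gs://bucket/data/`, return
    the required file names absent from the listing (sorted)."""
    present = {line.strip().rsplit("/", 1)[-1]
               for line in listing.splitlines() if line.strip()}
    return sorted(required - present)
```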
Modify the setup.py file
In the models directory, modify the setup.py file to reflect the following. This installs some dependencies required by the object_detection module that aren’t pre-packaged on the GCP ML environments, mainly python-tk and Matplotlib.
"""Setup script for object_detection."""
import logging
import subprocess
from setuptools import find_packages
from setuptools import setup
from setuptools.command.install import installclass CustomCommands(install):
"""A setuptools Command class able to run arbitrary commands."""def RunCustomCommand(self, command_list):
p = subprocess.Popen(
command_list,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT)
# Can use communicate(input='y\n'.encode()) if the command run requires
# some confirmation.
stdout_data, _ = p.communicate()
logging.info('Log command output: %s', stdout_data)
if p.returncode != 0:
raise RuntimeError('Command %s failed: exit code: %s' %
(command_list, p.returncode))
def run(self):
self.RunCustomCommand(['apt-get', 'update'])
self.RunCustomCommand(
['apt-get', 'install', '-y', 'python-tk']) install.run(self)REQUIRED_PACKAGES = ['Pillow>=1.0', 'Matplotlib>=2.1']setup(
name='object_detection',
version='0.1',
install_requires=REQUIRED_PACKAGES,
include_package_data=True,
packages=[p for p in find_packages() if p.startswith('object_detection')],
description='Tensorflow Object Detection Library',
cmdclass={
'install': CustomCommands,
}
)
Modify cloud parameters settings
Edit the following file: object_detection/samples/cloud/cloud.yml
trainingInput:
runtimeVersion: "1.2"
scaleTier: CUSTOM
masterType: standard_gpu
workerCount: 5
workerType: standard_gpu
parameterServerCount: 3
parameterServerType: standard
Start training
Having modified all the required parameter files, we can now build and ship it off to Google ML for processing:
Build the packages:
# From models directory
python setup.py sdist
(cd slim && python setup.py sdist)
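If the submit step later fails with import errors, the tarballs are the first thing to suspect. A stdlib sketch (hypothetical helper) to check that an sdist actually picked up the module’s code:

```python
import tarfile


def sdist_contains(tar_path, module):
    """True if the sdist tarball has any file under the given module
    directory -- a quick check that packaging picked the code up."""
    with tarfile.open(tar_path) as archive:
        return any("/%s/" % module in member.name
                   for member in archive.getmembers())


# e.g. sdist_contains("dist/object_detection-0.1.tar.gz", "object_detection")
```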
Run the training job
#Note: being in Aus, we can’t use ML from our local australia-southeast1 data center hence we are using us-central1.
Make sure to cd back into the models directory
# From models directory
gcloud ml-engine jobs submit training object_detection_${version_unique_ID} \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train \
--region us-central1 \
--config object_detection/samples/cloud/cloud.yml \
-- \
--train_dir=gs://${YOUR_GCS_BUCKET}/train \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
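ML Engine job names must be unique within a project, which is what the ${version_unique_ID} placeholder above is for. One easy way to generate it is a timestamp suffix (a sketch):

```shell
# Job names must be unique per project; a timestamp suffix
# guarantees that for ${version_unique_ID}.
version_unique_ID=$(date +%Y%m%d_%H%M%S)
echo "object_detection_${version_unique_ID}"
```

Once submitted, you can follow the job's progress from the SDK shell with gcloud ml-engine jobs stream-logs object_detection_${version_unique_ID}.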
Run the eval job
The eval job can be run in conjunction with training. However, you will most likely have to increase the amount of resources allocated to your ML platform to run them in tandem.
# From models directory
gcloud ml-engine jobs submit training object_detection_EVAL_${version_unique_ID} \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.eval \
--region us-central1 \
--scale-tier BASIC_GPU \
-- \
--checkpoint_dir=gs://${YOUR_GCS_BUCKET}/train \
--eval_dir=gs://${YOUR_GCS_BUCKET}/eval \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
Monitoring the jobs using tensorboard:
Open the Google Cloud Shell, in the top right-hand corner of the GCP console.
Run the following:
tensorboard --logdir=gs://${YOUR_GCS_BUCKET} --port=8080
Open the tensorboard dashboard using the preview feature in the console
You should be able to check in on your training and eval models as they are being processed.
If this is the first time you are running the model, it might take a while to populate, so be patient. Also, don’t forget to stop your training job when you’re done (gcloud ml-engine jobs cancel object_detection_${version_unique_ID})!
Conclusion
Anyway, hopefully some of this helps you run TF on a GCP trial from Windows.
On another note, before you get disheartened by your results on the pets dataset, make sure to read the paper.