Deploying Object Detection Model with TensorFlow Serving — Part 2

Gaurav Kaila
The Innovation Machine
5 min read · Dec 10, 2017


In Part 1 of this series, I wrote about how we can create a production-ready model in TensorFlow that is compatible with TensorFlow Serving. In this part, we will see how we can create a TF Serving environment using Docker.

About Docker

Docker is a software tool that lets you package software into standardised units for development, shipment and deployment. A Docker container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries and settings.

In short, Docker lets you isolate your application and its dependencies in a stand-alone package that can be used anywhere and anytime, without having to worry about installing code and system dependencies. Our motivation for using Docker for TensorFlow Serving is that we can ship our container to run on the cloud and easily scale our service without having to install any dependencies again.
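If Docker is not set up on your machine yet, the installation steps depend on your OS and are covered in the official Docker documentation. Once installed, a couple of standard commands are enough to sanity-check the setup:

# Check that the Docker client and daemon are installed and running
docker --version
docker info
# Optional sanity check: run the tiny hello-world image
docker run --rm hello-world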

The official documentation of TensorFlow Serving describes how to build it from source. It is good, but I (and a lot of the community) had problems compiling it inside a Docker container, so we will go over the steps one by one here.

  1. Build the image using the official Docker files

Assuming you have cloned the official TensorFlow Serving repo as described in the last part, you can build the Docker image as follows:

# Move to the directory containing the Docker files
cd ./serving/tensorflow_serving/tools/docker/

# Build the image (CPU)
docker build --pull -t $USER/tensorflow-serving-devel-cpu -f Dockerfile.devel .

# or build the image (GPU)
docker build --pull -t $USER/tensorflow-serving-devel-gpu -f Dockerfile.devel-gpu .
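Once the build completes, you can confirm that the image was created by listing your local images:

# The freshly built devel image should show up in this list
docker images | grep tensorflow-serving-devel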

Before starting the Docker container, increase the memory (to 10–12 GB) and the number of CPUs (to 4–6) available to the container in the preferences section of the Docker app. Building TensorFlow Serving is a memory-intensive process and the default parameters might not work. Once done, you can start the container as follows:

# [FOR CPU]
docker run -it -p 9000:9000 $USER/tensorflow-serving-devel-cpu /bin/bash

# or [FOR GPU]
docker run -it -p 9000:9000 $USER/tensorflow-serving-devel-gpu /bin/bash
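For reference, resource limits can also be set per container via the --memory and --cpus flags of docker run (note that on Docker for Mac/Windows these only cap the container within the VM allocation set in the preferences); a sketch with illustrative values:

# [FOR CPU] start the container with explicit resource limits
docker run -it -p 9000:9000 --memory=12g --cpus=6 $USER/tensorflow-serving-devel-cpu /bin/bash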

Inside the container:

# [FOR CPU]
# Clone the TensorFlow Serving GitHub repo inside the container
git clone --recurse-submodules https://github.com/tensorflow/serving
cd serving/tensorflow
# Configure TensorFlow
./configure
cd ..
# Build TensorFlow Serving
bazel build -c opt --copt=-msse4.1 --copt=-msse4.2 tensorflow_serving/...

# or [FOR GPU]
# The TensorFlow Serving GitHub repo is already present in the container,
# so there is no need to clone it again
# Configure TensorFlow with CUDA by answering yes (y) to the
# CUDA support question of ./configure
cd serving/tensorflow
./configure
cd ..
# Build TensorFlow Serving with CUDA
bazel build -c opt --config=cuda --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 tensorflow_serving/...

The build process can take up to an hour depending on the host system and Docker configuration. Once the build finishes without any errors, you can test whether the model server runs:

bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server

The output should look something like this:

Flags:
  --port=8500                        int32  port to listen on
  --enable_batching=false            bool   enable batching
  --batching_parameters_file=""      string If non-empty, read an ascii BatchingParameters protobuf from the supplied file name and use the contained values instead of the defaults.
  --model_config_file=""             string If non-empty, read an ascii ModelServerConfig protobuf from the supplied file name, and serve the models in that file. This config file can be used to specify multiple models to serve and other advanced parameters including non-default version policy. (If used, --model_name, --model_base_path are ignored.)
  --model_name="default"             string name of model (ignored if --model_config_file flag is set)
  --model_base_path=""               string path to export (ignored if --model_config_file flag is set, otherwise required)
  --file_system_poll_wait_seconds=1  int32  interval in seconds between each poll of the file system for new model version
  --tensorflow_session_parallelism=0 int64  Number of threads to use for running a Tensorflow session. Auto-configured by default. Note that this option is ignored if --platform_config_file is non-empty.
  --platform_config_file=""          string If non-empty, read an ascii PlatformConfigMap protobuf from the supplied file name, and use that platform config instead of the Tensorflow platform. (If used, --enable_batching is ignored.)
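To give a sense of how these flags fit together, below is an illustrative invocation; the model name and base path are placeholders and will only do something useful once a model has been exported to that location (covered in the next part):

# Illustrative only: the model name and base path below are placeholders
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=obj_det --model_base_path=/serving/obj_det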

Your serving environment is now ready to be used. Exit the container and commit the changes inside it to a new image. You can do this by:

  • Press [Ctrl-p] + [Ctrl-q] to detach from the container (it keeps running in the background)
  • Find the container ID:
# Find the container ID
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
  • Commit the changes (a scripted version of these steps is sketched after this list):
# Commit the changes
# [FOR CPU]
docker commit ${CONTAINER_ID} $USER/tensorflow-serving-devel-cpu
# or [FOR GPU]
docker commit ${CONTAINER_ID} $USER/tensorflow-serving-devel-gpu
  • Re-enter the container:
docker exec -it ${CONTAINER_ID} /bin/bash
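If you prefer to script these steps, the container ID can also be captured directly from docker ps; a minimal sketch for the CPU image (the filter assumes the image tag used above):

# Grab the ID of the running devel container (assumes the CPU image tag above)
CONTAINER_ID=$(docker ps -q --filter ancestor=$USER/tensorflow-serving-devel-cpu)
# Commit the changes and re-enter the container
docker commit $CONTAINER_ID $USER/tensorflow-serving-devel-cpu
docker exec -it $CONTAINER_ID /bin/bash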

Note: for the TensorFlow Serving container to access the GPUs on your host system, you need to install nvidia-docker on the host and start the container with it:

nvidia-docker run -it -p 9000:9000 $USER/tensorflow-serving-devel-gpu /bin/bash

You can then check GPU usage inside the container with the nvidia-smi command.

Pre-built Docker images

As a number of GitHub issues show (see Resources), many people are unable to compile TensorFlow Serving in Docker, so I have pre-built Docker images for both CPU and GPU support.

You can find them on my Docker Hub page, or pull them down as follows:

# [FOR CPU]
docker pull gauravkaila/tf_serving_cpu

# or [FOR GPU]
docker pull gauravkaila/tf_serving_gpu
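Assuming these pre-built images behave like the devel images built above (i.e. TensorFlow Serving is already compiled inside and the container drops you into a shell), you can start one in the same way:

# [FOR CPU] start a container from the pre-built image
docker run -it -p 9000:9000 gauravkaila/tf_serving_cpu /bin/bash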

In the next part, I will describe how and where to store the model created in Part 1, and how to create a client that can send requests to the TensorFlow Serving service created in this part. At the end of the next part, we will be able to run inference on a test image using the model served from the Docker container.

Resources

GitHub issues:

About the author: Gaurav is a data science manager at EY’s Innovation Advisory in Dublin, Ireland. His interests include building scalable machine learning systems for computer vision applications. Find more at gauravkaila.com
