Deploying Object Detection Model with TensorFlow Serving — Part 3

Gaurav Kaila
The Innovation Machine
Dec 27, 2017 · 5 min read

In the first part of this series, we discussed how to create a production-ready model in TensorFlow that is compatible with TensorFlow Serving, and in the second part we discussed how to create a TF Serving environment using Docker. In this part, we will talk about creating a client that asks the model server running in the Docker container for inference on a test image.

A quick introduction to gRPC (Google Remote Procedure Call) and Protocol Buffers

gRPC (Google’s Remote Procedure Call) is Google’s RPC framework built on top of HTTP/2. It allows a client running on one computer to call a “function” on a remote computer, over a network, as if that function were local to the client. TensorFlow Serving uses this protocol to serve models for inference. According to the official documentation,

In gRPC, a client application can directly call methods on a server application on a different machine as if it were a local object, making it easier for you to create distributed applications and services.

gRPC architecture

Here the gRPC server is our Docker container running the TensorFlow Serving service, and our client is a Python script that requests this service for inference. This article describes how RPC works in a very structured way.

gRPC uses Protocol Buffers to serialize structured data as well as to define the parameters and return types of the callable methods. Protocol Buffers are language-neutral and platform-neutral: you describe your messages in a schema language (.proto files), which is then compiled into serialization code in your chosen language and included in your project. Data is transmitted in a binary format, which is smaller and faster compared to good old JSON and XML.

Creating the client

A TensorFlow Serving request can be of three types: classification, regression or prediction.

  1. Classification and Regression: use the Classification and Regression RPC APIs, which accept an input tensor (e.g. an image) and output a class (or regressed value) and a score.
  2. Prediction: uses the Predict RPC API, which accepts an input tensor (e.g. an image) and outputs multiple tensors such as (for object detection) bounding_boxes, classes, scores, etc.

As the problem at hand is a prediction problem, we will be using the Predict RPC API. For this we need the predict protobufs available on the TensorFlow Serving GitHub and compile them into our language-specific code (i.e. Python). You can do this yourself, or go the easy way and download the generated Python files from this GitHub repo. We will use these generated classes to create a prediction request in our client.

Template of a Prediction RPC client

A stub is a piece of code that converts (marshals) the parameters passed during a remote procedure call (RPC). As the client and the server sit in different address spaces, the parameters sent from the client to the server (and vice versa) must be converted so that the remote procedure call looks like a local function call on both sides. The stub used here is the code generated from the predict protobufs, as described above.
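A minimal sketch of such a client is shown below. It assumes that the generated predict_pb2 and prediction_service_pb2_grpc modules are importable (the exact module names depend on how the protobufs were compiled), that the server is reachable on localhost:3000, and that the exported object detection model uses the default serving_default signature with an inputs input key; the test image path is hypothetical.

import grpc
import numpy as np
import tensorflow as tf
from PIL import Image

# Generated from the predict protobufs (module names may differ depending on
# how/where they were compiled).
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

def predict(image_path, host='localhost', port=3000, model_name='obj_det'):
    # Open a channel to the TensorFlow Serving service and create the stub.
    channel = grpc.insecure_channel('{}:{}'.format(host, port))
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # Load the test image as a [1, height, width, 3] uint8 array.
    image_np = np.expand_dims(np.array(Image.open(image_path)), axis=0)

    # Build the prediction request; model_name must match the --model_name
    # flag passed to the model server.
    request = predict_pb2.PredictRequest()
    request.model_spec.name = model_name
    request.model_spec.signature_name = 'serving_default'
    request.inputs['inputs'].CopyFrom(
        tf.make_tensor_proto(image_np, shape=image_np.shape))

    # Call the remote Predict method with a 10 second timeout.
    return stub.Predict(request, 10.0)

result = predict('test_clock.jpg')  # hypothetical test image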

Launching the TensorFlow Serving service

As described in the previous part, our TensorFlow Serving service will run in a Docker container with ports open to the outside world. Assuming the Docker image is available, the container can be started by,

$ docker run -it -d -P --name tf_serving_cpu -p 3000:3000 gauravkaila/tf_serving_cpu

Here port 3000 is open to the world and the client can access the TensorFlow Serving service via this port. Copy the model directory created in the first part to a folder inside the container by,

$ docker cp /path/to/model tf_serving_cpu:/path/to/destination

To run the service, move into the container and start the model server by,

# Move to the serving/ directory
$ cd serving/
# Start the service
$ bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    --port=3000 \
    --model_name=obj_det \
    --model_base_path=/path/to/dest &> obj_det &

Make sure the model_name flag has the same name as specified in the client. The output is logged to obj_det. If all went well, you will be able to see the following output when you type,

$ tail -f obj_det

tensorflow_serving/model_servers/main.cc:288] Running ModelServer at 0.0.0.0:3000 …

The model is being served and is ready to be used by our client.

Visualize bounding boxes on test images

The aim of an object detection model is to locate objects in an image; to inspect the result, we draw the predicted bounding boxes on the test image. For this we will use the visualization_utils.py file from the TensorFlow Object Detection API.

We can access the individual outputs from the result by,

boxes = result.outputs['detection_boxes'].float_val
classes = result.outputs['detection_classes'].float_val
scores = result.outputs['detection_scores'].float_val

These are flat lists of floats (the repeated float_val fields of the output tensors) that can be reshaped and fed into visualization_utils.py by,

import numpy as np
from object_detection.utils import visualization_utils as vis_util

image_vis = vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,                              # the test image as a numpy array
    np.reshape(boxes, [-1, 4]),            # one [ymin, xmin, ymax, xmax] per detection
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,                        # label map: class id -> class name
    use_normalized_coordinates=True,
    line_thickness=8)

Putting the pieces together, the final client script will look something like the sketch below.
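This is a sketch rather than the original gist: it reuses the predict() helper defined in the template above and assumes the COCO label map from the TensorFlow Object Detection API is available at a hypothetical path data/mscoco_label_map.pbtxt.

import numpy as np
from PIL import Image
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

# Build the category index from the COCO label map (path is an example).
label_map = label_map_util.load_labelmap('data/mscoco_label_map.pbtxt')
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=90, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

# Request inference on the test image (predict() as defined in the template above).
image_path = 'test_clock.jpg'
result = predict(image_path)

# Pull the flat float lists out of the response.
boxes = result.outputs['detection_boxes'].float_val
classes = result.outputs['detection_classes'].float_val
scores = result.outputs['detection_scores'].float_val

# Draw the detections on the image and save the result.
image_np = np.array(Image.open(image_path))
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    np.reshape(boxes, [-1, 4]),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8)
Image.fromarray(image_np).save('test_clock_output.jpg')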

Final Output

Sending in a test image of a clock, our final output should look something like this. Note: the model used here is a Faster R-CNN pre-trained on the COCO dataset, for which class number 85 corresponds to a clock.

outputs {
  key: "detection_boxes"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim { size: 1 }
      dim { size: 300 }
      dim { size: 4 }
    }
    float_val: 0.24750074744224548
    float_val: 0.17159424722194672
    float_val: 0.9083144068717957
    float_val: 0.797699511051178
    ...
  }
}
outputs {
  key: "detection_classes"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim { size: 1 }
      dim { size: 300 }
    }
    float_val: 85.0
    ...
  }
}
outputs {
  key: "detection_scores"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim { size: 1 }
      dim { size: 300 }
    }
    float_val: 0.9963208436965942
    ...
  }
}

What have we achieved

We started off with an object detection use-case to demonstrate the power of TensorFlow Serving. We exported our trained model to a format expected by TensorFlow Serving, compiled TF Serving using Docker, and created a client script that requests the model server for inference.

What does the future hold

  1. Using this use-case as a template, we can use TensorFlow Serving to serve other prediction and classification models.
  2. We can leverage the GPU version of TensorFlow Serving to attain faster inference.
  3. We can scale our service by deploying multiple Docker containers running the TF Serving service.
  4. Batching of input images can be done instead of sending one image per request (see the sketch after this list).
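As an illustration of point 4: if the exported model's input signature accepts a batch dimension and all images in the batch share the same size, several images can be packed into one request. A rough sketch, under those assumptions and with hypothetical image paths:

import grpc
import numpy as np
import tensorflow as tf
from PIL import Image

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:3000')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Stack same-sized test images into a single [N, height, width, 3] batch.
paths = ['clock_1.jpg', 'clock_2.jpg', 'clock_3.jpg']
batch = np.stack([np.array(Image.open(p)) for p in paths])

request = predict_pb2.PredictRequest()
request.model_spec.name = 'obj_det'
request.model_spec.signature_name = 'serving_default'
request.inputs['inputs'].CopyFrom(
    tf.make_tensor_proto(batch, shape=batch.shape))
result = stub.Predict(request, 10.0)

# detection_boxes now holds N sets of boxes; slice per image.
boxes = np.reshape(result.outputs['detection_boxes'].float_val,
                   [len(paths), -1, 4])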

About the author: Gaurav is a data science manager at EY’s Innovation Advisory in Dublin, Ireland. His interests include building scalable machine learning systems for computer vision applications. Find more at gauravkaila.com
