How to install TensorFlow Serving, load a Saved TF Model and connect it to a REST API in Ubuntu 16.04

Mohan Noone
4 min readApr 9, 2018

(Update 2022: TensorFlow Lite is now available as an easy and quick alternative)

TensorFlow Serving is the best solution for serving a high-performance TF ML model. Clear instructions are hard to come by, but when set up correctly it is blazing fast!

The installation

(Simplified from https://www.tensorflow.org/serving/setup)

  1. Get the dependencies:
sudo apt-get update && sudo apt-get install -y \
build-essential \
curl \
libcurl3-dev \
git \
libfreetype6-dev \
libpng12-dev \
libzmq3-dev \
pkg-config \
python-dev \
python-numpy \
python-pip \
software-properties-common \
swig \
zip \
zlib1g-dev

2. Ensure the latest GCC runtime library (libstdc++) is in place (TF Serving will not run without it):

sudo apt-get install libstdc++6

The following may also be needed:

sudo add-apt-repository ppa:ubuntu-toolchain-r/test 
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade

3. gRPC:


sudo pip install grpcio

If at this point you are getting an “unsupported locale setting” error, follow the instructions from this SO post:

export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
sudo dpkg-reconfigure locales

4. TensorFlow:


sudo pip install tensorflow

5. TensorFlow Serving:

sudo pip install tensorflow-serving-api

echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list

curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -

sudo apt-get update && sudo apt-get install tensorflow-model-server

That’s it.

Check that the server is installed by entering:

tensorflow_model_server

To actually run it, the command needs to be followed by some arguments (more details below).

Serving a Saved TF Model

Getting a saved model takes a few steps, but they are well documented: see https://www.tensorflow.org/programmers_guide/saved_model#using_savedmodel_with_estimators (for estimator models) and https://www.tensorflow.org/programmers_guide/saved_model#specify_the_outputs_of_a_custom_model (for other models).

However, I would like to mention in passing that I used a premade estimator (DNNClassifier), and a very simple serving input function (from a Cloud ML sample) worked well:

def json_serving_input_fn():
    inputs = {}
    for feat in my_feature_columns:
        # the 'dow' feature column (categorical with identity) doesn't provide the dtype,
        # so it needs an explicit tf.int32 placeholder
        if feat.name != 'dow':
            inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)
        else:
            inputs[feat.name] = tf.placeholder(shape=[None], dtype=tf.int32)

    return tf.estimator.export.ServingInputReceiver(inputs, inputs)

followed by:

servable_model_dir = "...path to../serving_savemodel"
servable_model_path = classifier.export_savedmodel(servable_model_dir, json_serving_input_fn)

This exports the Saved Model into a timestamped subdirectory under servable_model_dir.
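(Optional) Before wiring up TF Serving, the export can be sanity-checked locally with tf.contrib.predictor. This is only a sketch under my own assumptions: the input name "x1" is hypothetical, and the path must point to the timestamped subdirectory that export_savedmodel created.

# Optional local sanity check of the exported model (TF 1.x).
# 'x1' is a hypothetical input name; use the feature names from your serving input fn,
# and point the path at the timestamped subdirectory created by export_savedmodel.
from tensorflow.contrib import predictor

predict_fn = predictor.from_saved_model("...path to../serving_savemodel/<timestamp>")
print(predict_fn({"x1": [1.0, 2.0, 3.0]}))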

Now, to finally start TF Serving with the model, use:

tensorflow_model_server --port=9000 --model_name={give it a name, and remember it; let's call it model-name for now!} --model_base_path=/..path../serving_savemodel/ &> tfs_log &

# (from https://towardsdatascience.com/how-to-deploy-machine-learning-models-with-tensorflow-part-2-containerize-it-db0ad7ca35a7)

Check:

cat tfs_log

It should show “Running ModelServer at 0.0.0.0:9000” in the last line, and yes, we are ready to serve!
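As an extra, optional check that the port itself is reachable, a quick Python snippet like the one below can be used; it assumes the server is on localhost:9000 as above.

# Quick check that something is listening on the serving port (localhost:9000 assumed).
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(2.0)
try:
    sock.connect(("localhost", 9000))
    print("TF Model Server is reachable on port 9000")
except socket.error as err:
    print("Could not connect: %s" % err)
finally:
    sock.close()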

Figure out the inputs and outputs of your saved model

There is a beautiful tool, the "SavedModel CLI", that helps with that. And best of all, it comes already installed with TF.

saved_model_cli show --dir /..path../saved_model_dir --all

Note the signature_def, the input names and their types, and the output names. These are needed to fetch results from the model server.

Okay, now we are ready for some gRPC

This is the protocol for talking to TF Serving. (Don't worry, it comes to just a few lines of Python.)

  1. Create a Python script and import the following:
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2
from tensorflow.python.framework import tensor_util

2. Create an “insecure” channel (but don’t feel insecure!):

channel = implementations.insecure_channel("localhost", 9000)

You can use the IP address or hostname of the TF Model Server instead of "localhost" if it's on another machine, ensuring that the firewalls are set up correctly for the selected port (9000 in this case).

3. And then, a “stub”:

stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

4. Finally, to send a request:

request = predict_pb2.PredictRequest()

request.model_spec.name = '<the model name used above when starting TF Serving>'
request.model_spec.signature_name = '<get this from the Saved Model CLI output>'

# Do this for each input (worked for me); get the input names from the Saved Model CLI.
request.inputs['<input name>'].CopyFrom(tf.make_tensor_proto(
    <a list of one or more values, e.g. [1.0, 2.0, 3.0], corresponding to <input name>>,
    dtype=<the appropriate dtype of the values, e.g. tf.float32; again, the Saved Model CLI output will help>))
# ...

# Finally:
result = stub.Predict(request, 60.0)  # 60 is the timeout in seconds, but it's blazing fast

Note: the text enclosed in <..> above is all placeholder text, to be replaced appropriately!

You can then:

print(result)

to have a look; it is of a rather strange type, a "TensorProto".

5. To get out a particular output as a numpy array, use:

tensor_util.MakeNdarray(result.outputs["<output name>"])
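For reference, here is how steps 1 to 5 might fit together in one minimal client script. The model name ('model-name'), signature name ('serving_default') and the tensor names 'x1' and 'scores' are my own placeholders; substitute the values your Saved Model CLI output and your server start command actually use.

# minimal_client.py: a sketch combining steps 1-5 above.
# 'model-name', 'serving_default', 'x1' and 'scores' are placeholders.
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2
from tensorflow.python.framework import tensor_util

channel = implementations.insecure_channel("localhost", 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'model-name'                 # the --model_name used when starting TF Serving
request.model_spec.signature_name = 'serving_default'  # from the Saved Model CLI output

# one CopyFrom per input tensor
request.inputs['x1'].CopyFrom(tf.make_tensor_proto([1.0, 2.0, 3.0], dtype=tf.float32))

result = stub.Predict(request, 60.0)  # 60 second timeout

# pull one output back out as a numpy array
print(tensor_util.MakeNdarray(result.outputs['scores']))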

Final notes: connecting this to a Flask script for making a production REST API

I could not get the Flask script to work with Apache / mod_wsgi; there seems to be an incompatibility between mod_wsgi and gRPC. However, nginx with uWSGI worked fine, after adding the line lazy_app=true to the uwsgi ini file. (I was following these instructions: https://www.digitalocean.com/community/tutorials/how-to-serve-flask-applications-with-uwsgi-and-nginx-on-ubuntu-16-04, where the ini file is named myproject.ini.)
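To give an idea of what such a Flask script can look like, here is a bare-bones sketch. It is only an illustration under the same assumptions as the client above: the /predict endpoint, the JSON payload shape and the placeholder names ('model-name', 'serving_default', 'x1', 'scores') all need to be adapted, and the app is meant to be served through uWSGI/nginx as in the linked guide.

# app.py: a minimal Flask sketch wrapping the gRPC call in a REST endpoint.
# 'model-name', 'serving_default', 'x1' and 'scores' are placeholders to adapt.
from flask import Flask, request as flask_request, jsonify
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2
from tensorflow.python.framework import tensor_util

app = Flask(__name__)


def get_stub():
    # creating the channel per request keeps the sketch simple; it could also
    # be created once at module level
    channel = implementations.insecure_channel("localhost", 9000)
    return prediction_service_pb2.beta_create_PredictionService_stub(channel)


@app.route("/predict", methods=["POST"])
def predict():
    payload = flask_request.get_json(force=True)  # e.g. {"x1": [1.0, 2.0, 3.0]}
    req = predict_pb2.PredictRequest()
    req.model_spec.name = 'model-name'
    req.model_spec.signature_name = 'serving_default'
    req.inputs['x1'].CopyFrom(tf.make_tensor_proto(payload['x1'], dtype=tf.float32))
    result = get_stub().Predict(req, 60.0)
    scores = tensor_util.MakeNdarray(result.outputs['scores'])
    return jsonify({'scores': scores.tolist()})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)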
