Deploy a Trained RNN/LSTM Model with TensorFlow-Serving and Flask, Part 1: Introduction and Installations

Zhanwen Chen
Published in Repro Repo · Oct 19, 2018


(Update: Part 2 is out!)

After following a few popular tutorials such as Stian Lind Petlund’s TensorFlow-Serving 101 (Parts 1 and 2), and Vitaly Bezgachev’s How to deploy Machine Learning models with TensorFlow (Parts 1 and 2), I found their explanations unclear and their configurations overly complicated (or not officially supported). However, there are also a few topics that they covered very well. This speaks to the difficulty of pedagogy. Here, I try to be as explicit as possible by avoiding complications such as Docker and Protobuf and by using official practices such as exporting models with SavedModelBuilder.

0. Introduction to TensorFlow-Serving

First, let me attempt to explain the TensorFlow-Serving ecosystem from the bottom up.

[Figure: Artistic rendition of the crime scene that is the TensorFlow-Serving ecosystem. ©Zhanwen Chen 2018]

Assuming you have a trained model, we begin with:

TensorFlow-Serving/ModelServer. TensorFlow-Serving, aka ModelServer, is software separate from TensorFlow, but it requires TensorFlow as a peer dependency at the same version (if you are familiar with React, this relationship is much like that between React and ReactDOM). You are required to have TensorFlow-Serving/ModelServer whether you decide to use JavaScript, Python, or Go to write your client. The ModelServer exposes Protobuf objects (think of them as C++ compiled binaries) for use by any language, which brings us to:

TensorFlow-Serving Client API (Language-Specific). In order for you to choose any language to deploy your prediction service, there needs to be a bridge between the above-mentioned Protobuf objects and your language, in the form of a Protobuf-to-language API. You don’t technically need this API, but then you’d have to develop low-level code to manipulate Protobuf objects (I won’t cover that here). In this tutorial, we will write our client in Python and use the Protobuf-to-Python API called tensorflow-serving-api. In addition, because your Python client requires a standard gRPC server, you also need to install the gRPC API for Python, grpcio. (Note: at the time of writing, there seems to be only one such API, which is for Python. For other languages, you may have to compile Protobuf objects yourself after all.)

Your TensorFlow-Serving (Python) Client. This is a low-level client featuring fairly standardized code. No matter what your model looks like or even which language you use, the client should start a standard gRPC server (using grpcio), make a tensorflow-serving-api prediction service, and expose a way to make a grpc/tensorflow-serving-api request out of raw input. The only difference for you is how you preprocessed/fed your model back when you trained it. In other words, you need to repeat your pre-processing and model building steps while constructing a request object.
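To make this concrete, here is a minimal sketch of such a client. The model name my_model, the signature name serving_default, the input key inputs, and the preprocessing step are all placeholders you would replace with your own; the serving imports live inside the function purely so the sketch can be read (and its pure-Python parts exercised) without the serving packages installed.

```python
# Sketch of a low-level TensorFlow-Serving prediction client (Python).
# All model-specific names below are placeholders, not the real values.

def server_address(host="localhost", port=9000):
    """Format the host:port string the gRPC channel connects to."""
    return "%s:%d" % (host, port)

def make_prediction(raw_input, host="localhost", port=9000, timeout=10.0):
    # Imported lazily so this module loads even without grpcio and
    # tensorflow-serving-api installed; in a real client these imports
    # would live at the top of the file.
    import grpc
    import numpy as np
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    # Open an insecure gRPC channel to the ModelServer and build a stub.
    channel = grpc.insecure_channel(server_address(host, port))
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # Repeat the exact preprocessing you used at training time.
    processed = np.asarray(raw_input, dtype=np.float32)

    # Build the request; the model name must match what ModelServer serves.
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "my_model"                   # placeholder
    request.model_spec.signature_name = "serving_default"  # placeholder
    request.inputs["inputs"].CopyFrom(                     # key is a placeholder
        tf.contrib.util.make_tensor_proto(processed, shape=processed.shape))

    # Blocking call with a timeout, returning the PredictResponse proto.
    return stub.Predict(request, timeout)
```

The high-level app in the next step only ever needs to call make_prediction; everything gRPC-specific stays in this one module.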

Your TensorFlow-Serving (Python) Client Wrapper/App. We avoid Docker and build a high-level server app (with Python’s web development framework, Flask) to explicitly illustrate how to use the above low-level Python prediction client you wrote. In this web app, you start a gRPC server which reads the trained model from the ModelServer port (say port 9000). You then wrap some standard web I/O logic around the simple prediction method that you defined in the low-level client. That’s it!
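As an illustration, here is a minimal Flask sketch of that wrapper. The /predict route, the JSON shapes, and the make_prediction stub are all assumptions; the stub merely stands in for the low-level client function you wrote, so the web I/O logic can be shown on its own.

```python
# Sketch of a high-level Flask app wrapping the low-level prediction client.
from flask import Flask, jsonify, request

app = Flask(__name__)

def make_prediction(raw_input):
    # Stand-in for the low-level gRPC client's prediction function
    # (hypothetical name); replace with the real client call, which
    # talks to the ModelServer port (say 9000).
    return {"dummy": raw_input}

@app.route("/predict", methods=["POST"])
def predict():
    # Standard web I/O around the simple prediction method: parse the
    # JSON body, call the client, and serialize the result back out.
    raw_input = request.get_json(force=True)["input"]
    return jsonify(make_prediction(raw_input))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```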

1. Export Trained Model with SavedModelBuilder

Although I assumed that we already had a trained model, chances are the model we have is not in the correct format — we probably saved a “checkpoint.” Officially, TensorFlow-Serving wants us to use the SavedModelBuilder API to export and serve our models. To convert a checkpoint to a SavedModel, use the following code.
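Here is a sketch of that conversion using the TF 1.x APIs. The checkpoint directory "checkpoints/dev" and the export path "models/0" are assumed names, and the small export_path helper exists only to enforce the rule that the final folder name must be an integer; it assumes your checkpoint was saved with a matching .meta graph file.

```python
# Sketch: convert a TF 1.x checkpoint into a SavedModel for serving.
import os

def export_path(base="models", version=0):
    """Build the export directory; the leaf folder MUST be an integer
    (e.g. models/0), or TensorFlow-Serving will fail to load the model."""
    return os.path.join(base, str(int(version)))

def export_saved_model(checkpoint_dir="checkpoints/dev", export_dir=None):
    # TensorFlow is imported inside the function so the path helper
    # above stays usable where TensorFlow is not installed.
    import tensorflow as tf

    export_dir = export_dir or export_path()
    with tf.Session(graph=tf.Graph()) as sess:
        # Locate and restore the latest checkpoint under checkpoint_dir.
        checkpoint = tf.train.latest_checkpoint(checkpoint_dir)
        saver = tf.train.import_meta_graph(checkpoint + ".meta")
        saver.restore(sess, checkpoint)

        # Write a SavedModel tagged for both training and serving;
        # in production you might want only the SERVING tag.
        builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
        builder.add_meta_graph_and_variables(
            sess,
            [tf.saved_model.tag_constants.TRAINING,
             tf.saved_model.tag_constants.SERVING])
        builder.save()
```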

The snippet above finds your latest checkpoint under an assumed “checkpoints/dev” directory, restores it, and initializes a SavedModelBuilder object with the target path “models/0” (you could name it “models/1” or “lolz/42”, but the final folder name must be an integer — it is the model version — or you’ll get a weird bug later on). I then use the add_meta_graph_and_variables method to include the loaded model, tagging it with both tf.saved_model.tag_constants.TRAINING and tf.saved_model.tag_constants.SERVING (you might want only SERVING in production). Later on, you can access the model by either of those two tags (they are really strings). You now have a legal trained model!

2. Install TensorFlow-Serving (of the Same Version as Your TensorFlow’s)

If you are building TensorFlow-Serving from source, you must check out a version-specific git branch such as r1.10. For this tutorial, I will simplify things by installing with prebuilt packages. You may choose any version (I used 1.10), as long as the three packages share the same version.

(Warning: if TensorFlow, TensorFlow-Serving, and TensorFlow-Serving Client API are not of the same exact version, you may encounter mystery bugs that have few search results on Google)

First, add this custom Google repository to your Ubuntu list of repos.

echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -

sudo apt-get update && sudo apt-get install tensorflow-model-server=1.10 # or your version

3. Install TensorFlow-Serving Client (Python) API 1.10 with pip

First, install the gRPC Python API for use by your client:

pip install grpcio

Then install the TensorFlow-Serving Client API for Python:

pip install tensorflow-serving-api==1.10 # or your version

You now have all the tools to serve a model! In the next part, I will show you exactly how to use the tools you’ve installed.




A PhD student interested in learning from data.