Deploy a Trained RNN/LSTM Model with TensorFlow-Serving and Flask, Part 2: Serve the Exported Model with ModelServer

Zhanwen Chen · Published in Repro Repo · Jan 9, 2019

In part 1, we explained the TensorFlow-Serving architecture, showed how to export a model the official way using SavedModelBuilder, and installed all the tools to serve that model. Here we will use some command-line magic to put those tools to work.

4. Install the “Screen” Utility

Assuming you’ll ssh into this server, you’ll soon find that you need to start some long-running processes that each take over an entire terminal. I don’t enjoy typing the ssh command repeatedly, so I’ll use the terminal multiplexer “screen.”

On your new server, install screen:

sudo apt-get update
sudo apt-get install screen

5. Copy Over Your Model

Under your home directory (assuming you are not root; please don’t be root, for the love of web security), create a folder for your project, say “translation_project.” Move your trained model there, so that its path is “~/translation_project/models/0.”
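If you trained on a different machine, one way to get the exported model onto the server is scp; for example (the username, host, and local export path below are placeholders):

# run these from your local machine
ssh username_lol@your.server.ip "mkdir -p ~/translation_project/models"
scp -r ./models/0 username_lol@your.server.ip:~/translation_project/models/0

Now cd into the project folder and start the awesome process: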

6. Start the ModelServer

Let’s use screen to create a dedicated virtual terminal just for the ModelServer process. Call it “model_server”:

screen -S model_server

You’ll now see an empty terminal. When you do an ls, you should see your models folder. To start the model_server, type the following command:

tensorflow_model_server --port=9000 \
  --model_name=0 \
  --model_base_path=$(pwd)/models

IMPORTANT: Like I said in part 1, you must give your exported model directory an integer name such as “0” or “1”, because TensorFlow-Serving treats these names as version numbers.

Upon success, you should see something like this:

2018-11-22 17:08:07.986390: I tensorflow_serving/model_servers/main.cc:157] Building single TensorFlow model file config:  model_name: 0 model_base_path: /home/username_lol/project_name_lol/models
2018-11-22 17:08:07.986563: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2018-11-22 17:08:07.986589: I tensorflow_serving/model_servers/server_core.cc:517]  (Re-)adding model: 0
2018-11-22 17:08:08.087134: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: 0 version: 0}
2018-11-22 17:08:08.087180: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: 0 version: 0}
2018-11-22 17:08:08.087198: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: 0 version: 0}
2018-11-22 17:08:08.087222: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /home/username_lol/project_name_lol/models/0
2018-11-22 17:08:08.087238: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /home/username_lol/project_name_lol/models/0
2018-11-22 17:08:08.094848: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2018-11-22 17:08:08.103802: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-11-22 17:08:08.140437: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:113] Restoring SavedModel bundle.
2018-11-22 17:08:08.166257: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:148] Running LegacyInitOp on SavedModel bundle.
2018-11-22 17:08:08.166315: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:233] SavedModel load for tags { serve }; Status: success. Took 79069 microseconds.
2018-11-22 17:08:08.166350: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:83] No warmup data file found at /home/username_lol/project_name_lol/models/0/assets.extra/tf_serving_warmup_requests
2018-11-22 17:08:08.166443: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: 0 version: 0}
2018-11-22 17:08:08.167392: I tensorflow_serving/model_servers/main.cc:327] Running ModelServer at 0.0.0.0:9000 ...

To get out of this screen, hit Ctrl-A followed by Ctrl-D.

(NOTE on screen: to get back to the “model_server” screen, do

screen -r model_server

To go to scroll mode (aka “Copy Mode” ) while in a screen, do Ctrl-A, followed by your escape key.

To exit Copy Mode, just hit escape again.

To stop the process and delete this screen, Ctrl-C out of the tensorflow_model_server command and then type exit.

To see all screens, do

screen -list

)

7. Write the Low-Level API Client

Now that we have a running gRPC server exposing the trained model, we need to be able to pass data to it and get a prediction back. Think of it as a traditional web API such as the Google Maps API: you send a query and receive results back. The idea here is the same, except that:

  1. Instead of sending an HTTP request with a URL (we haven’t seen a URL so far), we use a known domain (localhost, or better 0.0.0.0) and a port (we used 9000) to access the service.
  2. Instead of building a standard HTTP POST request object with either a JSON body or a form, we must use a tf-serving specific template that looks like
{
  model_spec: {                                     # required param
    'name': '0',                                    # required param
    'signature_name': 'serving_default'             # required param
  },
  inputs: {                                         # required param
    'input': <a Protobuf object>,                   # our param
    'keep_prob': <a Protobuf object>,               # our param
    'target_sequence_length': <a Protobuf object>,  # our param
    'source_sequence_length': <a Protobuf object>,  # our param
  },
}

The gRPC/tf-serving request requires at least a model “name,” which must match the --model_name we passed to tensorflow_model_server (we used “0,” the same as our version number). The required “signature_name” of “serving_default” is the default signature key, tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY (not to be confused with the “serve” tag, tf.saved_model.tag_constants.SERVING). As for “inputs,” we defined them back in training: they are simply the variable names in the graph. However, instead of feeding TensorFlow variables as we did in training, we now need to pass their Protobuf equivalents. To do this, we use make_tensor_proto to turn a native Python scalar or list (it doesn’t handle NumPy objects as smoothly) into a TensorProto. For whatever reason, we can’t assign the resulting Protobuf objects into the inputs map directly; we have to copy them in with the Protobuf setter method, CopyFrom.
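Concretely, populating one of these fields with tensorflow-serving-api looks roughly like the snippet below. The keep_prob value is a placeholder; on TensorFlow 1.x builds that don’t expose tf.make_tensor_proto at the top level, use tf.contrib.util.make_tensor_proto instead.

import tensorflow as tf
from tensorflow_serving.apis import predict_pb2

request = predict_pb2.PredictRequest()
request.model_spec.name = '0'
request.model_spec.signature_name = 'serving_default'

# make_tensor_proto converts a native Python scalar or list into a TensorProto.
# Direct assignment (request.inputs['keep_prob'] = ...) is not allowed for
# Protobuf submessages, hence CopyFrom.
request.inputs['keep_prob'].CopyFrom(
    tf.make_tensor_proto(1.0, dtype=tf.float32))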

My code illustrates this logic.
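A minimal sketch of that client follows. The character-level preprocessing and postprocessing are stand-ins for my cached-vocabulary helpers, and the output tensor name 'predictions' is an assumption; use whatever names your exported signature actually defines.

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc


class Server(object):
    """Thin client around the ModelServer's gRPC Predict endpoint."""

    def __init__(self, host='0.0.0.0', port=9000):
        # Boilerplate: open a gRPC channel and build a prediction stub on it.
        channel = grpc.insecure_channel('{}:{}'.format(host, port))
        self.stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
        # Custom logic (elided): load cached preprocessing results from disk.

    def translate(self, word):
        # Placeholder preprocessing: turn the user input into integer ids.
        source_ids = [ord(c) for c in word.lower()]

        # Build the request shown above, one CopyFrom per input tensor.
        request = predict_pb2.PredictRequest()
        request.model_spec.name = '0'
        request.model_spec.signature_name = 'serving_default'
        request.inputs['input'].CopyFrom(
            tf.make_tensor_proto([source_ids], dtype=tf.int32))
        request.inputs['keep_prob'].CopyFrom(
            tf.make_tensor_proto(1.0, dtype=tf.float32))
        request.inputs['target_sequence_length'].CopyFrom(
            tf.make_tensor_proto([len(source_ids) * 2], dtype=tf.int32))
        request.inputs['source_sequence_length'].CopyFrom(
            tf.make_tensor_proto([len(source_ids)], dtype=tf.int32))

        # Blocking Predict call with a 10-second timeout.
        result = self.stub.Predict(request, 10.0)

        # Placeholder postprocessing: 'predictions' is an assumed output name.
        output_ids = tf.make_ndarray(result.outputs['predictions'])
        return ''.join(chr(i) for i in output_ids[0] if i > 0)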

The Server.__init__ method builds a boilerplate gRPC channel and, in turn, a tensorflow-serving-api PredictionService stub. These two lines should most likely apply to your problem as well. Then I have some custom logic that loads my cached preprocessing results from disk.

In the translate method, I use some custom helper methods to process a “word” (my user input) into inference data, before turning that inference data into a tf-serving request object.

The idea is that we’ll have a high-level web app that uses the Server API defined in this script.

8. Write the Flask App to Use the Server Class

Here’s a standard Flask app that uses the service. It also supports HTTPS (you might thank me later — it took me a week to figure out).
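A minimal sketch of flask_translate.py is below. The module name server, the /translate route, the port, and the certificate paths are all assumptions; adapt them to your project.

from flask import Flask, jsonify, request

from server import Server  # the low-level client from step 7 (hypothetical module name)

app = Flask(__name__)
translation_server = Server()  # connects to the ModelServer at 0.0.0.0:9000


@app.route('/translate', methods=['POST'])
def translate():
    word = request.get_json()['word']
    return jsonify({'translation': translation_server.translate(word)})


if __name__ == '__main__':
    # Point ssl_context at your certificate and private key to get HTTPS;
    # drop the argument entirely to serve plain HTTP while testing.
    app.run(host='0.0.0.0', port=8443,
            ssl_context=('/path/to/fullchain.pem', '/path/to/privkey.pem'))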

We will use a new screen to run this program:

screen -S flask_translate
# new screen
python flask_translate.py
# Hit Ctrl-A, Ctrl-D to detach

Voila! That’s the minimal example so far. If you are interested, I may build a small React frontend for a better illustration. Please let me know in the comments if you’d like to see that!
