Deploy a Trained RNN/LSTM Model with TensorFlow-Serving and Flask, Part 2: Serve the Exported Model with ModelServer
In part 1, we explained the TensorFlow-Serving architecture, showed how to export a model the official way using SavedModelBuilder, and installed all the tools to serve that model. Here we will use some command-line magic to put those tools to work.
4. Install the “Screen” Utility
Assuming you’ll ssh into this server, you’ll soon find that you need to run some long-lived processes, each occupying a whole terminal. I don’t enjoy typing the ssh command repeatedly, so I’ll use the terminal manager “screen.”
On your new server, get screen by
sudo apt-get update
sudo apt-get install screen
5. Copy over your model
Under your home directory (assuming you are not root; please don’t be root, for the love of web security), create a folder for your project, say “translation_project.” Move your trained model there, so that its path is “~/translation_project/models/0.” Now cd into the project folder and start the awesome process:
6. Start the ModelServer
Let’s use screen to create a fake terminal just for the ModelServer process. Call it “model_server”:
screen -S model_server
You’ll now see an empty terminal. When you do an ls, you should see your models folder. To start the model server, type the following command:
tensorflow_model_server --port=9000 --model_name=0 --model_base_path=$(pwd)/models
IMPORTANT: As I said in part 1, you must give your exported model folder an integer name such as “0” or “1”; TensorFlow-Serving treats these as version numbers.
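For reference, a SavedModelBuilder export (from part 1) typically produces a layout like this; the exact variables file names may differ on your machine:

```
~/translation_project/models/
└── 0/                 # the integer version directory
    ├── saved_model.pb
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index
```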
Upon success, you should see something like this:
2018-11-22 17:08:07.986390: I tensorflow_serving/model_servers/main.cc:157] Building single TensorFlow model file config: model_name: 0 model_base_path: /home/username_lol/project_name_lol/models
2018-11-22 17:08:07.986563: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2018-11-22 17:08:07.986589: I tensorflow_serving/model_servers/server_core.cc:517] (Re-)adding model: 0
2018-11-22 17:08:08.087134: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: 0 version: 0}
2018-11-22 17:08:08.087180: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: 0 version: 0}
2018-11-22 17:08:08.087198: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: 0 version: 0}
2018-11-22 17:08:08.087222: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /home/username_lol/project_name_lol/models/0
2018-11-22 17:08:08.087238: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /home/username_lol/project_name_lol/models/0
2018-11-22 17:08:08.094848: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2018-11-22 17:08:08.103802: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-11-22 17:08:08.140437: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:113] Restoring SavedModel bundle.
2018-11-22 17:08:08.166257: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:148] Running LegacyInitOp on SavedModel bundle.
2018-11-22 17:08:08.166315: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:233] SavedModel load for tags { serve }; Status: success. Took 79069 microseconds.
2018-11-22 17:08:08.166350: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:83] No warmup data file found at /home/username_lol/project_name_lol/models/0/assets.extra/tf_serving_warmup_requests
2018-11-22 17:08:08.166443: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: 0 version: 0}
2018-11-22 17:08:08.167392: I tensorflow_serving/model_servers/main.cc:327] Running ModelServer at 0.0.0.0:9000 ...
To get out of this screen, hit Ctrl-A followed by Ctrl-D.
(NOTE on screen: to get back to the “model_server” screen, do
screen -r model_server
To go to scroll mode (aka “Copy Mode” ) while in a screen, do Ctrl-A, followed by your escape key.
To exit Copy Mode, just hit escape again.
To stop this process and delete this screen, Ctrl-C out of the tensorflow_model_server command and then type exit
To see all screens, do
screen -list
)
7. Write the Low-Level API Client
Now that we have a running gRPC server exposing the trained model, we need to be able to pass data to it and get a prediction back. Think of it as a traditional web API such as Google Maps API — you send a query and receive a bunch of results back. This is the same idea here, except
- Instead of sending an HTTP request with a URL (we haven’t seen a URL so far), we use a known host (localhost, or 0.0.0.0) and a port (we used 9000) to access the service.
- Instead of building a standard HTTP POST request object with either a JSON body or a form, we must use a tf-serving-specific template that looks like:
{
model_spec: { # required param
'name': '0', # required param
'signature_name': 'serving_default' # required param
},
inputs: { # required param
'input': <a Protobuf object>, # our param
'keep_prob': <a Protobuf object>, # our param
'target_sequence_length': <a Protobuf object>, # our param
'source_sequence_length': <a Protobuf object>, # our param
},
}
The gRPC/tf-serving request requires at least a model “name.” We didn’t have one, so it defaults to the same as our version name, “0.” The required “signature_name” of “serving_default” is the default signature key, tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY (not to be confused with the tag tf.saved_model.tag_constants.SERVING, whose value is “serve”)
. In terms of “inputs,” we defined them back in training: they are simply the variable names in the graph. However, instead of using TensorFlow variables as we did in training, we now need to use their Protobuf equivalents. To do this, we use the method make_tensor_proto to turn a native Python scalar or list (it doesn’t handle NumPy objects as smoothly) into a TensorProto. Protobuf map fields can’t be assigned to directly, which is why we must go through the standard Protobuf message method CopyFrom.
My code below illustrates the logic:
The Server.__init__ method builds a boilerplate gRPC channel and, in turn, a tensorflow-serving-api prediction_service_pb2 Stub. These two lines should most likely apply to your problem as well. Then I have some custom logic that loads my cached preprocessing results from disk.
In the translate method, I use some custom helper methods to process a “word” (my user input) into inference data, before turning that inference data into a tf-serving request object.
The idea is that we’ll have a high-level web app that uses the Server API defined in this script.
8. Write the Flask App to Use the Server Client
Here’s a standard Flask app that uses the service. It also supports HTTPS (you might thank me later — it took me a week to figure out).
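As a minimal sketch of such an app: the Server class is the low-level client from step 7, and the module name translate_client and the /translate route are assumptions, not from the original code.

```python
from flask import Flask, jsonify, request

# `Server` is the low-level client from step 7. The module name
# `translate_client` is an assumption; adjust it to your own file.
try:
    from translate_client import Server
    server = Server()
except ImportError:
    server = None  # let the app start even if the client is unavailable

app = Flask(__name__)

@app.route('/translate', methods=['POST'])
def translate():
    word = request.form.get('word', '')
    if server is None:
        return jsonify({'error': 'model client unavailable'}), 503
    return jsonify({'translation': server.translate(word)})
```

Save it as flask_translate.py and start it with app.run(host='0.0.0.0', port=5000). For HTTPS, Flask’s app.run accepts an ssl_context tuple, e.g. app.run(host='0.0.0.0', port=443, ssl_context=('cert.pem', 'key.pem')).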
We will use a new screen to run this program:
screen -S flask_translate
# new screen
python flask_translate.py
# Hit CTRL-A, CTRL-D to exit
Voila! That’s the minimal example so far. If you are interested, I may build a small React frontend for a better illustration. Please let me know in the comments if you’d like to see that!