TensorFlow Serving 101 pt. 1

Part 1: Saving and serving your model

  • Save a TensorFlow model so it can be loaded with TensorFlow Serving ModelServer and used in production.
  • Serve your model with the TensorFlow Serving ModelServer.
  • Send requests to your model (and get responses).


  • You spend way too much time exploring all the options and learning new concepts instead of just building what you were supposed to build in the first place.
  • You copy some stuff you don’t really understand, but it works. When it eventually breaks or you want to build a new feature, you spend all your time figuring out how it works (again).

1. Write a simple model

import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session)

placeholder_name = 'a'
operation_name = 'add'

a = tf.placeholder(tf.int32, name=placeholder_name)
b = tf.constant(10)

# This is our model
add = tf.add(a, b, name=operation_name)

with tf.Session() as sess:

    # Run a few operations to make sure our model works
    ten_plus_two = sess.run(add, feed_dict={a: 2})
    print('10 + 2 = {}'.format(ten_plus_two))

    ten_plus_ten = sess.run(add, feed_dict={a: 10})
    print('10 + 10 = {}'.format(ten_plus_ten))

2. Save the model

  • If your models are complex and run slowly on CPU, you will want to run them on accelerated hardware (like GPUs). Your API microservices, on the other hand, usually run fine on CPU, and they’re often running in “everything agnostic” Docker containers. In that case you may want to keep those two kinds of services on different hardware.
  • If you start messing up your neat Docker images with heavy TensorFlow models, they grow in every possible direction (CPU usage, memory usage, container image size, and so on). You don’t want that.
  • Let’s say your service uses multiple models written in different versions of TensorFlow. Using all those TensorFlow versions in your Python API at the same time is going to be a total mess.
  • You could of course wrap each model in its own API. Then you would have one service per model, and you could run different services on different hardware. Perfect! Except this is exactly what TensorFlow Serving ModelServer does for you. So don’t wrap an API around your Python code (where you’ve probably imported the entire tf library, tf.contrib, opencv, pandas, numpy, …).
  • Most importantly, the TensorFlow team wrote TensorFlow Serving and the ModelServer for a reason. They are probably better than you at writing a high-performance serving system. Use it!

  • First we have to grab the input and output tensors.
  • Create a signature definition from the input and output tensors. The signature definition is what the model builder uses to save something a model server can load.
  • Save the model at a specified path where a server can load it from.
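    # This continues inside the `with tf.Session() as sess:` block from step 1.
    # The imports can also be moved to the top of the script.
    from tensorflow.python.saved_model import builder as saved_model_builder
    from tensorflow.python.saved_model import signature_constants, signature_def_utils, tag_constants
    from tensorflow.python.saved_model.utils import build_tensor_info

    # Pick out the model input and output
    a_tensor = sess.graph.get_tensor_by_name(placeholder_name + ':0')
    sum_tensor = sess.graph.get_tensor_by_name(operation_name + ':0')

    model_input = build_tensor_info(a_tensor)
    model_output = build_tensor_info(sum_tensor)

    # Create a signature definition for tfserving
    signature_definition = signature_def_utils.build_signature_def(
        inputs={placeholder_name: model_input},
        outputs={operation_name: model_output},
        method_name=signature_constants.PREDICT_METHOD_NAME)

    builder = saved_model_builder.SavedModelBuilder('./models/simple_model/1')

    # Register the graph under the default serving signature key
    builder.add_meta_graph_and_variables(
        sess, [tag_constants.SERVING],
        signature_def_map={
            signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature_definition})

    # Save the model so we can serve it with a model server :)
    builder.save()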
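
The trailing 1 in ./models/simple_model/1 is the model version: the model server is pointed at the base path (./models/simple_model) and looks for numbered subdirectories beneath it. As a quick sanity check before starting the server, a small standard-library sketch like the one below can list which version directories look loadable. The helper name find_servable_versions is made up for illustration, and the check only assumes that each exported version contains a saved_model.pb file, which is what SavedModelBuilder writes by default.

```python
import os


def find_servable_versions(base_path):
    """Return the numeric version directories under base_path that
    contain a saved_model.pb file, sorted in ascending order.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    versions = []
    if not os.path.isdir(base_path):
        return versions
    for entry in os.listdir(base_path):
        version_dir = os.path.join(base_path, entry)
        # Version directories are named with plain integers, e.g. "1", "2".
        if not (entry.isdecimal() and os.path.isdir(version_dir)):
            continue
        # Assumption: a usable export contains saved_model.pb.
        if os.path.isfile(os.path.join(version_dir, 'saved_model.pb')):
            versions.append(int(entry))
    return sorted(versions)


if __name__ == '__main__':
    found = find_servable_versions('./models/simple_model')
    print('Servable versions:', found)
```

If several versions are present, the model server serves the highest version number by default, so saving a new model to ./models/simple_model/2 is enough to roll out an update.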

3. Serving the model

docker run -it -p 9000:9000 --name simple -v $(pwd)/models/:/models/ epigramai/model-server:light --port=9000 --model_name=simple --model_base_path=/models/simple_model
  • When you do docker run, you run the image epigramai/model-server:light. The default entrypoint for this image is tensorflow_model_server. This means that when you run the container, you also start the model server.
  • Because the model is not built into the image (remember, the image is just the model server) we make sure the container can find the model by mounting (-v) the models/ folder to the container.
  • The -it option attaches your terminal to the container, so you see the logs right in the terminal instead of the container running in the background. The --name option is just the name of the container; it has nothing to do with TensorFlow or the model.
  • Then there’s the -p option, and this one is important. This option tells docker to map the container’s internal port 9000 out to port 9000 of the outside world. The outside world in this case is your computer, also known as localhost. If we omitted this option, the model server would serve your model on port 9000 inside the container, but you would not be able to send requests to it from your computer.
  • The last three flags are passed all the way to the model server. The port is 9000 (yep, the port we are mapping out to your machine). With the model_name flag we give our model a name. And with the last flag we tell the model server where the model is located. Again, the model is not in the image, but because we used the -v option and mounted the folder to the container, the model server can find the model inside the running container.
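
To make the split between Docker options and model server flags explicit, the command above can be assembled as an argument list. This is only a sketch: the function name build_docker_command and its parameters are invented here, while the image name, port, and flags are the ones from the command above. It builds the list without running anything.

```python
import os


def build_docker_command(models_dir, model_name='simple',
                         model_dir_name='simple_model', port=9000,
                         container_name='simple',
                         image='epigramai/model-server:light'):
    """Assemble the docker run command as a list of arguments.

    Hypothetical helper for illustration; the defaults mirror the
    command used in this post.
    """
    # Options consumed by docker itself (everything before the image name).
    docker_options = [
        '-it',
        '-p', '{0}:{0}'.format(port),      # host port : container port
        '--name', container_name,
        '-v', '{}:/models/'.format(os.path.abspath(models_dir)),
    ]
    # Flags passed through to tensorflow_model_server (after the image name).
    server_flags = [
        '--port={}'.format(port),
        '--model_name={}'.format(model_name),
        '--model_base_path=/models/{}'.format(model_dir_name),
    ]
    return ['docker', 'run'] + docker_options + [image] + server_flags


if __name__ == '__main__':
    print(' '.join(build_docker_command('./models')))
```

Everything before the image name is read by docker, and everything after it ends up as arguments to the model server, which is why the order of the pieces matters.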
docker stop simple && docker rm simple

Is this really production ready? What happened to GPUs?


