How to Deploy a TensorFlow Model to a Virtual Private Server

This is the first post in a series of three for the Chicago Python (ChiPy) Mentorship Program. If you're in Chicago, you should check it out.

I’m a grad student. I do some machine learning. I want other people to be able to use what I’ve created. But a few weeks ago, I didn’t know anything about deploying machine learning models. I didn’t even know where to start. ChiPy paired me with a great mentor, and here I’ll show you what we’ve been working on. If you are interested in deploying machine learning models, read on.

The objective

Deploy a TensorFlow model so people can send the model data and get predictions back.

How are we going to make the model available?

TensorFlow Serving provides several options to make your model available. Here I’m going to create a RESTful API, so we can make POST requests to the model.

Saving a TensorFlow model

Before we serve a model, it needs to be in the right format. The easiest way to save a model for serving is to use the SavedModelBuilder. My model has three inputs and two outputs. I’m keeping everything else as a black box for simplicity.

Here’s a quick example of how to save a TensorFlow model:
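
The gist with my exact code isn’t shown here, so the snippet below is a minimal sketch of the pattern, assuming TensorFlow 1.x. The input and output names (x1, x2, keep_prob, y1, y2), the 46-feature input shape, and the get_score signature name come from later in this post; the tiny graph itself is just a stand-in for the real model, which stays a black box.

```python
import time
import tensorflow as tf

# Versioned export path: a timestamp keeps each save in its own folder.
export_dir = "models/my_model/{}".format(int(time.time()))  # "my_model" is a placeholder name

# Stand-in graph -- the real model is a black box with three inputs and two outputs.
x1 = tf.placeholder(tf.float32, [None, 46], name="x1")
x2 = tf.placeholder(tf.float32, [None, 46], name="x2")
keep_prob = tf.placeholder(tf.float32, name="keep_prob")  # scalar, unknown shape
hidden = tf.nn.dropout(tf.layers.dense(tf.concat([x1, x2], axis=1), 32), keep_prob)
y1 = tf.layers.dense(hidden, 1, name="y1")
y2 = tf.layers.dense(hidden, 1, name="y2")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # 1. Create the builder, pointing it at the versioned directory.
    builder = tf.saved_model.builder.SavedModelBuilder(export_dir)

    # 2. Build the SignatureDef describing the inputs and outputs the API will expect.
    signature = tf.saved_model.signature_def_utils.build_signature_def(
        inputs={
            "x1": tf.saved_model.utils.build_tensor_info(x1),
            "x2": tf.saved_model.utils.build_tensor_info(x2),
            "keep_prob": tf.saved_model.utils.build_tensor_info(keep_prob),
        },
        outputs={
            "y1": tf.saved_model.utils.build_tensor_info(y1),
            "y2": tf.saved_model.utils.build_tensor_info(y2),
        },
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME,
    )

    # 3. Add the graph and variables under the SERVING tag, keyed by the signature
    #    name we'll call later ("get_score"), then write everything to disk.
    builder.add_meta_graph_and_variables(
        sess,
        [tf.saved_model.tag_constants.SERVING],
        signature_def_map={"get_score": signature},
    )
    builder.save()
```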

The steps are as follows:

  1. Create the SavedModelBuilder. All you need to do is provide the directory where you want to save your model. TensorFlow will throw an error if the model path already exists, so add a timestamp or another unique version identifier to the path to avoid this. This also matters for how TensorFlow serves your models: it automatically uses the highest version number, which in this case is the most recent timestamp.
  2. Create a SignatureDef for the model. The signature definition is what the RESTful API will expect for the input and output structure. For this model, it will expect three inputs (x1, x2, keep_prob) and return two outputs (y1, y2). There are three types of SignatureDefs: Classification, Prediction, and Regression. I’m using the prediction signature here, so I use the signature constant PREDICT_METHOD_NAME; you could also use REGRESS_METHOD_NAME or CLASSIFY_METHOD_NAME depending on your model. Use build_tensor_info to wrap your input and output tensors for the signature builder.
  3. Add our graph and variables to the SavedModelBuilder. Because we want to serve the model with TensorFlow Serving, we need to specify the SERVING tag. Later we will call the get_score signature with x1, x2, and keep_prob as inputs to get our predictions y1 and y2.

SavedModelBuilder saves our model in a protobuf (.pb) file, and our variables in their own folder. See the docs for more info on protobuf files. Here’s an example of what my models directory looks like: I ended up building two different models, and each model has two versions.
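
Since the screenshot isn’t reproduced here, a layout along those lines would look roughly like the tree below; the model names and timestamped version folders are placeholders, but the saved_model.pb file and variables folder are what SavedModelBuilder writes out.

```
models/
├── model_a/
│   ├── 1533055217/
│   │   ├── saved_model.pb
│   │   └── variables/
│   └── 1533141592/
│       ├── saved_model.pb
│       └── variables/
└── model_b/
    ├── 1533055342/
    │   ├── saved_model.pb
    │   └── variables/
    └── 1533141608/
        ├── saved_model.pb
        └── variables/
```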

Directory structure for two different TensorFlow models, each with two versions.

Let’s inspect a version of one of the models to see what it looks like.
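
The inspection itself was a screenshot in the original post; with the saved_model_cli tool that ships with TensorFlow, the command and output look roughly like this. The path is a placeholder, the shapes of x1 and x2 (46 features) and the unknown shape of keep_prob match what’s described later, and the remaining details are illustrative.

```bash
$ saved_model_cli show --dir models/my_model/1533055217 --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['get_score']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['keep_prob'] tensor_info:
        dtype: DT_FLOAT
        shape: unknown_rank
        name: keep_prob:0
    inputs['x1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 46)
        name: x1:0
    inputs['x2'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 46)
        name: x2:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['y1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: y1/BiasAdd:0
    outputs['y2'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: y2/BiasAdd:0
  Method name is: tensorflow/serving/predict
```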

Signature definition for one of the models we saved.

Now our model is ready to serve.

Configuring the server

My model is a lightweight Siamese Neural Net, so I decided to put it on a cheap virtual private server from Amazon Web Services. I’m using a server with Ubuntu 16.04.5 LTS. There are a couple of settings we need to change in the Amazon Lightsail dashboard:

  1. Attach a static IP. This will make it easy to send requests to the model in Postman later.
  2. In the Networking tab, edit the firewall rules to open port 8501 (the default port TensorFlow Serving uses for its RESTful API). I also opened port 443 for HTTPS so I could copy my models to the server using Git. Here’s what my settings look like:

Serving with Docker

The TensorFlow docs recommend using Docker to serve models. Here’s an outline of the steps to get TensorFlow Serving working with Docker.

  1. Install Docker on your server
  2. Get the TensorFlow Serving image (pull it from Docker Hub)
  3. Create your own serving image. For our case, it looks like this:
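
My exact gist isn’t reproduced here, but following the TensorFlow Serving Docker docs, building a custom serving image amounts to copying the saved model into a base serving container and committing it. The model and image names below are placeholders:

```bash
# Start a base TensorFlow Serving container in the background.
docker run -d --name serving_base tensorflow/serving

# Copy the model folder in -- note the path does NOT include the version folder.
docker cp models/my_model serving_base:/models/my_model

# Commit the result as a new image that knows which model to load, then clean up.
docker commit --change "ENV MODEL_NAME my_model" serving_base my_model_serving
docker kill serving_base
docker rm serving_base
```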

Notice that the version folder is NOT included in the model path. At first, I kept getting errors that the model could not be found because I was including the version folder in the model path. Leave the version folder out of the path.

  4. Run a container for the model.
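
Roughly, with the image built above (names again placeholders), that’s a single docker run that publishes the REST port:

```bash
# Serve the model, mapping TensorFlow Serving's REST port 8501 to the host.
docker run -d -p 8501:8501 --name my_model_container my_model_serving
```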

Our TensorFlow ModelServer is now running in a container, and can be accessed on port 8501 of the server.

Sending a POST request

Eventually we are going to take user input from a website and feed it into our model. But to start, let’s try sending the model some data and verify that it makes a prediction.

First, we need some data to send, and it needs to be in the correct format. The TensorFlow Serving REST API docs describe the request formats it accepts.

When we inspected our model above, you could see that the inputs x1 and x2 each had 46 features. The keep_prob input had unknown shape, but I can tell you it’s for dropout regularization and is a float in the range (0, 1]. When I test my model, I set it to 1.0 (no dropout). Here’s some fake example data so you can get an idea of what the request body will look like:
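
The gist with my fake data isn’t shown here, so below is a rough sketch of the body in TensorFlow Serving’s row ("instances") format, written as a Python dict so it can be commented. The values are made up, and signature_name is only needed because the signature above wasn’t saved under the default serving_default key:

```python
# Illustrative request body: x1 and x2 each take 46 floats, keep_prob is a single float.
body = {
    "signature_name": "get_score",
    "instances": [
        {
            "x1": [0.1] * 46,   # 46 made-up feature values
            "x2": [0.5] * 46,   # 46 made-up feature values
            "keep_prob": 1.0,   # no dropout at prediction time
        }
    ],
}
```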

Fake data example for the body of a POST request.

Note that when you send the POST request, the URL should be in the following format:

http://IP-ADDRESS:PORT-NUMBER/v1/models/MODEL_NAME:predict
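
Putting the URL and the body together, a quick sanity check with the requests library might look like this; the IP address and model name are placeholders:

```python
import requests

# Hypothetical static IP of the Lightsail server; 8501 is TensorFlow Serving's REST port.
url = "http://203.0.113.10:8501/v1/models/my_model:predict"

body = {
    "signature_name": "get_score",
    "instances": [{"x1": [0.1] * 46, "x2": [0.5] * 46, "keep_prob": 1.0}],
}

response = requests.post(url, json=body)
print(response.json())  # e.g. {"predictions": [{"y1": [...], "y2": [...]}]}
```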

TensorFlow Serving automatically uses the highest version of our model and sends back our outputs y1 and y2. In the next post, we’ll set up a website that uses our RESTful API.