TensorFlow 2.0 — from preprocessing to serving (Part 4)

Tanmay Thakur · Published in DataSeries · May 20, 2020 · 7 min read

Welcome to the fourth and last part of this tutorial on TensorFlow and its Keras API. We'll be discussing everything deep learning: starting from how to preprocess input data, then modelling your neural net to encode your data and produce an output, optimizing training, and finally serving the model as a REST API.

As you can see from the title, we've already had three stories on this topic; in this one we'll run through all of those steps quickly and then use the resulting model to serve requests.

Before you start reading this article and its predecessors, you should know the basics of the following subjects so that you don't feel flustered while reading them:
1. Calculus
2. Linear Algebra
3. Neural Networks
4. NumPy, Pandas

And as you might have inferred from the topic, this is a programming article, so some pre-existing experience with Python will help.

As this article involves serving models, it's best to follow along on Google Colab. Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

The dataset we'll be using here is Fashion-MNIST, a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms: it shares the same image size and the same structure of training and testing splits.

A sample from the dataset:

Sample from the Fashion-MNIST dataset (each class takes three rows)

You might have prior experience with MNIST's handwritten-digits dataset, but we won't use that here, for two very good reasons:
1. Network design for MNIST is too easy; you can get 98% accuracy even with plain Dense networks.
2. It's overused; we want to learn something new, not regurgitate old material.

But first we should get to know what an API is. An application programming interface (API) is a set of routines, protocols, and tools for building software applications. Basically, an API specifies how software components should interact.

We’ll look at a small example:
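The original embedded snippet isn't reproduced here, so here is a minimal, hypothetical sketch of a REST API using Flask; the /square endpoint and its payload are made up purely to show the request/response pattern:

```
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/square", methods=["POST"])
def square():
    # Read a number from the JSON body and return its square.
    number = request.get_json()["number"]
    return jsonify({"result": number ** 2})

if __name__ == "__main__":
    app.run(port=5000)
```

A client POSTs {"number": 4} to /square and gets back {"result": 16}: a fixed contract between the caller and the service.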

Our TensorFlow model server is going to work the same way, except that it returns predictions from an ML model.

Now let's load and normalize the data, just like in the first part of the tutorial:
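A minimal sketch of that step, assuming the Keras-bundled copy of Fashion-MNIST (the original gist may differ slightly):

```
import tensorflow as tf

# Load Fashion-MNIST from Keras and scale pixel values to the [0, 1] range.
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
```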

We'll then expand dimensions, reshape the images and get the number of unique classes. Then we go ahead and build our convolutional model, just like in our second tutorial:
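Roughly, and with illustrative filter counts rather than the exact architecture from Part 2, that looks like this:

```
import numpy as np

# Add a channel dimension so each image becomes 28x28x1.
train_images = train_images.reshape(-1, 28, 28, 1)
test_images = test_images.reshape(-1, 28, 28, 1)
num_classes = len(np.unique(train_labels))  # 10 for Fashion-MNIST

# A small convolutional network; the layer sizes here are illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
```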

Now we go ahead and train:
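Something along these lines, assuming integer labels and a handful of epochs (the exact optimizer and epoch count in the original may differ):

```
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_images, train_labels,
          epochs=5,
          validation_data=(test_images, test_labels))
```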

Now that we've trained the model, we need to save it in a temporary directory so that it is ready to be served; if there's already a saved model there, we remove it from the file system. A SavedModel is a directory containing serialized signatures and the state needed to run them, including variable values and vocabularies.
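A sketch of the export step; the numbered version subdirectory is the layout TensorFlow Serving expects, but the temporary-directory handling here is an assumption:

```
import os
import shutil
import tempfile

MODEL_DIR = tempfile.mkdtemp()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))

# If a model was already exported to this path, clear it out first.
if os.path.isdir(export_path):
    shutil.rmtree(export_path)

# Writes the SavedModel directory: saved_model.pb plus variables/ and assets/.
tf.keras.models.save_model(model, export_path)
```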

Note the use of version numbers; that will matter later.

The saved_model.pb file stores the actual TensorFlow program, or model, and a set of named signatures, each identifying a function that accepts tensor inputs and produces tensor outputs.

SavedModels may contain multiple variants of the model (multiple v1.MetaGraphDefs, identified with the --tag_set flag to saved_model_cli).

The saved_model_cli shows much more detail about the model's signatures in its output; it's well worth looking at the full output of this command.
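From a Colab cell, inspecting the export might look like this (the {export_path} substitution assumes the variable from the save step above):

```
!saved_model_cli show --dir {export_path} --all
```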

Now go ahead and install the tensorflow-model-server package from your Colab notebook using the "!" (bang) prefix. (Instructions to set up the package locally are also included in the full code.)
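On a Debian-based system such as Colab, the installation roughly follows TensorFlow Serving's apt instructions; treat the commands below as a sketch and check the current docs if they fail:

```
# Add the TensorFlow Serving apt repository and its signing key, then install the server.
!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list
!curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
!apt update
!apt-get install tensorflow-model-server
```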

Now we’ll go ahead and start serving the model as a service.

This is where we start running TensorFlow Serving and load our model. After it loads we can start making inference requests using REST. There are some important parameters:

  • rest_api_port: The port that we'll use for REST requests.
  • model_name: We'll use this in the URL of REST requests. It can be anything.
  • model_base_path: This is the path to the directory where we've saved our model.

The nohup command is required so the service keeps running uninterrupted in the background. Let's check the logs to see that everything went off without a hitch:
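A sketch of how this might look in Colab; the model name fashion_model is an assumption, and each snippet below goes in its own cell:

```
# Expose the export directory to the shell so the server can find it.
import os
os.environ["MODEL_DIR"] = MODEL_DIR
```

```
%%bash --bg
# Run TensorFlow Serving in the background, writing its output to server.log.
nohup tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=fashion_model \
  --model_base_path="${MODEL_DIR}" >server.log 2>&1
```

```
# Peek at the log to confirm the model loaded and the REST endpoint is listening.
!tail server.log
```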

If no errors are logged, it means that our model has started serving and we can send requests to it to get predictions.

Now, let’s look at a random example from our dataset
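For instance (class_names here is the standard Fashion-MNIST label order):

```
import random
import matplotlib.pyplot as plt

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Pick a random test image and show it with its ground-truth label.
idx = random.randint(0, len(test_images) - 1)
plt.imshow(test_images[idx].reshape(28, 28), cmap="gray")
plt.title(class_names[test_labels[idx]])
plt.show()
```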

Now we'll create the request to be passed in JSON format. It should contain the inputs, formatted exactly like the data we passed for training, because our model is trained to recognize only preprocessed data.

As you can clearly see, we're passing in three instances to get inferences for.
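A sketch of the request body, using TensorFlow Serving's "instances" format and the default serving signature:

```
import json

# Package the first three (already preprocessed) test images as JSON.
data = json.dumps({
    "signature_name": "serving_default",
    "instances": test_images[0:3].tolist(),
})
```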

Now we package this in a request and send it to our model for predictions; we get the predictions back in the same JSON format.
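Roughly (the model name in the URL matches the --model_name the server was started with, fashion_model in this sketch):

```
import requests

headers = {"content-type": "application/json"}
json_response = requests.post(
    "http://localhost:8501/v1/models/fashion_model:predict",
    data=data, headers=headers)
predictions = json.loads(json_response.text)["predictions"]
```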

The predictions come back in softmax form: a probability for every class each instance could belong to. So we find the class index for each instance by taking the argmax of its prediction and then map those indices back to string labels.

Now let's see how they match up with the actual labels.
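Something like this, reusing the class_names list from earlier:

```
for i, pred in enumerate(predictions):
    predicted = class_names[int(np.argmax(pred))]  # most probable class
    actual = class_names[test_labels[i]]           # ground-truth label
    print(f"Predicted: {predicted}  |  Actual: {actual}")
```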

Pretty good results!

Now we look into versioning our models. Let's see how to make and select models by version:
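The same request as before, but pinned explicitly to version 1 of the model (a sketch; note that the leading v1 in the path is the REST API version, while versions/1 selects the model version):

```
json_response = requests.post(
    "http://localhost:8501/v1/models/fashion_model/versions/1:predict",
    data=data, headers=headers)
predictions = json.loads(json_response.text)["predictions"]
```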

Pay extra attention to the last couple of parts of the URL we're making the POST request to, specifically versions/1:predict. By default, if there's only one version of the model, that's version 1 and it's the only one being served.

We'll now go ahead and make another version of our model, then compile and fit it on our dataset again.
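For illustration, version 2 here is simply the same architecture retrained; in practice it could use different layers or hyperparameters entirely:

```
# Clone the architecture (weights are re-initialised), then compile and retrain.
model_v2 = tf.keras.models.clone_model(model)
model_v2.compile(optimizer="adam",
                 loss="sparse_categorical_crossentropy",
                 metrics=["accuracy"])
model_v2.fit(train_images, train_labels, epochs=5)
```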

Now save the second version of our model appropriately
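The only thing that changes is the version subdirectory:

```
# Export under <MODEL_DIR>/2; the new subdirectory name is the new version number.
export_path_v2 = os.path.join(MODEL_DIR, "2")
tf.keras.models.save_model(model_v2, export_path_v2)
```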

The coolest thing is that you don't need to restart the server for it to recognize the second version; it has already been saved as an asset and is ready to be served. We again make a POST request, but this time to the second version of the model.
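Only the URL changes; the running server picks up the new version automatically:

```
json_response = requests.post(
    "http://localhost:8501/v1/models/fashion_model/versions/2:predict",
    data=data, headers=headers)
predictions = json.loads(json_response.text)["predictions"]
```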

See? The URL has now changed to 2:predict. Versioning is easy with TensorFlow Serving.

But a valid question to ask at this point is: what if we use a version number that does not exist? Well, let's test it out, shall we?
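For example, asking for a version 3 that was never exported (a hypothetical request just to trigger the failure):

```
bad_response = requests.post(
    "http://localhost:8501/v1/models/fashion_model/versions/3:predict",
    data=data, headers=headers)
print(bad_response.status_code)
print(bad_response.text)  # the body describes the missing servable version
```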

We encounter an error, which we expected.

And with that we are done, from preprocessing to serving.

To look at the code in whole please check this out:

https://github.com/lordtt13/Medium-Articles

Stay tuned for my next series.

Till then, Cheerio!
