A Practical Introduction to AI Deployment with TensorFlow Serving, Lite and JavaScript

Kriengkrai Jirawongaram
Super AI Engineer
Published Mar 27, 2021 · 5 min read
https://www.tensorflow.org/

TensorFlow is one of the most popular open-source machine learning platforms, and it offers the most complete set of deployment options available. The target platforms can be divided into three groups:

  1. TensorFlow Serving — for production environments
  2. TensorFlow Lite — for Embedded, IoT and mobile devices
  3. TensorFlow.js (JavaScript) — for the Web

This article will introduce you to these deployment options in practice.

To demonstrate, an image classification model trained on the Fashion-MNIST dataset will be used as an example. The dataset consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from one of 10 classes.

Ref: Fashion-MNIST — https://www.tensorflow.org/datasets/catalog/fashion_mnist

The model chosen is based on the Xception architecture, with a modified top layer for 10-class prediction.
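A minimal sketch of such a model setup, assuming the standard Keras API (the article does not show the training code, so the resizing strategy and hyperparameters below are assumptions). Xception expects RGB inputs of at least 71x71 pixels, so the 28x28 grayscale Fashion-MNIST images must be upsampled and repeated across three channels before entering the backbone:

```python
import tensorflow as tf

# 28x28x1 grayscale input, adapted for the Xception backbone.
inputs = tf.keras.Input(shape=(28, 28, 1))
x = tf.keras.layers.UpSampling2D(size=3)(inputs)      # 28x28 -> 84x84
x = tf.keras.layers.Concatenate()([x, x, x])          # 1 channel -> 3 channels
backbone = tf.keras.applications.Xception(
    include_top=False, weights=None, input_shape=(84, 84, 3))
x = backbone(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)  # 10 classes

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# After model.fit(...), save in the versioned layout TensorFlow Serving needs:
# model.save("fashion_mnist/1/")
```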

Epoch 1/10 1875/1875 [==============================] - 93s 48ms/step - loss: 0.6178 - accuracy: 0.7816 - val_loss: 0.4304 - val_accuracy: 0.8553
Epoch 2/10 1875/1875 [==============================] - 89s 48ms/step - loss: 0.3506 - accuracy: 0.8757 - val_loss: 0.3801 - val_accuracy: 0.8715
Epoch 3/10 1875/1875 [==============================] - 89s 47ms/step - loss: 0.2998 - accuracy: 0.8934 - val_loss: 0.4448 - val_accuracy: 0.8321
Epoch 4/10 1875/1875 [==============================] - 89s 48ms/step - loss: 0.2577 - accuracy: 0.9062 - val_loss: 0.3373 - val_accuracy: 0.8890
Epoch 5/10 1875/1875 [==============================] - 89s 47ms/step - loss: 0.2369 - accuracy: 0.9161 - val_loss: 0.3002 - val_accuracy: 0.8921
Epoch 6/10 1875/1875 [==============================] - 88s 47ms/step - loss: 0.2128 - accuracy: 0.9247 - val_loss: 0.2912 - val_accuracy: 0.8961
Epoch 7/10 1875/1875 [==============================] - 89s 48ms/step - loss: 0.1839 - accuracy: 0.9336 - val_loss: 0.2781 - val_accuracy: 0.9064
Epoch 8/10 1875/1875 [==============================] - 92s 49ms/step - loss: 0.1641 - accuracy: 0.9406 - val_loss: 0.2751 - val_accuracy: 0.9084
Epoch 9/10 1875/1875 [==============================] - 91s 49ms/step - loss: 0.1552 - accuracy: 0.9434 - val_loss: 0.2556 - val_accuracy: 0.9124
Epoch 10/10 1875/1875 [==============================] - 91s 49ms/step - loss: 0.1358 - accuracy: 0.9509 - val_loss: 0.2702 - val_accuracy: 0.9134

The trained model is then saved to the ‘fashion_mnist/1/’ path, where ‘1’ is the model version required by TensorFlow Serving. It is then packed into a single ‘fashion_mnist.txz’ archive for portability.

tar Jcvf fashion_mnist.txz fashion_mnist

Next, this trained model is deployed into all platforms supported by TensorFlow.

TensorFlow Serving

https://www.tensorflow.org/tfx/

The most common way is to deploy a full TensorFlow model through TensorFlow Serving, which is part of TensorFlow Extended (TFX). The application is packaged as a Docker image, which makes it very easy to use.

To deploy the model, the fashion_mnist archive is first extracted so that the model sits in /opt/data/fashion_mnist/<version>:

mkdir -p /opt/data
tar Jxvf fashion_mnist.txz -C /opt/data/
-----------------------------------------------------------------
fashion_mnist
└── 1
    ├── assets
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index

By default, TensorFlow Serving serves the latest version of the model. However, multiple model versions can also be served by setting the configuration file (read more at: https://www.tensorflow.org/tfx/serving/serving_config).
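The configuration file uses TensorFlow Serving's protobuf text format. A minimal sketch for serving specific versions of this model (the model name and path are taken from this example; version ‘2’ is hypothetical):

```
model_config_list {
  config {
    name: 'fashion_mnist'
    base_path: '/models/fashion_mnist'
    model_platform: 'tensorflow'
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
  }
}
```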

After that, the docker is run to serve the model:

docker pull tensorflow/serving
docker run -t --rm -p 8501:8501 \
    -v /opt/data/fashion_mnist/:/models/fashion_mnist \
    -e MODEL_NAME=fashion_mnist \
    tensorflow/serving

That is all! The model is then ready to be used with a RESTful API (read more at: https://www.tensorflow.org/tfx/serving/api_rest).

To test the model, a request is submitted to the TensorFlow Serving service on port 8501, as defined above.
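Such a request can be sketched in Python. The endpoint path follows the TensorFlow Serving REST API linked above; the server address and the 28x28x1 input shape are assumptions based on this example:

```python
import json
import urllib.request

import numpy as np

SERVER = "http://localhost:8501"   # port published by `docker run` above
MODEL = "fashion_mnist"

def make_payload(images):
    """Build the JSON body expected by TensorFlow Serving's predict API."""
    return json.dumps({"instances": [img.tolist() for img in images]})

def predict(images):
    """POST a batch of images and return the predicted class probabilities."""
    req = urllib.request.Request(
        f"{SERVER}/v1/models/{MODEL}:predict",
        data=make_payload(images).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["predictions"]

# With the container running, for example:
# probs = predict([np.zeros((28, 28, 1), dtype=np.float32)])
# print(np.argmax(probs[0]))
```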

With all these available tools, TensorFlow Serving is a very complete and scalable solution. It is undoubtedly the best option for deploying a full TensorFlow model in a production environment.

TensorFlow Lite

https://www.tensorflow.org/lite/

With the advancement of embedded, IoT and mobile platforms, TensorFlow models can be deployed and run at the edge with limited resources. On low-power hardware, TensorFlow Lite makes deployment possible by trading some accuracy for speed and a smaller footprint.

It is very easy to convert a full TensorFlow model into a TensorFlow Lite model by using the provided tools.

https://www.tensorflow.org/lite/convert
tflite_convert \
--saved_model_dir=fashion_mnist/1 \
--output_file=fashion_mnist.tflite

Our full Fashion-MNIST TensorFlow model is 244 MB, while the Lite model is just 80 MB.
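Running the converted model uses the lightweight TFLite interpreter instead of the full TensorFlow runtime. A minimal sketch, where the .tflite file name comes from the conversion step above and the 28x28x1 input shape is an assumption based on this example:

```python
import numpy as np
import tensorflow as tf

def run_tflite(model_path, image_batch):
    """Run one inference with the TFLite interpreter and return probabilities."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], image_batch.astype(inp["dtype"]))
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])   # shape (batch, 10)

# Example (requires fashion_mnist.tflite from the step above):
# probs = run_tflite("fashion_mnist.tflite",
#                    np.zeros((1, 28, 28, 1), dtype=np.float32))
# print(np.argmax(probs[0]))
```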

With TensorFlow Lite, it is possible to deploy the model to Heroku or other resource-limited service providers.

Since I have a good Thai MNIST (number) prediction model trained for an assignment in Super AI Engineer Level 1, it was very easy to convert it into a TensorFlow Lite model, package it as a Docker image and deploy it to a free, resource-limited Heroku service. (Try it at https://warm-brushlands-75891.herokuapp.com/; it is slow to start but still works, so please wait and try reloading the page.)

TensorFlow.js (JavaScript)

https://www.tensorflow.org/js

For the web, yet another model format is needed. As with TensorFlow Lite, an easy-to-use conversion tool is provided.

pip install tensorflowjs
tensorflowjs_converter \
    --input_format=tf_saved_model \
    --output_format=tfjs_graph_model \
    --signature_name=serving_default \
    --saved_model_tags=serve \
    fashion_mnist/1/ \
    fashion_mnist_web

Our original Fashion-MNIST TensorFlow model is 244 MB, while the web version is just 80 MB (as small as the Lite version).

Note that TensorFlow.js does not support Keras's experimental layers, so make sure not to train the model with unsupported layers.

A simple HTML file with JavaScript for TensorFlow.js is created.

Now the model is accessible on the web and runs entirely inside the web browser on the user's machine. That is also why the inference time is quite high.

However, TensorFlow.js is very powerful. It is more than just a deployment platform. There are various options for running, fine-tuning and developing the model independently.

With these wide deployment options, TensorFlow is truly an extensible platform: it can run almost anywhere and bring AI-powered applications to the devices around you.

by Kriengkrai Jirawongaram <kk@jira.org> ID: 22p25c0762@EXP
