Deploying a TensorFlow Model to Kubernetes

A versatile approach to using AI in a microservice architecture

Maxime Moreillon · Published in Geek Culture · 5 min read · May 1, 2021


Let’s imagine that you’ve just finished training your new TensorFlow model and want to start using it in your application(s). One obvious way to do so is to simply import it into the source code of every application that uses it. However, it is often more versatile to keep the model in one place, as a standalone service, and have applications exchange data with it through HTTP calls. This article goes through the steps of building such a system and deploying the result to Kubernetes for maximum availability.

The TensorFlow model

First, we need a model to work with. Here, we’ll use the MNIST classifier of the TensorFlow getting started guide:

# Importing TensorFlow
import tensorflow as tf
# Loading the data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Data preprocessing (here, normalization)
x_train, x_test = x_train / 255.0, x_test / 255.0
# Building the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
# Loss function declaration
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Model compilation
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
# Training
model.fit(x_train, y_train, epochs=5)
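
Optionally, the trained model can be sanity-checked on the test set before exporting it; the evaluate call below follows the same getting started guide:

# Evaluate accuracy on the held-out test set
model.evaluate(x_test, y_test, verbose=2)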

Once trained, the model can be used to make predictions. Here is an example where the first two test images are fed to the model:

model(x_test[:2])
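
Since the final Dense layer has no activation and the loss was declared with from_logits=True, this call returns raw logits. A minimal sketch for turning them into class probabilities and predicted digits:

# Convert the logits to probabilities, then pick the most likely digit for each image
probabilities = tf.nn.softmax(model(x_test[:2]))
predicted_digits = tf.argmax(probabilities, axis=1)
print(predicted_digits)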

In some cases, one might want to completely separate the development of the AI model from that of the applications using it. A typical example would be a microservice architecture.

For such cases, the model can be exposed through a REST API, which could be built with modules like Flask. However, TensorFlow provides a ready-made option for this exact purpose: TensorFlow Serving. TensorFlow Serving is distributed as a Docker image that contains all the logic necessary to expose an exported TensorFlow model to HTTP requests.

As a first step towards using TensorFlow Serving, we export the model using the save method of Keras:

model.save('./mymodel/1/')

The exported model now resides in the folder ./mymodel/1/. Note that here mymodel is the name of the model and 1 is its version number.
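
If needed, the export can be verified by listing the folder contents; a SavedModel directory typically contains a saved_model.pb file along with variables/ and assets/ subfolders:

import os
# The exported SavedModel should contain saved_model.pb plus variables/ (and assets/)
print(os.listdir('./mymodel/1/'))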

The TensorFlow Serving Docker container

The model has now been exported and can be copied into a TensorFlow Serving container. To do so, we start by pulling the base TensorFlow Serving image from Docker Hub and running it locally:

docker run -d --name serving_base tensorflow/serving

With the container running, the exported model can be copied over:

docker cp ./mymodel serving_base:/models/mymodel

The container now contains our model and can be saved as a new image. This can be done by using the docker commit command:

docker commit --change "ENV MODEL_NAME mymodel" serving_base my-registry/mymodel-serving

This command also sets the MODEL_NAME environment variable to our model name. Note here that my-registry is the URL of the Docker registry to push the image to.

Once done, we can get rid of the original serving_base container:

docker kill serving_base
docker rm serving_base

At this point, it is a good idea to check that the new image actually works by running it:

docker run -d -p 8501:8501 my-registry/mymodel-serving

Note that 8501 is the port TensorFlow Serving uses for its REST API.

A GET request to http://<docker host IP>:8501/v1/models/mymodel should return the following JSON:

{
  "model_version_status": [
    {
      "version": "1",
      "state": "AVAILABLE",
      "status": {
        "error_code": "OK",
        "error_message": ""
      }
    }
  ]
}
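
For example, this check can be scripted with the requests module (replace <docker host IP> with the address of your Docker host, or localhost when running locally):

import requests
# Query the model status endpoint of the locally running container
response = requests.get('http://<docker host IP>:8501/v1/models/mymodel')
print(response.json())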

If everything is successful so far, the container can be pushed to the registry, which will make it available for Kubernetes to pull:

docker push my-registry/mymodel-serving

Deploying the container to Kubernetes

Now that the container has been pushed to a registry, it can be deployed to our Kubernetes cluster. This is achieved by creating two resources in the cluster: a Deployment and a Service. The Deployment runs the application itself, while the Service allows users to reach the Deployment from outside the cluster. Here, we will use a NodePort service so that our TensorFlow Serving container can be accessed from outside the cluster through a dedicated port on every node; we will use port 30111.

Creating those resources is done simply by applying the content of a YAML manifest file with the kubectl command. In our case, here is the content of our kubernetes_manifest.yml file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mymodel-serving
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mymodel-serving
  template:
    metadata:
      labels:
        app: mymodel-serving
    spec:
      containers:
      - name: mymodel-serving
        image: my-registry/mymodel-serving
        ports:
        - containerPort: 8501
---
apiVersion: v1
kind: Service
metadata:
  name: mymodel-serving
spec:
  ports:
  - port: 8501
    nodePort: 30111
  selector:
    app: mymodel-serving
  type: NodePort

The resources can then be created by executing:

kubectl apply -f kubernetes_manifest.yml

The container should now be deployed in the Kubernetes cluster and reachable on port 30111 of any of its nodes.
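
As a quick check, the same model status endpoint should now be reachable through the NodePort (replace <cluster IP> with the address of one of your Kubernetes nodes):

import requests
# The NodePort service forwards port 30111 to the container's REST API port 8501
response = requests.get('http://<cluster IP>:30111/v1/models/mymodel')
print(response.json())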

Using the TensorFlow Serving API

The AI model deployed in Kubernetes can now be used for prediction. This is done by sending a POST request to the prediction API of the TensorFlow Serving container. The body of the request is a JSON document containing the input data to be fed to the model. The model then replies with its predictions, also in JSON format. Here is an example of how this can be implemented in Python, using the requests module:

# Import the necessary modules
import requests
import numpy as np
import json
import tensorflow as tf
# Loading data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Data preprocessing (here, normalization)
x_train, x_test = x_train / 255.0, x_test / 255.0
# Format the input images as a JSON payload (TensorFlow Serving expects an 'instances' key)
payload = json.dumps( { 'instances': x_test[:2].tolist() } )
# URL of the TensorFlow Serving container's API
url = 'http://<cluster IP>:30111/v1/models/mymodel:predict'
# Send the request
response = requests.post(url, data=payload)
# Parse the response
prediction = response.json()["predictions"]
# Print the result
print(prediction)
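
The values returned under predictions are the raw logits for each input image. A minimal sketch for turning them into predicted digits:

# The highest logit in each row corresponds to the predicted digit
predicted_digits = np.argmax(prediction, axis=1)
print(predicted_digits)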

Conclusion

Thanks to TensorFlow Serving, AI models can easily be containerized, turning them into standalone applications that can be interacted with through HTTP calls. This provides increased separation of concerns and modularity, especially compared to embedding the model directly in the source code of the applications relying on it.
