Automated Canary Release of TensorFlow Models on Kubernetes

Srinivasan Parthasarathy
Published in iter8-tools · 7 min read · Oct 5, 2020

Learn how to perform safe, automated, and statistically robust canary release of TensorFlow models on Kubernetes.


Table of contents

  1. When should you perform canary release of ML models?
  2. Safe, automated, and statistically robust canary release of TensorFlow models on Kubernetes.
  3. Enhance the canary release experiment.
  4. References.

When should you perform canary release of ML models?

Canary release is a technique for safely introducing a new version of your software into a production environment. You begin by exposing the new version to a small portion of user traffic, ensure that it satisfies the criteria you specify, progressively expose it to a larger fraction of the traffic, and finally replace the current version once you are confident that the new version behaves well.
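Under the hood, canary releases on Kubernetes are typically implemented with weighted traffic routing. The following is a purely illustrative sketch of an Istio VirtualService that sends 5% of traffic to a candidate; the service and subset names are hypothetical, and during an iter8 experiment (as we will see below) such rules are created and updated for you automatically.

# Illustrative sketch only: iter8 generates and adjusts routing rules like
# this automatically; the host and subset names below are hypothetical.
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-model-routing
spec:
  hosts:
  - my-model
  http:
  - route:
    - destination:
        host: my-model
        subset: baseline
      weight: 95               # most traffic stays on the current version
    - destination:
        host: my-model
        subset: candidate
      weight: 5                # a small fraction is exposed to the new version
EOF

(A companion DestinationRule would define the baseline and candidate subsets.)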

In the context of ML models in production, it is good practice to perform a canary rollout whenever you create a new model (candidate) to replace the current model (baseline) deployed in production. Perhaps you observed that the accuracy of your baseline model has degraded over time due to concept drift and have created a more accurate model; you want to replace the baseline with the candidate after ensuring that key performance indicators (KPIs), such as prediction serving latency, for the candidate are within acceptable limits in production. Automated canary release can help you accomplish this goal in a safe, robust, confident, and repeatable manner.

In this article, we will perform an automated canary release of a TensorFlow model on Kubernetes. The scenario we will exercise is as follows.

Figure 1: All traffic flows to the baseline at the start of the canary release experiment. #1 Assuming the candidate satisfies the criteria you specify, increasingly more traffic will flow to the candidate during the experiment, and the candidate will replace the baseline after the experiment. #2 If the candidate fails to satisfy the criteria you specify, it will receive only a small fraction of traffic during the experiment, and the baseline will be retained after the experiment. Our scenario will exercise case #1.

Safe, automated, and statistically robust canary release of TensorFlow models on Kubernetes.

The scenario we will exercise involves five steps. The first three steps mimic a production Kubernetes cluster running the baseline model. The last two steps show how to create a canary release experiment and deploy the candidate.

  • Step 1: Set up Kubernetes with Istio, Prometheus, and iter8.
  • Step 2: Deploy baseline model and externalize it.
  • Step 3: Send traffic (prediction requests).
  • Step 4: Create canary release experiment.
  • Step 5: Deploy candidate model.

Step 1: Set up Kubernetes with Istio, Prometheus, and iter8.

Kubernetes: Make sure you have access to a Kubernetes cluster; the kubectl command must be available and should work on your terminal.

kubectl version

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-20T12:43:34Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Istio: Istio service mesh makes it possible to observe metrics for microservices running on a Kubernetes cluster and to control traffic between various versions of a microservice. Download and install Istio on your Kubernetes cluster as follows.

curl -L https://istio.io/downloadIstio | sh -
cd istio-1.7.2
export PATH=$PWD/bin:$PATH
istioctl install

Note: At the time of writing, the current version of Istio is 1.7.2. If the curl command above downloads a different version (for example, 1.8.0), change the cd command accordingly.
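Before proceeding, you can confirm that the Istio control plane and ingress gateway came up cleanly.

# Both istiod and istio-ingressgateway pods should report Running
kubectl -n istio-system get pods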

Prometheus: Prometheus is a time series database that can be used for tracking a variety of metrics associated with microservices in Kubernetes. Install Istio’s Prometheus addon: in Istio versions 1.7+, you can do so using the following command.

kubectl apply -f https://raw.githubusercontent.com/istio/istio/master/samples/addons/prometheus.yaml
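You can verify that the addon is up with a command like the following.

# The Prometheus addon is deployed into the istio-system namespace
kubectl -n istio-system get deploy,svc prometheus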

Iter8: Iter8 is a continuous experimentation platform for Kubernetes and enables a rich variety of automated and human-in-the-loop experiments for microservices and ML models deployed on Kubernetes. Install iter8 using the following command.

curl -L -s https://raw.githubusercontent.com/iter8-tools/iter8/v1.0.0-rc3/install/install.sh | /bin/bash -
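To sanity-check the installation, list iter8's pods; note that the namespace name below is an assumption (recent iter8 installers have used iter8), so adjust it if your installer reports a different one.

# Namespace is an assumption -- check your installer's output if this fails
kubectl -n iter8 get pods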

Step 2: Deploy baseline model and externalize it.

The baseline and candidate models we will use in our scenario are both deep learning CNN models for image classification that are trained using the Fashion MNIST clothing dataset. We will focus on ML serving in this article; however, if you are interested in the ML training steps and the related artifacts used to create the baseline and candidate model deployments, you can examine this GitHub repo, and this Jupyter notebook in particular; the latter is based on this TFServing tutorial.

First, enable Istio sidecar injection for the default namespace using the following command:

kubectl label namespace default istio-injection=enabled
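You can confirm that the label took effect as follows; pods created in the default namespace from this point on will have an Istio sidecar injected alongside the application container.

# The default namespace should now carry istio-injection=enabled
kubectl get namespace default --show-labels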

Next, deploy the baseline model using the following command.

kubectl apply -f https://raw.githubusercontent.com/iter8-tools/mlops/master/modelv1.yaml
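The manifest above lives in the iter8-tools/mlops repo. Purely for orientation, a minimal TensorFlow Serving deployment tends to look like the sketch below; the names, labels, and image tag here are assumptions, not the actual contents of modelv1.yaml.

# A minimal TFServing deployment sketch -- NOT the actual modelv1.yaml.
# Assumes the SavedModel is baked into the image (or mounted) under
# /models/fashionmnist, TFServing's default model base path.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fashionmnist-v1
  labels:
    app: fashionmnist
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fashionmnist
      version: v1
  template:
    metadata:
      labels:
        app: fashionmnist
        version: v1
    spec:
      containers:
      - name: tfserving
        image: tensorflow/serving:2.3.0   # hypothetical tag
        env:
        - name: MODEL_NAME                # TFServing serves /v1/models/fashionmnist
          value: fashionmnist
        ports:
        - containerPort: 8501             # REST API port
EOF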

Finally, externalize the model (i.e., make it possible for the model to receive prediction requests from outside the cluster) using the following command:

kubectl apply -f https://raw.githubusercontent.com/iter8-tools/mlops/master/externalize.yaml
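Externalization on Istio conventionally pairs a Gateway with a VirtualService that binds an external host to the in-cluster service. The sketch below shows that pattern; the host name is an assumption (the experiment output in Step 5 suggests fashionmnist.com), and the actual externalize.yaml may differ in its details.

# Sketch of the Istio externalization pattern -- NOT the actual externalize.yaml
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: fashionmnist-gateway
spec:
  selector:
    istio: ingressgateway          # Istio's default ingress gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "fashionmnist.com"           # assumed external host
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: fashionmnist-external
spec:
  hosts:
  - "fashionmnist.com"
  gateways:
  - fashionmnist-gateway
  http:
  - route:
    - destination:
        host: fashionmnist         # in-cluster model service (assumed name)
        port:
          number: 8501             # TFServing REST port (assumed)
EOF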

At this point, the state of your application resembles the top portion of Figure 1 above titled “Start of the experiment”.

Step 3: Send traffic (prediction requests).

We will use this Jupyter notebook to send prediction requests to our models. Clone the repo, set up a Python 3 virtual environment, install dependencies, and fire up the notebook using the following commands.

git clone git://github.com/iter8-tools/mlops.git
cd mlops
python3 -m venv .venv
source .venv/bin/activate
pip install tensorflow jupyterlab notebook matplotlib requests
jupyter notebook tfserving.ipynb

Execute the first two cells in the notebook, which Import Python modules and Load Fashion MNIST data. Ensure that the baseline model can accept prediction requests by executing the two cells under Send an image to the deployed model. This step requires knowing where to send the prediction requests, which in turn requires determining and setting the gateway_url variable in the notebook. If your Kubernetes cluster is set up on Minikube, this is already handled for you in the notebook. For other Kubernetes environments, refer to Istio's documentation on how to determine ingress_host and ingress_port, and set gateway_url appropriately within the notebook. Correct execution of these cells will yield an output similar to the following:

Figure 2: Classifying an image using the default model deployed on Kubernetes.
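As a reference point, on clusters whose ingress gateway is backed by an external load balancer, Istio's standard recipe for determining these values looks as follows; adapt it for NodePort or Minikube environments.

# Istio's standard recipe for the load-balancer case; adapt for NodePort/Minikube
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
export GATEWAY_URL=$INGRESS_HOST:$INGRESS_PORT
echo $GATEWAY_URL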

Execute the next cell in the notebook, titled Send Serialized Images for Classification to the Model Service. This will send a continuous stream of prediction serving requests to the model (again, you have to ensure that gateway_url is set correctly in this cell). This traffic is used to assess the behavior of the baseline and candidate versions.
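If you would rather probe the model from a terminal instead of the notebook, TensorFlow Serving's REST API accepts requests of the following shape. The Host header, model name, and input shape are assumptions based on this article's setup, and a real request would carry an actual 28x28 grayscale image rather than the all-zeros placeholder used here.

# Send one placeholder prediction request to the TFServing REST API.
# Host header, model name, and input shape are assumptions.
curl -s -H "Host: fashionmnist.com" -H "Content-Type: application/json" \
  -d "{\"instances\": [$(python3 -c 'print([[0.0]*28]*28)')]}" \
  http://$GATEWAY_URL/v1/models/fashionmnist:predict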

Step 4: Create canary release experiment.

Create the canary release experiment as follows.

kubectl apply -f https://raw.githubusercontent.com/iter8-tools/mlops/master/fashionmnist-v2-rollout.yaml

This creates an iter8 experiment object, which you can watch as follows.

kubectl get experiment --watch

NAME                      TYPE     HOSTS              PHASE   WINNER FOUND   CURRENT BEST   STATUS
fashionmnist-v2-rollout   Canary   ["fashionmnist"]   Pause                                 TargetsError: Missing Candidate

Notice that the experiment is paused, since there is no candidate available to experiment with yet.

Step 5: Deploy candidate model.

Deploy the candidate model using the following command.

kubectl apply -f https://raw.githubusercontent.com/iter8-tools/mlops/master/modelv2.yaml

If you watch the experiment object, you will see the experiment progressing towards completion as follows.

kubectl get experiment --watch

NAME                      TYPE     HOSTS                                 PHASE         WINNER FOUND   CURRENT BEST      STATUS
fashionmnist-v2-rollout   Canary   ["fashionmnist","fashionmnist.com"]   Progressing   false          fashionmnist-v2   IterationUpdate: Iteration 1/8 completed
fashionmnist-v2-rollout   Canary   ["fashionmnist","fashionmnist.com"]   Progressing   true           fashionmnist-v1   IterationUpdate: Iteration 2/8 completed
fashionmnist-v2-rollout   Canary   ["fashionmnist","fashionmnist.com"]   Progressing   true           fashionmnist-v1   IterationUpdate: Iteration 3/8 completed
fashionmnist-v2-rollout   Canary   ["fashionmnist","fashionmnist.com"]   Progressing   true           fashionmnist-v2   IterationUpdate: Iteration 4/8 completed
fashionmnist-v2-rollout   Canary   ["fashionmnist","fashionmnist.com"]   Progressing   true           fashionmnist-v2   IterationUpdate: Iteration 5/8 completed
fashionmnist-v2-rollout   Canary   ["fashionmnist","fashionmnist.com"]   Progressing   true           fashionmnist-v2   IterationUpdate: Iteration 6/8 completed
fashionmnist-v2-rollout   Canary   ["fashionmnist","fashionmnist.com"]   Progressing   true           fashionmnist-v2   IterationUpdate: Iteration 7/8 completed
fashionmnist-v2-rollout   Canary   ["fashionmnist","fashionmnist.com"]   Completed     true           fashionmnist-v2   ExperimentCompleted: Traffic To Winner

During the iter8 experiment, the state of your application progressed from start to middle to end as shown in Figure 1. During each iteration of the experiment, iter8 assessed how the baseline and candidate models were behaving in terms of the criteria you specified (the specific experiment we used in this article placed an upper limit on mean latency). Based on metrics collected in Prometheus, iter8 determined with a high degree of statistical confidence that the candidate satisfied the criteria and progressively shifted the traffic from baseline to candidate, and finally rolled out the candidate and cleaned up the baseline. For an in-depth look at how iter8 operates under the hood, read this blog article and this description of iter8’s online Bayesian estimation and multi-armed bandit algorithms.

Enhance the canary release experiment.

You can significantly enhance the canary release experiment described in this article using iter8’s continuous experimentation capabilities. Below are a few examples.

  1. Use only a portion of the traffic for experimentation, for instance, only requests emanating from a particular geography; the candidate will not receive any traffic from other geographies during the experiment. Once the experiment succeeds and the candidate is rolled out, it will receive all traffic. Iter8 enables you to use a match clause in the experiment for such scenarios, where matching is based on HTTP header fields in requests (see the sketch after this list).
  2. Compare two versions based on a reward metric (for example, user engagement or conversion rate) while constraining other metrics (for example, mean/tail latencies, error rates). Iter8 enables you to perform an A/B-rollout experiment in this scenario.
  3. Compare more than two versions in the same experiment and select a winner based on reward and constraints. Iter8 enables you to perform an A/B/n-rollout experiment in this scenario.
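To make the first enhancement above concrete, the sketch below shows the style of header-based matching that Istio supports and that iter8's match clause builds on; the header name and value are invented for illustration, and the exact syntax within an iter8 experiment spec depends on the iter8 version.

# Hypothetical header-based routing rule in the Istio style: requests that
# carry x-user-geo: us-south may reach the candidate, all others stay on
# the baseline. Header name and value are made up.
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: fashionmnist-match-example
spec:
  hosts:
  - fashionmnist
  http:
  - match:
    - headers:
        x-user-geo:
          exact: us-south
    route:
    - destination:
        host: fashionmnist
        subset: candidate
  - route:
    - destination:
        host: fashionmnist
        subset: baseline
EOF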
