Manage ML Deployments Like A Boss: Deploy Your First AB Test With Sklearn, Kubernetes and Seldon-core using Only Your Web Browser & Google Cloud

#ModelBoss: https://github.com/SeldonIO/seldon-core

Read This First — Summary

If you’re interested in deploying machine learning models as REST APIs but simply serving up endpoints isn’t good enough any more, you might be ready for model management.

Just as Rocky had Mick, you, my friend, have seldon-core.

If you want an easy way to deploy and then route traffic to models using things like AB tests, then model management is for you. Of course, no one could explain the need for management better than Mick himself:

Seldon-core is a lot nicer than Mick and you won’t have to drink raw eggs

This article is an end-to-end tutorial covering how to:

  • Train a (very) basic machine learning model with scikit-learn
  • Serve and route that model as an AB test with seldon-core
  • Deploy the whole kit and caboodle on kubernetes

In other words, we’ll be spending most of our time stepping through seldon-core. But as your data science dilettante, I’ll include every step to get this running. Just don’t ask me to explain why it works.

Wait!!! I’ve Never Deployed A Model As A REST API Before. Should I Still Read This?

If you’ve never deployed a ML model as a REST API before, that’s 100% okay. Prior experience is NOT required. Seldon-core handles that for us. Hooray.

That said, if you want a super basic example of simply deploying a ML model as REST API on kubernetes, read anything other than my blog post on the subject and you’ll be up to speed.

Wait!!! I Don’t Want A Basic Example. I Want A Real Production Example

Then read this LEGIT example from Daniel Rodriguez. That’s how the pros do it.

Moving on.

All You Need To Follow Along:

  • Modern web browser
  • Good attitude

Just like in my last post, I will run everything on Google Cloud because it’s super easy and I still have a bunch of free credits. In addition, we can use the Google Cloud shell which means we don’t even have to run a terminal session locally. This might seem like a silly constraint, but I know how hard it is for me to computer (that’s a technical term, folks), so I want to minimize the cognitive overhead as much as possible. Feel free to judge me, see if I care…

Quick Introduction To Our Topic

To borrow from someone smarter than myself:

If a machine learning model is trained in a Jupyter notebook but never deployed, did it ever really exist?

And while that sinks in, let me drop another bomb on you:

If you don’t manage your models, your models will manage you…

I know how asinine that sounds and I’m (mostly) kidding. But there is some truth amidst my nonsense. Deploying machine learning models is not only difficult but woefully insufficient. And while managing deployed models is even harder, it is absolutely required. So what’s a girl to do?

Well, there is no “free-range modeling” here — we have to parent and direct our models in the wild lest they deliver little to no ROI for our business unit. Luckily, if we act boldly there are some sweet open source projects that are ready to fly to our aid.

If you’re willing to type some commands in a terminal, this tutorial will teach you how to deploy and manage ML models like a boss (or at least like a cool middle manager). So let’s get started.

Here is what we will cover (in order):

  • Spin up a kubernetes cluster on Google Cloud using GKE
  • Install seldon-core using helm
  • Use s2i to build a Docker image of a basic ML model
  • Define and deploy our AB test as a model graph using seldon-core

1. Start & Connect To Kubernetes Cluster On GKE

The thing we use to move Docker containers around: https://kubernetes.io/

Let’s use Google Cloud to create our kubernetes cluster. With all the free credits, it hasn’t cost me anything and it has saved me the annoyance of getting a local kubernetes instance running on my laptop. Also, using a common infrastructure makes it easier for others to follow along. And of course, we can work entirely from their web-based cloud shell, so you don’t have to run anything on your laptop.

  1. Using the hamburger menu on the left, select Kubernetes Engine and click “Create Cluster”
Get ready for a wild ride

2. Leaving the default settings unchanged, click Create. Google Cloud now takes a few minutes to spin up our kubernetes cluster. This is a great time to grab yourself a delicious beverage.

3. Now click the Connect button to open our cloud shell and choose Run in Cloud Shell.

We could ssh from our local machine but I’m too lazy for that
Thanks Google

And just like that we are connected to our running kubernetes instance — #suh-weet.

Now let’s get this show on the road

2. Install Helm And Seldon-core On GKE

How we install stuff on kubernetes: https://helm.sh/

We use helm to install stuff on kubernetes. I think of helm as a kubernetes package manager of sorts (an equivalent to pip or conda for Python folks). I’d say more about helm but that’s all I know.

How to install Helm — Enter Jonathan Campos

Lucky for us, Jonathan Campos wrote an excellent tutorial on how to install helm on GKE. I knew this was the tutorial for me when I read this gem:

“Now that you’ve seen the code necessary, you can be lazy and just run a script to do the install for you.”

I believe I will, Jonathan. I believe I will.

Here’s the relevant snippet of Jonathan’s tutorial that we need.

#!/usr/bin/env bash

echo "install helm"
# installs helm with bash commands for easier command line integration
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
# add a service account within a namespace to segregate tiller
kubectl --namespace kube-system create sa tiller
# create a cluster role binding for tiller
kubectl create clusterrolebinding tiller \
--clusterrole cluster-admin \
--serviceaccount=kube-system:tiller

echo "initialize helm"
# initializes helm within the tiller service account
helm init --service-account tiller
# updates the repos for Helm repo integration
helm repo update

echo "verify helm"
# verify that helm is installed in the cluster
kubectl get deploy,svc tiller-deploy -n kube-system

What’s going on here? Who cares? Open up a text editor, paste this code inside, and run this thing.

Here’s how I do it. I create a directory called install-helm, and then create a file with the fiendishly clever name of install-helm.sh. My shell commands are:

mkdir install-helm
cd install-helm
vim install-helm.sh
#Enter insert mode in vim with "i"
#Paste your code
#Exit vim with ":x"
bash install-helm.sh

If you see something like the image below, then your install ran successfully

Great, now that we have helm installed, let’s use it to install seldon-core

Install Seldon-core With Helm

Following the seldon-core install materials, we need to install seldon-core-crd and seldon-core components.

To install seldon-core-crd from the terminal run:

helm install seldon-core-crd --name seldon-core-crd --repo https://storage.googleapis.com/seldon-charts \
--set usage_metrics.enabled=true

To install seldon-core components run:

helm install seldon-core --name seldon-core --repo https://storage.googleapis.com/seldon-charts

Note that if you read the install docs, they instruct you to set some options for RBAC and reverse proxy something or other. I found this confusing so I just ignored it and went with the defaults. Lucky for us everything works fine.

If you see a message like this, then seldon-core is installed successfully.

You are very welcome, seldon-core

3. Use s2i To Build A Docker Image Of A Basic ML Model

Now that we have seldon-core installed, it’s time to build our model as a Docker image and pass it to seldon-core for deployment and routing.

Install s2i

s2i is a handy tool that takes code living in a Git repo (or anywhere, I think) and turns it into a Docker image. Why is this useful? Remember: we need everything to live in Docker containers so that Kubernetes can handle all the heavy lifting of deployment and monitoring.

We could go through the steps of writing a Docker file and building a Docker image but with s2i we don’t have to do that. Three cheers!

# Download the installer
wget https://github.com/openshift/source-to-image/releases/download/v1.1.12/source-to-image-v1.1.12-2a783420-linux-amd64.tar.gz
# Unpack the installer
tar -xvf source-to-image-v1.1.12-2a783420-linux-amd64.tar.gz
# Add the executable - s2i - to a directory on your PATH
cp s2i ~/.local/bin/

Fantastic. We are almost to the Python part. Next up we need to install some dependencies.

Install Scikit-learn, GRPC Tools

We need scikit-learn, the canonical python package for machine learning. We’ll also install the grpc tools in case you feel like making gRPC requests later on (I’m just going to show how to call the model over REST).

sudo pip install scikit-learn grpcio-tools

Easy. Now let’s find a very basic sklearn example we can use for our first seldon deployment.

Clone seldon-core example

Lucky for us, the good people at seldon have created some examples we can use. Let’s clone down their example repo using git and run one of them

# Clone the repo
git clone https://github.com/SeldonIO/seldon-core-launcher.git
# cd into the example directory we want to run
cd seldon-core-launcher/seldon-core/getting_started/wrap-model

Now these examples are all provided with a Jupyter notebook. Typically, that is great news. However, I promised that we would run everything in the Google Cloud Shell (this is my whole “the only requirement is a web browser and a good attitude” thing).

Sadly, I could not access a Jupyter notebook instance running in my Google Cloud shell. If you were to run this example anywhere else, you would not have this problem.

That said, here is our simple workaround so we can keep living the browser-only dream. We write the relevant pieces of code as python scripts rather than notebooks! Boom!

Create a basic script to train our model called train_model.py

#train_model.py
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.externals import joblib
from sklearn import datasets

def main():
    clf = LogisticRegression()
    p = Pipeline([('clf', clf)])
    print('Training model...')
    p.fit(X, y)
    print('Model trained!')

    filename_p = 'IrisClassifier.sav'
    print('Saving model in %s' % filename_p)
    joblib.dump(p, filename_p)
    print('Model saved!')

if __name__ == "__main__":
    print('Loading iris data set...')
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    print('Dataset loaded!')
    main()

I can hear the complaints now. Gus — no train/test split? No standardization nor hyperparameter tuning? What machine learning savagery is this? Have you no decency, sir!?

Calm down. This isn’t a ML tutorial (and even if it were, I wouldn’t know those answers, anyway). We’re here to learn how to deploy models with fun graphs like AB testing, multi-armed bandit, and all the good stuff. Let’s not get caught up on the basic ML. It’s good enough for now.
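(That said, if you ever do want a quick sanity check before deploying, a hold-out split is only a few extra lines. Here’s a minimal sketch of what that could look like — purely illustrative, and not needed for the rest of this tutorial.)

```python
# Hypothetical variant of train_model.py with a train/test split.
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

iris = datasets.load_iris()
# Hold out 20% of the rows for a rough accuracy check
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

p = Pipeline([('clf', LogisticRegression())])
p.fit(X_train, y_train)

print('Test accuracy: %.2f' % p.score(X_test, y_test))
```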

Save the file and then run it via:

python train_model.py

You should see a new file in your local directory called IrisClassifier.sav. This is our serialized model. Well done!

Wrap Runtime Code With s2i

Now that we have a trained model, it’s time to define a function that returns predictions for our model. Here’s where things start to get fun. Rather than use a web framework like Flask or Tornado to serve our code, we can let s2i build that model serving framework for us.

All we have to do is define our model as a class (like in the example below). Create a file called IrisClassifier.py

#IrisClassifier.py
from sklearn.externals import joblib

class IrisClassifier(object):

    def __init__(self):
        self.model = joblib.load('IrisClassifier.sav')
        self.class_names = ["iris-setosa","iris-vericolor","iris-virginica"]

    # feature_names aren't needed
    def predict(self, X, feature_names):
        return self.model.predict_proba(X)

Here’s what’s going on. When the class is initialized, our code loads our previously trained model IrisClassifier.sav from disk. We then define a predict function that takes the various flower measurements (that is what the Iris classifier expects, although that isn’t necessary to understand for this tutorial) and returns the predicted class probabilities for the flower.

If this seems fairly basic, that’s because it is. We aren’t defining routes or creating apps; we are just writing a straightforward class with a predict function. Not too shabby.
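To see why no web framework is needed, here’s a rough, self-contained sketch of what the generated wrapper effectively does with our class at serving time. This is not seldon-core’s actual code; the inline training simply stands in for joblib.load('IrisClassifier.sav') so the snippet runs on its own.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

class IrisClassifier(object):
    def __init__(self, model):
        # In the real class this would be joblib.load('IrisClassifier.sav');
        # we inject a trained model here so the sketch is self-contained.
        self.model = model
        self.class_names = ["iris-setosa", "iris-vericolor", "iris-virginica"]

    def predict(self, X, feature_names):
        return self.model.predict_proba(X)

# Stand-in for the serialized model on disk
iris = datasets.load_iris()
p = Pipeline([('clf', LogisticRegression())]).fit(iris.data, iris.target)

# This mimics what the generated REST wrapper does for each request:
clf = IrisClassifier(p)
probs = clf.predict(
    np.array([[5.1, 3.5, 1.4, 0.2]]),
    ["sepallengthcm", "sepalwidthcm", "petallengthcm", "petalwidthcm"])
print(probs)  # one row of three class probabilities
```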

Build Docker Image With s2i

In addition to the class above, s2i also requires a configuration file. This is where we tell s2i how to serve our model. If you cloned down the seldon-core repo per the instructions earlier in the tutorial, this is all available for you. Let’s take a look at the s2i config file by running cat .s2i/environment. You should see the output below. This is how we tell s2i to serve our model as a REST API within a Docker container.

$ cat .s2i/environment
MODEL_NAME=IrisClassifier
API_TYPE=REST
SERVICE_TYPE=MODEL
PERSISTENCE=0

Also required (and thankfully included) is the requirements.txt file. This is where we tell Docker which dependencies our model needs to run. For this example you can see it’s quite small

$ cat requirements.txt
scikit-learn==0.19.0
scipy==0.18.1

Now with our environment and requirements files in hand, we can run the s2i builder tool to build a Docker image that serves our model. Here’s how to do that.

First — you need a docker image repository (this is where we will store the docker image once it is built). A great (and free) way to do that is Docker Hub. If you haven’t done so already, head over to Docker Hub and create a free account. Remember your username, because we’ll need it in a second.

Second — we need to login to docker. We do this from the command line by running docker login. Enter your username and password (same as to login to Docker Hub) and you’re good to go.

Now let’s set an environment variable DOCKER_REPO to our Docker Hub username and then run the s2i build command like so:

export DOCKER_REPO=gcav66 #Replace "gcav66" with your Docker Hub username
s2i build . seldonio/seldon-core-s2i-python3 ${DOCKER_REPO}/sklearn-iris:0.1

What we are doing here is using one of the pre-defined seldon-core python3 docker images and then adding in our model and dependencies. The details are less important than your fingers typing these key strokes. This is the end of our boilerplate code.

Hot diggity! We are moving right along

Push Docker Image To Docker Repo

Now the final step of this section is to push our newly built Docker image to our Docker Hub account. Here’s how to do that:

docker push gcav66/sklearn-iris:0.1 #replace "gcav66" with the DOCKER_REPO environment variable or just hard code your Docker Hub username

Boom! Now we have our model image successfully pushed to our Docker Hub repo. We are finally ready to deploy it with seldon-core. Now is when the real fun begins

4. Define Our AB Test As A Model Graph Using Seldon-core

This is where the rubber meets the road.

The first thing we need to do is enable port forwarding. Open a new shell in Google Cloud Shell by clicking the “+” button, per below

Click the + button to open a new shell

In this new terminal, paste the following command

kubectl port-forward $(kubectl get pods -l app=seldon-apiserver-container-app -o jsonpath='{.items[0].metadata.name}') 8002:8080

If you see an output like (below), then it ran successfully:

Forwarding from 127.0.0.1:8002 -> 8080
Forwarding from [::1]:8002 -> 8080

Now let’s create a new python script deploy_model.py

Paste in the following code:

import requests
from requests.auth import HTTPBasicAuth
from proto import prediction_pb2
from proto import prediction_pb2_grpc
import grpc
try:
    from commands import getoutput # python 2
except ImportError:
    from subprocess import getoutput # python 3

API_HTTP = "localhost:8002"
API_GRPC = "localhost:8003"

def get_token():
    payload = {'grant_type': 'client_credentials'}
    response = requests.post(
        "http://"+API_HTTP+"/oauth/token",
        auth=HTTPBasicAuth('oauth-key', 'oauth-secret'),
        data=payload)
    print(response.text)
    token = response.json()["access_token"]
    return token

def rest_request():
    token = get_token()
    headers = {'Authorization': 'Bearer '+token}
    payload = {"data":{"names":["sepallengthcm","sepalwidthcm","petallengthcm","petalwidthcm"],"tensor":{"shape":[1,4],"values":[5.1,3.5,1.4,0.2]}}}
    response = requests.post(
        "http://"+API_HTTP+"/api/v0.1/predictions",
        headers=headers,
        json=payload)
    print(response.text)

Now there is a lot here and I’ll admit it looks pretty scary, but there is only one key piece to which you need to pay attention — payload

payload = {"data":{"names":["sepallengthcm","sepalwidthcm","petallengthcm","petalwidthcm"],"tensor":{"shape":[1,4],"values":[5.1,3.5,1.4,0.2]}}}

All we have to do is tell seldon-core:

  • The names of our features, e.g, “sepallength…”
  • The shape of our features, e.g. [1,4](one row, 4 columns)
  • The actual values, e.g., [5.1, 3.5, 1.4, 0.2]

So as you serve other examples with seldon-core, this is the piece you’ll likely find yourself editing. If you send a payload with “names”, “shape”, and “values” (with the latter two falling under the “tensor” parent), you’ll be good to go.
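Since this payload is the part you’ll keep rewriting, here’s a tiny helper that builds it from feature names and rows of values. The helper itself is hypothetical (not part of seldon-core), just a sketch of the shape seldon expects:

```python
# Hypothetical helper that builds a seldon-style payload from
# feature names and a list of rows.
def make_payload(names, rows):
    values = [v for row in rows for v in row]  # flatten row-major
    return {"data": {"names": names,
                     "tensor": {"shape": [len(rows), len(names)],
                                "values": values}}}

payload = make_payload(
    ["sepallengthcm", "sepalwidthcm", "petallengthcm", "petalwidthcm"],
    [[5.1, 3.5, 1.4, 0.2]])
print(payload)
```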

One more nasty file to edit and we’re all set. The next task is to define for seldon-core what our graph looks like.

Let’s start simple and just deploy our model as a vanilla REST API. After we get the hang of deployments with seldon-core, we will up the ante and build out our AB test. I know that is what you are here for and I won’t let you down.

In your current directory you’ll find a file called TMPL_deployment.json.

(And don’t worry if you get lost in your cloud shell, the directory you need to be in is *seldon-core-launcher/seldon-core/getting_started/wrap-model*)

Open TMPL_deployment.json and let’s edit one value:

{
    "apiVersion": "machinelearning.seldon.io/v1alpha2",
    "kind": "SeldonDeployment",
    "metadata": {
        "labels": {
            "app": "seldon"
        },
        "name": "sklearn-iris-example"
    },
    "spec": {
        "name": "sklearn-iris-deployment",
        "oauth_key": "oauth-key",
        "oauth_secret": "oauth-secret",
        "predictors": [
            {
                "componentSpecs": [{
                    "spec": {
                        "containers": [
                            {
                                "image": "gcav66/sklearn-iris:0.1",
                                "imagePullPolicy": "IfNotPresent",
                                "name": "sklearn-iris-classifier",
                                "resources": {
                                    "requests": {
                                        "memory": "1Mi"
                                    }
                                }
                            }
                        ],
                        "terminationGracePeriodSeconds": 20
                    }
                }],
                "graph": {
                    "children": [],
                    "name": "sklearn-iris-classifier",
                    "endpoint": {
                        "type": "REST"
                    },
                    "type": "MODEL"
                },
                "name": "classifier",
                "replicas": 1,
                "annotations": {
                    "predictor_version": "0.1"
                }
            }
        ]
    }
}

The one value you must change is the name of your image to pull from your Docker Hub username.

#Swap "gcav66" with your username
"image": "gcav66/sklearn-iris:0.1",

That is it for changes. It is worth taking a look at the data inside the graph key. We will make some edits here shortly to change our deployment from a basic endpoint to an AB test. But first, let’s get this thing running.

We create our seldon deployment by running:

kubectl apply -f TMPL_deployment.json

Then take a 30 second break and verify that your seldon app is running by pasting this into the shell

kubectl get seldondeployments sklearn-iris-example -o jsonpath='{.status}'

In the output, if you see *replicas:1* you are good to go. This may take a minute or two but shouldn’t be longer than that. Our model is finally deployed. Time to hit the endpoint.

Now, if you recall, we are running this entire example in the Google Cloud Shell (so we can run everything using solely our web browser). One limitation of this otherwise remarkable tool is that it doesn’t easily support running jupyter notebooks (I know I mentioned this previously but it bears repeating). So instead of running the notebooks cells that the seldon team provided, we’ll just call the deploy_model.py script that we created earlier.

Here is how to do that. Open an ipython shell by typing: ipython

Then from your new ipython shell, type:

from deploy_model import rest_request

“deploy_model” corresponds to the deploy_model.py file in our local directory and rest_request is the name of the function we defined that calls our API.

Then type rest_request() to make an API call:

In [3]: rest_request()
{"access_token":"b01d867a-ebf1-4d7f-8764-c8b11ae43461",
"token_type":"bearer",
"expires_in":43199,
"scope":"read write"}
{ "meta":
{ "puid": "j64h9tqf404rv2j3ikc8r98sdf",
"tags": { },
"routing": { } },
"data":
{ "names": ["iris-setosa", "iris-vericolor", "iris-virginica"], "tensor": { "shape": [1, 3], "values": [0.9974160323001712, 0.002583770255316237, 1.9744451239167056E-7] } }}

Given our input data, the model predicts that the first class (iris-setosa) is the correct classification, with roughly 99.7% probability.
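Under the hood that response is just JSON, so pulling out the winning class takes a few lines. A quick sketch, using the response body copied from the run above:

```python
import json

# Response body copied from the rest_request() output above
response_text = '''{"meta": {"puid": "j64h9tqf404rv2j3ikc8r98sdf", "tags": {}, "routing": {}},
 "data": {"names": ["iris-setosa", "iris-vericolor", "iris-virginica"],
          "tensor": {"shape": [1, 3],
                     "values": [0.9974160323001712, 0.002583770255316237, 1.9744451239167056E-7]}}}'''

body = json.loads(response_text)
names = body["data"]["names"]
values = body["data"]["tensor"]["values"]

# Index of the highest probability = predicted class
best = max(range(len(values)), key=lambda i: values[i])
print("Predicted class: %s (%.2f%%)" % (names[best], values[best] * 100))
# -> Predicted class: iris-setosa (99.74%)
```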

Bravo! You have just deployed and tested your first model with seldon-core! But you didn’t come all this way to stop with a basic model deployment. Let’s update our model graph for an AB test.

Exit the ipython shell and let’s delete our existing deployment by running:

kubectl delete -f TMPL_deployment.json

Now let’s create a new model graph json file ab_test.json

Paste in the following code:

{
    "apiVersion": "machinelearning.seldon.io/v1alpha2",
    "kind": "SeldonDeployment",
    "metadata": {
        "labels": {
            "app": "seldon"
        },
        "name": "sklearn-iris-example"
    },
    "spec": {
        "name": "sklearn-iris-deployment",
        "oauth_key": "oauth-key",
        "oauth_secret": "oauth-secret",
        "predictors": [
            {
                "componentSpecs": [{
                    "spec": {
                        "containers": [
                            {
                                "image": "gcav66/sklearn-iris:0.1",
                                "imagePullPolicy": "IfNotPresent",
                                "name": "classifier-1",
                                "resources": {
                                    "requests": {
                                        "memory": "1Mi"
                                    }
                                }
                            }
                        ],
                        "terminationGracePeriodSeconds": 20
                    }
                },
                {
                    "metadata": {
                        "labels": {
                            "version": "v2"
                        }
                    },
                    "spec": {
                        "containers": [
                            {
                                "image": "gcav66/sklearn-iris:0.1",
                                "imagePullPolicy": "IfNotPresent",
                                "name": "classifier-2",
                                "resources": {
                                    "requests": {
                                        "memory": "1Mi"
                                    }
                                }
                            }
                        ],
                        "terminationGracePeriodSeconds": 20
                    }
                }],
                "name": "classifier",
                "replicas": 1,
                "annotations": {
                    "predictor_version": "v1"
                },
                "graph": {
                    "name": "random-ab-test",
                    "endpoint": {},
                    "implementation": "RANDOM_ABTEST",
                    "parameters": [
                        {
                            "name": "ratioA",
                            "value": "0.5",
                            "type": "FLOAT"
                        }
                    ],
                    "children": [
                        {
                            "name": "classifier-1",
                            "endpoint": {
                                "type": "REST"
                            },
                            "type": "MODEL",
                            "children": []
                        },
                        {
                            "name": "classifier-2",
                            "endpoint": {
                                "type": "REST"
                            },
                            "type": "MODEL",
                            "children": []
                        }
                    ]
                }
            }
        ]
    }
}

Here is what is important. We tell seldon which models we want to run (our boring iris classifier, both times), the graph we want to use (an AB test), and how we want traffic routed between the two deployed models (randomly, with ratioA controlling the split).

As you can imagine, we can easily extend this to other model routing scenarios using different models.

Now let’s deploy our new model graph:

kubectl apply -f ab_test.json

After waiting slightly longer this time (after all, seldon has to deploy *two* models instead of just one), we verify that our models are deployed by running

kubectl get seldondeployments sklearn-iris-example -o jsonpath='{.status}'

If we see that *replicas:1* we are all set to send some requests against our AB test.

Now we fire up the ipython shell once more and repeat our previous commands. Note that you can use the up-arrow on your keyboard if you don’t feel like typing from deploy_model import rest_request and rest_request() again.

Now you should see your model results, this time annotated as being routed to one of our two deployed models

Isn’t that beautiful!?

We can see that in the “routing” section seldon annotates the model number as “1” or “0”, indicating from which model our results are returned.
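If you’re curious what RANDOM_ABTEST with ratioA=0.5 does to your traffic over many requests, here’s a toy simulation. This is my own sketch of the routing behavior, not seldon-core’s actual router code:

```python
import random

def route(ratio_a, rng=random.random):
    # Mimics a RANDOM_ABTEST router: send the request to child 0
    # with probability ratio_a, otherwise to child 1.
    return 0 if rng() < ratio_a else 1

random.seed(0)  # fixed seed so the simulation is repeatable
counts = {0: 0, 1: 0}
for _ in range(10000):
    counts[route(0.5)] += 1
print(counts)  # roughly a 50/50 split across the two models
```

Changing ratioA in ab_test.json shifts this split, e.g. 0.9 would send about 90% of traffic to classifier-1.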

Pat yourself on the back. Using Kubernetes, Scikit-learn, and Seldon-core you just deployed your first ML model graph (AB test) using just Google Cloud and your web browser.

One final step. Let’s tear down our deployed model. Almost breaks my heart (but we don’t want to incur extra charges from Google Cloud)

kubectl delete -f ab_test.json
#To say goodbye for now

Next Steps

Admittedly, this is a very basic example. But if you’ve made it this far perhaps you are willing to go a little further. I know I plan to try some different model graphs for myself. I plan to play around with building a variety of different models and routing traffic to them using not just AB tests, but multi-armed bandit as well. I hope you’ll join me — the model management world is now your oyster!

More to come on this. And as always,

Stay Beautiful