Ship ahoy! Serving machine learning models using Clipper

Jelmer van der Meulen
Published in wehkamp-techblog · Nov 26, 2018

Like in most companies nowadays, machine learning is a hot topic at Wehkamp. We have data scientists working with tools like Databricks to create all sorts of models which contribute to enhancing our customers’ experience. In order to actually make use of these models, we need to serve them somehow and somewhere. Preferably in a simple and automated way.

Until now, each time we needed to deploy a new or updated model we went through an extensive manual process of putting the model in some sort of custom web application. Basically, we were wasting time that could be spent on, well… creating new models.

Meet Clipper: an open-source low-latency prediction serving system for machine learning developed by the awesome people at UC Berkeley RISE Lab, which promises to take a lot of our worries and troubles away.

We will walk you through the first steps of our Clipper journey.

Clipper concept

Clipper is designed to make prediction serving a lot easier, specifically the part where you want to take your model and put it into your production environment fast and without hassle. Furthermore, it decouples front-facing applications from the actual number crunching. As a result, applications are not slowed down because they need to make complicated predictions themselves. Instead, apps query Clipper, which goes to work and feeds the results back fast and efficiently. Besides that, you can scale all the bits and pieces independently.

In short, Clipper:

  • Enables you to deploy your model by running just a few lines of code, with support for many machine learning frameworks.
  • Creates Docker containers out of your models for simple cluster and resource management.
  • Makes it easy to update or roll back models on the fly.
  • Allows you to set service level objectives for reliable query latencies.

Overall the Clipper architecture looks something like this:

source: clipper.ai

First steps

Figuring Clipper out

It is very easy to start experimenting with Clipper on your local machine, so that makes a good starting point. You only need Docker, Python and the clipper_admin package:

$ pip install clipper_admin

By following the excellent quick start guide on clipper.ai you can have the platform running on your machine within minutes.
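
In essence, that guide boils down to something like this (a minimal sketch, assuming Docker is running locally and using the defaults of the DockerContainerManager):

>>> from clipper_admin import ClipperConnection, DockerContainerManager
>>> clipper_conn = ClipperConnection(DockerContainerManager())
>>> clipper_conn.start_clipper()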

Deploying your models

Things get more fun when you start trying to deploy your own machine learning models onto your Clipper cluster.

Some useful Python packages to help you with this (depending on the framework) are sklearn, pyspark, scipy, matplotlib, cloudpickle, and joblib.

For example, one of our data scientists handed us a pickle file containing a scikit-learn model.

To deploy it we had to deserialize the pickle file into an object so we could hand the model off to Clipper:

>>> from sklearn.externals import joblib
>>> model = joblib.load('./model.pkl')

Once the model object is created you can tell the Clipper management frontend to pick it up and deploy it into its own container. But before we can really do this, we need to decide on some stuff:

  • How are we calling this model?
    Let’s go with something original: test-model.
  • What version of the model are we deploying?
    This will be version 1.
  • What type of input does the model expect (e.g. integers, strings)?
    Our model expects the input query to contain strings.
  • The prediction function of our model:
    In our case, we need to point Clipper to model.predict.
  • Does the model depend on any packages in order to run?
    Our model is a scikit-learn model so it needs the sklearn package.

Having this information, we can tell Clipper to deploy our model using the following command:

>>> from clipper_admin.deployers import python as python_deployer
>>> python_deployer.deploy_python_closure(
clipper_conn,
name="test-model",
version="1",
input_type="strings",
pkgs_to_install=["sklearn"],
func=model.predict)

And that's it! Now Clipper will build a container image of our model and deploy it on our machine next to the other containers already running.
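
If you want to verify that the model container actually came up, clipper_admin can list what is registered on the cluster (a quick sanity check; the exact output format may differ between versions):

>>> clipper_conn.get_all_models()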

To get our predictions, we only need to create an application on the Clipper query frontend, which handles all the requests, and link it to our model:

>>> clipper_conn.register_application(
name="pipeline",
input_type="strings",
default_output="-1.0",
slo_micros=100000)
>>> clipper_conn.link_model_to_app(
app_name="pipeline",
model_name="test-model")

Finally, we can get our predictions!
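
For example, with everything running locally on the default port 1337, a query to our pipeline app could look something like this (a sketch; "some input text" is just a placeholder for whatever your model actually expects):

>>> import requests, json
>>> requests.post(
"http://localhost:1337/pipeline/predict",
headers={"Content-Type": "application/json"},
data=json.dumps({"input": "some input text"})).json()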

Hoorah

Where to float the Clipper boat?

Once we had figured out the basics we started to think about where we wanted our Clipper to land. As Clipper has built-in container managers for Docker and Kubernetes, we quickly decided to build our production environment on Kubernetes using Amazon EKS, and to provision it with Terraform.

Running Clipper on Kubernetes is as easy as pointing the clipper_admin tool in your Python interpreter to your cluster. In our case, we also had to set up a proxy with kubectl in order to get an encrypted connection:

$ kubectl proxy --port 8080
$ python
>>> from clipper_admin import ClipperConnection, KubernetesContainerManager
>>> clipper_conn = ClipperConnection(KubernetesContainerManager(
kubernetes_proxy_addr="127.0.0.1:8080", useInternalIP=True))
>>> clipper_conn.start_clipper()

Weigh the anchor!

If you have chosen to run on Kubernetes you also need to have a Docker registry to push your model container images to, from which your Clipper cluster can then deploy them. To make this work you need to authenticate Kubernetes to your private Docker registry:

  1. Create the image pull secret:
$ kubectl create secret docker-registry myregistrykey \
--docker-server=yourregistrydomain.io \
--docker-username=yourusername \
--docker-password=yourpassword \
--docker-email=email@server.com

  2. Patch the default Kubernetes service account with your new secret:

$ kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "myregistrykey"}]}'

Now you are ready to deploy models to Clipper on Kubernetes. Just make sure to point clipper_admin to your container registry in the deploy command:

>>> python_deployer.deploy_python_closure(
clipper_conn,
name="test-model",
version=1,
input_type="strings",
pkgs_to_install=["sklearn"],
func=model.predict,
registry="yourdomain.io/yourregistry")
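
Updating a model later is just a matter of deploying it again with a higher version number. And if the new version misbehaves, rolling back should be as simple as pointing Clipper back at the old one (a sketch using clipper_admin's set_model_version, assuming version 1 is still registered):

>>> clipper_conn.set_model_version(name="test-model", version="1")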

To find your Clipper query-frontend IP you can issue the following command from your terminal and look for the line saying Endpoints:

$ kubectl describe service query-frontend
Name:                     query-frontend
Namespace:                default
Labels:                   ai.clipper.container.label=
                          ai.clipper.name=query-frontend
Annotations:              <none>
Selector:                 ai.clipper.name=query-frontend
Type:                     NodePort
IP:                       172.20.49.64
Port:                     1337  1337/TCP
TargetPort:               1337/TCP
NodePort:                 1337  32163/TCP
Endpoints:                10.229.117.93:1337
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

Happy querying!

What’s next

Before we can really benefit from our Clipper platform we need to take care of a few other important things. We might want to connect Clipper’s Prometheus to our own monitoring platform so our SRE team can easily troubleshoot in case of emergency. Furthermore, we want to run a separate Redis cluster outside of Clipper to persist all the settings regarding models and apps.
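
For the Redis part, the container managers appear to accept an external Redis instance at start-up, so this could be as simple as something along these lines (a sketch; redis.internal:6379 is a hypothetical address for our own Redis cluster):

>>> clipper_conn = ClipperConnection(KubernetesContainerManager(
kubernetes_proxy_addr="127.0.0.1:8080",
useInternalIP=True,
redis_ip="redis.internal",  # hypothetical external Redis address
redis_port=6379))
>>> clipper_conn.start_clipper()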

Most importantly, we need to implement a proper CI/CD pipeline for the whole process of bringing a model into production. Ideally, all that a data scientist has to do is place a file containing their model into an S3 bucket. In our case, Jenkins would play a big part in this, continuously deploying and updating the models to make sure that product teams quickly have an endpoint to query from their applications.

Skipper, full speed ahead!
