Machine Learning Automation (1): Clipper AI
This article is part of a series exploring machine learning platforms, to understand the services they provide and how they can help you deliver better models in production.
What is Clipper?
Clipper defines itself as:
a prediction serving system that sits between user-facing applications and a wide range of commonly used machine learning models and frameworks.
In other words, Clipper is a reverse proxy: it is called via a REST API and communicates with models through a low-overhead protocol built on top of ZeroMQ.
It consists of three building blocks:
- Admin (or management): This service holds the configuration and exposes it for reads and updates. It is written in Python and backed by a simple Redis database.
- Query Frontend: This is the reverse proxy itself. It is written in C++ for performance and to avoid garbage collection pauses, and it communicates with the models over the ZeroMQ-based protocol.
- Model: Each model is packaged in a container that handles remote procedure calls coming from the query frontend.
Clipper drastically simplifies the deployment step if you use one of the supported orchestrators (Docker CLI, Kubernetes). Supporting other orchestrators is relatively simple though, as I will demonstrate below.
From Jupyter Notebook to Deployment on Nomad
The following example is based on a PR that is still under review at the time of writing.
Prerequisites:
- You need a Nomad cluster with Consul DNS and a Fabio load balancer. You can follow this tutorial for Scaleway and read the documentation available for Nomad, Consul and Fabio.
- You need Consul DNS configured on your machine; make sure that you can resolve *.service.consul addresses (a quick check is shown below).
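One quick way to verify the resolution from Python, assuming your system resolver forwards *.service.consul queries to Consul (the fabio service name here is just an example):

import socket

# Should print the IP address of the Fabio service if Consul DNS is set up
print(socket.gethostbyname('fabio.service.consul'))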
You can find code samples and resources for Nomad deployments in this Git repository.
Our goal is to train a simple model on our machine and to deploy it to our Nomad cluster.
We start by building a simple model with scikit-learn: a linear regression trained on the diabetes dataset. This fits in a few lines of code.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the diabetes dataset and hold out 20% of it for testing
diabetes = datasets.load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=0
)

clf = LinearRegression()
clf.fit(X_train, y_train)
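As a quick sanity check before deploying, we can score the model on the held-out test set (optional; Clipper does not require this step):

# R^2 score on the held-out test data
print(clf.score(X_test, y_test))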
Deploying
Now that we have our model, we want to serve it to our users through a REST API. Without Clipper, we would have to go through a number of steps that are usually handled by developers:
- Create a Flask App and its API
- Create metrics and health checks
- Dockerize the application
- Write a *.nomad job file to deploy it
Clipper lets us skip all of these steps with a few lines of Python code!
First we create a connection to the Nomad cluster and set a number of parameters, such as the DNS we use (Consul) and the load balancer (Fabio). You need to configure Fabio with TCP proxying enabled on one port, because the query frontend and the model containers communicate over a raw TCP protocol. I used port 7000 in my example.
from clipper_admin import ClipperConnection, NomadContainerManager, ConsulDNS, FabioLoadBalancer

nomad_ip_addr = '10.65.30.43'

dns = ConsulDNS()  # We use Consul for DNS resolution

container_manager = NomadContainerManager(
    nomad_ip=nomad_ip_addr,
    dns=dns,
    load_balancer=FabioLoadBalancer(
        address='fabio.service.consul',
        http_proxy_port=9999,
        tcp_proxy_port=7000
    )
)

clipper_conn = ClipperConnection(container_manager)
clipper_conn.start_clipper()  # use clipper_conn.connect() to reconnect later
Once you execute this code, you should see three jobs starting in the Nomad UI.
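In a later session, you can reconnect to the running cluster instead of starting it again; a small sketch using the clipper_admin API:

# Reconnect to an already running Clipper cluster
clipper_conn.connect()

# Address of the query frontend serving the REST API
print(clipper_conn.get_query_addr())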
To serve a model, you first have to create an 'Application'. You can think of it as a project that groups all the algorithms working toward the same goal. You specify its name, the type of input data it receives, the default output returned if an error occurs, and a Service Level Objective (SLO): the latency within which you expect your predictions to be delivered.
Clipper will try to find a balance between latency and throughput, using tricks such as batching.
my_application_name = 'app-1'
clipper_conn.register_application(
    name=my_application_name,
    input_type="doubles",
    default_output="-1.0",
    slo_micros=10000000  # 10 seconds
)
Once we have created our application, we can query it. This is not very useful yet, since no model is linked to answer the query.
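For example, querying the application through Fabio (the predict route is described in more detail below):

import requests, json

# Query the application before any model is linked to it
requests.post(
    "http://fabio.service.consul:9999/clipper/app-1/predict",
    headers={"Content-type": "application/json"},
    data=json.dumps({"input": [0.0] * 10})
).json()

Since no model is connected yet, Clipper falls back to the default output: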
{
    "query_id": 13,
    "output": -1.0,
    "default": true,
    "default_explanation": "No connected models found for query"
}
To serve a new model in our application (or project), we use one of the available deployers. Clipper provides implementations for a number of frameworks such as TensorFlow, Keras, PyTorch and more. There is also a generic Python function deployer, which we will use to deploy our scikit-learn model. You only need access to a Docker registry (you can use Docker Hub). In this example, I used a private registry hosted on Scaleway.
from clipper_admin.deployers import python as python_deployer

my_model_name = 'scikit-learn-model'
my_model_version = 1

# I use a private registry hosted on Scaleway
container_registry = 'rg.fr-par.scw.cloud/hyperplan'

python_deployer.deploy_python_closure(
    clipper_conn,
    name=my_model_name,
    version=my_model_version,
    input_type="doubles",
    func=clf.predict,
    pkgs_to_install=['scikit-learn'],
    registry=container_registry
)
If you need preprocessing, you can pass a custom function as the func argument. It receives a batch of inputs and must return one prediction per input; the model itself can be captured in a closure.

def my_prediction_func(inputs):
    # Preprocess the batch here, then call the model captured in the closure
    preprocessed = inputs  # replace with real preprocessing
    return clf.predict(preprocessed)
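Deploying this wrapper then looks like the first deployment, with the custom function passed as func (a sketch reusing the names defined above; note the bumped version):

# Deploy the preprocessing wrapper as version 2 of the same model
python_deployer.deploy_python_closure(
    clipper_conn,
    name=my_model_name,
    version=2,
    input_type="doubles",
    func=my_prediction_func,
    pkgs_to_install=['scikit-learn'],
    registry=container_registry
)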
The deployer function will build a Docker image with all the required dependencies, push it to your container registry and submit a new job to Nomad. You should see the model instance in your Nomad UI.
There is one last step before we can use our model: we need to link it to our application.
clipper_conn.link_model_to_app(
    app_name=my_application_name,
    model_name=my_model_name
)
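You can verify the link from the admin API (a sketch; get_linked_models is part of clipper_admin's ClipperConnection):

# Should list 'scikit-learn-model' for our application
print(clipper_conn.get_linked_models(app_name=my_application_name))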
Let’s test it! We will call it through our Fabio load balancer (which listens on port 9999 for HTTP traffic). Clipper is served at the following address:
http://fabio.service.consul:9999/clipper/
You can request predictions from our application (app-1) by calling the following route:
http://fabio.service.consul:9999/clipper/app-1/predict
Here is an example in Python:
import requests, json, numpy as np

headers = {"Content-type": "application/json"}

requests.post(
    "http://fabio.service.consul:9999/clipper/{app_name}/predict".format(app_name=my_application_name),
    headers=headers,
    data=json.dumps({"input": list(np.random.random(10))})
).json()
{'query_id': 12, 'output': 248.95820966558657, 'default': False}
My advice
Clipper is a great tool and I recommend it to anyone with simple model-serving needs. It is easy to use, extensible and performant. I only have some concerns about a few design decisions:
- Clipper prefers to return no result rather than a result that falls outside the tolerated SLO. I would have preferred monitoring and alerting when this happens.
- There is a 1:1 relationship between an application and a model, so you cannot do A/B testing between models. You would need to add another HTTP proxy (most likely Flask) on top of the query frontend, as stated in this issue, but Python is a garbage-collected language, and that would lead to unexpected latencies and reduced overall performance.
- The query frontend container is stateful: it needs to hold a connection to at least one instance of each model. When you add a new query frontend instance to the network, it may not hold a connection to every model. Model instances may use DNS to resolve the IP address of the query frontend, so there is no guarantee that two instances of model A will connect to two different query frontend instances. This is a potential issue. A way to solve it would be to use a stateless protocol such as gRPC, but that would require drastic changes to Clipper's architecture.
You can find my code on GitHub, with additional code samples and information about my infrastructure setup.