Polyaxon, Argo and Seldon for Model Training, Packaging and Deployment in Kubernetes

The ultimate combination of open-source frameworks for model management in Kubernetes?

Daniel Rodriguez
Analytics Vidhya
15 min read · Oct 16, 2018


In its simplest form, model management can be seen as training one machine learning model, then repeating this tens, hundreds, or thousands of times with different data, parameters, features and algorithms to finally deploy the “best” one. A more complete definition would be that model management involves developing tooling and pipelines for data scientists to develop, deploy, measure, improve and iterate, so they can continue making better models not only for one particular problem but for a wider range of datasets and algorithms.

At the same time, model management includes all the requirements of more traditional applications, such as API development and versioning, package management, containerization, reproducibility, scaling, monitoring, logging and more.

The objective of this article is to propose a repeatable pipeline to make your life easier and make iterations faster. I think this quote captures the spirit pretty well.

We don’t deploy one model, we deploy the process for repeatedly making more. When you deploy a ML model into production you are not saying “this is the best model and we should use it forever”. What it actually means is deploying the pipeline for model building and making it more repeatable.

Juliet Hougland

How Did We Get Here

This year we have seen the rise of many machine learning platforms. Just at Strata, we have seen the tide change from data storage solutions, to SQL engines for the data in those storage solutions, to Spark, to data science platforms, to the now-popular machine learning platforms.

In the same way that Google released the MapReduce paper and others for the rest of the world to follow in the years after with the Hadoop ecosystem, tooling from Google and a few other big tech companies has come out to solve the machine learning problem. I connect this in a way with Kubernetes, which was so young two years ago and has become a key part of every cloud provider's offering right now. At this point, it would be unwise not to bet on the CNCF stack. These projects also include TensorFlow and, more recently, KubeFlow, which provides more guidance on a combination of tools.

An ML model has a lot of different requirements: for development/training you need GPUs, and the packaging is more complicated than just a JAR file since there is no single language you can use for everything; you need Python and R, with other parts written in C and C++. Applications went from tens of MB to hundreds of MB, since models carry a lot of data inside them. Endpoints went from being basically database operations that took a couple of milliseconds to smarter operations that make predictions but take longer to execute and require more CPU and RAM.

At the same time, the traditional requirements of logs, monitoring, security, scalability, and other things that more traditional applications have are also needed for these new types of applications. If you did A/B testing for sections on a website, you will now do A/B testing for all your ML models to see which one is performing better. If you scaled a Node web server, you now need to scale a TensorFlow Serving server, and so on. At the same time, development of ML models is much more complex and takes more time, since it requires testing combinations of algorithms, features and more.

You can get much more value from ML than from traditional applications, but the investment you need to make is huge in many areas.

This Experiment

This article explores the combination of a couple of new technologies for model management to provide a pipeline that solves three primary groups of problems:

  1. Distributed hyper-parameter training, which could also be used for actual distributed training: Polyaxon
  2. A container image building pipeline that uses s2i: Argo
  3. Deployment of a model that can handle single or more complex deployments: Seldon

The final output is an ML pipeline that trains multiple models, explores the metrics to (manually) pick the best one, packages the model as a Docker image, and deploys it as a REST API.

Workflow diagram

All the code needed to follow along can be found here: danielfrg/polyaxon-argo-seldon-example. Locally you won’t need much more than a couple of client CLIs and to clone a couple of repos.

Infrastructure and installation

This section is a small reference to each project’s documentation, so be sure to read the docs if something here doesn’t work or becomes outdated.

The next few sections will walk through the installation and configuration of five components that we’ll use to build a model deployment pipeline:

  1. Kubernetes cluster,
  2. NFS for persistent storage,
  3. Polyaxon for distributed model training,
  4. Argo to build a model containerization workflow, and
  5. Seldon for model deployment.

Once we have installed and configured each of these components, we’ll train, build, and deploy a model starting in Section “Polyaxon: Training models”. So just go there if you want to skip all the installation bits.

Kubernetes cluster

I used GKE but it could be any Kubernetes cluster; either use the GCP console or a command like this one:
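A minimal sketch of that command (the cluster name, zone, node count and machine type here are placeholders):

```bash
# Create a small GKE cluster; adjust name, zone and sizes to your needs
gcloud container clusters create model-mgmt \
    --zone us-central1-a \
    --num-nodes 3 \
    --machine-type n1-standard-4
```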

Configure your local kubectl:
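Assuming the cluster name and zone from the previous step:

```bash
# Fetch credentials so kubectl talks to the new cluster
gcloud container clusters get-credentials model-mgmt --zone us-central1-a
```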

NFS: Single Node Filer

This is where all the code, models and data are saved. It’s super easy to create one using this GCP single-node file server template.

NFS Server template

We need to create a couple of directories in the NFS server, so SSH into the node by copying the command available in the post-install screen or just clicking the “SSH to …” button.

SSH into the NFS Server

Once in the instance, create some directory structure for Polyaxon, Jupyter Lab and, later, Argo.
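Something along these lines; the exact directory names are assumptions and must match the PVC definitions used below:

```bash
# One directory per PVC that Polyaxon and friends will mount
sudo mkdir -p /data/polyaxon/{data,outputs,logs,repos,upload}
# A separate directory for the Jupyter Lab server we deploy later
sudo mkdir -p /data/notebooks
```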

Get the (private) IP of the NFS server either with the command below or just by searching for it in the VM list in the Google Cloud console. In my case it is 10.240.0.8.
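For example:

```bash
# The INTERNAL_IP column shows the private address of the NFS server
gcloud compute instances list
```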

Find NFS Server IP

Finally, create some PVCs for Polyaxon and the other tools to use. Note that you need to edit the *-pvc.yml files and add the correct IP address:
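A sketch of what one of those files might contain: an NFS-backed PersistentVolume plus the claim Polyaxon will use (names, sizes and paths are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: polyaxon-pv-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.240.0.8     # the NFS server's private IP from the previous step
    path: /data/polyaxon/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: polyaxon-pvc-data
  namespace: polyaxon
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: ""     # bind to the pre-created PV, not dynamic provisioning
```

Then create the namespace and apply each file, repeating for the outputs, logs, repos and upload claims: kubectl create namespace polyaxon and kubectl apply -f data-pvc.yml.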

Installing Polyaxon

With the PVCs already created, it’s relatively easy to install Polyaxon based on the docs. First, some permissions for the tiller (Helm server) service account:
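This is the standard Helm 2 Tiller setup; the cluster-admin binding is fine for a throwaway cluster but too broad for production:

```bash
kubectl --namespace kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller \
    --clusterrole cluster-admin \
    --serviceaccount=kube-system:tiller
helm init --service-account tiller
```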

Now we can start Polyaxon using Helm; the only extra thing we need is a polyaxon-config.yml config file, and then we run Helm:
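A sketch assuming the Polyaxon 0.x chart; the exact configuration keys may differ between chart versions, and the claim names must match the PVCs created above:

```bash
cat > polyaxon-config.yml <<'EOF'
rbac:
  enabled: true

serviceType: LoadBalancer

persistence:
  logs:
    existingClaim: polyaxon-pvc-logs
  repos:
    existingClaim: polyaxon-pvc-repos
  upload:
    existingClaim: polyaxon-pvc-upload
  data:
    data1:
      existingClaim: polyaxon-pvc-data
      mountPath: /data
  outputs:
    outputs1:
      existingClaim: polyaxon-pvc-outputs
      mountPath: /outputs
EOF

helm repo add polyaxon https://charts.polyaxon.com
helm repo update
helm install polyaxon/polyaxon --name=polyaxon --namespace=polyaxon \
    -f polyaxon-config.yml
```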

When the command finishes you will get something like this:

So execute those instructions and log in using the polyaxon-cli. The default username:password pair is root:rootpassword:
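Something like this, with the IP taken from the Helm output (flag names from the polyaxon-cli of that era):

```bash
polyaxon config set --host=<POLYAXON_IP> --http_port=80 --ws_port=1337
polyaxon login --username=root --password=rootpassword
```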

You can also visit the URL that is printed to visit the Polyaxon UI.

Polyaxon Projects

Installing Argo

The full docs are here (the permissions section is important); basically:
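A sketch of the install; the version is pinned only for illustration, and the role binding gives workflows in the default namespace enough permissions to run:

```bash
# Install the Argo workflow controller and UI
kubectl create namespace argo
kubectl apply -n argo -f \
    https://raw.githubusercontent.com/argoproj/argo/v2.2.1/manifests/install.yaml

# Let workflows run under the default service account in the default namespace
kubectl create rolebinding default-admin \
    --clusterrole=admin \
    --serviceaccount=default:default
```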

Now we can visit the Argo UI, which looks like this with a couple of workflows:

Argo workflows

Installing Seldon

There are multiple ways to install Seldon; I decided to use Helm because I honestly don’t fully understand Ksonnet.
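A sketch using the seldon-core 0.x charts; chart names and flags may differ between versions:

```bash
# Install the Seldon CRD and then the core components, with Ambassador enabled
helm install seldon-core-crd --name=seldon-core-crd \
    --repo https://storage.googleapis.com/seldon-charts
helm install seldon-core --name=seldon-core \
    --repo https://storage.googleapis.com/seldon-charts \
    --set ambassador.enabled=true
```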

Run this in another terminal to proxy the Ambassador service:
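For example (the label selector and ports are assumptions; adjust them to how Ambassador was deployed in your cluster):

```bash
kubectl port-forward \
    $(kubectl get pods -l service=ambassador -o jsonpath='{.items[0].metadata.name}') \
    8080:80
```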

We have finally installed all we need, let’s train and deploy some models!

Polyaxon: Training models

Polyaxon is a tool for reproducible machine learning. It allows you to push parameterized code, for example TensorFlow or PyTorch, for Polyaxon to run in what it calls an experiment. Experiments can be part of an experiment group for doing hyper-parameter search.

Polyaxon takes care of executing the jobs based on declarative definitions, in a similar way to Kubernetes, and it also takes care of saving the metrics and outputs of the jobs for analysis and selection. It has some features we are not going to use here, such as distributed training and Tensorboard integration.

Following the Polyaxon docs, we can create a new project based on the examples:
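For example:

```bash
polyaxon project create --name=mnist --description='MNIST on Polyaxon'
polyaxon init mnist
```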

I wanted to test the hyper-parameter search so the polyaxon file looks like this:
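A sketch of such a file, based on the Polyaxon 0.x schema; the image, parameter ranges and run command are illustrative, with 5 learning rates times 2 momentum values giving the 10 experiments mentioned below:

```yaml
version: 1
kind: group

hptuning:
  concurrency: 5
  matrix:
    lr:
      linspace: 0.001:0.1:5
    momentum:
      values: [0.5, 0.9]

build:
  image: pytorch/pytorch:latest
  build_steps:
    - pip install polyaxon-client

run:
  cmd: python run.py --lr={{ lr }} --momentum={{ momentum }}
```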

Now we can run the experiment:
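Assuming the file above is called polyaxonfile_hyperparams.yml:

```bash
polyaxon run -f polyaxonfile_hyperparams.yml
```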

Based on the parameter space, this command will create one experiment group with 10 experiments in it. You can see the progress, logs, parameters, environment and more in the Polyaxon UI.

Polyaxon Experiments

When the experiments are finished you’ll have 10 trained models, and you can use Polyaxon to view their metrics and pick the best-performing ones to deploy. Another option inside Polyaxon is to deploy a Tensorboard server to view the metrics there if you have saved the output in that format; here I just used the native Polyaxon metrics.

Polyaxon native metrics

You can take a look at and download the trained models by browsing the NFS server we launched before and going to the group and experiment directory, for example:

Polyaxon output

From Polyaxon to Argo

Now that we have trained and serialized models, we need to package and deploy them using Seldon. This requires some manual work: you need to create a Python class for Seldon to use, create a requirements.txt, and move the serialized model to the right location. Finally, we need to use s2i to create the image from the base Seldon image.

All of this can be done manually and locally by downloading the serialized model and using s2i, but in the spirit of automating things I decided to use Argo for this task.

I also wanted to keep most things in the Kubernetes cluster, where models, data and other things are close to each other, so I used a Jupyter Lab server, which you can get up and running with a Kubernetes YAML spec like this:
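A sketch of such a spec; the image, names and the claim it mounts are assumptions from the setup above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter-lab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jupyter-lab
  template:
    metadata:
      labels:
        app: jupyter-lab
    spec:
      containers:
        - name: jupyter
          image: jupyter/datascience-notebook:latest
          command: ["start.sh", "jupyter", "lab", "--LabApp.token=''"]
          ports:
            - containerPort: 8888
          volumeMounts:
            - name: outputs      # the Polyaxon outputs PVC, so the models are visible
              mountPath: /home/jovyan/outputs
      volumes:
        - name: outputs
          persistentVolumeClaim:
            claimName: polyaxon-pvc-outputs
```

You can reach it with a kubectl port-forward to port 8888 on the pod.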

This Jupyter Lab installation will have the right mounts for you to move the serialized model:
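For example, from a terminal inside Jupyter Lab (the paths here are hypothetical; use the group/experiment directory of your best experiment):

```bash
cp outputs/root/mnist/groups/<group>/<experiment>/model.dat ~/seldon-mnist/
```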

After that, create the files required for Seldon: the Python class for Seldon, the .s2i directory with the environment file inside, and the requirements.txt. All of this is available in the repo. In the end it should look similar to this:

Jupyter Lab with Seldon code
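In plain text, the layout is roughly this (file names are assumptions based on the example repo):

```
seldon-mnist/
├── .s2i/
│   └── environment    # MODEL_NAME, API_TYPE=REST, service type, etc.
├── MnistModel.py      # the Seldon model class shown below
├── model.dat          # the serialized PyTorch model copied from Polyaxon
└── requirements.txt   # torch, numpy, ...
```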

The actual Python class that Seldon uses is this:
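A sketch of it (class and file names are illustrative); Seldon’s s2i Python images expect a class that loads the model in __init__ and exposes a predict(X, feature_names) method:

```python
import torch

class MnistModel(object):
    def __init__(self):
        # Load the serialized PyTorch model that sits next to this file
        self.model = torch.load("model.dat")
        self.model.eval()

    def predict(self, X, feature_names):
        # X arrives as a numpy array; reshape the flat pixels into the
        # (batch, channels, height, width) tensor the network expects
        tensor = torch.from_numpy(X.reshape(-1, 1, 28, 28)).float()
        with torch.no_grad():
            out = self.model(tensor)
        return out.numpy()
```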

This basically loads the serialized model in the __init__ function to later use it in the predict function, where we have some simple PyTorch code to preprocess the inputs into what the model expects.

We now have everything we need to use Argo to package the model as a Docker image that Seldon can use.

Argo: Creating a docker image for the model

Argo is a workflow manager for Kubernetes. We will use Argo to build a reusable container-native workflow for taking the serialized model into a container that can be later deployed using Seldon.

To support this I created a simple Docker image that executes s2i and pushes an image; the Dockerfile is here and the Docker image is available as danielfrg/s2i.

Since we are going to push an image to Docker Hub, we first need to create a secret with the credentials to log in to the registry.
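For example (all values are placeholders):

```bash
kubectl create secret docker-registry docker-credentials \
    --docker-server=https://index.docker.io/v1/ \
    --docker-username=<username> \
    --docker-password=<password> \
    --docker-email=<email>
```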

With the image ready we can use Argo to manage the execution; the Argo pipeline mounts three things to the container (a sketch of the workflow spec follows the list):

  1. The Polyaxon volume to access the code we wrote in the previous section.
  2. The Docker socket to build and push the image
  3. The Docker credentials to push to the repository
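A simplified sketch of that workflow; the volume, secret and image names are assumptions carried over from the setup above:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: seldon-mnist-build-
spec:
  entrypoint: build
  volumes:
    - name: src                    # the volume with the model code from Jupyter Lab
      persistentVolumeClaim:
        claimName: polyaxon-pvc-outputs
    - name: docker-sock            # the node's Docker socket, to build and push
      hostPath:
        path: /var/run/docker.sock
    - name: docker-credentials     # the registry secret created above
      secret:
        secretName: docker-credentials
  templates:
    - name: build
      container:
        image: danielfrg/s2i:latest
        command: [sh, -c]
        args:
          - >-
            s2i build /src/seldon-mnist seldonio/seldon-core-s2i-python3
            danielfrg/seldon-mnist:0.2 &&
            docker push danielfrg/seldon-mnist:0.2
        volumeMounts:
          - name: src
            mountPath: /src
          - name: docker-sock
            mountPath: /var/run/docker.sock
          - name: docker-credentials
            mountPath: /root/.docker
```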

Then just execute the Argo pipeline:
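Assuming the spec above is saved as workflow.yaml:

```bash
argo submit workflow.yaml --watch
```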

The pipeline uses s2i with the base Seldon image seldonio/seldon-core-s2i-python3, builds an image tagged danielfrg/seldon-mnist:0.2, and pushes that new image to Docker Hub. Argo handles all the execution, and you can see logs and more in its UI:

Argo logs for one workflow

Now that we have an image in Docker Hub we can use Seldon to deploy the image.

Seldon: Model deployment

Seldon is a great framework for managing models in Kubernetes. Models become available as REST APIs or gRPC endpoints, and you can do fancy routing between models, including A/B testing and multi-armed bandits. Seldon takes care of scaling the models and keeping them running, exposing all of them behind a standard API.

Seldon uses its own Kubernetes CRD, and it will just use the Docker image that the Argo pipeline pushed; the Seldon deployment CRD spec looks like this:
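A sketch of the spec for the image built above, using the seldon-core 0.x CRD schema:

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mnist
spec:
  name: mnist
  predictors:
    - name: default
      replicas: 1
      componentSpecs:
        - spec:
            containers:
              - name: classifier
                image: danielfrg/seldon-mnist:0.2
      graph:
        name: classifier      # must match the container name above
        type: MODEL
        endpoint:
          type: REST
```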

This will create a couple of Kubernetes pods that run the model and handle the routing.

After all this work we can finally query the model!

Since we deployed a REST API, we can query the deployed model using a little bit of Python to read an image and make an HTTP request:
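A sketch of such a client, assuming the Ambassador port-forward from earlier on localhost:8080 and Seldon’s REST prediction API (the image path is a placeholder):

```python
import json

import numpy as np
import requests
from PIL import Image

# Load one 28x28 MNIST digit and flatten it to the 784 floats the model expects
img = np.array(Image.open("87.png")).reshape(1, 784) / 255.0

payload = {"data": {"ndarray": img.tolist()}}
resp = requests.post(
    "http://localhost:8080/seldon/mnist/api/v0.1/predictions",
    json=payload,
)
print(json.dumps(resp.json(), indent=2))
```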

The output will be a prediction for the image; the 87th image is a 9 and the prediction is indeed a 9.

Seldon has a lot of other features that are not explored here, check their website.

Thoughts

This looks really hard; there must be a better way! There probably are much better ways. Most companies with big data science teams have been building similar infrastructure, and some of it is available to use, for example:

  1. TFX from Google/TensorFlow, paper here
  2. KubeFlow, a collection of tools also from Google; it uses Seldon
  3. Flotilla from Stitch Fix. Also check out Juliet Hougland’s great talk on how Stitch Fix does production model deployment
  4. IBM/FfDL
  5. Uber Michelangelo
  6. Clipper, a lower-level tool for serving models
  7. Instacart Lore

There are also, of course, companies that have products you can buy around this:

  1. Anaconda Enterprise (disclaimer: this is where I work)
  2. Domino Data Lab
  3. Databricks
  4. all cloud providers
  5. and many more.

The options are endless depending on your use case, and you should pick and build on top of what fits. Kubernetes is a real platform in the sense that you can actually extend it as you need, and this is an example of that. There is a lot of missing functionality that you could add to make the ultimate model management platform: for example, monitoring, multiple users, authentication and security, audits, a catalog of models as Docker images, and a decision about central storage (should it be NFS, an object store, or something else?).

Each of these features will increase the cost significantly.

Jupyter Hub and Binder: this process integrates relatively well with some previous work I have posted here on Jupyter Hub in Kubernetes for a multi-user dev environment. Great multi-user collaboration is a key part of the process. Also don’t forget that the end result is usually not an API but some sort of application, dashboard or report, and deployment for those applications is also important.

Why not just use Argo to do the model training? You could, but I think Polyaxon is better for model training right now since that is all it does; Argo is more general in nature, and that’s great, but specialized tools are sometimes better. Argo’s architecture is more complex and extensible, so other tooling could be built on top of it; I imagine that will happen eventually.

What to do with this? Take it as a simple experiment to show what is possible today with open-source frameworks around model management on Kubernetes. Take this and adapt it to your needs. The objective is to make your life easier, make iterations faster and models better.

This pipeline is pretty useful, but it’s far from complete. It will save you some time right now, but some manual parts need to be automated. You will also need to work on integrating this (or anything else) into your existing workflow around Git and CI/CD (e.g. Jenkins).

Other links and things I used

  1. Polyaxon docs
  2. Polyaxon examples
  3. FfDL-Seldon/pytorch-model
  4. seldon-core
  5. Argo DinD
