Want to use PyCaret in production? Deploy an MLflow Tracking server with Helm

Oswaldo Gomez
5 min read · Sep 30, 2020

PyCaret is a low-code machine learning library in Python that lets you go from preparing your data to deploying your model seamlessly. Perhaps its most useful production-grade feature is its built-in integration with MLflow Tracking, an API and UI for logging parameters, code versions, metrics, and output files when running machine learning code. It also keeps multiple experiments well organized, making it easy to compare metrics and visualize results. To use MLflow, simply pass these parameters into PyCaret’s setup function like this:

Setup function of PyCaret’s anomaly module, with log_experiment set to True, the experiment named, and plots stored via log_plots=True.
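The call shown in the screenshot is the same one used at the end of this article; it is reproduced here for readability (here `data` is assumed to be a pandas DataFrame you have already loaded):

```python
from pycaret.anomaly import *

# Initialize the anomaly module with MLflow experiment and plot logging enabled
ano1 = setup(data, session_id=123, log_experiment=True,
             experiment_name='my_medium_experiment', log_plots=True)
```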

With a single line, we have initialized the anomaly module, named the experiment my_medium_experiment, and logged relevant metadata such as hyperparameters, metrics, and even plots into MLflow Tracking. To visualize all the metadata from our experiments, all we have to do is open a terminal and run the MLflow UI command (or prepend an exclamation mark to run it from a Jupyter notebook):
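The command itself (shown as a screenshot in the original post) is the standard MLflow UI launcher, which serves the local ./mlruns directory on port 5000:

```shell
# Launch the MLflow Tracking UI against the local ./mlruns folder
mlflow ui
```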

After doing so, you can go to http://localhost:5000/#/ and you will see this nice UI:

An empty MLflow UI running in a localhost.

This is already pretty cool, but if your team is larger than just yourself, each machine learning engineer would end up with their own personal MLflow server on their local computer, which is far from ideal for sharing artifacts, competing for the best model, or simply keeping track of a production model for explainability purposes. So what do we need? An MLflow Tracking Server.

MLflow Tracking server on EKS using PSQL and MinIO Helm charts

There are many ways to spin up an MLflow Tracking server. One could imagine doing it via Docker on an EC2 instance, right? Well, the image below says it all, and it turns out the answer is that we need Kubernetes. As a prerequisite, you need to have the Helm CLI installed and initialized. You can find the instructions here in case you haven’t installed the Helm client yet.

Taken from “What Is Kubernetes: a Brief Guide”. (2020). Retrieved 9 September 2020, from https://www.sam-solutions.com/blog/what-is-Kubernetes/
For more information visit https://www.eksworkshop.com/010_introduction/eks/

Another prerequisite is having a Kubernetes cluster (e.g. by creating an Amazon EKS cluster; it could also come from any other cloud provider or even run on premises).
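As one example, if you use eksctl, a minimal cluster can be created with a single command. The cluster name and region below are placeholders; adjust them to your account:

```shell
# Create a small EKS cluster (hypothetical name/region; adjust to your account)
eksctl create cluster --name mlflow-demo --region us-east-1 --nodes 2
```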

After that, you need to install a couple of Kubernetes-ready apps on your cluster using Helm. First, we start with PostgreSQL:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install postgres -n dev \
  --set postgresqlPassword=<your_password_here>,postgresqlDatabase=<your_database> \
  bitnami/postgresql

With two simple lines of code, we now have a PostgreSQL database running in our Kubernetes cluster. If you need to spin up a temporary PSQL client, you can do so with the following commands:

export POSTGRES_PASSWORD=$(kubectl get secret --namespace dev postgres-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode)

This will create an environment variable called POSTGRES_PASSWORD which you can conveniently use in the following command:

kubectl run postgres-postgresql-client --rm --tty -i --restart='Never' --namespace dev --image docker.io/bitnami/postgresql:11.9.0-debian-10-r1 --env="PGPASSWORD=$POSTGRES_PASSWORD" --command -- psql --host postgres-postgresql -U postgres -d <your_database> -p 5432
For more information visit https://hub.helm.sh/charts/minio/minio

Next up is the MinIO Helm chart. This is a pretty convenient tool which will be used in this tutorial as a gateway into AWS S3 to store our artifacts like deployed models and plots. First, add the repo to Helm like this:

helm repo add minio https://helm.min.io/

Then install like this:

helm install --namespace <your_namespace> minio2 minio/minio -f values.yaml

The values.yaml file I used is available here.
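I can’t reproduce the exact file here, but with the legacy helm.min.io chart, a minimal values.yaml typically sets the access credentials, persistence, and service type, along these lines (key names follow that chart’s README; verify them against the chart version you install):

```yaml
# Minimal sketch for the legacy helm.min.io chart; verify keys against the chart's README
accessKey: "<your_minio_access_key>"
secretKey: "<your_minio_secret_key>"
persistence:
  enabled: true
  size: 10Gi
service:
  type: ClusterIP
```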

For more information visit https://www.mlflow.org/docs/latest/tracking.html

Finally, putting it all together is the MLflow Tracking Helm chart. Add the repo like this:

helm repo add oswaldo https://papagala.github.io/charts/

Then install it with the following command:

helm install --namespace dev mlflow oswaldo/mlflow -f values.yaml

The values.yaml file I used is available here.
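To check that all three releases came up, and to reach the tracking server before wiring up DNS or an ingress, you can list the pods and port-forward to the MLflow service. The service name below is a guess; run kubectl get svc -n dev to find the name your release actually created:

```shell
# List the pods created by the three Helm releases
kubectl get pods -n dev

# Forward the MLflow service to localhost:5000 (service name is hypothetical)
kubectl port-forward -n dev svc/mlflow 5000:5000
```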

Finally, we can run PyCaret’s anomaly example with a small modification to point it at the MLflow and MinIO servers:

import mlflow
import os

# You can set your tracking server URI programmatically:
mlflow.set_tracking_uri('<your_mlflow_tracking_url>')
os.environ['MLFLOW_S3_ENDPOINT_URL'] = '<your_minio_url>'

# The following line worked for setting the user on Ubuntu;
# otherwise it is picked up from the running OS.
os.environ['LOGNAME'] = 'oswaldo'

Also, make sure you call the setup function with experiment and plot logging set to True:

from pycaret.anomaly import *

ano1 = setup(data, session_id=123, log_experiment=True,
             experiment_name='my_medium_experiment', log_plots=True)

Note: You need to set your AWS credentials as environment variables in your running notebook (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, using your MinIO username and password as their values). You never need a real pair of AWS access tokens, since access to S3 is handled by IAM Roles for Kubernetes Service Accounts.
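Concretely, before calling setup you can export the two variables from the notebook itself. The values below are placeholders for your MinIO username and password:

```python
import os

# Placeholders: use your MinIO username/password here, not real AWS tokens
os.environ['AWS_ACCESS_KEY_ID'] = '<your_minio_username>'
os.environ['AWS_SECRET_ACCESS_KEY'] = '<your_minio_password>'
```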

That’s it! You now have a centralized place to store all your ML-relevant metadata, compare results against your peers, and even store your plots on S3 seamlessly. Best of all, this can all run behind a VPN in a non-public IP range on your virtual private cloud, allowing people across teams to access relevant metadata without compromising security.

MLflow Tracking UI
Artifacts stored on S3 buckets displayed on MLflow UI

I hope you found this as valuable as I did. Let me know how your deployment goes, and happy learning!

