Want to use PyCaret in production? Deploy an MLflow Tracking server with Helm
PyCaret is a low-code machine learning library in Python that lets you go from preparing your data to deploying your model seamlessly. Perhaps its most useful production-grade feature is its built-in integration with MLflow Tracking, an API and UI for logging parameters, code versions, metrics, and output files when running machine learning code, and for keeping multiple experiments well organized, comparing metrics, and visualizing results. To use MLflow, simply pass the logging parameters into PyCaret's setup function like this:
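The setup call being referred to here is the one that reappears in full at the end of the post; as a sketch, assuming your dataset is already loaded into a DataFrame called `data`:

```python
# Assumes `data` is a pandas DataFrame you have already loaded.
from pycaret.anomaly import *

# log_experiment/log_plots turn on MLflow logging;
# experiment_name groups all runs under one MLflow experiment.
ano1 = setup(data, session_id=123, log_experiment=True,
             experiment_name='my_medium_experiment', log_plots=True)
```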
With one single line, we have instantiated the anomaly module, named all the experiments my_medium_experiment, and logged relevant metadata like hyperparameters, metrics and even plots into MLflow Tracking. To visualize all the metadata from our experiments, all we have to do is open a terminal and run the following (or prepend an exclamation mark from a Jupyter notebook):
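The command in question is MLflow's local UI server, which serves on port 5000 by default (matching the URL in the next step):

```shell
# Launch the MLflow tracking UI locally on the default port 5000.
# From a Jupyter notebook, prepend an exclamation mark: !mlflow ui
mlflow ui
```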
After doing so, you can go to http://localhost:5000/#/ to see the MLflow UI.
This is already pretty cool, but if your team is larger than just yourself, each machine learning engineer would have their own personal MLflow server on their local machine, which is far from ideal for sharing artifacts, competing for the best model, or simply keeping track of a production model for explainability purposes. So what do we need? An MLflow Tracking server.
MLflow Tracking server on EKS using PSQL and MinIO Helm charts
There are many ways to spin up an MLflow Tracking server. One could imagine doing it via Docker on an EC2 instance, right? It turns out the better answer is Kubernetes. As a prerequisite, you need to have the Helm CLI installed and initialized. Please find the instructions here in case you haven't installed the Helm client yet.
Another prerequisite is having a Kubernetes cluster (e.g. creating an Amazon EKS cluster; it could just as well be from any other cloud or even on-prem).
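If you go the EKS route, a cluster can be created with eksctl; a minimal sketch, where the cluster name, region, and node count are placeholders rather than values from this article:

```shell
# Create a small EKS cluster; adjust name, region and node count to taste.
eksctl create cluster --name mlflow-demo --region us-east-1 --nodes 2
```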
After that, you need to install a couple of Kubernetes-ready apps using Helm on your Kubernetes cluster. First we start with PostgreSQL:
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install postgres -n dev --set postgresqlPassword=<your_password_here>,postgresqlDatabase=<your_database> \
bitnami/postgresql
With two simple lines of code we now have a PostgreSQL database running in our Kubernetes cluster. If you need to create a temporary PostgreSQL client, you can do so with the following commands:
export POSTGRES_PASSWORD=$(kubectl get secret --namespace dev postgres-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode)
This will create an environment variable called POSTGRES_PASSWORD which you can conveniently use in the following command:
kubectl run postgres-postgresql-client --rm --tty -i --restart='Never' --namespace dev --image docker.io/bitnami/postgresql:11.9.0-debian-10-r1 --env="PGPASSWORD=$POSTGRES_PASSWORD" --command -- psql --host postgres-postgresql -U postgres -d <your_database> -p 5432
Next is the MinIO Helm chart. This is a pretty convenient tool which will be used in this tutorial as a gateway into AWS S3 to store our artifacts like deployed models and plots. First add the repo to Helm like this:
helm repo add minio https://helm.min.io/
Then install like this:
helm install --namespace <your_namespace> minio2 minio/minio -f values.yaml
The values.yaml file I used is available here.
Finally, putting it all together is the MLflow Tracking Helm chart.
helm repo add oswaldo https://papagala.github.io/charts/
Then install with the following command (assuming the chart in that repo is named mlflow):
helm install --namespace dev mlflow oswaldo/mlflow -f values.yaml
The values.yaml file I used is available here.
Finally, we can run PyCaret's anomaly example with a small modification to point it at the MLflow and MinIO servers:
import mlflow
import os

# You can set your tracking server URI programmatically:
mlflow.set_tracking_uri('<your_mlflow_tracking_url>')
os.environ['MLFLOW_S3_ENDPOINT_URL'] = '<your_minio_url>'

# The following line worked for setting your user on Ubuntu.
# It will otherwise be picked up from the running OS.
os.environ['LOGNAME'] = 'oswaldo'
Also, make sure you call the setup function with experiment and plot logging set to True.
from pycaret.anomaly import *
ano1 = setup(data, session_id=123, log_experiment=True, experiment_name='my_medium_experiment', log_plots=True)
Note: You need to set your AWS credentials as environment variables from your running notebook (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, using your MinIO username and password as their values). You never need a real pair of AWS access tokens, since the MinIO gateway relies on IAM Roles for Kubernetes Service Accounts to get access to S3.
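In practice that means exporting your MinIO credentials under the AWS variable names before logging any artifacts; a minimal sketch, where the values shown are placeholders for your own MinIO username and password:

```python
import os

# These are NOT real AWS tokens: the S3 client used by MLflow reads them
# and sends them to the MinIO endpoint configured via MLFLOW_S3_ENDPOINT_URL.
os.environ['AWS_ACCESS_KEY_ID'] = '<your_minio_user>'
os.environ['AWS_SECRET_ACCESS_KEY'] = '<your_minio_password>'
```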
That’s it! You now have a centralized place to store all your ML relevant metadata, compare them against your peers and even store your plots on S3 seamlessly. The best thing is that this can all be accomplished behind a VPN and in non-public IP range on your virtual private cloud, allowing people across teams to access relevant metadata without compromising security.
I hope you found this as valuable as I did. Let me know how your deployment goes, and happy learning!