Serving ML models at scale using MLflow on Kubernetes

Part 1: How to deploy an MLflow tracking instance on Kubernetes

Kais LARIBI
Artefact Engineering and Data Science

--

Source: unsplash.com — @ikukevk

TL;DR

MLflow is a commonly used tool for machine learning experiment tracking, model versioning, and serving. In this first article of the “Serving ML models at scale” series, we explain how to deploy the tracking instance on Kubernetes and use it to log experiments and store models.

Introduction

MLflow is a widely used tool in the data science/ML community for tracking experiments and managing machine learning models at different stages of their life cycle. With it, we can store metrics, models, and artifacts, making it easy to compare model performance and manage model versions. In addition, MLflow provides a module to serve models as an API endpoint, which facilitates their integration into any product or web app.
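
To make this concrete, here is a minimal sketch of logging a run to a tracking instance with the MLflow Python API. The tracking URI and experiment name below are placeholders for illustration; point them at your own server:

    import mlflow

    # Placeholder URI: replace with the address of your tracking instance
    mlflow.set_tracking_uri("http://localhost:5000")
    mlflow.set_experiment("demo-experiment")

    with mlflow.start_run():
        # Log a hyperparameter and a metric to compare runs in the UI
        mlflow.log_param("n_estimators", 100)
        mlflow.log_metric("rmse", 0.42)
        # Log an arbitrary file as an artifact of the run
        with open("notes.txt", "w") as f:
            f.write("baseline run")
        mlflow.log_artifact("notes.txt")

Models logged in the same way (for example with mlflow.sklearn.log_model) can later be served as a REST endpoint, which is what the next parts of this series build on.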

That being said, using machine learning in online products is appealing, but depending on the model’s size, nature (classical ML, deep learning, …), and load (user requests), it can be challenging to size the required resources and guarantee a reasonable response time. Therefore, a scalable infrastructure such as a Kubernetes cluster is key to maintaining service availability and performance during the inference phase.
