Serving ML models at scale using MLflow on Kubernetes

Part 2 — How to serve a model as an API on Kubernetes?

Kais LARIBI
Artefact Engineering and Data Science

--

Source: unsplash.com — @fatosi

TLDR

This article is the second part of a series in which we go through the process of logging models with MLflow, serving them as an API endpoint, and finally scaling them according to our application's needs. We encourage you to read the previous article, in which we show how to deploy a tracking instance on k8s, and to check the hands-on prerequisites (secrets, environment variables…), as we will continue to build upon them here.
In the following, we show how to serve a machine learning model that is already registered in MLflow and expose it as an API endpoint on k8s.

Introduction

Tracking and optimizing model performance is clearly an important part of building ML models. Once that is done, the next challenge is to integrate the models into an application or a product so that their predictions can actually be used. This is what we call model serving, or inference. There are different frameworks and techniques for doing this; here we focus on MLflow and show how efficient and straightforward it can be.
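To give a first taste of how little code this requires, here is a minimal sketch of loading a registered model and running inference in Python. It assumes a model is already registered in the MLflow Model Registry; the tracking URL, model name, stage, and feature names below are placeholders, not values from this series.

```python
import mlflow
import mlflow.pyfunc
import pandas as pd

# Point the client at the tracking/registry server deployed in part 1.
# The URL is a placeholder; use your own instance (or set MLFLOW_TRACKING_URI).
mlflow.set_tracking_uri("http://mlflow.example.com")

# Load a model from the Model Registry by name and stage.
# "wine-quality" and "Production" are placeholder values.
model = mlflow.pyfunc.load_model("models:/wine-quality/Production")

# Run inference on a pandas DataFrame matching the model's input schema.
sample = pd.DataFrame([{"fixed_acidity": 7.4, "alcohol": 9.4}])
print(model.predict(sample))
```

Serving takes this one step further: instead of loading the model in our own process, we wrap it in an HTTP service that any application can call.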

Build and deploy the serving image
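The overall idea is simple: MLflow can package a registered model into a self-contained Docker image that serves it over HTTP, and that image is what we deploy on k8s. Below is a minimal sketch, assuming a recent MLflow version that exposes mlflow.models.build_docker (the CLI equivalent is mlflow models build-docker); the tracking URL, model name, and image name are placeholders.

```python
import mlflow
from mlflow.models import build_docker

# Same tracking/registry server as before (placeholder URL).
mlflow.set_tracking_uri("http://mlflow.example.com")

# Package the registered model into a Docker image that serves it over HTTP.
# The model URI and image name below are placeholders.
build_docker(
    model_uri="models:/wine-quality/Production",
    name="registry.example.com/ml/wine-quality-serving",
)
```

Once built, the image is pushed to a container registry and referenced from a k8s Deployment, with a Service (and optionally an Ingress) exposing the scoring endpoint (POST /invocations) to the rest of the application.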
