MLOps with Databricks: (I) process flow design

Philippe de Meulenaer, PhD
Apr 5, 2022 · 7 min read

In this blog post series, I plan to navigate through the main principles of MLOps, the process of developing, deploying, and monitoring machine learning models, using the Databricks platform. In this first post, the goal is to define the process flow: identify the different ingredients to incorporate and how to logically line them up when designing a deployment process.

In the second blog post, I plan to flesh out this process flow with a CI/CD pipeline inspired by DatabricksLabs, taking a simple batch use case as an example.

1. What ingredients for a mature MLOps setup?

This question was brilliantly addressed in the well-known Google blog post MLOps: Continuous delivery and automation pipelines in machine learning (March 2020). That post identifies three levels of maturity (from “level 0” to “level 2”) regularly observed in MLOps setups across many companies. Below is a (slightly adapted) view of maturity “level 2”:

Adaptation of the MLOps maturity “level 2” from Google

Without repeating the Google blog post, the most important aspects of this process flow are:

  • the Feature Store: a version-controlled model should be built on version-controlled data/features (see the sketch below). The feature store also makes it easy to share the features created by different data scientists and…
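To make this concrete, here is a minimal sketch of how features might be registered with the Databricks Feature Store client. The table and column names are hypothetical, and the snippet assumes a Databricks ML runtime where the `databricks.feature_store` package and the notebook’s `spark` session are available:

```python
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Compute the features as a Spark DataFrame; `spark` is the SparkSession
# that Databricks notebooks provide out of the box. The `orders` table
# and its columns are hypothetical.
customer_features = spark.sql("""
    SELECT customer_id,
           COUNT(*)          AS n_orders_90d,
           AVG(order_amount) AS avg_order_amount_90d
    FROM   orders
    WHERE  order_date >= date_sub(current_date(), 90)
    GROUP BY customer_id
""")

# Register the feature table once; later runs can refresh it with
# fs.write_table(name=..., df=..., mode="merge").
fs.create_table(
    name="feature_store.customer_features",  # hypothetical database.table name
    primary_keys=["customer_id"],
    df=customer_features,
    description="90-day customer order aggregates",
)
```

Once a feature table is registered like this, other data scientists can discover it in the Feature Store UI and reuse the same versioned features in their own training pipelines.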


Philippe de Meulenaer, PhD

I work as a freelance MLOps consultant and am particularly interested in cloud technologies for machine learning deployments.