MLOps with Databricks: (I) process flow design
In this blog post series, I plan to walk through the main principles of MLOps — the process of developing, deploying and monitoring machine learning models — using the Databricks platform. In this first post, the idea is to define the process flow, i.e. to identify the different ingredients to incorporate and how to logically line them up when designing a deployment process.
In the second blog post, I plan to flesh out this process flow with a CI/CD pipeline inspired by DatabricksLabs, using a simple batch use case as an example.
1. What ingredients for a mature MLOps setup?
This question was brilliantly addressed in the famous Google blog post MLOps: Continuous delivery and automation pipelines in machine learning (March 2020). That blog post identifies three levels of maturity (from “level 0” to “level 2”) regularly observed in MLOps setups across many companies. Hereafter is a (slightly adapted) view of maturity “level 2”:
Without repeating the Google blog post, the most important aspects of this process flow are:
- the Feature Store: a version-controlled model should be based on version-controlled data/features. The feature store also makes it easy to share the features created by different data scientists and…