MLOps with Databricks Roadmap & Course Announcement
As experienced Databricks users, we have prepared a curated list of resources to help you navigate different aspects of MLOps, with a particular focus on Databricks. The resources are grouped by topic for easy reference. Whether you’re just getting started with Databricks or looking for advanced features for your MLOps implementation, this guide should serve as a handy reference.
1. MLOps principles and components
Understanding the core principles and components of MLOps is a must for successfully deploying and maintaining machine learning projects. Below are some essential resources to get you started.
Resources by Marvelous MLOps:
- MLOps maturity assessment — a checklist for ML models before they go to production
- The minimum set of must-haves for MLOps — understand the minimal MLOps setup needed to deploy ML models to production
- The Ultimate Must-Haves and Nice-to-Haves for MLOps & LLMOps — extended version of MLOps Toolbelt with additional LLM components
- MLOps Roadmap — A comprehensive guide that takes you from beginner to expert in MLOps, covering everything from foundational machine learning principles and programming skills to advanced operational practices.
Other recommended resources:
- MLOps Org by INNOQ — Collection of a wide range of articles. Perfect for staying updated on the latest in MLOps
- 5 Levels of MLOps maturity — Breaks the journey of MLOps maturity into five levels, helping you understand where you stand and what’s needed to advance.
Our MLOps Maturity Assessment is created with inspiration from the approaches developed by both Google and Microsoft.
2. Developing on Databricks
Resources by Marvelous MLOps:
- Handy Databricks Features for Development — start developing your ML code like a pro already on day one with these cool features from Databricks.
- Bridging the Gap: Converting Data Science Notebooks into Production ML Code — Notebooks are not designed for production deployment and can be difficult to maintain. This article shows you how to convert your notebook into production-ready code.
Other recommended resources:
- Developing Production Databricks Pipelines
- Advancing Spark — Local Development with Databricks — A nice demo for learning Databricks Connect (also mentioned in our article).
- Bridging the Production Gap: Develop and Deploy Code Easily — A nice demo from Databricks to increase productivity by integrating IDEs with Databricks, utilizing tools like code linters, AI assistants, and CI/CD integrations.
3. Databricks asset bundles
Databricks Asset Bundles enable the adoption of software engineering best practices like source control, code review, and CI/CD for data and AI projects. They describe Databricks resources as source files, which streamlines project structure, testing, and deployment and makes collaboration easier. They are also a great fit for ML model deployments.
Resources by Marvelous MLOps:
- Getting started with Databricks Asset Bundles — Easy way to deploy Databricks workflows and manage dependencies
- Dealing with private packages — Different ways to manage private packages using asset bundles
Other recommended resources:
4. MLflow experiment tracking & registering models in Unity Catalog
MLflow is one of the most popular tools for model registry and experiment tracking. As an open-source platform, it integrates easily with different tools and platforms. We highly recommend learning and practicing with MLflow to gain hands-on experience in model tracking.
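To make the tracking idea concrete, here is a minimal sketch, not a recommended project layout. `log_run` uses the real MLflow calls `mlflow.start_run`, `mlflow.log_params`, and `mlflow.log_metrics`; the helper `uc_model_name` and every example value are hypothetical names introduced only for illustration. Unity Catalog identifies registered models by a three-level name (catalog, schema, model), which the helper assembles. The registration step itself is left out.

```python
from typing import Mapping


def uc_model_name(catalog: str, schema: str, model: str) -> str:
    """Build the three-level name (catalog.schema.model) used by Unity Catalog.

    Hypothetical helper for illustration; it is not part of MLflow.
    """
    parts = (catalog, schema, model)
    if any(not p or "." in p for p in parts):
        raise ValueError("each name part must be non-empty and must not contain a dot")
    return ".".join(parts)


def log_run(params: Mapping[str, object], metrics: Mapping[str, float]) -> str:
    """Log one run's parameters and metrics to MLflow and return its run ID."""
    # Imported lazily so uc_model_name can be used without MLflow installed.
    import mlflow

    with mlflow.start_run() as run:
        mlflow.log_params(dict(params))    # e.g. hyperparameters
        mlflow.log_metrics(dict(metrics))  # e.g. evaluation scores
    return run.info.run_id
```

For example, `uc_model_name("dev", "ml", "churn")` returns `"dev.ml.churn"`. Calling `log_run` requires MLflow to be installed and a tracking location to be available (a Databricks workspace or a local `mlruns` directory).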
Resources by Marvelous MLOps:
- Find your way to MLflow without confusion: MLflow has extensive support and a lot of options. This beginner-friendly article focuses on the fundamentals to help you get started.
- Lessons learned from migrating models to Unity Catalog: Changes introduced by Unity Catalog, plus some tips.
- MLflow Cheatsheets: Handy reference sheets to quickly find key information on MLflow. Great for quick consultations during development.
Other recommended resources:
- What Exactly Is a Model in MLflow: Clarifies the concept of a model within the MLflow ecosystem, helping you understand its various components.
- Algorithm-Agnostic Model Building with Mlflow
- Dive into Databricks Model Deployment -1: A nice article explaining a cyclical six-step process that guides the development, deployment, and ongoing maintenance of models to ensure their effectiveness.
5. Model serving architectures
Resources by Marvelous MLOps:
- Model Serving Architectures on Databricks: An overview of the various model serving architectures available in Databricks. A good starting point for understanding your options and choosing what’s best for your use case.
- Getting Started with Databricks Feature Store: Focuses on the Databricks Feature Store, explaining how to use it for model serving.
- Going Serverless with Databricks — Part 1, Going Serverless with Databricks — Part 2: An introduction to the serverless model endpoints in Databricks, highlighting the benefits and how to get started. Includes examples of custom model deployment. Perfect for those exploring serverless architectures.
Other resources:
6. Inference tables and lakehouse monitoring
- Drifting Away: Testing ML Models in Production | Databricks Lakehouse Monitoring
- Learn How to Reliably Monitor Your Data and Model Quality: Shows how to monitor and manage ML models to maintain their performance.
Resources covering end-to-end MLOps on Databricks
Few resources cover end-to-end MLOps on Databricks. Here are some we recommend:
- MLOps Gym series on Databricks Community Technical blog
- Big Book of MLOps by Databricks: A comprehensive guide that covers everything from the basics of MLOps to advanced concepts, tailored for the Databricks environment.
Course announcement
Courses on the topic (except private training) barely exist. Materials provided by Databricks Academy only cover the basics and are notebook-heavy. This is what inspired us to start writing about Databricks in the first place.
Now we are proud to announce that we have created our End-to-end MLOps with Databricks course, packed with condensed knowledge from many years of experience with MLOps and Databricks.
Enroll now and use code MARVELOUS for a 10% discount.