Databricks Freaky Friday Pills #4: Model Governance

Gonzalo Zabala
SDG Group

--

Welcome back to our series of articles! We’re halfway through our journey of building a Machine Learning (ML) Solution with Databricks. Today, we’re gearing up to explore model governance within the Databricks platform. But before we dive into that, let’s quickly recap what we’ve covered up to this point.

By now, you should have a good grasp of Databricks’ overall setup. Working with the platform to manage assets like tables and schemas should feel pretty straightforward. Hopefully, you’re also feeling comfortable using Databricks notebooks for tasks like setting up workflows and running them smoothly. Remember the “expectations” framework we talked about? It helps ensure your data meets the quality standards you need (Databricks Freaky Friday Pills #3).

As you may remember from our previous article (Databricks Freaky Friday Pills #2), we already built a model to predict some key outcomes for our project. In the upcoming sections, we’ll walk you through the steps of getting your model ready, training it, testing it out, keeping an eye on its performance, and serving it to your audience. Ready to dive into Databricks’ Model Governance solutions? Let’s go!

Quick note: As you may have noticed, we have kept AutoML out of the scope of this article. We will dive into this framework in the following chapter. Stay tuned!

1. Introduction to model governance and model lifecycle

In this first section, we will introduce the basic model governance concepts: model, model version, experiment, and run.

Let’s start with a quick introduction to what we consider a model. A model is a software application designed to identify patterns or render decisions based on new data it hasn’t encountered before. Models are designed to solve a wide range of problems across many use cases. So, when we refer to a model, we’re talking about the problem being solved, while the concrete object that solves it is an artifact. ML artifacts are the outputs generated by ML pipelines that are essential for executing subsequent pipelines or ML applications. Common ML artifacts include models, features, training data, and inference data. Unlike the metadata of ML pipelines, ML artifacts are typically not stored in a database; they are stored and governed by a model registry.

Throughout its existence, a model undergoes periodic updates driven by various factors: data drift, the introduction of new features, fluctuations in model performance, evolving business requirements, use case intricacies, and an array of other considerations. To keep track of these periodic updates, different model versions can be introduced. A model version captures the specifics of the model at a given point: its parameters, weights, or any other configuration that affects the model itself.

To ensure the seamless integration of model versions, it is imperative to monitor various attributes intrinsic to the model itself. In this regard, we introduce the concept of “experiments”. An experiment is a comprehensive collection of runs conducted on a model to test different configurations. Each run yields an artifact encapsulating the model along with the associated assets required to generate outputs or make inferences on new datasets. Thus, when we decide that a run is good enough to generate a new version of a model, the run is registered and tracked along with all the artifacts required to recreate the environment where the run was produced.

In the following image you can see how these four concepts interact, differentiating between what is in the scope of an experiment and what is in the scope of a model:

As you may notice, we start our model exploration journey with “model tracking”. During this phase, we explore different models under the same experiment, which is composed of multiple runs. At some point during model exploration, we find a run that meets our requirements and deserves to be “registered”. Once a run meets the desired requirements, we can move on to the “model registry” part. This is where the model version stands out, derived from a valid run. Now that we have registered our model, subsequent iterations over the experiment associated with this model will update the artifacts associated with it. Whenever we decide to register the model again, a new version becomes available.

Lastly, in the model lifecycle, there’s a crucial step: model monitoring. Here, we keep a close eye on how the model performs when faced with new data. Although this article won’t dive into the details of model monitoring, we’ll cover it separately, especially in our discussion on Databricks Lakehouse Monitoring. Stay tuned!

2. Model governance with MLflow

The concepts introduced in the previous section lead us to the way Databricks governs the model lifecycle using MLflow. MLflow is an open-source platform for managing ML solutions throughout their whole lifecycle (excluding monitoring), and Databricks is fully integrated with it. Picking up two key concepts from the previous section, let’s explore how Databricks integrates MLflow’s experiment and run functionalities.

Runs, experiments, and also… flavors

A run in MLflow encompasses a comprehensive set of parameters, metrics, tags, and artifacts tied to the training procedure of a machine learning model. An experiment, on the other hand, is the fundamental organizational unit within MLflow: every run is associated with an experiment. Within each experiment, users can evaluate and contrast the outcomes of diverse runs and easily access metadata and artifacts for further analysis through downstream tools. These experiments are managed on an MLflow tracking server. As you can see in the image below, we are running multiple runs for our experiment “use case”. By analyzing each of these runs, we can track the behavior of our model. It’s on us, as data scientists, to make the most of this tool and register the best model for our use case (based on a metric, an attribute, a concrete set of features, or whatever drives the final decision).

Launching all these runs does not require anything complicated on our side. Thanks to the MLflow integration within the Databricks platform, associating our runs with an experiment is a matter of a couple of lines. From then on, whenever we trigger our training, all our runs will be stored under this experiment.

import mlflow  # autologging is enabled by default on Databricks; we make it explicit below
experiment = mlflow.set_experiment("Databricks_ML_Solution")  # subsequent runs are grouped under this experiment
mlflow.autolog()  # log parameters, metrics, and the fitted model for each training call
xgb.fit(X_train, y_train)  # the training from our previous article, now tracked as a run
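
For finer control than autologging, a run can also be opened explicitly. Below is a minimal sketch, assuming xgb is the XGBoost regressor built in our previous article and that an evaluation metric such as rmse has already been computed on a held-out set (both are assumptions):

import mlflow.xgboost

with mlflow.start_run(run_name="xgb_baseline"):  # a named run grouped under the experiment set above
    xgb.fit(X_train, y_train)
    mlflow.log_param("max_depth", xgb.max_depth)  # hyperparameters to compare across runs
    mlflow.log_metric("rmse", rmse)  # metrics shown side by side in the experiment UI
    mlflow.xgboost.log_model(xgb, artifact_path="model")  # the artifact needed to reproduce inference later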

Each of these runs stores the model artifacts: a series of files containing all the components necessary to reproduce predictions consistently across different environments.

Once we hit the perfect model, we jump to the next stage: the model registry. Before digging into MLflow and the proper steps to register a model, it is worth mentioning how MLflow manages the storage of the artifacts related to these models. We’re talking about “flavors”.

MLflow flavors serve as standardized serialization formats for trained machine-learning models, ensuring compatibility across various frameworks and environments. These flavors simplify model serialization and streamline deployment across different frameworks and environments. MLflow’s flavor design guarantees a high level of consistency. Each MLflow flavor aligns with a specific library and dictates how the loaded pyfunc behaves during inference deployment. This standardized approach ensures that each flavor adheres to a predictable format, promoting consistency while maintaining some degree of rigidity.
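
As an illustration of how a flavor is used in practice, here is a minimal sketch, assuming a scikit-learn model named sk_model and some test data X_test (both hypothetical): the model is logged with its dedicated flavor and loaded back through the generic pyfunc flavor for framework-agnostic inference.

import mlflow
import mlflow.sklearn

with mlflow.start_run() as run:
    mlflow.sklearn.log_model(sk_model, artifact_path="model")  # stored with the sklearn flavor (plus pyfunc)

loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")  # framework-agnostic loading
predictions = loaded.predict(X_test)  # X_test is a pandas DataFrame with the training schema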

Model registry & inference

Following our model exploration via MLflow experiment and runs tracking, we transition to the subsequent phase in the MLOps cycle: the model registry. At this stage, we’ve selected a model to generate accurate predictions on incoming test data. Leveraging the MLflow UI offered by Databricks, registering a model, and readying it for future utilization becomes a straightforward task.

In the following images you can see how we navigate through the UI to register a model:

1. On the model run page, navigate to the artifacts section and click on Register Model:

2. Write your model name:

3. After that, click Register and your model will be ready to be used to infer new data. On the model page, you will see all your registered models along with their model versions:

As you may have noticed, we can also control the versions and the stages:

4. The last step would be to make use of our model for inference. With these previous three steps, our model is more than ready to be loaded and put to work with new data. By navigating to the model page, in the upper-right corner, we have the option to directly generate a notebook that leverages our newly registered model:
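
To give an idea of what that inference step looks like in code, here is a minimal sketch, assuming the model was registered under the hypothetical name "Databricks_ML_Solution_model" as version 1:

import mlflow

model_uri = "models:/Databricks_ML_Solution_model/1"  # registered model name and version (hypothetical)
model = mlflow.pyfunc.load_model(model_uri)  # load the registered version with the generic pyfunc flavor
predictions = model.predict(new_data)  # new_data is a pandas DataFrame matching the training schema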

The integration of Databricks with MLflow streamlines MLOps seamlessly within a unified platform. This integration facilitates the entire ML lifecycle, from model exploration to model serving, harmonizing every aspect of the ML solution under one roof. Such consolidation accelerates development, diminishes maintenance costs, and simplifies project scalability.

3. Feature governance

Feature stores were initially pioneered by Uber with Michelangelo Palette in 2017. These platforms serve as a centralized repository that helps data scientists discover and share features. Additionally, they guarantee consistency by ensuring that the same code used to compute feature values is applied during both model training and inference.

Feature stores play a pivotal role in ensuring consistency and accuracy in feature engineering across different stages of the machine learning lifecycle. By providing high-throughput batch and low-latency serving APIs, they facilitate the integration of machine learning models into real-time applications, enabling organizations to make data-driven decisions in near real-time. Additionally, feature stores promote feature reuse, allowing organizations to leverage existing feature sets across multiple machine-learning projects. This not only saves time and resources but also enhances collaboration and knowledge sharing among data scientists and machine learning engineers.

The Feature Store lifecycle

In the same way we presented the model lifecycle in the previous section, the image below shows an example of a feature lifecycle throughout an ML solution:

In the initial phase, the data engineering team undertakes the task of creating, refining, and managing the data pipelines responsible for generating one or more definitive datasets. These datasets are then utilized by subsequent processes tasked with extracting the necessary features into the Feature Store. It is in this stage that the feature store assumes its pivotal role as a centralized repository for all potential features that may serve as input for the machine learning models. Serving as a fundamental element of our machine learning infrastructure, it facilitates scalability and efficiency throughout our solution. Finally, we arrive at the model exploration phase, which becomes significantly more streamlined due to the coordinated and aligned feature serving point provided by the feature store. As its name indicates, it resembles a shopping experience for features, allowing us to extract the most valuable ones for our model.

4. Databricks Feature Store, the new store in town!

We now jump into the specific characteristics of the Databricks Feature Store. The core principles align with those of feature stores in general: it serves as a centralized repository for features. Within this repository, all members of the development team collaborate and synchronize their efforts on any procedure that impacts the features, ensuring consistency and cohesion in feature management across the entire organization.

Feature Store main concepts in Databricks

The Databricks Feature Store seamlessly integrates with the platform’s ecosystem, offering:

  • Discoverability: Users can easily find and explore existing features via the Feature Store UI within the Databricks workspace.
  • Lineage: The Feature Store records and provides access to the data sources used in creating feature tables, along with associated models, notebooks, jobs, and endpoints for each feature.
  • Integration with model deployment: Models trained using Feature Store features automatically include feature metadata, simplifying deployment. During scoring or inference, models retrieve features from the Feature Store without additional logic.
  • Point-in-time: Supporting time series and event-based lookups, the Feature Store ensures accurate data retrieval for historical analysis or real-time decision-making (see the sketch after this list).
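
To make the point-in-time capability concrete, here is a minimal sketch of a lookup with a timestamp key. The table name, the lookup key, and the "event_ts" column are all hypothetical, and they assume the feature table was created with a timestamp key of the same name:

from databricks.feature_store import FeatureLookup

pit_lookup = FeatureLookup(
    table_name="feature_store.client_daily_features",  # hypothetical feature table created with a timestamp key
    lookup_key=["client_id"],
    timestamp_lookup_key="event_ts",  # join the feature values that were valid as of each event timestamp
)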

The integration between the Feature Store and Unity Catalog is even more practical and smooth if your environment has Unity Catalog enabled and runs Databricks Runtime 13.2 or above. Under these conditions, you can use any Delta table or Delta Live Table in Unity Catalog that has a primary key as a feature table for model training or inference. The offline feature store, backed by Delta tables, covers tasks like feature discovery, model training, and batch inference. Conversely, the online store, designed for low-latency operations, specializes in real-time model inference. While the offline store is optimized for batch processing and model training, the online store excels at serving models swiftly in real-time applications.
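
In that setup, feature tables live directly in Unity Catalog and are managed through the feature engineering client instead of the workspace client used in the example below. Here is a minimal sketch, assuming the databricks-feature-engineering package is installed and that a catalog "ml" and schema "features" exist (all of these are assumptions):

from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()
fe.create_table(
    name="ml.features.client_features",  # Unity Catalog table (catalog.schema.table), hypothetical name
    primary_keys=["date", "client_id"],  # the primary key is what turns a Delta table into a feature table
    df=features_df,
    description="client features governed by Unity Catalog",
)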

An implementation example on a use case

1. Create Feature Store

The very first step is to declare a feature store client instance before we start storing our dataframes there:

from databricks import feature_store
fs = feature_store.FeatureStoreClient()

2. Populate Feature Store

Once our feature store client is created, we can start filling the store with the desired tables generated by our data engineering pipelines. To do that, we can use the create_table function, which creates a new feature table with the properties specified in its parameters:

fs.create_table(
    name=table_name,
    primary_keys=["date", "client_id"],
    df=features_df,
    schema=features_df.schema,
    description="first feature store table"
)
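
Once the table exists, subsequent runs of our data engineering pipelines can refresh it instead of recreating it. A minimal sketch, assuming updated_features_df is the recomputed DataFrame (an assumption on our side):

fs.write_table(
    name=table_name,
    df=updated_features_df,
    mode="merge",  # upsert rows by primary key; use "overwrite" to replace the whole table
)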

3. UI on Feature Store

To access the Feature Store UI, click on the Feature Store icon in the left navigation bar. Once you’ve clicked on the feature table you created, you’ll see several characteristics displayed:

Then, drill down to see more details on the features regarding type and endpoint consumptions:

4. Retrieve from Feature Store

The last step, retrieving our features from the store, is made easy with FeatureLookup. We just need to provide the table name and the lookup keys. In addition, we can ask for a subset of the features by specifying them in feature_names:

from databricks.feature_store import FeatureLookup  # FeatureLookup defines which features to join and how
model_feature_lookups = [FeatureLookup(table_name=table_name, lookup_key=lookup_key, feature_names=[features])]
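
These lookups are typically consumed through a training set that joins our label DataFrame with the looked-up features. Here is a minimal sketch, assuming label_df carries the lookup keys plus a label column named "target" (both hypothetical):

training_set = fs.create_training_set(
    df=label_df,
    feature_lookups=model_feature_lookups,
    label="target",
    exclude_columns=["date", "client_id"],  # keys are usually dropped before training
)
training_df = training_set.load_df()  # Spark DataFrame ready to be passed to model training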

Conclusions

Our exploration of model governance through Databricks shows its critical role in ensuring the reliability and adaptability of machine learning models. By understanding concepts like model, model version, experiment, and run, we establish a structured framework for the model lifecycle, accommodating change through phases such as model tracking and the model registry. Integrated with MLflow, Databricks provides model governance from experimentation to deployment, offering consistency and collaboration across the different phases of the ML lifecycle.

Additionally, our journey extends to feature governance, emphasizing the importance of centralized feature repositories in maintaining consistency and accuracy throughout the ML process. Databricks’ Feature Store, along with Unity Catalog integration, offers a comprehensive solution for managing both offline and online feature stores, catering to diverse requirements from batch processing to real-time inferences. These components help us to build and deploy robust machine learning solutions efficiently and reliably within the Databricks ecosystem.

Stay tuned for the following chapter, dedicated to Databricks Lakehouse Monitoring and CI/CD in Databricks!

References

For this article, we have used the following references:

  1. Databricks. (n.d.). Model registry overview. Databricks Documentation. Retrieved from https://docs.databricks.com/en/machine-learning/manage-model-lifecycle/workspace-model-registry.html
  2. Databricks. (n.d.). Feature store. Databricks Documentation. Retrieved from https://docs.databricks.com/en/machine-learning/feature-store/index.html
  3. Databricks. (n.d.). Track model development. Databricks Documentation. Retrieved from https://docs.databricks.com/en/machine-learning/track-model-development/index.html
  4. Databricks. (n.d.). MLflow. Databricks Documentation. Retrieved from https://docs.databricks.com/en/mlflow/index.html
  5. MLflow. (n.d.). Models. MLflow Documentation. Retrieved from https://mlflow.org/docs/latest/models.html
  6. MLflow. (n.d.). Creating custom PyFunc model flavors (part 1: Named flavors). MLflow Documentation. Retrieved from https://mlflow.org/docs/latest/traditional-ml/creating-custom-pyfunc/part1-named-flavors.html

Who we are

  • Gonzalo Zabala is a consultant in AI/ML projects in the Data Science practice unit at SDG Group España, with experience in the retail and pharmaceutical sectors. He brings value to business by providing end-to-end data governance solutions for multi-affiliate brands. https://www.linkedin.com/in/gzabaladata/
  • Ángel Mora is an ML architect and specialist lead participating in the architecture and methodology area of the Data Science practice unit at SDG Group España. He has experience in different sectors such as the pharmaceutical, insurance, telecommunications, and utilities sectors, managing different technologies in the Azure and AWS ecosystems. https://www.linkedin.com/in/angelmoras/
