Lessons learned from migrating models to Unity Catalog

Vechtomova Maria · Marvelous MLOps · Jul 24, 2024

Since Databricks introduced Unity Catalog, all features require Unity Catalog-enabled workspaces to work properly. You should now also register models in Unity Catalog, since the Workspace Model Registry is marked as “legacy”. MLflow experiments, however, still live at the workspace level.

In this article, we will discuss the changes Unity Catalog introduced and how they impacted us. But first, let’s explain our requirements.

  • There are 3 different Databricks workspaces: development, staging, and production. If model training is costly, we want to avoid retraining the model in each environment when there are code changes.
  • Some projects require manual approval before the model can be deployed to production.
  • Most models we have are custom models (we use pyfunc to store them in the Model Registry). Some have nontrivial outputs, such as a list of dictionaries with string keys and string values.
  • It must be possible to trace registered models back to the git commit hash and Databricks run_id. We rely on this for rollback scenarios.

What did we learn while trying to fulfill these requirements in the new setup?

1. Sharing models between environments became much easier.

In the old setup, there were several ways to share models between environments. Copying models over, or writing directly to the model registry of another environment, required sharing credentials across environments, which was not ideal.

With the introduction of Unity Catalog, it is recommended to have 3 catalogs: Dev, Staging, and Prod. Higher environments can read from one environment below: users/principals in the staging workspace can copy models from the Dev catalog, and users/principals in the production workspace can copy models from the Staging catalog.

See the documentation explaining how to promote models across environments.
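For illustration, here is a minimal sketch of such a promotion using MlflowClient.copy_model_version; the catalog, schema, and model names below are hypothetical:

import mlflow
from mlflow import MlflowClient

# Point MLflow at the Unity Catalog model registry
mlflow.set_registry_uri("databricks-uc")
client = MlflowClient()

# Hypothetical three-level names: <catalog>.<schema>.<model>
src_model_uri = "models:/dev.ml_models.user_recommendations_model/1"
dst_model_name = "staging.ml_models.user_recommendations_model"

# Copy version 1 from the Dev catalog into the Staging catalog;
# the artifacts and signature travel with the copied version
copied_version = client.copy_model_version(
    src_model_uri=src_model_uri, dst_name=dst_model_name
)
print(copied_version.version)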

2. Manual approval requires additional tools outside of Databricks.

In the old setup, MLflow models had stages: None, Staging, Production, and Archived. Starting with MLflow 2.9, stages are marked as deprecated and will be removed in a future major release.

Stages (even though they received criticism from the community for their rigidness) were used for Model Registry webhooks, which are not available in Unity Catalog. Our process for model retraining in production consisted of the following steps:

  • Retrain the model and register it with stage “None”.
  • Complete automated checks and transition the model to stage “Staging”. Then request a transition to stage “Production”.
  • The request would trigger a webhook. An approver would receive a notification (for example, in Slack), complete manual checks, and approve.
  • The approval would trigger a Databricks job that deploys the model behind a model-serving endpoint.

In Unity Catalog, we could use job webhooks and an external CI/CD system to achieve the same result, which significantly complicates the process.
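Whatever tool handles the approval itself, the outcome can be recorded with a model alias, which the deployment job then resolves. A minimal sketch, assuming a hypothetical model name and a “champion” alias:

from mlflow import MlflowClient

client = MlflowClient(registry_uri="databricks-uc")
model_name = "prod.ml_models.user_recommendations_model"  # hypothetical name

# Step run by the external CI/CD system after manual approval:
# mark the approved version with the "champion" alias
client.set_registered_model_alias(name=model_name, alias="champion", version=3)

# Step run by the deployment job: resolve the approved version via its alias
approved = client.get_model_version_by_alias(name=model_name, alias="champion")
print(approved.version)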

3. Signatures are required to register models in Unity Catalog, and not all data types are supported.

Before Unity Catalog, it was possible to register models in the Model Registry without providing a model signature. Even though providing a model signature is always a good idea, we could not do so because the output of some of our custom models had complex data types that were not supported. Support for Objects and Arrays was introduced in MLflow 2.10 (released in January 2024), which resolved our problem. See the code example below.

import mlflow
from mlflow.models.signature import infer_signature

# Register models in Unity Catalog instead of the workspace registry
mlflow.set_registry_uri("databricks-uc")

catalog_name = "mlops_test"
schema_name = "feature_serving"

class Model(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # Output is a dict containing a list of dicts: a complex
        # type that requires the MLflow >= 2.10 signature support
        return {
            "customerID": model_input["customerID"],
            "topProducts": [{"productId": "23166", "productName": "Quaker Muesli"}],
        }

input_data_example = {"customerID": "a", "type": "TP"}
output_data_example = {
    "customerID": "a",
    "topProducts": [{"productId": "23166", "productName": "Quaker Muesli"}],
}

# Infer the signature from example input and output
signature = infer_signature(model_input=input_data_example, model_output=output_data_example)

model = Model()
mlflow.pyfunc.log_model(
    "model",
    python_model=model,
    registered_model_name=f"{catalog_name}.{schema_name}.user_recommendations_model",
    signature=signature,
)

However, if you try to register a model using the feature engineering package (even the latest version, 0.6.0, at the time of writing), registration fails with an error about an unsupported model signature. Databricks is working on resolving the issue.

Check the documentation on model signatures and supported data types for more details.

4. Searching registered models by custom tags is not supported.

In one of our previous articles, we explained how custom tags can be useful for traceability and reproducibility: https://medium.com/marvelous-mlops/traceability-reproducibility-62fbb4454f39.

MlflowClient has a search_registered_models() method, and in the old setup it was possible to search by tag. Unfortunately, some search API fields and operators are not supported for models in Unity Catalog: tag-based filters, the order_by parameter, and operators other than exact equality.

See documentation for more details.

We can still tag model versions with the run_id and git commit hash, and use the code release version as an alias (search based on aliases is supported).
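A minimal sketch of this workaround; the model name, tag values, and release alias are hypothetical:

from mlflow import MlflowClient

client = MlflowClient(registry_uri="databricks-uc")
model_name = "mlops_test.feature_serving.user_recommendations_model"

# Tags keep traceability metadata on the model version...
client.set_model_version_tag(model_name, version="1", key="git_sha", value="abc1234")
client.set_model_version_tag(model_name, version="1", key="run_id", value="databricks-run-id")

# ...while the code release version becomes an alias, which can be resolved directly
client.set_registered_model_alias(model_name, alias="release-1-2-0", version=1)

# Look up the exact version that shipped with that release
version = client.get_model_version_by_alias(model_name, alias="release-1-2-0")
print(version.version, version.tags["git_sha"])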

Conclusions

Depending on your setup and the features you use, migrating models to Unity Catalog can be quite an endeavor and may require rethinking the whole process around model lineage and governance.

Good luck!
