Exploring the Machine Learning Model Lifecycle with Databricks and MLflow

Alexander Suvorov
Published in Version 1
Nov 5, 2020

Developing and managing machine learning models is significantly different from, and much harder than, traditional software development. Challenges arise not only during the development stage but also once the product is in production. This Medium post explores the machine learning model lifecycle and some of the key differences I have come across while working with these models.

The differences between traditional software development and machine learning development can be summarised by what we are trying to achieve, what affects quality, and what tooling we use.

Key Differences

Goal: The goal of traditional software development is to meet functional and non-functional specifications, and it is straightforward to confirm whether an application meets them. In machine learning, the goal is to improve model metrics over time, because degradation or improvement in model performance means losses or gains for the business. It also means that models are in constant flux and never really done: a moving target.

Quality: In software development, the quality of the product or application largely depends on the quality of the code produced. In machine learning development, quality depends on the input and training data, as well as on hyperparameters that need to be tuned based on the data available.

Stack: In software development, teams typically pick one software stack, plus supporting libraries, to achieve the goal. In machine learning development, several stacks, models, and algorithms may be combined into one solution; a deep learning stack could be combined with traditional ML libraries and even with different languages.

Production Challenges: For machine learning, the challenges do not diminish after the product goes live. The main ones relate to new data, the complexity of the development process, and the model lifecycle.

Data: Machine learning by its nature requires historical data, and as time progresses the data changes. After a while, the newly accumulated data becomes historical data itself, and as a consequence of this ongoing data acquisition the product risks becoming obsolete and the models need to be rebuilt.

In contrast, a traditional software product will keep working as long as nobody touches it or updates the systems it relies upon; the stability of the product is rarely in question.

Process: Developing a machine learning model is a complex process involving several people with different skillsets: the design of the model is managed by a data scientist, data preparation can be overseen by a data engineer, and the development of the end product is handled by an application developer. Traditional software development, in contrast, is generally handled by a homogeneous group of developers with similar skillsets.

Model Lifecycle

Machine learning model management comes with unique challenges, such as consistency of results, codebase modification control, model lineage, model metrics, hyperparameter optimisation, and model staging.

We need consistent results from each run, as well as a history of runs and modifications. For model lineage, we need a model versioning system that links each version of the model to the metrics it produced. It is also desirable to store the hyperparameters used, both for the record and for further automation and analysis. Finally, we would like programmatic access to all of this functionality, so that we can promote a model from test to production within our code without human intervention.

ML Management Platform

Over the last several years, many ML management platforms have emerged to tackle the problems outlined above. Products such as Airflow and Luigi are general orchestration platforms, while more specialised platforms such as Kubeflow and MLflow focus on machine learning orchestration. The sections below focus on the combination of Databricks and MLflow and the benefits they can bring.

Experiments Management

One of the pillars of MLflow is experiment tracking. Integrated with Databricks on Azure, it allows us to record and query experiments, manage changes in the codebase, track input data, and compare the parameters consumed, the metrics emitted, and the results produced.

This component lets us track experiment executions, the hyperparameters used for each experiment, the metrics the model produced, and the model the experiment produced. We can review changes graphically or programmatically and access every attribute of an experiment from code, as in the sketch below.
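The following is a minimal sketch of logging a run from a Databricks notebook. The experiment path, hyperparameters, and scikit-learn model are illustrative placeholders rather than anything from a real project.

```python
# Minimal MLflow tracking sketch; experiment path, parameters and model
# are illustrative placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=10, random_state=42)

mlflow.set_experiment("/Shared/demo-experiment")  # hypothetical workspace path

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestRegressor(**params, random_state=42).fit(X, y)

    mlflow.log_params(params)                                          # hyperparameters used
    mlflow.log_metric("mse", mean_squared_error(y, model.predict(X)))  # metric emitted
    mlflow.sklearn.log_model(model, "model")                           # model artefact produced
```

Each run recorded this way appears in the experiment UI alongside its parameters, metrics, and model artefact.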

This part of the system also tracks the version of the code, records who modified it and when, and allows us to revert to a desired version if required. The codebase is not stored externally, and access to it is managed by Databricks and Azure security.

This part of the system also lets us manage the model manually: we can compare models based on their metrics and decide whether a model goes to production. The same promotion can be scripted through the Model Registry, as in the sketch below.
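A minimal sketch of that scripted promotion using the MLflow Model Registry; the model name and run ID are hypothetical placeholders for a model logged by an earlier run.

```python
# Promote a model from test to production without human intervention;
# "churn-model" and <run_id> are hypothetical placeholders.
import mlflow
from mlflow.tracking import MlflowClient

# Register the model artefact produced by an earlier run.
result = mlflow.register_model("runs:/<run_id>/model", "churn-model")

# Move that version to the Production stage programmatically.
client = MlflowClient()
client.transition_model_version_stage(
    name="churn-model",
    version=result.version,
    stage="Production",
)
```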

We can likewise compare experiment runs and evaluate, manually or programmatically, what worked well and what did not.

Across all parts of the system, every piece of functionality, access, modification, and comparison can be performed programmatically from within other notebooks, as in the sketch below.
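As an illustration, the runs of an experiment can be queried and ranked in code. This minimal sketch assumes the experiment path and metric key from the hypothetical tracking example above.

```python
# Query and compare runs programmatically; the experiment path and metric
# key match the hypothetical tracking example above.
import mlflow

experiment = mlflow.get_experiment_by_name("/Shared/demo-experiment")

# Returns a pandas DataFrame of runs, best (lowest) MSE first.
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.mse ASC"],
)
print(runs[["run_id", "params.n_estimators", "metrics.mse"]].head())
```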

Reproducible Execution

The second pillar of MLflow is a format for packaging data science code in a reusable way. At its core, it is well-defined metadata for organising and describing the code so that automated tools can run it. The component includes APIs for several languages as well as a CLI, which makes it usable as an orchestrator that chains multiple projects together. The underlying message is that MLflow and Databricks make it easy to take code and artefacts from a Git repo and run them on Databricks, as the sketch below shows.
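A minimal sketch of launching a packaged project straight from a Git repository. The repository URL points at MLflow's public example project, and the cluster specification file name is a hypothetical placeholder; running with the local backend instead needs no cluster spec.

```python
# Run an MLflow Project from a Git repo; "cluster-spec.json" is a
# hypothetical placeholder for a Databricks cluster definition.
import mlflow

submitted_run = mlflow.run(
    uri="https://github.com/mlflow/mlflow-example",  # repo containing an MLproject file
    parameters={"alpha": 0.4},
    backend="databricks",                            # or "local" to run on this machine
    backend_config="cluster-spec.json",              # hypothetical Databricks cluster spec
)
print(submitted_run.run_id)
```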

Model Packaging

Currently, many machine learning libraries, such as TensorFlow, scikit-learn, and Spark ML, use their own formats to store models. This approach works fine if you do not use different frameworks to find the best model; once different machine learning stacks are in play, interoperability problems emerge. To mitigate this, MLflow provides a wrapper format that supports most of these libraries and abstracts the inner representation of the model in favour of a more generic one, the MLflow "flavour". This abstraction is an invaluable step towards industrialising model deployment and lifecycle management: one week a scikit-learn model may produce a reasonable result, and a while later a TensorFlow model may do better, but for the rest of the application the change of model is transparent and the product remains model agnostic, as the sketch below illustrates.
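A minimal sketch of consuming a model through the generic pyfunc flavour. The model name, stage, and feature columns are illustrative placeholders; the point is that the calling code stays the same whichever framework trained the model.

```python
# Load a registered model via the generic "pyfunc" flavour; the model name,
# stage and feature columns are illustrative placeholders.
import mlflow.pyfunc
import pandas as pd

model = mlflow.pyfunc.load_model("models:/churn-model/Production")

# The generic interface: a DataFrame in, predictions out, regardless of
# whether the model was trained with scikit-learn, TensorFlow or Spark ML.
sample = pd.DataFrame({"feature_1": [0.1], "feature_2": [1.2]})
predictions = model.predict(sample)
```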

Conclusion

The combination of Azure, Databricks, and MLflow provides an outstanding framework and set of tools for managing many aspects of machine learning projects, and in particular the lifecycle of the model. To summarise the functionality available from this combination:

· Code versioning — code versioning integrated into notebooks.

· Experiment run tracking — experiment runs are persisted and searchable.

· Hyperparameter tracking — the model parameters are tracked by the system.

· Model metric tracking — the metrics emitted by the model are tracked by the system.

· Model tracking — the model produced by each experiment is tracked by the system.

· Programmatic access — we have programmatic access to all of the above.

· MLOps capable — allows orchestrating multiple projects together.

· Format-agnostic models — the models produced are agnostic to the vendor format.

Perhaps the one disadvantage of this trio is that Databricks on Azure is a cloud-only solution, and for some that is not an option. Nevertheless, MLflow itself can be installed on Kubernetes and OpenShift platforms.

The second disadvantage is pricing: used without care, the cost can escalate very quickly. Thankfully, Databricks has auto-termination settings that shut a cluster down after a specified idle period.

On the other hand, it would take significant effort to build experiment and model tracking functionality comparable to what these frameworks provide out of the box. Given that this overview only touched on some aspects of the environments and functionality available, it is safe to say this is a product worth investing time in.

About the Author

Alexander Suvorov is a Senior Data Scientist and Senior Developer who has been working in Version 1's Innovation Labs since 2019.
