Azure MLOps: Doing DevOps one better
MLOps? What’s that?
MLOps is everything that DevOps is, plus automated ML model training along with dataset and model management.
A couple of weeks ago, I had just zeroed in on and trained a model for a project I had been working on. The model worked as required and expected, and it was time to deploy the application which made use of it. As usual, I looked towards Azure DevOps to version control my application’s source code as well as package and deploy it. But like all the other times, I wondered,
What would I do when the client comes up with new data and retraining is required? Do I manually retrain the model and tediously maintain its versions along with the datasets’? What if I automated the model training and versioning process and added that to my current DevOps process?
Acting on my musings this time around, I started looking for a solution. And since I was already working on the Azure platform, I soon stumbled upon their MLOps solution.
What did I want to do?
Set up a pipeline which:
- Prepared my data for model training
- Saved the prepared data as a versioned dataset
- Trained the model
- Maintained the model’s version
How did I do it?
I used the Azure Machine Learning SDK for Python and wrote a script which created an MLOps pipeline. The pipeline took care of everything that I wanted from this solution and could be triggered via the UI on Azure’s ML web portal, through a REST endpoint, or via the SDK itself.
The Script Recipe
The script which creates the MLOps pipeline would loosely consist of the following:
Creating Workspace Object
I created a workspace object, which is used to access all the resources related to a workspace in your Azure subscription. It requires information about your workspace and subscription, as well as credentials to grant access.
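A minimal sketch of this with the SDK’s `Workspace` class (the workspace name, subscription ID, and resource group below are placeholders for your own details):

```python
from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication

# Interactive login opens a browser prompt; a service principal
# would be used instead in non-interactive (CI) scenarios.
auth = InteractiveLoginAuthentication()

ws = Workspace.get(
    name="my-workspace",                  # placeholder workspace name
    subscription_id="<subscription-id>",  # placeholder
    resource_group="<resource-group>",    # placeholder
    auth=auth,
)
```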
Allocate Compute Clusters
For the steps of the pipeline to run, there needs to be a machine on which they can execute. I acquired this machine in the form of a cluster which could scale from 0 to n nodes according to the requirement.
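A sketch of the cluster allocation, assuming the workspace object `ws` created earlier (the cluster name and VM size are assumptions):

```python
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

cluster_name = "cpu-cluster"  # hypothetical cluster name

try:
    # Reuse the cluster if it already exists in the workspace
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
except ComputeTargetException:
    # Autoscale between 0 and 4 nodes; idle nodes get deallocated
    config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_DS3_V2", min_nodes=0, max_nodes=4
    )
    compute_target = ComputeTarget.create(ws, cluster_name, config)
    compute_target.wait_for_completion(show_output=True)
```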
Pipeline Parameters and Pipeline Data
When triggering the pipeline, certain parameter values can be specified. They are called pipeline parameters and work just like function parameters do in programming languages.
When two steps of the pipeline have different clusters as their compute targets, data is shared between them via pipeline data. This is because the different clusters don’t have access to each other’s local storage and therefore require a common storage point for data sharing.
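Both can be sketched as follows, assuming the workspace object `ws` from earlier (the parameter and pipeline-data names are assumptions, with `training_step_output` reused later in the training step):

```python
from azureml.pipeline.core import PipelineData, PipelineParameter

# A parameter whose value can be overridden when the pipeline is triggered
epochs = PipelineParameter(name="epochs", default_value=10)

# A shared location on the workspace's default datastore, used for
# passing data between steps that run on different clusters
datastore = ws.get_default_datastore()
training_step_output = PipelineData("training_step_output", datastore=datastore)
```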
Dependencies
For a step to execute on the compute target, certain dependencies might need to be installed on it. These dependencies need to be defined beforehand.
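One way to define them is through a run configuration with conda dependencies (the packages listed here are examples, not the ones the original project used):

```python
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

# Packages listed here are installed on the cluster nodes before the step runs
run_config = RunConfiguration()
run_config.environment.python.conda_dependencies = CondaDependencies.create(
    conda_packages=["scikit-learn", "pandas"],
    pip_packages=["azureml-sdk"],
)
```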
Data Preparation Step
This step would run makedata.py on the compute target.
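One way to wire this up is a `PythonScriptStep` (the step name and source directory are assumptions; `compute_target` and `run_config` refer to the cluster and dependencies defined earlier):

```python
from azureml.pipeline.steps import PythonScriptStep

data_prep_step = PythonScriptStep(
    name="data_preparation",
    script_name="makedata.py",
    source_directory=".",           # folder containing makedata.py
    compute_target=compute_target,  # the cluster allocated earlier
    runconfig=run_config,           # the dependencies defined earlier
    allow_reuse=False,
)
```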
Within that script, the final data after all the processing could be registered and versioned as follows:
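A sketch of the registration inside makedata.py (the local folder and dataset name are assumptions):

```python
# Inside makedata.py -- register the processed data as a versioned dataset
from azureml.core import Dataset, Run

run = Run.get_context()
ws = run.experiment.workspace
datastore = ws.get_default_datastore()

# Upload the processed files, then create a file dataset pointing at them
datastore.upload(src_dir="processed_data", target_path="prepared/")
dataset = Dataset.File.from_files(path=(datastore, "prepared/"))

# create_new_version=True bumps the version if the name is already registered
dataset.register(workspace=ws, name="prepared-data", create_new_version=True)
```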
Training Step
train.py is executed on the compute target using the EstimatorStep. This isn’t the only way of doing it; the script could be executed in other ways as well. This file would contain your code relevant to model training and evaluation, using the earlier registered dataset. You can download the artifacts of the registered dataset to the local storage of the compute target.
In this step, we have also used the previously defined pipeline data, “training_step_output” — essentially a path — to store the output and share it with a different pipeline step.
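A sketch of the step definition, reusing the `compute_target` and `training_step_output` objects from earlier (the step name and script argument are assumptions):

```python
from azureml.train.estimator import Estimator
from azureml.pipeline.steps import EstimatorStep

estimator = Estimator(
    source_directory=".",
    entry_script="train.py",
    compute_target=compute_target,
    conda_packages=["scikit-learn"],  # example dependency
)

train_step = EstimatorStep(
    name="training",
    estimator=estimator,
    # The pipeline data resolves to a path that train.py receives as an argument
    estimator_entry_script_arguments=["--output_dir", training_step_output],
    outputs=[training_step_output],
    compute_target=compute_target,
)
```

Inside train.py, `Dataset.get_by_name(ws, "prepared-data")` followed by `.download(target_path="data")` would fetch the registered dataset’s artifacts to the compute target’s local storage.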
Model Registration Step
register.py is executed on a different compute target, consuming the pipeline data “training_step_output” in which the training output was stored.
register.py would look something like this:
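Here is a sketch under stated assumptions — the model name and argument name are placeholders, and the script receives the pipeline data path as an argument:

```python
# A sketch of register.py
import argparse
from azureml.core import Run
from azureml.core.model import Model

parser = argparse.ArgumentParser()
parser.add_argument("--model_dir", type=str)  # path of the shared pipeline data
args = parser.parse_args()

run = Run.get_context()
ws = run.experiment.workspace

# Registering under an existing name automatically creates a new version
model = Model.register(
    workspace=ws,
    model_name="my-model",       # hypothetical model name
    model_path=args.model_dir,   # folder containing the trained model files
)
print(model.name, model.version)
```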
Pipeline Publishing
In the end, we establish the sequence of the previously defined steps for the pipeline, validate it (checking for circular dependencies of pipeline data, among other things) and publish it.
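The step variable names below are assumptions carried over from the earlier sketches, as are the pipeline name and description:

```python
from azureml.pipeline.core import Pipeline

pipeline = Pipeline(
    workspace=ws,
    steps=[data_prep_step, train_step, register_step],
)
pipeline.validate()  # flags circular pipeline-data dependencies, missing inputs

published = pipeline.publish(
    name="mlops-pipeline",
    description="Prepare data, train the model, register it",
)
print(published.endpoint)  # REST endpoint for triggering runs
```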
The Finale
Once published, you can view your pipeline by visiting your workspace on https://ml.azure.com.
For reference, my pipeline ended up looking something like this.
You can submit a run of the pipeline from the UI.
Or using REST endpoint.
And last but not least, by using the workspace object via the Azure Machine Learning SDK for Python.
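The latter two options can be sketched as follows, assuming the endpoint URL and pipeline ID that Azure assigns at publish time (both shown as placeholders), and the workspace object `ws` from earlier:

```python
import requests
from azureml.core import Experiment
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.pipeline.core import PublishedPipeline

# Option 1: POST to the REST endpoint shown on the published pipeline
auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(
    "<published-pipeline-endpoint>",  # placeholder for your endpoint URL
    headers=auth_header,
    json={"ExperimentName": "mlops-run"},
)

# Option 2: fetch the published pipeline via the workspace object and submit it
pipeline = PublishedPipeline.get(workspace=ws, id="<published-pipeline-id>")
run = Experiment(ws, "mlops-run").submit(pipeline)
```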
Closing Statement
Azure MLOps is a powerful service which can be used for much more. From adding evaluation metrics to a model’s metadata while registering it, to containerizing and deploying the application which uses the model, there are many things that can be achieved using Azure MLOps. All that’s needed is to explore what it offers.
So, dive into the documentation and find the solutions to your problems.