Build and deploy an ML pipeline on Azure ML Studio (Part 1)
Pipelines are the backbone of enterprise software development, powering efficiency, scalability, and speed. They’re a favorite among software and machine learning engineers because they automate repetitive processes, handle large-scale systems, and deliver solutions faster and more consistently.
In this article, I’ll take you on a journey through the essentials of pipelines, why they matter, and how to build your own ML pipeline using Azure ML Studio. I’ll also show you how to test and scale these pipelines seamlessly in the cloud, ensuring you’re equipped to deliver cutting-edge solutions with ease.
Introduction
In simple terms, a pipeline is a set of automated processes that allow developers to build, compile, and deploy their code to production platforms.
Now, Machine Learning Pipelines are essentially automated data processing workflows that streamline the entire process of building, training, and deploying machine learning models. They stitch together various stages like data collection, preprocessing, model training, evaluation, and deployment into a cohesive unit.
ML Pipelines are usually configured in such a way that the output of one process serves as the input of the next. In machine learning, there are typically two types of pipelines:
- Training Pipeline: This type of pipeline is used to train machine learning models. It typically includes data importation, data preprocessing, feature engineering, model training, and model evaluation. At the end of a training pipeline, the trained model is saved so it can be used for future inference.
- Inference Pipeline: This pipeline is used to deploy a trained machine learning model to a production environment, the final stage where a software solution is made available for real-world use by its intended audience. Since the main purpose of any ML model is to make predictions when prompted, the inference pipeline prepares the trained model to serve those predictions through a web service, API, or mobile application.
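To make the distinction concrete, here is a minimal, framework-agnostic sketch of the two pipeline types using scikit-learn. The dataset and file name are purely illustrative; they are not part of the Azure workflow covered later.

```python
import joblib
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# --- Training pipeline: import data, preprocess, train, evaluate, save the model ---
X, y = load_diabetes(return_X_y=True)
training_pipeline = Pipeline([
    ("scale", StandardScaler()),        # preprocessing step
    ("model", LinearRegression()),      # model training step
])
training_pipeline.fit(X, y)
print("Training R^2:", training_pipeline.score(X, y))   # simple evaluation
joblib.dump(training_pipeline, "trained_model.joblib")  # persist for later inference

# --- Inference pipeline: load the saved model and serve predictions on demand ---
model = joblib.load("trained_model.joblib")
print("Predictions:", model.predict(X[:5]))
```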
The Azure ML Studio
The Azure ML Studio is a cloud-based environment for building, deploying, and managing machine learning models. It has a visual interface with drag-and-drop functionality and provides its users with an affordable and efficient ML-as-a-service offering.
For this tutorial, we will be making use of the Azure ML Designer application to design the ML pipeline. The Azure ML Designer is one of the many AI applications available on the studio. It provides users with a drag-and-drop interface for building pipelines.
Here, pipelines are built by connecting components together. These components can either be classic pre-built components or custom components.
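Components can also be authored in code rather than through the Designer. As a rough sketch (not the Designer’s own mechanism), here is how a custom training component might be defined with the azure-ai-ml Python SDK v2; the script folder, script name, and environment reference are placeholders you would adapt to your own project.

```python
from azure.ai.ml import command, Input, Output

# A hypothetical custom component wrapping a training script.
train_component = command(
    name="train_regression_model",
    display_name="Train regression model",
    inputs={"training_data": Input(type="uri_folder")},
    outputs={"model_output": Output(type="uri_folder")},
    code="./src",  # placeholder folder containing train.py
    command="python train.py --data ${{inputs.training_data}} --model ${{outputs.model_output}}",
    environment="azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # example curated environment
)
```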
Steps in building an ML Pipeline
Step 1: Create an Azure ML Studio Workspace
After logging into the Azure ML Studio, you should be able to see all existing workspaces. If you haven’t created one before, you can do that by clicking on the Create workspace button or by following the steps highlighted in this tutorial.
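If you prefer to create the workspace programmatically instead of through the portal, a rough sketch with the azure-ai-ml SDK v2 might look like the following; the subscription, resource group, location, and workspace names are placeholders.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace
from azure.identity import DefaultAzureCredential

# Authenticate against your Azure subscription (placeholders below).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
)

# Create the workspace used in the rest of this tutorial.
workspace = Workspace(name="practical", location="eastus", display_name="Practical workspace")
ml_client.workspaces.begin_create(workspace).result()
```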
Step 2: Log into the workspace and navigate to Designer
To log into an already created workspace, simply click on the workspace. For this tutorial, I will be using the practical workspace, which is highlighted with a red box in the illustration above.
To get started with designing a training pipeline, select the Designer button highlighted below.
Step 3: Create a new pipeline using classic prebuilt components
Now that you are on the Azure ML Designer landing page, you will see a number of prebuilt pipelines for image classification, binary classification, and recommendation systems. For this tutorial, we will be building an ML pipeline from scratch. Proceed by clicking on the Create a new pipeline using classic prebuilt components button.
Step 4: Build a simple pipeline by adding components from the Component tab to the design canvas
For this step, we will build an ML pipeline by dragging and dropping components from the Component tab. Here, prior knowledge of the ML lifecycle is required. If you need a refresher on this, check out this guide.
The complete ML training pipeline is shown below. It contains every ML development step from data importation to model evaluation. Since this is a training pipeline, we do not include any model deployment components yet.
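For readers who prefer code, a graph like this can also be expressed with the azure-ai-ml SDK v2. The sketch below is an assumption-heavy illustration: it supposes you have authored three component YAML files (prep, train, evaluate) and a compute target named cpu-cluster, none of which come from this tutorial.

```python
from azure.ai.ml import dsl, load_component, Input

# Hypothetical components authored as YAML files (placeholder paths).
prep_component = load_component(source="./components/prep.yml")
train_component = load_component(source="./components/train.yml")
eval_component = load_component(source="./components/evaluate.yml")

@dsl.pipeline(description="Training pipeline: prep -> train -> evaluate",
              default_compute="cpu-cluster")  # placeholder compute name
def training_pipeline(raw_data: Input):
    prep_step = prep_component(input_data=raw_data)                           # data import + preprocessing
    train_step = train_component(training_data=prep_step.outputs.train_data)  # model training
    eval_component(model=train_step.outputs.model_output,                     # model evaluation
                   test_data=prep_step.outputs.test_data)
    return {"trained_model": train_step.outputs.model_output}
```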
Step 5: Configure & Submit the pipeline
In order to train the ML model, we need to submit the pipeline as a job. In Microsoft Azure, a job is a series of steps that run as a unit. Thus, during training, each component in the pipeline is executed from top to bottom, and the output of each process is used as the input for the next.
After clicking on the Configure & Submit button highlighted above, you will be taken to a pop-up window where you can specify the name of the job, its inputs & outputs, and the runtime settings. After selecting a unique job name, you will need to specify the compute to be used to execute the job.
A compute is a cloud computing resource provided by Microsoft Azure that offers a scalable and flexible environment for running applications and workloads in the cloud. Computes differ in type, processing speed, availability, and cost. For this tutorial, I made use of a CPU compute instance with a virtual machine size of Standard_E4ds_v4 (4 cores, 32 GB RAM, 150 GB disk), which costs about $0.29/hour. Check out this tutorial if you require some assistance with creating a compute resource.
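If you would rather provision the compute from code, a sketch with the azure-ai-ml SDK v2 could look like this; the compute name and credential placeholders are assumptions, while the VM size matches the one used above.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ComputeInstance
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",     # placeholder
    resource_group_name="<resource-group>",  # placeholder
    workspace_name="practical",
)

# Provision a CPU compute instance with the VM size used in this tutorial.
compute = ComputeInstance(name="cpu-instance-demo", size="Standard_E4ds_v4")
ml_client.compute.begin_create_or_update(compute).result()
```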
After submitting the pipeline job, navigate to Job details to see more information on the job and its execution. With the compute instance I used, it took about 5 minutes for the pipeline job to finish execution. This time may vary depending on the type of compute you use.
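The equivalent submission from code is only a few lines once the pipeline and client are in place. This sketch reuses the hypothetical training_pipeline and ml_client objects from the earlier sketches; the data asset reference and experiment name are placeholders.

```python
from azure.ai.ml import Input

# Build a job from the pipeline definition sketched in Step 4 (placeholder dataset).
pipeline_job = training_pipeline(
    raw_data=Input(type="uri_folder", path="azureml:my-dataset@latest")
)

# Submit it to the workspace and follow its execution.
submitted_job = ml_client.jobs.create_or_update(pipeline_job, experiment_name="designer-tutorial")
print(submitted_job.studio_url)            # link to the Job details page in the Studio
ml_client.jobs.stream(submitted_job.name)  # stream logs until the job completes
```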
Step 6: Review the model metrics
The final step in this tutorial is to evaluate the trained model. By inspecting the Job overview, we can easily see some important metrics, such as mean absolute error, root mean squared error, and so on.
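As a quick reminder of what those numbers mean, here is how MAE and RMSE are computed from a handful of predictions; the values below are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up ground-truth values and model predictions.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.5])

mae = mean_absolute_error(y_true, y_pred)           # average absolute deviation
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # square root of the mean squared error
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```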
Importance/Benefits of ML Pipelines
As we come to an end, it is important to highlight the benefits of ML Pipelines and why you should use them, whether as a Data Scientist or as an ML Engineer.
- Efficiency: ML pipelines automate the flow of raw data through various stages, from collection to model deployment. This streamlines the process, saving time and effort.
- Easy Scheduling: Pipelines allow for scheduled executions and can be used to automate the periodic retraining of ML models in production environments. This is particularly important in CI/CD/CT (Continuous Integration / Continuous Deployment / Continuous Training); see the scheduling sketch after this list.
- Consistency: By standardizing the workflow, pipelines ensure consistent data preprocessing, model training, and evaluation. Thus, the same pipeline can be used on various datasets and ML use cases. This consistency leads to more reliable results.
- Scalability: As data volumes grow, pipelines handle the added complexity efficiently, processing large datasets without manual intervention.
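As promised above, here is a sketch of how periodic retraining could be scheduled with the azure-ai-ml SDK v2. The schedule name and frequency are placeholders, and pipeline_job and ml_client are assumed to be defined as in the earlier sketches.

```python
from azure.ai.ml.entities import JobSchedule, RecurrenceTrigger

# Retrain once a week (placeholder frequency).
weekly_trigger = RecurrenceTrigger(frequency="week", interval=1)

retraining_schedule = JobSchedule(
    name="weekly-retraining",   # placeholder schedule name
    trigger=weekly_trigger,
    create_job=pipeline_job,    # the pipeline job sketched earlier
)
ml_client.schedules.begin_create_or_update(retraining_schedule).result()
```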
Concluding Remarks
This article demonstrated the steps in building and executing a training pipeline on Azure ML Studio and covered various important concepts in the world of ML pipelines. The second part of this article will cover the steps in deploying and consuming an ML pipeline on Azure. Please stay tuned.
If you made it to this section, I want to say a big thank you. I hope this article educated you as much as it educated me. Bye for now…