Azure Machine Learning

Shuvrajyoti Debroy
7 min read · May 29, 2023

End-to-End AzureML experiment steps — Notebook

Image Source: Salon

Introduction

Azure Machine Learning is a powerful cloud-based service provided by Microsoft that empowers organizations to build, deploy, and manage machine learning models at scale. With Azure Machine Learning, users can leverage its robust set of tools and services to streamline the end-to-end machine learning process. It offers a user-friendly interface that enables data scientists and developers to collaborate effectively, experiment with various algorithms, and create highly accurate predictive models.

It provides a range of features such as automated machine learning, which simplifies the model building process by automatically selecting the best algorithms and hyperparameters. Azure Machine Learning supports the deployment and management of models in various environments, including edge devices, Azure IoT, and Kubernetes clusters. This enables organizations to seamlessly transition their models from development to production, ensuring scalability and reliability.

With built-in monitoring and logging capabilities, users can continuously track model performance, detect anomalies, and retrain models as needed, ensuring optimal accuracy and efficiency. Azure Machine Learning provides a comprehensive solution for organizations looking to harness the power of machine learning and drive innovation in their data-driven endeavors.

Azure ML Notebook

In the Azure ML notebook, we are going to follow these steps to successfully run a machine learning experiment end-to-end —

Install dependencies
Import libraries
Connect to workspace
Create a compute cluster
Create a custom environment
— Create an environment file
— Register environment
Create a training script
Configure job/command
Submit the job
Outputs and results
Create a new online endpoint
Deploy the model to the endpoint
— Take the latest model
— Create an online “blue” deployment
Test the model
— Create a test data file
— Test the “blue” deployment
Clean up resources — delete the online endpoint

You can always configure the compute cluster, environment, training script, job, online endpoint, and test data file as per your requirements.

Install dependencies

The Azure ML core libraries need to be installed.

!pip install --upgrade azureml-core
!pip install azure-ai-ml
!pip install azure-identity

Import libraries

We need several libraries to interact with the Azure ML workspace, compute, environment, model, data, job and endpoint.

from azure.ai.ml import MLClient                           # Handle to the workspace
from azure.identity import DefaultAzureCredential          # Authentication package
from azure.identity import InteractiveBrowserCredential    # Authentication package
from azure.ai.ml.entities import AmlCompute                # Compute
from azure.ai.ml.entities import Environment               # Environment
from azure.ai.ml.entities import Model                     # Model
from azure.ai.ml import command                            # Job/command
from azure.ai.ml import Input                              # Data input
from azure.ai.ml.entities import ManagedOnlineEndpoint     # Managed online endpoint
from azure.ai.ml.entities import ManagedOnlineDeployment   # Managed online deployment
import uuid                                                # Create UUID
import os                                                  # System

Connect to workspace

First, we authenticate and connect to the Azure ML workspace with DefaultAzureCredential and MLClient.

# Authenticate
credential = DefaultAzureCredential()  # default credential

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)
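
The InteractiveBrowserCredential we imported earlier is useful as a fallback when DefaultAzureCredential cannot obtain a token (for example, when no Azure CLI login is available on the machine). A minimal sketch:

try:
    credential = DefaultAzureCredential()
    # Verify that the credential can actually obtain a token
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to an interactive browser sign-in
    credential = InteractiveBrowserCredential()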

Create a compute cluster

Azure ML Compute Cluster is a managed service in Microsoft Azure that enables users to create and manage clusters for executing machine learning workloads. It offers scalability, customization, and auto-scaling capabilities, optimizing resource utilization. The cluster simplifies distributed training and integrates with popular machine learning frameworks, providing an efficient infrastructure for model development and experimentation.

# Compute cluster name
cpu_compute_target = "cpu-cluster"

try:
    # Check if the compute target already exists
    cpu_cluster = ml_client.compute.get(cpu_compute_target)
    print(f"Cluster {cpu_compute_target} already exists! Reusing it...")

except Exception:
    # Create the Azure ML compute object with the intended parameters
    cpu_cluster = AmlCompute(
        name=cpu_compute_target,
        type="amlcompute",  # Azure ML Compute is the on-demand VM service
        size="STANDARD_DS3_V2",  # VM family
        min_instances=0,  # Minimum number of running nodes when no job is running
        max_instances=4,  # Maximum number of nodes in the cluster
        idle_time_before_scale_down=180,  # Seconds a node keeps running after job termination
        tier="Dedicated",  # Dedicated or LowPriority. The latter is cheaper but jobs may be preempted
    )

    print(f"Creating new AzureML compute cluster {cpu_cluster.name} with compute size {cpu_cluster.size} ...")
    cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster)
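
Note that begin_create_or_update returns a long-running operation poller rather than the compute object itself. To confirm the cluster is ready before moving on, you can re-fetch it; a small sketch that works whether the cluster was reused or just created:

# Re-fetch the compute target and check its provisioning state
cpu_cluster = ml_client.compute.get(cpu_compute_target)
print(f"Cluster {cpu_cluster.name} provisioning state: {cpu_cluster.provisioning_state}")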

Create a custom environment

Azure ML custom environment allows users to create tailored runtime environments for their machine learning projects. It enables customization of dependencies, packages, and configurations, ensuring reproducibility and flexibility. It supports various use cases such as training, inference, and script execution, enhancing collaboration and control over the software stack.

# Create a new directory for environment file
dependencies_dir = "./dependencies"
os.makedirs(dependencies_dir, exist_ok=True)

Create an environment file

%%writefile {dependencies_dir}/conda.yml
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy=1.21.2
  - pip=21.2.4
  - scikit-learn=0.24.2
  - scipy=1.7.1
  - pandas>=1.1,<1.2
  - pip:
      - inference-schema[numpy-support]==1.3.0
      - xlrd==2.0.1
      - mlflow==1.26.1
      - azureml-mlflow==1.42.0
      - psutil>=5.8,<5.9
      - tqdm>=4.59,<4.60
      - ipykernel~=6.0
      - matplotlib

Create and register environment

# Create and register environment
custom_env_name = "aml-scikit-learn"

pipeline_job_env = Environment(
    name=custom_env_name,
    description="Custom environment",
    tags={"scikit-learn": "0.24.2"},
    conda_file=os.path.join(dependencies_dir, "conda.yml"),
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",
)

pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)
print(
    f"Environment with name {pipeline_job_env.name} is registered to workspace, "
    f"the environment version is {pipeline_job_env.version}"
)

Create a training script

The training script contains our machine learning model training code, including data manipulation, encoding, train/test splitting, scaling, and hyperparameter initialization.

# Create a new directory for training script
train_src_dir = "./src"
os.makedirs(train_src_dir, exist_ok=True)

We can either write the training script in the notebook itself using the %%writefile magic, or upload it directly into the directory specified above. A sample training script:

%%writefile {train_src_dir}/main.py

# Import libraries
import os
import argparse
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Main function
def main():
    """Main function of the script."""

    # Input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--test_train_ratio", type=float, required=False, default=0.25)
    parser.add_argument("--n_estimators", required=False, default=100, type=int)
    parser.add_argument("--learning_rate", required=False, default=0.1, type=float)
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()

    # Start logging
    mlflow.start_run()

    # Enable autologging
    mlflow.sklearn.autolog()

    # Read data
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))
    print("input data:", args.data)
    df = pd.read_excel(args.data, header=1, index_col=0)

    # Log metrics
    mlflow.log_metric("num_samples", df.shape[0])
    mlflow.log_metric("num_features", df.shape[1] - 1)

    # Split data
    train_df, test_df = train_test_split(df, test_size=args.test_train_ratio)

    # Extract the label column
    y_train = train_df.pop("<column-name>")

    # Convert the dataframe values to an array
    X_train = train_df.values

    # Extract the label column
    y_test = test_df.pop("<column-name>")

    # Convert the dataframe values to an array
    X_test = test_df.values

    # Train model
    print(f"Training with data of shape {X_train.shape}")
    clf = GradientBoostingClassifier(
        n_estimators=args.n_estimators, learning_rate=args.learning_rate
    )
    clf.fit(X_train, y_train)

    # Predict results
    y_pred = clf.predict(X_test)

    # Classification report
    print(classification_report(y_test, y_pred))

    # Register the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=clf,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Save the model to a file
    mlflow.sklearn.save_model(
        sk_model=clf,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )

    # Stop logging
    mlflow.end_run()

if __name__ == "__main__":
    main()

Configure job/command

Let's create a job with hyperparameters and other variables to train our model.

# Job configuration
registered_model_name = "<model-name>"

job = command(
    inputs=dict(
        data=Input(
            type="uri_file",
            path="<file-path>",
        ),
        test_train_ratio=0.2,
        learning_rate=0.25,
        registered_model_name=registered_model_name,
    ),
    code="./src/",  # location of source code
    command="python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="aml-scikit-learn@latest",
    compute="cpu-cluster",
    experiment_name="<experiment-name>",
    display_name="<display-name>",
)

Submit the job

Submit the job with ml_client.create_or_update.

# Submit job
ml_client.create_or_update(job)
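
Alternatively, you can capture the job object that create_or_update returns and follow the run directly from the notebook; a small sketch (running this submits the job):

# Capture the submitted job and stream its logs until it completes
returned_job = ml_client.create_or_update(job)
print(f"Monitor the run in the studio: {returned_job.studio_url}")
ml_client.jobs.stream(returned_job.name)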

Outputs and results

Once the job completes, check the run outputs, logged metrics, and prediction results, either in Azure ML studio or through the SDK.
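
You can also verify from the notebook that the training script registered the model; a quick sketch listing the registered versions:

# List all registered versions of the model produced by the training script
for m in ml_client.models.list(name=registered_model_name):
    print(f"{m.name}, version {m.version}")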

Create a new online endpoint

Azure ML online endpoint is a feature in Azure Machine Learning that allows you to deploy and host machine learning models as web services, making them accessible for real-time predictions and inferences. It provides a scalable and reliable infrastructure to serve model predictions through HTTP or REST endpoints.

# Create a unique name for the endpoint
online_endpoint_name = "<name>-endpoint-" + str(uuid.uuid4())[:8]

# Create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is an online endpoint",
    auth_mode="key",
    tags={
        "key_1": "value_1",
        "key_2": "value_2",
    },
)
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Check endpoint status
print(f"Endpoint {endpoint.name} provisioning state: {endpoint.provisioning_state}")

# Retrieve endpoint
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)
print(f'Endpoint "{endpoint.name}" with provisioning state "{endpoint.provisioning_state}" is retrieved')

Deploy the model to the endpoint

Next, we deploy our trained model to the online endpoint so that its predictions can be accessed and used through an HTTP-based interface.

Take the latest model

# Get the latest model version
latest_model_version = max(
    [int(m.version) for m in ml_client.models.list(name=registered_model_name)]
)

# Latest model to deploy
model = ml_client.models.get(name=registered_model_name, version=latest_model_version)

Create an online deployment

# Create an online deployment
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
blue_deployment = ml_client.begin_create_or_update(blue_deployment).result()
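
Optionally, endpoint traffic can be routed to the new deployment so that requests which don't name a deployment explicitly are served by "blue"; a short sketch:

# Send 100% of the endpoint's scoring traffic to the "blue" deployment
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()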

Test the model

After deployment, we can send HTTP or REST requests to the endpoint URL, providing input data to get predictions or inferences in real time. Create a test data file in JSON format and invoke the online endpoint.

Create a test data file

Sample test data file …

# Create a new directory for deployment
deploy_dir = "./deploy"
os.makedirs(deploy_dir, exist_ok=True)

%%writefile {deploy_dir}/sample-request.json
{
  "input_data": {
    "columns": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
    "index": [0, 1],
    "data": [
      [20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],
      [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8]
    ]
  }
}

Test the blue deployment

# Test the blue deployment with some sample data
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    request_file="./deploy/sample-request.json",
    deployment_name="blue",
)
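
Since the endpoint is a plain HTTPS service, the same request can also be sent from any REST client outside the SDK. A minimal sketch using the requests library (the scoring URI and auth key are fetched from the endpoint itself):

import requests  # generic HTTP client, not part of the Azure SDK

# Fetch the scoring URI and the primary auth key for the endpoint
scoring_uri = ml_client.online_endpoints.get(name=online_endpoint_name).scoring_uri
key = ml_client.online_endpoints.get_keys(name=online_endpoint_name).primary_key

headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}
with open("./deploy/sample-request.json") as f:
    response = requests.post(scoring_uri, data=f.read(), headers=headers)
print(response.json())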

Clean up resources

Delete the online endpoint when you are finished, since the compute behind a running endpoint continues to incur cost.

# Delete the endpoint
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)
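
If you are done experimenting, the compute cluster can be removed as well (it scales down to zero nodes when idle, so this step is optional); a sketch:

# Optionally delete the compute cluster too
ml_client.compute.begin_delete(name=cpu_compute_target).result()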

Full Code at GitHub

You can get the full code in my GitHub repository.
