Azure Machine Learning

Shuvrajyoti Debroy
7 min read · May 29, 2023

End-to-End AzureML experiment steps — Notebook

Image Source: Salon

Introduction

Azure Machine Learning is a powerful cloud-based service provided by Microsoft that empowers organizations to build, deploy, and manage machine learning models at scale. With Azure Machine Learning, users can leverage its robust set of tools and services to streamline the end-to-end machine learning process. It offers a user-friendly interface that enables data scientists and developers to collaborate effectively, experiment with various algorithms, and create highly accurate predictive models.

It provides a range of features such as automated machine learning, which simplifies the model building process by automatically selecting the best algorithms and hyperparameters. Azure Machine Learning supports the deployment and management of models in various environments, including edge devices, Azure IoT, and Kubernetes clusters. This enables organizations to seamlessly transition their models from development to production, ensuring scalability and reliability.

With built-in monitoring and logging capabilities, users can continuously track model performance, detect anomalies, and retrain models as needed, ensuring optimal accuracy and efficiency. Azure Machine Learning provides a comprehensive solution for organizations looking to harness the power of machine learning and drive innovation in their data-driven endeavors.

Azure ML Notebook

In the Azure ML notebook, we are going to follow these steps to successfully run a machine learning experiment end-to-end —

Install dependencies
Import libraries
Connect to workspace
Create a compute cluster
Create a custom environment
— Create an environment file
— Register environment
Create a training script
Configure job/command
Submit the job
Outputs and results
Create a new online endpoint
Deploy the model to the endpoint
— Take the latest model
— Create an online “blue” deployment
Test the model
— Create a test data file
— Test the “blue” deployment
Clean up resources — delete the online endpoint

You can always configure the compute cluster, environment, training script, job, online endpoint, and test data file as per your requirements.

Install dependencies

The Azure ML core libraries need to be installed.

!pip install --upgrade azureml-core
!pip install azure-ai-ml
!pip install azure-identity

Import libraries

We need several libraries to interact with the Azure ML workspace, compute, environment, model, data, job and endpoint.

from azure.ai.ml import MLClient                           # Handle to the workspace
from azure.identity import DefaultAzureCredential          # Authentication package
from azure.identity import InteractiveBrowserCredential    # Authentication package
from azure.ai.ml.entities import AmlCompute                # Compute
from azure.ai.ml.entities import Environment               # Environment
from azure.ai.ml.entities import Model                     # Model
from azure.ai.ml import command                            # Job/command
from azure.ai.ml import Input                              # Data input
from azure.ai.ml.entities import ManagedOnlineEndpoint     # Managed online endpoint
from azure.ai.ml.entities import ManagedOnlineDeployment   # Managed online deployment
import uuid                                                # Create UUID
import os                                                  # System

Connect to workspace

First, we authenticate and connect to the Azure ML workspace with DefaultAzureCredential and MLClient.

# Authenticate
credential = DefaultAzureCredential()  # default credential

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)
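
The InteractiveBrowserCredential we imported earlier is useful as a fallback when DefaultAzureCredential cannot obtain a token (for example, when no Azure CLI login is available on the machine). A minimal sketch:

try:
    credential = DefaultAzureCredential()
    # Verify that the credential can actually obtain a token
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to an interactive browser sign-in
    credential = InteractiveBrowserCredential()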

Create a compute cluster

Azure ML Compute Cluster is a managed service in Microsoft Azure that enables users to create and manage clusters for executing machine learning workloads. It offers scalability, customization, and auto-scaling capabilities, optimizing resource utilization. The cluster simplifies distributed training and integrates with popular machine learning frameworks, providing an efficient infrastructure for model development and experimentation.

# Compute cluster name
cpu_compute_target = "cpu-cluster"

try:
    # Check if the compute target already exists
    cpu_cluster = ml_client.compute.get(cpu_compute_target)
    print(f"Cluster {cpu_compute_target} already exists! Reusing it...")

except Exception:
    # Create the Azure ML compute object with the intended parameters
    cpu_cluster = AmlCompute(
        name=cpu_compute_target,
        type="amlcompute",  # Azure ML Compute is the on-demand VM service
        size="STANDARD_DS3_V2",  # VM family
        min_instances=0,  # Minimum number of running nodes when no job is running
        max_instances=4,  # Maximum number of nodes in the cluster
        idle_time_before_scale_down=180,  # Seconds a node keeps running after job termination
        tier="Dedicated",  # Dedicated or LowPriority. The latter is cheaper but jobs may be preempted
    )

    print(f"Creating new AzureML compute cluster {cpu_cluster.name} with compute size {cpu_cluster.size} ...")
    cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster)
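
Note that begin_create_or_update returns a long-running operation poller rather than the compute object itself. To confirm the cluster is ready before moving on, you can re-fetch it; a small sketch that works whether the cluster was reused or just created:

# Re-fetch the compute target and check its provisioning state
cpu_cluster = ml_client.compute.get(cpu_compute_target)
print(f"Cluster {cpu_cluster.name} provisioning state: {cpu_cluster.provisioning_state}")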

Create a custom environment

Azure ML custom environment allows users to create tailored runtime environments for their machine learning projects. It enables customization of dependencies, packages, and configurations, ensuring reproducibility and flexibility. It supports various use cases such as training, inference, and script execution, enhancing collaboration and control over the software stack.

# Create a new directory for environment file
dependencies_dir = "./dependencies"
os.makedirs(dependencies_dir, exist_ok=True)

Create an environment file

%%writefile {dependencies_dir}/conda.yml
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy=1.21.2
  - pip=21.2.4
  - scikit-learn=0.24.2
  - scipy=1.7.1
  - pandas>=1.1,<1.2
  - pip:
      - inference-schema[numpy-support]==1.3.0
      - xlrd==2.0.1
      - mlflow==1.26.1
      - azureml-mlflow==1.42.0
      - psutil>=5.8,<5.9
      - tqdm>=4.59,<4.60
      - ipykernel~=6.0
      - matplotlib

Create and register environment

# Create and register environment
custom_env_name = "aml-scikit-learn"

pipeline_job_env = Environment(
    name=custom_env_name,
    description="Custom environment",
    tags={"scikit-learn": "0.24.2"},
    conda_file=os.path.join(dependencies_dir, "conda.yml"),
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",
)

pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)
print(
    f"Environment with name {pipeline_job_env.name} is registered to workspace, "
    f"the environment version is {pipeline_job_env.version}"
)

Create a training script

The training script contains our machine learning model training code, including data manipulation, encoding, train/test splitting, scaling, and hyperparameter initialization.

# Create a new directory for training script
train_src_dir = "./src"
os.makedirs(train_src_dir, exist_ok=True)

We can either write the training script in the notebook itself using the %%writefile magic, or upload it directly into the directory specified above. A sample training script:

%%writefile {train_src_dir}/main.py

# Import libraries
import os
import argparse
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Main function
def main():
    """Main function of the script."""

    # Input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--test_train_ratio", type=float, required=False, default=0.25)
    parser.add_argument("--n_estimators", required=False, default=100, type=int)
    parser.add_argument("--learning_rate", required=False, default=0.1, type=float)
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()

    # Start logging
    mlflow.start_run()

    # Enable autologging
    mlflow.sklearn.autolog()

    # Read data
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))
    print("input data:", args.data)
    df = pd.read_excel(args.data, header=1, index_col=0)

    # Log metrics
    mlflow.log_metric("num_samples", df.shape[0])
    mlflow.log_metric("num_features", df.shape[1] - 1)

    # Split data
    train_df, test_df = train_test_split(df, test_size=args.test_train_ratio)

    # Extract the label column
    y_train = train_df.pop("<column-name>")

    # Convert the dataframe values to an array
    X_train = train_df.values

    # Extract the label column
    y_test = test_df.pop("<column-name>")

    # Convert the dataframe values to an array
    X_test = test_df.values

    # Train model
    print(f"Training with data of shape {X_train.shape}")
    clf = GradientBoostingClassifier(
        n_estimators=args.n_estimators, learning_rate=args.learning_rate
    )
    clf.fit(X_train, y_train)

    # Predict results
    y_pred = clf.predict(X_test)

    # Classification report
    print(classification_report(y_test, y_pred))

    # Register the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=clf,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Save the model to a file
    mlflow.sklearn.save_model(
        sk_model=clf,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )

    # Stop logging
    mlflow.end_run()

if __name__ == "__main__":
    main()

Configure job/command

Let's create a job with hyperparameters and other variables to train our model.

# Job configuration
registered_model_name = "<model-name>"

job = command(
    inputs=dict(
        data=Input(
            type="uri_file",
            path="<file-path>",
        ),
        test_train_ratio=0.2,
        learning_rate=0.25,
        registered_model_name=registered_model_name,
    ),
    code="./src/",  # location of source code
    command="python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="aml-scikit-learn@latest",
    compute="cpu-cluster",
    experiment_name="<experiment-name>",
    display_name="<display-name>",
)

Submit the job

Submit the job with ml_client.create_or_update.

# Submit job
ml_client.create_or_update(job)
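
Alternatively, you can capture the job object that create_or_update returns and follow the run directly from the notebook; a small sketch (running this submits the job):

# Capture the submitted job and stream its logs until it completes
returned_job = ml_client.create_or_update(job)
print(f"Monitor the run in the studio: {returned_job.studio_url}")
ml_client.jobs.stream(returned_job.name)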

Outputs and results

Once the job completes, check the run outputs, logged metrics, and prediction results, either in Azure ML studio or through the SDK.
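
You can also verify from the notebook that the training script registered the model; a quick sketch listing the registered versions:

# List all registered versions of the model produced by the training script
for m in ml_client.models.list(name=registered_model_name):
    print(f"{m.name}, version {m.version}")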

Create a new online endpoint

Azure ML online endpoint is a feature in Azure Machine Learning that allows you to deploy and host machine learning models as web services, making them accessible for real-time predictions and inferences. It provides a scalable and reliable infrastructure to serve model predictions through HTTP or REST endpoints.

# Create a unique name for the endpoint
online_endpoint_name = "<name>-endpoint-" + str(uuid.uuid4())[:8]

# Create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is an online endpoint",
    auth_mode="key",
    tags={
        "key_1": "value_1",
        "key_2": "value_2",
    },
)
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Check endpoint status
print(f"Endpoint {endpoint.name} provisioning state: {endpoint.provisioning_state}")

# Retrieve endpoint
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)
print(f'Endpoint "{endpoint.name}" with provisioning state "{endpoint.provisioning_state}" is retrieved')

Deploy the model to the endpoint

Next, we deploy our trained model to the online endpoint so that its predictions can be accessed and used through an HTTP-based interface.

Take the latest model

# Get the latest model version
latest_model_version = max(
    [int(m.version) for m in ml_client.models.list(name=registered_model_name)]
)

# Latest model to deploy
model = ml_client.models.get(name=registered_model_name, version=latest_model_version)

Create an online deployment

# Create an online deployment
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
blue_deployment = ml_client.begin_create_or_update(blue_deployment).result()
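
Optionally, endpoint traffic can be routed to the new deployment so that requests which don't name a deployment explicitly are served by "blue"; a short sketch:

# Send 100% of the endpoint's scoring traffic to the "blue" deployment
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()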

Test the model

After deployment, we can send HTTP or REST requests to the endpoint URL, providing input data to get predictions or inferences in real time. Create a test data file in JSON format and invoke the online endpoint.

Create a test data file

Sample test data file …

# Create a new directory for deployment
deploy_dir = "./deploy"
os.makedirs(deploy_dir, exist_ok=True)

%%writefile {deploy_dir}/sample-request.json
{
  "input_data": {
    "columns": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
    "index": [0, 1],
    "data": [
      [20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],
      [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8]
    ]
  }
}

Test the blue deployment

# Test the blue deployment with some sample data
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    request_file="./deploy/sample-request.json",
    deployment_name="blue",
)
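
Since the endpoint is a plain HTTPS service, the same request can also be sent from any REST client outside the SDK. A minimal sketch using the requests library (the scoring URI and auth key are fetched from the endpoint itself):

import requests  # generic HTTP client, not part of the Azure SDK

# Fetch the scoring URI and the primary auth key for the endpoint
scoring_uri = ml_client.online_endpoints.get(name=online_endpoint_name).scoring_uri
key = ml_client.online_endpoints.get_keys(name=online_endpoint_name).primary_key

headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}
with open("./deploy/sample-request.json") as f:
    response = requests.post(scoring_uri, data=f.read(), headers=headers)
print(response.json())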

Clean up resources

Delete the online endpoint when you are finished, since the compute behind a running endpoint continues to incur cost.

# Delete the endpoint
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)
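
If you are done experimenting, the compute cluster can be removed as well (it scales down to zero nodes when idle, so this step is optional); a sketch:

# Optionally delete the compute cluster too
ml_client.compute.begin_delete(name=cpu_compute_target).result()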

Full Code at GitHub

You can get the full code in my GitHub repository.
