How to deploy ML models in Azure

Three of the countless ways to deploy ML models in Azure

Hans Weda
rond blog
14 min read · Feb 13, 2024


Introduction

Most of the reports and dashboards provide insights into the past: the last quarterly figures, annual reports or the number of customers in the last month. However, in many cases it is of interest to look ahead: what will probably happen in the months, quarters or years to come? It becomes even more interesting to calculate different scenarios and recommend the best strategy for your company or institution. The data analysis then moves up the value chain from descriptive, through predictive, towards prescriptive.

Most dashboarding software has little or no possibility to integrate machine learning (ML) models. Options to integrate ML models in those software programs have recently started to appear, but they are usually confined to rather standard types such as classification and regression models. While such standard types can be used in a lot of situations, there is also a vast group of real-world cases that require more targeted models, for instance survival (time-to-event) models such as the one used in this blog.

Such more targeted models do not exist (yet) in many dashboarding software programs. In those cases the model can be custom-made and deployed through an API using a cloud service; the API can subsequently be consumed by the dashboard. In this blog I focus on Azure, a frequently used cloud service, and discuss how an ML model can be deployed using an API hosted by Azure.

Azure cloud options

There are many ways to deploy an ML model through an API in Azure, each with its own features, efficiency and cost. For this blog I have explored three options:

  1. Azure Functions
  2. Azure Web App (via Docker)
  3. Azure Databricks

This list is by no means exhaustive; many other options, among which Azure Machine Learning, are still on my to-do list.

Approach

We need to build an ML model before we can actually deploy it. Run-of-the-mill regression or classification models are just too easy, and besides, they sometimes already exist in dashboarding software programs. Therefore I have chosen a Weibull Accelerated Failure Time (AFT) model to model customer churn on the infamous Telcom dataset. This dataset includes information on the demographics, account details and services of customers of a fictional telecom company:

  • Demographic info about customers, such as gender, age range, and if they have partners;
  • Customer account information such as the duration of being a customer, type of contract, payment method, and monthly charges;
  • Services that each customer has signed up for such as multiple phone lines, internet, online security, online backup, and device protection;
  • Whether the customer has left within the last month (churn).

The goal of the ML model is to predict when a customer is likely to churn and understand the driving factors for churn. Ultimately, as a company, one would like to tune the driving factors to minimize churn and retain the customer.
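
For readers unfamiliar with this model type: the Weibull AFT model describes the probability that a customer is still subscribed at time t. A rough sketch of the parametrization used by lifelines, with the covariates x entering through the scale parameter, is:

$$
S(t \mid x) = \exp\left[-\left(\frac{t}{\lambda(x)}\right)^{\rho}\right],
\qquad \lambda(x) = \exp\left(\beta_0 + \beta^{\top} x\right)
$$

The predict_percentile method used later in this blog inverts this relation: for a chosen survival percentile p it returns the time $t_p = \lambda(x)\,(-\ln p)^{1/\rho}$ at which the predicted survival probability drops to p.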

I skip the exploratory data analysis for now, as this has been done many times already for this dataset, and it is not the focus of the current blog. An example of building and determining the optimal model was detailed in a previous blog.

The code below shows the Python function to build the model. The pipeline consists of a sklearn OneHotEncoder followed by the lifelines Weibull model adapted to the sklearn interface.

import pandas as pd
from lifelines import WeibullAFTFitter
from lifelines.utils.sklearn_adapter import sklearn_adapter
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def build_model(df: pd.DataFrame):
    """
    Build a Weibull churn model.

    :param df: pd.DataFrame to build the model from
    :return: trained model (sklearn pipeline)
    """

    # train/test split
    random_state = 468
    df_train, df_test = train_test_split(df, test_size=0.2, random_state=random_state)

    # one-hot encoding of the categorical columns
    ohe = OneHotEncoder(sparse_output=False, drop="first")

    dummies = [
        'partner',
        'onlinesecurity',
        'onlinebackup',
        'contract',
        'paymentmethod',
        'churn'
    ]

    ct = ColumnTransformer(
        transformers=[('encode_cats', ohe, dummies), ],
        remainder='passthrough',
        verbose_feature_names_out=False
    ).set_output(transform="pandas")

    # Weibull fitting
    # Please note that the sklearn_adapter ceases to exist in lifelines version >= 0.28.
    # In that case the preprocessing and model building probably need to be applied separately,
    # without use of the sklearn pipeline.
    waft = sklearn_adapter(WeibullAFTFitter, event_col="churn_Yes", predict_method="predict_percentile")

    # create pipeline
    pl = Pipeline([
        ("preprocessing", ct),
        ("waft", waft())
    ])

    # fit on the training split and report the concordance index on the test split
    pl.fit(df_train.drop("tenure", axis=1), df_train['tenure'])
    print("Concordance index Weibull AFT: {}".format(pl.score(df_test.drop("tenure", axis=1), df_test["tenure"])))

    # create the final model by refitting on the full dataset
    pl.fit(df.drop("tenure", axis=1), df["tenure"])

    return pl
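
As a minimal usage sketch (the CSV file name and the lower-cased column names are assumptions on my side, matching the columns used above), training and saving the pipeline could look like this:

import pickle

import pandas as pd

# load the Telcom dataset; the path and column casing are assumptions for this sketch
df = pd.read_csv("telcom_churn.csv")
df.columns = df.columns.str.lower()

# keep only the columns the pipeline expects
df = df[[
    "partner", "onlinesecurity", "onlinebackup", "contract",
    "paymentmethod", "churn", "monthlycharges", "tenure"
]]

# train the pipeline and persist it so the API can load it later
model = build_model(df)
with open("churn_model.pickle", "wb") as f:
    pickle.dump(model, f)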

The resulting model is saved to disk and subsequently served through an API using the Python FastAPI package as follows:

# main.py

import pickle
from enum import Enum
from typing import List

import pandas as pd
from fastapi import FastAPI, Query
from fastapi.responses import RedirectResponse
from pydantic import BaseModel, Field
from lifelines.utils.sklearn_adapter import sklearn_adapter
from lifelines import WeibullAFTFitter


app = FastAPI(
    title="Churn ML api from docker",
    description=(
        "An api to deploy a simple survival regression model on the Telcom dataset. "
    ),
    version="0.1"
)

filename = 'churn_model.pickle'

# need to re-create the lifelines fitter class before unpickling the model
sklearn_adapter(WeibullAFTFitter, event_col="churn_Yes", predict_method="predict_percentile")


class Partner(str, Enum):
    yes = "Yes"
    no = "No"


class OnlineSecurity(str, Enum):
    yes = "Yes"
    no = "No"


class OnlineBackup(str, Enum):
    yes = "Yes"
    no = "No"


class Churn(str, Enum):
    yes = "Yes"
    no = "No"


class Contract(str, Enum):
    one_year = 'One year'
    two_year = 'Two year'


class PaymentMethod(str, Enum):
    creditcard = "Credit card (automatic)"
    electronic = "Electronic check"
    mailed = "Mailed check"


class Customer(BaseModel):
    partner: Partner
    onlinesecurity: OnlineSecurity
    onlinebackup: OnlineBackup
    contract: Contract
    paymentmethod: PaymentMethod
    churn: Churn
    monthlycharges: float = Field(ge=0)
    tenure: int = Field(ge=0)


@app.get("/", include_in_schema=False)
def home():
    return RedirectResponse("/docs")


@app.post("/model_predict")
async def get_predictions(
    customers: List[Customer],
    percentile: float = Query(title="prediction", default=0.8, ge=0, le=1)
) -> List[float]:
    """
    Return predictions for the Telcom dataset. The request body consists of a list of customers.
    """

    # load the model from disk
    with open(filename, 'rb') as f:
        loaded_model = pickle.load(f)

    # create a data-frame from the request body
    df = pd.DataFrame([a.dict() for a in customers])

    # calculate predictions

    # When the data-frame consists of one row, the prediction fails.
    # Therefore, a dirty trick: duplicate the row and keep only the first prediction.
    if df.shape[0] == 1:
        preds = loaded_model.predict(pd.concat([df] * 2, ignore_index=True), p=percentile)[0]
    else:
        preds = loaded_model.predict(df, p=percentile)

    # make sure preds is always a pandas Series
    preds = pd.Series(preds)

    # replace infinity, since JSON cannot represent it
    with pd.option_context('mode.use_inf_as_na', True):
        preds = preds.fillna(-999)

    return preds.tolist()

Note that this code defines the request body of the POST endpoint as a list of customers, described by the Customer class, which in turn builds on a number of small Enum classes. This structure forces the end user to supply suitable data and returns an informative error in case of input mismatches.

Note also the dirty trick that is applied when the request body consists of a single customer; it works around a bug in lifelines.

When the API is started locally using the command uvicorn main:app --reload, the interactive (Swagger) documentation appears at http://localhost:8000/docs.
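
As an illustration, a request against the locally running API could look like the sketch below (the field values are made-up examples; uvicorn serves on port 8000 by default):

import requests

# one example customer matching the Customer schema defined in main.py
customer = {
    "partner": "Yes",
    "onlinesecurity": "No",
    "onlinebackup": "Yes",
    "contract": "One year",
    "paymentmethod": "Electronic check",
    "churn": "No",
    "monthlycharges": 70.35,
    "tenure": 12,
}

# the request body is a list of customers; the percentile is passed as a query parameter
response = requests.post(
    "http://localhost:8000/model_predict",
    params={"percentile": 0.8},
    json=[customer],
)
print(response.json())  # a list with one predicted churn time per customer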

Now that we have a functioning FastAPI app, we can work out different ways to deploy this API in Azure.

Approach 1: Azure Web App

In this approach we create a Docker container containing the FastAPI app, which is subsequently placed in an Azure Web App. The big advantage of this approach is portability: the Docker container can just as easily be deployed in any other cloud service.

For this approach we need two extra files: requirements.txt and Dockerfile. The requirements.txt specifies the extra Python packages that need to be installed in the Docker container. The versions need to match the list below rather precisely; otherwise this approach does not work.

# requirements.txt
fastapi>=0.100.0,<0.101.0
pandas>=1.5.0,<1.6.0
lifelines>=0.27.0,<0.28.0
uvicorn>=0.22.0,<0.23.0
scikit-learn==1.2.0

The Dockerfile specifies how the Docker container needs to be built.

# Dockerfile
FROM python:3.10

WORKDIR /code

COPY ./requirements.txt /code/requirements.txt

RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt

# copy all files
COPY . /code/

# start the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]

To get started

You need Docker and the Azure CLI to get started:

  1. Install Docker
  2. Install Azure CLI

Setting up Azure

Before deploying to Azure, one needs to set up the Azure environment.
The Docker container containing the API code will be constructed later.

  1. Start the Azure CLI (type `CMD` in the Windows search bar) in the directory that contains the Python code.
  2. Log in using the command az login with your own user credentials.
  3. Create a resource group using
    az group create --name great_resource_group_name --location westeurope
  4. Create an Azure container registry, which is needed to push the Docker container into:
    az acr create -n greatnewcontainerregistry -g great_resource_group_name --sku Basic --admin-enabled true
  5. Create an App Service plan, which determines the costs related to deploying the API:
    az appservice plan create -g great_resource_group_name -n great_appservice_plan --sku FREE --is-linux
  6. Create the web app that makes the container available to the outside world:
    az webapp create -g great_resource_group_name -p great_appservice_plan -n churn-api -i greatnewcontainerregistry.azurecr.io/churn

Note that the creation of the Azure resources can also be done using the Azure portal in a web browser. However, using the Azure CLI makes it easier to communicate and reproduce the needed steps.

Building docker

  1. Move to the deployment_docker folder.
  2. Build a docker image using docker build -t churn . (note the trailing dot: it is the build context).
  3. Start a docker container based on the churn image
    using docker run -dp 8000:80 churn.
  4. Verify the container works properly by accessing http://localhost:8000/.

Deploy to Azure

Steps needed to deploy on Azure (assuming the web-app and container registry have been created already):

  1. Retag the image by docker tag churn greatnewcontainerregistry.azurecr.io/churn.
  2. Log in with az login using your own credentials in the newly opened web page.
  3. Also log in at the container registry (if needed) using az acr login --name greatnewcontainerregistry.
  4. Push to Azure container registry by docker push greatnewcontainerregistry.azurecr.io/churn.
  5. Verify that the container is successfully deployed by accessing https://churn-api.azurewebsites.net/

Notes

The versions of scikit-learn and lifelines in the container need to match the versions used to train the model rather precisely; otherwise the pickled model does not load and run properly.

Approach 2: Azure Functions

In this approach we create a serverless Azure function to deploy the API.

For this approach, we need a couple of extra files as well. It is useful to split the FastAPI code and the Azure Functions code. The FastAPI code (in main.py, see above) can then be tested and debugged separately, e.g. by running uvicorn main:app --reload. The FastAPI app is then imported in a rather short function_app.py, which is the default starting point for Azure Functions.

"""
function_app.py
Used for creating functionapp
"""

import azure.functions as func
from main import app as fast_app

# create the app as Azure function
app = func.AsgiFunctionApp(app=fast_app, http_auth_level=func.AuthLevel.ANONYMOUS)

The Azure function also needs a host.json file. In particular, make sure to set "routePrefix": "" for the API routes to work correctly.

{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[4.*, 5.0.0)"
  },
  "extensions": {
    "http": {
      "routePrefix": ""
    }
  }
}

To test the Azure function locally, a local.settings.json file needs to be created; secret keys can also be set in it (see the sketch after the .funcignore listing below). Note that this file should be excluded from version control and should also not be included in the files submitted to the Azure function. The latter can be arranged in .funcignore:

.git*
.vscode
__azurite_db*__.json
__blobstorage__
__queuestorage__
local.settings.json
test
.venv
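
For reference, a minimal local.settings.json for local testing could look like the sketch below. The storage setting is an assumption on my side; the feature flag is discussed in the deployment steps further down.

{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "AzureWebJobsFeatureFlags": "EnableWorkerIndexing"
  }
}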

Finally we need a requirements.txt file to specify the extra Python packages that need to be installed.

# DO NOT include azure-functions-worker in this file
# The Python Worker is managed by Azure Functions platform
# Manually managing azure-functions-worker may cause unexpected issues

azure-functions>=1.15.0
fastapi>=0.100.0,<0.101.0
pandas>=1.5.0,<1.6.0
lifelines>=0.27.0,<0.28.0
uvicorn>=0.22.0,<0.23.0
scikit-learn==1.2.0

To get started

You need the Azure CLI and the Azure Functions Core Tools to get started:

  1. Install the Azure CLI.
  2. Install the Azure Functions Core Tools (the func command).

Setting up Azure

Before deploying to Azure, one needs to set up the Azure environment.

  1. Start the Azure CLI (type CMD in the Windows search bar) in the directory deployment_function.
  2. Log in using the command az login with your own user credentials.
  3. Create a resource group using
    az group create --name super_resource_group_name --location westeurope
  4. Create an Azure storage account in the resource group, which is needed for storing the files:
    az storage account create -n superstorageaccount9 -g super_resource_group_name --location westeurope --sku Standard_LRS

Building environment

  1. Move to the deployment_function folder.
  2. Create a virtual environment using py -m venv .venv.
  3. Activate the environment using .venv\Scripts\activate.
  4. Install the needed packages by pip install -r requirements.txt.
  5. Check whether the function works locally using func start; the locally running API can then be found at http://localhost:7071/.

Deploy to Azure

Steps needed to deploy on Azure Functions (assuming the resource group and storage account are already available):

  1. The function app can be created using
    az functionapp create --resource-group super_resource_group_name --consumption-plan-location westeurope --runtime python --runtime-version 3.10 --functions-version 4 --name superfunctionname --os-type linux --storage-account superstorageaccount9
  2. The function can then be pushed to Azure by func azure functionapp publish superfunctionname
  3. Important: the Azure function has configuration settings (application settings). For the API to work, the setting AzureWebJobsFeatureFlags has to be set to EnableWorkerIndexing. Alternatively, one can use the --publish-local-settings option when publishing the app, which replaces the configuration in Azure with the local settings from local.settings.json.
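
For reference, setting this flag from the CLI could look as follows (a sketch, reusing the resource names from above):
    az functionapp config appsettings set -g super_resource_group_name -n superfunctionname --settings AzureWebJobsFeatureFlags=EnableWorkerIndexing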

Notes

The versions of scikit-learn and lifelines need to match the versions used to train the model rather precisely; otherwise the pickled model does not load and run properly.

Approach 3: Azure Databricks

One can also use Azure Databricks to build and deploy a model. Azure Databricks provides an integrated development and deployment environment: the code is developed in notebooks through a web browser, and Databricks presumably creates a Docker container behind the scenes for deployment.

Setting up Databricks

  1. Create a Databricks workspace in a new resource group using the Azure portal. Note that you need at least the ‘premium’ pricing tier; deployment does not work in the trial version.
  2. Open Databricks in the portal
  3. Launch the workspace
  4. Create a personal compute cluster (needed for training and building the model). Make sure it uses an ML runtime so that the mlflow library works out of the box, e.g. 13.2 ML (Scala 2.12, Spark 3.4.0).
  5. Install the needed libraries on the compute cluster; this seems most practical to me. One can also install them in the notebook, but then they are only available in that notebook and have to be reinstalled at each new run (see the sketch below this list). We need to install at least lifelines==0.27.8 and scikit-learn==1.3.0.
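
If you prefer notebook-scoped installs instead, the first cell of the notebook could contain something like the following sketch (with the versions mentioned above):

%pip install lifelines==0.27.8 scikit-learn==1.3.0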

Build the model within Databricks

The data loading and cleaning are pretty standard, the same as in the previous approaches. The building of the final model is somewhat different; we have to use the mlflow methods to properly build and log the model.

import cloudpickle
import lifelines
import mlflow
import mlflow.pyfunc
import pandas as pd
import sklearn
from lifelines import WeibullAFTFitter
from lifelines.utils.sklearn_adapter import sklearn_adapter
from mlflow.models.signature import infer_signature
from mlflow.utils.environment import _mlflow_conda_env


# The predict method of the lifelines sklearn_adapter needs to be recreated when the model is loaded.
# The following wrapper class, SklearnModelWrapper, takes care of that.
class SklearnModelWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, model):
        self.model = model

        # need to re-create the lifelines fitter class
        sklearn_adapter(WeibullAFTFitter, event_col="churn_Yes", predict_method="predict_percentile")

    def predict(self, context, model_input):
        return self.model.predict(model_input)


# pl (the pipeline) and df (the cleaned dataset) are defined in the preceding cells
with mlflow.start_run(run_name="final_waft") as run:

    # create final model
    pl.fit(df.drop("tenure", axis=1), df["tenure"])

    # Save the run information to register the model later
    waft_uri = run.info.artifact_uri

    # Log the model with a signature that defines the schema of the model's inputs and outputs.
    # When the model is deployed, this signature will be used to validate inputs.
    signature = infer_signature(df.drop("tenure", axis=1), pl.predict(df.drop("tenure", axis=1)))

    # create the wrapped model
    wrappedModel = SklearnModelWrapper(pl)

    # MLflow contains utilities to create a conda environment used to serve models.
    # The necessary dependencies are added to a conda.yaml file which is logged along with the model.
    conda_env = _mlflow_conda_env(
        additional_conda_deps=None,
        additional_pip_deps=[
            "cloudpickle=={}".format(cloudpickle.__version__),
            "scikit-learn=={}".format(sklearn.__version__),
            "lifelines=={}".format(lifelines.__version__),
            "pandas=={}".format(pd.__version__)
        ],
        additional_conda_channels=None,
    )

    # Log model
    mlflow.pyfunc.log_model("waft_model_final", python_model=wrappedModel, conda_env=conda_env, signature=signature)

Now we need to register the model so that it can be deployed later on.

By registering this model in the Model Registry, it can easily be referenced from anywhere within Databricks. Although the registration can also be done in the UI, for reproducibility we register the model programmatically in this blog.

import time

print(mlflow.search_runs(filter_string='tags.mlflow.runName = "final_waft"'))

# take the latest run
run_id = mlflow.search_runs(filter_string='tags.mlflow.runName = "final_waft"').iloc[0].run_id

# If you see the error "PERMISSION_DENIED: User does not have any permission level assigned to the registered model",
# the cause may be that a model already exists with the same name. Try using a different name.
model_name = "churn_prediction"
model_version = mlflow.register_model(f"runs:/{run_id}/waft_model_final", model_name)

# Registering the model takes a few seconds, so add a small delay
time.sleep(15)

The model should now be visible in the Models tab of the UI — click the Models icon in the left sidebar.

Next, we can transition this model to production and load it into this notebook from the Model Registry. This transition seems to be optional; it is not strictly necessary for the next steps, but nice to do.

from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name=model_name,
    version=model_version.version,
    stage="Production",
)
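
To check that the registered model behaves as expected, it can be loaded back from the Model Registry inside the notebook. A small sketch, assuming model_name and df from the cells above:

import mlflow.pyfunc

# load the production version of the registered model from the Model Registry
loaded = mlflow.pyfunc.load_model(f"models:/{model_name}/Production")

# sanity check: predicted churn times for the first few customers
preds = loaded.predict(df.drop("tenure", axis=1))
print(preds[:5])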

Deploy the model from Databricks

Now we can deploy the model to an endpoint so that other applications can make use of it through a REST API. To do so, you need a key or token to issue requests to the model endpoint. You can generate a token from the User Settings page (click Settings in the left sidebar). The token needs to be copied into the environment as shown below.

import os
os.environ["DATABRICKS_TOKEN"] = "<replace by secret token>"

To deploy the model, take the following steps

  1. Click Models in the left sidebar and navigate to the registered waft model. Click the serving tab, and then click Enable Serving.
  2. Then, under Call The Model, click the Python button to display a Python code snippet to issue requests. Copy the code into this notebook. It should look similar to the code below.
import os
import requests
import numpy as np
import pandas as pd
import json


def create_tf_serving_json(data):
    return {'inputs': {name: data[name].tolist() for name in data.keys()} if isinstance(data, dict) else data.tolist()}


def score_model(dataset):
    url = 'https://adb-3435937677882703.3.azuredatabricks.net/serving-endpoints/waft/invocations'
    headers = {'Authorization': f'Bearer {os.environ.get("DATABRICKS_TOKEN")}',
               'Content-Type': 'application/json'}
    ds_dict = {'dataframe_split': dataset.to_dict(orient='split')} if isinstance(dataset, pd.DataFrame) else create_tf_serving_json(dataset)
    data_json = json.dumps(ds_dict, allow_nan=True)
    response = requests.request(method='POST', headers=headers, url=url, data=data_json)
    if response.status_code != 200:
        raise Exception(f'Request failed with status {response.status_code}, {response.text}')

    return response.json()
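
As an illustration of how the copied snippet can be used (the field values are made-up; the columns match what the model was trained on, i.e. everything except tenure):

# a small DataFrame with the same columns the model was trained on
example = pd.DataFrame([{
    "partner": "Yes",
    "onlinesecurity": "No",
    "onlinebackup": "Yes",
    "contract": "One year",
    "paymentmethod": "Electronic check",
    "churn": "No",
    "monthlycharges": 70.35,
}])

print(score_model(example))  # predicted churn time for this customer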

Notes

While for more standard ML models one can use the mlflow.sklearn flavor, this seems to fail for our churn model. To deploy a custom model one needs to use the mlflow.pyfunc flavor with a custom wrapper class, as shown above.

Resources

The code and text for this Databricks example have been inspired by several online resources.

Comparison

The solutions for deployment can be compared on different aspects, such as cost, speed, and ease of implementation and deployment. These aspects are not independent: more compute power generally results in faster API responses but also comes at higher cost. Integrated environments like Databricks facilitate easier deployment and save you the trouble of building a Docker container yourself. The downside is that they tend to be more expensive.

From a performance perspective, there does not seem to be a lot of difference between the approaches. Of course, as said, the response time depends heavily on the chosen compute options for the deployment, which also come at different costs. For the options chosen in this blog, Databricks provides the fastest response.

The main advantage of Azure Functions is the rather simple and fast deployment. The downside is that the deployment cannot easily be moved to another cloud service provider. The containerized solution in an Azure Web App is somewhat more difficult to implement but can be ported elsewhere if needed. A downside of both the Azure Functions and Azure Web App options is that you have to work out the API yourself, including key- or password-based security. For the Azure Databricks option this is provided out of the box, but at higher cost.

If you have to deploy a non-standard ML solution, as in the case described in this blog, each option requires some extra tweaking to get it working. I find the Azure Functions and Azure Web App options somewhat easier to work with. On the other hand, Azure Databricks comes with extra built-in features such as version control of the deployed models.

Closing words

The code for the explored methods has been made available on GitHub. Please let me know if you have any questions or suggestions regarding this blog!
