End-to-End Machine Learning from Databricks to Containerized API Hosting

This article will describe an end-to-end machine learning example involving Databricks and containerized model hosting with Azure Container Apps, or in simpler terms, bringing MLOps to the people.

Viktor Sjölind
If Technology
8 min read · Dec 21, 2022


Introduction

Machine Learning Operations (MLOps) is a set of practices that aim to improve the efficiency and reliability of deploying machine learning models. Some of the focus areas for these practices are automation, reproducibility, scalability, deployments, and collaboration between the people involved in the processes. The aim of this article is to demonstrate a technical implementation of MLOps practices where a machine learning model is trained in Databricks, the model and a REST API are packaged into a Docker image, and everything is finally deployed to a container orchestration service for online inference.

Note that this article is not intended to demonstrate e.g. the extensive access management and network security mechanisms that would be required for a production-grade solution. The eager reader is, however, welcome to develop this example implementation further.

Overview

Components of the project

The components of this project will consist of:

  • Data for training and evaluating the model.
  • Data analysis, data preparation, model training, model evaluation, and model storage inside Databricks.
  • REST API developed with Python and the FastAPI framework.
  • Azure DevOps Git Repositories for version control of the code bases.
  • Azure DevOps Pipelines for packaging the model and REST API into a container image and deploying it.
  • Azure Container Registry for storing the container images.
  • Azure Container Apps for hosting containers.

Model Development

Databricks is a platform that provides tools for data analysis and model development among other data related usage areas. Databricks provides a managed MLflow service and collaborative Python platform, which will be used for data preparation, model training, and model evaluation.

The data we will work with is from Capital Bikeshare and we will develop a model that predicts the number of bike rentals. The first step is to load the data and start analyzing the different columns. The value we want to predict (label) is the rentals column. We also have numeric columns related to e.g., temperature and categorical columns such as the weather situation.

# Explore and analyze the data, understand what are the important features
# that correlate to the label
import pandas as pd

data = pd.read_csv('../daily-bike-share.csv')
data.head()
Daily biking data set

The goal of the analysis is to find features that are useful for the prediction. The Pandas library can be used to calculate a correlation value between columns. We can also create a scatter plot to visually confirm that there is, to some extent, a correlation between e.g. higher temperatures and a higher number of bike rentals.

import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.gca()
correlation = data['temp'].corr(data['rentals'])
ax.set_title(f'Correlation: {correlation}')
plt.scatter(x=data['temp'], y=data['rentals'])
plt.xlabel('Temperature')
plt.ylabel('Rentals')
plt.show()
Correlation between temperature and rentals

Performing a similar analysis on one of the categorical features, the weather situation, we can conclude that clear weather also correlates with a higher number of bike rentals.

# Weathersit: 1 = clear, 2 = mist/cloud, 3 = light rain/snow
fig = plt.figure(figsize=(10, 6))
ax = fig.gca()
data.boxplot(column='rentals', by='weathersit', ax=ax)
ax.set_title('Label by Weather situation')
ax.set_ylabel('Bike Rentals')
plt.show()

Some of the columns have very different value ranges and could therefore skew the predictions unnecessarily. This can be avoided by, for example, normalizing the numeric values using the scikit-learn framework, as sketched below.
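
As a minimal sketch, assuming the numeric columns listed below are the ones we want to scale (the exact preprocessing of the original notebook is not shown in this article), a MinMaxScaler from scikit-learn could be applied like this:

from sklearn.preprocessing import MinMaxScaler

# Hypothetical choice of numeric columns to scale; adjust to the actual data set
numeric_features = ['temp', 'atemp', 'hum', 'windspeed']

# Fit the scaler on the numeric columns and replace them with values in [0, 1]
scaler = MinMaxScaler()
data[numeric_features] = scaler.fit_transform(data[numeric_features])

data[numeric_features].describe()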

After exploring and modifying the data set, we can start training a model. MLflow can be used to log metrics and files (e.g., plots of metrics) related to the model performance to an MLflow Experiment, where different model training runs can be compared to see which one performs better. Once we are happy with a model, it can be registered in the Databricks Model Registry.

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
import mlflow

# Train Linear Regression Model
def train_linear_regression_model(df_train_x, df_train_y, fit_intercept=True):
    model = LinearRegression(fit_intercept=fit_intercept)
    model.fit(X=df_train_x, y=df_train_y)
    return model

# Train Random Forest Regressor
def train_random_forest_regressor_model(df_train_x, df_train_y):
    model = RandomForestRegressor()
    model.fit(X=df_train_x, y=df_train_y)
    return model

# Log the model, a scatter plot, and evaluation metrics to an MLflow run
def log_model_performance(model, X_test, y_test, run_name):
    with mlflow.start_run(run_name=run_name):
        mlflow.sklearn.log_model(model, "Bike Rentals")
        y_pred = model.predict(X_test)

        mae = mean_absolute_error(y_test, y_pred)
        r2 = r2_score(y_test, y_pred)

        plt.scatter(y_test, y_pred)
        plt.xlabel('y_test')
        plt.ylabel('y_pred')
        plt.title('Bike Rentals')
        plt.show()

        mlflow.log_metrics({
            "r2": r2,
            "mae": mae
        })

# X_train, X_test, y_train, y_test are assumed to come from an earlier train/test split
lr_model = train_linear_regression_model(X_train, y_train)
log_model_performance(lr_model, X_test, y_test, "Linear Regression")

rfr_model = train_random_forest_regressor_model(X_train, y_train)
log_model_performance(rfr_model, X_test, y_test, "Random Forest Regressor")
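
Once a run looks good, the logged model can be promoted to the Model Registry. As a rough sketch, registration can also be done programmatically; the run ID and registry name below are placeholders, not values from the original project:

import mlflow

# Hypothetical run ID and registry name; in practice, pick the best-performing
# run from the Experiment and your chosen model name
run_id = "<run-id-of-the-best-run>"
model_uri = f"runs:/{run_id}/Bike Rentals"

registered_model = mlflow.register_model(model_uri, "bike-rentals")
print(registered_model.name, registered_model.version)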

The visualization below describes the machine learning lifecycle and the flow between the different components, such as Experiments and the Model Registry, in the described setup.

Machine Learning lifecycle

It is important to note the difference between the model and the model code. The model is registered in the model registry and will be downloaded to the API as a standalone component, while the code for the model development is stored in a git repository and referenced in Databricks for further model development.

REST API Development

After preparing the model, we can start developing a REST API to serve a prediction endpoint that will perform predictions of the number of rentals given an input. The Python programming language and FastAPI framework will be used for developing the REST API. FastAPI provides both good performance and ease of use to make the development phase quick.

The model will be downloaded and loaded on startup of the REST API. A prediction endpoint is defined alongside a data model for the input. The data model is simply a wrapper that defines which features are expected. The endpoint accepts an input that is converted to a Pandas DataFrame, which in turn is passed to the predict method of the loaded scikit-learn model.

from typing import List

import mlflow
import pandas as pd
from fastapi import FastAPI
from fastapi.encoders import jsonable_encoder
from pydantic import BaseModel

app = FastAPI()


# Data model representing features/input
class Record(BaseModel):
    season: int
    mnth: int
    holiday: int
    weekday: int
    workingday: int
    weathersit: int
    temp: float
    atemp: float
    hum: float
    windspeed: float


@app.on_event("startup")
async def startup_load_model():
    global MODEL
    MODEL = mlflow.sklearn.load_model("./model")


@app.post("/predict")
async def predict(data: List[Record]):
    input_df = pd.DataFrame(jsonable_encoder(data))

    model_output = MODEL.predict(input_df)

    # One prediction per input record
    response = [int(value) for value in model_output]

    return {"prediction": response}

For downloading the model, a PowerShell script will be used to run the Databricks CLI, which provides an abstraction layer on top of the Databricks REST API. The dbfs command can be used for downloading the model artifact, since models registered in the Databricks Model Registry are stored as artifacts in the Databricks File System (DBFS).

$URI = ".../model-versions/get-download-uri"

$parameters = @{
name = $env:MODEL_NAME
version = $env:MODEL_VERSION
}

$response = Invoke-RestMethod -Uri $URI -Body $parameters -Authentication Bearer -Token $TOKEN
dbfs cp -r $response.artifact_uri "./model"

To avoid a situation where code only runs on a specific machine configuration, Docker can be used for packaging applications and all their dependencies as container images. An image acts as a template to run a container consistently in any environment. A basic image for this case would fetch Python, install the Python dependencies, copy the model, copy the source code, and lastly start the asynchronous uvicorn web server that FastAPI runs on. An example Dockerfile for this purpose is shown below.

FROM python:3.9

WORKDIR /code

COPY ./requirements.txt /code/requirements.txt

RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt

COPY ./app /code/app

COPY ./model /code/model

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]

Deployment Pipeline

The REST API source code will be combined with the model artifact into one container image, which will be pushed to Azure Container Registry and referenced from Azure Container Apps. To perform these steps automatically, they will be run in an Azure DevOps pipeline. The pipeline will run against the repository containing the REST API for model hosting. The pipeline will consist of two stages:

Build:

  • Run a Powershell script that downloads the model from Databricks.
  • Build and push the Docker image with all dependencies to Azure Container Registry.

Update:

  • Communicate to Azure Container Apps that the new image should be used.

Azure DevOps Pipelines supports PowerShell and Docker tasks, which makes it easy to run the download script, build an image, and push it to a container registry. Azure Container Apps is still a relatively new service without a dedicated pipeline task, but it is possible to use the Azure CLI to update the service so it references a new version of the image.

stages:
- stage: Build
  displayName: Build
  jobs:
  - job: BuildImage
    displayName: Build Image
    pool:
      vmImage: $(vmImageName)
    steps:
    - task: PowerShell@2
      displayName: Download model
      inputs:
        targetType: filePath
        filePath: scripts/downloadModel.ps1
        pwsh: true
      env:
        DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
        DATABRICKS_HOST: $(DATABRICKS_HOST)
        MODEL_NAME: $(MODEL_NAME)
        MODEL_VERSION: $(MODEL_VERSION)
    - task: Docker@2
      displayName: Build and push an image to container registry
      inputs:
        command: buildAndPush
        repository: $(imageRepository)
        dockerfile: $(dockerfilePath)
        containerRegistry: $(registryServiceConnection)
        tags: |
          $(tag)
- stage: Update
  displayName: Update Container Apps
  jobs:
  - job: Update
    displayName: Update Container Apps with New Image
    pool:
      vmImage: $(vmImageName)
    steps:
    - task: AzureCLI@2
      displayName: 'Update Container Apps'
      inputs:
        azureSubscription: $(azureServiceConnection)
        scriptType: pscore
        scriptLocation: inlineScript
        inlineScript: |
          az config set extension.use_dynamic_install=yes_without_prompt
          az containerapp update -n $(CONTAINER_APP) -g $(RESOURCE_GROUP) --image "$(CONTAINER_REGISTRY)/$(imageRepository):$(Build.BuildId)"

Container Orchestration

Azure Container Apps is a new platform by Microsoft that provides a fully managed serverless container service for deploying applications at scale. Container Apps enables executing application code packaged in any container and provides a near plug-and-play experience for containers. It does not, however, provide direct access to the underlying Kubernetes API.

At the time of writing, Azure Container Apps is not fully supported by the popular Infrastructure as Code tool Terraform. It is possible to use a REST API wrapper for the Azure ARM REST API as a Terraform provider, which would enable using Terraform to manage the infrastructure of this project. However, to keep the infrastructure deployment simple for the purposes of this article, the Azure CLI comes to the rescue again. The four commands below deploy a resource group, an Azure Container Apps environment in which the Container Apps run, the Container App itself, and an Azure Container Registry for storing the images.

az group create --name $RESOURCE_GROUP --location $LOCATION

az containerapp env create --name $CONTAINER_APPS_ENVIRONMENT \
--resource-group $RESOURCE_GROUP --location $LOCATION

az containerapp create --name $CONTAINER_APPS --resource-group $RESOURCE_GROUP \
--environment $CONTAINER_APPS_ENVIRONMENT \
--image mcr.microsoft.com/azuredocs/containerapps-helloworld:latest \
--target-port 80 --ingress 'external' --query properties.configuration.ingress.fqdn

az acr create --resource-group $RESOURCE_GROUP --name $CONTAINER_REGISTRY --sku Basic

Result

With everything in place, it is possible to run a request against the prediction endpoint and receive a response 😊.
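
As a quick sanity check, a request can be sent to the deployed endpoint, for example with Python's requests library. The URL and input values below are placeholders; use the FQDN returned when the Container App was created and features that match the Record data model of the API:

import requests

# Placeholder URL; use the FQDN printed by 'az containerapp create'
url = "https://<your-container-app-fqdn>/predict"

# A single hypothetical input record matching the API's data model
payload = [{
    "season": 1,
    "mnth": 1,
    "holiday": 0,
    "weekday": 6,
    "workingday": 0,
    "weathersit": 2,
    "temp": 0.34,
    "atemp": 0.36,
    "hum": 0.80,
    "windspeed": 0.16,
}]

response = requests.post(url, json=payload)
print(response.json())  # the predicted number of bike rentals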
