Deploying Models in Azure (MLOps Approach) using Function Apps (Part I)

Xolani Dastile, PhD
9 min read · Jul 25, 2023


The role of a data scientist is becoming increasingly demanding. The traditional data scientist’s role (merely developing models and assessing their performance) will soon become extinct. Data scientists are now expected not only to develop models locally on their machines (in a vacuum, so to speak) but also to put those models into production on a server or in the cloud. This is daunting for data scientists who have not previously been exposed to model deployment, specifically data scientists with quantitative degrees (e.g. mathematics and statistics). This cohort of data scientists struggles to secure job opportunities at companies that already practice machine learning operations (MLOps). Put simply, MLOps (i.e., the machine learning lifecycle) covers how a machine learning model is developed, deployed, and monitored. To learn more about MLOps, there is a myriad of articles on the web.

The “quantitative” data scientists are usually overlooked for roles that involve MLOps, or they feel discouraged from applying for roles whose job specs contain words like software development principles, git version control, CI/CD pipelines, containerization, Kubernetes, and cloud computing. Please, I am politely begging…lol, recruiters and managers, remove such words from job specs; you are making us (me and other quantitative data scientists) shy away from applying for those jobs (I am kidding)…LOL

Usually, those jobs are secured by a clique of data scientists referred to as machine learning engineers. These individuals are also known as full-stack data scientists, end-to-end data scientists, and unicorns. They are called unicorns because they are hard to find, and once found, companies don’t easily let them go. Companies are so clever these days: they employ people who can do multiple jobs and pay them what they pay them…LOL…I guess it is cost-saving on another level.

This article aims to help “quantitative” data scientists, or any data scientists who have not yet been exposed to model deployment or MLOps. Do not feel despondent; let me hold your hand and take you through it step by step.

Now, enough rambling; let’s roll up our sleeves and get our hands dirty. Oooh!!! Before we start: the only requirements for this article are an attitude to learn and basic Python knowledge, and that’s all. Please note that the article only covers model deployment, not model development and monitoring.

You can download the dataset here.

As for the technical requirements, you need accounts with Azure and GitHub, Docker installed on your machine, and either VSCode or your preferred IDE.

To follow along, the first thing you need to do is install poetry on your machine. Poetry is a dependency-management framework that helps avoid dependency issues and conflicts; it handles this much better than bare pip, conda, and the famous requirements.txt file. To learn more about downloading and installing poetry, see here for macOS (installing using brew) and here for other operating systems.

$ brew install poetry
$ poetry --version

Poetry (version 1.5.1)

Once poetry is installed and the version displays, create a new directory/folder named model-deploy (using your command-line interface (CLI) and the mkdir command), cd into that directory, and initialize poetry:

$ mkdir model-deploy
$ cd model-deploy
$ poetry init

Keep pressing enter on your CLI until you reach the part where you are asked for a description; please provide one (e.g. Deploying a Machine Learning model using Azure Functions.). Keep pressing enter again until you are asked about the compatible Python versions; change the version from Python 3.11 to Python 3.8 by typing ^3.8.

$ Description []: Deploying a Machine Learning model using Azure Functions.
$ Compatible Python versions [^3.11]: ^3.8

This creates the first critical file in your directory (model-deploy), pyproject.toml; the second one, poetry.lock, will appear once you add your first dependencies in the next step. Your folder structure will then look like this:

model-deploy
|__ poetry.lock
|__ pyproject.toml

The next thing is to install all the dependencies for your project using poetry add <library_name>:

$ poetry add scikit-learn pandas uvicorn fastapi
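For reference, your pyproject.toml should now look roughly like the following. This is a sketch: the exact version constraints that poetry resolves for you will differ, and the author name is a placeholder.

[tool.poetry]
name = "model-deploy"
version = "0.1.0"
description = "Deploying a Machine Learning model using Azure Functions."
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.8"
scikit-learn = "^1.3.0"
pandas = "^2.0.3"
uvicorn = "^0.23.1"
fastapi = "^0.100.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"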

Once poetry and all the dependencies are installed, create a repository on GitHub and give the repo a name, for example, MLOPsDemo. Once that is done, go back to your CLI, still inside the model-deploy directory, and do the following:

$ git init
$ git switch -c main
$ git add .
$ git commit -m "first commit"
$ git remote add origin https://github.com/<username>/MLOPsDemo.git
$ git push -u origin main

Please note that to be able to use git commands, you need git installed on your machine; download and install it using this link. Replace <username> with your GitHub username. Now you will be able to push changes to the main branch from your local repo. Note, however, that it is not advisable to push changes directly to the main branch. Normally, you would create another branch named dev and push your changes there; if everything works well on the dev branch, you then do a pull request (PR) to merge your changes into the main branch (a quick sketch of that workflow follows the next code block). For the sake of this illustration, we will use the main branch. Every time you make changes to your code locally, you push those changes to GitHub using:

$ git add .
$ git commit -m "your comment here"
$ git push -u origin main
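As an aside, the safer dev-branch workflow mentioned above would look something like this (a quick sketch):

$ git switch -c dev          # create and switch to a dev branch
$ git add .
$ git commit -m "your comment here"
$ git push -u origin dev     # push to dev, then open a PR into main on GitHub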

Now that poetry and the GitHub repo are sorted, we need to add folders and other files to our repo. The folders we are going to add are .github/workflows, src, and model. The .github/workflows folder holds three yaml files, which are basically our pipelines: IaC.yml (Infrastructure as Code), which deploys the required resources in Azure; acr_push.yml (Azure Container Registry), which builds and pushes the docker image to a container registry in Azure; and funcapp.yml (Function App), which deploys a function app to Azure. For now, don’t worry about the contents of these pipelines; we will discuss them in Part II. It is wise to just create the yaml files and leave them blank.

The src folder contains the FastAPI endpoint code, i.e., app.py. The model folder contains a serialized pickle file of the model. Since the article does not cover model development, it is presumed that the model has been trained and saved on your machine as a .pkl file; you need to include that saved .pkl file in the model folder (if you don’t have one yet, see the training sketch right after this paragraph). The other files that need to be included in the root folder (model-deploy) are Dockerfile and docker-entrypoint.sh. A friendly reminder: name the docker file Dockerfile, not dockerfile or docker_file, otherwise things will break. The Dockerfile allows deployment and shipping of containerized applications, resolving the classic issue of “it worked on my machine, why is it not working on yours?”. Read more about docker here and also here. The docker-entrypoint.sh is a bash script that starts uvicorn to serve our FastAPI endpoint.
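If you do not yet have a trained model, here is a minimal training sketch, not the original training code. It assumes the downloaded dataset is a CSV with the feature columns you will see later in app.py and a binary target column; the file name german_credit.csv and the target name Creditability are assumptions, so adjust them to match your copy of the data:

# train.py — a minimal sketch; file and column names are assumptions
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv('german_credit.csv')
X = df.drop(columns=['Creditability'])   # features
y = df['Creditability']                  # binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
print(f'Test accuracy: {model.score(X_test, y_test):.3f}')

# serialize the trained model into the model folder
with open('model/model.pkl', 'wb') as f:
    pickle.dump(model, f)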

This is how the structure of our project should look:

model-deploy
|__ .github/workflows
|     |__ IaC.yml
|     |__ acr_push.yml
|     |__ funcapp.yml
|__ src
|     |__ app.py
|__ model
|     |__ model.pkl
|__ poetry.lock
|__ pyproject.toml
|__ Dockerfile
|__ docker-entrypoint.sh

Please don’t forget to push the changes to your GitHub repo.

Just as a side note: if you are working with large models, pushing the model to the GitHub repo won’t work, due to GitHub’s file size limit of 100MB. To resolve this issue, install dvc (data version control), a tool that helps with data versioning as well as model versioning. To install and read more about dvc, see here. An interesting hands-on article about dvc can be found here.

$ pip install dvc
$ pip install dvc-azure   # or dvc-s3, or dvc-gdrive, depending on your remote storage
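Once dvc and the plugin for your remote storage are installed, tracking the model looks roughly like this (a sketch, assuming an Azure Blob Storage remote; the container and path names are placeholders, and credential configuration is not shown):

$ dvc init
$ dvc add model/model.pkl                      # dvc now tracks the model instead of git
$ git add model/model.pkl.dvc model/.gitignore
$ git commit -m "track model with dvc"
$ dvc remote add -d myremote azure://<container>/<path>
$ dvc push                                     # upload the model to the remote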

Let’s have a quick look at what is inside our app.py, the Dockerfile, and the docker-entrypoint.sh.

The app.py file:



# import the necessary packages
import pickle
from typing import Any, Dict, List

import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

# CORS
from fastapi.middleware.cors import CORSMiddleware


# create a fastapi instance
app = FastAPI()

# allow cross-origin requests from anywhere (fine for a demo)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"])


# create a class for the input data
class InputData(BaseModel):
    data: List[Dict[str, Any]]

# Usage:
sample_data = [{
    "Account_Balance": 1,
    "Duration_of_Credit_monthly": 18,
    "Payment_Status_of_Previous_Credit": 4,
    "Purpose": 2,
    "Credit_Amount": 1049,
    "Value_Savings_Stocks": 1,
    "Length_of_current_employment": 2,
    "Instalment_per_cent": 4,
    "Sex_Marital_Status": 2,
    "Guarantors": 1,
    "Duration_in_Current_address": 4,
    "Most_valuable_available_asset": 2,
    "Age_years": 21,
    "Concurrent_Credits": 3,
    "Type_of_apartment": 1,
    "No_of_Credits_at_this_Bank": 1,
    "Occupation": 3,
    "No_of_dependents": 1,
    "Telephone": 1,
    "Foreign_Worker": 1
}]


# create a class for the output data
class OutputData(BaseModel):
    prediction: List[List[float]]

# Usage:
sample_prediction = [[0.7, 0.3]]


# create a class for the model
class Model:
    def __init__(self):
        # load the serialized model from the model folder
        self.model = pickle.load(open('model/model.pkl', 'rb'))

    def predict(self, input_data):
        # Convert the input data to a DataFrame
        input_data = pd.DataFrame(input_data)

        # predict class probabilities for each row
        output = self.model.predict_proba(input_data)

        # Return the output as plain Python lists so it serializes cleanly
        return output.tolist()

# create an instance of the model
model = Model()

# Define the root endpoint
@app.get('/')
async def root():
    # Return a welcome message
    return 'Welcome to the API'

# Define the predict endpoint
@app.post('/predict', response_model=OutputData)
async def predict(data: InputData):
    # Get the input data
    input_data = data.data

    # Predict the output
    output = model.predict(input_data)

    # Return the output
    return {'prediction': output}

The Dockerfile:

FROM python:3.8-slim as builder

ENV POETRY_VERSION=1.5.1

# Install dependencies required for installing packages
RUN export DEBIAN_FRONTEND=noninteractive \
    && apt-get -qq update \
    && apt-get -qq install -y curl build-essential cmake libboost-all-dev \
    && rm -rf /var/lib/apt/lists/*

# Create a virtualenv and install poetry
RUN python -m venv /venv \
    && /venv/bin/pip install -U pip setuptools \
    && /venv/bin/pip install poetry==${POETRY_VERSION}

ENV PATH="${PATH}:/venv/bin"

WORKDIR /app

COPY pyproject.toml ./pyproject.toml
COPY poetry.lock ./poetry.lock
# --no-root installs only the dependencies, not the project itself,
# since the source code has not been copied into this stage
RUN . /venv/bin/activate && poetry install --no-root

FROM python:3.8-slim as app

WORKDIR /app

COPY --from=builder /venv /venv
COPY ./src ./src
COPY ./model ./model
COPY docker-entrypoint.sh docker-entrypoint.sh

# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED=1

RUN chmod +x docker-entrypoint.sh

CMD ["./docker-entrypoint.sh"]

The docker-entrypoint.sh file:

#!/bin/bash
set -e
. /venv/bin/activate
uvicorn src.app:app --host 0.0.0.0 --port 5000
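As a quick sanity check, you can also serve the app directly through poetry (no Docker involved yet) from the model-deploy directory; this runs the same uvicorn command as the entrypoint script:

$ poetry run uvicorn src.app:app --host 0.0.0.0 --port 5000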

Let’s now test and see what our model endpoint looks like when we build the docker image and run it locally. It is good practice to first run and test the endpoint locally. To build the image, go to the CLI and ensure that you have cd’d into the model-deploy directory. Also, ensure that Docker has installed successfully and is running; you do not have to sign in to Docker, as long as Docker is up. On the CLI, first build the docker image by running:

$ docker build --rm -t modelimage .

This will take a couple of minutes the first time you run it. It builds an image called modelimage, and the intermediate containers are removed (using --rm). The -t flag tags the image; since we did not specify a tag, Docker defaults to latest, so in essence the image will be modelimage:latest, where latest denotes the tag of the image. The dot (.) at the end means that the Dockerfile is in the current directory, which in our case is the root directory model-deploy. Once the image is built successfully, you should see the following:

Docker build output

The next thing is to run the image. To do so, you go to the CLI and do the following:

$ docker run -p 5000:5000 modelimage

The above command maps port 5000 in the container to port 5000 on your machine. The results are as follows:

Now that the container is running, to interact with the model you need to type localhost:5000 into your browser, and you will see the following output:

The API is working, as we see the output of our GET method: “Welcome to the API”. Next, we want to use our POST method and obtain results from our model. To do that, type localhost:5000/docs into your browser. The output you will see is the following:

API Swagger UI

Click the downward arrow on POST, then click Try it out. Thereafter, provide the input data in JSON format:

Provide input data for the endpoint

Now click Execute to see how our model predicts.

Model response with status 200

Voila!!! Our endpoint returns HTTP status 200, meaning the endpoint works OK.
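If you prefer the command line to the Swagger UI, you can exercise the same POST endpoint with curl; the payload below is the sample_data shown earlier in app.py:

$ curl -X POST http://localhost:5000/predict \
    -H "Content-Type: application/json" \
    -d '{"data": [{"Account_Balance": 1, "Duration_of_Credit_monthly": 18,
        "Payment_Status_of_Previous_Credit": 4, "Purpose": 2, "Credit_Amount": 1049,
        "Value_Savings_Stocks": 1, "Length_of_current_employment": 2,
        "Instalment_per_cent": 4, "Sex_Marital_Status": 2, "Guarantors": 1,
        "Duration_in_Current_address": 4, "Most_valuable_available_asset": 2,
        "Age_years": 21, "Concurrent_Credits": 3, "Type_of_apartment": 1,
        "No_of_Credits_at_this_Bank": 1, "Occupation": 3, "No_of_dependents": 1,
        "Telephone": 1, "Foreign_Worker": 1}]}'

The response should be a JSON body shaped like {"prediction": [[0.7, 0.3]]}; the actual probabilities depend on your trained model.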

In the next article, Part II, we will show you how to deploy our model as an Azure Function App. Deploy with me!!! Please let’s connect on LinkedIn; here is my profile.
