Deploying an ML model to Azure using aztk and Azure Functions

Samantha Rouphael
Sep 4, 2018 · 7 min read

In this article I will describe the steps needed to operationalize and deploy a machine learning (ML) model to Azure. If you are wondering how to deploy your model, or how to build a pipeline for retraining and managing it on Azure, this post will help you get started.

Assumptions

  • Your code is in Python
  • You have code that generates predictions from a trained ML model given new data. I will refer to mine as scoring.py
  • You need this code to run programmatically on Azure
  • This tutorial uses aztk version 0.8.1; if you decide to use a later version there might be some discrepancies.

Aztk

First, let’s talk a bit about the Azure Distributed Data Engineering Toolkit (aztk). It’s a Python CLI application for provisioning on-demand Spark on Docker clusters in Azure. It is open source, and you can find it here.

Note: aztk is still under active development and version 1 hasn’t shipped as of this writing.

  1. First install aztk on your local machine
    pip install aztk==0.8.1
    or install directly from the repo by following these steps
  2. Initialize the project in a directory
    aztk spark init
    This will create a .aztk folder that contains the config files. .aztk/secrets.yaml holds the IDs and credentials of the resources you need to create.
    Note: You only need to fill in secrets.yaml if you want to use the CLI.
  3. To finish setting up aztk, you need to create the required resources in Azure. The steps to do so are well explained in this section. (You don’t have to go over the Using Shared Keys section.)
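As a rough sketch, a filled-in secrets.yaml follows the shape below (field names are from the aztk 0.8 docs; all values here are placeholders, not real credentials):

```yaml
# .aztk/secrets.yaml -- placeholder values, fill in your own IDs
service_principal:
    tenant_id: <tenant-id>
    client_id: <application-id>
    credential: <application-key>
    batch_account_resource_id: <batch-account-resource-id>
    storage_account_resource_id: <storage-account-resource-id>
```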

At this point you have registered an Azure Active Directory application, created a storage account, created a Batch account, granted your app access to these resources, and taken note of the credentials. If you filled in the secrets.yaml file, you can verify the setup by running:

aztk spark cluster list

expected output:

Cluster | State | VM Size | Nodes
--------|-------|---------|------

Aztk lets you manage clusters and run applications on them. You can do this through the command line, but to deploy the model and run it programmatically we will write a script that does it for us.

Once you have all the resources set up in Azure, it is time to look at scoring_scheduler.py, the main module I used to create a cluster and submit an app to it. More details can be found in the Code section below.

Code

  1. Copy the content of scoring_scheduler.py to a local file; this script will be responsible for scheduling the scoring. It contains the aztk code that creates a cluster and runs code on it. I will now go over the code section by section. Look for the #UPDATE markers for the sections you will need to fill in.

2. We need to specify the secrets configuration. Here’s a brief description of each setting:
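I can’t reproduce the author’s exact code here, but a minimal sketch of this step might look like the following. The environment-variable names and helper functions are my own convention, and the aztk import is kept inside the builder function so the rest of the script stays importable without aztk installed:

```python
import os

# Environment-variable names below are my own convention, not aztk's.
SECRET_KEYS = (
    "TENANT_ID",
    "CLIENT_ID",
    "CREDENTIAL",
    "BATCH_ACCOUNT_RESOURCE_ID",
    "STORAGE_ACCOUNT_RESOURCE_ID",
)

def load_secret_values(env=None):
    """Collect the service-principal secrets, failing fast if any is missing."""
    env = os.environ if env is None else env
    missing = [key for key in SECRET_KEYS if key not in env]
    if missing:
        raise KeyError("missing secrets: " + ", ".join(missing))
    return {key.lower(): env[key] for key in SECRET_KEYS}

def make_secrets_configuration(values):
    """Build the aztk SecretsConfiguration from the collected values."""
    import aztk.spark  # lazy import so load_secret_values is testable without aztk
    return aztk.spark.models.SecretsConfiguration(
        service_principal=aztk.spark.models.ServicePrincipalConfiguration(
            tenant_id=values["tenant_id"],
            client_id=values["client_id"],
            credential=values["credential"],
            batch_account_resource_id=values["batch_account_resource_id"],
            storage_account_resource_id=values["storage_account_resource_id"],
        )
    )
```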

Custom Scripts:

Custom scripts are bash scripts that are executed in the Docker container in which the Spark environment runs. For example, you can use them to set environment variables or install additional packages on the nodes.

Note: Custom scripts are deprecated and will be removed after 0.8. If you are using 0.8+, you should use plugins instead, as shown in this example.
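For 0.8, a custom script is registered in .aztk/cluster.yaml roughly like this (the script path and name are illustrative):

```yaml
# .aztk/cluster.yaml (excerpt) -- script path is illustrative
custom_scripts:
    - script: ./custom-scripts/setup-nodes.sh
      runOn: all-nodes   # or master / worker
```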

You need to create a cluster by specifying size, the number of nodes in the cluster, and vm_size, the type of VM you want to run on (for the list of VMs available in Azure, please refer to this link). If you need a GPU machine, this is the place to define it. You also need to define the Spark configuration, sparkconf, for each of the nodes. You can also add the user_config if you need to SSH into the cluster.

Note: It is important that the VM specified in the cluster config earlier has enough memory for your app, otherwise you will run into errors.

Submitting app to cluster

This code submits the app to the cluster and waits for it to be done. If you are outputting logs in your app, you will be able to see them in the blob storage you created, under the name of your app.

You should delete the cluster once all the applications running on it have completed, so that you don’t pay for a cluster you are not using.
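Putting the pieces together, the core of scoring_scheduler.py might look roughly like this sketch, based on the SDK examples in the aztk 0.8 docs. The cluster id, app name, and VM size are placeholders of my own, and aztk is imported lazily so the constants can be inspected without it installed:

```python
CLUSTER_ID = "scoring-cluster"   # placeholder names of my own
APP_NAME = "scoring"
VM_SIZE = "standard_d2_v2"       # pick a VM size with enough memory for your app

def run_scoring(secrets_configuration):
    """Create a cluster, run scoring.py on it, print the log, then clean up."""
    import aztk.spark  # lazy import: see the note in the lead-in

    client = aztk.spark.Client(secrets_configuration)

    cluster_config = aztk.spark.models.ClusterConfiguration(
        cluster_id=CLUSTER_ID,
        size=2,                  # number of nodes
        vm_size=VM_SIZE,
        spark_configuration=aztk.spark.models.SparkConfiguration(),
    )
    cluster = client.create_cluster(cluster_config)
    client.wait_until_cluster_is_ready(cluster.id)

    app = aztk.spark.models.ApplicationConfiguration(
        name=APP_NAME,
        application="scoring.py",  # the script that does the actual scoring
    )
    client.submit(cluster.id, app)
    client.wait_until_application_done(cluster.id, app.name)

    # application logs also land in blob storage, under the app name
    log = client.get_application_log(cluster_id=cluster.id,
                                     application_name=app.name)
    print(log.log)

    # delete the cluster as soon as the run finishes so you stop paying for it
    client.delete_cluster(cluster.id)
```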

Run the script locally:

python scoring_scheduler.py

Sometimes you’ll need to debug an issue with the cluster. You can run the following command to get the necessary logs:

aztk spark cluster debug --id {cluster-id} --output path/to/output-dir

Azure Functions

Azure Functions allows you to create scheduled or triggered pieces of code implemented in a variety of programming languages. We will create a Python function and trigger it on a timer: I want to run scoring.py every 24 hours to predict. For my use case, the function will run scoring_scheduler.py once every 24 hours.

  1. Create a Function App: in the Marketplace, look for Function App
Create function app

Note: If your code needs more than 10 minutes to run, you need to use the App Service plan; otherwise the Consumption plan is fine. To learn more about the App Service plan, click here.

2. Create a Python function (since our code is written in Python). Click the + sign next to Functions and then create your own custom function.

Create custom function
  • Enable Experimental Language Support
  • Select Language to be Python
  • Pick the HTTP trigger (we will change this later to a timer trigger). To change the trigger type, go to Integrate, delete the HTTP trigger, add a New Trigger, and select Timer. Take a look at the different types of triggers you could use to run code based on your scenario.

Since my scoring code will run every 24 hours, I will specify the schedule accordingly, e.g. 0 0 0 * * * (Azure Functions timer triggers use six-field NCRONTAB expressions; this one fires once a day at midnight).
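In function.json terms, the timer binding looks like the following (the schedule value is my assumption for a daily run; the binding name is arbitrary):

```json
{
  "bindings": [
    {
      "type": "timerTrigger",
      "name": "myTimer",
      "direction": "in",
      "schedule": "0 0 0 * * *"
    }
  ],
  "disabled": false
}
```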

3. Install third-party modules. Under Platform Features, select Advanced Tools (Kudu), open the CMD console, and run the following commands.

Note: each step can take some time to execute, and you might need to refresh the page to be able to run the commands.

> nuget.exe install -Source https://www.siteextensions.net/api/v2/ -OutputDirectory D:\home\site\tools python352x64
> mv /d/home/site/tools/python352x64.3.5.2.6/content/python35/* /d/home/site/tools/
> python -m pip install --upgrade pip (to install pip)
note: to use pip you should now run
> python -m pip …
> python -m pip install aztk~=0.8 (or whichever minor version you want)

Here you will upload (through drag and drop) the Python files that your application needs. For my scenario that is scoring_scheduler.py, scoring.py, utils.py, datastorage.py, the aztk_cli folder that contains the jar file, and the custom-script folder. I include the custom-script folder because I am using version 0.8, in which custom scripts are still supported.

Go back to your function, hit View files on the right side, delete the run.py file, create run.ps1, and write:

D:\home\site\tools\python.exe D:\home\site\wwwroot\scoring_scheduler.py

Resources

· Aztk git repo: https://github.com/Azure/aztk. This repo contains documentation and code, and you can file issues if you hit any bugs

· Useful aztk command line instructions:

> aztk spark cluster list
To list all the clusters associated with your account, as described in the secrets.yaml file
> aztk spark cluster delete --id clusterId
To delete the specified cluster
> aztk spark cluster view --id clusterId
To view the state of the nodes on the specified cluster
> aztk spark cluster debug --id clusterId --output path/to/logs.txt
To download the debug logs from the specified cluster

Thanks to Abhishek Gupta

Written by Samantha Rouphael, Software Engineer at Microsoft
