How to Set Up MLflow (on Azure)

Samuel Dylan Trendler King
Published in The Startup · May 12, 2020 · 9 min read

MLflow

If you’ve ever recorded the results of your Machine Learning experiments in a spreadsheet, MLflow might just be for you!

What is MLflow?

MLflow is a tracking tool to organize and record your Machine Learning (ML) experiments. It is very flexible, so you can record almost anything you want, but the main idea is to record the parameters of your ML pipeline/model during a run and the metrics/results that run achieves.

It also has a really nice little frontend to display results, which can be run as a service and looks like this:

MLflow GUI

Key Concepts

MLflow has two core concepts:

  1. Tracking Server: where the MLflow service runs and where parameters and metrics are saved
  2. Artifact Store: where larger objects you want to associate with a run (e.g. data or code) can be stored

For this setup we will run the MLflow Tracking Server on an Azure VM (Linux server), and for the Artifact Store we will use an Azure Blob (though it is easy to configure both for other storage options, such as a Postgres database; see the sketch below).
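
For instance, here is a rough sketch (not part of the original walkthrough) of what pointing the tracking server at a Postgres backend might look like; the host, database, and credentials are placeholders:

## On VM terminal (my_env) ##
# hypothetical example: use a Postgres database as the backend store
# instead of a folder on the VM (host, database, and credentials are placeholders)
pip3 install psycopg2-binary
mlflow server \
  --backend-store-uri postgresql://<db_user>:<db_password>@<db_host>:5432/<db_name> \
  --default-artifact-root wasbs://<blob_container_name>@<storage_account>.blob.core.windows.net/<directory_on_blob> \
  --host 0.0.0.0 --port 5000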

Setup Overview

  1. Set up an Azure VM (for the Tracking Server)
  2. Set up an Azure Blob (for the Artifact Store)
  3. Set up the MLflow service (on the VM)
  4. Connect your Jupyter Notebook to the MLflow service
  5. Run an MLflow experiment

1. Set Up an Azure VM

Log in to your Azure Portal and create an Azure VM:

Search and select ‘Virtual machines’
Select ‘+ Add’ to create a new VM

You can get your public SSH key from your local machine with this command:

## On Local terminal ##
# to create an ssh key if you don't have one
ssh-keygen -o
# to return your public ssh key
cat ~/.ssh/id_rsa.pub
Follow the Wizard and ‘Create’ your VM

The VM can take a few minutes to set up and should be viewable in the ‘Virtual Machines’ dashboard of your Azure portal. For details on setting up a VM see this walkthrough.

Once your VM is set up, I would also suggest restricting its connectivity for added security. To do this, select your VM, select ‘Networking’, and restrict the SSH connection on port 22 to only allow incoming traffic from your IP address:

Select Networking for your VM
Select your inbound port rule for SSH and change the Source to ‘IP Addresses’ and put your local IP address.

This means only your IP address can connect to the VM, and only from the machine holding the corresponding SSH key, which seems decently secure. (The same rule can also be set from the command line, as sketched below.)
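
If you prefer the CLI, here is a rough equivalent using the Azure CLI; the resource group, NSG, and rule names are placeholders you would need to look up for your own VM:

## On Local terminal ##
# hypothetical example: restrict the existing SSH rule to your IP
# (resource group, NSG, and rule names are placeholders)
az network nsg rule update \
  --resource-group <my_resource_group> \
  --nsg-name <my_vm_nsg> \
  --name <ssh_rule_name> \
  --source-address-prefixes <your_local_ip>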

You can then SSH directly into the machine using the username you entered during setup and the public IP address from the top of the ‘Overview’ page of the VM:

## On Local terminal ##
ssh <username>@<vm_public_ip_address>

Once you are into the VM terminal you will want to install Python 3.x and set up a virtual environment:

## On VM terminal ##
# install python (plus pip and the venv module)
sudo apt-get update
sudo apt-get install python3 python3-pip python3-venv
# create environments folder
mkdir environments_folder
cd environments_folder
# create and activate a virtual environment
python3 -m venv my_env
source ~/environments_folder/my_env/bin/activate

For further detail on installing Python on Linux see this walkthrough, and for virtual environments see this one.

Finally you will need to install mlflow in this environment:

## On VM terminal (my_env) ##
pip3 install mlflow

2. Set Up an Azure Blob

Now that we have a VM for our MLflow service to run on, we want to set up an Azure Blob (object store) to save our MLflow artifacts.

To do this we again log in to the Azure portal and now search for ‘Storage accounts’:

Select Storage Accounts
Press ‘+ Add’ to create a new storage account
Entering the relevant config details as you go
Once the Account is setup select ‘Containers’ from the Overview page
And select ‘+ Container’ to create a new blob container — giving it a name when prompted

Once the blob is created you may also want to create a ‘Directory’ inside this blob for your MLflow artifacts. To do this simply select the blob and create a new directory (e.g. ‘mlflowartifacts’):

Select ‘Add Directory’ and create a new folder in your blob

While you are here you will also want to grab an ‘Access Key’, which we’ll need later. To do this, go back to the main page for your Storage Account, select ‘Access Keys’, and copy one of the keys for later:

Generating an ‘Access Key’ for your storage account

And that is it. For more detail on setting up Azure Blob storage see this guide.
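
If you prefer the command line, the same storage setup can be sketched with the Azure CLI; the resource group, account, and container names here are placeholders:

## On Local terminal ##
# hypothetical example: create the storage account and container, then list its keys
az storage account create --name <storage_account> --resource-group <my_resource_group> --sku Standard_LRS
az storage container create --name <blob_container_name> --account-name <storage_account>
az storage account keys list --account-name <storage_account> --resource-group <my_resource_group>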

3. Set Up the MLflow Service

We are nearly ready to run our MLflow server, but first we will need to link our VM to our storage account. This can be done by installing the Azure CLI, logging in, and then exporting the Storage Account Access Key and Connection String as environment variables:

## On VM terminal (my_env) ##
# login via Azure CLI
sudo apt-get update
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
az login
# export storage account env vars
export AZURE_STORAGE_ACCESS_KEY="<blob_access_key>"
export AZURE_STORAGE_CONNECTION_STRING="<blob_connection_string>"
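
Note that exports like these only last for your current shell session. If you want the credentials available every time you log in to the VM, one optional convenience (not part of the original walkthrough) is to append them to your ~/.bashrc:

## On VM terminal ##
# optional: persist the credentials across SSH sessions
echo 'export AZURE_STORAGE_ACCESS_KEY="<blob_access_key>"' >> ~/.bashrc
echo 'export AZURE_STORAGE_CONNECTION_STRING="<blob_connection_string>"' >> ~/.bashrc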

Still on the VM, you can use this command to kick off an MLflow service running on port 5000:

## On VM terminal (my_env) ##
# create project folder
mkdir project_folder
cd project_folder
# create mlflow tracker folder
mkdir mlflow_tracker
# launch mlflow service
mlflow server --backend-store-uri <directory_on_vm> --default-artifact-root wasbs://<blob_container_name>@<storage_account>.blob.core.windows.net/<directory_on_blob> --host 0.0.0.0 --port 5000 &

It is worth noting here that <directory_on_vm> should be a full path to a folder on your VM (e.g. /home/user/project_folder/mlflow_tracker) and the path to your blob storage needs to start with ‘wasbs://’ (e.g. wasbs://myblobname@mystorageaccount.blob.core.windows.net/mlflowartifacts).

To access the service from your local machine you can SSH-tunnel your local port 5000 to the VM’s port 5000, and then open http://localhost:5000 in your browser:

## On Local terminal ##
ssh -N -L 5000:localhost:5000 <username>@<vm_public_ip_address>

If you have any issues, you may want to check that the Azure Blob credentials are indeed set on your VM (and available in your current terminal session) using these commands:

## On VM terminal (my_env) ##
echo $AZURE_STORAGE_ACCESS_KEY
echo $AZURE_STORAGE_CONNECTION_STRING

If these come back empty, go back to the start of this step and make sure you connected your VM with the Storage Account correctly.

4. Connect Your Jupyter Notebook to MLflow

First you will need to install Jupyter Notebook, connect it to your virtual environment, and launch it on your VM, so do that if you haven’t already:

## On VM terminal (my_env) ##
# install jupyter notebook
pip3 install jupyter
pip3 install ipykernel
# add your virtual environment as a kernel
ipython kernel install --user --name=my_env
# install a few other libraries we will need
pip3 install scikit-learn
pip3 install azure-common
pip3 install azure-storage-blob
# launch a notebook session on port 1212
jupyter notebook --no-browser --port=1212

In the same way as the MLflow GUI, you can connect to this notebook service from your local machine via an SSH tunnel if you want:

## On Local terminal ##
ssh -N -L 1212:localhost:1212 <username>@<vm_public_ip_address>

Once you have Jupyter up and running with your Python virtual environment, you can create a new notebook by selecting ‘New’ in the top-right corner and choosing your virtual environment:

Select ‘New’ and choose your virtual environment

Inside your notebook, in the first code block, import mlflow, point it at the tracking server, and choose an experiment name like this:

## In Notebook ##
import mlflow
# set to your server URI (the notebook runs on the same VM as the server)
remote_server_uri = "http://localhost:5000"
mlflow.set_tracking_uri(remote_server_uri)
# set MLflow experiment
exp_name = "<MY_EXPERIMENT_NAME>"
mlflow.set_experiment(exp_name)

An experiment can be thought of as the ‘folder’ for a collection of runs which can be compared to each other.
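
If you want to sanity-check that the experiment was registered, a minimal sketch (using mlflow.get_experiment_by_name, available in recent MLflow versions):

## In Notebook ##
import mlflow
# look up the experiment we just set
exp = mlflow.get_experiment_by_name("<MY_EXPERIMENT_NAME>")
print(exp.experiment_id, exp.name, exp.artifact_location)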

5. Run an MLflow Experiment

Now that your notebook is connected to MLflow, you can simply use mlflow.start_run as a with statement, passing a run name, to begin an experiment run:

## In Notebook ##
import mlflow
run_name = '<YOUR_RUN_NAME>'
with mlflow.start_run(run_name=run_name):
    # < SOME CODE >

A Run can be thought of as a single execution of your ML pipeline to get one set of results. You will have many runs per experiment.
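
Runs can also be labelled with free-form tags, which is handy for telling them apart later; a minimal sketch (the tag key and value here are just examples):

## In Notebook ##
import mlflow
with mlflow.start_run(run_name='mytaggedrun'):
    # tags are free-form key/value labels you can filter runs by in the GUI
    mlflow.set_tag("model_family", "random_forest")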

Within the with block you can then define different values for MLflow to track while the code runs. For example, to record a parameter (a model/pipeline input):

## In Notebook ##
import mlflow
with mlflow.start_run(run_name='myrun'):
    n_estimators = 200
    mlflow.log_param(key="n_estimators", value=n_estimators)

You can also log metrics (results) like this:

## In Notebook ##
import numpy as np
import pandas as pd
# Function just to make a bit of mock data
def make_data(data_size=100):
    # class 0: lower feature values
    X0_var1 = [np.random.randint(0, 5) for x in range(0, data_size)]
    X0_var2 = [np.random.randint(0, 100) for x in range(0, data_size)]
    y0 = [0 for x in range(0, data_size)]
    df0 = pd.DataFrame({'var1': X0_var1, 'var2': X0_var2, 'y': y0})
    # class 1: higher feature values
    X1_var1 = [np.random.randint(4, 10) for x in range(0, data_size)]
    X1_var2 = [np.random.randint(80, 200) for x in range(0, data_size)]
    y1 = [1 for x in range(0, data_size)]
    df1 = pd.DataFrame({'var1': X1_var1, 'var2': X1_var2, 'y': y1})
    # combine and shuffle
    df = pd.concat([df0, df1])
    df = df.sample(frac=1)
    df = df.reset_index(drop=True)
    return df
## In Notebook ##
import mlflow
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
# function to make mock data defined above
df = make_data(100)
X_train, X_test, y_train, y_test = train_test_split(df[['var1','var2']], df[['y']], test_size=0.33, random_state=42)
with mlflow.start_run(run_name='myrun2'):
    model = RandomForestClassifier()
    model.fit(X_train, y_train)
    y_predict = model.predict(X_test)
    results_table = pd.DataFrame(classification_report(y_test, y_predict, output_dict=True))
    weighted_avg_f1 = results_table['weighted avg'].loc['f1-score']
    mlflow.log_metrics({"weighted_avg_f1": weighted_avg_f1})

You can even record Artifacts (bigger objects and files), as in this case where we save the y_predict results to CSV and log the file to the Artifact Store:

## In Notebook ##
import mlflow
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
# function to make mock data defined above
df = make_data(100)
X_train, X_test, y_train, y_test = train_test_split(df[['var1','var2']], df[['y']], test_size=0.33, random_state=42)
run_name = 'myrun3'
with mlflow.start_run(run_name=run_name):
    model = RandomForestClassifier()
    model.fit(X_train, y_train)
    y_predict = model.predict(X_test)
    prediction_data_file = f'{run_name}_predictions.csv'
    pd.DataFrame(y_predict).to_csv(prediction_data_file)
    mlflow.log_artifact(prediction_data_file)
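
To pull an artifact back down later, one option (depending on your MLflow version) is the tracking client's download_artifacts method; a minimal sketch, where the run ID is a placeholder you can copy from the GUI:

## In Notebook ##
from mlflow.tracking import MlflowClient
# hypothetical example: fetch a logged artifact back from the store
client = MlflowClient()
local_path = client.download_artifacts("<run_id>", "myrun3_predictions.csv")
print(local_path)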

Finally, we can even save fitted sklearn models using MLflow’s sklearn plugin (with plugins available for many other popular ML frameworks such as TensorFlow and H2O):

## In Notebook ##
import mlflow
import mlflow.sklearn
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
# function to make mock data defined above
df = make_data(100)
X_train, X_test, y_train, y_test = train_test_split(df[['var1','var2']], df[['y']], test_size=0.33, random_state=42)
run_name = 'myrun4'
with mlflow.start_run(run_name=run_name):
    model = RandomForestClassifier()
    model.fit(X_train, y_train)
    y_predict = model.predict(X_test)
    mlflow.sklearn.log_model(model, f"{run_name}_model")
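
A logged model can later be loaded back with mlflow.sklearn.load_model; a minimal sketch, assuming you copy the run ID from the GUI:

## In Notebook ##
import mlflow.sklearn
# hypothetical example: reload the fitted model from a previous run
loaded_model = mlflow.sklearn.load_model("runs:/<run_id>/myrun4_model")
print(loaded_model.predict(X_test)[:5])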

Of course these can all be used together, and MLflow offers buckets of other functionality, but this should be a good intro to the basics. So, all together now:

## In Notebook ##
import mlflow
import mlflow.sklearn
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
# function to make mock data defined above
df = make_data(100)
X_train, X_test, y_train, y_test = train_test_split(df[['var1','var2']], df[['y']], test_size=0.33, random_state=42)
run_name = 'myrun5'
with mlflow.start_run(run_name=run_name):
    # log a parameter
    n_estimators = 200
    model = RandomForestClassifier(n_estimators=n_estimators)
    mlflow.log_param(key="n_estimators", value=n_estimators)
    # fit and log the model
    model.fit(X_train, y_train)
    mlflow.sklearn.log_model(model, f"{run_name}_model")
    # log the predictions as an artifact
    y_predict = model.predict(X_test)
    prediction_data_file = f'{run_name}_predictions.csv'
    pd.DataFrame(y_predict).to_csv(prediction_data_file)
    mlflow.log_artifact(prediction_data_file)
    # log a metric
    results_table = pd.DataFrame(classification_report(y_test, y_predict, output_dict=True))
    weighted_avg_f1 = results_table['weighted avg'].loc['f1-score']
    mlflow.log_metrics({"weighted_avg_f1": weighted_avg_f1})

Exploring Results

The cool thing about MLflow is that the GUI also makes it extremely easy to explore and compare runs.

You can also click on individual runs (highlighted in blue) to see further details per run:

In fact MLflow has loads of helpful functionality — subsetting runs with an SQL-like query bar, downloading results to CSV, plotting the results of runs to compare them, and even a system for deploying model binaries.
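
Much of this is also available programmatically; for example, a sketch of querying runs from the notebook with mlflow.search_runs (the filter string here is just an example, and the metrics columns will only exist if matching runs were logged):

## In Notebook ##
import mlflow
# hypothetical example: pull all runs with a weighted-average F1 above 0.9
runs = mlflow.search_runs(filter_string="metrics.weighted_avg_f1 > 0.9")
print(runs[['run_id', 'metrics.weighted_avg_f1']])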

Conclusions

MLflow is a great tool for machine learning practitioners and should be part of every data scientist’s stack. Here is the full documentation if you want to dig further: https://www.mlflow.org/docs/latest/index.html
