First Steps with MLflow

Wagner Massayuki Nakasuga
Published in Semantix
4 min read, Aug 16, 2021

Run it to organize your experiments


As a data scientist, you spend most of your time experimenting with machine learning (ML) or deep learning (DL) models: testing parameters, comparing metrics, saving visualizations, and so on. You can do this manually by copying the results of each experiment into a spreadsheet and organizing the saved models in folders. However, it is easy to lose track of what you have done. One way to deal with this is to manage your ML or DL experiments with MLflow.

So, what is MLflow?

In a few words, MLflow is an open-source platform for managing the ML lifecycle that can log your model parameters and metrics with a few lines of code. For more information, see the official site.

In this post, I will give you a quick tip about how to track your experiments with MLflow using a basic example. So, let’s get started!

Some necessary packages are shown below:

If you do not have MLflow installed, install it with pip:

pip install mlflow

The dataset chosen for this post is the famous wine quality dataset. Wine is good, but the real reason for using it is that it works well for a simple example like ours. We are not looking for high-quality results at this point; we are just learning how to track our experiments. The next step is to load the data and split it into training and validation sets.
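A minimal sketch of the loading and splitting step could look like this. It assumes a local copy of the UCI red wine file saved as "winequality-red.csv" (semicolon-separated, with a "quality" column as the target). The helper names, the 80/20 split, and the random seed are illustrative choices, not taken from the original code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def load_wine_quality(path="winequality-red.csv"):
    """Read the wine quality CSV (the UCI file uses ';' as its separator)."""
    return pd.read_csv(path, sep=";")


def split_wine_data(df, target="quality", test_size=0.2, seed=42):
    """Split the features and the target into training and validation sets."""
    X = df.drop(columns=[target])  # every column except the target is a feature
    y = df[target]
    # Returns X_train, X_val, y_train, y_val
    return train_test_split(X, y, test_size=test_size, random_state=seed)
```

With the file in place, `X_train, X_val, y_train, y_val = split_wine_data(load_wine_quality())` gives the four pieces used for training and validation.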

Ok, MLflow starts from here!

Please, follow the code below:

We loop over “parameters” so that we do not have to run the code by hand several times. The code runs 5 times, and on each run the C parameter takes the next value in the “parameters” sequence.

The code between “mlflow.start_run()” and “mlflow.log_param()” should look familiar to you. We are just instantiating a model, fitting it to the training data, getting predictions from the trained model, and then computing the metric on the validation dataset.

Ok, so far so good. Now it’s time to see what MLflow can help you with. The next 3 lines do the logging. The first one logs the model parameter, the second logs the metric from the validation dataset, and the third logs the model itself. After you run the code, a directory called “mlruns” will appear and every log will be saved there. MLflow also provides a handy web UI, and this is the best part. To access the UI, type “mlflow server” (or “mlflow ui”) in your terminal from the directory that contains “mlruns”, and then open this address in your browser: “http://127.0.0.1:5000”. The screen should show a table of your runs.

The table on the screen shows the information for each run. In the Parameters column you can check the parameter value used in each run, and in the last column you can see the metric that run achieved. Thus, we have a summary of every training run.

Next, click on one of the hyperlinks under “Start Time” to see the model parameters and validation metrics for that run in more detail.

This post is an intro for anyone who wants to start using MLflow to manage their experiments in an organized and safe way. The next step is to use it with a team, but that is a topic for another post.

I hope you enjoy it!

Reference:

https://mlflow.org/docs/latest/index.html
