Best practices for running and logging ML experiments
A step-by-step example using real data and MLflow to train, track, and compare machine learning models
You can compare machine learning models by plotting the metrics you choose, but doing that by hand for every run gets tedious. So much of the workflow is automated these days, so why not automate model comparison too? In this article, we will train several models, log each run with MLflow, and compare the results in its interface. The example is specific, but the same workflow applies whether you are a data scientist or just casually comparing models for a side project, such as predicting human lifespan from daily steps.
We will use a real-life dataset from an assignment for a Data Science position at Haensel AMS. Let’s get started!
A real-world project: predicting price
For this article, we will use the price prediction assignment. The task is to predict prices, but it won’t be easy, since this is a real-life dataset. Let’s load the dataset and look at the first rows.
Data exploration
import pandas as pd
df = pd.read_csv("sample.csv")
df.head()
Here is the output.
Great, but how many columns does this dataset have? And what about the number of non-null values and the data type of each column? Let’s see.
df.info()
Here is the output.
Cleaning the mess: preparing data for modeling
At this step, we will get our dataset ready for machine learning. This step is a must: most models cannot be trained on raw data that contains text columns, invalid values, or missing values. If you don’t want to follow the process, skip to the end of this section, where you will find the entire code.
One-hot encoding
Good, let’s focus on the object columns first, because most machine learning models expect every feature to be numeric (float or integer).
Let’s see the “dow” column.
df["dow"].value_counts()
Here is the output.
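These are days of the week. Good, there is a simple way to turn a categorical column like this into a numerical one: just assign a number to each day. Keep in mind that this imposes an order (Mon < Tue < ... < Sun) that a model may treat as meaningful.
days_of_week = {
    'Mon': 1,
    'Tue': 2,
    'Wed': 3,
    'Thu': 4,
    'Fri': 5,
    'Sat': 6,
    'Sun': 7
}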
df["dow"] = df["dow"].map(days_of_week)
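Alternatively, you can use one-hot encoding instead. Apply it to the original “dow” column rather than after the mapping above, so that the new columns are named after the days. Check this code.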
dow_dummies = pd.get_dummies(df["dow"])
df2 = df.drop(columns="dow")
df_encoded = df2.join(dow_dummies)
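To make the difference between the two encodings concrete, here is a minimal sketch on a tiny made-up frame. The `toy` data is hypothetical and is not the article's dataset; only the `days_of_week` mapping is taken from above.

```python
import pandas as pd

# Hypothetical toy data, not the article's dataset
toy = pd.DataFrame({
    "dow": ["Mon", "Tue", "Sun", "Mon"],
    "price": [10.0, 12.0, 30.0, 11.0],
})

# Option 1: ordinal mapping keeps a single column but implies Mon < Tue < ... < Sun
days_of_week = {"Mon": 1, "Tue": 2, "Wed": 3, "Thu": 4, "Fri": 5, "Sat": 6, "Sun": 7}
toy_ordinal = toy.assign(dow=toy["dow"].map(days_of_week))

# Option 2: one-hot encoding creates one indicator column per day present in the data
dummies = pd.get_dummies(toy["dow"], dtype=float)
toy_onehot = toy.drop(columns="dow").join(dummies)

print(toy_ordinal)
print(toy_onehot)
```

The ordinal version is more compact, while the one-hot version avoids suggesting that Sunday is "seven times" Monday. Which one works better depends on the model, so it is worth trying both.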
Good, but what about “loc1” and “loc2” columns? Let’s check the values for each of them.
df["loc1"].value_counts()
Here is the output.
“S” and “T”? All the other values are numeric, so the rows with these two values should be removed. But let’s check “loc2” too.
df["loc2"].value_counts()
Here is the output.
Let’s solve this issue for both columns at once with this code. Values that cannot be converted become NaN, so we have to call dropna afterwards.
df["loc1"] = pd.to_numeric(df["loc1"], errors='coerce')
df["loc2"] = pd.to_numeric(df["loc2"], errors='coerce')
df.dropna(inplace=True)
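If the effect of `errors='coerce'` is not obvious, the following sketch shows it on a handful of made-up values (the `raw` series is hypothetical and only mimics a column that mixes digits with stray letters).

```python
import pandas as pd

# Hypothetical values mimicking a column that mixes digits with stray letters
raw = pd.Series(["0", "3", "S", "7", "T"], name="loc1")

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising
numeric = pd.to_numeric(raw, errors="coerce")
n_unparsed = int(numeric.isna().sum())
print(f"values that could not be parsed: {n_unparsed}")

# Dropping the NaN values removes exactly those entries
cleaned = numeric.dropna()
print(cleaned.tolist())
```

Counting the NaN values before dropping them is a cheap sanity check: if the number is much larger than expected, the column probably has a different problem than a few stray letters.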
Outlier detection
Good, what about the outliers? They can hurt our model’s performance. To check for them, let’s use the describe method.
df.describe()
Here is the output.
We definitely have outliers in para1. Let’s keep only the rows where it is below 10.
df_encoded = df_encoded[df_encoded["para1"] < 10]
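The threshold of 10 above was picked by eye from the describe output. If you would rather derive a cutoff from the data, one common alternative (not the method used in this article) is the 1.5 × IQR rule. Here is a minimal sketch on made-up values; the `para1` series below is hypothetical.

```python
import pandas as pd

# Hypothetical values: mostly small, with two extreme entries
para1 = pd.Series([1, 1, 2, 3, 1, 4, 2, 1, 24, 337], name="para1")

# Interquartile range and the usual rule-of-thumb upper fence
q1, q3 = para1.quantile([0.25, 0.75])
iqr = q3 - q1
upper = q3 + 1.5 * iqr

# Keep only the values at or below the fence
kept = para1[para1 <= upper]
print(f"upper fence: {upper}, rows removed: {len(para1) - len(kept)}")
```

Whichever rule you use, it is worth printing how many rows the filter removes, so you know how much data you are giving up.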
Checking correlations
Now, let’s check the correlations.
from pandas.plotting import scatter_matrix
# Suppress the output of the scatter_matrix function
_ = scatter_matrix(df_encoded.iloc[:,0:7], figsize=(12, 8))
Here is the output.
loc2 and loc1 are highly correlated, which can cause us trouble, so we should drop one of them. Let’s check each feature’s correlation with price to see which one to drop.
df_encoded.corr()["price"].abs().sort_values(ascending=False).iloc[1:7].to_frame()
Here is the output.
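The idea of dropping the weaker of two correlated features can be sketched on a toy frame. The columns `a` and `b` and all values below are hypothetical; they stand in for a pair like loc1 and loc2.

```python
import pandas as pd

# Hypothetical data: "b" is almost a copy of "a", so the two are highly correlated
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, 2, 3, 4, 5, 7],
    "price": [10, 19, 31, 42, 48, 55],
})

# Absolute correlations between all pairs of columns
corr = toy.corr().abs()
pair_corr = corr.loc["a", "b"]

# Of the two correlated features, drop the one less correlated with the target
to_drop = "a" if corr.loc["a", "price"] < corr.loc["b", "price"] else "b"
reduced = toy.drop(columns=to_drop)
print(f"corr(a, b) = {pair_corr:.3f}, dropping {to_drop!r}")
```

This is only a heuristic. With tree-based models, correlated features are often less of a problem than with linear regression, so you could also log one run with and one run without the feature and compare them.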
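Good, we can drop “loc2”. Here is the entire code so far, with the steps in the order they should run: “loc1” and “loc2” are cleaned before “dow” is encoded.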
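import pandas as pd
df = pd.read_csv("sample.csv")
# Remove rows where loc1 contains the non-numeric values "S" or "T"
df = df[~df["loc1"].str.contains("S|T", na=True)]
# Convert loc1 and loc2 to numbers; anything unparseable becomes NaN and is dropped
df["loc1"] = pd.to_numeric(df["loc1"], errors='coerce')
df["loc2"] = pd.to_numeric(df["loc2"], errors='coerce')
df.dropna(inplace=True)
# One-hot encode the day of the week
dow_dummies = pd.get_dummies(df["dow"])
df2 = df.drop(columns="dow")
df_encoded = df2.join(dow_dummies)
# Alternative: ordinal encoding (this changes df only; df_encoded keeps the one-hot columns)
days_of_week = {'Mon': 1, 'Tue': 2, 'Wed': 3, 'Thu': 4, 'Fri': 5, 'Sat': 6, 'Sun': 7}
df["dow"] = df["dow"].map(days_of_week)
# Remove the outliers in para1
df_encoded = df_encoded[df_encoded["para1"] < 10]
# Drop loc2, which is highly correlated with loc1
df_encoded = df_encoded.drop(columns="loc2")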
Logging with MLflow
Now we are at the fun part. We will use mlflow, sklearn, and xgboost. If any of them is not installed in your working environment, install it first, and then load the following libraries.
Loading libraries
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
Selecting features and target variables
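Great, now let’s implement a model in MLflow. First, let’s define the features and the target variable, and cast all features to float. When MLflow infers a model’s input schema from the input example, it warns about integer columns because they cannot represent missing values, so floats are the safer choice.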
X = df_encoded.drop(columns=["price"])
y = df_encoded["price"]
X = X.astype("float64")
Training-test sets
Let’s split the dataset into training and test sets, and define the different models inside the models dictionary.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
models = {
    "RandomForest": RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42),
    "XGBoost": XGBRegressor(n_estimators=100, max_depth=5, random_state=42, verbosity=0),
    "LinearRegression": LinearRegression(),
    "MLPRegressor": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42)
}
Fit the model
Now let’s set up an experiment. For each model, we will fit it on the training set, predict prices for the test set, calculate the evaluation metrics, and log everything to MLflow.
mlflow.set_experiment("predicting_price")
for model_name, model in models.items():
    with mlflow.start_run(run_name=model_name):
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        mse = mean_squared_error(y_test, preds)
        r2 = r2_score(y_test, preds)
        mlflow.log_param("model_type", model_name)
        mlflow.log_metric("mse", mse)
        mlflow.log_metric("r2_score", r2)
        input_example = X_test.iloc[:1].astype("float64")
        mlflow.sklearn.log_model(model, "model", input_example=input_example)
        print(f"{model_name} -> MSE: {mse:.2f}, R²: {r2:.2f}")
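If you want a feel for what the two logged metrics measure, here is a small sketch that computes them by hand with NumPy. The `y_true` and `y_pred` arrays are made up for illustration; the loop above uses scikit-learn's `mean_squared_error` and `r2_score` instead.

```python
import numpy as np

# Hypothetical true prices and predictions
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 240.0])

# MSE: average squared error; RMSE brings it back to the units of price
mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)

# R²: one minus the ratio of residual variation to total variation
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}, R²: {r2:.3f}")
```

Since MSE is in squared price units, it can look alarmingly large. Logging RMSE as an extra metric is an easy way to make runs more readable in the interface.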
Great, now it is testing time: let’s look at the MLflow interface.
Testing the models
To do that, run this code if you are in a Jupyter notebook. From a terminal, run the same command without the leading exclamation mark.
!mlflow ui
The output will look like this:
Great, let’s visit 127.0.0.1:5000 to see what it looks like!
Wow! It has a pretty user-friendly interface, which I am not used to seeing: machine learning products tend to be messy, because technical people often don’t pay much attention to aesthetics.
Good, let’s select the models we want to compare. As you can see, I ran many models while writing this article, so I will select the latest four. Now, let me click on Compare and see the results.
Compare the models
This is the part of the screen that appears after clicking the Compare button. You can customize the plot type and the metrics here.
Parallel coordinates plot
Here is the full screen.
Box plot
Now let’s select the box plot, and of course, you can select the x and y axes.
Digging into models
When you click on one of the models, you’ll see the following screen:
Here you can see important things like duration and metrics.
In the previous screen, where you see all models together, you can sort the models based on the metrics you choose, like this:
Final thoughts
If you build models where even a few seconds or a 0.1 change in a metric measured in the thousands matters, this tool is definitely for you. And for anyone who is really interested in machine learning, knowing this tool can give you a head start in job interviews!