Compare 29 Different ML Classifiers with a single line of code — Lazypredict

Sanskar Hasija
3 min read · Aug 20, 2021


A quick approach to comparing classification metrics across 29 different ML classifiers

Introduction

In this blog, we will evaluate the classification metrics of 29 different ML classifiers with a single line of code. We will use the Lazypredict Python library for this task and then visualize our results.

Importing Libraries

We will first install the Lazypredict library in our environment. This can be done with the pip package installer.

pip install lazypredict 

Next, we will import the necessary libraries for data processing and visualization.

import lazypredict
from lazypredict.Supervised import LazyClassifier
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

plt.style.use('fivethirtyeight')
plt.rcParams["figure.figsize"] = (10, 5)

Dataset

We will use the breast cancer classification dataset that ships with sklearn.datasets. The dataset has two classification labels: 0 for malignant and 1 for benign tumors. We will then split our data into training and validation sets.

from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.5, random_state=12)
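
If you want to double-check the label encoding and the split sizes, a quick optional check looks like this:

import numpy as np

# Optional check: confirm the label meanings and the split sizes
# (in sklearn's breast cancer dataset, 0 = malignant and 1 = benign)
print("Classes:", data.target_names)
print("Train shape:", X_train.shape, "Valid shape:", X_valid.shape)
print("Class counts in train:", np.bincount(y_train))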

Listing all available Classifiers

Now, we will list all the classifiers available in the lazypredict library.

for i in range(29):
    print(i + 1, lazypredict.Supervised.CLASSIFIERS[i][0])

We get the following 29 classifiers:

List of available classifiers
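
If you would rather not hard-code the number 29 (the length of the classifier list can vary between lazypredict versions), the same listing can be written by iterating over the whole list:

# Iterate over the full list of (name, class) tuples instead of assuming 29 entries
for i, (name, _estimator) in enumerate(lazypredict.Supervised.CLASSIFIERS, start=1):
    print(i, name)
print("Total classifiers available:", len(lazypredict.Supervised.CLASSIFIERS))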

Training and Fitting the models

Now, we will create the LazyClassifier model that we imported earlier and fit it on our training and validation data.

clf = LazyClassifier(verbose=0,
                     ignore_warnings=True,
                     custom_metric=None,
                     random_state=12,
                     classifiers='all')
models, predictions = clf.fit(X_train, X_valid, y_train, y_valid)

The training finished in 1.78 s, and all the metrics were stored in the models DataFrame. The DataFrame contains the name of each model, various classification metrics such as Accuracy, Balanced Accuracy, ROC AUC score and F1 score, and the time taken for training.
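
Since models is an ordinary pandas DataFrame, the usual pandas operations apply. For example, we could sort the results by ROC AUC and keep a copy of the full comparison (the file name below is arbitrary):

# Sort the comparison by ROC AUC and save the full results for later reference
models_sorted = models.sort_values("ROC AUC", ascending=False)
models_sorted.to_csv("classifier_comparison.csv")
print(models_sorted.columns.tolist())  # list the available metric columns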

Top 5 Classifiers

We will now look at the top 5 classifiers and their classification metrics by calling models.head(5), which returns the first 5 rows of the DataFrame.
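
The call itself is a single line:

models.head(5)  # first 5 rows of the results DataFrame, i.e. the top classifiers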

Top 5 Classifiers

Visualizing Results

Now we will plot a graph to visualize the Accuracy and the ROC AUC score of all the models.

# One x position per model in the results DataFrame
idx = range(1, len(models) + 1)
plt.plot(idx, models["Accuracy"], marker='o', label="Accuracy")
plt.plot(idx, models["ROC AUC"], marker='o', label="ROC AUC")

# Annotate one of the best models and the worst model
plt.annotate(models.index[1],
             (2, models["Accuracy"].iloc[1]),
             xytext=(0.3, 0.7),
             arrowprops=dict(arrowstyle="simple"))
plt.annotate(models.index[-1],
             (len(models), models["Accuracy"].iloc[-1]),
             xytext=(15, models["Accuracy"].iloc[-1]),
             arrowprops=dict(arrowstyle="simple"))
plt.xlabel("Models")
plt.ylabel("Metrics")
plt.title("Comparison of 29 Different Classifiers")
plt.legend()
plt.show()

We get the following graph:

Accuracy / ROC AUC score vs Classifiers

Conclusion

We have successfully evaluated 29 different classifiers at once with a single line of code. The results can be improved further by hyperparameter tuning the best-performing models.
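
As an illustration of that next step, here is a minimal tuning sketch using scikit-learn's GridSearchCV; the choice of RandomForestClassifier and the parameter grid are only assumptions for the example:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Minimal sketch: tune one of the classifiers from the comparison.
# The model choice and the grid below are illustrative, not prescriptive.
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 5, 10],
}
search = GridSearchCV(RandomForestClassifier(random_state=12),
                      param_grid, scoring="roc_auc", cv=5)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print("Validation ROC AUC:", search.score(X_valid, y_valid))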

I hope you enjoyed it.

The code and the trained model for this blog can be accessed here — https://github.com/sanskar-hasija/lazypredict/blob/main/Classification/LazyPredict%20Classification.ipynb
