Compare 29 Different ML Classifiers with a single line of code — Lazypredict
A quick approach to compare different classification metrics from 29 different ML classifiers
Introduction
In this blog, we will evaluate classification metrics for 29 different ML classifiers with a single line of code. We will use the lazypredict Python library for this task and later visualize the results.
Importing Libraries
We will first install the lazypredict library in our environment with the pip package installer.
pip install lazypredict
Next, we will import the necessary libraries for data processing and visualization.
import lazypredict
from lazypredict import Supervised
from lazypredict.Supervised import LazyClassifier
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

plt.style.use('fivethirtyeight')
plt.rcParams["figure.figsize"] = (10,5)
Dataset
We will use the breast cancer classification dataset, which ships with sklearn.datasets. The dataset has two classification labels: 0 for malignant and 1 for benign tumours. We will then split our data into training and validation sets.
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = data.data
y = data.target

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.5, random_state=12)
Listing all available Classifiers
Now, we will list all the classifiers available in the lazypredict library.
for i, clf in enumerate(lazypredict.Supervised.CLASSIFIERS):
    print(i + 1, clf[0])
We get the following 29 classifiers:
Training and Fitting the models
Now, we will create the LazyClassifier model which we imported earlier and fit our training and validation data to the model.
clf = LazyClassifier(
    verbose=0,
    ignore_warnings=True,
    custom_metric=None,
    random_state=12,
    classifiers='all',
)
models,predictions = clf.fit(X_train, X_valid, y_train, y_valid)
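A note on the custom_metric argument above: it accepts any callable taking (y_true, y_pred) and reports it as an extra column in the results. As a minimal sketch, the specificity function below is my own illustration, not something shipped with lazypredict:

```python
from sklearn.metrics import confusion_matrix

def specificity(y_true, y_pred):
    """True-negative rate: TN / (TN + FP). Illustrative custom metric."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tn / (tn + fp)

# Passed instead of None when building the model:
# clf = LazyClassifier(custom_metric=specificity, ...)
```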
The training finished in 1.78s, and all the metrics were stored in the models DataFrame. The DataFrame contains the name of each model, various classification metrics such as Accuracy, Balanced Accuracy, ROC AUC score and F1 score, and the time taken for training.
Top 5 Classifiers
We will now look at the top 5 classifiers along with all their classification metrics. We will call models.head(5) to display the first 5 rows of the dataframe.
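Since lazypredict returns the frame already sorted by score, head(5) gives the leaders, but sorting explicitly by the column you care about makes the intent clear. A sketch with a small hypothetical frame standing in for the real results (the model names and numbers below are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the `models` frame returned by clf.fit();
# the values are illustrative, not real results.
models = pd.DataFrame(
    {"Accuracy": [0.97, 0.95, 0.96], "F1 Score": [0.97, 0.95, 0.96]},
    index=["LinearSVC", "Perceptron", "LogisticRegression"],
)
top5 = models.sort_values("Accuracy", ascending=False).head(5)
print(top5.index[0])  # classifier with the highest accuracy
```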
Visualizing Results
Now we will plot a graph for visualizing the Accuracy and ROC AUC score for all the models.
idx = range(1, len(models) + 1)
plt.plot(idx, models["Accuracy"], marker='o', label="Accuracy")
plt.plot(idx, models["ROC AUC"], marker='o', label="ROC AUC")

# Annotate one of the top models and the worst model.
plt.annotate(models.index[1],
             (2, models["Accuracy"].iloc[1]),
             xytext=(0.3, 0.7),
             arrowprops=dict(arrowstyle="simple"))
plt.annotate(models.index[-1],
             (len(models), models["Accuracy"].iloc[-1]),
             xytext=(15, models["Accuracy"].iloc[-1]),
             arrowprops=dict(arrowstyle="simple"))
plt.xlabel("Models")
plt.ylabel("Metrics")
plt.title("Comparison of 29 Different Classifiers")
plt.legend()
plt.show()
We get the following graph:
Conclusion
We have successfully evaluated 29 different classifiers at once with a single line of code. The results can be improved further by hyperparameter tuning of the best-performing models.
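As one concrete follow-up, a shortlisted model can be tuned with scikit-learn's GridSearchCV. The sketch below tunes a logistic regression on the same dataset; the model choice and the parameter grid are illustrative assumptions, not the tuned winner from the run above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

data = load_breast_cancer()
X_train, X_valid, y_train, y_valid = train_test_split(
    data.data, data.target, test_size=0.5, random_state=12
)

# Scale features, then tune the regularisation strength C over a small,
# illustrative grid.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
grid = GridSearchCV(pipe, {"logisticregression__C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_valid, y_valid))
```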
I hope you enjoyed it.
The code and the trained model for this blog can be accessed here — https://github.com/sanskar-hasija/lazypredict/blob/main/Classification/LazyPredict%20Classification.ipynb