Lazy Predict for ML Models

Prajwal Mani
Published in The Startup
5 min read · Aug 7, 2020
Photo by Kate Stone Matheson on Unsplash

Hey, hope you are having a wonderful day!

Whenever I start a new ML project, the same thought pops into my mind:

“I need to fit the data to every model, then apply metrics to check which model gives better accuracy on the available dataset, and only then pick the best model. This process is time-consuming, and it might not even be that effective.”

For this problem I found a simple solution while surfing through python.org: a small Python library named “lazypredict”, and it does wonders.

Let me tell you how it works:-

Install the library

pip install lazypredict

Note

  1. lazypredict only works with Python versions ≥ 3.6
  2. It's built on top of several other libraries, so if any of them is missing from your system, Python will throw a ModuleNotFoundError. Read the error message carefully and install the required libraries.
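If you want to see exactly which dependency is missing without digging through a traceback, ModuleNotFoundError carries the module name on the exception itself. A small sketch (the package name below is deliberately fake):

```python
# Illustrative only: probe for a module and report which one is missing.
def check_dependency(module_name):
    try:
        __import__(module_name)
        return None  # module is available
    except ModuleNotFoundError as err:
        # err.name holds the name of the missing module
        return err.name

print(check_dependency("json"))                      # stdlib module, available
print(check_dependency("some_missing_package_xyz"))  # a fake package name
```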

lazypredict covers only supervised learning (classification and regression)

I will be using a Jupyter notebook in this article

Code

# import necessary modules
import warnings
warnings.filterwarnings('ignore')
import time
from sklearn.datasets import load_iris,fetch_california_housing
from sklearn.model_selection import train_test_split
from lazypredict.Supervised import LazyClassifier,LazyRegressor
  • warnings: package for handling warnings; ‘ignore’ filters out all warnings
  • time: package for measuring time
  • sklearn.datasets: package for loading datasets; today we will use two classics everyone has worked with: load_iris() for a classification problem and fetch_california_housing() for a regression problem
  • sklearn.model_selection.train_test_split: used to split the dataset into train and test sets
  • lazypredict: the package we are learning today; lazypredict.Supervised provides two main classes, LazyClassifier for classification and LazyRegressor for regression

LazyClassifier

# load the iris dataset
data=load_iris()
X=data.data
Y=data.target
  • data is a dictionary-like object with two main keys: data, which contains the independent feature values, and target, which contains the dependent feature values
  • X has all the independent feature values
  • Y has all the dependent feature values
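A quick way to convince yourself of this structure is to poke at the returned object directly; a minimal sketch:

```python
# Inspect what load_iris() actually returns (a dictionary-like Bunch)
from sklearn.datasets import load_iris

data = load_iris()
print(type(data).__name__)      # Bunch behaves like a dict
print(data.data.shape)          # 150 samples, 4 independent features
print(data.target.shape)        # 150 dependent feature values
print(list(data.target_names))  # the three iris species
```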
# split the dataset 
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=23)
classi = LazyClassifier(verbose=0, predictions=True)
  • We split the data into train and test sets using train_test_split()
  • The test size will be 0.3 (30%) of the dataset
  • random_state controls how the data is shuffled into train and test indices; pick any number you like, and the same number will give you the same split every time

Tip 1: If you want to see the documentation or source code behind any function or object in a Jupyter notebook, just add ? or ?? after it and execute the cell
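Outside a notebook, the standard-library inspect module gives you the same look under the hood; for example, applied to train_test_split:

```python
# inspect.signature ~ what ? shows in short form; inspect.getsource ~ what ?? shows
import inspect
from sklearn.model_selection import train_test_split

# The parameter list, including test_size and random_state used above
print(inspect.signature(train_test_split))

# The full source code of the function
src = inspect.getsource(train_test_split)
print(src.splitlines()[0])
```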

  • Next, we call LazyClassifier() and assign it to classi with two parameters, verbose and predictions
  • verbose: int; if non-zero, progress messages are printed. Above 50, the output is sent to stdout, and the frequency of the messages increases with the verbosity level; above 10, every iteration is reported. I suggest you try different values based on the depth of your analysis
  • predictions: boolean; if set to True, fit() will also return all the predicted values from the models
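To get a feel for what LazyClassifier automates, here is a minimal hand-rolled version of the same idea: fit several scikit-learn classifiers in a loop and compare one metric. The model list and scoring below are my own choices for illustration, not lazypredict's internals:

```python
# A hand-rolled mini "lazy classifier": fit a few models, compare accuracy
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=23)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=23),
    "KNeighbors": KNeighborsClassifier(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, Y_train)
    scores[name] = accuracy_score(Y_test, model.predict(X_test))

# Print models sorted by accuracy, best first
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

lazypredict does this across dozens of estimators and several metrics at once, which is exactly the time-saver described above.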
# fit and train the model 
start_time_1=time.time()
models_c,predictions_c=classi.fit(X_train, X_test, Y_train, Y_test)
end_time_1=time.time()
  • We fit the train and test data to the classi object
  • classi.fit() returns two values:
  • models_c: all the models along with some metrics
  • predictions_c: all the predicted values, i.e. ŷ
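The models_c table is a pandas DataFrame, so the usual pandas operations apply; here is a small sketch with a made-up metrics table (the model names and numbers are illustrative, not real lazypredict output):

```python
# Sorting a metrics table like models_c by a chosen metric
import pandas as pd

metrics = pd.DataFrame(
    {"Accuracy": [0.96, 0.91, 0.98], "F1 Score": [0.95, 0.90, 0.98]},
    index=["LogisticRegression", "KNeighborsClassifier", "SVC"],
)

# Best model first
best = metrics.sort_values("Accuracy", ascending=False)
print(best.index[0])  # → SVC
```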
# to check which model did better on the iris dataset 
models_c
models_c output
  • To be honest, I didn't know some of these models even existed for classification until I saw this
  • You may be wondering why ROC AUC is None: is the function not giving proper output? No, that's not the case; ROC AUC is None because we used a multi-class dataset

Tip 2: For a multi-class dataset like the one above, use sklearn's roc_auc_score with a multi-class strategy instead of relying on the ROC AUC column
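A sketch of that tip, using a simple LogisticRegression as the probability model (my choice for illustration; any classifier with predict_proba works):

```python
# Multi-class ROC AUC on iris with the one-vs-rest strategy
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=23, stratify=Y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
proba = clf.predict_proba(X_test)  # shape (n_samples, 3): one column per class

# multi_class='ovr' treats each class as its own binary problem
auc = roc_auc_score(Y_test, proba, multi_class="ovr")
print(round(auc, 3))
```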

# to check the predictions from the models 
predictions_c
predictions_c output
  • These are just a few sample predictions from the models

LazyRegressor

  • We have checked out LazyClassifier; it would be sad not to pay some attention to LazyRegressor
  • The following code is similar to LazyClassifier, so let's pick up the pace and skip some explanations
# load the fetch_california_housing dataset
data1=fetch_california_housing()
X1=data1.data
Y1=data1.target
  • data1 is a dictionary-like object with data and target as keys
# split the dataset 
X_train1, X_test1, Y_train1, Y_test1 = train_test_split(X1, Y1, test_size=0.3, random_state=23)
regr = LazyRegressor(verbose=0, predictions=True)
  • After splitting the data, we fit the train and test sets to the regr object
# fit and train the model 
start_time_2=time.time()
models_r,predictions_r=regr.fit(X_train1, X_test1, Y_train1, Y_test1)
end_time_2=time.time()

Note

1. Before running the above cell, make sure you close all unnecessary background processes, because it takes a lot of computational power

2. If you have limited computational power (RAM, GPU), use Google Colab; it's the simplest solution you can get

# to check which model did better on the fetch_california_housing dataset
models_r
models_r output
  • And again, I didn't know there were so many models for regression
# to check the predictions from the models 
predictions_r
predictions_r output

Time Complexity

  • We should talk about the time taken, because reducing it as much as possible is the main goal for all of us
# time complexity 
print("The time taken by LazyClassifier for {0} samples is {1} s".format(len(data.data), round(end_time_1 - start_time_1, 2)))
print("The time taken by LazyRegressor for {0} samples is {1} s".format(len(data1.data), round(end_time_2 - start_time_2, 2)))
time complexity output

Tip 3: Add %%time at the top of a Jupyter cell to check that cell's execution time
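In a plain Python script where %%time isn't available, time.perf_counter() is the recommended high-resolution timer; a minimal sketch:

```python
# Timing a block of code with the standard-library perf_counter
import time

start = time.perf_counter()
total = sum(i * i for i in range(100_000))  # stand-in for model fitting
elapsed = time.perf_counter() - start
print(f"took {elapsed:.4f} s")
```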

Note

  • Use this library in the first iteration of your ML project, before hyperparameter tuning
  • lazypredict only works with Python versions ≥ 3.6
  • If you don’t have the computational power, just use Google Colab

The GitHub link for the code is here.

If you want to read the official docs

That's all you need to know about the lazypredict library for now

Hope you learned something new from this article today and that it makes your ML projects a bit easier

Thank you for dedicating a few minutes of your day

If you have any doubts just comment down below I will be happy to help you out!

Thank you!

-Mani
