Lazy Predict for ML Models

Prajwal Mani
Published in The Startup
5 min read · Aug 7, 2020
Photo by Kate Stone Matheson on Unsplash

Hey, hope you are having a wonderful day!

Whenever I start a new ML project, the same thought pops into my mind:

“I need to fit the data to every model, then apply metrics to check which model gives better accuracy on the available dataset, and only then pick the best model. This process is time-consuming, and it might not even be that effective.”

For this problem I found a simple solution while surfing through python.org: a small Python library named “lazypredict”, and it does wonders.

Let me tell you how it works:-

Install the library

pip install lazypredict

Note

  1. lazypredict only works with Python versions ≥ 3.6
  2. It's built on top of several other libraries, so if any of them is missing from your system, Python will throw a ModuleNotFoundError. Read the error message carefully and install the required libraries.
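If you want to see exactly which dependency is missing without digging through a traceback, ModuleNotFoundError carries the module name on the exception itself. A small sketch (the package name below is deliberately fake):

```python
# Illustrative only: probe for a module and report which one is missing.
def check_dependency(module_name):
    try:
        __import__(module_name)
        return None  # module is available
    except ModuleNotFoundError as err:
        # err.name holds the name of the missing module
        return err.name

print(check_dependency("json"))                      # stdlib module, available
print(check_dependency("some_missing_package_xyz"))  # a fake package name
```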

lazypredict covers only supervised learning (classification and regression)

I will be using a Jupyter notebook in this article

Code

# import necessary modules
import warnings
warnings.filterwarnings('ignore')
import time
from sklearn.datasets import load_iris,fetch_california_housing
from sklearn.model_selection import train_test_split
from lazypredict.Supervised import LazyClassifier,LazyRegressor
  • warnings: package for handling warnings; ‘ignore’ filters out all warnings
  • time: package for measuring time
  • sklearn.datasets: package for loading datasets; today we will use two classics everyone has worked with: load_iris() for a classification problem and fetch_california_housing() for a regression problem
  • sklearn.model_selection.train_test_split: used to split the dataset into train and test sets
  • lazypredict: the package we are learning today; lazypredict.Supervised provides two main classes, LazyClassifier for classification and LazyRegressor for regression

LazyClassifier

# load the iris dataset
data=load_iris()
X=data.data
Y=data.target
  • data is a dictionary-like object with two main keys: data, which contains the independent feature values, and target, which contains the dependent feature values
  • X has all the independent feature values
  • Y has all the dependent feature values
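A quick way to convince yourself of this structure is to poke at the returned object directly; a minimal sketch:

```python
# Inspect what load_iris() actually returns (a dictionary-like Bunch)
from sklearn.datasets import load_iris

data = load_iris()
print(type(data).__name__)      # Bunch behaves like a dict
print(data.data.shape)          # 150 samples, 4 independent features
print(data.target.shape)        # 150 dependent feature values
print(list(data.target_names))  # the three iris species
```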
# split the dataset 
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=23)
classi = LazyClassifier(verbose=0, predictions=True)
  • We split the data into train and test sets using train_test_split()
  • The test size will be 0.3 (30%) of the dataset
  • random_state controls how the data is shuffled into train and test indices; pick any number you like, and the same number will give you the same split every time

Tip 1: If you want to see the documentation or source code behind any function or object in a Jupyter notebook, just add ? or ?? after it and execute the cell
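Outside a notebook, the standard-library inspect module gives you the same look under the hood; for example, applied to train_test_split:

```python
# inspect.signature ~ what ? shows in short form; inspect.getsource ~ what ?? shows
import inspect
from sklearn.model_selection import train_test_split

# The parameter list, including test_size and random_state used above
print(inspect.signature(train_test_split))

# The full source code of the function
src = inspect.getsource(train_test_split)
print(src.splitlines()[0])
```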

  • Next, we call LazyClassifier() and assign it to classi with two parameters, verbose and predictions
  • verbose: int; if non-zero, progress messages are printed. Above 50, the output is sent to stdout, and the frequency of the messages increases with the verbosity level; above 10, every iteration is reported. I suggest you try different values based on the depth of your analysis
  • predictions: boolean; if set to True, fit() will also return all the predicted values from the models
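To get a feel for what LazyClassifier automates, here is a minimal hand-rolled version of the same idea: fit several scikit-learn classifiers in a loop and compare one metric. The model list and scoring below are my own choices for illustration, not lazypredict's internals:

```python
# A hand-rolled mini "lazy classifier": fit a few models, compare accuracy
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=23)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=23),
    "KNeighbors": KNeighborsClassifier(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, Y_train)
    scores[name] = accuracy_score(Y_test, model.predict(X_test))

# Print models sorted by accuracy, best first
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

lazypredict does this across dozens of estimators and several metrics at once, which is exactly the time-saver described above.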
# fit and train the model 
start_time_1=time.time()
models_c,predictions_c=classi.fit(X_train, X_test, Y_train, Y_test)
end_time_1=time.time()
  • We fit the train and test data to the classi object
  • classi.fit() returns two values:
  • models_c: all the models along with some metrics
  • predictions_c: all the predicted values, i.e. ŷ
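The models_c table is a pandas DataFrame, so the usual pandas operations apply; here is a small sketch with a made-up metrics table (the model names and numbers are illustrative, not real lazypredict output):

```python
# Sorting a metrics table like models_c by a chosen metric
import pandas as pd

metrics = pd.DataFrame(
    {"Accuracy": [0.96, 0.91, 0.98], "F1 Score": [0.95, 0.90, 0.98]},
    index=["LogisticRegression", "KNeighborsClassifier", "SVC"],
)

# Best model first
best = metrics.sort_values("Accuracy", ascending=False)
print(best.index[0])  # → SVC
```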
# to check which model did better on the iris dataset 
models_c
models_c output
  • To be honest, I didn't know some of these models even existed for classification until I saw this
  • You may be wondering why ROC AUC is None: is the function not giving proper output? No, that's not the case; ROC AUC is None because we used a multi-class dataset

Tip 2: For a multi-class dataset like the one above, use sklearn's roc_auc_score with a multi-class strategy instead of relying on the ROC AUC column
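A sketch of that tip, using a simple LogisticRegression as the probability model (my choice for illustration; any classifier with predict_proba works):

```python
# Multi-class ROC AUC on iris with the one-vs-rest strategy
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=23, stratify=Y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
proba = clf.predict_proba(X_test)  # shape (n_samples, 3): one column per class

# multi_class='ovr' treats each class as its own binary problem
auc = roc_auc_score(Y_test, proba, multi_class="ovr")
print(round(auc, 3))
```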

# to check the predictions from the models 
predictions_c
predictions_c output
  • These are just a few sample predictions from the models

LazyRegressor

  • We have checked out LazyClassifier; it would be sad not to pay some attention to LazyRegressor
  • The following code is similar to LazyClassifier, so let's pick up the pace and skip some explanations
# load the fetch_california_housing dataset
data1=fetch_california_housing()
X1=data1.data
Y1=data1.target
  • data1 is a dictionary-like object with data and target as keys
# split the dataset 
X_train1, X_test1, Y_train1, Y_test1 = train_test_split(X1, Y1, test_size=0.3, random_state=23)
regr = LazyRegressor(verbose=0, predictions=True)
  • After splitting the data, we fit the train and test sets to the regr object
# fit and train the model 
start_time_2=time.time()
models_r,predictions_r=regr.fit(X_train1, X_test1, Y_train1, Y_test1)
end_time_2=time.time()

Note

1. Before running the above cell, make sure you close all unnecessary background processes, because it takes a lot of computational power

2. If you have limited computational power (RAM, GPU), use Google Colab; it's the simplest solution you can get

# to check which model did better on the fetch_california_housing dataset
models_r
models_r output
  • And again, I didn't know there were so many models for regression
# to check the predictions from the models 
predictions_r
predictions_r output

Time Complexity

  • We should talk about the time taken, because reducing it as much as possible is the main goal for all of us
# time complexity 
print("The time taken by LazyClassifier for {0} samples is {1} s".format(len(data.data), round(end_time_1 - start_time_1, 2)))
print("The time taken by LazyRegressor for {0} samples is {1} s".format(len(data1.data), round(end_time_2 - start_time_2, 2)))
time complexity output

Tip 3: Add %%time at the top of a Jupyter cell to check that cell's execution time
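In a plain Python script where %%time isn't available, time.perf_counter() is the recommended high-resolution timer; a minimal sketch:

```python
# Timing a block of code with the standard-library perf_counter
import time

start = time.perf_counter()
total = sum(i * i for i in range(100_000))  # stand-in for model fitting
elapsed = time.perf_counter() - start
print(f"took {elapsed:.4f} s")
```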

Note

  • Use this library in the first iteration of your ML project, before hyperparameter tuning
  • lazypredict only works with Python versions ≥ 3.6
  • If you don’t have the computational power, just use Google Colab

The GitHub link for the code is here.

If you want to read the official docs

That's all you need to know about the lazypredict library for now

Hope you learned something new from this article today and that it makes your ML projects a bit easier

Thank you for dedicating a few minutes of your day

If you have any doubts just comment down below I will be happy to help you out!

Thank you!

-Mani
