Lazy Predict for ML Models
Hey, hope you are having a wonderful day!
Whenever I start a new ML project, the same thought pops into my mind:
“I need to fit the data to every model, then apply metrics to check which model gives better accuracy on the available dataset, and then choose the best model. This process is time-consuming, and it might not even be that effective.“
For this problem, I found a simple solution while surfing python.org: a small Python library named “lazypredict”, and it does wonders.
Let me tell you how it works:
Install the library
pip install lazypredict
Note
- lazypredict only works for Python version ≥ 3.6
- It's built on top of various other libraries, so if you don't have those libraries on your system, Python will throw a ModuleNotFoundError; read the error carefully and install the required libraries
- lazypredict only covers supervised learning (classification and regression)
I will be using a Jupyter notebook in this article.
Code
# import necessary modules
import warnings
warnings.filterwarnings('ignore')
import time
from sklearn.datasets import load_iris,fetch_california_housing
from sklearn.model_selection import train_test_split
from lazypredict.Supervised import LazyClassifier,LazyRegressor
- warnings: module to handle warnings; 'ignore' filters out all the warnings
- time: module to measure elapsed time
- sklearn.datasets: module to load datasets; today we will use the classic datasets everyone works with: load_iris() for a classification problem and fetch_california_housing() for a regression problem
- sklearn.model_selection.train_test_split: used to split the dataset into train and test sets
- lazypredict: the package we will learn today; lazypredict.Supervised exposes two main classes, LazyClassifier for classification and LazyRegressor for regression
LazyClassifier
# load the iris dataset
data=load_iris()
X=data.data
Y=data.target
- data is a Bunch object that behaves like a dictionary, with two important keys: data, which contains the independent feature values, and target, which contains the dependent feature values
- X holds all the independent feature values
- Y holds all the dependent feature values
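As a quick sanity check on what X and Y actually hold, you can inspect their shapes (this uses sklearn's load_iris directly; it's just an illustration, not part of the lazypredict workflow):

```python
from sklearn.datasets import load_iris

data = load_iris()
X = data.data    # independent features: 150 samples x 4 measurements
Y = data.target  # dependent feature: 150 class labels

print(X.shape)  # (150, 4)
print(Y.shape)  # (150,)
print(set(Y))   # {0, 1, 2}  -> three classes, so this is multi-class
```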
# split the dataset
X_train, X_test, Y_train, Y_test =train_test_split(X,Y,test_size=.3,random_state =23)
classi=LazyClassifier(verbose=0,predictions=True)
- We split the data into train and test sets using train_test_split()
- The test size will be 0.3 (30%) of the dataset
- random_state controls how the data is shuffled before the split; pick any number you like, the same value will always give the same split
Tip 1: If you want to see the documentation or source code behind any function or object in a Jupyter notebook, just add ? or ?? after it and execute the cell.
- Next, we create a LazyClassifier() instance called classi with two parameters, verbose and predictions
- verbose: int data type; if non-zero, progress messages are printed, and the frequency of the messages increases with the verbosity level. I would suggest you try different values based on your depth of analysis
- predictions: boolean data type; if it is set to True, fit() will also return all the predicted values from the models
# fit and train the model
start_time_1=time.time()
models_c,predictions_c=classi.fit(X_train, X_test, Y_train, Y_test)
end_time_1=time.time()
- We fit the train and test data with the classi object
- classi.fit() returns two values:
- models_c: contains all the models along with some metrics
- predictions_c: contains all the predicted values, that is ŷ
# to check which model did better on the iris dataset
models_c
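models_c is a pandas DataFrame indexed by model name, so you can query it like any other DataFrame, for example to pull out the top-scoring model programmatically. A sketch with a mock leaderboard (the model names and scores below are made up for illustration, not actual lazypredict output):

```python
import pandas as pd

# mock of the leaderboard DataFrame that classi.fit() returns;
# the scores here are illustrative, not real results
models_c = pd.DataFrame(
    {"Accuracy": [0.98, 0.96, 0.93], "F1 Score": [0.98, 0.95, 0.93]},
    index=["LinearDiscriminantAnalysis", "RandomForestClassifier", "KNeighborsClassifier"],
)

# pick the model name with the highest accuracy
best_model = models_c["Accuracy"].idxmax()
print(best_model)  # LinearDiscriminantAnalysis
```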
- To be honest, I didn't know some of these models even existed for classification until I saw this
- You might be wondering why ROC AUC is None. Is the function not giving proper output? No, that's not the case here; ROC AUC is None because we used a multi-class dataset
Tip 2: For a multi-class dataset like the one above, you can compute roc_auc_score yourself using its multi_class parameter
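To fill in the missing multi-class AUC yourself, you can score one model manually with sklearn's roc_auc_score. A minimal sketch, using LogisticRegression as a stand-in model, since roc_auc_score needs class probabilities, which lazypredict's prediction table does not include:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=23)

# fit any probabilistic classifier and get class probabilities
clf = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
proba = clf.predict_proba(X_test)

# multi_class="ovr" computes one-vs-rest AUC for the 3 iris classes
auc = roc_auc_score(Y_test, proba, multi_class="ovr")
print(round(auc, 3))
```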
# to check the predictions from the models
predictions_c
- This is just a few sample predictions from the models
LazyRegressor
- So we have checked out LazyClassifier; it would be sad not to pay some attention to LazyRegressor
- The following code is similar to the LazyClassifier example, so let's pick up the pace and skip some explanations
# load the fetch_california_housing dataset
data1=fetch_california_housing()
X1=data1.data
Y1=data1.target
- data1 is also a Bunch object with data and target as keys
# split the dataset
X_train1, X_test1, Y_train1, Y_test1 =train_test_split(X1,Y1,test_size=.3,random_state =23)
regr=LazyRegressor(verbose=0,predictions=True)
- after creating the regr object, we fit the train and test data to it
# fit and train the model
start_time_2=time.time()
models_r,predictions_r=regr.fit(X_train1, X_test1, Y_train1, Y_test1)
end_time_2=time.time()
Note
1. Before running the cell above, make sure you close any unnecessary background processes, because it takes a lot of computation power
2. If you have low computational power (RAM, GPU), I would suggest using Google Colab; it is the simplest solution you can get
# to check which model did better on the fetch_california_housing dataset
models_r
- And again, I didn't know there were so many models for regression
# to check the predictions from the models
predictions_r
Execution Time
- We should talk about execution time, because reducing it is the main goal for all of us
# execution time (time.time() returns seconds, so the difference is in seconds)
print("The time taken by LazyClassifier for {0} samples is {1} s".format(len(data.data), round(end_time_1 - start_time_1, 2)))
print("The time taken by LazyRegressor for {0} samples is {1} s".format(len(data1.data), round(end_time_2 - start_time_2, 2)))
Tip 3: Add %%time at the top of a Jupyter cell to check its execution time
Note
- Use this library in the first iteration of your ML project, before hyperparameter tuning
- lazypredict only works for Python versions ≥ 3.6
- If you don't have the computational power, just use Google Colab
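Once lazypredict has narrowed the field, the next iteration is to tune only the top model instead of all of them. A sketch with sklearn's GridSearchCV, assuming RandomForestClassifier came out near the top of the leaderboard (the model choice and parameter grid here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=23)

# suppose lazypredict ranked RandomForestClassifier near the top;
# now spend the tuning budget on that one model only
grid = GridSearchCV(
    RandomForestClassifier(random_state=23),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 4, None]},
    cv=3,
)
grid.fit(X_train, Y_train)

print(grid.best_params_)
print(round(grid.score(X_test, Y_test), 3))  # accuracy on the held-out test set
```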
The GitHub link for the code is here.
If you want to read the official docs, the link is here.
That's everything you need to know about the lazypredict library for now.
I hope you learned something new from this article today and that it helps make your ML projects a bit easier.
Thank you for dedicating a few minutes of your day!
If you have any doubts, just comment down below and I will be happy to help you out!
Thank you!
-Mani