Cheat ML Model Creation with PyCaret
Low-code ML modeling and deployment for all
Are you tired of spending hours on data preprocessing, model training, and interpretation? Look no further than PyCaret! PyCaret is a low-code machine learning library in Python that automates the end-to-end machine learning process. I’ll show you how to use PyCaret with the Iris dataset and then deploy the model to Google Cloud, as a REST API, or in a Docker container. With PyCaret you can speed up your machine learning workflow and focus on the fun parts of machine learning. Let’s get started!
Getting Started:
Before we start using PyCaret, if we plan to deploy to Google Cloud we’ll need to create a project in the Google Cloud Console and enable billing. We also need to install PyCaret and a few other packages to get us up and running. To install PyCaret, open your terminal and run the following commands:
(Note: I am running all of this through VS Code in a notebook.)
!pip install --pre pycaret
!pip install pycaret[analysis]
!pip install fastapi
!pip install uvicorn
!pip install pycaret[mlops]
After installing PyCaret, we can import it in our Python script using the following command:
from pycaret.classification import *
Data Preparation:
The Iris dataset is a popular dataset in machine learning that contains measurements for three species of Iris flowers. To load it with PyCaret, we can use the get_data function. I’m using it as a fast way to pull in data for the example, but if you load your own data into a pandas DataFrame everything works the same way.
from pycaret.datasets import get_data
data = get_data('iris')
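If you would rather work with your own data, a quick pandas read is all it takes; here is a minimal sketch (the file name below is just a placeholder):

import pandas as pd

# Load your own dataset instead of the built-in one; the column you later
# pass to setup() as the target just needs to exist in this DataFrame.
data = pd.read_csv('your_data.csv')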
Once we have loaded the data, we can preprocess it using PyCaret’s setup function. The setup function automates data preprocessing tasks such as missing value imputation, categorical encoding, feature scaling, and outlier detection.
clf = setup(data, target='species', session_id=123)
In the above example, we have specified the target variable and a session_id to ensure reproducibility. PyCaret automatically detects the data types of the variables and performs the necessary preprocessing tasks. We can also steer the preprocessing manually using the numeric_features, categorical_features, and ignore_features parameters.
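As a rough sketch of that manual control (the feature names match the Iris columns, while row_id stands in for a hypothetical extra column you might want to drop):

clf = setup(
    data,
    target='species',
    session_id=123,
    numeric_features=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],  # force numeric handling
    ignore_features=['row_id'],  # hypothetical identifier column to exclude from modeling
)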
Compare Models:
best_model = compare_models()
After preparing the data, let’s use PyCaret’s compare_models() function. It’s used for model selection and performance evaluation. When we call compare_models(), PyCaret trains and cross-validates a whole library of machine learning models on the prepared data using their default hyperparameters. The output is a table that shows the performance metrics of all the evaluated models, ranked by their performance. This helps us quickly compare and select the best model for our dataset, which can save us time and effort in the machine learning workflow.
From the output above you can see that Logistic Regression seems to be the best model to use, so we will focus on it to build a full model.
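If you only care about a handful of algorithms or a particular metric, compare_models() also accepts include, sort, and n_select arguments; a small sketch:

# Compare only a few candidate models, rank them by F1, and keep the top two.
top_two = compare_models(include=['lr', 'dt', 'rf'], sort='F1', n_select=2)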
Modeling:
We can train machine learning models using PyCaret, which provides a wide range of classification algorithms such as logistic regression, decision tree, random forest, and gradient boosting. To train a model, we can use the create_model function. For example:
lr = create_model('lr')
We have now trained a logistic regression model using the create_model function. PyCaret fits the model with cross-validation and reports the fold-wise performance metrics. If we want to tune the hyperparameters, we can pass the trained model to the tune_model function, which searches a predefined grid by default or a custom one supplied through its custom_grid parameter.
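A minimal sketch of that tuning step, with an example search space for the regularization strength C (the values are illustrative, not a recommendation):

# Tune the logistic regression model; custom_grid overrides PyCaret's default search space.
tuned_lr = tune_model(lr, custom_grid={'C': [0.01, 0.1, 1, 10]})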
The library also provides a range of ensembling techniques: bagging and boosting through the ensemble_model function, and stacking and blending through the stack_models and blend_models functions. To bag our logistic regression model, for example:
bagged_lr = ensemble_model(lr)
In the above example, ensemble_model wraps the logistic regression model in a bagging ensemble (the default method); passing method='Boosting' switches it to boosting.
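For an actual stacking ensemble, stack_models takes a list of base estimators and an optional meta model; a rough sketch, assuming we also train a decision tree and a random forest as base learners:

# Train two more base models and stack them, using logistic regression as the meta model.
dt = create_model('dt')
rf = create_model('rf')
stacked = stack_models([dt, rf], meta_model=lr)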
Model evaluation:
The plot_model() function in PyCaret is used for visualizing the performance of a trained machine learning model. In the example below, we are using plot_model() to create a confusion matrix for a Logistic Regression model (lr) with the plot_kwargs parameter set to {‘percent’: True}.
plot_model(lr, plot = 'confusion_matrix', plot_kwargs = {'percent' : True})
The confusion matrix is a table that displays the number of true positives, true negatives, false positives, and false negatives for a binary or multi-class classification problem. By setting the plot_kwargs parameter to {‘percent’: True}, the values in the confusion matrix are displayed as percentages of the total number of observations, which can be useful for quickly understanding the distribution of correct and incorrect predictions made by the model.
Overall, plot_model() is a flexible function that allows us to create a variety of visualizations for a trained model, including ROC curves, feature importance plots, and more. By visualizing the performance of our models, we can gain insights into how they are making predictions and identify areas for improvement.
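A couple of other plot types worth trying (both names come from PyCaret’s classification plots):

# ROC curves for each class
plot_model(lr, plot='auc')

# Feature importance for the trained model
plot_model(lr, plot='feature')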
Deployment to GCP:
After training and evaluating the model, we can deploy it on GCP. PyCaret’s deploy_model function saves the trained pipeline and uploads it to a Google Cloud Storage bucket in the project we specify:
deploy_model(bagged_lr, model_name='iris_ensemble', platform='gcp', authentication={'project': 'my_project', 'bucket': 'my_bucket'})
In the above example, we have deployed the bagged ensemble model on GCP, passing the project ID and bucket name through the authentication dictionary. PyCaret saves the trained pipeline as a pickle file and uploads it to the specified bucket, from where it can be loaded back later with load_model.
To deploy a model on Google Cloud, the project must first be created from the command line or the GCP console. Once the project exists, create a service account, download its key as a JSON file, and point the GOOGLE_APPLICATION_CREDENTIALS environment variable at that file so PyCaret can authenticate from your local environment.
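Roughly, that credential setup and the round trip back from the bucket look like this (the key path, project, and bucket names are placeholders):

import os

# Point the Google client libraries at the service account key.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/service-account-key.json'

# Later, pull the deployed pipeline back down from the bucket.
loaded_model = load_model(
    'iris_ensemble',
    platform='gcp',
    authentication={'project': 'my_project', 'bucket': 'my_bucket'},
)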
Deployment using other methods:
The create_api() function is used to deploy a trained machine learning model as a RESTful API. In the code below, we use create_api() to expose the trained logistic regression model (lr) as an API named 'lr_api'.
create_api(lr, 'lr_api')
When we create an API with create_api(), PyCaret generates a FastAPI application that hosts the model and handles incoming requests. The generated script loads the saved model, preprocesses the incoming data, and makes predictions with it, so we can serve machine learning models as APIs without writing a lot of boilerplate code.
The output is a Python file (lr_api.py) built on FastAPI. Once the API is running, for example with uvicorn, we can send requests to it from any programming language that supports HTTP. For example, we can use the requests library in Python to send a POST request with input data and receive a JSON response containing the model’s prediction.
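Here is a rough sketch of that client side, assuming the generated API is running locally on port 8000, exposes a /predict endpoint, and expects the Iris feature columns from get_data('iris') in the request body:

import requests

# Feature values for one Iris flower; the keys must match the training columns.
payload = {
    'sepal_length': 5.1,
    'sepal_width': 3.5,
    'petal_length': 1.4,
    'petal_width': 0.2,
}

response = requests.post('http://127.0.0.1:8000/predict', json=payload)
print(response.json())  # JSON containing the model's predicted species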
If you work with Docker containers, the create_docker() function containerizes the same API; note that you need to call create_api() first, since create_docker() builds on the generated API file. It writes a Dockerfile and a requirements.txt, and from there we can use standard Docker commands to build the image, push it to a registry, and deploy it to a container cluster.
create_docker('lr_api')
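From there the usual Docker workflow applies; a sketch of those commands run from the notebook, with placeholder image and registry names (the port depends on the generated Dockerfile):

!docker build -t lr_api .
!docker run -d -p 8000:8000 lr_api
!docker tag lr_api gcr.io/my_project/lr_api:latest
!docker push gcr.io/my_project/lr_api:latest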
PyCaret is an awesome machine learning library that makes machine learning much easier and more efficient. We walked through its main functionality: preparing data, training machine learning models, evaluating them, and deploying them. You don’t need to be an expert in machine learning to perform complex tasks; PyCaret simplifies the whole process and makes it easy for anyone to use. By deploying models on GCP or in containers, we can also scale up our models to handle large datasets and high traffic, which is super important for many real-world applications.
Overall, I love PyCaret. I only recently discovered it, and it’s a great library to add to your toolkit if you’re a machine learning enthusiast or practitioner. It can save you a lot of time and make your workflow much more efficient. So why not give it a try and see how it can help you with your next machine learning project!