Fundamentals of MLOps — Part 3 | ML Experimentation using PyCaret

Tezan Sahu
Sep 5 · 7 min read

In Part 2 of this 4-blog series, we gained some experience with DVC, a tool to implement efficient versioning of ML artifacts. With our data in place, the next steps in an ML workflow are to first perform EDA, followed by feature engineering & model training. Although they form a reasonably small portion of the entire infrastructure of an ML system, it is evident that these steps form the heart of the ML pipeline.

With increased competition across the globe, organizations are always attempting to create and deliver better solutions more quickly. To do so, making the iterative processes in an ML pipeline quicker, more robust & more efficient is essential. Through this article, we will unpack some of the hype around “no-code” & “low-code” Machine Learning, which aims primarily to automate ML pipelines, & in the process dive into a Python library named PyCaret, which can help reduce the experimentation time with ML pipelines by leaps & bounds. So, let’s get started…

ML Pipelines

In the MLOps world, an ML pipeline is essentially a technique for codifying & automating the ML workflow of a project in order to produce ML models for production. An end-to-end ML pipeline consists of the various sequential processes that handle everything: from data extraction & preprocessing, through model training & validation, to the final deployment.

Image Source: Pipelines for production ML systems

The major transformation brought about by this concept is that teams no longer build & maintain individual ML models, but rather focus on developing & maintaining an entire pipeline as a product, which serves as the blueprint for experimenting with & developing newer models with minor modifications. This ensures faster iteration cycles & allows for a greater degree of scalability.
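To make the "pipeline as a blueprint" idea concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical & made up for illustration (the stage names, the hard-coded rows, the two toy "models"); it is not how any particular MLOps tool implements pipelines. It only shows that once the workflow is codified as a sequence of stages, trying a newer model amounts to swapping a single stage.

```python
from statistics import mean
from typing import Callable, Dict, List

# Each stage takes a shared "context" dict and returns an updated copy of it.
Stage = Callable[[Dict], Dict]


def extract(ctx: Dict) -> Dict:
    # Hypothetical data source: (feature, target) pairs hard-coded for the sketch.
    rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 12.0)]
    return {**ctx, "rows": rows}


def preprocess(ctx: Dict) -> Dict:
    # Hold out the last two rows for validation.
    rows = ctx["rows"]
    return {**ctx, "train": rows[:-2], "valid": rows[-2:]}


def train_mean_baseline(ctx: Dict) -> Dict:
    # Simplest possible "model": always predict the mean training target.
    avg = mean(y for _, y in ctx["train"])
    return {**ctx, "model": lambda x: avg}


def train_linear(ctx: Dict) -> Dict:
    # One-feature least-squares fit, written out by hand.
    xs = [x for x, _ in ctx["train"]]
    ys = [y for _, y in ctx["train"]]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in ctx["train"])
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return {**ctx, "model": lambda x: slope * x + intercept}


def validate(ctx: Dict) -> Dict:
    # Mean absolute error on the held-out rows.
    mae = mean(abs(ctx["model"](x) - y) for x, y in ctx["valid"])
    return {**ctx, "mae": mae}


def run_pipeline(stages: List[Stage]) -> Dict:
    # Run the stages in order, threading the context through each one.
    ctx: Dict = {}
    for stage in stages:
        ctx = stage(ctx)
    return ctx


# Same blueprint, one stage swapped: that is the "minor modification".
baseline = run_pipeline([extract, preprocess, train_mean_baseline, validate])
linear = run_pipeline([extract, preprocess, train_linear, validate])
print(f"baseline MAE: {baseline['mae']:.2f}, linear MAE: {linear['mae']:.2f}")
```

The extraction, preprocessing & validation stages stay untouched between the two runs, which is the property that keeps iteration cycles short.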

No-Code & Low-Code ML

No-Code Platforms

Image Source: Mapping the no-code AI landscape

Although they allow rapid creation of prototype models for non-programmers, the major drawback of such platforms is the limit on functionality & the loss of granular control (& hence the degree of customizability) over the algorithms that are used, because the user cannot make any changes to the packaged code available off-the-shelf.

Low-Code Platforms

Low-code platforms will never be able to completely replace hand-coded algorithms. They can, however, assist developers in taking ownership of modular blocks that perform some tasks in the ML workflow, to speed prototyping.

Low-Code ML with PyCaret

Modularized Features of PyCaret

Image Source: PyCaret 101 — for beginners

PyCaret includes a wide range of preprocessing steps & automatic feature engineering that can be applied based on the type of task at hand & also allows ensembling of selected models using different techniques.

PyCaret Modules

In this post, we will look at the Regression module in detail. Feel free to explore the functionalities provided by the other modules at your leisure.

Experimentation using PyCaret

Installation

When using a package manager such as pip, it is advisable to create & activate a virtual environment to avoid potential conflicts with other packages.

The latest release of PyCaret (2.3.1 at the time of writing this article) can be installed using pip:

$ pip install pycaret

Use the following commands if you wish to install PyCaret in a notebook:

pip install pycaret          # For local Jupyter notebook
!pip install pycaret         # For Google Colab or Azure Notebooks
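Once the install finishes, it can be handy to confirm which version actually landed in the active environment. The snippet below is a small sketch using only the Python standard library (Python 3.8+); the helper name `installed_version` is made up for this example & is not part of PyCaret.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("pycaret")
if found:
    print(f"PyCaret version: {found}")
else:
    print("PyCaret is not installed in this environment")
```

If the second message shows up inside a notebook, the notebook kernel is probably running in a different environment from the one where pip was run.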

Building End-To-End ML Pipelines with PyCaret

The Basic PyCaret section will walk you through the following:

  • PyCaret Environment Setup
  • Comparison of Model Algorithms
  • Training & Fine-Tuning a Model
  • Evaluation of a Model through Plots
  • Making Predictions using Trained Model
  • Saving & Loading a Model
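As a rough preview of the basic steps listed above, the sketch below strings together the corresponding functions from `pycaret.regression`. It is not a copy of the tutorial notebook: the wrapper function `run_basic_experiment`, the `session_id` value, the choice of the `"rf"` (random forest) model & the file name are all assumptions made for illustration, & the exact behaviour of each call may differ between PyCaret releases.

```python
def run_basic_experiment(data, target, model_path="pycaret_regression_sketch"):
    """Sketch of the basic workflow on a pandas DataFrame `data`.

    `target` is the name of the column to predict. All concrete choices
    below (seed, model ID, plot type) are illustrative assumptions.
    """
    # Imported inside the function so this sketch can be loaded even
    # where PyCaret is not installed yet.
    from pycaret.regression import (
        compare_models,
        create_model,
        load_model,
        plot_model,
        predict_model,
        save_model,
        setup,
        tune_model,
    )

    # 1. Environment setup: infers column types, creates a train/test split
    #    & prepares the preprocessing pipeline. In PyCaret 2.x this step may
    #    pause and ask you to confirm the inferred data types.
    setup(data=data, target=target, session_id=42)

    # 2. Compare the available algorithms using cross-validation.
    best = compare_models()

    # 3. Train one specific model, then tune its hyperparameters.
    model = create_model("rf")  # "rf" = random forest, an arbitrary pick
    tuned = tune_model(model)

    # 4. Evaluate the model through a plot (here: residuals).
    plot_model(tuned, plot="residuals")

    # 5. Make predictions; with no data passed, the hold-out set is used.
    holdout_predictions = predict_model(tuned)

    # 6. Save the model (with its preprocessing pipeline) & load it back.
    save_model(tuned, model_path)
    reloaded = load_model(model_path)

    return best, reloaded, holdout_predictions
```

To try it, one option is to pass in one of the sample datasets bundled with PyCaret, for example `get_data("insurance")` from `pycaret.datasets` with `"charges"` as the target; that dataset choice is an assumption here, not necessarily the one used in the tutorial notebook.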

The Intermediate PyCaret section will involve the following:

  • Data Transformation
  • Feature Engineering
  • Model Ensembling
  • Custom Grid Search in Hyperparameter Tuning
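The intermediate topics above map onto a handful of extra arguments & functions in `pycaret.regression`. The following is a hedged sketch rather than the notebook's actual code: the wrapper `run_intermediate_experiment`, the model IDs (`"dt"`, `"ridge"`, `"knn"`) & the values in `CUSTOM_GRID` are assumptions chosen for illustration, & the parameter names follow the PyCaret 2.x documentation, so they may differ in other releases.

```python
# Hypothetical search space for a decision-tree regressor ("dt").
CUSTOM_GRID = {
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 5, 10],
}


def run_intermediate_experiment(data, target):
    """Sketch of the intermediate workflow on a pandas DataFrame `data`."""
    # Imported inside the function so this sketch can be loaded even
    # where PyCaret is not installed yet.
    from pycaret.regression import (
        blend_models,
        create_model,
        ensemble_model,
        setup,
        tune_model,
    )

    # Data transformation & feature engineering are switched on in setup().
    setup(
        data=data,
        target=target,
        session_id=42,
        normalize=True,            # rescale numeric features
        transformation=True,       # power-transform features towards a Gaussian shape
        polynomial_features=True,  # add polynomial terms of numeric features
    )

    dt = create_model("dt")

    # Model ensembling: bagging a single model, then blending several.
    bagged = ensemble_model(dt, method="Bagging")
    blended = blend_models([dt, create_model("ridge"), create_model("knn")])

    # Hyperparameter tuning over our own grid instead of the default one.
    # search_algorithm="grid" asks for an exhaustive search of CUSTOM_GRID.
    tuned = tune_model(dt, custom_grid=CUSTOM_GRID, search_algorithm="grid")

    return bagged, blended, tuned
```

With this grid, the exhaustive search evaluates 4 × 3 = 12 candidate settings per cross-validation fold, so it is worth keeping custom grids small while experimenting.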

Link to PyCaret Tutorial Notebook

Closing Remarks

The question that now remains to be answered is “How do we deploy these models & run inference with them so that they can be used in the wild?” We will answer this question in our final post, where we look at how to deploy our trained models using PyCaret on AWS, use MLflow for logging our experiments & quickly spin up a web server that hosts our deployed model as an API for users to obtain predictions.

Following are the other parts of this Fundamentals of MLOps series:

Thank you & Happy Coding!

About the Author

Website: Tezan Sahu | Microsoft
LinkedIn: Tezan Sahu | LinkedIn
Email ID: tezansahu@gmail.com

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Tezan Sahu

Written by

Data & Applied Scientist at Microsoft | B. Tech in Mechanical Engineering (Minor in CS) from IIT Bombay | GSoC’20 with PEcAn Project
