How deep are your checks?

Suyash Srivastava
Nov 12, 2022


Introducing Deepchecks, an open-source Python package for verifying and validating machine learning models and data in production.

What & Why?

Building production-ready machine learning pipelines you can trust requires several essential validations and tests. Model validation has always been, and will probably remain, one of the trickiest areas of machine learning, especially in commercial settings.

Moreover, not all data scientists are comfortable with conventional code testing techniques, and, more crucially, data science encompasses much more than writing code. When validating pipelines, for example, we must also check the dataset’s integrity, inspect its distributions, validate data splits, and evaluate and compare models.

Deepchecks to the rescue!

Deepchecks assists you with various validation and testing requirements, such as looking at your data distributions, verifying data splits, evaluating your model, and comparing it to other models.

Architecture diagram of Deepchecks, which is composed of checks, conditions and suites. Not all input types are relevant for every suite. [1]

Although each ML pipeline has distinctive qualities, all ML teams share a few common issues. Through checks, conditions and suites, Deepchecks helps overcome three of these difficulties in particular:

  1. Data Integrity Validation.
  2. Train Test Split Validation.
  3. Model Evaluation.

Prerequisites

We’ll use Deepchecks to validate a movie recommender built on tabular data. This article assumes basic experience with Jupyter Notebook/Lab, Python, and Python packages for machine learning and data science such as pandas and scikit-learn (this article uses Surprise, a scikit for recommender systems). Assuming these requirements are installed on your computer, let’s begin checking deeply!

Installation

# deepchecks for tabular data:
pip install deepchecks --upgrade
pip install numpy
pip install scikit-surprise

Data

A quick peek at a custom movie dataset with pandas shows 294086 rows and 3 columns: the user ID, the movie name (which serves as the movie ID), and the rating that user gave the movie.

Dataset of 294086 rows containing user id, movie name and movie rating columns.

Data Integrity Suite

This suite bundles a variety of checks, such as those for mixed data types, special characters, and string mismatches.
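
To make these checks concrete, here is a rough, hand-rolled sketch of what mixed-type, special-character, and string-mismatch checks look for in a column such as movie names. The function names and normalization rules are assumptions chosen for illustration; Deepchecks’ own implementations, outputs, and thresholds differ.

```python
from collections import Counter, defaultdict


def mixed_types(values):
    """Share of numeric-looking vs. non-numeric values in one column (illustrative)."""
    def looks_numeric(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False

    counts = Counter("numeric" if looks_numeric(v) else "string" for v in values)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}


def only_special_characters(values):
    """Non-blank strings that contain no letters or digits at all, e.g. '???'."""
    return [v for v in values
            if isinstance(v, str) and v.strip() and not any(ch.isalnum() for ch in v)]


def string_mismatch(values):
    """Group variants that collapse to the same lowercase alphanumeric form,
    e.g. 'Toy Story' vs. 'toy-story' (this normalization rule is an assumption)."""
    groups = defaultdict(set)
    for v in values:
        base = "".join(ch for ch in str(v).lower() if ch.isalnum())
        groups[base].add(v)
    return {base: variants for base, variants in groups.items() if len(variants) > 1}


movies = ["Up", "up!", "Heat", "Toy Story", "toy-story", "???", "1995"]
print(mixed_types(movies))              # mostly strings, one numeric-looking value
print(only_special_characters(movies))  # ['???']
print(string_mismatch(movies))          # 'Up'/'up!' and 'Toy Story'/'toy-story'
```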

Each check can have conditions, which evaluate its result as pass, fail, warning, or error, and can also produce other outputs such as charts or tables.

Conditions, suites, and checks can all be customized.
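
The relationship between checks, conditions, and suites can be pictured with a toy model: a check computes a value, each condition maps that value to pass/warning/fail, a check that crashes is reported as an error, and a suite simply runs a list of checks. The classes `Outcome`, `Check`, `Suite` and the helper `thresholds` are hypothetical stand-ins that borrow Deepchecks’ vocabulary; they are not the library’s own classes or signatures.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, List, Tuple


class Outcome(Enum):
    PASS = "pass"
    WARN = "warning"
    FAIL = "fail"
    ERROR = "error"


@dataclass
class Check:
    name: str
    compute: Callable[[Any], Any]
    # Each condition: (description, function mapping the check's value to an Outcome).
    conditions: List[Tuple[str, Callable[[Any], Outcome]]] = field(default_factory=list)

    def add_condition(self, description, judge):
        self.conditions.append((description, judge))
        return self  # allow chaining when building a suite

    def run(self, data):
        try:
            value = self.compute(data)
        except Exception as exc:  # report a crashing check instead of aborting the suite
            return repr(exc), [(self.name, Outcome.ERROR)]
        return value, [(desc, judge(value)) for desc, judge in self.conditions]


@dataclass
class Suite:
    name: str
    checks: List[Check]

    def run(self, data):
        return {check.name: check.run(data) for check in self.checks}


def missing_ratio(rows):
    return sum(r[2] is None for r in rows) / len(rows)


def rating_range(rows):
    present = [r[2] for r in rows if r[2] is not None]
    return min(present), max(present)


def thresholds(warn_at, fail_at):
    """Condition factory: PASS below warn_at, WARN below fail_at, otherwise FAIL."""
    def judge(value):
        if value < warn_at:
            return Outcome.PASS
        return Outcome.WARN if value < fail_at else Outcome.FAIL
    return judge


# Hypothetical (user, movie, rating) rows.
ratings = [("u1", "Up", 5), ("u2", "Heat", 4), ("u3", "Up", None), ("u1", "Coco", 3)]

suite = Suite("Toy data integrity", [
    Check("Missing ratings", missing_ratio)
        .add_condition("missing < 10% (warn) / 30% (fail)", thresholds(0.10, 0.30)),
    Check("Rating range", rating_range)
        .add_condition("ratings within 1..5",
                       lambda v: Outcome.PASS if 1 <= v[0] and v[1] <= 5 else Outcome.FAIL),
])

for name, (value, outcomes) in suite.run(ratings).items():
    print(name, value, outcomes)  # 25% missing -> WARN; range (3, 5) -> PASS
```

Running the same suite on an empty list makes both checks raise, so each is reported with `Outcome.ERROR` rather than stopping the run.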

Running Data Integrity Suite.

To run the suite, we import the Dataset class and the data_integrity suite from deepchecks. Part of the results is shown below.

Results from running Data Integrity Suite.

As you can see, a couple of checks failed. That is exactly what the suite is for: after running it, we can quickly identify shortcomings in the dataset and fix them before running the pipeline to train a model. This lets us address data problems at the very start rather than discovering them after a complete pipeline run.

Train Test Validation

The Train Test Validation suite includes a number of checks, such as Multivariate Drift, Datasets Size Comparison, and Category Mismatch Train Test.
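
For a recommender, a category mismatch between train and test is especially telling: users or movies that appear only in the test set are cold-start cases the model never saw during training. Below is a minimal stdlib sketch of that idea, plus a simple dataset size comparison; the helpers are hypothetical illustrations, not Deepchecks’ code.

```python
def size_comparison(train, test):
    """Test-to-train size ratio; a very small ratio makes test metrics noisy."""
    return len(test) / len(train)


def category_mismatch(train_values, test_values):
    """Categories (e.g. movie names) present in test but never seen in training."""
    return set(test_values) - set(train_values)


def unseen_share(train_values, test_values):
    """Fraction of test rows whose category never appears in training."""
    seen = set(train_values)
    return sum(v not in seen for v in test_values) / len(test_values)


train_movies = ["Up", "Heat", "Up", "Alien"]
test_movies = ["Heat", "Coco"]

print(size_comparison(train_movies, test_movies))    # 0.5
print(category_mismatch(train_movies, test_movies))  # {'Coco'}
print(unseen_share(train_movies, test_movies))       # 0.5
```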

Running Train Test Validation Suite.
Train Test Validation Results.

Alongside the numerous train test validation checks, Deepchecks also visualizes the suite’s results. For example, it presents the drift score between the train and test sets for the movie_rating column.
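
A drift score for a numeric column like movie_rating summarizes how differently the values are distributed in train and test. One common choice is the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical CDFs. The exact metric Deepchecks reports depends on its version and the check’s parameters, so treat this stdlib sketch as an illustration of the idea rather than a reproduction of the library’s score.

```python
from bisect import bisect_right


def ks_statistic(train, test):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_train(x) - F_test(x)|.

    0 means identical empirical distributions, 1 means fully separated.
    Both CDFs are step functions that only jump at observed values, so it is
    enough to evaluate them at the union of the sample points.
    """
    a, b = sorted(train), sorted(test)
    gap = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # share of train values <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of test values <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


train_ratings = [1, 2, 3, 4, 5]
test_ratings = [4, 5, 5, 5, 5]  # test skews towards high ratings
print(ks_statistic(train_ratings, test_ratings))  # roughly 0.6
```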

Model Evaluation Suite

The Model Evaluation suite comprises Boosting Overfit, Weak Segments Performance, and Calibration Score, among many others, although some of these, such as Calibration Score, apply only to classification models. Since ours is a regression task, checks such as Model Inference Time, Train Test Prediction Drift, and Regression Error Distribution are the most relevant here.
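
The idea behind a regression error distribution check can be sketched with the standard library: look at the residuals, not just one score, and compare train against test. The residual convention (prediction minus label) and the metric set below are assumptions for illustration, not what Deepchecks computes internally.

```python
from math import sqrt
from statistics import mean, pstdev


def error_summary(y_true, y_pred):
    """Summarize the distribution of regression residuals (prediction - label)."""
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    return {
        "mean_error": mean(residuals),                 # systematic over/under-prediction
        "std_error": pstdev(residuals),                # spread of the errors
        "rmse": sqrt(mean(r * r for r in residuals)),  # penalizes large misses
        "mae": mean(abs(r) for r in residuals),        # typical absolute miss
    }


# Hypothetical ratings: a model that memorizes train but misses on test.
train_stats = error_summary([4, 3, 5, 2], [4, 3, 5, 2])
test_stats = error_summary([4, 3, 5, 2], [3, 3, 4, 4])
print(train_stats["rmse"], test_stats["rmse"])  # 0.0 on train vs. about 1.22 on test
```

A large gap between train and test error like this one is a typical hint of overfitting.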

We will use Surprise, a Python scikit for building and analyzing recommender systems, to quickly build the recommender.

Training an SVD model using Scikit Surprise.

Running the Model Evaluation suite on the trained model with the train and test datasets gives the following results.

Results from Model Evaluation Suite

Limitations

As of now, Deepchecks supports only fitted estimator instances that are compatible with scikit-learn (tested for tabular data).
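
Surprise’s algorithms do not follow scikit-learn’s `predict(X)` convention: a fitted `SVD` predicts one (user, item) pair at a time via `algo.predict(uid, iid)`, which returns a `Prediction` namedtuple whose `est` field is the estimated rating. One plausible workaround is a thin adapter that exposes a batch `predict`. The class `SurpriseRegressor` and the column names `user_id`/`movie_name` are hypothetical, and whether every Deepchecks check accepts such an adapter is an assumption, not something the article confirms.

```python
from collections import namedtuple

import numpy as np


class SurpriseRegressor:
    """Hypothetical adapter giving a fitted Surprise algorithm a
    scikit-learn-style predict(X) that returns one rating per row."""

    def __init__(self, algo, user_col="user_id", item_col="movie_name"):
        self.algo = algo
        self.user_col = user_col
        self.item_col = item_col

    def predict(self, X):
        # X can be a pandas DataFrame (or any mapping of column -> values).
        pairs = zip(X[self.user_col], X[self.item_col])
        return np.array([self.algo.predict(u, i).est for u, i in pairs])


# Stand-in for a fitted surprise.SVD so the snippet runs without Surprise;
# it mimics Surprise's Prediction(uid, iid, r_ui, est, details) namedtuple.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


class ConstantRatingStub:
    def predict(self, uid, iid):
        return Prediction(uid, iid, None, 3.5, {})


model = SurpriseRegressor(ConstantRatingStub())
preds = model.predict({"user_id": ["u1", "u2"], "movie_name": ["Up", "Heat"]})
print(preds)  # [3.5 3.5]
```

With Surprise installed, the stub would be replaced by a real model, e.g. `algo = SVD(); algo.fit(trainset); model = SurpriseRegressor(algo)`.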

Strengths & Takeaway

Deepchecks covers many of the issues that real-world data and machine learning models commonly run into, from data integrity problems to train-test drift and weak model segments, and flags them before they reach production. This helps the machine learning model provide reliable results.

This makes Deepchecks a user-friendly toolkit that machine learning engineers and developers can use to build trustworthy models that achieve the desired results.
