Evaluating Data Drift and ML Model Performance with Evidently and Deepnote

Elena Samuylova
Published in Deepnote · 4 min read · Dec 14, 2021

How to quickly build beautiful interactive reports on your model performance.

TL;DR: You can use the Evidently open-source library in Deepnote to generate interactive visual reports on your model performance and data distribution shift.

Why ML monitoring matters

A data scientist's work does not stop when you develop and deploy a machine learning model. Once the model is up and running, you need to make sure it performs as expected.

A lot of things can go wrong here.

For example, the model might receive incorrect or broken data and generate unreliable predictions as a result. If you do not detect this in time, you will see a decline in the model performance metrics: accuracy, mean error, or a business KPI such as click-through rate.

Even if the data quality is fine, the model itself might degrade over time as real-world patterns change. This is known as “Concept Drift”: the very meaning of what we are trying to predict evolves. For example, as new types of fraud appear, fraud detection models trained on older data are no longer relevant.

You can also encounter new data inputs. For example, if your model was trained on user behavior in one geographical location, it might not perform as well in a new region. This is known as “Data Drift,” which manifests itself as a shift in the input data distributions.

To detect and resolve such issues in time, you need visibility into model performance. You can get it by running regular performance checks on top of the model application logs.

Here is how you can do this in Deepnote, using the Evidently open-source library.

Data Drift: early monitoring of model performance

Let’s take an example. Suppose we are making predictions but do not yet know if they are accurate. This lag is common: it might take a day, a week, or even a month until the ground truth labels arrive.

In this case, we can still assess the model’s relevance by looking at how the input feature distributions and the model predictions change.

To do that, we can use the Evidently open-source library, which now fully supports Deepnote notebooks. Here is an example notebook.

1. Let’s open a Deepnote notebook and install Evidently:
!pip install evidently

2. Next, we import a few libraries. Since we want to evaluate Data Drift, we import the corresponding Tab.

import pandas as pd
from sklearn import datasets
from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import DataDriftTab

3. As a demonstration, we’ll use the Breast Cancer dataset from sklearn. To analyze your own model, swap this for your actual prediction logs.

breast_cancer = datasets.load_breast_cancer()
breast_cancer_frame = pd.DataFrame(breast_cancer.data, columns=breast_cancer.feature_names)

4. Now, let’s calculate the drift report! All we need to do is specify what to compare against what. In practice, you might compare the current production data to the data used in training, or compare two different periods, for example, the latest month against the previous one. In our example, we will treat the first 200 rows as the reference data and the rest as the current data:

data_drift_dashboard = Dashboard(tabs=[DataDriftTab()])
data_drift_dashboard.calculate(breast_cancer_frame[:200], breast_cancer_frame[200:])
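In practice, prediction logs often carry a timestamp, so you could slice the two windows by date rather than by row count. Here is a minimal sketch that reuses the dataset above with a synthetic timestamp column (the logs name and the dates are illustrative, not part of the example):

# Hypothetical prediction log: the model features plus a timestamp per row
logs = breast_cancer_frame.copy()
logs['timestamp'] = pd.date_range('2021-10-01', periods=len(logs), freq='H')

# Compare the latest window against the earlier one
cutoff = pd.Timestamp('2021-10-10')
reference = logs[logs['timestamp'] < cutoff]
current = logs[logs['timestamp'] >= cutoff]

# Drop the timestamp so the report only analyzes the model features
data_drift_dashboard.calculate(
    reference.drop(columns='timestamp'),
    current.drop(columns='timestamp'),
)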

Let’s have a look! We can display the results of the analysis directly in Deepnote:

data_drift_dashboard.show(mode='inline')

With just a few lines of code, we got an interactive report that shows how the input data distributions changed. Under the hood, the tool applied statistical tests: in this case, the Kolmogorov-Smirnov (K-S) test for the numerical features.
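To get an intuition for what such a test does, here is a standalone sketch using scipy’s two-sample K-S test on a single feature. This illustrates the general idea, not Evidently’s exact internals:

from scipy import stats

# Two-sample K-S test: could both samples come from the same distribution?
feature = 'mean radius'
statistic, p_value = stats.ks_2samp(
    breast_cancer_frame[feature][:200],
    breast_cancer_frame[feature][200:],
)

# A small p-value (e.g., below 0.05) suggests the feature's distribution shifted
print(f'{feature}: K-S statistic = {statistic:.3f}, p-value = {p_value:.4f}')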

We can see that the distributions of the features are statistically different. In practice, this could come as an early warning that our model operates in an unfamiliar environment, and its performance might decay.

We can apply a similar approach to analyze the shift in model predictions by running the Evidently Target Drift report. If we are lucky enough to have the true labels, we can run the Model Performance reports to get complete visibility into model quality.
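For example, with the same Dashboard API, a Target Drift check could look like the sketch below. This assumes the Evidently version installed above exposes CatTargetDriftTab and uses the default 'target' column naming; check the documentation for your version:

from evidently.dashboard.tabs import CatTargetDriftTab

# Add the labels so Evidently can compare the target distributions
breast_cancer_frame['target'] = breast_cancer.target

target_drift_dashboard = Dashboard(tabs=[CatTargetDriftTab()])
target_drift_dashboard.calculate(breast_cancer_frame[:200], breast_cancer_frame[200:])
target_drift_dashboard.show(mode='inline')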

Collaborative reporting

Detecting the change is one part of the puzzle.

To investigate and resolve the root cause of model decay, or to interpret a data shift, you often need to share the model performance with other stakeholders. Since Deepnote makes it easy to share interactive notebooks, you can use it as a tool for collaborative debugging of model performance, or to present and share the current model results.
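Besides sharing the notebook itself, you can export the report as a standalone HTML file, for example to attach to a ticket or send to a stakeholder. A minimal sketch, assuming the Dashboard API used above exposes a save method, as it does in Evidently releases from this period (check the docs for your version):

# Export the interactive report as a self-contained HTML file
data_drift_dashboard.save('data_drift_report.html')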


Elena Samuylova
Co-founder and CEO at Evidently AI. Building open-source tools to analyze and monitor machine learning models.