Monitoring Machine Learning Models Using Evidently AI

Sanjjushri Varshini R
May 19, 2024


Make ML Monitoring Efficient with Evidently AI

Evidently AI is an open-source tool developed to monitor and analyze the performance of machine learning models in production. It helps detect data and model quality issues, track changes in data distributions, and generate comprehensive reports.

In this article, I will demonstrate how to use Evidently AI to generate various reports using the California Housing dataset. We will cover installation, data preparation, and creating reports with detailed explanations.

Step 1: Installation

!pip install evidently==0.4.16 jupyter_contrib_nbextensions==0.7.0 pandas==2.2.1 scikit-learn==1.4.1.post1 jupyterlab==4.1.2 

Step 2: Importing Necessary Libraries

import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset, DataQualityPreset, RegressionPreset
from evidently.metrics import ColumnSummaryMetric, ColumnQuantileMetric, ColumnDriftMetric, generate_column_metrics

from evidently.test_suite import TestSuite
from evidently.tests import generate_column_tests
from evidently.test_preset import DataStabilityTestPreset, NoTargetPerformanceTestPreset, RegressionTestPreset

Step 3: Load the Dataset

We will use the California Housing dataset, which ships with scikit-learn. We then rename the target column to target and add a prediction column made of the target plus Gaussian noise, which stands in for the output of a model.

# Load California Housing dataset
data = fetch_california_housing(as_frame=True)
housing_data = data.frame

# Rename the target column
housing_data.rename(columns={'MedHouseVal': 'target'}, inplace=True)

# Add a prediction column: the target plus Gaussian noise (mean 0, std 5)
np.random.seed(42)  # fix the seed so the noise is reproducible
housing_data['prediction'] = housing_data['target'].values + np.random.normal(0, 5, housing_data.shape[0])

Step 4: Split the Data

We’ll split the data into reference and current datasets so that Evidently has two samples to compare.

# Draw two non-overlapping samples: reference first, then current from the remaining rows
reference = housing_data.sample(n=5000, replace=False, random_state=0)
current = housing_data.drop(reference.index).sample(n=5000, replace=False, random_state=1)

Step 5: Generate a Data Drift Report

We’ll create a report to detect data drift between the reference and current datasets.

# Create a report to detect data drift
report = Report(metrics=[
    DataDriftPreset(),
])

# Run the report
report.run(reference_data=reference, current_data=current)

# Display the report
report.show(mode='inline')
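To make the drift check less of a black box, here is a minimal pure-Python sketch of one statistic that can be used for numerical columns: the Wasserstein distance divided by the standard deviation of the reference sample. As far as I recall from the Evidently documentation, this is the default for numerical columns when the reference data has more than 1,000 rows, with a drift threshold of 0.1. Treat both as assumptions rather than a description of the library's internals. The function name normed_wasserstein and the synthetic samples are made up for illustration.

```python
import random
import statistics


def normed_wasserstein(reference, current):
    """Wasserstein-1 distance between two equal-sized samples, scaled by the reference std."""
    if len(reference) != len(current):
        raise ValueError("this sketch only handles equal-sized samples")
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    # For equal-sized 1-D samples, W1 is the mean absolute gap between sorted values.
    distance = statistics.fmean(abs(r - c) for r, c in zip(ref_sorted, cur_sorted))
    # Guard against division by zero for constant columns.
    scale = max(statistics.pstdev(reference), 1e-3)
    return distance / scale


rng = random.Random(0)
ref_sample = [rng.gauss(5.0, 1.0) for _ in range(5000)]
same_sample = [rng.gauss(5.0, 1.0) for _ in range(5000)]     # same distribution
shifted_sample = [rng.gauss(6.0, 1.0) for _ in range(5000)]  # mean shifted by one std

DRIFT_THRESHOLD = 0.1  # assumed default, see the note above

score_same = normed_wasserstein(ref_sample, same_sample)
score_shifted = normed_wasserstein(ref_sample, shifted_sample)
print(f"same distribution:    {score_same:.3f} drift={score_same >= DRIFT_THRESHOLD}")
print(f"shifted distribution: {score_shifted:.3f} drift={score_shifted >= DRIFT_THRESHOLD}")
```

Because the reference and current datasets in this article are drawn from the same data, a score close to zero, meaning no drift, is what we should expect to see for most columns in the report.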

Step 6: Generate Column-Specific Reports

Next, we generate a detailed report for a single column, AveRooms, covering its summary statistics, its 0.25 quantile, and its drift.

# Create a report for specific columns
report = Report(metrics=[
    ColumnSummaryMetric(column_name='AveRooms'),
    ColumnQuantileMetric(column_name='AveRooms', quantile=0.25),
    ColumnDriftMetric(column_name='AveRooms'),
])

# Run the report
report.run(reference_data=reference, current_data=current)

# Display the report
report.show(mode='inline')
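If the quantile=0.25 argument is unclear, the small sketch below shows what a 0.25 quantile is, using only the standard library. The values are hypothetical stand-ins for a column such as AveRooms, and first_quartile is a helper name I made up. I am also assuming linear interpolation between sorted values, which is the default in pandas and NumPy; I have not verified that Evidently computes the value in exactly this way.

```python
import statistics

# Hypothetical values standing in for a numeric column such as AveRooms.
reference_rooms = [3.9, 4.4, 4.8, 5.1, 5.3, 5.6, 6.0, 6.4, 7.1]
current_rooms = [4.1, 4.6, 5.0, 5.2, 5.5, 5.9, 6.3, 6.8, 7.4]


def first_quartile(values):
    """Return the 0.25 quantile: the value below which a quarter of the data lies."""
    # method="inclusive" interpolates linearly between sorted values.
    return statistics.quantiles(values, n=4, method="inclusive")[0]


ref_q25 = first_quartile(reference_rooms)
cur_q25 = first_quartile(current_rooms)
print(f"reference 0.25 quantile: {ref_q25}")
print(f"current 0.25 quantile:   {cur_q25}")
```

Comparing the two numbers side by side is the same idea as the quantile metric in the report: a large gap between the reference and current quantiles hints that the lower end of the distribution has moved.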

Step 7: Generate Custom Column Metrics

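The generate_column_metrics helper applies the same metric, with the same parameters, to several columns at once. Here we compute the 0.25 quantile for both AveRooms and AveBedrms.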

# Create a report with custom column metrics
report = Report(metrics=[
    generate_column_metrics(ColumnQuantileMetric, parameters={'quantile': 0.25}, columns=['AveRooms', 'AveBedrms']),
])

# Run the report
report.run(reference_data=reference, current_data=current)

# Display the report
report.show(mode='inline')

Step 8: Combining Multiple Metrics in a Report

We can combine individual metrics, generated column metrics, and presets in a single report for a more comprehensive analysis. Passing columns='num' applies the metric to all numerical columns.

# Create a combined report with multiple metrics
report = Report(metrics=[
    ColumnSummaryMetric(column_name='AveRooms'),
    generate_column_metrics(ColumnQuantileMetric, parameters={'quantile': 0.25}, columns='num'),
    DataDriftPreset(),
])

# Run the report
report.run(reference_data=reference, current_data=current)

# Display the report
report.show(mode='inline')

Step 9: Exporting Reports

# Export the report as a dictionary
report_dict = report.as_dict()

# Export the report as JSON
report_json = report.json()
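Once a report is exported, you will usually want to pull individual values out of it. The sketch below walks a dictionary shaped like the export: a top-level "metrics" list whose entries carry a "metric" name and a "result" payload. The sample dictionary is written by hand with made-up numbers, the key names are what I recall from the 0.4.x output and should be checked against your own export, and find_metric_result is a hypothetical helper, not part of Evidently.

```python
import json

# Hand-written stand-in for report.as_dict(); the numbers are invented and the
# key names are assumptions to verify against a real export.
sample_report = {
    "metrics": [
        {
            "metric": "DatasetDriftMetric",
            "result": {
                "number_of_columns": 10,
                "number_of_drifted_columns": 1,
                "share_of_drifted_columns": 0.1,
                "dataset_drift": False,
            },
        },
    ]
}


def find_metric_result(report_dict, metric_name):
    """Return the result payload of the first metric with the given name, or None."""
    for entry in report_dict.get("metrics", []):
        if entry.get("metric") == metric_name:
            return entry.get("result")
    return None


drift_summary = find_metric_result(sample_report, "DatasetDriftMetric")
missing = find_metric_result(sample_report, "NoSuchMetric")
print("dataset drift detected:", drift_summary["dataset_drift"])

# A JSON export can be turned back into the same structure with json.loads.
restored = json.loads(json.dumps(sample_report))
print("round trip preserved the data:", restored == sample_report)
```

Returning None for a missing metric keeps the helper safe to call on reports built from different metric lists.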

Evidently AI helps you keep track of model reliability and performance over time. By following the steps outlined in this article, you can start using Evidently AI to generate drift, column-level, and combined reports for your own datasets.

GitHub: https://github.com/Sanjjushri/evidently-ai-poc

Explore more articles on Evidently AI:

1. How to Access Data in an Evidently AI Report

2. Integrating Evidently AI with MLflow for ML Model Monitoring
