Nerd For Tech

NFT is an Educational Media House. Our mission is to bring the invaluable knowledge and experiences of experts from all over the world to the novice. To know more about us, visit https://www.nerdfortech.org/.

Monitor Models & Beat Drift with Open Tools

8 min read · Jun 14, 2025


Introduction: The Silent Killer of ML Models — Why Monitoring Matters

Understanding Drift: Data Drift vs. Concept Drift Explained
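The two kinds of drift surface differently in code: data drift shows up directly in feature distributions (as in the histogram check below), while concept drift often only becomes visible once delayed ground-truth labels arrive and accuracy degrades. A minimal sketch of a rolling-accuracy monitor for the latter (the window size and alert threshold are illustrative, not prescriptive):

```python
from collections import deque

class RollingAccuracy:
    """Track accuracy over the last `window` labeled predictions to spot concept drift."""

    def __init__(self, window=500, alert_below=0.85):
        self.hits = deque(maxlen=window)  # sliding window of correct/incorrect flags
        self.alert_below = alert_below

    def update(self, prediction, true_label):
        self.hits.append(prediction == true_label)
        return self.accuracy()

    def accuracy(self):
        return sum(self.hits) / len(self.hits) if self.hits else None

    def drifting(self):
        acc = self.accuracy()
        return acc is not None and acc < self.alert_below
```

Feed it each prediction once its true label shows up; when the windowed accuracy dips below the threshold, treat it as a concept-drift signal even if the feature distributions look unchanged.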

# Simple visual check for data drift using histograms
import matplotlib.pyplot as plt
import numpy as np

# Compare feature distributions
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.hist(training_data['income'], bins=30, alpha=0.7)
plt.title("Training Income Distribution")
plt.subplot(1, 2, 2)
plt.hist(production_data['income'], bins=30, alpha=0.7, color='orange')
plt.title("Production Income Distribution")
plt.tight_layout()
plt.show()
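Histograms are a good first look, but a single number makes the check automatable. The Population Stability Index (PSI) is a common summary of distribution shift, with the usual rule of thumb that PSI below 0.1 means stable and above 0.25 means significant drift. A minimal sketch, assuming two NumPy arrays of the same feature (one from training, one from production):

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample and a production sample of one feature."""
    # Bin both samples using the reference sample's quantiles
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range production values
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Convert to proportions; the small epsilon avoids log(0) on empty bins
    e_pct = e_counts / e_counts.sum() + 1e-6
    a_pct = a_counts / a_counts.sum() + 1e-6
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

Run it per feature on a schedule, e.g. `population_stability_index(training_data['income'], production_data['income'])`, and alert when the score crosses your chosen threshold.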

The MLOps Monitoring Stack: Key Open-Source Tools

Hands-on: Setting Up Your Monitoring Pipeline

Step 1: Instrument Your Model

from prometheus_client import start_http_server, Summary, Counter

# Create metrics
PREDICTION_TIME = Summary('prediction_time_seconds', 'Time spent processing prediction')
PREDICTIONS = Counter('predictions_total', 'Total number of predictions', ['model', 'outcome'])
FEATURE_MEAN = Summary('feature_mean', 'Mean value for model features', ['feature'])

# Decorate your prediction function
@PREDICTION_TIME.time()
def predict(features):
    # Record feature stats for drift monitoring
    for feature_name, value in features.items():
        FEATURE_MEAN.labels(feature=feature_name).observe(value)
    # Make prediction (your model code here)
    result = model.predict([list(features.values())])[0]
    # Record prediction outcome
    PREDICTIONS.labels(model='fraud_detector', outcome=str(result)).inc()
    return result

# Start Prometheus HTTP server to expose metrics
start_http_server(8000)

Step 2: Configure Prometheus

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'ml-model'
    static_configs:
      - targets: ['localhost:8000']

docker run -p 9090:9090 -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus

Step 3: Set Up Grafana Dashboard

sum(rate(predictions_total{model="fraud_detector"}[5m])) by (outcome)
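Throughput is only half the picture; the `Summary` metrics exported in Step 1 also give you latency and a cheap drift signal. Two more queries worth their own panels (metric names assume the instrumentation code above):

```promql
# Average prediction latency over the last 5 minutes
rate(prediction_time_seconds_sum[5m]) / rate(prediction_time_seconds_count[5m])

# Rolling mean of one feature -- a sudden level change here is an early drift hint
rate(feature_mean_sum{feature="income"}[1h]) / rate(feature_mean_count{feature="income"}[1h])
```

Dividing `rate(_sum)` by `rate(_count)` is the standard PromQL idiom for the mean of a Summary's observations over a window.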

Detecting Drift in Practice: Statistical Methods & Specialized Tools

Statistical Methods

import json
import time

from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import DataDriftTab
from evidently.model_profile import Profile
from evidently.model_profile.sections import DataDriftProfileSection

# Create a dataset reference (baseline)
reference = model_training_data.sample(5000)

# Set up the dashboard
drift_dashboard = Dashboard(tabs=[DataDriftTab()])

# Function to check for drift in new data
def check_drift(new_data):
    drift_dashboard.calculate(reference, new_data, column_mapping=None)

    # Get drift metrics as a dictionary (json() returns a string, so parse it)
    drift_profile = Profile(sections=[DataDriftProfileSection()])
    drift_profile.calculate(reference, new_data, column_mapping=None)
    report = json.loads(drift_profile.json())

    # Extract per-feature drift metrics for alerting
    metrics = report['data_drift']['data']['metrics']
    drifted_features = [f for f in metrics
                        if isinstance(metrics[f], dict) and metrics[f].get('drift_detected')]
    if drifted_features:
        print(f"ALERT: Drift detected in features: {', '.join(drifted_features)}")
        # Trigger your alert/retraining system here
        return True
    return False

# Check new production data every hour
while True:
    new_data = fetch_latest_production_data()  # your function to get new data
    drift_detected = check_drift(new_data)
    time.sleep(3600)  # wait an hour
import nannyml as nml

# Load reference and analysis data
reference_data = nml.load_credit_card_data()[0]
analysis_data = nml.load_credit_card_data()[1]

# Initialize the drift calculator
calculator = nml.DriftCalculator(
    feature_column_names=feature_columns,
    timestamp_column_name='timestamp'
)

# Fit on reference data
calculator.fit(reference_data)

# Calculate drift on analysis data
results = calculator.calculate(analysis_data)

# Plot univariate drift for a feature
nml.plots.univariate_drift(results, 'income', kind='feature')

From Detection to Action: Alerting and Automated Retraining Strategies

1. Alerting Configuration

# Alert rule in Grafana
# Name: Data Drift Detected
# Query: max(data_drift_score) > 0.7
# Evaluate: Every 1h For 0m
# Notification message: Data drift detected in model $modelName with score $driftScore
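The alert query above assumes a `data_drift_score` metric exists, but nothing in the earlier instrumentation exports one. A minimal sketch that bridges the gap with a Prometheus `Gauge` (the fraction-of-drifted-features score is an assumption; substitute whatever score your drift tool actually reports):

```python
from prometheus_client import Gauge, generate_latest

# Gauge lives in the default registry, so the start_http_server(8000)
# call from Step 1 exposes it automatically
DRIFT_SCORE = Gauge('data_drift_score',
                    'Share of monitored features flagged as drifted',
                    ['model'])

def publish_drift_score(drifted_features, all_features, model='fraud_detector'):
    # Score in [0, 1]: fraction of monitored features that drifted
    score = len(drifted_features) / max(len(all_features), 1)
    DRIFT_SCORE.labels(model=model).set(score)
    return score
```

Call `publish_drift_score` at the end of each drift check; Prometheus scrapes the gauge and Grafana's `max(data_drift_score) > 0.7` rule then has real data to evaluate.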

2. Automated Retraining Pipeline

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'mlops',
    'depends_on_past': False,
    'email_on_failure': True,
    'email': ['mlops-team@company.com'],
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# DAG that runs when triggered by drift detection
dag = DAG(
    'model_retraining',
    default_args=default_args,
    description='Retrains ML model when drift is detected',
    schedule_interval=None,  # Triggered by drift detector
    start_date=datetime(2023, 1, 1),
    catchup=False,
)

def fetch_new_training_data(**kwargs):
    # Code to fetch recent data for retraining
    return "path_to_new_data"

def retrain_model(**kwargs):
    ti = kwargs['ti']
    data_path = ti.xcom_pull(task_ids='fetch_new_training_data')
    # Retrain model with new data
    # Your model training code here
    return "path_to_new_model"

def evaluate_and_deploy(**kwargs):
    ti = kwargs['ti']
    model_path = ti.xcom_pull(task_ids='retrain_model')
    # Evaluate model quality; if good enough, deploy it
    # Your deployment code here
    pass

fetch_data = PythonOperator(
    task_id='fetch_new_training_data',
    python_callable=fetch_new_training_data,
    dag=dag,
)
retrain = PythonOperator(
    task_id='retrain_model',
    python_callable=retrain_model,
    dag=dag,
)
deploy = PythonOperator(
    task_id='evaluate_and_deploy',
    python_callable=evaluate_and_deploy,
    dag=dag,
)

fetch_data >> retrain >> deploy
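Because `schedule_interval=None`, something has to fire this DAG when drift is found. One option is Airflow 2's stable REST API, which accepts a POST to `/api/v1/dags/{dag_id}/dagRuns`. A sketch (the URL is a placeholder for your webserver, and authentication headers depend on your deployment, so they are omitted here):

```python
import json
import urllib.request

AIRFLOW_URL = "http://localhost:8080"  # placeholder for your Airflow webserver

def build_trigger_payload(drifted_features):
    # The conf dict is attached to the DAG run and readable inside tasks
    return {"conf": {"drifted_features": drifted_features,
                     "reason": "drift_detected"}}

def trigger_retraining(drifted_features):
    # POST /api/v1/dags/{dag_id}/dagRuns -- Airflow 2 stable REST API
    req = urllib.request.Request(
        f"{AIRFLOW_URL}/api/v1/dags/model_retraining/dagRuns",
        data=json.dumps(build_trigger_payload(drifted_features)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Wire `trigger_retraining(drifted_features)` into the alerting branch of the drift checker above, and the detection-to-retraining loop closes without manual intervention.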

3. Decision Framework

Case Study: Monitoring a Deployed Classification Model

Conclusion & Key Takeaways: Maintaining Peak Model Performance

🔍 Interview Questions on ML Model Monitoring

⚠️ Common Pitfalls in Model Monitoring

📚 Further Reading

What Now?


Written by Tapan Kumar Patro
