Automated Data Drift Detection For Machine Learning Pipelines.

Mastering Tabular Data Validation with TFDV: A Practical Guide to Ensure Accurate and Reliable Machine Learning Models.

Serop Baghdadlian

Published in Geek Culture

10 min read · Mar 29, 2023

Validation of input data is a critical component of our machine-learning workflow.

It becomes even more crucial for systems that are already in production, as automated validation can prevent model performance deterioration.

In this article, I will introduce the main types of data drift and walk through a hands-on tutorial on using TensorFlow Data Validation (TFDV) to automatically detect data drift in a machine learning system.

Data drift

To understand the different types of data drift and the data problems they can cause, let’s start by defining data drift:

“Data Drift is the gradual change in the distribution or characteristics of the input data used by a machine learning model over time. This shift can negatively impact the machine learning model’s performance and result in inaccurate predictions.”
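To make the definition concrete, here is a minimal sketch in plain Python of what "a change in the distribution" can look like for a single categorical feature. It does not use TFDV's API. It compares category frequencies between two samples using the L-infinity distance (the largest absolute difference in frequency), which is the kind of distance TFDV's drift comparator applies to categorical features. The feature values, the helper names, and the threshold of 0.1 are all assumptions chosen for illustration.

```python
from collections import Counter


def category_frequencies(values):
    """Return the relative frequency of each category in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(baseline, current):
    """Largest absolute difference in category frequency between two samples."""
    p = category_frequencies(baseline)
    q = category_frequencies(current)
    categories = set(p) | set(q)  # include categories seen in only one sample
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical data: payment types seen at training time vs. in production.
training = ["card"] * 70 + ["cash"] * 30
serving = ["card"] * 40 + ["cash"] * 50 + ["voucher"] * 10

DRIFT_THRESHOLD = 0.1  # assumed value; in practice this is tuned per feature

distance = l_infinity_distance(training, serving)
drift_detected = distance > DRIFT_THRESHOLD
print(f"L-infinity distance: {distance:.2f}, drift detected: {drift_detected}")
```

In this made-up example the share of "card" payments falls from 70% to 40%, so the distance is about 0.3 and the check flags drift. The sketch assumes both samples are non-empty and only covers categorical features; numeric features need a different distance measure.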

Types of Data Drift

In a machine learning pipeline, there are several types of input data drifts that you should check for, depending…

