Automated Data Drift Detection For Machine Learning Pipelines

Mastering Tabular Data Validation with TFDV: A Practical Guide to Ensure Accurate and Reliable Machine Learning Models.

Serop Baghdadlian
Geek Culture


Validation of input data is a critical component of our machine-learning workflow.

It becomes even more crucial for systems that are already in production, where automated validation can catch problematic inputs before they degrade model performance.

In this article, I will introduce the main types of data drift and walk through a hands-on tutorial on using TensorFlow Data Validation (TFDV) to automatically detect data drift in a machine learning system.

Data drift

To understand the different types of data drift and other potential data problems, let’s start by defining data drift:

“Data Drift is the gradual change in the distribution or characteristics of the input data used by a machine learning model over time. This shift can negatively impact the machine learning model’s performance and result in inaccurate predictions.”
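To make the definition concrete, here is a minimal sketch of what "a change in the distribution" can look like for a single categorical feature. It is an illustration in plain Python, not the TFDV API: the helper names, the sample values, and the 0.1 threshold are all assumptions chosen for this example. It compares the relative frequency of each category in a baseline (training) sample against a newer (production) sample and reports the largest absolute difference, often called the L-infinity distance.

```python
from collections import Counter


def category_frequencies(values):
    """Return the relative frequency of each category in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(baseline, current):
    """Largest absolute difference in relative frequency across all categories."""
    base_freq = category_frequencies(baseline)
    curr_freq = category_frequencies(current)
    # Include categories that appear in only one of the two samples.
    categories = set(base_freq) | set(curr_freq)
    return max(
        abs(base_freq.get(c, 0.0) - curr_freq.get(c, 0.0)) for c in categories
    )


# Hypothetical feature values: payment method at training time vs. in production.
training_values = ["card"] * 70 + ["cash"] * 30
production_values = ["card"] * 40 + ["cash"] * 50 + ["voucher"] * 10

DRIFT_THRESHOLD = 0.1  # assumed tolerance; in practice this is tuned per feature

distance = l_infinity_distance(training_values, production_values)
drift_detected = distance > DRIFT_THRESHOLD
print(f"L-infinity distance: {distance:.2f}, drift detected: {drift_detected}")
```

In this made-up sample, the share of "card" payments falls from 70% to 40%, so the distance is about 0.3 and the check flags drift. The sketch assumes both samples are non-empty; a real pipeline would also need to handle numeric features, which call for a different distance measure.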

Types of Data Drift

In a machine learning pipeline, there are several types of input data drifts that you should check for, depending…


Serop Baghdadlian

Senior Machine Learning Engineer | Data Scientist 🧑‍💻. For consulting and coaching check out my personal website: https://serop-ba.github.io/