Automated Data Drift Detection for Machine Learning Pipelines
Mastering Tabular Data Validation with TFDV: A Practical Guide to Ensure Accurate and Reliable Machine Learning Models.
Validating input data is a critical part of any machine learning workflow.
It matters even more for systems that are already in production, where automated validation can catch data problems before they degrade model performance.
In this article, I introduce the main types of data drift and walk through a hands-on tutorial on using TensorFlow Data Validation (TFDV) to automatically detect data drift in a machine learning system.
Data Drift
To understand the different types of data drift and other potential data problems, let’s start by defining data drift:
“Data Drift is the gradual change in the distribution or characteristics of the input data used by a machine learning model over time. This shift can negatively impact the machine learning model’s performance and result in inaccurate predictions”.
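To make the definition concrete, here is a minimal sketch in plain Python of how a change in distribution can be quantified for a single categorical feature. It compares the relative frequency of each category in a reference sample and a current sample, and reports the largest difference (the L-infinity distance). The feature values, the function names, and the 0.1 threshold are assumptions made up for illustration; this is not TFDV's implementation, only the general idea behind a distance-based drift check.

```python
from collections import Counter


def category_frequencies(values):
    """Return the relative frequency of each category in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(reference, current):
    """Largest absolute difference in relative frequency across all categories."""
    ref_freq = category_frequencies(reference)
    cur_freq = category_frequencies(current)
    # A category missing from one sample counts as frequency 0.0 there.
    categories = set(ref_freq) | set(cur_freq)
    return max(abs(ref_freq.get(c, 0.0) - cur_freq.get(c, 0.0)) for c in categories)


# Hypothetical data: one categorical feature at training time vs. in production.
training_values = ["card"] * 70 + ["cash"] * 30
serving_values = ["card"] * 40 + ["cash"] * 50 + ["voucher"] * 10

DRIFT_THRESHOLD = 0.1  # assumed tolerance, chosen only for illustration

distance = l_infinity_distance(training_values, serving_values)
drift_detected = distance > DRIFT_THRESHOLD
print(f"L-infinity distance: {distance:.2f}, drift detected: {drift_detected}")
```

In this toy example the share of "card" falls from 70% to 40%, so the distance is about 0.3 and the check flags drift. In practice the threshold has to be tuned per feature, since some natural variation between samples is expected.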
Types of Data Drift
In a machine learning pipeline, there are several types of input data drifts that you should check for, depending…