Follow This Data Validation Process to Improve Your Data Science Accuracy
When training and inference data come from different sources
Table of Contents
- Introduction
- Enabling Data Collection
- Setting a Baseline
- Detecting Outliers
- Summary
- References
Introduction
This article is intended for data scientists who are either setting up a data validation process for the first time or want to improve their current one. It serves as a general outline with some examples. First, I want to define data validation here, since the term can mean different things in other, similar job roles. For the purpose of this article, data validation is the process of ensuring that the training data used for your model matches, or is at least in line with, the data the model sees at inference. For some companies and use cases, you will not need to worry about this issue because both sets of data come from the same source. This process is therefore necessary, and only useful, when the data comes from different sources. One reason the data might not come from the same source is that your training data is historical and custom-made (e.g., features derived from existing data), and/or your inference data is coming…