Follow This Data Validation Process to Improve Your Data Science Accuracy

When training and inference data come from different sources

Matt Przybyla
Towards Data Science
8 min read · Sep 1, 2023

Photo by NordWood Themes on Unsplash [1].

Table of Contents

  1. Introduction
  2. Enabling Data Collection
  3. Setting a Baseline
  4. Detecting Outliers
  5. Summary
  6. References

Introduction

This article is intended for data scientists who are either starting a data validation process or want to improve their current one. It serves as a general outline with some examples. First, I want to define data validation, as it can have different meanings for other, similar job roles. For the purpose of this article, data validation is the process of ensuring that the training data used for your model matches, or is in line with, the inference data. For some companies and some use cases, you will not need to worry about this issue because the data comes from the same source. This process is therefore only necessary, and only useful, when the data comes from different sources. Some of the reasons the data wouldn’t come from the same source are that your training data is historical and custom-made (ex: features derived from existing data), and/or your inference data is coming…
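To make the definition above concrete, here is a minimal sketch of what "training data is in line with inference data" could look like as a check. It is only an illustration under stated assumptions, not the process this article goes on to describe: the function name `compare_features`, the sample rows, and the 25% relative tolerance are all hypothetical choices made for the example.

```python
from statistics import mean


def compare_features(train_rows, inference_rows, rel_tol=0.25):
    """Compare two lists of feature dicts and report basic mismatches.

    rel_tol is an arbitrary, illustrative threshold on the relative
    difference between the training mean and the inference mean.
    """
    # Collect every feature name seen in each data set.
    train_cols = set().union(*(row.keys() for row in train_rows))
    infer_cols = set().union(*(row.keys() for row in inference_rows))

    report = {
        "missing_at_inference": sorted(train_cols - infer_cols),
        "unexpected_at_inference": sorted(infer_cols - train_cols),
        "mean_shift": {},
    }

    # For numeric features present in both sets, compare the means.
    for col in sorted(train_cols & infer_cols):
        train_vals = [r[col] for r in train_rows
                      if isinstance(r.get(col), (int, float))]
        infer_vals = [r[col] for r in inference_rows
                      if isinstance(r.get(col), (int, float))]
        if not train_vals or not infer_vals:
            continue  # non-numeric feature, skipped in this sketch
        train_mean, infer_mean = mean(train_vals), mean(infer_vals)
        scale = abs(train_mean) if train_mean != 0 else 1.0
        if abs(infer_mean - train_mean) / scale > rel_tol:
            report["mean_shift"][col] = (train_mean, infer_mean)

    return report


# Hypothetical data: income is in dollars in the historical training set
# but arrives in thousands of dollars from the inference source.
train_rows = [
    {"age": 30, "income": 50000.0},
    {"age": 40, "income": 70000.0},
]
inference_rows = [
    {"age": 32, "income": 50.0},
    {"age": 38, "income": 70.0, "region": "west"},
]

report = compare_features(train_rows, inference_rows)
print(report)
```

In this made-up case the check would flag `income` (a unit mismatch between the two sources) and the extra `region` field, while `age` passes because its mean is the same in both sets. A real check would likely look at more than the mean, but the idea is the same: compare what the model was trained on with what it actually receives.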
