TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial…

DATA | MACHINE LEARNING | QA

Upholding Data Quality in Machine Learning Systems

A recommendation on the unseen cornerstone of Machine Learning

David Farrugia
Published in TDS Archive
4 min read · Jun 29, 2023

Photo by Battlecreek Coffee Roasters on Unsplash

In the dazzling world of machine learning (ML), it’s easy to get caught up in the thrill of devising sophisticated algorithms, captivating visualisations, and impressive predictive models.

Yet, just as the durability of a building depends not only on its visible structure but also on its hidden foundations, the effectiveness of a machine learning system hinges on an often-overlooked but crucial aspect: data quality.

The Imperative of Upstream Data Quality Assurance

Think of your ML training and inference pipelines as the journey of a steam train.

It’s critical to maintain the health of the train itself (the ML system), but what if the tracks are compromised?

If the quality of the data feeding your system is not ensured upstream, you are running on a damaged track: the train is destined to derail sooner or later, especially when operating at scale.

Therefore, it’s paramount to monitor data quality from the get-go, right at the source.
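
To make "right at the source" a little more concrete, here is a minimal sketch of a validation gate that sits in front of a pipeline. It is only an illustration under stated assumptions: the `SCHEMA` columns (`user_id`, `age`, `country`), their ranges, and the `validate_at_source` and `QualityReport` names are all hypothetical and not taken from any particular library or from a real system.

```python
from dataclasses import dataclass, field

# Hypothetical schema: column -> (expected type, check that the value is acceptable).
# The columns and ranges are assumptions made up for this illustration.
SCHEMA = {
    "user_id": (int, lambda v: v > 0),
    "age": (int, lambda v: 0 <= v <= 120),
    "country": (str, lambda v: len(v) == 2),
}


@dataclass
class QualityReport:
    total: int = 0
    rejected: int = 0
    issues: list = field(default_factory=list)

    @property
    def rejection_rate(self) -> float:
        # Share of records that failed at least one check.
        return self.rejected / self.total if self.total else 0.0


def validate_at_source(records, schema=SCHEMA):
    """Split incoming records into clean rows and a report of what was rejected."""
    report = QualityReport()
    clean = []
    for i, record in enumerate(records):
        report.total += 1
        problems = []
        for column, (expected_type, is_valid) in schema.items():
            value = record.get(column)
            if value is None:
                problems.append(f"row {i}: missing '{column}'")
            elif not isinstance(value, expected_type):
                problems.append(f"row {i}: '{column}' has type {type(value).__name__}")
            elif not is_valid(value):
                problems.append(f"row {i}: '{column}' out of range ({value!r})")
        if problems:
            report.rejected += 1
            report.issues.extend(problems)
        else:
            clean.append(record)
    return clean, report


raw = [
    {"user_id": 1, "age": 34, "country": "MT"},
    {"user_id": 2, "age": -5, "country": "IT"},    # invalid age
    {"user_id": 3, "age": 51},                      # missing country
    {"user_id": "4", "age": 28, "country": "FR"},  # wrong type for user_id
]

clean, report = validate_at_source(raw)
for issue in report.issues:
    print(issue)
print(f"rejected {report.rejected} of {report.total} records")
```

Only the clean rows would continue on to training or inference, while the rejection rate is the kind of number worth tracking over time: a sudden jump suggests that something changed upstream.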

Published in TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Written by David Farrugia

Data Scientist | AI Enthusiast and Researcher | Talks about Python, AI, and Data. Get in touch — davidfarrugia53@gmail.com