A principal part of data engineering is to provide valid data to users. When ingesting data from APIs, files, or other data feeds, it is essential to check that the data conforms to what you expect.

It is difficult to trust an application that supplies questionable data, and the same goes for a predictive analysis that relies on it. As the saying goes: “garbage in, garbage out”. With data verification checks in place, we can state with confidence that our data is good at scale (and getting better), rather than treating quality as an afterthought.
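As a rough illustration of what a verification check at ingestion time might look like, here is a minimal sketch in plain Python. The schema, the field names (`user_id`, `email`, `amount`), and the helper names are all hypothetical, chosen only for this example; a real pipeline would likely use a dedicated validation library and a richer set of rules.

```python
from typing import Any

# Hypothetical schema for illustration: field name -> expected Python type.
EXPECTED_SCHEMA = {"user_id": int, "email": str, "amount": float}


def validate_record(record: dict[str, Any],
                    schema: dict[str, type] = EXPECTED_SCHEMA) -> list[str]:
    """Return a list of problems found in one record (empty if it conforms)."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}"
            )
    return problems


def split_valid_invalid(records):
    """Separate records that pass the checks from those that do not."""
    valid, invalid = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            # Keep the reasons alongside the record so they can be reported.
            invalid.append((record, problems))
        else:
            valid.append(record)
    return valid, invalid


if __name__ == "__main__":
    incoming = [
        {"user_id": 1, "email": "a@example.com", "amount": 9.5},
        {"user_id": "2", "email": "b@example.com"},  # wrong type, missing field
    ]
    good, bad = split_valid_invalid(incoming)
    print(f"{len(good)} valid, {len(bad)} invalid")
    for record, problems in bad:
        print(record, problems)
```

Keeping the rejected records together with the reasons they failed, instead of silently dropping them, makes it easier to report problems back to whoever owns the upstream feed.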

Photo by Pixabay from Pexels

So how can we improve the quality of data?

Let’s dive into…

When we talk about data from a software engineering perspective, we often talk about how big it is: the number of events (volume), the rate at which it is generated (velocity), and the different formats it comes in (variety). These terms are especially helpful for boasting about the power of our machines.

But more important than all of those things is the actual usefulness (value) of the data to the people consuming it: in other words, the power of humans to derive insight and take action as a result of the work the machines are doing.

Photo by Rishi from Unsplash

In our case at a…

Wyatt Shapiro

“data” person
