How Google, Uber, and Amazon Ensure High-Quality Data at Scale

How 3 of the largest tech companies approach data quality

Kevin Babitz
The Startup


Photo by Charles Forerunner on Unsplash

All high-performing businesses should be leveraging data to make decisions. A great deal of data is available to us, and many models and techniques let us turn it into informed decisions that add value for our customers and our businesses.

As data becomes more valuable and we rely on it more heavily, we must also consider the problems that can arise when we collect it and use it to train models and inform decisions. You’ve probably heard it a thousand times: “garbage in equals garbage out.” If we train models on bad data, they will give us bad outputs.

How can we design systems to ensure high-quality data? How can these systems scale to handle millions of rows of data every day? How do we design systems that work across the diverse range of departments and data types at our company?

Some of the largest tech companies in the world publish their approaches to these problems. This repository is a great resource for reading papers on how successful tech companies tackle various technical problems (including data quality).

In this article, we will cover how Google views the data quality problem as four…



Kevin Babitz
Data Scientist | MSE in Data Science at University of Pennsylvania (May 2021)