Steering Clear of Bad Data: Why Monitoring Tools Aren’t Your Learner’s Permit

Sumit Mudliar
Data Quality & Beyond
3 min read · Jun 22, 2024

Just as learning to drive safely begins long before you hit the open road, ensuring data quality starts well before your data enters production systems.

Many organizations rely heavily on data monitoring tools and reactive fixes for bad data in production. While these are important safeguards, they’re not enough on their own — much like how airbags in a car don’t negate the need for proper driver training.

Let’s break this down using the process of becoming a licensed driver:

1. Learning to Drive (Data Quality Fundamentals)

Before you ever sit behind the wheel of a car, you need to learn the rules of the road, understand traffic signs, and grasp basic vehicle operation. Similarly, data quality begins with establishing clear standards, defining what “good data” looks like for your organization, and educating your team on best practices for data handling and management.
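One way to make "good data" concrete is to write the standards down as explicit, named rules that anyone on the team can read. The sketch below is only an illustration: the `Rule` class, the `CUSTOMER_RULES` list, the field names, and the thresholds are all hypothetical and would need to be replaced with your organization's own definitions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Rule:
    """One named expectation about a single field."""
    field: str
    description: str
    check: Callable[[Any], bool]


# Hypothetical standards for a customer record; replace with your own.
CUSTOMER_RULES = [
    Rule("customer_id", "must be a non-empty string",
         lambda v: isinstance(v, str) and v.strip() != ""),
    Rule("email", "must contain exactly one '@'",
         lambda v: isinstance(v, str) and v.count("@") == 1),
    Rule("age", "must be an integer between 18 and 120",
         lambda v: isinstance(v, int) and not isinstance(v, bool)
         and 18 <= v <= 120),
]


def violations(record: dict, rules: List[Rule]) -> List[str]:
    """Return a readable message for every rule the record breaks."""
    problems = []
    for rule in rules:
        value = record.get(rule.field)  # a missing field is checked as None
        if not rule.check(value):
            problems.append(
                f"{rule.field}: {rule.description} (got {value!r})"
            )
    return problems
```

Calling `violations(record, CUSTOMER_RULES)` returns an empty list for a record that meets every standard, and one message per broken rule otherwise. Keeping the description next to the check means the written standard and the enforced standard cannot drift apart.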

2. Getting a Learner’s Permit (Development Environment Testing)

Once you’ve learned the basics, you practice driving in controlled environments under supervision. This is akin to rigorous data testing during the development phase.

You create test scenarios, validate data transformations, and ensure your systems can handle various data inputs correctly. This stage allows you to identify and fix issues before they impact your production environment.
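As a rough sketch of what a development-stage test scenario might look like, here is a small transformation together with a table of inputs and expected outputs. The `normalize_income` function and its cleaning rules are assumptions made up for this example, not a standard routine.

```python
import math


def normalize_income(raw):
    """Convert a raw income value to a float, or None if it is unusable."""
    if raw is None or isinstance(raw, bool):
        return None
    if isinstance(raw, (int, float)):
        value = float(raw)
    else:
        # Strip whitespace, currency symbols, and thousands separators.
        cleaned = str(raw).strip().replace("$", "").replace(",", "")
        try:
            value = float(cleaned)
        except ValueError:
            return None
    # Negative, NaN, and infinite incomes are treated as bad data.
    if not math.isfinite(value) or value < 0:
        return None
    return value


# Each pair is (raw input, expected output), including awkward inputs.
TEST_CASES = [
    ("$52,000", 52000.0),
    (" 48000.50 ", 48000.5),
    (61000, 61000.0),
    ("", None),
    ("n/a", None),
    (None, None),
    (-100, None),
]


def run_tests():
    """Return a list of (raw, expected, actual) for every failing case."""
    failures = []
    for raw, expected in TEST_CASES:
        actual = normalize_income(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures
```

An empty list from `run_tests()` means every scenario passed. The point of the table is that messy inputs such as blanks and placeholders are exercised in development, where a failure costs nothing.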

3. Passing the Driving Test (Pre-Production Validation)

Before you’re allowed to drive independently, you must pass a practical test to demonstrate your skills. In the data world, this equates to comprehensive pre-production validation. You run your data pipelines through realistic scenarios, perform user acceptance testing, and verify that your data quality measures are working as intended.
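A pre-production check can be as simple as a gate that a realistic sample batch must pass before the pipeline is promoted. The following is a minimal sketch under assumed thresholds; `validate_batch`, `max_null_rate`, and `min_rows` are hypothetical names and defaults chosen for illustration.

```python
def validate_batch(rows, required_fields, max_null_rate=0.02, min_rows=1):
    """Check a batch of dict rows and return (passed, report)."""
    report = {"row_count": len(rows), "null_rates": {}, "failures": []}

    # An unexpectedly small batch is itself a sign of an upstream problem.
    if len(rows) < min_rows:
        report["failures"].append(
            f"expected at least {min_rows} rows, got {len(rows)}"
        )
        return False, report

    for field in required_fields:
        # Count rows where the field is absent, None, or an empty string.
        missing = sum(1 for row in rows if row.get(field) in (None, ""))
        rate = missing / len(rows)
        report["null_rates"][field] = rate
        if rate > max_null_rate:
            report["failures"].append(
                f"{field}: null rate {rate:.1%} exceeds {max_null_rate:.1%}"
            )

    return not report["failures"], report
```

A deployment script could call this on the sample batch and refuse to continue when `passed` is false, printing `report["failures"]` so the cause is visible.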

4. Buying a Car with Safety Features (Implementing Monitoring Tools)

Once you’re a licensed driver, you might purchase a car with modern safety features like airbags, anti-lock brakes, and lane departure warnings. These are analogous to data monitoring tools in production.

They provide an extra layer of protection and can help prevent or mitigate issues, but they’re not a substitute for good driving skills (or in our case, good data practices).
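To show what such a safety feature might look like in its simplest form, here is a sketch of a monitoring check that compares the latest value of a metric, such as daily row count, with its recent history. The function name and the z-score threshold of 3.0 are assumptions for illustration; real monitoring tools use more sophisticated methods.

```python
from statistics import mean, pstdev


def is_anomalous(history, latest, z_threshold=3.0):
    """Flag `latest` if it is far from the mean of `history`."""
    # With fewer than two points there is no meaningful baseline.
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = pstdev(history)
    # A perfectly flat history makes any change an anomaly.
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold
```

Note that a check like this only fires after the unusual data has already arrived, which is exactly why it works as an airbag rather than as driver training.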

5. Driving on the Open Road (Data in Production)

Finally, you’re on the road, putting all your learned skills into practice. In the data world, this is when your thoroughly tested and validated data pipelines are running in production.

Your monitoring tools are active, helping you catch any unexpected issues, but your primary defense against poor data quality is the solid foundation you built through testing and validation during development.

Real-World Example: The Power of Precise Data

Let’s say you’re a bank relying on customer data to assess creditworthiness and determine loan eligibility. Inaccurate or incomplete data on income, employment history, or credit scores could lead to granting loans to unqualified borrowers.

This could result in loan defaults, financial losses for the bank, and even damaged customer relationships. Conversely, clean and well-maintained data allows you to accurately assess creditworthiness, leading to responsible lending practices, improved risk management, and satisfied customers.

Proactive Measures Beyond Testing

Just as safe driving requires a combination of awareness and defensive techniques, good data quality goes beyond thorough testing. Proactive measures like data lineage tracking help you understand the origin and flow of your customer data, making it easier to identify and fix errors at the source.

Additionally, data governance practices ensure consistent data collection and management across your branches and departments, preventing inconsistencies and errors.
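The lineage idea can be sketched with nothing more than a mapping from each dataset to the datasets it is built from. The dataset names below are hypothetical and loosely follow the bank example; a real lineage system would typically capture this automatically.

```python
# Hypothetical lineage: each dataset maps to its direct upstream inputs.
LINEAGE = {
    "credit_risk_report": ["customer_features"],
    "customer_features": ["crm_customers", "bureau_scores"],
    "crm_customers": [],
    "bureau_scores": [],
}


def upstream_sources(dataset, lineage):
    """Return every dataset that feeds `dataset`, directly or indirectly."""
    seen = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.add(current)
        stack.extend(lineage.get(current, []))
    return seen
```

When a number in `credit_risk_report` looks wrong, `upstream_sources("credit_risk_report", LINEAGE)` lists every dataset worth inspecting, so the fix can be made at the source instead of being patched downstream.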

Summary

Just as safe driving relies on a combination of skills, practice, and safety features, good data quality depends on a multi-layered approach.

While production monitoring tools are valuable, they shouldn’t be your first line of defense. By investing in thorough testing during development, proactive data management practices, and a focus on data quality from the ground up, you can catch and correct many issues before they ever reach your production environment. The result is more reliable data and more confident decision-making.

Remember, you wouldn’t hand car keys to someone who’s never had a driving lesson just because the car has airbags. Similarly, don’t rely solely on production monitoring tools to ensure data quality.

Start early, test thoroughly, and build quality into your data processes from the very beginning.
