Data Management Series — 4. Data Quality: An End-to-End View

DP6 Team
Published in DP6 US
Apr 18, 2023

Introduction

The importance of implementing data quality monitoring processes has already been discussed extensively here, in articles such as Data Engineering for Martech — DataQuality — Data Engineering Series (Part II), Data Lineage: Ensuring Data Quality, and 5 Reasons to Implement Data Quality. However, this article intends to expand the discussion even further, providing a complete view of the implementation of Data Quality solutions. These are not just technical improvements of engineering processes. They benefit all areas that generate, work with, and consume data in some way.

Technical Scope

In the technical scope, which here means the work of the data engineering team, it is essential to ensure the quality of the data collected in order to avoid errors and unnecessary maintenance later. In general, the data engineering team is responsible for creating the pipelines and processes that extract, process, and ingest data. The team must pay attention to data quality at every stage so that it delivers high-quality data to its consumers.

The engineering team can monitor, identify, and proactively act on data-related issues using a good data quality process. For example, if the engineering team is responsible for extracting data from an API and storing it in a database for consumption within the company, a rigorous data quality and monitoring process can identify errors and correct them in the shortest possible time. Without this monitoring and validation process, the engineering team will continue to store incorrect or inconsistent data indefinitely, until one of the consumers of the data alerts them to the error.
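As a minimal sketch of the kind of validation described above, the snippet below checks records pulled from an API before they are stored. The field names (`order_id`, `amount`, `order_date`) and the rules are assumptions made for illustration, not a schema taken from any real system.

```python
from datetime import date

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"order_id": str, "amount": float, "order_date": date}


def validate_record(record):
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None:
            problems.append(f"missing {field}")
        elif not isinstance(value, expected_type):
            problems.append(f"{field} has type {type(value).__name__}")
    # Assumed business rule: a sale amount should never be negative.
    amount = record.get("amount")
    if isinstance(amount, float) and amount < 0:
        problems.append("amount is negative")
    return problems


def split_valid_and_rejected(records):
    """Separate records that pass validation from those that need attention."""
    valid, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            valid.append(record)
    return valid, rejected
```

Only the `valid` list would be written to the database. The `rejected` list, with its reasons, could feed whatever alerting the team already uses, so the error surfaces at ingestion time rather than when a consumer reports it.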

If an error is only spotted by an analyst late in the process, it leads to questions about the reliability of the data already consumed, and a lot of work may be necessary to correct it.

For example, imagine that an analyst has consumed data from the previous 5 months and then realizes that the data is incorrect. The engineering team would need to fix the underlying error and then reprocess all 5 months of data. This results in an exhausting cycle of maintenance and monitoring of the corrections, wasting computational resources and staff time that could be spent on developing new features or on tasks that generate more value for the business.

Analytical Scope

The analytical scope is defined here as the area responsible for preparing dashboards, reports and analyses that contribute to decision-making in the business. The analytical scope benefits from the data quality initiative via the reliability of the data used.

In the analytical scope, data is the raw material for generating value for the business, that is, it is the input for all the analyses and reports that are created. The analyst must have confidence in the accuracy and representativeness of the data. A common error involves incomplete or missing data.

Imagine that an analyst needs sales data for a 5-month period but only finds data for 3 months, as there was a failure in the collection process for the other 2 months. Even though the error is apparent, it is a source of frustration to those performing the analysis. They will have to ask the engineering team to make corrections in data processing. The related deliveries will be delayed as they wait for this validation and implementation. If there are many errors, the team of analysts may lose confidence in the data provided, with doubts about the processes used, and how representative the data is.
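A completeness check for this scenario could look like the sketch below. It assumes each record carries a date and simply reports which months in the requested period have no records at all. The function names are hypothetical.

```python
from datetime import date


def month_range(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def missing_months(record_dates, start, end):
    """Return the (year, month) pairs in the period that have no records."""
    present = {(d.year, d.month) for d in record_dates}
    return [ym for ym in month_range(start, end) if ym not in present]
```

Run by the engineering team after each load, a check like this would flag the 2 missing months before the analyst starts working with the data.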

Business Scope

In the business scope, that is, the area responsible for acting on the data, a failure to implement data quality processes may result in analyses that are biased or that do not accurately reflect the reality of the business.

For example, imagine that a team of analysts generated a report to understand consumers’ purchasing behavior over the previous 5 months, to determine which products should be favored in ads and which ones should be removed from stock. Let’s say there was a failure during data processing, and data from consumers who purchased at night was not processed. The volume of data would still be significant, so the analysis team might not notice the data loss and proceed with the analysis.

The analysis would be incorrect, as it would disregard every customer who bought items at night. The company's decision makers would base their decisions solely on daytime purchasing behavior and ignore behavior at night, because it was not included in the data. Therefore, it is essential for the engineering team to implement data quality processes to avoid this kind of bias and ensure that business decisions are based on accurate and reliable information.
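A loss like this is hard to see in total volume but easy to see in the distribution of the data. The sketch below, which assumes each purchase has a timestamp, lists the hours of the day for which no records exist. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime


def hours_without_data(timestamps):
    """Return the hours of the day (0-23) in which no record was seen."""
    counts = Counter(ts.hour for ts in timestamps)
    return [hour for hour in range(24) if counts[hour] == 0]
```

Over several months of sales, a block of consecutive empty hours would be a strong hint that part of the data was dropped. Over a short period, some empty hours may be legitimate, so the result is a prompt for investigation rather than proof of an error.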

Conclusion

Establishing data quality validation processes and routines is necessary to ensure the reliability of data used in business analysis. In addition, it is important to implement monitoring and error alerts for all steps of the process, so that problems can be identified as quickly as possible, to facilitate data maintenance and reprocessing.
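One simple way to monitor every step is to compare row counts between consecutive pipeline stages and raise an alert when too many rows disappear. The sketch below is an illustration under assumptions: the step names, the 5% threshold, and the use of Python's standard `logging` module as the alert channel are all hypothetical choices.

```python
import logging

logger = logging.getLogger("data_quality")


def find_row_drops(step_counts, max_drop_ratio=0.05):
    """Compare row counts of consecutive steps and report suspicious drops.

    step_counts is an ordered list of (step_name, row_count) pairs.
    """
    alerts = []
    for (prev_name, prev_rows), (name, rows) in zip(step_counts, step_counts[1:]):
        if prev_rows == 0:
            continue  # No upstream rows, so no ratio can be computed.
        drop = (prev_rows - rows) / prev_rows
        if drop > max_drop_ratio:
            message = f"{name}: lost {drop:.0%} of rows from {prev_name}"
            logger.warning(message)
            alerts.append(message)
    return alerts
```

For example, `find_row_drops([("extract", 1000), ("transform", 990), ("load", 600)])` would flag the load step, because it kept only 600 of the 990 rows it received.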

It is crucial to remember that the reliability of data depends not only on the accuracy of the data, but also on trust in the engineering processes that alert users to potential failures. Implementing a process for Data Quality and error identification helps to avoid analyses that are based on incorrect or incomplete data, which often lead to biases and analytical errors that affect decision-making in the business.

Finally, the adoption of good Data Quality practices also increases the reliability of data as raw material for the analytical team, alleviating the concerns of those who consume the data, and promoting trust among the company’s stakeholders.

We have produced a complete e-book on the subject. Download here and learn more!

Profile of the author: Lucas Tonetto Firmo | A Computer Engineer graduate of Universidade São Judas Tadeu with an MBA in AI and Big Data from USP, Lucas is passionate about Technology and its ability to transform society’s way of life. He worked for two years developing websites and web applications and is currently a Data Engineer at DP6.

Profile of the author: Angélica Fatarelli | With a Bachelor’s in Information Systems and an MBA in Data Science, she worked for many years with software development. Today she works in the world of Data Engineering, bringing technological solutions to Digital Marketing with DP6.
