Maximizing Data Value through a Holistic Approach to Data Quality: Balancing Preventive and Detective Measures

Effective data quality management requires a comprehensive approach that encompasses both preventive and detective strategies. Combining these two perspectives has a synergistic effect: it ensures the quality and value of data and, in turn, drives positive business outcomes.

Bojan Ciric
The Future of Data
6 min read · Feb 9, 2023



Data quality is one of the key capabilities of data management. It refers to the degree to which data meets quality standards such as accuracy, completeness, validity, consistency, and timeliness. Achieving high data quality is essential for organizations that want to make data-driven decisions: poor-quality data can lead to incorrect or misleading insights, undermine decision-making, and damage the organization’s reputation.

Holistic data quality management considers two perspectives: preventive and detective. By implementing both kinds of measures, organizations can leverage the full potential of their data to drive growth, innovation, and competitiveness.


Preventive data quality

The preventive perspective on data quality aims to prevent data quality issues from occurring in the first place, by implementing processes and controls that ensure data meets defined quality standards. For example, implementing data validation rules and using data quality tools to check data before it is loaded into a system can stop issues from ever reaching downstream consumers.

The key steps in the process are:

  • Define Data Quality Requirements: This step involves understanding the business requirements for data quality and defining clear and measurable standards for data accuracy, completeness, consistency, and timeliness (and any other data quality dimension defined as a standard in data policy). This information should be documented and communicated to all stakeholders.
  • Embed Active Data Quality Checks into DevOps and Agile Processes: This step involves defining and implementing data quality controls that are integrated into DevOps and Agile processes, such as data profiling, data validation, data standardization, and data enrichment checks that verify data meets the established quality requirements. The checks should be automated wherever possible and performed in real time at various stages of the SDLC, within applications, and within data pipelines, so that issues are caught and corrected before they can impact business operations (the first sketch after this list shows a minimal in-pipeline validation check). The goal is to embed data quality into development and operational processes, leveraging DevOps and Agile methodologies where applicable, so that data is of high quality from the moment it is acquired and the process is continuously improved through iteration and collaboration.
  • Ensure Data Quality Through Data Observability: This step involves using data observability techniques to gain insight into the quality of data in data stores and pipelines, including tools and techniques that measure data quality, detect outliers, and expose the behavior and performance of data pipelines. Observability practices such as logging, tracing, and alerting should be integrated into the data quality process so that issues are detected and addressed in real time (the second sketch after this list shows a simple volume-anomaly alert). The goal is to improve overall visibility into the data quality process and to identify and address data quality issues proactively, before they impact business operations.
  • Define and Implement Corrective Measures Using AI/ML and Automation: This step involves developing and implementing a process that addresses data quality issues quickly as they are detected, leveraging AI/ML and automation where possible. This may include AI/ML algorithms that identify and correct anomalies in data, as well as automated processes that validate data against established standards (the third sketch after this list shows simple rule-driven corrections). The process should be designed to minimize manual intervention, so that issues are caught and corrected before they impact business operations while AI/ML and automation improve the efficiency and speed of the overall data quality process.
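
To make the second step concrete, the sketch below shows what a minimal, automated validation check embedded in a pipeline might look like. It is plain Python with hypothetical field names, rules, and thresholds; in practice a dedicated data quality framework would typically provide this capability, but the shape of the check is the same: validate the batch before it is loaded, and surface the issues instead of passing them silently downstream.

```python
from datetime import datetime, timezone

# Hypothetical quality rules for an "orders" feed: required fields plus
# per-field validity checks. In a real pipeline these would come from the
# documented data quality requirements.
RULES = {
    "required_fields": ["order_id", "customer_id", "amount", "order_date"],
    "validators": {
        "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
        "order_date": lambda v: datetime.fromisoformat(v) <= datetime.now(timezone.utc),
    },
}

def _passes(check, value):
    """Run a validator defensively: a crashing check counts as a failure."""
    try:
        return bool(check(value))
    except (TypeError, ValueError):
        return False

def validate_batch(records, rules=RULES):
    """Split a batch into valid records and per-row issues, before loading."""
    valid, issues = [], []
    for i, rec in enumerate(records):
        missing = [f for f in rules["required_fields"] if rec.get(f) in (None, "")]
        invalid = [f for f, check in rules["validators"].items()
                   if rec.get(f) is not None and not _passes(check, rec[f])]
        if missing or invalid:
            issues.append({"row": i, "missing": missing, "invalid": invalid})
        else:
            valid.append(rec)
    return valid, issues

if __name__ == "__main__":
    batch = [
        {"order_id": "A1", "customer_id": "C9", "amount": 120.0,
         "order_date": "2023-02-01T10:00:00+00:00"},
        {"order_id": "A2", "customer_id": None, "amount": -5,
         "order_date": "2023-02-01T11:00:00+00:00"},
    ]
    good, bad = validate_batch(batch)
    print(f"{len(good)} records passed, {len(bad)} rejected: {bad}")
```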
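
The observability step can start with something as simple as monitoring run-level metrics and alerting on unexpected drift. The sketch below flags a pipeline run whose row count falls far outside its recent history; the z-score check is a deliberately simple stand-in for the statistical or ML-based anomaly detection a real platform would use, and send_alert is a placeholder for whatever alerting channel (Slack, PagerDuty, email) is in place.

```python
import statistics

def detect_volume_anomaly(history, current, z_threshold=3.0):
    """Flag the current run's row count if it sits far outside recent history."""
    if len(history) < 5:
        return None  # not enough history to judge
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return None
    z = (current - mean) / stdev
    if abs(z) > z_threshold:
        return {"metric": "row_count", "observed": current,
                "expected_mean": round(mean, 1), "z_score": round(z, 2)}
    return None

def send_alert(anomaly):
    # Placeholder: in practice this would post to Slack, PagerDuty, email, etc.
    print(f"DATA QUALITY ALERT: {anomaly}")

if __name__ == "__main__":
    recent_row_counts = [10_250, 10_410, 10_180, 10_330, 10_290, 10_370]
    todays_row_count = 3_900  # e.g. an upstream extract silently failed
    anomaly = detect_volume_anomaly(recent_row_counts, todays_row_count)
    if anomaly:
        send_alert(anomaly)
```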
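
Finally, for corrective measures, automation can handle the straightforward fixes and route everything else to review. The snippet below is purely illustrative: it standardizes free-text country values and imputes a missing amount with the batch median, which stands in for the ML-driven corrections the step describes; the field names and mappings are invented.

```python
import statistics

# Hypothetical mapping used to standardize free-text country values.
COUNTRY_MAP = {"united states": "US", "u.s.": "US", "usa": "US", "us": "US"}

def auto_correct(records):
    """Apply simple automated corrections; route unfixable records to review."""
    amounts = [r["amount"] for r in records
               if isinstance(r.get("amount"), (int, float))]
    median_amount = statistics.median(amounts) if amounts else None
    corrected, needs_review = [], []
    for rec in records:
        rec = dict(rec)  # work on a copy; don't mutate the input batch
        # Standardize the country value where a known mapping exists.
        country_key = str(rec.get("country", "")).strip().lower()
        rec["country"] = COUNTRY_MAP.get(country_key, rec.get("country"))
        # Impute a missing amount with the batch median (a simple stand-in
        # for an ML model that would predict the value from other fields).
        if rec.get("amount") is None and median_amount is not None:
            rec["amount"] = median_amount
            rec["amount_imputed"] = True
        if rec.get("amount") is None or rec.get("country") not in COUNTRY_MAP.values():
            needs_review.append(rec)  # could not be fixed automatically
        else:
            corrected.append(rec)
    return corrected, needs_review
```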

Data contracts and preventive data quality

A data contract is the specification of the data, and of the terms and conditions, under which a data producer delivers data to a data consumer. A data contract is autonomous (self-executable) at the time of data delivery. The data itself, along with the data contract, are integral components of the Data Product concept.

Data contracts are an effective way to enforce preventive data quality measures. The data quality rules specified in the data contract are enforced at the time of contract execution, which helps to ensure that the data meets the agreed-upon quality requirements. Data contracts can be used to enforce data quality standards in data pipelines, data APIs, and data processing systems. This helps to prevent data quality issues from occurring, and to ensure that data is accurate, complete, and consistent before it is used in business processes and decision-making.
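
One way to picture a data contract is as a small, machine-readable agreement whose quality rules are enforced automatically at delivery time. The sketch below expresses such a contract as a plain Python structure together with an enforcement step; the data product name, fields, and thresholds are hypothetical, and real implementations typically use a declarative format managed alongside the data product.

```python
# A hypothetical data contract for a "customer_orders" data product: the
# schema and quality rules the producer and consumer have agreed on.
CONTRACT = {
    "data_product": "customer_orders",
    "version": "1.0",
    "schema": {"order_id": str, "customer_id": str, "amount": float},
    "quality": {
        # Required share of non-null values per field.
        "completeness": {"order_id": 1.0, "customer_id": 1.0},
        # Per-field validity checks.
        "validity": {"amount": lambda v: v >= 0},
    },
}

class ContractViolation(Exception):
    """Raised when a delivered batch breaks the agreed quality rules."""

def enforce_contract(batch, contract=CONTRACT):
    """Run automatically at delivery time; reject batches that break the contract."""
    n = len(batch)
    for field, required_share in contract["quality"]["completeness"].items():
        non_null = sum(1 for rec in batch if rec.get(field) not in (None, ""))
        if n and non_null / n < required_share:
            raise ContractViolation(f"completeness of '{field}' is below {required_share:.0%}")
    for field, check in contract["quality"]["validity"].items():
        bad = [rec for rec in batch if rec.get(field) is not None and not check(rec[field])]
        if bad:
            raise ContractViolation(f"{len(bad)} record(s) violate the validity rule on '{field}'")
    return batch  # delivery proceeds only if the contract holds
```

Because the rules travel with the data product itself, the same checks run wherever the product is delivered, which is what makes the contract a preventive control rather than an after-the-fact report.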

Detective data quality

The detective perspective on data quality aims to identify data quality issues after they have occurred, so that corrective action can be taken. For example, regularly running data quality reports and audits to surface issues, and tracking data quality metrics to monitor the performance of data pipelines, are detective measures.

The key steps in the process are:

  • Perform Initial Data Profiling: This step involves gathering and analyzing basic information about the data, such as the structure, content, and relationships between data elements. The goal is to identify any potential data quality issues early on and to understand the general characteristics of the data. This information can be used to inform the development of data quality rules and to guide future data quality assessments.
  • Define Data Quality Rules: This step involves establishing specific rules and standards for the quality of data, such as acceptable values, ranges, formats, and relationships between data elements. The rules should be aligned with the goals and requirements of the organization and should take into account the results of the initial data profiling. The rules should be clearly communicated to all stakeholders to ensure consistent understanding and application.
  • Perform Data Quality Assessment: This step involves evaluating the data against the defined data quality rules to identify any issues or anomalies. This may include using data quality tools and techniques such as data validation, data profiling, and data standardization (the sketch after this list combines a basic profiling pass with a rule-based assessment). The results of the data quality assessment should be documented and analyzed to determine the extent and impact of the data quality issues.
  • Conduct Root Cause Analysis and Execute Remediation Plan: This step involves identifying the root cause of the data quality issues and developing a plan to address them. This may involve updating data quality rules, modifying data pipelines or processes, or correcting data records. The remediation plan should be executed in a controlled and systematic manner, and the results should be monitored to ensure that the desired improvements in data quality are achieved.
  • Monitor and Control: This step involves continuously monitoring the data and the data quality process to ensure that data quality issues are detected and addressed in a timely manner. This may involve using data quality tools, techniques, and processes to detect and correct issues, as well as implementing data quality metrics to track progress and identify areas for improvement. The goal is to maintain high levels of data quality and to continuously improve the detective data quality process.
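
To illustrate the first three detective steps, the sketch below runs a basic profiling pass over a small dataset and then assesses the results against a few quality rules, producing the kind of findings that feed root cause analysis and ongoing monitoring. The columns, rules, and thresholds are illustrative only.

```python
def profile(rows, columns):
    """Basic profiling: null rate, distinct count, and min/max per column."""
    stats, n = {}, len(rows)
    for col in columns:
        non_null = [r.get(col) for r in rows if r.get(col) is not None]
        stats[col] = {
            "null_rate": (n - len(non_null)) / n if n else 0.0,
            "distinct": len(set(non_null)),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }
    return stats

def assess(stats, rules):
    """Compare profiling results against quality rules; return the failures."""
    failures = []
    for col, rule in rules.items():
        col_stats = stats.get(col, {})
        if col_stats.get("null_rate", 1.0) > rule.get("max_null_rate", 1.0):
            failures.append(f"{col}: null rate {col_stats['null_rate']:.0%} exceeds threshold")
        if "min_allowed" in rule and col_stats.get("min") is not None \
                and col_stats["min"] < rule["min_allowed"]:
            failures.append(f"{col}: minimum value {col_stats['min']} is below {rule['min_allowed']}")
    return failures

if __name__ == "__main__":
    rows = [
        {"customer_id": "C1", "age": 34},
        {"customer_id": "C2", "age": None},
        {"customer_id": None, "age": -4},
    ]
    stats = profile(rows, ["customer_id", "age"])
    rules = {"customer_id": {"max_null_rate": 0.0}, "age": {"min_allowed": 0}}
    for failure in assess(stats, rules):
        print("FAIL:", failure)
```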

Conclusion

Preventive and detective data quality are two complementary approaches to ensuring the quality of data and maximizing its value to the organization. Preventive data quality focuses on stopping data quality issues before they occur by embedding data quality checks into the software development life cycle, applications, and pipelines. Detective data quality focuses on detecting and correcting issues that have already occurred. Combining the two produces a synergistic effect that shows up as increased operational efficiency, better data-driven decision-making, higher customer satisfaction, and improved regulatory compliance. By implementing a holistic approach to data quality that incorporates both preventive and detective measures, organizations can leverage the full potential of their data to drive growth, innovation, and competitiveness.

Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the views or positions of any entities the author represents.

Bojan Ciric
The Future of Data

Technology Fellow at Deloitte | Data Thinker | Generative AI Hands-on | Converts data into actionable insights