Interwoven Realms: Data Governance as the Bedrock for AI Governance

By Stefaan G. Verhulst and Friederike Schüür

Data & Policy Blog


In a world increasingly captivated by the opportunities and challenges of artificial intelligence (AI), there has been a surge in the establishment of committees, forums, and summits dedicated to AI governance. These platforms, while crucial, often overlook a fundamental pillar: the role of data governance. As we navigate through a plethora of discussions and debates on AI, this essay seeks to illuminate the often-ignored yet indispensable link between AI governance and robust data governance.

The current focus on AI governance, with its myriad ethical, legal, and societal implications, tends to sidestep the fact that effective AI governance is, at its core, reliant on the principles and practices of data governance. This oversight has resulted in a fragmented approach, leading to a scenario where the data and AI communities operate in isolation, often unaware of the essential synergy that should exist between them.

This essay delves into the intertwined nature of these two realms. It provides six reasons why AI governance is unattainable without a comprehensive and robust framework of data governance. In addressing this intersection, the essay aims to shed light on the necessity of integrating data governance more prominently into the conversation on AI, thereby fostering a more cohesive and effective approach to the governance of this transformative technology.

Six reasons why Data Governance is the bedrock for AI Governance.

1. Data governance covers the full data lifecycle, of which Artificial Intelligence is a part

  • Artificial intelligence is an intrinsic part of the data lifecycle: The data lifecycle, from planning to collection to deletion, is overseen by data governance. AI governance is concerned with the lifecycle of AI systems, including development, deployment, monitoring, and retirement. Effective AI governance relies on robust governance across the data lifecycle to ensure the reliability and relevance of AI systems. Data governance provides the foundation upon which AI governance can be built.

2. Data governance enables the development of responsible, fit-for-purpose AI systems.

  • Responsible Data Availability: the majority of AI systems require data for development. Data governance plays a critical role in making data available in a responsible manner. This includes determining what data should be made available, to whom, and under what conditions, including decisions to make data open.
  • Data Quality and Integrity: poor data quality can lead to biased or inaccurate AI outputs. The effectiveness of AI models largely depends on the quality of data they are trained on. Data governance ensures the quality, accuracy, and integrity of data.
  • Data Standards and Interoperability: the majority of AI systems require large training data sets. Data governance puts in place data standards. Standards are the foundation for data interoperability, the ability to create large data sets from many smaller ones.
  • Data Appropriateness and Representativeness: any AI system deployed in context will have a user base. If the data behind the AI system is not representative of that user base, the system may work less well and may disadvantage some users. Data governance ensures that the information required to assess the appropriateness and representativeness of data is available prior to AI system development and deployment.
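
As a small illustration of the representativeness point above, the sketch below compares the share of each group in a data set against the share of that group in the intended user base. It is a minimal sketch under stated assumptions, not a prescribed method: the function names, the `region` field, the reference shares, and the 5-percentage-point tolerance are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def representation_gaps(records, group_key, reference_shares):
    """Return, for each reference group, dataset share minus reference share.

    `records` is a list of dicts, `group_key` names the field holding the
    group label, and `reference_shares` maps each group to its share of the
    intended user base (hypothetical inputs for this sketch).
    """
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("records is empty")
    return {
        group: counts.get(group, 0) / total - ref_share
        for group, ref_share in reference_shares.items()
    }


def underrepresented(gaps, tolerance=0.05):
    """List groups whose dataset share falls short by more than `tolerance`."""
    return sorted(group for group, gap in gaps.items() if gap < -tolerance)


if __name__ == "__main__":
    # Illustrative data only: 10 records across three made-up regions.
    records = (
        [{"region": "north"}] * 6
        + [{"region": "south"}] * 3
        + [{"region": "east"}] * 1
    )
    reference = {"north": 0.4, "south": 0.3, "east": 0.3}
    gaps = representation_gaps(records, "region", reference)
    print(underrepresented(gaps))
```

A check like this is only possible when the data set carries the necessary metadata (here, a group label) and a reference distribution for the user base is documented, which is the kind of information data governance is meant to secure.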

3. Data governance takes care of issues that AI systems would otherwise inherit.

  • Compliance and Regulatory Adherence: a key role of data governance is to ensure that data handling complies with privacy and other data protection laws and industry regulations. Lack of data governance means AI systems may fail to comply with relevant laws and regulations. AI systems may even “launder” data, that is, obscure the origins of data that was improperly obtained.
  • Risk Management: a key role of data governance is to identify and mitigate risks related to data privacy, security, and compliance. Lack of data governance means AI systems inherit these risks.

4. Data governance is required to establish a social license for AI systems.

  • Stakeholder Engagement and Transparency: data governance emphasizes the importance of transparency and the involvement of a diverse range of stakeholders to foster trust and understanding across the entire data lifecycle. AI governance emphasizes stakeholder engagement and transparency as well, but its engagement is focused on the development and use of AI systems.
  • Consent and Social License for Data Re-Use: data governance ensures that data is collected and used in a manner that respects the consent of the individuals it pertains to. This is crucial for maintaining public trust, especially when data is reused in different contexts, including for AI development. Social license, or the ongoing acceptance of a company or industry’s standard practices by its employees, stakeholders, and the general public, is a key aspect. Proper data governance ensures that the reuse of data, especially in sensitive areas like AI, is aligned with societal expectations and ethical standards.
  • Dispute Resolution and Redress: data governance frameworks include mechanisms for dispute resolution and redress. This is particularly important in AI, where decisions made by algorithms can have significant impacts on individuals. Ensuring there are clear processes for individuals to challenge and seek redress for decisions made by AI systems, which are based on data, is a key aspect of responsible AI deployment.

5. Data governance is technology-agnostic, and thus more holistic in nature

  • Technology-agnostic: data governance applies to any form of technology that collects, uses or processes data. This means data governance offers a more holistic and adaptable framework that can evolve with changing technologies. AI governance is more specialized, narrow, and technology-bound: it focuses on the ethical, legal, and societal implications of artificial intelligence.
  • Common Foundation: As such, data governance can provide a common foundation for other data-driven technologies such as the Internet of Things and neurotechnology.

6. The implementation, standardization and codification of data governance provide valuable lessons for AI governance

  • Standards on use and interpretation: Data governance frameworks around the world set policies and standards on how data is interpreted and used, especially in making decisions that affect individuals or groups. These standards help ensure that insights are used in a way that is fair, unbiased, and respects privacy.
  • Due diligence policies and standards: Data governance sets standards and policies for data due diligence, including the validation of data sources, ensuring data is up-to-date, and using appropriate statistical methods.
  • Applications to AI: AI governance can build on lessons learned from data governance, for example, by setting standards for quality assurance and reporting requirements of AI models including AI model validation techniques and results. Many point to the urgency of putting in place AI governance. Leveraging data governance can accelerate that process.

Conclusion

In summary, effective AI governance cannot exist without robust data governance. Data governance not only provides the necessary infrastructure and guidelines for effective data management but also ensures that these data practices align with ethical standards, legal requirements, and societal expectations. This becomes increasingly important as organizations leverage data for AI and other advanced analytical purposes.

Data governance and AI governance are interwoven. Yet, in public discourse on AI governance, we note a frequent failure to link data to AI governance. This failure may slow the pace of the development of meaningful governance of AI (which some participants in the public discourse on AI governance may have an interest in). It reduces our ability to effectively leverage protections that are already in place, such as national or regional data protection laws and regulations, to address AI risks and potential harms. It puts at risk the development of responsible, fit-for-purpose AI systems. It may contribute to widening inequality; AI system development critically depends on data availability, which is highly asymmetric today.

A more integrated approach, which combines the broad principles of data governance with the specific requirements of individual technologies like AI, IoT, etc., offers a more balanced and effective governance structure. This integrated approach would ensure that while the unique aspects of each technology are addressed, there is also a consistent and overarching framework guiding data-related practices and decisions across all technologies.

Post Script: A call to strengthen Data Governance and Stewardship

Understanding the intrinsic dependence of AI governance on robust data governance, it becomes imperative to amplify global efforts in strengthening data governance frameworks and enhancing the practice of data stewardship. This realization calls for a concerted, worldwide initiative to elevate data governance to a level where it can effectively support and shape AI governance. Stronger data governance mechanisms ensure that data, the lifeblood of AI systems, is managed responsibly, ethically, and transparently, thereby laying a foundation for AI systems that are trustworthy and aligned with societal values. Improving data stewardship involves cultivating a culture where data is not only seen as a resource but also as a responsibility, with a focus on its ethical use, protection, and equitable access. As we embrace this interconnectedness, our efforts in fortifying data governance and stewardship will not only benefit AI systems but will also contribute to a more resilient and ethical digital ecosystem, essential for the sustainable progress of technology in society.

About the authors

Dr. Stefaan G. Verhulst is Editor-in-Chief of Data & Policy. In addition, he is co-founder of The GovLab (New York) and The DataTank (Brussels). He is also a Research Professor at New York University.

Dr. Friederike Schüür is Chief of Data Strategy and Data Governance at UNICEF, where she leads efforts to guide the use of data and data technologies to advance human and child rights across UNICEF's representation in over 190 countries, including global data strategy and policy development, workforce development for the digital and data future, and advocacy.


This is the blog for Data & Policy (cambridge.org/dap), a peer-reviewed open access journal exploring the interface of data science and governance.


Blog for Data & Policy, an open access journal at CUP (cambridge.org/dap). Eds: Zeynep Engin (Turing), Jon Crowcroft (Cambridge) and Stefaan Verhulst (GovLab)