Misuse versus Missed use — the Urgent Need for Chief Data Stewards in the Age of AI

Stefaan Verhulst & Richard Benjamins

Published in

Data & Policy Blog

Jun 13, 2024

In the rapidly evolving landscape of artificial intelligence (AI), the need for and importance of Chief AI Officers (CAIOs) are receiving increasing attention. One prominent example came in a recent memo on AI policy, issued by Shalanda Young, Director of the United States Office of Management and Budget. Among the most important, and most prominently featured, recommendations was a call, “as required by Executive Order 14110,” for all government agencies to appoint a CAIO within 60 days of the release of the memo.

In many ways, this call is an important development; not even the EU AI Act requires this of public agencies. CAIOs have an important role to play in the search for a responsible use of AI for public services, one that includes guardrails and helps protect the public good. Yet while acknowledging the need for CAIOs to safeguard a responsible use of AI, we argue that the duty of administrations is not only to avoid negative impact, but also to create positive impact. In this sense, much work remains to be done in defining the CAIO role and its specific functions. In pursuit of these tasks, we further argue, policymakers and other stakeholders might benefit from looking at the role of another emerging profession in the digital ecology: that of Chief Data Stewards (CDS), which is focused on creating such positive impact, for instance by helping to achieve the UN’s Sustainable Development Goals (SDGs). Although the CDS position is itself somewhat in flux, we suggest that CDS can nonetheless provide a useful template for the functions and roles of CAIOs.

We start by explaining why CDS are relevant to the conversation about CAIOs: data and data governance are foundational to AI governance. We then discuss some particular functions and competencies of CDS, showing how these can be applied equally to the governance of AI. Among the most important (if high-level) of these competencies is an ability to proactively identify opportunities in data sharing, and to balance the risks and opportunities of our data age. We conclude by exploring why this competency, an ethos of positive data responsibility that avoids overly cautious risk aversion, is so important in the AI and data era.

Why AI Needs Chief Data Stewards: Data and Data Governance as Foundational

While much attention is today devoted to the outputs of AI (generative AI in particular), it is equally essential to pay attention to the inputs — i.e., data. Data has been described as the “oxygen of AI.” It therefore follows that data and data governance form the bedrock of AI and AI governance. Many of the most pressing and problematic issues surrounding the field concern data: copyright, value extraction, privacy, bias, and much more. In addition, many AI initiatives use or generate data without considering the need for data sharing and data reuse; in fact, given the growing value of data for AI as well as copyright concerns, there is a risk that data will be increasingly privatized and locked away in silos where it cannot be used or reused for the public good.

For all these reasons, we argue that Chief Data Stewards (CDS), who are essential to fostering a data ecology that serves the public good, can help define the role of CAIOs and, more generally, the contours of responsible AI. At a broad level, CDS are essential for advancing the value of data reuse in a responsible, systematic and sustainable manner, and for promoting responsible AI. They play a pivotal role in ensuring that data practices align with ethical standards, legal requirements, and the overall strategic goals of any organization that collects, stores, or reuses data. Given the centrality of data to AI, these are all valuable, even critical, tasks.

Here are some more specific data-related values and goals that CDS can help advance:

Data Governance: CDS oversee the implementation of robust data governance frameworks. They ensure that data is managed, protected, and utilized in compliance with regulations and best practices, thus mitigating risks associated with data breaches and misuse.
Data Quality and Integrity: Ensuring high data quality is crucial for reliable AI outputs. Chief Data Stewards establish processes for maintaining data accuracy, consistency, and completeness, all of which are vital for training effective AI models.
Ethical Data Use: CDS promote the ethical use of data, ensuring that data collection, processing, and sharing practices respect individual privacy and consent. This is particularly important in building public trust in AI technologies.
Data Access and Reusability: CDS advocate for data accessibility and reusability. They work towards creating systems that allow for secure and efficient data sharing, fostering innovation and collaboration across different sectors.
Bridging Data Gaps: By identifying and addressing gaps in the data used for AI, CDS ensure that AI systems are trained on comprehensive and representative datasets. This helps in reducing biases and improving the accuracy and fairness of AI systems.
Value Realization: CDS help organizations realize the full value of their data assets by identifying new opportunities for data-driven insights and innovations, thereby driving business growth and competitive advantage.
Stakeholder Collaboration: CDS facilitate collaboration between various stakeholders, including data providers, data users, and regulatory bodies. This ensures that data initiatives are aligned with broader organizational and societal goals.

Key Competencies of a Data Stewardship Function

If the above outlines broad areas or values that CDS can help nurture, what follows describes in more detail the core competencies CDS need to achieve these goals. Together, these “clusters of competencies” add up to a job description of sorts for CDS; they are essential for governance at the intersection of AI and data.

1) Strategic Data Auditing and Assessment: This role requires CDS to steward AI initiatives in the public interest by monitoring and assessing the value, potential, and risk of data within an organization. Specific responsibilities within this cluster include: helping formulate and determine policy questions related to AI data initiatives; scoping and iterating assessments of the “minimum viable” data needed for a particular initiative or project; identifying and documenting data assets; considering the ethical and fundamental rights implications, and other risks, of using or not using data; and helping establish operational, technical and governance models to validate ways of measuring the impact of an AI project.

2) Establish Partnerships and Community Engagement: Data stewards also play a valuable function in stewarding relationships, both within the data ecosystem and in society more broadly. They must reach out to and vet potential partners for data and AI projects, and generally serve as points of contact regarding reuse of data. CDS engage with users of data products and insights, and help establish a social license for data reuse through community engagement and deliberations. As part of this work, CDS play an important role in informing data agreements and other contractual relationships.

3) Internal Coordination: In order to manage data operations and data-driven AI initiatives, data stewards must manage internal resources, expertise and centers of authority. This responsibility focuses on internal relations: for instance, gaining approval from actors within the organization and coordinating with them to ensure that all stakeholders and organizational leaders are informed and aligned on projects. Stewards also help to establish data operations within organizations, mapping and matching internal resources, expertise, and skills to enable the data collaboration and data reuse that AI projects depend on.

4) Nurture Data Collaboratives to Sustainability: Data collaboratives play a major role in enabling data sharing and data reuse for AI. Data stewards foster these collaboratives by working with stakeholders to gather necessary resources and by supporting their broad, long-term impact and sustainability. Among other responsibilities, this role involves institutionalizing data innovation to make the reuse of data systematic; developing the business case to scale and sustain data innovation; and measuring impact and sharing insights to build a societal and business case for data collaboration that helps foster more responsible, and more effective, AI.

5) Disseminate and Communicate Findings: Data stewards act as the face of a company’s data projects and are responsible for communicating outcomes from data-driven AI initiatives to external actors. In this role, they are stewarding insights, and their responsibility is to raise awareness with users, partners, governments, and other stakeholders. They also communicate with actors on issues such as regulatory compliance and contractual obligations and help translate data intelligence into decision intelligence.

6) Proactively Contribute to the Public Good: At a more abstract, yet critical, level, the job of data stewards consists of proactively identifying opportunities to use (or reuse) data in service of the public good. This competency refers as much to a specific function as to a more general ethos: a way of operating that emphasizes responsibly balancing both risk and opportunity when it comes to data use and reuse. As we explain below, a determination to proactively identify opportunity and steer away from excessive risk avoidance is perhaps one of the most valuable templates that CDS can offer for CAIOs; they provide powerful models for how to use data so as to positively contribute to the public good in the AI era.

Conclusion

We live in a paradoxical time of both data plenty and data scarcity, the latter marked by growing data silos and privatization. As we have argued elsewhere, the advent of AI may only heighten these trends. Growing recognition of the value of data, combined with fear over unauthorized reuse and inadvertent copyright violations, may lead data holders to tighten their grips in the digital ecology. This represents a missed opportunity for society at large, and for humanity. Despite the undeniable risks of data sharing and of emerging technologies such as AI, technology continues to harbor tremendous opportunities for public good.

The role of CAIOs is essential, and will be instrumental in navigating the thin line that passes between risk and opportunity. In this task, CAIOs have much to learn from the experience of CDS. CDS are the ones tasked in today’s data economy with finding a balance between data misuse and missed use. This is precisely the challenge facing CAIOs today: how to steer away from excessive risk avoidance, and how to define a notion of responsible AI that is broad enough to encompass proactive solutions for the public good. In the end, it is our responsibility not only to ensure an ethical use of AI and data, but also to put these technologies to work on large societal challenges. Ethics is a double-edged sword: using AI without ethical safeguards is unethical, but failing to use AI for the public good when it is safe to do so is also unethical.

The experience of CDS is not flawless in this regard; mistakes have certainly been made. But it is precisely these mistakes (as well as the successes) that provide a useful template for CAIOs. By building on the experience–both the hits and misses–of CDS, CAIOs can help ensure that the age of AI is marked by a positive and proactive use of technology and data to further the public good.

About the Authors

Stefaan Verhulst is the co-founder of The Governance Lab and The DataTank, and Editor-in-Chief of Data & Policy (Cambridge University Press).

Richard Benjamins is former Chief Responsible AI Officer at Telefonica and founder of its “AI for Society and Environment” area, including the ethical use of AI. Before that he was the company’s Chief AI & Data Strategist. He is the co-founder and Vice President of the Spanish observatory for ethical and social impacts of AI (OdiseIA) and a board member of the environmental non-profit CDP Europe.

***

This is the blog for Data & Policy (cambridge.org/dap), a peer-reviewed open access journal published by Cambridge University Press in association with the Data for Policy Community Interest Company. Read on for ways to contribute to Data & Policy.
