We Are Developing AI to the Detriment of the Global South — How a Focus on Responsible Data Re-use Can Make a Difference

By Stefaan G. Verhulst and Peter Addo

October 7, 2024

Recent research has underlined existing inequalities in artificial intelligence (AI) data production between developed and less-developed countries. An article in The Conversation demonstrated that while most data originates from wealthy countries, much of the less-valued quality-control labour is being outsourced to lower-income countries.

Alongside these imbalances, attention is now shifting from the initial excitement surrounding AI’s societal impact in developed countries to how AI might influence developing nations, including their efforts to achieve the sustainable development goals. This shift highlights both the potential benefits and the challenges AI presents in these contexts. The Economist recently ran a cover story extolling the potential of AI to help lower-income countries in sectors such as education, healthcare, and agriculture. On the other hand, various commentators have expressed concerns that AI could cause a number of harms in the Global South.

At the root of this debate lies a recurring concern with how data is collected, stored, used, and responsibly reused for purposes other than those for which it was initially collected. For instance, data collected from satellite imagery and sensors can be reused to monitor environmental changes, such as deforestation, air and water quality, or the impact of climate change. Telco or social media data, collected for communication purposes, can be reused in disaster response scenarios to track the movement of people, identify areas in need of urgent assistance, and coordinate relief efforts more effectively.

Data is the lifeblood of AI, and data responsibility is therefore central to the goals of safe and inclusive AI, and especially to ensuring that the developing world shares in the fruits of technical innovation. As we have written elsewhere, one of the most important components of advancing the ethical use of emerging technologies is a participatory framework for responsible data reuse and sharing.

Responsible reuse of public and private data can break down silos. In Moldova, open contracting data was used to improve access to HIV and tuberculosis medicines in a country that has one of the highest patient rates in Europe. Mobile phone data has, for instance, been reused to foster intersectoral collaboration and innovation, such as harmonizing access to public transport and private ride-sharing initiatives to limit travel time and reduce car usage. But reuse carries its own risks, especially to user privacy and security.

In this article, we propose that promoting responsible reuse of data requires addressing the power imbalances inherent in the data ecology. These imbalances disempower key stakeholders, thereby undermining trust in data management practices. As we recently argued in a report on “responsible data reuse in developing countries,” prepared for the Agence Française de Développement (AFD), power imbalances may be particularly pernicious when considering the use of data in the Global South. Addressing them requires broadening notions of consent beyond current, highly individualized approaches, in favor of what we term a social license for reuse.

In what follows, we explain what a social license means, and propose three steps to help achieve that goal. We conclude by calling for a new research agenda — one that would stretch existing disciplinary and conceptual boundaries — to reimagine what social licenses might mean, and how they could be operationalized.

Power Imbalances

Power and influence are unevenly distributed among stakeholders in the data ecology, including in the manner in which data is managed and used. Larger players, or those from more affluent regions, have bigger budgets, deeper expertise, and greater computational power with which to access and work with data. These imbalances are always present in the data ecology, but they take on particular significance when data is repurposed for uses other than those for which it was originally collected. In such cases, the original data subjects frequently lack the ability to influence, or even be aware of, secondary uses, and data is at risk of being used in ways that disproportionately benefit a few or cause harm to the original data subjects.

These risks, it should be added, are particularly pronounced in the developing countries of Asia, Africa, and Latin America. This is partly because of power imbalances between the Global South and the governments and companies of the Global North. But vast asymmetries also exist within Global South countries themselves, requiring close attention to the way data is collected, used, and reused by governments that profess to speak on behalf of the people.

The Need for a Social License

In theory, consent offers a mechanism to mitigate power imbalances. In practice, however, existing consent mechanisms are limited and in many respects archaic. They are based on binary distinctions (typically presented in the check-the-box forms most websites use to ask you to register for marketing e-mails) that fail to appreciate the nuanced and context-sensitive nature of data reuse. In addition, consent today generally means individual consent, a notion that overlooks the broader needs of communities and groups. While we understand the need to safeguard information about the individual (such as health status), at a societal level this information can be important for informing responses to, addressing, or preventing health crises. Such individualized notions fail to consider the potential public good of (responsibly) reusing individual data, a dynamic that is particularly problematic in societies, many located in the Majority World, that have more collective orientations and where prioritizing individual choices could disrupt the societal fabric.

For all these reasons, we advocate shifting to the notion of a social license. This notion, which has its roots in the extractive industries of the 1990s, refers to the collective or societal acceptance of an activity, such as data reuse or sharing, based on its perceived alignment with community values and interests. Social licenses transcend the priorities of individuals and help balance the risks of data misuse and missed use (e.g., privacy violations on the one hand, and the costs of not using private data for the public good on the other). As we have noted elsewhere, social licenses permit a broader notion of consent that is dynamic, multifaceted, and context-sensitive.

How to Establish a Social License

Social licenses represent an opportunity to manage data more equitably, involving the different parties concerned in their implementation. This first requires acceptance of the concept among policymakers, citizens, and other stakeholders, such as private industry, health providers, think tanks like ours, and other interest groups, as well as establishing widespread consensus on community norms and on an acceptable societal balancing of risk and opportunity.

Community engagement would establish a consensus-based foundation of preferences and expectations concerning data reuse. This would be done through, for instance, dedicated “data assemblies” or community deliberations about data reuse for particular purposes under particular conditions. Such engagement would need to involve voices that are as representative and equitable as possible of the different parties concerned across communities, interest groups, social strata, and geographies, including traditionally marginalized or silenced voices.

Beyond community engagement, buy-in and involvement from the legal and policy community will be necessary to translate these collective preferences and norms into actionable and enforceable instruments and mechanisms. This critical step will require innovative approaches to framing, and vehicles for housing, novel governance functions. As part of this, a dedicated interdisciplinary research agenda spanning data, policy, legal, social, and other sciences would help map the full landscape and provide a link between the theory of social licensing and its practical implementation.

Finally, implementing social licensing requires new methods to ensure accountability and to enforce legal and administrative mechanisms or instruments. This will most likely require institutional innovation, for example by highlighting the role of data stewards or other bodies and individuals tasked with responsibly promoting data sharing. Increasingly, we have seen calls for a Chief AI Officer (CAIO) who can lead and oversee the adoption and integration of AI technologies to drive innovation, efficiency, and competitive advantage. We believe this scope is too limited. There should be a dedicated role at all levels and in all fields of data collection, public and private: a Chief Data Steward. Such a role would identify not only how data can be used within the organization itself, but also opportunities arising from other available data, from data that should be made accessible, and from the organization’s own data that could be shared, not only for profit but also for the public good. This is a new set of responsibilities, and we are currently developing a training program for, and already training, data stewards who can take on this work within their own organizations.

About the Authors

Stefaan G. Verhulst is the co-founder of The Governance Lab and The DataTank, and one of the Editors-in-Chief of Data & Policy.

Peter Addo is Head of the Emerging Tech Lab at Agence Française de Développement.

***

This is the blog for Data & Policy (cambridge.org/dap), a peer-reviewed open access journal published by Cambridge University Press in association with the Data for Policy Community Interest Company. Read on for ways to contribute to Data & Policy.
