Big Data, Meager Returns?

(post-workshop report)

Yasodara Cordova (digital HKS, Berkman Klein Center)
Lorrayne Porciuncula (OECD, Berkman Klein Center)
Henri Brebant (MPP ’20, Harvard Kennedy School)

Accurate, structured, and abundant data are the main source of successful Artificial Intelligence platforms. These platforms can lead to an increase in production and economic gains, better management of resources, and improved public policies. However, disparities continue to widen between the Global North and the Global South.

On October 12th, we brought together specialists, academics and activists to investigate the main points arising from the “Data, Artificial intelligence and the Global South” conversation. The purpose of the workshop was to explore economic fairness and the Global South. This post shares the questions we asked, some of our findings, and what we need to consider next.

There is no easy fix to the asymmetries existing between developing and developed countries. The role of technology in mending these asymmetries, while worsening others, needs to be better understood. In this respect, the rise of Artificial Intelligence (AI) comes as an opportunity to rethink policy frameworks so that the social and economic dividends from big data related technologies can be better distributed.

With this in mind, researchers and entrepreneurs convened at a workshop co-organized by digital HKS and the Berkman Klein Center to discuss possible solutions and frameworks needed to rethink the relationship between data, AI, and their potential returns for the Global South.

Yasodara Cordova and Lorrayne Porciuncula opened the workshop [presentation] with the following key questions to set the scene for discussions:

“Can we (re)frame the concepts of economic fairness, sustainability and development in the Global South in a way that better involves big data and AI? Subsequently, how can public policies reconcile the interests of private companies, access to big data for development and local development of digital businesses? Can sustainable AI practices drive innovation and economic development in the Global South? Which frameworks can help map concrete solutions for a data-led inclusive economic growth in the Global South?”

Keynote speaker Tom Lee [link to speech] shared how MapBox paved the way for conversations on governance models for data as a public good, and on the benefits of making data accessible so that companies can generate value from it. In his keynote, Tom addressed the Landsat case: after aerial data from the Landsat project became open in 2008, use of the data skyrocketed and unlocked many new use cases. Tom stated that data is about humans, but that with changes in technological costs for data collection and further automation, the landscape for usage and value generation will change. Ultimately, according to him, the questions we should be asking are:

“What knowledge will we need, how will we produce it, preserve it, share it?”
(image from: https://blog.mapbox.com/keeping-naip-free-open-cf9ad9d310be)

So, what knowledge? That’s what our panel on data typology aimed to explore.

And, how will we produce it, preserve it, share it? That’s what our panel on frameworks for data governance discussed.


Data Typologies

(1st panel)

Is it possible to label data as “non-related to human-derived activities”, such as those from energy, water, road, infrastructure networks, natural resources? Which type of data can be explored as a common strategic resource for the Global South?

Felipe Heusser (Rhinobird/Ciudadano Inteligente) presented a view on the asymmetries between the Global South and the Global North when it comes to data access and information [presentation]. While the Global South’s history of technological dependency is not new, in his view AI presents new challenges, as most developing countries do not have the same level of capabilities to build platforms, collect data and compete with incumbent tech companies. Still, for him, AI could boost productivity and growth in the Global South, but achieving that requires a more distributive model.

Considering data as a commodity, Felipe asked:

“Should a fee be paid for data usage? Which regulation could ensure that data is available to local industries and communities? How to mitigate the inequalities created? Should those extracting data (as a commodity) compensate communities by investing in the creation of formal wealth in local industry, culture, sports or other social goods?”

Data access is key to the work done by Elizabeth Christoforetti (Harvard Graduate School of Design). She presented the work developed at Supernormal [presentation], which uses data to improve public policies at the municipal level. She recognised that to build any data commons, collaboration is necessary. Ideal frameworks for civic participation, both physical and online, are yet to be developed, and that will happen if we have a minimum of data available to shape those spaces. For Elizabeth, the keys to a civic technology value system are risk mitigation and standards for data collection and access.

Finally, Ivo Correa, the Director of Public Policies and Communications for Latin America at Uber, raised the difficulty of balancing several levels of governance, from national to municipal, and the inconsistencies between them. Due to the nature of many data-based business models, he stressed, data regulations are first being implemented at the city level, and many simply follow the models of other countries. In his view, capacity building is necessary at the national administration level, and closer cooperation between the public and private sectors could help bridge capacity gaps. Ivo also highlighted the importance of access to data from diverse sources for the success of digital businesses, and of including stakeholders with different perspectives in the discussion of typology and governance frameworks.

With the participation of invited experts Sabelo Mhlambi (Berkman Klein Center) and Kathy Pham (Harvard Kennedy School), the Q&A highlighted key takeaways:

  • Best practices on open data need to be fostered to improve machine learning in the Global South. Market failures affect data governance and data governance should take into account the need for healthier markets and negative externalities (such as privacy risks).
  • People’s identities are increasingly being digitized. So can any data really be isolated from any kind of human activity (as derived from or contributing to)? In certain contexts, it could, such as for low-quality satellite imagery. Data which is not about humans directly is out there and is beyond the scope of privacy legislation — and so far, not yet regulated. What should we do about it?
  • Defining and standardising data requires reflecting on the process of collection and access to data and, therefore, on its governance framework. Hence, any discussion of data typology is intertwined with that of regulatory frameworks, be they related to privacy, business models or trade.

Frameworks for Data Governance

(2nd panel)

If an underlying layer of information is essential to the setup of new businesses, can its existence and maintenance be understood in terms of fairness? As we focus on data beyond personal data, such as health data or data resulting from human interaction, what domains should be considered as strategic and in need of more attention, regarding intentional data trading and extraction?

Dr. Susan Aaronson (George Washington University) started her presentation [presentation] by stressing that although data collection is driven by foreign exchange and cross-border trade, it is not regulated like a physical good. Dr. Aaronson argued that the Global South should bargain to govern the data that is extracted from it by developing new governance frameworks. For her:

“Global governance architecture for data is unfair because developing countries don’t yet see data as an asset, don’t know how to govern data at the national level (the enabling environment) and haven’t yet built data sectors”.

Primavera De Filippi (CNRS/COALA) provided an overview [presentation] of the best-known concepts and analogies used when defining data in order to think about a governance framework. She brought to the discussion various examples of systems following different architectures and business models. Primavera stated that the value of data lies in its aggregation and in the possibilities for triangulation and correlation across data sources, meaning that data can never be really private and that the value of data grows exponentially. She stated:

“Data as capital, raw material, labour, property, an extension of self or even data as infrastructure: none of the definitions fills the specificities that data brings to the equation. Data should be defined as data, not oil or anything else so that regulators and technologists can create their own appropriate agreements for each case. And since data is a non-rival resource, shouldn’t all data be open source?”

Dr. James Wahutu (NYU) [presentation] delivered an overview of why context, diversity and aspirations for data collection are key to the governance of data and AI in the Global South. Reminding participants that “unequal encounters conclude in unequal exchanges”, he brought to the discussion the concept of data colonialism. Dr. Wahutu talked about how data extraction practices impose new standards (and brands), derived from the hyper-connected western citizen, upon native habits in the Global South, which can have worrisome impacts on a country’s self-affirmation and development. Different cultures and communities could have different notions of privacy, or of the size of the private sphere, for example. Questioning the normative definitions used for data, Dr. Wahutu concluded:

“Going beyond the normative definitions of data, we should build the understanding that data is culture. Therefore, data must be protected and preserved as the virtualization of knowledge and culture that it is. Moreover, data governance frameworks should reflect different views of culture and of private vs. public, not change it.”
(slide from Dr. Wahutu’s presentation)

The Q&A session, with the participation of Juan Ortiz (Web Foundation) and Dan Ciuriak (CIGI), highlighted these key takeaways:

  • Lack of transparency and access to data leads to asymmetric information, which results in market failure. Super technology companies have a monopoly on data collection that stifles competition and is bad for users, who are left with no choice.
  • In a data-driven world, market failure is further complicated by government failure. Asymmetry of information exists also in the political sphere, particularly in the Global South. Governments are failing to uphold the democratic process, protect their citizens and foster new entrants in digital businesses. We should think about the role of governments and government transparency in this new space.

Conclusions

As the workshop drew to an end, participants concluded that it had touched on important questions, and that concrete solutions that include the perspective of the Global South, for both typology and governance issues, are urgently needed. While governance frameworks and new technology architectures arise on the Internet, the Global South seems to be lagging behind. Economic inclusion, social fairness and sustainability should be principles that underlie any governance framework, at an international or local level. Data concentration by super tech companies hurts innovation, competition and the chance of a better distribution of the benefits that automation can bring for developing countries.

There is still time to fix it.