Two Open Science Foundations: Data Commons and Stewardship as Pillars for Advancing the FAIR Principles and Tackling Planetary Challenges

By Stefaan Verhulst and Jean-Claude Burgelman

Published in Data & Policy Blog, Nov 14, 2024

Today the world is facing three major planetary challenges: war and peace, steering Artificial Intelligence, and making the planet a healthy Anthropocene. Because they are closely interrelated, they represent an era of “polycrisis”, to use the term popularized by Adam Tooze. There are no simple solutions or quick fixes to these (and other) challenges; their interdependencies demand a multi-stakeholder, interdisciplinary approach.

As world leaders and experts convene in Baku for the 29th session of the Conference of the Parties to the United Nations Framework Convention on Climate Change (COP29), the urgency of addressing these global crises has never been clearer. A crucial part of addressing these challenges lies in advancing science, particularly open science underpinned by data made available in line with the FAIR principles (Findable, Accessible, Interoperable, and Reusable). In this era of computation, the transformative potential of research depends on the seamless flow and reuse of high-quality data to unlock breakthrough insights and solutions. Making data available in reusable, interoperable formats not only accelerates scientific discovery but also speeds up the search for solutions to global crises.

Image of the retreat of the Columbia glacier by Jesse Allen, using Landsat data from the U.S. Geological Survey. Free to re-use from NASA Visible Earth.

While the FAIR principles provide a vital foundation for making data findable, accessible, interoperable and reusable, translating these principles into practice requires robust institutional approaches. Toward that end, we argue below that two foundational pillars must be strengthened:

  • Establishing Data Commons: The need for shared data ecosystems where resources can be pooled, accessed, and re-used collectively, breaking down silos and fostering cross-disciplinary collaboration.
  • Enabling Data Stewardship: Systematic and responsible data reuse requires more than access; it demands stewardship, which equips institutions and scientists with the capabilities to maximize the value of data while safeguarding its responsible use.

The Twin Pillars of Open Science: Data Commons and Data Stewardship

A. Data Commons

A data commons refers to shared, accessible physical and digital data infrastructures that are governed collaboratively to serve the public interest. Like other commons, such as airspace — governed and managed to optimize use for countries and airlines — data commons aim to maximize the value of the most critical resource for open science: data.

A data commons makes data available for research, innovation, and policymaking. In addressing the global challenges mentioned above, an open data commons can provide the framework needed to support large-scale, coordinated efforts across nations and sectors while avoiding extractive relationships. The data commons framework helps redistribute the value of data across communities, ensuring its benefits are shared more equitably. It can allow communities to set the terms of data use, avoiding a dynamic in which outside stakeholders take data assets but provide no value to the people represented in them.

To put it differently: without a data commons, the search for solutions will be neither as fast nor as equitable. It is increasingly clear that when data access is privatized or enclosed, a trend driven by growing competition for data to train Artificial Intelligence (AI), the benefits will reach only a few.

For instance, providing access to high-quality data in a machine-readable format allows more openly licensed data to be included in generative AI training and fine-tuning. This can reduce reliance on data scraped from across the internet and limit the extraction of data assets without appropriate licensing. It can also improve representation in the datasets used to train AI models and increase the quality of their output.

At least three elements of a data commons are important to highlight:

  • Enabling Cross-Sectoral and Global Collaboration: Data commons can break down silos that frequently exist between academic institutions, governments, private organizations, and civil society. By allowing open access to critical data, such as satellite imagery for climate monitoring or conflict detection, data commons permit diverse stakeholders to work together more effectively. This cross-sectoral collaboration, in turn, helps foster innovative and inclusive solutions by pooling expertise, resources, and insights.
  • Ensuring Interoperability and Data Quality: Data commons can help ensure the quality, interoperability, and usability of data. For example, climate or conflict-related data can be standardized and harmonized, enabling data to be meaningfully analyzed across contexts. Data quality and standardization also make it easier to conduct cross-sectoral research and develop evidence-based solutions that can be applied on a global scale.
  • Enabling Competition and Innovation: Data commons are also crucial to maintaining healthy competition and innovation in the development of Large Language Models (LLMs). Building these models requires vast computational resources and access to high-quality, large-scale datasets. Without equitable access to comparable data, smaller players and new entrants face insurmountable barriers, leaving the field dominated by a few major entities. By ensuring shared, collaborative access to essential data through data commons, the playing field can be leveled, fostering innovation and preventing monopolization in the rapidly evolving AI landscape. This approach not only keeps competition alive but also drives more diverse and robust advances in AI technology.

B. Data Stewardship

Data stewards are individuals or teams tasked with managing an organization’s data assets. Among other responsibilities, they are essential for ensuring access to data in a systematic, sustainable and responsible manner. They help develop and implement frameworks that safeguard against misuse of sensitive data, particularly in areas like AI development (e.g., algorithmic biases) and conflict zones (e.g., surveillance). At the same time, they help realize the positive potential of re-using data responsibly, seeking to avoid the risk of missed use, which is less recognized but potentially as pernicious as the risk of misuse.

In addition to their general importance for responsible data re-use, data stewards help achieve two further vital objectives:

  • Building Trust and Accountability: Data stewardship plays a crucial role in fostering trust by ensuring that data is managed and used in ways that align with the public good. For example, data stewards may implement accountability mechanisms to track how data is accessed and applied, preventing misuse while promoting transparency. Again, community-driven governance models are essential in these processes, giving voice to stakeholders — particularly marginalized groups — in decision-making regarding data governance. In this sense, data stewards have a vital role to play in promoting a more equitable and inclusive data ecology.
  • Ensuring Long-Term Sustainability: Effective data stewardship is key to maintaining the long-term sustainability of an open data commons (and thus open science). Stewards ensure that datasets are regularly updated, curated, and kept accessible, preventing the loss or degradation of valuable information. They are also essential to developing sustainable financing models and broad multi-stakeholder involvement, ensuring that the open data commons can continue to evolve and serve the public interest for a multitude of stakeholders over time.

Operationalizing Data Commons and Data Stewardship

Open science and the FAIR principles have the potential to address planetary challenges. However, realizing this potential requires operationalizing data commons and advancing data stewardship.

To achieve this, we propose the following priority steps:

1. Develop Comprehensive Blueprints for Data Commons

2. Leverage Lessons from Existing Commons

  • Study governance frameworks from other commons like airspace, fisheries, or open-source software to identify best practices. Adopt principles like collaborative management, equitable resource allocation, and conflict resolution mechanisms.
  • Ensure governance structures are adaptable, allowing for evolving technologies and societal needs while maintaining core principles of fairness and accessibility.

3. Create Sustainable Funding Mechanisms

  • Develop funding schemes that blend public resources with private sector investments to ensure the long-term sustainability of data commons.
  • Consider tiered membership fees for organizations accessing or contributing to the commons, scaled to usage and resource capacity, so that the benefits are equitably distributed across communities.

4. Professionalize Data Stewardship

  • Create a professional association for data stewards to set standards, provide certification, and foster a community of practice.
  • Design training modules that equip data stewards with the skills to manage, govern, and ensure the ethical re-use of data in dynamic and complex environments.
  • Institutionalize the role of data stewards within organizations to ensure continuous oversight and alignment with FAIR principles and data governance goals.

About the Authors

Stefaan Verhulst is Co-Founder and Chief Research and Development Officer of The GovLab, as well as Director of its Data Program. He is an Editor-in-Chief of Data & Policy, the open-access journal published by Cambridge University Press.

Jean-Claude Burgelman is Emeritus Professor of Open Science at the Free University of Brussels. He retired in 2020 from the European Commission, where he was in charge of Open Science policies. Until 2000 he was full Professor of Communication Technology Policy at the Free University of Brussels. He recently joined the advisory boards of Open Knowledge Maps and Scimagine, and became editor-in-chief of Frontiers Policy Lab. In 2022 he became the director of the Frontiers Planet Prize, a global competition to stimulate science that can save the planet.

***

This is the blog for Data & Policy (cambridge.org/dap), a peer-reviewed open access journal published by Cambridge University Press in association with the Data for Policy Community Interest Company. Read on for ways to contribute to Data & Policy.
