Published in Data Stewards Network

2021 in Review: Advancing the Third Wave of Open Data in a Responsible Way

Photo: Unsplash/@NASA is licensed under CC0

Following the world-altering year that was 2020, 2021 came with its own fair share of triumphs and challenges. As we look back on the year at The GovLab, our goal has been to learn from these experiences to help decision-makers drive positive change. Along the way, we’ve worked to reinforce the four pillars of the Third Wave of Open Data. Our work has included strengthening the ability of public interest institutions to work more openly, collaboratively, effectively, and legitimately. Toward that end, many of our projects were focused on making data collaboration more systematic, sustainable and responsible.

We hope to use this blog to look back through all that we’ve accomplished this year and how we might build on it in 2022. In what follows, we share a selection of key findings and achievements across the many projects we worked on this year to help policymakers, practitioners, and other decision-makers re-use data to achieve great social impact. We’ve curated our publications and projects under the four pillars of the Third Wave of Open Data, as each pillar serves to guide the work we do (see also Chronological Overview of Curated Highlights):

  • Becoming more intentional and purpose-driven when using data and AI;
  • Fostering partnerships and data collaboration;
  • Advancing data innovation at the subnational level; and
  • Prioritizing data responsibility and data rights.

***

BECOMING MORE INTENTIONAL AND PURPOSE-DRIVEN WHEN USING DATA AND AI

In this Third Wave, we expect to see a much more purpose-directed approach to data provision than in prior waves. Practitioners seek not simply to open data but to do so in a way that focuses on impactful re-use. With a more balanced focus on both the supply and demand sides of the data equation, practitioners look beyond the data to account for the broader technical, social, political and economic context within which data is produced and consumed. The goal of this work is to achieve maximum social impact, which requires robust problem definition, clear and effective questions, and clear communication.

Below are highlights of The GovLab’s publications from 2021. In addition to the publications listed here, we’ve also contributed to numerous blogs and partners’ publications over the past year.

PROJECTS

100 Questions:

The 100 questions initiative seeks to map the world’s 100 most pressing, high impact questions that could be answered if relevant datasets were leveraged in a responsible manner. We have launched three domains within this initiative:

The 100 Questions Future of Work Domain

Generating a sound framing for what the future of work entails (and does not entail) is notoriously difficult and poses a monumental challenge to policymakers tasked with preparing for — and responding to — the future of work. Recognition of this need was the impetus for The 100 Questions Future of Work Domain. The GovLab and its partner, The Bertelsmann Foundation, published a report providing an overview of the Future of Work domain for prospective partner organizations and stakeholders. It details the steps in the project, the methodology used, and it ends with a listing of the questions developed and potential next steps for the effort.

The 100 Questions Governance Domain

The GovLab at NYU Tandon, alongside the Asia Foundation, the Centre for Strategic and International Studies in Indonesia, and the BRAC Institute of Governance and Development, has launched the Governance Domain of the 100 Questions Initiative. This domain seeks ways data can make governance more efficient, effective, and equitable. The public voting phase of the Governance Domain seeks input on which questions around data use in governance are the most pressing. Public voting closes December 31, 2021.

100 Questions Food Systems Sustainability Domain

Together with the Barilla Foundation, and the Center for European Policy Studies, the GovLab launched the Food Systems Sustainability Domain of the 100 Questions Initiative. It seeks to identify the 10 most impactful questions that will help make our food systems healthier for us, and the planet. “To make food production, distribution, and consumption healthier for people, animals, and the environment, we need to redesign today’s food systems. Data and data science can help us develop sustainable solutions — but only if we manage to define those questions that matter.” (Read more about it in Fiona Cece’s piece).

RESEARCH & ANALYSIS

Emerging Uses of Technology for Development: A New Intelligence Paradigm

“The GovLab and French Development Agency (AFD) published the report Emerging Uses of Technology for Development: A New Intelligence Paradigm, written by Peter Martey Addo, Dominik Baumann, Juliet McMurren, Stefaan G. Verhulst, Andrew Young, and Andrew J. Zahuranec. This paper examines how development practitioners can experiment with emerging forms of technology to advance development goals.”

Open data in action: initiatives during the initial stage of the COVID-19 pandemic

“The Governance Lab and OECD released Open data in action: initiatives during the initial stage of the COVID-19 pandemic. Building off The GovLab’s Call for Action to unleash the power of data collaboration for COVID-19, this report examines a number of open government data (OGD) initiatives “used to react and respond to the COVID-19 pandemic during the initial stage of the crisis (March-July 2020)”. It also seeks to transform lessons learned into considerations for policy makers on how to improve OGD policies to better prepare for future shocks.”

The Use of Mobility Data for Responding to the COVID-19 Pandemic

“The Use of Mobility Data for Responding to the COVID-19 Pandemic” was developed in a partnership between The GovLab, the Open Data Institute and Cuebiq. Written by Aditi Ramesh, Andrew Young, Andrew Zahuranec, Stefaan Verhulst, Brennan Lake, Ben Snaith and Olivier Thereaux, the report explains how data on mobility have been collected and used to track the spread of COVID-19 and various responses to the pandemic. It offers nine strategies to improve the re-use of mobility data to strengthen responses to the pandemic.

Data & Policy’s Special Collections:

Data & Policy is an open access journal by Cambridge University Press. The journal’s editors-in-chief are Zeynep Engin (UCL & Data for Policy, UK), Jon Crowcroft (University of Cambridge & Alan Turing Institute, UK) and Stefaan Verhulst (The GovLab, New York University, USA). This year, they released two Special Collections of peer-reviewed, open access articles on specific data-related challenges, guest edited by experts from the respective fields:

Telco Big Data Analytics for COVID-19

Guest Edited by Richard Benjamins (Telefónica) and Jeanine Vos (GSMA) with editor-in-chief Stefaan Verhulst, this special collection looks at the opportunities and challenges offered by mobile big data in the fight against COVID-19. The collection consists of 5 papers from 33 researchers and experts from different sectors around the world.

Data and Migration Policy

“As part of the Big Data for Migration Alliance, this collection includes a series of articles on complex, cross-border human mobility, emphasizing the role of data in informing policy responses and humanitarian aid.” Marzia Rango (IOM) and Michele Vespe (EC JRC) act as Editors for this Special Collection with support from editor-in-chief Stefaan Verhulst.

Where Is Everyone? The Importance of Population Density Data

“The GovLab released a new report examining data and information about population density that can help to create social and economic value. The report, “Where Is Everyone? The Importance of Population Density Data” by Aditi Ramesh, Stefaan Verhulst, Andrew Young and Andrew Zahuranec, is the first in a new series of Data Artefact Studies from The GovLab. The piece focuses in particular on Facebook’s Population Density Map and several of its implementations to understand promising practices, opportunities, challenges, and risks in the use of population density data more broadly.”

Data Science for Social Good: Philanthropy and Social Impact in a Complex World

“SpringerLink published the book “Data Science for Social Good: Philanthropy and Social Impact in a Complex World.” Edited by Ciro Cattuto and Massimo Lapucci, the publication compiles recent research on data science for social impact, machine learning, and artificial intelligence. In addition to The GovLab’s Stefaan Verhulst, the book contains chapters from frequent GovLab collaborators and various world-class experts on science for social impact.”

The Living Library’s Selected Readings Series:

“As part of an ongoing effort to build a knowledge base for the field of improving governance through technology, The GovLab publishes a series of Selected Readings, which provide an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology.”

Inaccurate Data, Half-Truths, Disinformation, and Mob Violence by Fiona Cece, Uma Kalkar, and Stefaan Verhulst

“... a curation of findings and readings that illustrate the global danger of inaccurate data, half-truths, and willful disinformation.”

Selected Readings on Data, Gender, and Mobility by Michelle Winowatan, Uma Kalkar, Andrew Young, and Stefaan Verhulst

“This curated and annotated collection of recommended works on the topic of data, gender, and mobility was originally published in 2017, and updated in 2021.”

Selected Readings on the Use of Artificial Intelligence in the Public Sector by Kateryna Gazaryan and Uma Kalkar

“This curated and annotated collection of recommended works focuses on algorithms and artificial intelligence in the public sector.”

***

FOSTERING PARTNERSHIPS AND DATA COLLABORATION

The Third Wave fosters partnerships between a diverse range of organizations and individuals to create more open data projects engaged in creating social value and driving meaningful change. In this phase, data collaboratives will emerge as a key driving force behind public problem solving by bringing together public and private-sector actors to break down challenges of data asymmetries, inequality and a lack of transparency to fill critical information gaps that currently exist in the ecosystem. The GovLab’s Data Collaboratives Explorer offers a directory of more than 200 real-world examples of data collaboratives across numerous sectors and fields.

In addition to our Data Collaboratives Explorer, we’ve been engaged in other projects to foster partnerships and data collaboratives, including:

PROJECTS

#Data4COVID19 Africa Challenge

“The #Data4COVID19 Africa Challenge was an initiative by The GovLab, Expertise France, and l’Agence française de développement (AFD) to mitigate these issues by spurring innovative, data-driven work to address COVID-19 and its secondary effects. We invited intersectoral teams (representing academic, government, and nonprofit actors) specializing in data analysis to submit proposals for funding for innovative projects that addressed challenges associated with COVID-19. We then supported seven of these groups over a six-month period as they unlocked datasets, worked with other actors in the field, and tried to translate their insights into action.”

PLACE Data Trust

“The GovLab and Future State, with support from the Rockefeller Foundation, are partnering with PLACE to design a new operational and governance approach to create, store, and access mapping data through a Data Trust. On March 23, 2021, we hosted a kickoff workshop with various non-profit and private sector stakeholders, where we focused on key governance questions when considering how to turn the theory of data trusts into practice.”

Designing Data Collaboratives to Better Understand Human Mobility and Migration in West Africa:

“The Big Data for Migration Alliance (BD4M), an effort spearheaded by the International Organization for Migration’s Global Migration Data Analysis Centre (IOM GMDAC), European Commission’s Joint Research Centre (JRC), and The GovLab, released the report Designing Data Collaboratives to Better Understand Human Mobility and Migration in West Africa, providing findings from a first-of-its-kind rapid co-design and prototyping workshop, or “Studio.””

RESEARCH & ANALYSIS

#Data4COVID19 Mobility Repository

“The GovLab, in partnership with Cuebiq and with support from the Open Data Institute, published a repository of data collaboratives that use mobility data, passively collected data about the location of a device, for COVID-19 responses.” The reasoning behind the development of the repository can be found in a report by the partners titled “The Use of Mobility Data for Responding to the COVID-19 Pandemic”. The report explains how data on mobility have been collected and used to track the spread of COVID-19 and various responses to the pandemic.

AI Localism Repository

“Early last year, The GovLab’s Stefaan Verhulst and Mona Sloane coined the term “AI localism” to describe how local governments have stepped up to regulate AI policies, design governance frameworks, and monitor AI use in the public sector. The AI Localism repository is a curated collection of AI localism initiatives across the globe categorized by geographic regions, types of technological and governmental innovation in AI regulation, mechanisms of governance, and sector focus.”

***

ADVANCING DATA INNOVATION AT THE SUBNATIONAL LEVEL

Unlike previous waves focused at the national and international levels, the Third Wave places a greater emphasis on building open data capacity and meeting open data demand at the subnational level. Data held by the public sector and other institutions in cities, municipalities, states, and provinces are, by definition, more targeted and narrower in scope than data made available at the national or supranational level. Subnational open data is more likely to align with the direct and immediate needs of citizens, and the actors representing the demand for that data are likely to be more proximate to the people they intend to benefit and more familiar with their needs.

We list below The GovLab’s work related to this goal:

TOOLS

The Third Wave of Open Data Toolkit

“The GovLab’s Open Data Policy Lab initiative released The Third Wave of Open Data Toolkit. The toolkit (co-authored by Andrew Young, Andrew J. Zahuranec, Stefaan G. Verhulst and Kateryna Gazaryan) builds off our report on the Third Wave of Open Data to support “the work of data stewards, responsible data leaders at public, private, and civil society organizations empowered to seek new ways to create public value through cross-sector data collaboration” and features eight action items that outline “specific operational guidance on how to foster responsible, effective, and purpose-driven re-use.””

AI Localism Canvas

Developed by Stefaan Verhulst, Andrew Young, and Mona Sloane, the AI Localism Canvas “examines this field of AI Localism — a global move toward innovative governance of AI at the subnational level. The piece introduces the current state of play in the field, and introduces an “AI Localism Canvas” to help decision-makers identify, categorize and assess instances of AI Localism specific to a city or region. It provides several examples of AI governance innovation on the local level and provides an “AI Localism Canvas” as a framework to help guide the thinking of scholars and policymakers in identifying, categorizing, and assessing the different areas of AI Localism within a city or region.”

RESEARCH & ANALYSIS

Open data in action: initiatives during the initial stage of the COVID-19 pandemic

Building off The GovLab’s Call for Action to unleash the power of data collaboration for COVID-19, the Open data in action: initiatives during the initial stage of the COVID-19 pandemic report examines a number of open government data (OGD) initiatives “used to react and respond to the COVID-19 pandemic during the initial stage of the crisis (March-July 2020). It also seeks to transform lessons learned into considerations for policy makers on how to improve OGD policies to better prepare for future shocks.”

PROJECTS

The Data Stewards Academy

In April 2021, the GovLab, through its Open Data Policy Lab Initiative, launched The Data Stewards Academy, a self-directed learning program intended to support data stewards in creating public value through data collaboration. The program is adapted from a selective Executive Course on Data Stewardship, an eight-week accelerated learning program to support public and private sector leaders to open data and reduce data access inequities in ways that advance their institution’s goals. The learning materials guide data stewards (and aspiring data stewards) through the process of designing and implementing a data reuse strategy for solving public problems. After this first successful experience, the second and third cohorts of students were launched.

ODPL City Incubator

The GovLab’s Open Data Policy Lab launched the City Incubator in July 2021. A first-of-its-kind program to help data innovations in cities succeed and scale, the City Incubator is giving 10 city officials hands-on training and access to mentors to take their ideas to the next level. It enables cutting-edge work on various urban challenges and empowers officials to create data collaboratives, data-sharing agreements, and other systems. This work is supported by Microsoft, Mastercard City Possible, Luminate, NYU CUSP and the Public Sector Network.

***

PRIORITIZING DATA RESPONSIBILITY AND DATA RIGHTS

The Third Wave of open data is characterized by a responsibility-by-design approach that promotes fairness, accountability, and transparency across all stages of the data lifecycle in order to manage risks and maximize value. This work involves not just preserving data rights and needs but also measuring benefits against risks. Although privacy is key in any open data project, it does not exist in a vacuum. There are other risks from and to the data ecosystem that need to be considered and protected against alongside (not to the exclusion of) privacy. In addition, there exist power asymmetries in how data is made available that may reinforce existing inequities. Importantly, the emerging conception of the Third Wave suggests data suppliers can be proactive in assessing the ethical implications of data re-use, and can take steps to ensure that external actors do not use data in a way that could harm data subjects.

Relevant to this goal are the projects and tools below:

TOOLS

The Data Responsibility Journey Mapping Tool

The GovLab launched the public beta of The Data Responsibility Journey Mapping Tool, designed to help Data Stewards and other decision-makers assess and mitigate risks across the lifecycle of a data collaborative. The tool guides users through important questions and considerations across the lifecycle of data stewardship and collaboration: Planning, Collecting, Processing, Sharing, Analyzing, and Using.

22 Questions to Assess Responsible Data for Children (RD4C)

The first product of the second phase of the RD4C initiative is the 22 Questions to Assess Responsible Data for Children (RD4C) tool. The RD4C Principles were developed to act as a north star, guiding practitioners toward more responsible data practices. The 22 Questions tool is an audit tool that helps stakeholders who administer data systems handling data for and about children align their practices with the RD4C Principles.

PROJECTS

Responsible Data for Children (RD4C) initiative — second phase

In 2021, the second phase of the Responsible Data for Children Initiative (RD4C) was launched. RD4C aims to advance best practice in data responsibility; identify challenges and develop practical tools to assist practitioners in evaluating and addressing them; and encourage a broader discussion on actionable principles, insights, and approaches for responsible data management. In the initiative’s second phase, The GovLab and UNICEF continue to promote and enable the use of RD4C Principles; develop and test a new methodology and platform for auditing data systems; develop detailed case studies on diverse, instructive uses of data from across the UNICEF ecosystem; and regularly disseminate new insights and findings through the newly established RD4C Blog.

AI Ethics: Global Perspectives

The GovLab, NYU Tandon School of Engineering, Global AI Ethics Consortium (GAIEC), Center for Responsible AI @ NYU (R/AI), and the TUM Institute for Ethics in Artificial Intelligence (IEAI) jointly launched the free, online course, AI Ethics: Global Perspectives. Designed for a global audience, it conveys the breadth and depth of the ongoing interdisciplinary conversation on AI ethics. This course seeks to bring together diverse perspectives from 20 experts across the field of ethical AI, to raise awareness and help institutions work towards more responsible use.

International Digital Self-Determination Network

The International Digital Self-Determination Network was formed in October 2021 and is composed of the Directorate of International Law, Swiss Federal Department of Foreign Affairs; the Centre for Artificial Intelligence and Data Governance at Singapore Management University; the Berkman Klein Center at Harvard University; the Global Tech Policy Practice at the TUM School of Social Sciences and Technology; and The GovLab at New York University. It seeks to “bring together diverse perspectives from different fields around the world to study and design ways to engage in trustworthy data spaces and ensure human centric approaches.”

RESEARCH & ANALYSIS

Reimagining data responsibility: 10 new approaches toward a culture of trust in re-using data to address critical public needs

In this piece for Data & Policy, Stefaan Verhulst outlines 10 approaches and innovations for data responsibility in the 21st century. The emerging concepts identified include: end-to-end data responsibility; decision provenance; professionalizing data stewardship; moving from data science to question science; contextual consent; responsibility by design; data asymmetries and data collaboratives; personally identifiable inference; group privacy; and data assemblies.