Using Data for COVID-19 Requires New and Innovative Governance Approaches

Published in Data & Policy Blog, May 21, 2020

“Decision Provenance” Allows Us to Establish Data Accountability and Responsibility across the Data Life Cycle

By Stefaan G. Verhulst and Andrew J. Zahuranec

There has been a rapid increase in the number of data-driven projects and tools released to contain the spread of COVID-19. Over the last three months, governments, tech companies, civic groups, and international agencies have launched hundreds of initiatives. These efforts range from simple visualizations of public health data to complex analyses of travel patterns.

When designed responsibly, data-driven initiatives can help the public and their leaders address the virus more effectively. The Atlantic and New York Times have both published work that relies on innovative data use. These and other examples, detailed in our #Data4COVID19 repository, can fill vital gaps in our understanding and allow us to better respond to and recover from the crisis.

But data is not without risk. Collecting, processing, analyzing, and using any type of data, no matter how well-intentioned its users, can lead to harmful ends. Vulnerable groups can be excluded. Analysis can be biased. Data use can reveal sensitive information about people and locations. To address these hazards, organizations need to be intentional in how they work throughout the data lifecycle.

Decision Provenance: Documenting decisions and decision makers across the Data Life Cycle

Unfortunately, the individuals and teams responsible for making design decisions at each critical point of the data lifecycle are rarely identified or recognized by all those interacting with these data systems.

This lack of visibility into the origins of decisions can weaken professional accountability. It also limits the ability of actors to identify the best intervention points for mitigating data risks and to avoid missing uses of potentially impactful data. Tracking decision provenance is essential.

As Jatinder Singh, Jennifer Cobbe, and Chris Norval of the University of Cambridge explain, decision provenance refers to tracking and recording decisions about the collection, processing, sharing, analyzing, and use of data. It involves instituting mechanisms to force individuals to explain how and why they acted. It is about using documentation to provide transparency and oversight in the decision-making process for everyone inside and outside an organization.

Toward that end, The GovLab at NYU Tandon developed the Decision Provenance Mapping. We designed this tool for designated data stewards tasked with coordinating the responsible use of data across organizational priorities and departments. These data stewards can engage relevant internal and external staff to participate in the mapping toward identifying and making visible important actors and associated decision points impacting the safe and effective handling of data.

Why It Matters

First, decision provenance can establish increased responsibility and accountability. In data-driven projects, individuals might not play a direct role in task execution but still decide on the strategy undertaken. Understanding who these individuals are and why they made those decisions can be essential, both in figuring out what worked and what failed and in determining how an organization might improve in the future.

This work can be important for course correction, ensuring there is a designated person who can adjust the strategy if a serious gap becomes apparent. Already, recent analyses have questioned the design of proposed contact-tracing apps. Though there still could be value in these tools, developers will need to consider criticisms of their decisions against the rationales they provided at the time. If the initial reasoning is lacking, change will be necessary.

Second, it enables better coordination and governance by prompting organizations to ask whether they have identified the relevant parties that should be consulted at each stage of the data life cycle. Far too often, large projects are conceived, planned, and implemented among a narrow group of stakeholders who miss obvious risks and opportunities.

In their recent piece for the Harvard Business Review, Satchit Balsari and his colleagues note that many of the existing apps, dashboards, and aggregators describing COVID-19’s effects lack sufficient engagement with relevant subject-matter experts. The result is that many tech solutions do not describe what they purport to, rely on data that is unvetted or has limited utility, or present oversimplified results with little epidemiological basis.

In crisis situations, bad data can lead to bad decisions that harm millions, often with vulnerable and traditionally marginalized communities bearing the brunt of these impacts. Organizations need to consult a broad range of experts to ensure their work is useful and reliable.

Lastly, tracking decision provenance helps keep a project’s many stakeholders and decision-makers up to speed on complex projects. Keeping people informed can help to maintain institutional buy-in and support, avoid redundant work, and ensure opportunities for creating value with data are not missed.

This value is essential for long-term pandemic response, given that efforts often involve many stakeholders working across multiple sectors. In New York alone, the state’s reopening advisory board includes 100 representatives from academia, business, civil society, labor, and nonprofits.

The GovLab’s Decision Provenance Mapping Tool

Tracking decisions related to data does divert time and resources. But organizations do not face a black-and-white choice between saving lives and acting responsibly. As we face complex problems, we need decision provenance to be efficient and replicate our data efforts in new contexts. Decision provenance must be integrated into data partnerships moving forward. If we can keep track of the current data response to coronavirus, we might apply its lessons to save those affected by today’s crisis and the crises of tomorrow.

Below we present the Decision Provenance Mapping Tool, initially developed for the Responsible Data for Children initiative (RD4C), to address this need. The tool asks users to identify specific data activities undertaken across the data lifecycle, note any policies or laws impacting those activities, and record the individuals or teams responsible, accountable, consulted, or informed for each of these activities.
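To make the structure of such a mapping concrete, here is a minimal sketch in Python of how one row of a decision provenance map could be recorded. It is an illustration under stated assumptions, not the GovLab tool itself: the class name `DecisionRecord`, the helper functions, the list of lifecycle stages (taken from the stages named earlier in this post), and all example teams and policies are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Lifecycle stages as named in this post; the actual tool may use a different list.
LIFECYCLE_STAGES = ("collection", "processing", "sharing", "analyzing", "use")


@dataclass
class DecisionRecord:
    """One data activity, the rules that govern it, and who is involved (hypothetical schema)."""
    stage: str                                            # stage of the data lifecycle
    activity: str                                         # the specific data activity
    policies: List[str] = field(default_factory=list)     # policies or laws affecting it
    responsible: List[str] = field(default_factory=list)  # who carries out the activity
    accountable: List[str] = field(default_factory=list)  # who answers for the decision
    consulted: List[str] = field(default_factory=list)    # who gives input beforehand
    informed: List[str] = field(default_factory=list)     # who is kept up to date

    def __post_init__(self) -> None:
        # Reject stages outside the assumed lifecycle so typos do not hide gaps.
        if self.stage not in LIFECYCLE_STAGES:
            raise ValueError(f"unknown lifecycle stage: {self.stage!r}")


def activities_without_accountable(records: List[DecisionRecord]) -> List[Tuple[str, str]]:
    """Return (stage, activity) pairs where no accountable party was recorded."""
    return [(r.stage, r.activity) for r in records if not r.accountable]


def stages_without_records(records: List[DecisionRecord]) -> List[str]:
    """Return lifecycle stages for which no activity has been mapped yet."""
    covered = {r.stage for r in records}
    return [s for s in LIFECYCLE_STAGES if s not in covered]


# Example mapping with invented actors and an invented policy.
example_map = [
    DecisionRecord(
        stage="collection",
        activity="Collect symptom reports through a mobile survey",
        policies=["Hypothetical national data protection law"],
        responsible=["Field survey team"],
        accountable=["Data steward"],
        consulted=["Epidemiology advisors"],
        informed=["Communications office"],
    ),
    DecisionRecord(
        stage="sharing",
        activity="Share aggregated counts with a partner agency",
        responsible=["Analytics team"],
    ),
]

print(activities_without_accountable(example_map))
print(stages_without_records(example_map))
```

Even in this toy form, the two helper functions surface the questions the mapping is meant to raise: which activities have no named accountable party, and which lifecycle stages have not been examined at all.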

We believe this tool could be useful in assessing data investments related to COVID-19 and determining which internal and external actors influence the decision making around them. By tracking how data about COVID-19 is used, we believe organizations can promote more responsible and accountable data practices throughout the pandemic.

About the authors:

Andrew J. Zahuranec is Research Fellow at The GovLab @NYU, where he is responsible for studying how advances in science and technology can improve governance.

Stefaan G. Verhulst is Co-Founder and Chief Research and Development Officer of The GovLab, where he is responsible for building a research foundation on how to transform governance using advances in science and technology.

This is the blog for Data & Policy (cambridge.org/dap), a peer-reviewed open access journal exploring the interface of data science and governance.

Blog for Data & Policy, an open access journal at CUP (cambridge.org/dap). Eds: Zeynep Engin (Turing), Jon Crowcroft (Cambridge) and Stefaan Verhulst (GovLab)