Toward Responsible Data Re-Use for COVID-19 in New York City: Key Takeaways from The Data Assembly

Aditi Ramesh
Open Data Policy Lab
Oct 21, 2020

By Aditi Ramesh, Andrew Young, and Andrew J. Zahuranec

The Data Assembly is an initiative from The GovLab, supported by the Henry Luce Foundation, to solicit diverse, actionable public input on data re-use for crisis response in the United States. We invite you to share your comments and suggestions on our preliminary report by Friday, October 30th.

The COVID-19 pandemic has had enormous social, political, cultural, and human costs. A variety of solutions, from increased testing capacity to social and economic support measures, have been deployed in recent months. Yet the response can still be improved, and the expanded use of data is one way to improve it.

Effective and responsible data re-use can help policymakers and practitioners provide relief to those affected by the pandemic. The GovLab has identified over 300 data re-use projects addressing various dimensions of the ongoing crisis. However, significant work remains to maximize the value of data re-use for addressing emerging issues. In March 2020, The GovLab shared A Call For Action to make data collaboration more systematic, sustainable, and responsible in light of COVID-19.

With support from the Henry Luce Foundation, The GovLab launched The Data Assembly to pioneer a new methodology that re-imagines how we engage society in generating ideas for data re-use. The Data Assembly aims to collect and synthesize actionable, diverse public input to identify concerns, expectations, and opportunities in the data-driven response to COVID-19. More importantly, it aims to identify the conditions and procedures needed to enable responsible data re-use for crisis response, now and in the future.

As part of The Data Assembly, The GovLab hosted three mini-public deliberations this summer to understand the opportunities and risks involved in data re-use. The three mini-publics comprised:

  1. Data holders and policymakers;
  2. Civic rights and advocacy organizations; and
  3. New Yorkers from across the five boroughs.

We facilitated these three mini-publics to engage stakeholders who represent specific interests and communities directly affected by data re-use; to understand the perspectives of those driving policy and practice; and to engage members of the public at large, who are often the intended beneficiaries and subjects of the data being re-used. Consultations for the first two groups involved 15–20 experts selected using The GovLab’s Smarter Crowdsourcing methodology. The New Yorkers Mini-Public deliberation featured 55 New York City residents, recruited through random sampling with a focus on diversity across age, gender, income, and borough of residence.

Responsible Data Re-Use

The core output of the Data Assembly is a Responsible Data Re-Use Framework, which seeks to inform decision-makers on how best to re-use data to solve public problems. In particular, this draft framework aims to help determine whether, when, and how the re-use of personal data can be aligned with people’s expectations and societal values.

The Data Assembly deliberations also yielded five cross-cutting recommendations, discussed further in our preliminary report:

  1. Match Urgency with Accountability: Participants expressed willingness to tolerate increased surveillance for public health purposes, but this expanded support for data collection and re-use does not excuse organizations from abiding by responsible data practices and other basic duties of care.
  2. Support and Expand Data Literacy: Meaningful public participation in a data re-use effort depends on all communications being clear, well-justified, and broadly understandable.
  3. Center Equity: Organizations should consider whether the data they intend to re-use misses or under-represents any groups, and whether their methods could exacerbate existing inequalities.
  4. Engage Legitimate, Local Actors: Participants highlighted the need for effective public engagement and leadership from local actors in government and civil society, as well as the involvement of trusted intermediary organizations that can help engage with and solicit input from target beneficiary communities.
  5. Develop Positions for Responsible Data Re-Use: Data re-use projects are complex undertakings that require coordination with various actors inside and outside an organization. Dedicated job positions and responsibilities devoted to these issues, such as data stewards, can allow organizations to better respond to new circumstances as they arise.

Expert Panel

On October 14, 2020, The GovLab facilitated a 90-minute expert panel and virtual town hall to share findings from the Data Assembly and generate additional insights on the opportunities and challenges of responsible data re-use. The exchange sought to provide a platform for key stakeholders and leaders in the space, including co-hosts of the series and representatives from the mini-public deliberations, to reflect on the report and share their own experience in working to advance responsible data use and re-use. Moderated by The GovLab’s Co-Founder and Chief Research and Development Officer Stefaan Verhulst, the panel featured:

Stefaan invited panelists to reflect on the value and risks of data re-use for addressing the COVID-19 pandemic, the recommendations in the report, and the steps needed to implement the emerging data responsibility framework. Below we summarize some of the high-level themes that emerged from the exchange.

The panelists offered sophisticated, multi-faceted reflections on many key issues, including but not limited to those summarized below, so we also encourage you to watch the full expert panel.

Encouraging public conversation on data re-use

The Data Assembly made clear that community leaders and members of the general public should be involved in the planning stages of data re-use to clarify what is “mission critical” and valuable to them.

During the panel, Mariko Silver, President and CEO of the Henry Luce Foundation, discussed the need for “a trust building process that requires us to be engaged in up and down decision-making hierarchies.” She also noted that involving local communities and individuals in decision-making processes will likely result in greater impact, especially in the context of the pandemic.

Similarly, Paul Ko, Head of LinkedIn Economic Graph Analytics, noted “[t]he principles of data re-use need to be consistently socialized. […] We are continuously surfacing up [problems] and ensuring [public] dialogue.”

Prioritize data re-use that benefits all people

The Data Assembly preliminary report and expert panel also surfaced the need for purpose-driven data re-use that prioritizes equitable benefits to different populations — including traditionally under-served communities and those that are not well represented in institutional datasets, or “data invisibles.”

First Deputy Public Advocate Nick E. Smith, for instance, described the need for proactive rather than reactive policies in preparation for a second wave of the pandemic. In his work, Smith has seen the disparate impacts that gaps in data collection and re-use have on communities of color:

“What COVID exposed was a very long-standing system where people of color haven’t been receiving healthcare and wider ranging resources that we need to be healthy. […] Data about [the impact of COVID on people of color] is important.”

Re-centering our focus on local communities

The Data Assembly highlighted the importance of having practitioners prioritize re-using data to address local, community-based problems and opportunities, rather than broad, ill-defined, or speculative objectives.

Manhattan Borough President Gale Brewer emphasized the need to unlock the value of data at the local level. Open data, she argued, helps communities understand what is happening on the ground and coordinate their response with other city agencies. She described the challenges small businesses in Manhattan face: “If we don’t have the data, we don’t know if something makes sense or doesn’t make sense. […] Data can really, really help us plan.”

Similarly, Zachary Feder, the Open Data Program Manager at the Mayor’s Office of Data Analytics, discussed how data re-use, if done responsibly, can create new types of value for New York residents and city agencies.

“Data re-use and open data, in particular, is predicated on the value that comes from that data sharing,” said Feder. “There will be different perspectives and uses beyond those originally anticipated.”

He noted the value of open health data in driving critical response efforts and in the success of the NYC Recovery Data Partnership, which seeks to gather information from organizations across the city to help others make better decisions.

Capturing data provenance

The Data Assembly report and expert panel underscored the importance of tracking and communicating the origin, potential biases, limitations, and previous uses of datasets so that those re-using the data are clear on what insights the data can and cannot provide. Participants felt that understanding this data provenance could support more effective decision-making and risk mitigation later in the data re-use lifecycle.

Jaclyn Sawyer, director of data systems at Breaking Ground, reinforced this idea, noting that one challenge practitioners face is “the loss of context and nuance when we re-use data outside the context it was collected in,” and that a one-size-fits-all approach to data re-use will not be workable.

Encourage data literacy and education efforts

Many panelists stressed the need to foster a participatory data culture. Diana Plunkett, manager of strategic initiatives at Brooklyn Public Library, and Kathleen Riegelhaupt, associate director of digital policy at New York Public Library, discussed how public libraries can play a key role in building data literacy among the public and enabling more effective community engagement. Diana Plunkett noted, “Each of our branches is very much the center of their communities so having these types of conversation [about data] and making them hyper-local is what we’re interested in.”

“The more people know about [what’s being collected], the easier it is for them to decide whether to opt in or not.”

Kathleen Riegelhaupt echoed these sentiments, describing New York Public Library’s overall commitment to participatory conversations and broad citizen engagement. She noted that the Data Assembly’s recommendations resonated with the library’s prioritization of “privacy, accessibility, equity, impartiality, and safety” and “technology built and deployed in the public interest.”

The panelists, including the library representatives, spoke about the value of community-led data literacy training and education for meaningful public participation and decision-making around data.

The GovLab welcomes input on the Data Assembly and its preliminary report from New Yorkers and anyone else interested in data re-use. Please share any questions, comments, or concerns about the report by Friday, October 30th. All contributions will inform the final report and Responsible Data Re-Use Framework, as well as The GovLab’s other work related to the COVID-19 pandemic around the globe.
