Re-use of smart city data: The need to acquire a social license through data assemblies
Written Testimony by Stefaan G. Verhulst
Co-Founder, The GovLab, Tandon School of Engineering, New York University
Before the New York City Council Committee on Technology
Oversight Hearing: Smart City, January 19, 2021
Chairman Holden and distinguished members of the Committee, thank you for allowing me the privilege to appear virtually before you today. My name is Stefaan G. Verhulst, Co-Founder of and Chief Research and Development Officer at the Governance Lab (“The GovLab”) at New York University. The GovLab is an action research center whose mission is to strengthen the ability of institutions — including but not limited to governments — and people to work more openly, collaboratively, effectively and legitimately to make better decisions and solve public problems. I had the unique pleasure of addressing this committee two years ago on how to leverage and share data for urban flourishing (and in particular, how to do so through an approach known as data collaboratives, which I will explain further below).
Though I speak today on similar issues, we live in a vastly different world than in 2019. Over the last year, the city has faced enormous challenges. To date, the COVID-19 pandemic has sickened an estimated 493,000 New York residents and killed over 25,000. An economic crisis left 14 percent of all workers without jobs by September and closed thousands of small businesses. Protests forced the city to reckon with questions of police accountability and long-standing inequities.
In crises such as these, calls for the city to harness technology and data to help policy-makers find solutions grow louder and stronger. Many have spoken about accelerating already ongoing work to turn New York into “a smart city” — using digital technology to connect, protect, and improve the lives of its residents. Some of this proposed work could involve the use of sensors to collect data on how people live and work across New York City. Other work could involve expanding the city’s relationships with private organizations through data collaboratives. Data collaboratives, which are central to our work at the GovLab, are a new form of collaboration that extends beyond the conventional public-private partnership model, in which participants from different sectors exchange their data to create public value. The city already operates one such data collaborative in the form of the NYC Recovery Data Partnership, a partnership that allows New York-based private and civic organizations to provide their data to analysts at city agencies to inform the COVID-19 pandemic response. I have the privilege of serving as an advisor to that initiative.
Data collaboration takes place widely through a variety of institutional, contractual and technical structures and instruments. Borrowing in language and inspiration from the open data movement, the emerging data collaborative movement has proven its value and possible positive impact. Data reuse has the potential to improve disease treatment, identify better ways to source supplies, monitor adherence to non-pharmaceutical restrictions, and provide a range of other public benefits. Whether it is informing decision-making or shaping the development of new tools and techniques, it is clear that data has tremendous potential to mitigate the worst effects of this pandemic.
However, as promising and attractive as reusing data might seem, it is important to keep in mind that there also exist widespread concerns and challenges. Like all tools, the technologies that make up a smart city, and the data they generate, can be used well or badly, in ways that align with local values and expectations and in ways that do not. As you are all no doubt aware, in Toronto, city leaders signed onto a billion-dollar plan with Sidewalk Labs, an Alphabet subsidiary company, to transform a slice of the city’s waterfront into a “neighborhood of the future” without consulting residents early enough in the process. Local opposition and concerns about sensor-enabled surveillance eventually led to the project’s cancellation.
In order to avoid a similar crisis of legitimacy in New York, it is essential that we build trust among all stakeholders, and especially the general public. That is why it is incumbent on policymakers and others who would use data for the public good to exercise extra due diligence in doing so. They need to ensure both that they can limit the application of technology to the originally identified purpose and that the application matches local expectations and values. Any reuse of data must be accompanied by a “social license” to do so, by which I mean the ongoing approval of the communities the project seeks to help, as well as of other stakeholders. I firmly believe (and history has shown) that without trust and legitimacy, themselves products of an open, transparent, and participatory planning process, no smart city project can ever succeed.
The Data Assembly
Aware of this need to build trust and legitimacy, and also of ongoing city-led efforts to use data in the pandemic response, The GovLab recently launched a project that we call the Data Assembly: a Citizens Assembly on the Re-use of Data. Built in collaboration with the Henry Luce Foundation, the project aims to enhance public participation in the re-use of data for COVID-19 in New York City. In particular, the Data Assembly sought to increase understanding among policymakers and others about how different communities feel about the underlying issues, and especially about risks and benefits that are inherent to data reuse. Current signals about social attitudes and values toward data tend to come primarily from op-ed pieces in newspapers and broad surveys of public opinion. Such snapshots of opinion tend to be quite simplistic and lack a context for meaningful deliberation. Deliberative public engagement methodologies, such as those used for the Data Assembly, offer a more context-rich approach, allowing us to understand how different constituencies make value judgements and how they perceive challenges and risks involved in data sharing.
Through the summer of 2020, we sought diverse and actionable input toward developing a responsible data re-use framework. Our work relied on virtual “mini-public” deliberations which we co-hosted with the Brooklyn Public Library and New York Public Library. The deliberations took place among three cohorts: data holders and policymakers operating in New York; rights groups and advocacy organizations; and New York City residents. Participants in each group received short briefings about data re-use followed by examples of hypothetical applications (inspired by real world cases) of using data for COVID-19. The exhibits prompted conversations about what types of data re-use participants considered appropriate, and under what conditions. My colleagues and I facilitated these discussions, encouraging participants to reflect on why they felt the way they did and what general principles city leaders should use when thinking about data re-use.
Table 1: The Data Assembly Re-Use Exhibits
These engagements differed from traditional public hearings and solicitations in that they provided people from a variety of backgrounds the space to develop ideas collaboratively and co-create solutions to problems they saw every day. Not only did these conversations reveal under-examined perspectives but also sophisticated attempts to weigh value against risk. While not everyone is an expert in data science, we found that everyone interacts with data in some way and has instinctive and often quite informed views on its use.
The mini-publics offered nuance that only diverse public input can provide, and revealed the complexity needed to design and evaluate a data re-use project. We firmly believe that understanding such complexity, and building on it, is key to enhancing public trust in data re-use, and thus to unlocking some of the very real potential that technology offers in addressing the COVID-19 pandemic.
Toward a Responsible Data Re-Use Framework
We extracted major points and expectations from the three deliberations into a report titled Responsible Data Re-Use Framework. This report is intended to provide general design and governance considerations when re-using data for public good.
The report includes a Design Wheel of Data Re-Use which provides organizations a checklist to consider public expectations and development options. The checklist includes the following elements (the Appendix contains more details):
- Why, the purpose, scope, and limitations of data reuse;
- What, the data assets needed and their technical requirements;
- Who, the actors involved and their responsibilities;
- How, the operational strategy and governance framework for data re-use;
- Where, the local focus and contextual and jurisdictional implications of a project;
- When, the duration of the data re-use effort, including data retention, termination, and modification.
The wheel does not prescribe how organizations should resolve the questions it raises. That work needs to be done by organizations themselves, in collaboration with the public, according to the specific components of the project. One of the findings of our research concerns the wide variability of needs and optimal structures for data reuse. Our intention was therefore to provide a broad framework, but to ensure enough flexibility for contextual adaptation.
Figure 2: The Design Wheel of Data Re-Use
We presented these findings to the public and city leaders on October 21, 2020 through an expert panel and virtual town hall discussion. The panel featured reflections on the Data Assembly’s findings and recommendations from Manhattan Borough President Gale Brewer, New York City First Deputy Public Advocate Nick E. Smith, NYC Open Data Program Manager Zachary Feder, representatives from the Brooklyn and New York Public Library, and Henry Luce Foundation President and CEO Mariko Silver.
The broad support for the initiative we received from the participants and town hall attendees suggests a new methodology that the city might embrace for future technology and data re-use proposals. By using mini-publics, the city can do more than just ensure it is engaging with various groups. It can engage New Yorkers in co-developing city policies, principles, and priorities. We are now building on this more targeted, nuanced citizen engagement strategy to tap into stakeholders’ perceptions on effective and legitimate local governance of AI projects through our AI Localism initiative.
The Data Assembly initiative produced additional insights that could be instructive for future projects. In addition to the design principles noted above, the report includes some attitudes and principles that can guide data re-use on COVID-19 or smart city work more generally. Participants in all three mini-publics expressed support for increased responsible re-use of data for public interest purposes, though this expanded support does not excuse organizations from responsible data practices and other basic duties of care. Participants also expressed a desire for data re-use efforts to ensure equity by including legitimate, local actors to create public value from data rather than prioritizing state or federal actors. Our analysis of the conversations further revealed a need to promote data literacy through institutions such as public libraries and to help organizations create internal positions devoted to coordinating data re-use, positions we call data stewards.
Figure 3: Recommendations from The Data Assembly
My colleagues and I at The GovLab believe the Data Assembly methodology offers the city a new way forward on the issues under discussion today, as they relate to smart cities. In our view, oversight cannot just be a reactive process of responding to complaints but a proactive one, inviting city residents, data holders, and advocacy groups to the table to determine what is and is not acceptable. Amid rapidly changing circumstances, the city needs ways to collect and synthesize actionable and diverse public input to identify concerns, expectations, and opportunities. We encourage the city to explore assembling mini-publics of its own or, failing that, commission legitimate partners to lead such efforts.
New York faces many challenges in 2021, but I do not doubt the capacity of its people to overcome these struggles. Through people-led innovation and processes, the city can ensure that data re-use conducted as part of the smart city is deemed legitimate and is more effective and targeted. Such processes can also help the city ensure that its work is more open, collaborative, and legitimate.
Appendix: The Responsible Data Re-use Framework
The below framework is informed by The GovLab’s Data Assembly deliberations and is organized according to the Why, What, Who, How, When, and Where of data re-use. The framework intends to support a move toward more equitable, ethical, and sustainable data re-use efforts in the public interest.
WHY: the purpose, scope, and limitations of a data re-use project
- What is the purpose of this data re-use project? Clearly outline the mission and goals.
- How will this project benefit the community and inform decision-making to address COVID-19?
- Is the re-use of data necessary for this project? Consider if this data re-use is the most direct, least invasive means to achieve the intended purpose.
- Who are the target audiences of this project?
- What steps will this project take to capture under-served, “data-invisible” populations?
- What risks does re-using this information pose to the subjects and communities represented in the data?
WHAT: the standards, formats, and technical requirements of data assets used in a project
- Where did the data come from?
- Are the potential biases, limitations, and previous uses of the datasets clearly outlined?
- Has the data provenance — the origin, biases, limitations, and past dataset use — been communicated to other collaborators and stakeholders?
- What efforts have been made to mitigate biases and limitations of the dataset?
- Has the data been aggregated and anonymized to protect individuals and groups involved in the data?
- What steps have been taken to avoid the inadvertent re-identification of data subjects in relatively small samples?
- How does the data re-use initiative ensure that “data invisibles” are not left behind when targeting service delivery?
- What are the risks associated with the data visibility of previously “data invisible” groups? Consider these risks even for projects intended to benefit these communities.
- What safeguards will be put in place to protect these datasets (and thereby the people from whom the information was collected)?
WHO: the actors, their custodial duties, data access criteria, and rights and responsibilities involved in a project
- State the actors involved in the project and their sector criteria (e.g. data provider, data user, data subjects, members of the public, community leaders, local governments, non-profits, businesses, academia, trusted intermediaries, or intended beneficiaries).
- What type of re-use of data is most appropriate and potentially impactful by each actor?
- How will this project engage with stakeholders during the planning stages of the project?
- Has this project embedded data steward roles throughout the data re-use lifecycle to ensure the data is clean, accurate, and handled carefully and ethically?
- Have “trusted intermediaries” been identified and included in the project planning and implementation process? Do some of these intermediaries have legal knowledge that can support more effective data re-use?
- What steps will be taken to support community-led data literacy training and education? Including intermediaries in these initiatives can help spur community engagement.
- Are the institutional actors involved data literate? What steps have been taken to strengthen their data literacy skills?
HOW: the operational strategy and governance framework for data re-use
- Have data subjects and community leaders been consulted throughout the project’s planning stage?
- Has a representative community panel been consulted on appropriate types of data re-use before the project start date?
- Are data subjects clear on which data activities are enabled by their consent? Specifically, is the upfront ability to opt-out of data re-use — including re-use of aggregated data — offered to subjects before the start of the project? Bolstering data literacy initiatives can help ensure that this consent language is clearly communicated to data subjects.
- Have legal agreements between data suppliers and data users been published and made publicly accessible? Are these documents translated into short, accessible language for the public?
- Has a transparency charter regarding the intentions, operations, parties involved, and outcomes of a data re-use effort been created and communicated to the public?
- Has the decision-making methodology, including how authorities collect, process, share, analyze, and re-use data, been shared with the public?
- Can the data provenance be tracked and justified for data re-use in this project? Are the scope and limitations of the data publicly represented in a transparent manner?
- Does the project have a third-party oversight board that influences data re-use and ensures it is carried out in an ethical manner?
WHEN: the duration of the data re-use effort, including data retention, termination, and modification
- Ensure the data is only held for as long as necessary to address the core issue or to answer the key question that is driving the re-use project.
- Have any future-oriented or exploratory analyses of the data received renewed consent from data subjects?
- Are data subjects informed of their ability to opt out of data re-use prior to the initiation of the new project/analysis?
- What are the best practices identified from this project? What are the areas of improvement? Gather end-to-end data feedback from stakeholders and collaborators highlighting challenges, risks, and opportunities at the planning, collecting, processing, sharing, analyzing, and re-using stages of the data lifecycle. Policies, procedures, and oversight should be designed and deployed with a focus on navigating inevitable shifts in circumstance over time. Have these findings been published to help future projects?
WHERE: the local focus and contextual and jurisdictional implications of a project
- How does the re-use of data address local, community-based problems and opportunities?
- How will the project ensure that the data is held for the shortest amount of time needed to reach its intended mission?
- What protocols are in place to protect subjects and areas included in sensitive aggregated location data?
- How are risks and challenges of geo-location identified, assessed, and mitigated by data stewards and third-party oversight boards?
- What protocols are in place for parties to relinquish access or destroy re-used data after it has served its purpose? How will these processes be verified?
- What is the process for parties to renew consent from data subjects to hold and study the aggregated data for a longer period of time if needed?
Read more about the Data Assembly and the results of The GovLab’s mini-publics at: https://thedataassembly.org/.