Leveraging and Sharing Data for Urban Flourishing

Testimony before New York City Council Committee on Technology and the Commission on Public Information and Communication (COPIC)

Delivered on February 12, 2019

Dear Speaker Johnson, Chairperson Koo, and members of the Committee and Commission:

We live in challenging times. From climate change to economic inequality, the difficulties confronting New York City, its citizens, and decision-makers are unprecedented in their variety, and also in their complexity and urgency. Our standard policy toolkit increasingly seems stale and ineffective. Existing governance institutions and mechanisms seem outdated and distrusted by large sections of the population.

To tackle today’s problems we need not only new solutions but also new methods for arriving at solutions. Data can play a central role in this task. Access to and the use of data in a trusted and responsible manner is central to meeting the challenges we face and enabling public innovation.

This hearing, called by the Technology Committee and the Commission on Public Information and Communication, is therefore timely and very important. It is my firm belief that rapid progress on developing an effective data sharing framework is among the most important steps our New York City leaders can take to tackle the myriad challenges of the 21st century.

My name is Stefaan G. Verhulst and I have been an NYC resident for the last 20 years. I am also the Co-Founder and Chief of Research and Development of The GovLab, an action research center based at the Tandon School of Engineering, New York University (NYU). Our mission is to improve people’s lives by changing and updating governance with new technologies. I am also the lead of The GovLab’s Data Program and am delighted to share some of the insights we have gained through our work with a variety of (city) partners on open data and data collaboratives.

I am joined today by some of my distinguished NYU colleagues, Prof. Julia Lane and Prof. Julia Stoyanovich, who have worked extensively on the technical and privacy challenges associated with data sharing. To avoid duplicating testimony, I will not focus on issues of privacy, trust, and how to establish a responsible data sharing infrastructure, although these are central considerations for the type of data-driven approaches I will discuss. I am, of course, happy to elaborate on these topics during the question and answer session.

Instead, I want to focus on four core issues associated with data collaboration. I phrase these issues as answers to four questions. For each of these questions, I also provide a set of recommended actions that this Committee could consider undertaking or studying.

The four core questions are:

  • First, why should NYC care about data and data sharing?
  • Second, if you build a data-sharing framework, will they come?
  • Third, how can we best engage the private sector when it comes to sharing and using their data?
  • And fourth, is technology the main (or best) answer?

Let me start with the first question: Why should New York City care about data and data sharing?

As a society we increasingly recognize and accept that opening and sharing data is essential for government transparency — the focus of today’s hearings. Open data can help shed a powerful light onto hidden corners of governance, ensuring accountability and transparency and, equally important, empowering citizens in the process.

But I am here today in part to tell you that data sharing has another powerful benefit. Our research at the GovLab shows that, when analyzed and used responsibly, data also has the potential to transform how city government works, enabling more agile and legitimate decision-making as well as more targeted and effective service delivery.

Our research suggests that this process happens through at least four pathways:

First, data can transform governance through improved situational analysis, which enables more targeted and effective interventions. An increased ability to access and analyze shared data can allow public officials across departments and agencies to better understand, often in real-time, trends in city activity. It can also help officials understand the geographic distribution of various phenomena, such as population flows and new business activity.

Traffic accidents and fatalities, the reduction of which is promoted by this city through the Vision Zero initiative, are two issues that could be better understood through real-time, anonymized traffic pattern and pedestrian data.

Second, data collaboration can also provide better insights into cause and effect of phenomena, allowing policy-makers to focus on root causes of public challenges rather than their symptoms. For example, one could use street-level images and pedestrian analytics to understand how infrastructure and design choices impact those with reduced mobility. Alternatively, one could look at vehicle movement patterns to recognize and measure the seasonality of different industries across the city.

Third, data collaboration can improve predictive capabilities, enabling better planning and preparation. Consider, for instance, how emergency management officials might use data on fixed-street asset locations and commercial space usage. Before a large storm, this data might identify which areas are most at risk, either because they lack defenses from the elements or are near businesses that present unique hazards to residents. This information could, in turn, allow officials to use resources to mitigate those circumstances. It might also be used to support adequate planning and preparation, providing first responders with targeted information that could help them save lives if a catastrophe were to strike.

And, finally, data collaboration can improve governance by giving officials better ways to assess the impact of government programs and initiatives, thus enabling experimentation and evidence-based policy-making. For example, more data on commercial space usage and vehicle movements could help the city better understand the effect of construction on local businesses, allowing officials to offer aid and support where needed. During construction of the Second Avenue subway, for example, local store-owners said business declined between 25 and 50 percent. Data would allow the City to provide a more specific cost estimate.

Based on the above, our overarching recommendation would be to increase awareness among city officials and employees about the value of data in decision-making and governance. Achieving this awareness, we believe, will help drive data sharing and data-driven practices across the city.

Toward this broader goal, I have at least two specific recommendations. This Committee could:

  • First, call for the creation of an urban evidence base, comprising illustrative case studies capturing the potential and impact of data collaboration within New York City and additional documentation and guidance that would serve to incentivize and empower city officials to do more with shared data. This knowledge hub would be akin to a portal we have created at the GovLab that similarly documents the potential of data sharing around the world;
  • Second, consider a directory of data science experts within city government that would complement a directory of data sets (collected or acquired by the City). With existing expertise more searchable and discoverable, city officials may be in a better position to seek help in establishing the value propositions outlined above.

The second question I want to ask is: If you build a data-sharing framework, will they come?

In recent years, governments have spent increasing amounts of time, resources, and effort to make government data accessible with the broad goal of making government more transparent and improving people’s lives. Indeed, the evidence (as revealed, for example, by our work at the GovLab) shows that open data can deliver on both goals.

Yet, despite the irrefutable potential of open data, our work also shows that much of that potential remains untapped. This is, quite simply, the result of a certain market failure: Many (possibly most) open data initiatives are not designed with a focus on matching the supply of government data to the (possible) demand for it. In other words, much of the data released is released without a clear idea of what challenges it might address and of how it might be useful or used. As a result, it sits untouched and unused; its tremendous potential (not to mention the resources involved in releasing it) wasted.

So, in answer to the question “If you build it, will they come?” our answer is a resounding “not necessarily.” Data and data sharing through APIs and other technical means have very real potential. But I would caution, equally, that if data-sharing initiatives are not demand-driven, there is a very real risk of wasted time and resources and, as a result, a general loss of faith in the very idea of using data to solve public problems.

One of our key recommendations to this Committee is that the city needs a much better understanding of data demand within and beyond city government. To avoid the types of market failures I’ve just talked about, government needs to ensure that the proposed data-sharing infrastructure addresses a specific and identifiable issue, has a clear sense of the target audience of users, and, importantly, is actually usable by that audience.

Mindful of all these issues, the GovLab last year released what we call an Open Data Demand Assessment and Segmentation methodology that provides several recommendations to ensure that the release of data is more demand-driven.

For the purposes of this Committee, I would recommend calling for an exercise to establish demand by identifying key questions and problems the city faces. In particular, this exercise should focus on identifying problem areas (such as affordable housing or traffic accidents) where there is a clear case that data could provide the most value.

This demand can be identified in several ways:

  • One way would involve every city agency developing a list of top ten questions that could benefit from bringing currently inaccessible data to bear and might, with such data made available for analysis, positively transform the way those agencies achieve their mission.
  • This could, of course, also be achieved through a top 100 questions list across city government or various other iterations. The point is to think through specific exercises to help identify real questions and real areas of demand. I am sure this will make the process of releasing data far more productive and useful.
  • Once questions are identified and their importance validated, engage in a data audit: review what data is necessary to answer these questions, and compare that with the inventory of data collected and/or acquired by the City so as to match data supply to demand. This effort could also inform the determination of who has authorized access to which datasets for what purposes.

The third question I want to ask today is: How can we best engage the private sector when it comes to sharing and using their data?

We need to ask this question because, even though many governments and cities around the world have embraced data and the principles of open data, much of the data that could be relevant for policymaking still resides within the private sector. This is as true of data itself as it is of data capabilities: the ability to process and analyze data, and to derive relevant insights.

So if cities are to address modern public challenges and improve people’s lives, they need to find ways to engage with the private sector — and in particular to gain access to privately held data. At the GovLab, we’ve spent a significant amount of time working on the notion of data collaboratives, which are an emerging form of public-private collaboration in which corporate-sector data is leveraged to help find innovative solutions to public challenges.

This is not the place to go into great detail on what data collaboratives are, the various ways they are organized and what they can do. We have established a repository of more than 150 examples of data collaboratives. Let me just give you a few examples of city data collaboratives in action, and how they’ve helped solve public problems:

  • In Chicago, data collaboratives helped public agencies, newsrooms, academics, and researchers better understand the local criminal justice system through the sharing of arrest and investigatory stop data, snapshots of the county jail population, and information on the State’s Attorney’s cases.
  • In Los Angeles, a data collaborative between LinkedIn and the Office of the Mayor of Los Angeles gave the mayor’s data team anonymized data on the tech talent in the city. This resource informed the workforce and educational policies of Los Angeles as it sought to enhance its technological talent.
  • In Singapore, crowdsourced, user-suggested routes helped improve the utility and responsiveness of Singapore’s private bus transportation services.

As for specific recommendations on how the city could consider accessing and using private sector data, I have a few, which I will outline here:

  • Commission a survey of privately held data sources covering New York City with the goal of better understanding whether they can help answer questions or solve problems identified through the process outlined earlier;
  • Identify and establish relationships with key individuals within corporations who are in charge of data and data sharing. We call such individuals “data stewards” and they are critical actors in any data sharing exercise. A New York network of data stewards could help to build momentum and establish good practices around the use of private-sector data assets that could provide value for the City, while also surfacing key challenges dampening the positive public impact of our current data age.
  • Finally, we believe that the notion of data collaboratives holds tremendous potential, and we recommend New York stake out a leadership role in testing and refining approaches for unlocking the public value of private-sector data. While many cities are rapidly adopting new “Smart City” tools, New York has an opportunity to support rigorous research and analysis of operational and governance models for data collaboration and benefit from more robust and evidence-based approaches to data-driven governance.

Let me move on now to the fourth and final question I wanted to discuss today: Is technology the answer?

Anyone who has worked with technology in large organizations knows that, as I just said in a slightly different context, the technology itself is often the least of the problems. Resistance to change and transformation (to new insights or new forms of innovation) doesn’t usually occur because of technical problems. It arises as a result of entrenched institutional and cultural resistance.

This is true of the private sector, and it is equally true of government. Establishing a responsible data-driven environment in the city will require nothing less than a cultural shift within agencies and all aspects of the government. Sometimes the cultural shift is needed to overcome overt resistance. Often, it’s required to deal with risk-averse behavior. For example, many city officials recognize the potential of data yet might still avoid sharing out of a fear of future penalties and sanctions for sharing that some might claim to be unauthorized or ill-advised. There is a general feeling (which is probably largely true) that they are more likely to get into trouble for too much sharing than for too little.

So, in answer to the question “Is technology the answer?” I would argue that technology is necessary, but it’s definitely not sufficient. We need to consider the whole issue of data sharing and data-driven governance within a much broader context.

Out of a wide range of possible options, I have three specific recommendations today for incentives and training approaches that could help drive cultural change:

  • First, this Committee should consider establishing a set of metrics to evaluate and measure agencies’ performance with regard to data sharing. These metrics need to include a number of indicators, including discoverability (e.g., through data directories); timeliness, to ensure insights are fresh and accurate; and usability, including but not limited to data standards and formats;
  • Second, to facilitate the necessary cultural shift, the city government should consider integrating data sharing and use into performance reviews and budget and personnel resource allocation for all agencies. This is an important step in creating the right incentives.
  • And, finally, consider creating a “data stewards” award that would be given to the agency or individual within an agency most successful at making data discoverable and leveraging that data to improve their mission. This could also include private-sector actors who have partnered with the city to establish data collaboratives. Such an award would add needed public recognition and appreciation for the value of data sharing in city government.

Summary and Conclusion:

These remarks cover what I wanted to talk about today. My testimony contains specific recommendations and steps for further action. If I were to summarize or encapsulate these points into four broader recommendations, recommendations that can later be filled out with detailed actions, they would include the following:

  • Raise awareness within city agencies about the potential positive impact of data sharing;
  • Find ways to better match supply and demand in the data equation, and in particular take steps to better understand the demand and need for data;
  • Take steps to leverage private data, especially through the use of public-private data collaboratives;
  • And finally, facilitate a cultural shift within city agencies to overcome overly reluctant and risk-averse behavior that prevents more data sharing and data-driven decision making.

There are, of course, various pathways to these four recommendations, but the important point is that they are not complicated or financially expensive. The potential of data sharing and collaboration is real and achievable.

What’s required to make it happen is leadership — to implement what is already mandated, to overcome bureaucratic inertia, and to transform how decisions are made and services provided. I know that members of this committee are committed to this endeavor, and I look forward to seeing the very positive results on life in this city. In the meantime, I’m happy to answer any questions.

Appendix 1: Taxonomy of Open Data Impact

Based on insights derived from 19 case studies, the GovLab has found that open data projects tend to fit into one of four overarching categories:

  • Improving Government: Boosting the effectiveness of institutions primarily by tackling corruption and increasing transparency, and enhancing public services and resource allocation;
  • Empowering Citizens: Allowing citizens to take control of their lives and demand change by enabling more informed decision making and new forms of social mobilization, both in turn facilitated by new ways of communicating and accessing information.
  • Creating Opportunity: Stimulating citizens and organizations by fostering innovation and promoting economic growth and job creation.
  • Solving Public Problems: Helping policymakers and citizens address intractable problems with new forms of data-driven assessment and by enabling data-driven engagement.

Appendix 2: Periodic Table of Open Data’s Impact Factors

Based on the existing literature and case studies, we have developed a Periodic Table of Open Data Elements detailing the enabling conditions and disabling factors that often determine the impact of open data initiatives. Although the importance of local variation and context is, of course, paramount, current research and practice show that the elements included in five central issue categories (Problem and Demand Definition, Capacity and Culture, Partnerships, Risks, and Governance) are likely to either enable or disrupt the success of open data projects when replicated across countries.

Appendix 3: Open Data Demand Assessment and Segmentation Methodology

The GovLab, in partnership with the Inter-American Development Bank and with the support of the French Development Agency, developed the Open Data Demand Assessment and Segmentation Methodology to provide open data policymakers and practitioners with an approach for identifying, segmenting, and engaging with demand. This process specifically seeks to empower data champions within public agencies who want to strengthen their data’s ability to improve people’s lives.

Appendix 4: Data Collaboratives

Data collaboratives are cross-sector collaborations that use data and data science expertise to unlock the societal value of private-sector data. Increasingly, data collaboratives around the world generate insights in health, education, and crisis response, among many other sectors. The above graphics illustrate several notable examples of data collaboratives in the real world and the many forms a collaborative can take.