Toward A Periodic Table of Open Data in Cities
by Andrew Zahuranec, Adrienne Schmoeker, Hannah Chafetz and Stefaan G Verhulst
In 2016, The GovLab studied the impact of open data in countries around the world. Through a series of case studies examining the value of open data across sectors, regions, and types of impact, we developed a framework for understanding the factors and variables that enable or complicate the success of open data initiatives. We called this framework the Periodic Table of Open Impact Factors.
Over the years, this tool has attracted substantial interest from data practitioners around the world. However, given the countless developments since 2016, we knew it needed to be updated and made relevant to our current work on urban innovation and the Third Wave of Open Data.
Last month, the Open Data Policy Lab held a collaborative discussion with our City Incubator participants and Council of Mentors. In a workshop setting with structured brainstorming sessions, we introduced the periodic table to participants and asked how this framework could be applied to city governments. We knew that city governments often have fewer resources than other levels of government yet benefit from a potentially stronger connection to the constituents they serve. How might this Periodic Table of Open Data Elements be different at a city government level? We gathered participant and mentor feedback and worked to revise the table.
Today, to mark NYC Open Data Week 2022, the celebration of open data in New York, we are releasing this refined model with a distinctive focus on developing open data strategies within cities. The Open Data Policy Lab is happy to present the Periodic Table of Open Data in Cities.
Separated into five categories — Problem and Demand Definition, Capacity and Culture, Governance and Standards, Partnerships, and Risks and Ethical Pitfalls — this table provides a summary of some of the major issues that open data practitioners can think about as they develop strategies for release and use of open data in the communities they serve. We sought to specifically incorporate the needs of city incubators (as determined by our workshop), but the table can be relevant to a variety of stakeholders.
While descriptions for each of these elements are included below, the Periodic Table of Open Data Elements in Cities is an iterative framework and new elements will be perennially added or adjusted in accordance with emerging practices.
We invite you to contact us with your thoughts and suggestions about the elements listed and to reach out if you see any important elements missing. Over the coming weeks, we will make our own revisions (including engaging with stakeholders to see if we can assign each element an “atomic weight” according to importance).
Please do not hesitate to email us at firstname.lastname@example.org.
We look forward to your contributions.
Problem and Demand Definition
Open data initiatives tend to be more successful, and avoid the “if you build it, will they come?” problem, when they are clearly optimized for an intended audience or user base from the start. Part of the Third Wave of Open Data means publishing with purpose: understanding how data could yield impact and what the constituencies for it would be. This work requires researchers to study and understand the community in which they operate and to ensure their efforts match its technical, social, political, and economic context.
Understanding that a problem exists is one matter. Knowing how it can be best addressed is another, one best achieved by refining the problem at hand. As we note in our Third Wave of Open Data Toolkit, a well-defined problem leads to targeted solutions where it is possible to understand who data work will help and how. After identifying a general issue it wants to improve, an organization can refine its area of concern toward an actionable problem with a clear, measurable outcome that will benefit a specific audience. Organizations might find it helpful to frame their concern as a research question, one answerable through data science. Drafting might take several iterations.
Benefits and Goals
In the past, open data advocates have tended to argue for increased data re-use by relying on normative arguments. Proponents claimed open data enabled greater transparency or provided accountability. While these arguments can be persuasive in some contexts, private-sector leaders, government officials, and the public often need to understand how open data will specifically benefit them. Otherwise, open data becomes another “nice-to-have” instead of an immediate need. In these circumstances, it is often better to appeal directly to personal or organizational interests and to provide simple explanations of how open data will support their short-, medium-, and long-term goals.
Data Audit and Inventory
Once the problem and value proposition are in place, practitioners are able to explore the availability of the datasets they hold. A clear problem definition can help to uncover which internally held data sources could add value and inform strategies for collecting or accessing that data.
Data Ecosystem and Stakeholders
Just as no person is an island, no data project exists in a vacuum. Cities often have a multitude of overlapping government agencies, businesses, and nonprofits who are interested in and have collected data about community-specific problems. Identifying who these stakeholders are and what data assets they have can allow organizations to avoid unnecessary data collection and creation. Conversely, if a scan of the data environment reveals an absence (or inaccessibility) of a particular kind of data, this information can be used to motivate new collection.
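The scan described above can be sketched in a few lines of code. This is a minimal illustration, not a prescribed method: the function name scan_ecosystem, the stakeholder names, and the dataset names are all hypothetical, and a real scan would draw on an actual inventory rather than a hand-written dictionary.

```python
def scan_ecosystem(needed, holdings):
    """Split needed datasets into those some stakeholder holds and those nobody holds.

    needed   -- list of dataset names the project requires (hypothetical names)
    holdings -- dict mapping stakeholder name -> set of dataset names it holds
    """
    available = {}
    for dataset in needed:
        # Collect every stakeholder that already holds this dataset.
        holders = sorted(org for org, assets in holdings.items() if dataset in assets)
        if holders:
            available[dataset] = holders
    # Anything needed but held by no one is a candidate for new collection.
    gaps = sorted(d for d in needed if d not in available)
    return available, gaps


# Hypothetical stakeholders and datasets, for illustration only.
holdings = {
    "transport_agency": {"bus_ridership", "street_closures"},
    "local_nonprofit": {"bus_ridership", "resident_survey"},
}
needed = ["bus_ridership", "sidewalk_conditions", "resident_survey"]

available, gaps = scan_ecosystem(needed, holdings)
print(available)  # datasets that could be requested rather than re-collected
print(gaps)       # datasets that may justify new collection
```

Datasets listed under available point to potential partners, while those under gaps are the ones where new collection might be justified.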
Capacity and Culture
Issue salience directly affects how much and how quickly people and their leaders address issues. When topics are at the top of the agenda (due to some crisis, a sudden awareness of long-standing injustice, or some pressing need), it is easier for organizations to launch projects related to them. Organizations might seek to connect their data-driven initiatives with some issue of special importance within the community in which they operate or otherwise understand how a local area’s priorities intersect with the issues they want to work on.
Open data portals have been key in enabling data openness, combining various institutional datasets, and allowing users to browse, filter, search, and download data to their machines. While the open data portal format will likely remain a common piece of technical infrastructure, new and sophisticated technological developments could facilitate greater collaboration and responsibility in data re-use. These developments could include improved computing capacity to analyze large datasets and new and secure ways of transmitting data. To facilitate this improved technological development, intersectoral, multidisciplinary research and development efforts will be useful.
Institutional Data Capacity
Organizations across sectors can increase the societal and organizational value created through data reuse by bolstering their personnel’s data skills and ensuring those skills are distributed throughout the organization. When data skills and resources are relegated to small teams or units, organizations are unlikely to maximize the societal and organizational value of data re-use. Instead, capacity needs to be distributed evenly to ensure people in all parts of the organization understand the data they have, can use it to create value and are willing to forge internal and external relationships around it. Focused efforts to invest in, foster, and distribute data skills can help an organization become more evidence-based and systematic across all its operations.
Facilitating greater data competence within the general public is an important step to ensuring that it can receive greater benefits from data re-use and data openness, as well as face fewer risks from it as data subjects. To advance the full participation of the general public in data efforts, there is a need to foster its data competence, going beyond the fundamental need for data literacy. This can bridge the gap between the public and the data ecosystem so that the public can both participate in and contribute to data efforts. This approach can provide the means necessary to address the persisting differences in power in the current data and digital era, as well as guarantee novel productive capacity while enabling creativity. Consequently, empowering the public to see itself as a producer of data will put it in the ‘position to negotiate’ the ways in which data is re-used by different stakeholders.
This work falls into the larger category of public engagement. People will not engage with systems if they do not know they exist and do not know how they can help them. By working with residents and helping them understand and trust institutional systems and processes, organizations can ensure their efforts are adopted by the broader public.
Culture and Institutional Buy-in
As with all projects, data-driven initiatives need to have advocates within the organization sponsoring them. By gaining broad internal approval, especially from leadership, open data advocates can ensure they have the backing needed to unlock institutional resources and overcome roadblocks. They can obtain this support through a clear articulation of the potential business case and purpose of the effort, recognition of institutional priorities (and pitfalls), an understanding of how the work would fit into the broader data economy and ecosystem, and a clear plan for mobilizing existing tools and resources to operationalize the strategy.
Trust and legitimacy are key in the planning processes pertaining to data re-use. In order to ensure that data re-use initiatives create a public good, they need to obtain a social license. This term means exercising the necessary due diligence and engaging with all relevant stakeholders to ensure that data re-use is aligned with public and stakeholder concerns and expectations. To make sure that data and technology are used responsibly, it is important that both the benefits and the risks associated with them are evaluated by local stakeholders.
Responsive Feedback Loops
Open data initiatives tend to be less successful when they do not create meaningful mechanisms for users and beneficiaries to provide input to demand-side practitioners. By allowing experts, leaders, and the public to comment and addressing that feedback in a clear and direct manner, organizations can find new opportunities to create value and address harms they might have previously missed. They can also promote public trust by demonstrating to individuals that their concerns have been heard.
Resource Availability and Sustainability
Technological innovation and infrastructure development are often cost-intensive exercises with extended time frames. Organizations need internal and external sources of funding before they can systematize impactful and responsible data reuse.
Governance and Standards
Data licensing regimes refer to the conditions under which an institution (such as a government, business, or nonprofit) is able to use and re-use data. These regimes provide a way to secure and promote the re-use of data, either among a set of actors or among the public. They can require interoperability and articulate permissions and conditions around use, redistribution, modification, separation, compilation, non-discrimination, propagation, and/or application. Selecting a fit-for-purpose data license requires assessing different licensing regimes' benefits and challenges, and could potentially involve the development of a new, customized data license to meet organizational needs.
Open data projects are better positioned for success when practitioners develop and monitor metrics of impact to inform management and iteration.
In some cases, data projects can be advanced despite some level of risk. In such cases, practitioners must ensure that projects that deal in potentially personally identifiable information (including anonymized data) have outlined and implemented a clear, upfront strategy for addressing risks created by open data use.
Open by Default
A central principle of the Open Data Charter is the notion of “open by default,” the presumption that data should be published in the absence of a legitimate justification to the contrary. Given the level of government resource allocation and time investment required to implement strong open data initiatives, high-level political buy-in and codified open data policies are needed to provide the incentives and flexibility to government officials to meaningfully advance open data goals.
Policies and Procedures
Clear policies pushing forward access to information and data (such as codifying data stewardship positions, supporting mechanisms that encourage personnel to bolster their data skills, and “Data for Good” programs) can act as important drivers for open data initiatives. Without explicit policy backing, the sustainability of open data efforts can be called into question, and access to necessary data can dry up at any time. The existence of Freedom of Information policies can also provide means for accessing relevant information, though often at a much slower pace than open data.
Contracts and Data Sharing Agreements
Contracts and data sharing agreements are written agreements that establish the terms for how data is shared between parties. These documents are important, not merely for outlining roles and responsibilities but also for ensuring there is accountability and trust between parties and avoiding misunderstandings. These documents can play a key role in governance between parties and are often a prerequisite in any data reuse.
A widely prevalent challenge to positive impact arises from poor data quality. Data quality is an issue in developed countries, but often presents even greater barriers to success in developing countries. Quality issues can manifest in a number of ways, like inaccurate information, a lack of completeness in official datasets, out-of-date data, or otherwise corrupted datasets. Data quality can also include issues of data standardization. When combining multiple datasets, organizations also need to ensure that these resources can be used together to generate insights.
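The quality issues listed above (incompleteness, out-of-date records, and duplication) can be checked mechanically before publication. The following is a minimal sketch under stated assumptions: the function quality_report, the field names, the one-year freshness threshold, and the sample records are all hypothetical and would need to be adapted to a real dataset.

```python
from datetime import date


def quality_report(rows, required_fields, today, max_age_days=365):
    """Summarize completeness, freshness, and duplication for a list of dict records."""
    # Completeness: a record is incomplete if any required field is missing or blank.
    incomplete = sum(
        1 for r in rows
        if any(r.get(f) in (None, "") for f in required_fields)
    )
    # Freshness: a record is stale if its 'updated' date is older than max_age_days.
    stale = sum(
        1 for r in rows
        if r.get("updated") is not None
        and (today - r["updated"]).days > max_age_days
    )
    # Duplication: count records whose required-field values repeat an earlier record.
    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(r.get(f) for f in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "total": len(rows),
        "incomplete": incomplete,
        "stale": stale,
        "duplicates": duplicates,
    }


# Hypothetical records, for illustration only.
sample_rows = [
    {"id": "A1", "address": "1 Main St", "updated": date(2022, 1, 10)},
    {"id": "A2", "address": "", "updated": date(2019, 5, 1)},
    {"id": "A1", "address": "1 Main St", "updated": date(2022, 1, 10)},
]
report = quality_report(sample_rows, ["id", "address"], today=date(2022, 3, 1))
print(report)
```

A report like this does not fix quality problems, but it gives publishers and re-users a shared, repeatable picture of where a dataset falls short.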
As stated by data.gov, a data standard is a “technical specification that describes how data should be stored or exchanged for the consistent collection and interoperability of that data across different systems, sources, and users.” These technical specifications are what make it possible to share, exchange, and combine data. They provide a framework for data collaboration. Standards can include topics such as machine readability, the ability of data to be read by a mechanical device, and involve matters such as data portability, the right and ability of data subjects to move their information from one controller to another.
Although open data is meant to provide value to data users without any direct engagement with data holders necessary, partnering with entities on the supply side (including government and private sector actors) can help to fill data gaps and enable higher impact data use.
Matching the supply of data with those who demand it can be costly in terms of time, resources, and staff. Organizations need to identify relevant partners, develop the data infrastructure and capacity necessary to handle new information flows, and negotiate legal agreements. Any one of these actions can be difficult for an organization, especially a small one, and can dissuade data collaborative efforts. Dedicated data stewards devoted to facilitating collaboration can be useful in addressing some or all of these issues. Data stewards are important actors in the Third Wave because they make the data value chain more fluid, working to facilitate data collaboration and lowering transaction costs between those supplying the data and those using it.
In many cases, demand-side actors’ expertise lies in technology or data science rather than the problem areas they seek to address through the use of open data. Tapping into the knowledge of stakeholders with relevant sector-specific expertise can improve efforts to optimize and target open data efforts based on a true understanding of needs, opportunities, and barriers.
Open data practitioners can extend their capacity by collaborating with like-minded organizations, institutions, or individuals who are not directly related to the issue at hand but, nonetheless, invested in a project’s success. These groups can include open data advocates and academics who see a data-driven initiative as a way to foster data openness and develop an evidence base. It can include journalists and members of the public who might raise awareness of the activities that an institution is undertaking. It can also include other governments, businesses, and nonprofits who see an effort as a model for their own work and a way to identify new, promising practices. These stakeholders can fill various gaps in human or technical capacity if engaged. They can also promote trust and legitimacy with other third-party actors.
Risks and Ethical Pitfalls
Privacy concerns probably rank among the most commonly cited worries over opening up data. Especially in conflict-stricken regions, individuals’ anonymity can be of life-or-death importance. Potential privacy harms can arise even from the release of ostensibly anonymized personally identifiable information (PII). Although the vast majority of open data efforts seek to anonymize or otherwise limit the release of PII, it is important to recognize that a lack of sophistication in anonymization or aggregation efforts can result in the inadvertent release of sensitive information. In addition, in some instances information that itself poses no privacy concerns can be combined with other openly available datasets; the aggregated and linked information can lead to unexpected disclosure of personal data, for example by bringing together open data on political activities with separately accessible information on a person’s location or place of work.
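One common way to screen for the linkage risk described above is a k-anonymity check: count how many records share each combination of indirectly identifying attributes and flag combinations that appear fewer than k times. The sketch below is an illustration, not a complete privacy review; the function names, the choice of attributes, the threshold k=2, and the sample records are all hypothetical.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)


def risky_combinations(rows, quasi_identifiers, k):
    """Return the combinations shared by fewer than k records.

    Records in such small groups are easier to single out when the dataset
    is linked with other openly available information.
    """
    counts = group_sizes(rows, quasi_identifiers)
    return sorted(combo for combo, n in counts.items() if n < k)


# Hypothetical records with direct identifiers already removed.
records = [
    {"zip": "10001", "age_band": "30-39", "sector": "education"},
    {"zip": "10001", "age_band": "30-39", "sector": "education"},
    {"zip": "10002", "age_band": "60-69", "sector": "health"},
]
fields = ["zip", "age_band", "sector"]

flagged = risky_combinations(records, fields, k=2)
print(flagged)  # combinations that may need suppression or coarser grouping
```

Passing such a check does not guarantee anonymity, since it only covers the attributes a publisher thought to include, but failing it is a clear signal that further aggregation or suppression is needed before release.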
Because much government data contains sensitive information regarding individuals, industries, and national security, opening that data often leads to quite reasonable questions about data security. Cybersecurity remains a challenge across the world, and perhaps especially so in developing countries, which may lack the technical expertise to adequately protect information from sophisticated hackers and other intrusions. At the same time, though security concerns are very real and important, they must be balanced against the opportunity cost or risk of not sharing data; often, government decision-makers can lean on tenuous security concerns to justify keeping data closed and restricting access, potentially limiting the solution space.
Whether related to humanitarian efforts, crisis relief, or the livelihoods of vulnerable populations, data-driven efforts in developing economies can be literally life-or-death affairs. Given the many challenges and obstacles involved in open data projects, it is important to recognize the risks inherent in basing such life-and-death decisions on information that could be incomplete, out-of-date or otherwise faulty. The broader point is this: insights generated from data are only as good — and their impacts only as positive — as the quality of the underlying data.
Entrenching Power Asymmetries and Inequities
Although data can be empowering, it can also consolidate or reinforce existing privileges and authority inherent in societies. This problem is closely linked (though not restricted) to digital divide challenges; when only the elite of society have access to data and/or data science capabilities, releasing data is likely to disproportionately benefit that elite. Open data projects need to work hard to ensure that their social and economic benefits are widely, and evenly, distributed. Projects must be built with an understanding of how they interact with a society so as not to deepen inequities.
The term “open washing” has taken hold in practitioner circles over recent years, describing the risk that governments and other institutions may seek to leverage the enthusiasm for open data to avoid more difficult and potentially transformative openness and transparency efforts.
Legal and Regulatory Requirements
City, national, and international organizations are increasingly pursuing ways to regulate the use of data and algorithms in the places they operate. These standards (which include the European Union’s General Data Protection Regulation, the California Consumer Privacy Act, and the Children’s Online Privacy Protection Act) place real and specific requirements on data holders and users to minimize the potential harm to data subjects. The costs for violating these provisions can be significant for the institution found in violation (in terms of financial or legal penalties), data subjects (in terms of directly exposing them to harm and malicious actors), and the public (in terms of undermining trust in other data initiatives). Organizations must be cognizant of the legal and regulatory frameworks in the areas in which they operate and have the resources needed to remain within them.