The “Onion Model”: A Layered Approach to Documenting How the Third Wave of Open Data Can Provide Societal Value

Andrew J. Zahuranec
Aug 26

This piece, written by Andrew J. Zahuranec, Andrew Young, and Stefaan Verhulst, was originally published on the Open Data Policy Lab blog, which contains information that supports decision-makers at the local, state, and national levels as they accelerate the responsible re-use and opening of data. To learn more please visit https://opendatapolicylab.org/.

Photo by Wilhelm Gunkel on Unsplash

There’s a lot that goes into data-driven decision-making. Behind the datasets, platforms, and analysts is a complex series of processes that inform what kinds of insight data can produce and what kinds of ends it can achieve. These individual processes can be hard to understand when viewed together but, by separating the stages out, we can not only track how data leads to decisions but also promote better and more impactful data management.

Earlier this year, the Open Data Policy Lab published the Third Wave of Open Data Toolkit to explore the elements of data re-use. At the center of this toolkit was an abstraction that we call the Open Data Framework. Divided into individual, onion-like layers, the framework shows all the processes that go into capitalizing on data in the third wave: from the creation of a dataset, through data collaboration and the generation of insights, to the use of those insights to produce value.

This post reiterates what’s included in each layer of this data “onion model” and demonstrates how organizations can create societal value by making their data available for re-use by other parties.

The Data Lifecycle

Data, which can be generated from things like smartphones, scientific studies, and financial transactions, is an instrumental part of our modern world. The growing capacity to generate it has created new opportunities to study complex problems while, at the same time, creating new risks related to privacy and surveillance.

Though it can be used responsibly or poorly, data is not a simple asset. It is the result of a process known as the data lifecycle that includes:

  • Collection: Gathering data from surveys, censuses, voting or health records, business operations, web-based collections, and other relevant, accessible sources.
  • Processing: Removing irrelevant or inaccurate information, reformatting contents to be interpretable by analytic software, and otherwise validating the data collection.
  • Sharing: Providing access to the data to relevant collaborators with the intent of deriving insights from it.
  • Analyzing: Assessing the data collection with the goal of extracting insights about the issue under study.
  • Using: Acting on the insights derived. These actions can affect data collected for future operations.
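
For readers who think in code, the five stages above can be sketched as a chain of small functions. This is only an illustrative sketch, not part of the toolkit: the survey records, field names, function names, and the 20-minute threshold are all hypothetical, chosen to make each stage visible.

```python
from statistics import mean

# Hypothetical survey responses (invented for illustration).
# None marks a missing answer.
RAW_RESPONSES = [
    {"district": "north", "wait_minutes": 12, "respondent": "a"},
    {"district": "north", "wait_minutes": None, "respondent": "b"},
    {"district": "south", "wait_minutes": 30, "respondent": "c"},
    {"district": "south", "wait_minutes": 26, "respondent": "d"},
]


def collect():
    """Collection: gather records from a source (here, a hard-coded list)."""
    return list(RAW_RESPONSES)


def process(records):
    """Processing: drop records whose wait time is missing."""
    return [r for r in records if r["wait_minutes"] is not None]


def share(records):
    """Sharing: give collaborators a copy without the respondent field."""
    return [
        {"district": r["district"], "wait_minutes": r["wait_minutes"]}
        for r in records
    ]


def analyze(records):
    """Analyzing: compute the average wait per district."""
    by_district = {}
    for r in records:
        by_district.setdefault(r["district"], []).append(r["wait_minutes"])
    return {district: mean(waits) for district, waits in by_district.items()}


def use(insights, threshold=20):
    """Using: list districts whose average wait exceeds the threshold."""
    return sorted(d for d, avg in insights.items() if avg > threshold)


def run_lifecycle():
    """Run the five stages in order and return the districts to act on."""
    return use(analyze(share(process(collect()))))
```

In practice each stage involves people, agreements, and safeguards rather than a single function call, and actions taken in the final stage can change what is collected the next time around.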

Increasing Access to Data through Data Collaboration

Once data has been generated, it needs to be made accessible to those who can use it. Though we live in an era of data abundance, all too often the data generated resides in silos controlled and monetized by companies. New models for collaborating and accessing public and private-sector data, such as open data platforms or data collaboratives, can break these silos.

As we’ve discussed elsewhere, data collaboratives are a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors exchange their data and data expertise to create public value. They include:

  1. Public Interfaces: Organizations provide open access to certain data assets, enabling independent uses of the data by external parties.
  2. Trusted Intermediary: Third-party actors support collaboration between data providers and data users from the public sector, civil society, or academia.
  3. Data Pooling: Data holders agree to create a unified presentation of datasets as a collection accessible by multiple parties.
  4. Research and Analysis Partnership: Organizations engage directly with public-sector partners and share certain proprietary data assets to generate new knowledge.
  5. Prizes and Challenges: Organizations make data available to participants who compete to develop apps; answer problem statements; test hypotheses and premises; or pioneer innovative uses of data for the public interest and to provide business value.
  6. Intelligence Generation: Organizations internally develop data-driven analyses, tools, and other resources, and release those insights to the broader public.

Insights

From these data collaboratives, organizations can analyze the data. This analysis can be used to look forward or backward, providing information on a problem related to:

  • Situational Awareness: Answering what happened;
  • Cause and Effect Insight: Answering why it happened;
  • Prediction: Answering what will happen; and
  • Impact Assessment: Answering what happened following an intervention.

Enabling Conditions

Data filtered through individual projects can produce tremendous insights. However, broader changes to the data ecosystem are needed to make data more broadly accessible and enable future work.

These changes can be supported by organizations taking deliberate steps to be more open. As we argue in The Emergence of a Third Wave of Open Data, a more open ecosystem can be enabled by organizations that:

  • Publish with Purpose by matching the supply of data with the demand for it, providing assets that match public interests;
  • Foster Partnerships by forging relationships with non-professionals (e.g. small businesses and civic groups) who understand how data can inform meaningful real-world action;
  • Prioritize Subnational Efforts by providing resources to cities, municipalities, states, and provinces to create new subnational data sources; and
  • Center Data Responsibility by promoting fairness, accountability, and transparency across all stages of the data lifecycle.

Value

The transformation of the ecosystem, when done effectively, responsibly, and in accordance with local expectations, can produce large-scale, real-world value. These types of value can be categorized as:

  • Improving Governance: Insights from data can improve how organizations operate by making their processes more transparent to others, improving resource allocations, and enhancing their ability to deliver services.
  • Empowering People: Insights can empower people by communicating information they need to meaningfully act and make decisions about the challenges they face in their lives.
  • Creating Opportunity: Data-driven insights can inspire organizations to innovate in how they operate. For businesses, this innovation can be about identifying new business models while governments might use information to inform policy directed at economic well-being.
  • Solving Public Problems: Insights can optimize processes and services and better identify the needs of those who rely on those services. They can also enable data-driven assessments of the environment and more targeted interventions.

Riding the Third Wave of Open Data: Priority Actions

Finally, these values can be best achieved by embracing an approach to open data that facilitates them. This approach, which we call the Third Wave of Open Data, takes a much more purpose-directed approach than prior waves; it seeks not simply to open data, but to do so in a way that focuses on impactful reuse, especially through inter-sectoral collaborations and partnerships. The Third Wave pays at least as much attention to the demand as to the supply side of the data equation, and it is concerned not simply with data itself but with the broader technical, social, political and economic context within which data is produced and consumed.

This Third Wave can be enabled by organizations adopting eight key actions:

  1. Creating and Empowering (Chief) Data Stewards: Developing and nurturing responsible data leaders to support impactful data re-use;
  2. Fostering and Distributing Institutional Capacity: Taking steps to avoid the consolidation and siloing of data skills and resources, and instead catalyzing such capacity to filter into daily institutional operations;
  3. Articulating Value and Building an Impact Evidence Base: Demonstrating the concrete, tangible value of increased access to and re-use of data;
  4. Supporting New Data Intermediaries: Engaging actors who can lower transaction costs in data collaborative relationships;
  5. Establishing Governance Frameworks and Seeking Regulatory Clarity: Creating safeguards to mitigate risks of harmful outcomes;
  6. Creating the Technical Infrastructure for Re-use: Investing in innovative and sophisticated technologies to improve data use on data supply and demand sides;
  7. Fostering Public Data Competence: Engaging citizens to promote wider use of data informed by local contexts and priorities; and
  8. Tracking, Monitoring, and Clarifying Decision and Data Provenance: Capturing data-handling and decision-making processes to ensure coordination.

***

This piece describes a model for thinking about data and data re-use. The “data onion,” also known as the Third Wave of Open Data Framework, untangles the different processes involved in (re)using data to generate value and promoting a more open data ecosystem.

This information can promote better and more impactful data management. We encourage data stewards and anyone else interested in learning more about how they can use these processes to read our Third Wave of Open Data Toolkit. Additional insights can be found on our website.
