How to Scale Data Collaboratives?

Key Takeaways from the London Data Stewards Camp

On February 25th, The GovLab, in partnership with the Open Data Institute, hosted the third Data Stewards Camp in London, UK. The Data Stewards Camp series, part of GovLab’s Data Stewards Network initiative, seeks to convene responsible data leaders from across the private (and public) sector to collaboratively develop new insights, tools, and methodologies to inform more systematic, sustainable, and responsible uses of private-sector data in the public interest.

Data Stewards Camp Participants

The initial Data Stewards Camp was held in San Francisco in May 2018, followed by the second event held in Cape Town in December 2018. The London Data Stewards Camp brought together representatives from a number of companies and organizations, primarily based in the UK, to advance our understanding of the operational and governance models for data stewardship and data collaboration.

Over the course of the half-day workshop and public panel discussion hosted at the ODI, three key takeaways surfaced:

  1. The Need for More Sophistication About the Potential Value of Data Collaboratives
  2. The Opportunity to Experiment with Different Operational Models of Collaboration
  3. Fit-For-Purpose Governance Models Are Still Lacking

The Need for More Sophistication About the Potential Value of Data Collaboratives

To consider meaningfully whether a data collaborative is viable and how sensitive it may be, data stewards need to weigh its potential value against its risks. While there is still more to learn and operationalize, the field has made significant progress in understanding the risks of data collaboration, including those related to individual and group privacy and to unclear lines of accountability should a collaborative cause harm, to name a few. However, to date the field has largely lacked an effective mechanism for understanding the potential value and impact of collaboration around private-sector data assets, as well as a methodology for assessing the risks and potential harms of not acting on particular opportunities to leverage such data in the public interest.

To help define the contours of such a data collaborative value assessment, participants tested a five-pronged, question-driven approach for determining the potential public (and business) value of a particular data collaborative:

  1. What are the intended and potential societal benefits?
  2. What are the intended and potential corporate benefits?
  3. What are the intended and potential informational benefits?
  4. Who are the intended and potential beneficiaries?
  5. What are the potential risks of not sharing data or collaborating?

Participants were able to capture key considerations across each of these questions when testing the approach with diverse use cases, such as data from financial institutions or telecom data brokers. The exercise also showed how difficult it can be to keep a discussion focused on the potential value of data collaboration when the (important) consideration of risks tends to divert it. Many agreed, however, that becoming clear and granular about the potential value of a data collaborative, separately from assessing its risks, can help data stewards weigh the potential risks of data sharing against the potential rewards more effectively, and subsequently make better-informed decisions about whether to pursue a collaboration.

The Opportunity to Experiment with Different Operational Models of Collaboration

After exploring means for capturing the potential value of data collaboratives, participants sought to identify pathways for determining the optimal operational model for a given opportunity. This discussion began with a deep dive into the ODI’s data access archipelago.

The discussion of emergent data access models served as a jumping-off point for the assembled stewards to consider how best to align the operational model for unlocking the value of their companies’ data with institutional expectations and constraints. Other operational models discussed included trusted intermediaries capable of teasing out insights from private-sector data and transferring them to the public sector; data-driven intelligence products that businesses develop and make accessible in the public interest; and application programming interfaces (APIs).

These operational models vary along two spectra. The first is data openness, where APIs represent the model with the least friction in terms of accessibility. The second is the level of collaboration, where models like data trusts and research partnerships require particularly high levels of inter-institutional and inter-sectoral cooperation. Given this complexity and the many avenues available to data stewards interested in unlocking the value of their data toward particular ends, participants made clear that additional, targeted experimentation around the many operational models for data collaboration (including the ODI’s work on data trusts) will be essential to advance the field.

Fit-For-Purpose Governance Models Are Still Lacking

Finally, participants discussed three central models for governing data collaboratives to ensure that they are consistent with societal needs and undertaken in a responsible, accountable manner: institutionalizing data stewardship, data sharing agreements and contracts, and independent ethical councils.

During the discussion of concrete data stewardship roles and data-sharing contractual approaches in particular, the tension between innovation, discoverability, and flexibility on the one hand, and more prescriptive and well-defined governance mechanisms on the other, continued to surface. Particularly for secondary uses of data, the ability to, for instance, pre-register acceptable projects and use cases (taking a page from the registration of research studies) could help ensure that collaboratives function in a manner consistent with the expectations of stakeholders and the public. This level of predetermination, however, could lessen the ability of those working to leverage data in the public interest to explore emergent opportunities, iterate, and course correct over time.

‘Sandboxes’, constrained spaces that allow for experimentation with new approaches in a safe environment, are often viewed as a means of testing new technologies, but some participants suggested that the same approach could be applied in a legal or policy context. Such a sandbox could allow new contractual or licensing templates to be tested “in the wild,” while ensuring that any negative impacts can be absorbed and studied without harming stakeholders or citizens.

Ethical councils are an interesting governance model leveraged in, for example, the Orange Telecom Data for Development (D4D) challenge. This prize-backed challenge sought to unlock the economic development value of anonymized, aggregated Call Detail Records for the Ivory Coast and Senegal. To help flag research projects submitted through the challenge that might pose ethical risks, the D4D External Ethics Panel (DEEP) — comprising academic, business, public sector, and civil society representatives — was established and involved throughout the review process. Participants made clear, though, that ultimately ethics cannot be fully outsourced. Rather, they need to be embedded into standard data handling practices across the data lifecycle. This suggests that even in scenarios where independent ethical councils exist, internal data stewards will still play an essential role.

Beyond these three models, many participants felt that some form of regulatory oversight and audit process should be established. Absent periodic, independent ethical checks, irresponsible data practices could persist and create harms. A Data Stewardship Standards Board, similar to professional organizations like the Bar Association for the legal profession, could help set a broader ethical framework and mandate, while acting as an oversight and coordination body for the field.

Across all of these models, participants raised the key objective of integrating data collaboration governance practices into existing corporate compliance processes. Such a joined-up approach could help lower transaction costs for data stewards and increase the sustainability of data collaboration programs over time.

Data Stewardship in Action public panel (left to right): Stefaan Verhulst, The GovLab (facilitator); Jo Rabin, Deutsche Bank; Elena Simperl, University of Southampton; Peter Jackson, Legal and General; Alice Petrova, Hazy; Peter Wells, ODI (facilitator)

Next Steps

At the close of the workshop, participants shared recommendations for how the field of data stewardship can be pushed forward in the near term. These ideas largely fell into three categories of work:

  • Continuing to develop and refine the profile and responsibilities of data stewards toward establishing a common understanding of needs and expectations across industries;
  • Building both a legal repository and an ethical repository capturing previous work and good practices around effective and legitimate governance of data collaboratives;
  • Activating the Data Stewards Network and the insights it has generated to date to undertake targeted experiments that can help build an evidence base of what works in practice (and what does not).

Armed with these recommendations and informed by the thoughtful and complex discussions throughout the day, The GovLab is developing a set of tools and frameworks to inform more effective and legitimate approaches for assessing the value of data held in the private sector, experimenting with different operational models of collaboration, and defining fit-for-purpose governance strategies. Our goal is to empower data stewards and others interested in the public value of private-sector data to further advance the field in a manner that ensures the continued datafication of society improves governance, and subsequently people’s lives, while mitigating the risks of poorly governed data collection, processing, sharing, analysis, and use.