The potential of Data Collaboratives for COVID-19

Published in Data & Policy Blog
Apr 1, 2020 (9 min read)


By Stefaan G. Verhulst

We live in almost unimaginable times. The spread of COVID-19 is a human tragedy and global crisis that will impact our communities for many years to come. The social and economic costs are huge and mounting, and they are already contributing to a global slowdown. Every day, the emerging pandemic reveals new vulnerabilities in various aspects of our economic, political and social lives. These include our vastly overstretched public health services, our dysfunctional political climate, and our fragile global supply chains and financial markets.

The unfolding crisis is also making shortcomings clear in another area: the way we re-use data responsibly. Although this aspect of the crisis has been less remarked upon than other, more obvious failures, those who work with data — and who have seen its potential to impact the public good — understand that we have failed to create the necessary governance and institutional structures that would allow us to harness data responsibly to halt or at least limit this pandemic. A recent article in Stat, an online journal dedicated to health news, characterized the COVID-19 outbreak as “a once-in-a-century evidence fiasco.” The article continues:

“At a time when everyone needs better information, […] we lack reliable evidence on how many people have been infected with SARS-CoV-2 or who continue to become infected. Better information is needed to guide decisions and actions of monumental significance and to monitor their impact.”

It doesn’t have to be this way, and these data challenges are not an excuse for inaction. As we explain in what follows, there is ample evidence that the re-use of data can help mitigate health pandemics. A robust (if somewhat unsystematized) body of knowledge could direct policymakers and others in their efforts. In the second part of this article, we outline seven steps that key stakeholders can and should take to better re-use data in the fight against COVID-19. In particular, we argue that more responsible data stewardship and increased use of data collaboratives are critical.

I. Theory of the Case: How Data — and Data Re-Use — Can Fight Pandemics

A range of evidence from around the world suggests that better use of data can help to address many complex public problems, including climate change, crime, economic inequality, and public health crises. At The GovLab, the action-research center at New York University which I co-founded, we have repeatedly seen that the sharing and re-use of aggregated and anonymized data (e.g., from telecommunications, social media, satellites, and Internet of Things sensors) can improve traditional models for tracking disease propagation and for answering other questions.

Such efforts often occur when private companies repurpose their data, collected during the normal course of business, toward public ends. For instance, telecommunications data from the company Orange has been re-used to support the response to Ebola in Africa, and data from Telefónica in Mexico has been deployed to combat swine flu. Social media data from Facebook has likewise been used to understand public perceptions around Zika in Brazil, while satellite data has helped track seasonal measles outbreaks in Niger. In addition, geospatial data has supported malaria surveillance and eradication efforts in Sub-Saharan Africa and, more generally, many infectious diseases have been monitored using mobile phones and other mobility-related data.

The potential and realized contributions of all this data teach us several vital lessons. Chief among these is the importance of re-using data, particularly when entities working toward the public good leverage data held by the private sector. In all the examples mentioned above, the potential of data was realized when companies provided public, non-profit, research, or civic organizations with functional access to privately held datasets. We call such partnerships data collaboratives. They are an emerging form of public–private cooperation deployed around the world, often with significant results.

The second important lesson is that the supply of and demand for data are often widely dispersed across geographies and sectors. Combined with a shortage of data expertise, particularly within smaller non-profit and civic organizations, this fragmentation leads to tremendous inefficiencies that stunt data’s potential and limit its ability to mitigate complex public problems. The problem of fragmentation is exacerbated by conflicting legal jurisdictions and by regulatory regimes that are often poorly communicated and misunderstood.

These inefficiencies in data collaboration lead to costly delays in response times, lost opportunities to save lives and livelihoods, and a persistent lack of preparation for future threats. Much potentially impactful data is never made accessible to those who could productively use it. Much of the data that is released is never used in a systematic and sustainable way due to limited discoverability, poor quality, and a lack of data expertise among recipient organizations. The European Commission’s Expert Group on Business to Government Data Sharing, of which I am a member, recently stated:

“[M]uch of the potential for data and its insights to be used for the benefit of society remains untapped […] Due to organisational, technical and legal obstacles (as well as an overall lack of a data-sharing culture) business-to-government (B2G) data-sharing partnerships are still largely isolated, short-term collaborations.” (Executive Summary, p. 7.)

II. Seven Steps Toward Better Re-Use of Data to Fight COVID-19

How do we overcome these shortcomings? Based on the European Expert Group’s final report, our own research, and that of others (see a full list here), we have identified seven steps that should be taken immediately by policymakers and other stakeholders. These steps would enhance the data collaboration ecosystem, helping to ease inefficiencies and bottlenecks. In the process, they would make the re-use of data a much more potent weapon in the fight against COVID-19, as well as future pandemics.

1. DEVELOP A GOVERNANCE FRAMEWORK: The lack of clear, consistent national and transnational governance frameworks is one of the main factors currently limiting the potential of data re-use. Hence, it is crucial that public and private actors, as well as civil society, work together to develop or clarify a set of laws, regulations and norms to govern the trusted re-use of privately held data for the public interest. This framework should include governance principles, open data policies, trusted data re-use agreements, and transparency requirements and safeguards. In addition, it should include accountability mechanisms, including ethical councils, that clearly define duties of care for data accessed in emergency contexts and do not obligate new and additional data collection by the private sector.

2. BUILD CAPACITY: Collaboration is also frequently limited by a lack of capacity — technical, financial and otherwise — to re-purpose and re-use data. It’s therefore essential that policymakers and other stakeholders work to increase the readiness and operational capacity of the public and private sectors to re-use and act on data, for example by investing in training, education, awareness building and reskilling for lawmakers and civil servants. Building capacity also includes increasing the ability to ask and formulate questions that matter and that can be answered in a meaningful way by data. Currently, much data is released by organizations without any clear purpose or clearly understood end use. Identifying and articulating a list of priority questions, as well as metrics to assess impact, could facilitate more targeted and rapid responses by data holders when societies are confronted with crises.

3. ESTABLISH DATA STEWARDS: Private, public, and civil society entities should create and promote the position of Chief Data Steward within organisations. Data stewards can take the form of individuals or groups within organisations. They are tasked with identifying and nurturing potential collaborations, as well as with identifying data that could be shared in the public interest. Data stewards can also lead efforts to measure impact and ensure that any insights that result from sharing are actually acted upon. Importantly, data stewards should be mandated to protect potentially sensitive information and to ensure that the re-use of data does not violate privacy.

4. BUILD A NETWORK: Parties across sectors should work together to establish a network of data stewards. This community of practice could coordinate and streamline efforts and provide greater transparency on current work on data stewardship and collaboration. Its mission, objectives, participants, and criteria for participation should all be made open to the public, and its activities should be undertaken in an inclusive manner.

5. ENGAGE CITIZENS: Citizens should be encouraged to co-create data collaboratives for well-defined and documented public interest purposes of their own choice. To enable this, governments and corporations should promote user-friendly crowdsourcing and data donation mechanisms. These mechanisms should clearly articulate to citizens how their data will be responsibly used, re-used, and protected. In general, efforts should be made to make more transparent to citizens what the benefits of data collaboration could be for them personally, and for society at large.

6. UNLOCK FUNDS: Funding from a variety of sources, including crowdfunding, should be unlocked and sustained without the use of heavy-handed procurement. Funders should support data systems and infrastructure with an eye toward future crises as well as current challenges. A system of pre-qualified recipients should be established to facilitate rapid access to funds and other resources during the early stages of a crisis. Other incentives for data collaboration should also be established, including public recognition of private companies and civil organisations that engage in data collaboratives, so that they may be eligible for funds. In addition, existing schemes that incentivize or encourage data collaboration should take societal priorities into account when deciding how to allocate funds.

7. PROMOTE TECHNOLOGICAL INNOVATION: For all the progress in data methods and new hardware and software, technology continues to be something of a bottleneck. With the support of governments and foundations, data scientists and researchers should co-design and co-develop technologies that can help implement data collaboration at scale, in a responsible and sustainable way. This collaborative research should be as transparent and interdisciplinary as possible, and could focus initially on core needs such as privacy-preserving technologies, security technologies, and access-control technologies.

One overarching component of our framework applies to all of the above action items and is essential to emphasize. Sharing is not, and cannot become, a pretext for privacy violations or erosions. Re-use must only take place on aggregated and anonymized data within the confines of a strict set of rules that ensure privacy. Societal benefits cannot come at the cost of individual rights (or else they are no benefits at all). As explained above, data stewards will play an important role in ensuring these protections. Working with other internal and external stakeholders, they can ensure that privacy is protected and that the potential of personal data to solve public problems is never used as justification for limiting individual liberties.


Needless to say, these seven steps are not silver bullets. They will not, on their own, eliminate existing roadblocks to data collaboration. They will also not single-handedly allow us to overcome the global pandemic the world now confronts, but this is a long, hard-fought battle, and every inch counts.

Data can be a potent weapon, especially when it is re-used responsibly across actors and sectors. The recommendations we outline here represent an important first step toward unleashing that potential. At The GovLab, we plan to continue our research, in collaboration with our many partners (including the almost 300 individuals from around the world who have signed our Call for Action), toward identifying legitimate and effective ways to implement the above steps.

The above article is based upon a Call For Action: Toward Building The Data Infrastructure And Ecosystem We Need To Tackle Pandemics And Other Dynamic Societal And Environmental Threats. It has attracted almost 300 signatories.

The author would like to thank all signatories and in particular Ciro Cattuto and Richard Benjamins for their leadership and support in the initiation of the call; and Andrew Young, Andrew J. Zahuranec and Michelle Winowatan, all at The GovLab, for their research support.

Stefaan G. Verhulst is Co-Founder and Chief Research and Development Officer at The GovLab, and an Editor-in-Chief of the open access journal Data & Policy published by Cambridge University Press. For more information about how to contribute to Data & Policy, read here.




Blog for Data & Policy, an open access journal at CUP (Eds: Zeynep Engin (Turing), Jon Crowcroft (Cambridge) and Stefaan Verhulst (GovLab))