Data Collaboratives: Sharing Public Data in Private Hands for Social Good

reposted from Forbes

n.b. The title of this piece is hard to pick because the subject of this article is not well known. It refers to those data held by businesses and corporations, which can be responsibly shared for public good.

Sensor-rich consumer electronics such as mobile phones, wearable devices, commercial cameras and even cars are collecting zettabytes of data about the environment and about us. According to one McKinsey study, the volume of data is growing at fifty percent a year. No one needs convincing that these private storehouses of information represent a goldmine for business, but these data can do double duty as rich social assets — if they are shared wisely.

Think about a couple of recent examples: Sharing private data helps to improve policy interventions. California planners make water allocation decisions based upon expertise, data and analytical tools from public and private sources, including Intel, the Earth Research Institute at the University of California at Santa Barbara, and the World Food Center at the University of California at Davis.

In Europe, several phone companies have made anonymized datasets available, making it possible for researchers to track calling and commuting patterns and gain better insight into social problems from unemployment to mental health. In the United States, LinkedIn is providing free data about demand for IT jobs in different markets which, when combined with open data from the Department of Labor, helps communities target efforts around training.

Private data sharing is also vital to improving the reproducibility of the science upon which policies are based. Ben Goldacre, a British physician and advocate for better scientific research methods, tells the startling story of how the global public health policy of giving over 2 million children in the developing world a deworming pill is, in fact, based on a flawed clinical trial conducted in Kenya a decade ago. The researchers who ran that study had the guts (albeit a decade after the fact) to hand over the raw data to other researchers to examine. What did they find? Spectacular errors. Yet this kind of independent check is rarely done because many researchers hoard their data lest someone else free ride on their hard work.

Indeed, medicine is one of the critical battlegrounds for data sharing. People assume that when their doctor prescribes a drug or implants a device, the science underlying the intervention is sound. That assumption is not always true because we simply don’t have all the relevant data. The AllTrials campaign, which launched in the U.S. this summer with support from 620 organizations, including dozens of patient groups and leading medical societies, points out that the results of up to 50 percent of clinical trials have never been reported, leaving doctors and patients half blind about the treatments they are prescribing and taking.

The campaign wants all trials registered and all their results reported. The National Institutes of Health has actually built a superb resource to do this with clinicaltrials.gov, an open access library where anyone in the world can register and report the results of their clinical trials. Yet, despite a legal mandate to do so in the US, compliance from the pharmaceutical industry and academia has been poor — perhaps because enforcement from the FDA has been negligible. But now companies are getting on board.

Sharing data can also be an opportunity for doing well by doing good. Global investors representing more than 3.5 trillion euros in assets backed the AllTrials campaign and demanded that the pharmaceutical companies in which their hedge and pension funds are invested move toward transparency. The move was a recognition that misleading claims about drug safety can lead to expensive lawsuits, product recalls, and declining market share, all of which are bad for business. But it was also a recognition that companies such as GSK, Johnson & Johnson, and Bristol Myers Squibb have all created projects and platforms showing, over the past few years, that sharing data from past trials can be done.

In fact, what’s happening now with the pharmaceutical industry and medicine has the potential to blow up scientific research — in a good way. The University of Rochester recently created an iPhone app to collect data on dexterity, balance, and gait from people suffering from Parkinson’s disease. Within six hours, they had enrolled over 7,000 people — recruitment that would have, typically, taken months to achieve. And as a condition of having their app in the Apple store, Rochester had to agree to make its raw research data available to other researchers and to patients. Not only can third parties scrutinize results, other researchers can use the same data to do different studies without imposing an additional burden on patients.

Despite the promise of data sharing, these kinds of data collaboratives remain relatively new. There is a need to accelerate their use by giving companies strong tax incentives for sharing data for public good. There’s a need for more study to identify models for data sharing in ways that respect personal privacy and security and enable companies to do well by doing good. My colleagues at The GovLab together with UN Global Pulse and the University of Leiden, for example, published this initial analysis of terms and conditions used when exchanging data as part of a prize-backed challenge. We also need philanthropy to start putting money into “meta research”; it’s not going to be enough to just open up databases: we need to know if the data is good.

After years of growing disenchantment with closed-door institutions, the push for greater use of data in governing can be seen as both a response and a mirror to the Big Data revolution in business. Although more than 1,000,000 government datasets about everything from air quality to farmers markets are openly available online in downloadable formats, much of the data about environmental, biometric, epidemiological, and physical conditions rests in private hands. Governing better requires a new empiricism for developing solutions together. That will depend on access to these private, not just public, data.