Let’s build data-sharing into the scientific workflow

Published in Frankl Open Science, Mar 29, 2018

At Frankl, one of our key aims is to make data sharing part of the everyday workflow of scientists — for the benefit of everyone.

Data sharing is an important part of open science. Not only does it increase trust in the findings reported in the scientific literature, it also allows the data that scientists have worked hard to collect to be repurposed, whether to answer other questions or to be combined with other datasets.

In principle, this should be happening already. When a paper is published in a scientific journal, the authors typically make a declaration that other researchers can contact them and ask for the data. But in practice this rarely happens. Researchers are busy. They miss the email. They’ve moved on to other projects and can’t remember where the original data are or what all the different variables mean.

The problem is well documented. For example, a 2011 survey of 1,329 scientists concluded as follows:

Scientists do not make their data electronically available to others for various reasons, including insufficient time and lack of funding. Most respondents are satisfied with their current processes for the initial and short-term parts of the data or research lifecycle (collecting their research data; searching for, describing or cataloging, analyzing, and short-term storage of their data) but are not satisfied with long-term data preservation. Many organizations do not provide support to their researchers for data management both in the short- and long-term.

More recent studies suggest that, despite various initiatives, very little has changed.

Enter DataThief

The problem of data sharing is one I’ve been thinking about for some time.

Back in 2012, I wrote a blogpost about an application called DataThief, which helps you “steal” data from graphs. It allows you to re-analyse the data, re-visualize it in a way that shows different patterns, and identify individual data points that are having an outsized effect on the overall conclusions.

DataThief was a bit of fun. It’s limited to the data that researchers happen to disclose in their published figures. But it got me thinking about the importance of sharing the underlying data that supports a study’s conclusions:

The data reported in a journal article are really just a snapshot of the actual data recorded, filtered through the authors’ preconceptions about what questions are interesting to ask and how to go about doing that. There’s an imperative to present the data in a neat, sanitized package, with all the rough edges and anomalies smoothed out; to tell a coherent story that will convince reviewers and editors that it’s worthy of publication in a reputable journal. Years of work and terabytes of data may be compressed into just two or three pages.

In an ideal world, when a paper is published, researchers should also be able (and encouraged) to publish the data on which the paper is based, as well as the script showing exactly how those data were analysed.

There are, of course, many obstacles in the way and questions to be answered before this becomes standard practice. Who would host and maintain the data? Just how raw should the raw data be? What if the authors are writing multiple papers based on the same data set? Who gets credit for reanalyses of the data set? What happens if a reanalysis shows up an error in the original paper? If the research involves human participants, how do we reassure them that their anonymity will be maintained?

Undoubtedly, there are many more problems that I haven’t thought of. But, as scientists, we need to work through these issues and find ways to set our data free.

Data sharing Frankl-style

Six years later, Frankl is taking us some way towards addressing these issues. Our aim, to use today’s marketing terminology, is to make data sharing “frictionless”.

The idea is this: Data collected via Frankl apps will be automatically shared in secure data repositories. We’ll develop standardized data protocols so that all Frankl apps code variables in the same way.

We’ll ensure that the data are accompanied by an explanation of what each variable means. And this will all be built into the app, so no effort is required on the part of the researcher.

Sharing the data will then be a simple case of changing the access privileges to make the data public or share it with specific trusted individuals.
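To make that concrete, here is a minimal sketch of a dataset that carries its own variable explanations and can be opened up by changing its access level. It is purely illustrative: the `Dataset` class, the `Access` levels, and the method names are assumptions made for this example, not the Frankl data protocol.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    PRIVATE = "private"   # only the owner can read
    SHARED = "shared"     # owner plus named trusted individuals
    PUBLIC = "public"     # anyone can read


@dataclass
class Dataset:
    """Hypothetical container: data plus an explanation of every variable."""
    owner: str
    codebook: dict          # variable name -> plain-language explanation
    rows: list              # each row maps variable names to values
    access: Access = Access.PRIVATE
    trusted: set = field(default_factory=set)

    def undocumented_variables(self):
        """Return variables that appear in the data but lack an explanation."""
        used = {name for row in self.rows for name in row}
        return sorted(used - set(self.codebook))

    def share_with(self, user):
        """Grant read access to one specific trusted individual."""
        self.trusted.add(user)
        if self.access is Access.PRIVATE:
            self.access = Access.SHARED

    def make_public(self):
        """Open the dataset to everyone."""
        self.access = Access.PUBLIC

    def can_read(self, user):
        if user == self.owner or self.access is Access.PUBLIC:
            return True
        return self.access is Access.SHARED and user in self.trusted
```

In this sketch, sharing is a single call (`share_with("bob")` or `make_public()`), and `undocumented_variables()` gives an app a way to check that no variable ships without an explanation.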

There are two other key features of Frankl.

First, Frankl applications will have the capacity to put metadata — data about the data — on the Ethereum blockchain, providing a permanent record of the data’s existence and making it easier to track it down.
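As a rough illustration of what such metadata might look like, the sketch below builds a small record containing a SHA-256 fingerprint of a dataset. The field names and the `build_metadata_record` function are assumptions made for this example; the sketch does not touch Ethereum at all and only prepares the kind of compact record that could be written to a ledger.

```python
import hashlib
import json


def build_metadata_record(dataset_bytes, app_name, collected_on):
    """Describe a dataset without revealing its contents.

    All field names are hypothetical. The SHA-256 digest lets anyone who
    later obtains the data check that it matches the recorded fingerprint.
    """
    record = {
        "app": app_name,
        "collected_on": collected_on,
        "sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "size_bytes": len(dataset_bytes),
    }
    # Sorted keys give a stable serialization for the same inputs.
    return json.dumps(record, sort_keys=True)
```

Because only the fingerprint is recorded, the data themselves can stay in a repository with whatever access privileges the researcher has chosen.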

Second, the Frankl token will provide an incentive for researchers to share their data as openly as possible.

At present there’s no incentive to share data. A minority of scientists do so because they think it’s a good thing to do. And some scientific journals mandate that data are shared when they accept a study for publication. But the Frankl token means that data sharing (and other wholesome open science practices) can be rewarded. It offers carrot as well as stick.

There are many ways to do this but here’s one scenario:

A scientist uses a Frankl app to collect data. Each time they use the app, they make a micropayment in Frankl tokens. A portion goes to the app developer to reward them for making a great app and incentivize them to keep building. The remainder is held back and refunded when the researcher releases the data. They can then put that refund towards their next study, collect more data, and share it with the world.
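The arithmetic of this scenario can be sketched in a few lines. The numbers here are made up: a fee of 10 tokens per use and a 40% developer share are assumptions chosen for illustration, and the `Escrow` class is a hypothetical stand-in rather than an actual Frankl contract.

```python
from dataclasses import dataclass


@dataclass
class Escrow:
    """Tracks token flows for one researcher using one app (illustrative only)."""
    fee_per_use: int          # assumed: tokens paid each time the app is used
    developer_percent: int    # assumed: share of each fee paid to the developer
    developer_balance: int = 0
    held_back: int = 0

    def record_use(self):
        # Split each micropayment between the developer and the held-back pot.
        developer_cut = self.fee_per_use * self.developer_percent // 100
        self.developer_balance += developer_cut
        self.held_back += self.fee_per_use - developer_cut

    def release_data(self):
        # Releasing the data returns everything held back to the researcher.
        refund = self.held_back
        self.held_back = 0
        return refund


def simulate(uses, fee_per_use=10, developer_percent=40):
    """Return (developer earnings, researcher refund) after sharing the data."""
    escrow = Escrow(fee_per_use, developer_percent)
    for _ in range(uses):
        escrow.record_use()
    return escrow.developer_balance, escrow.release_data()
```

With these assumed values, five uses of the app cost the researcher 50 tokens, of which 20 go to the developer and 30 come back once the data are released.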

Frankl Open Science

Frankl is (and always will be) a work in progress. We don’t have a monopoly on good ideas, so we’d love your feedback.

If you’d like to know more, you can read our whitepaper, check out our website, and follow us on Facebook and Twitter @FranklOpenSci.

You can also chat directly with us via our Telegram channel: you can download the Telegram app here and find us at t.me/franklcommunity.

Written by Jon Brock