We talked to over 100 researchers about open science. This is what we learned.

Johannes Beil
amie.ai
Jan 19, 2019

We at amie.ai spend a lot of time talking to researchers about how they handle their data. Our vision is to create the infrastructure for massive research collaborations, so the question about open sourcing their data comes up naturally.

Open science has come a long way. It is now widely agreed that data should be accessible after publication: if a research project has been successful, the evidence that supports its findings should be open. So far so good, but we are still a long way from collaborating openly on ongoing projects. The real strength of open research would come from outside perspectives, new ideas, big data analytics, and the chance to use all of the unpublished data from unsuccessful or inconclusive projects. Most researchers we talked to were not opposed to that in principle, as long as they can be sure to be credited as the ones who found a result first, though they would probably also need some incentive to do so. This aligns well with more extensive surveys. Now, if most of them agree that sharing every single data point would be a good thing, and verifying who did it first is just a question of a timestamp on a server, why is it not done more often? And why is the chance to get on another publication or to contribute to society not already a strong enough incentive?

To answer this, we looked at the place where sharing is already strongly incentivized and sometimes enforced: the team. It turns out, even there, sharing can be prohibitively difficult, due to the way modern research handles data:

Data silos
Researchers use a combination of software tools and instruments to take and analyze data. To make input efficient, they store the data on different media: thoughts in a paper notebook, raw data from an instrument on disk, analysis code in a Jupyter notebook on GitHub, tracked parameters in a spreadsheet. This quickly creates data silos, because none of the storage media are set up to connect to each other easily, and the connections exist only in the researcher’s head. If she wants to update a colleague, she has to describe the workflow and recreate the links in an email or a presentation, which in turn becomes another data silo.
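
To make the idea of "links that live in the researcher's head" concrete, here is a minimal sketch of what writing those links down could look like. It is an illustration only, not a description of any existing tool: the artifact identifiers, the relation names, and the helper functions are all made up for this example.

```python
from collections import defaultdict

# Hypothetical artifacts, one per storage medium mentioned above.
# The identifiers and descriptions are placeholders for illustration.
artifacts = {
    "note-12": "paper notebook, page 12",
    "raw-run-7": "instrument output on disk",
    "analysis-run-7": "Jupyter notebook on GitHub",
    "params-run-7": "row in the parameter spreadsheet",
}

# The links a researcher normally keeps in her head, written down as
# (source, relation, target) triples. Relation names are assumptions.
links = [
    ("note-12", "motivates", "raw-run-7"),
    ("params-run-7", "configures", "raw-run-7"),
    ("raw-run-7", "analyzed_by", "analysis-run-7"),
]


def build_index(links):
    """Index every link by both endpoints so it can be followed either way."""
    index = defaultdict(list)
    for source, relation, target in links:
        index[source].append((relation, target))
        index[target].append((f"inverse:{relation}", source))
    return index


def connected(start, index):
    """Return every artifact reachable from `start`, in either direction."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(neighbor for _, neighbor in index[node])
    return seen - {start}


index = build_index(links)
related = connected("analysis-run-7", index)
```

Starting from the analysis notebook, `related` contains the raw data, the parameter row, and the notebook page that belong to it, which is the context a colleague would otherwise need an email or a presentation to reconstruct.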

Consistency
Since research deals with the unknown, there are few conventions for naming the steps and parameters of an experiment, or for deciding what to store. New things to name come up all the time, so the naming happens on the go. We have seen every imaginable way to name temperature, and laboratory notebook entries range from extremely detailed narratives to lists of acronyms and numbers that are barely human-readable.
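
One common way to deal with inconsistent names is an alias table that maps every spelling seen so far to a single canonical name. The sketch below is a hypothetical illustration of that idea, not a convention taken from any particular lab or tool: the alias list and function names are invented for this example.

```python
# Hypothetical alias table: each canonical name lists the spellings that
# have been seen for it. The entries are made up for illustration.
ALIASES = {
    "temperature": {"temp", "T", "Temperature", "T_sample"},
}


def _key(name):
    """Reduce a name to a comparison key: lowercase, unified separators."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


# Reverse lookup from comparison key to canonical name.
LOOKUP = {
    _key(alias): canonical
    for canonical, aliases in ALIASES.items()
    for alias in aliases
}


def canonical_name(name):
    """Return the canonical name, or None if the name is not known yet."""
    return LOOKUP.get(_key(name))


def normalize(record):
    """Split a record into recognized parameters and ones needing review."""
    known, unknown = {}, {}
    for name, value in record.items():
        canonical = canonical_name(name)
        if canonical is None:
            unknown[name] = value  # left for a human to name properly
        else:
            known[canonical] = value
    return known, unknown


known, unknown = normalize({"Temp": 4.2, "B-field": 0.5})
```

Names that are not in the table are returned separately instead of being guessed, because new things to name come up all the time and a wrong automatic match would be worse than an open question.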

Digitalization
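Most researchers still use a paper notebook as the central source of truth for their experiments. The paper notebook has stood the test of time because it is incredibly flexible, easy to use, relatively secure, and hard to falsify. What is written in it, however, stays on paper: it cannot be searched, linked to other data, or shared without transcribing it by hand.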

We need tools!
This list is by no means exhaustive, but it shows that there are many practical hurdles to overcome before researchers can collaborate seamlessly. There is more standing in the way of open research than the fear of getting scooped on a publication or a patent. Researchers currently need powerful incentives to share their data, because sharing is not just about putting all of their data on a publicly accessible server every night. It takes significant time and effort to bring the data into a form that makes it useful to other researchers, comparable to the time it takes to write a publication. With the current pressure to publish continually, it is no wonder that journals for negative results are not taking off. If researchers already spend half of their time publishing, they will choose the success stories over mountains of inconclusive data.

For open science to become a reality, we need tools that allow researchers to share every single data point efficiently and instantly, in a way that expresses how and why it was created.

Check out amie.ai to see how we are building part of that toolset by putting research data on a graph.

Johannes Beil
amie.ai
Former quantum physicist, now helping research with software at amie.ai.