Making Reproducibility Reproducible

Tyler Whitehouse
Gigantum
Apr 19, 2019 · 6 min read
Reproducibility doesn’t have to be magic anymore. Image provided by Abstruse Goose under a Creative Commons license.

TL;DR - We believe the following

  • Approaches to the transmission of scientific knowledge are currently broken, mainly because modern research depends so heavily on software.
  • Calling re-execution of static results “reproducibility” isn’t enough. Reproducibility should be functionally equivalent to collaboration.
  • The academic emphasis on best practices is ineffective and should give way to a product-based approach that minimizes effort rather than maximizing it.
  • By focusing on the needs of the end user, people can actually improve how scientific knowledge is communicated and shared.

Reproducible Work

The transmission of scientific knowledge and techniques is broken, but you already know this because otherwise the title wouldn’t have caught your eye. Reproducibility is the poster child for this breakdown because it is a serious problem, but also perhaps because it is tangible. What is actually wrong with the transmission of modern science is much harder to put your finger on.

We don’t want to go all Walter Benjamin on you, but scientific interpretation in the age of digital reproduction needs a deep re-think. Since that is more appropriate for a dissertation, let’s just focus on reproducibility.

Historically, reproducibility was the manual re-execution and validation of a single result, something fundamental to science going back to the 17th century. Typically the job was to recreate physical effects or observations, and manual re-execution fit well with the written descriptions of scientific results.

As digital data collection emerged in the latter half of the 20th century, expectations for reproducibility added an archival and computational component. Theoretically, data and methods were enough to understand & re-execute a work, and the ease of digital transmission made reproducibility feasible.

But did it?

Looks hard. Doesn’t look fun. Image under a Creative Commons license.

In the last 20 years, science continued to digitize, and complex software & intensive computation are now central to many investigations and fields. In this context, the research paper is often just a summary of activities expressed almost entirely through curated data and thousands of lines of code.

Expectations for reproducibility have changed again: they now demand that both software and data be available, and open access & open source increasingly go hand in hand. But barriers to transmission refuse to go away. Functioning software and code now depend on a techno-human tower that can fall if the slightest thing goes wrong.

Reproducible Environments

Very recently, containers (e.g. Docker & Singularity) have streamlined the creation of portable software environments. While their manual use requires significant labor and skill, they provide the most reasonable avenue to re-execution. Accordingly, expectations of reproducibility are undergoing an update and containers are a new requirement for the digital archive.
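For readers who have never built one, the sketch below shows roughly what a container recipe (a Dockerfile) looks like. The base image, package versions and file names are illustrative assumptions, not a recommendation or anyone’s actual project:

    # Start from a fixed, versioned base image so the environment is pinned.
    FROM python:3.7-slim

    # Install the exact library versions the analysis was run with.
    RUN pip install --no-cache-dir numpy==1.16.2 pandas==0.24.2 jupyter==1.0.0

    # Copy the analysis code into the image and set the working directory.
    COPY . /home/analysis
    WORKDIR /home/analysis

    # Re-run the notebook when the container starts (analysis.ipynb is hypothetical).
    CMD ["jupyter", "nbconvert", "--to", "notebook", "--execute", "analysis.ipynb"]

Even a recipe this small assumes familiarity with images, build contexts and the docker command line, which is exactly the kind of labor and skill at issue here.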

However, skill mismatches between research producers and consumers haven’t vanished. The ability to create and use containers presents yet another technical asymmetry that interrupts the communication & transfer of science.

Scientists will soon be subjected to a sequence of articles on best practices for containers, but the plethora of posts on Towards Data Science about “easy containers” should raise suspicion. Basic asymmetries in skill and resources between researchers still create significant difficulties in communicating and transferring scientific knowledge.

More recently, thanks to software developers, user-friendly frameworks such as Binder have begun to provide streamlined re-execution environments for a broad audience. These frameworks run in the cloud and offer ephemeral “re-execution as a service”, promoting far wider access than was conceivable even a short while ago.
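As a concrete illustration, a public Git repository laid out roughly like the sketch below is typically all that Binder needs to launch a live, browser-based copy of a project. The file names and versions are hypothetical, not taken from any real project:

    my-paper/                   # hypothetical repository
    ├── requirements.txt        # pinned dependencies, e.g. numpy==1.16.2
    │                           # (an environment.yml works as well)
    └── analysis.ipynb          # the notebook behind the figures

    # Anyone can then launch an ephemeral session at a URL of the form:
    #   https://mybinder.org/v2/gh/<user>/my-paper/master

The catch is in the word ephemeral: anything computed or edited in such a session disappears when it shuts down.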

But we question whether “re-execution” is an actual measure of validation or reproducibility. It seems we need something that lets a reader interrogate the scientific process behind a given work, not just re-run its outputs.

Science needs functional tools that promote the collective effort, not a bunch of busy work. Image: one of the Detroit Industry Murals by Diego Rivera.

Reproducible Work Environments

When we began Gigantum, we wanted reproducibility to be part of the daily experience, not just the result of a post hoc process. Reproducible work and reproducible environments are all good, but we wanted reproducible work environments, i.e. to have the fruits of daily work be reproducible without extra effort or thought. This seemed the best way to make reproducibility reproducible.

Pushing reproducibility down into the daily experience moves it closer to real collaboration. Basically, if I can reproduce your work in my environment and you can reproduce mine in yours, then we can work together. It also means that I can interrogate your work and your process in a way that is natural to me.

So, we focused on the connection between collaboration and reproduction, and we formulated some basic requirements that we think functional reproducibility should provide:

  1. Integrated versioning of code, data and environment;
  2. Decentralized work on various resources, from laptops to the cloud;
  3. Editing and execution in customized environments built by the user.

Basically, we think that the line between collaboration and reproducibility should be access & permissions, and that’s all. They should be functionally equivalent. The existence proof for this is that they are functionally equivalent for two super skilled users with enough time.

To go a step further, we wanted to eliminate the effort & skill needed to create reproducible work environments and to make this capacity broadly and automatically available. So we created something that does the following.

  1. Focuses on the user experience and tries not to change how people work;
  2. Runs locally on a laptop but provides cloud access for sharing and scale;
  3. Automates best practices and admin tasks to save labor and add skill.

The first requirement is important for adoption because steep learning curves and disruptions to functioning work habits won’t promote use. The second respects users’ daily lives, budgets and the occasional need for scale. The third forces the platform to promote efficiency & level out asymmetries.

The Kano model for the user experience. Something every scientist should know.

The User Experience

A platform that satisfies the above requirements doesn’t just make reproducibility feasible. It makes it user-focused.

We think it is clear that user-focused approaches are the best way to promote real change. For example, the rapid adoption of Project Jupyter and Binder shows how an approach can scale when it solves a difficult technical problem while appealing to a broad user group.

When you think about it this way, the problem of reproducibility looks more like a bad user interface than a deep scientific problem. That is an oversimplification, of course; the issues stemming from bad science and from inherent difficulty are deep and potentially worsening.

However, if you haven’t noticed, the business-as-usual approach of academia (i.e. more standards, more best practices & more training) isn’t working well. After years of urgency, it isn’t even clear that the results are mixed. Maybe they are just bad. For example, best practices have yet to see even modest adoption, and results that are not computationally reproducible are published monthly. And there is a good reason for that.

For most researchers, manually enabling reproducibility is typically more trouble than it is worth. Basically, it boils down to bandwidth and priorities. Thinking that loosely enforced community standards can make reproducibility a high enough priority to command sufficient bandwidth seems naive, especially when it may require career sacrifice.

So perhaps it is time to shift from an academic perspective to a more product-focused one. To better promote reproducibility, people should account for the frailties of the human beings expected to enable it, and then create software that addresses those frailties. Even the champions of reproducibility sometimes have difficulty living up to their own ideals.

At Gigantum, we think that the best solutions improve the lives of even the most problematic user. We think it is important to make everybody on the team better, not just the most skilled. So, if people are pursuing approaches to reproducibility that don’t take end users into account, then we suspect that they may be wasting everybody’s time. And nobody can afford that.

Written by Tyler Whitehouse (CEO), Dean Kleissas (CTO), and Dav Clark (HDS) at Gigantum.

Follow us on Twitter or just try the software!

For the last two years we have worked hard to develop the MIT-licensed Gigantum Client. While it isn’t completely finished yet, you can read about it here, try it here, or even download it here.
