Science’s two reproducibility problems

Frankl talks to Lenny Teytelman of protocols.io about reproducible methods, cancer biology, and whether science is in crisis


It was, inevitably, Carl Sagan who said it best:

The scientific way of thinking is at once imaginative and disciplined… It urges on us a delicate balance between no-holds-barred openness to new ideas, however heretical, and the most rigorous scrutiny of everything — new ideas and accepted wisdom.

The quote is from Sagan’s 1995 book The Demon-Haunted World, in which he warned of the encroaching dangers of superstition and conspiracy theories — the kind of magical thinking in which anything goes. Science, he argued, is the antidote, the “candle in the dark”, marrying openmindedness with skepticism so that ideas become part of the scientific canon only after “rigorous scrutiny”.

This at least is how science is supposed to work. But the past few years have left many worrying that the necessary “balance” between novelty and rigour has been lost. Across diverse fields of research, scientists are finding that, when they repeat experiments published by other scientists, they get very different results. The implication is that a large proportion of research published in scientific journals isn’t actually true.

One widely recognised contributor to this “reproducibility crisis” is publication bias — decisions on whether or not to publish a scientific study often depend on the results of the research. All else being equal, studies that show an interesting effect are more likely to be published than studies looking for the same effect but finding nothing. Inconvenient data are either wrangled into a publishable form or are quietly forgotten.

But publication bias is far from the whole story. Just as problematic is the limited information that scientific journals provide. Years of work are condensed into a couple of pages giving only a brief summary of the research methods and analysis steps involved in a study.

Anyone wishing to replicate the study finds themselves having to interpret that verbal description and translate it back into a sequence of actions. Important details are glossed over or get lost in translation. Even the original researchers can struggle to reproduce the sequence of actions that they themselves carried out.

Like GitHub but for science

A few weeks ago, I had the pleasure of talking to Lenny Teytelman, founder and CEO of protocols.io. It’s been described as “GitHub but for science”. But as Lenny was keen to point out, protocols.io and GitHub developed independently. It’s a case of “convergent evolution”, he says.

In essence, protocols.io is a platform for researchers to record the steps taken in acquiring scientific data. It can provide a far greater level of detail than scientific journals allow (although, of course, journal articles can link back to the protocols.io page for the study). As with GitHub, version control allows teams to collaborate in developing and refining scientific protocols, which can in turn be shared, copied, and forked.
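The versioning model described here can be sketched in a few lines of code. This is a toy illustration only, with invented names and steps; it is not the protocols.io data model or API, just a minimal way to picture "revise versus fork" with provenance attached:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Protocol:
    """Toy model of a versioned protocol (illustrative only)."""
    name: str
    steps: tuple            # ordered experimental steps
    version: int = 1
    forked_from: str = None  # provenance: which protocol this was copied from

def revise(p: Protocol, new_steps: tuple) -> Protocol:
    """A new version of the same protocol: the steps change, the identity stays."""
    return replace(p, steps=new_steps, version=p.version + 1)

def fork(p: Protocol, new_name: str) -> Protocol:
    """An independent copy that records exactly where it came from."""
    return Protocol(name=new_name, steps=p.steps, version=1,
                    forked_from=f"{p.name} v{p.version}")

original = Protocol("mouse-filtrate-injection",
                    ("harvest organs", "grind", "filter", "inject newborns"))
v2 = revise(original, original.steps + ("observe for leukaemia",))
lab_b = fork(v2, "lab-b-variant")

print(lab_b.forked_from)  # -> mouse-filtrate-injection v2
```

The point of the sketch is the provenance field: a replicating lab's copy carries a pointer back to the exact version it started from, which is what makes later discrepancies diagnosable.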

As Lenny and I discussed, there’s a close alignment here with our mission at Frankl. The applications we’re building help automate or standardise the collection of scientific data, leading to greater consistency. And while we’re initially building applications ourselves, our longer-term goal is to help scientists share their own applications with the wider community, making it easier for other researchers to closely replicate their research.

Reproducibility in cancer science

Our conversation turned to a report published recently in Science, announcing a significant down-scaling of the Reproducibility Project: Cancer Biology.

The backstory is this. In 2012 scientists from biotech company Amgen published a brief article in Nature, noting industry concerns about the reproducibility of published research. Entire fields of medical research had been spawned, they argued, based on preclinical research that was not reproducible. As a result, many patients had “subjected themselves to a trial of a regimen or agent that probably wouldn’t work.”

Given these concerns, the Amgen scientists routinely replicated published studies before attempting to build on them. But of the 53 replication studies they conducted, only six produced results comparable to the original published study.

These and other worrying reports led in 2013 to the Reproducibility Project: Cancer Biology — an attempt to independently replicate 50 “high impact” findings from the field. To date, 10 completed replications have been published, of which five were essentially successful and three were inconclusive.

But just eight more replications are expected to reach completion. The remaining 32 have been discontinued due to a combination of unforeseen costs and delays in troubleshooting and optimizing experiments in order to achieve meaningful results.

What do we mean by reproducibility?

Five definite successes out of seven conclusive results seems, at face value, to be a good return. It’s certainly a better hit rate than the one Amgen reported.
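For a rough sense of scale, the rates implied by the counts quoted earlier (Amgen’s six comparable results out of 53, against the project’s five successes among ten published replications) can be computed directly:

```python
# Replication "hit rates" from the figures quoted in the text above.
amgen_success, amgen_total = 6, 53     # Amgen: 6 of 53 replications comparable
rpcb_success, rpcb_published = 5, 10   # RP:CB: 5 successes among 10 published

amgen_rate = amgen_success / amgen_total
rpcb_rate = rpcb_success / rpcb_published

print(f"Amgen: {amgen_rate:.0%}")   # -> Amgen: 11%
print(f"RP:CB: {rpcb_rate:.0%}")    # -> RP:CB: 50%
```

Even this crude comparison glosses over the inconclusive and discontinued studies, which is exactly why the distinction drawn next matters.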

However, it’s important to make a distinction between two different senses of reproducibility:

  • Methods Reproducibility asks whether independent researchers can closely reproduce the experiment based on publicly available information.
  • Results Reproducibility asks whether, given the same methods, similar results are achieved.

Crucially, Methods Reproducibility is a prerequisite for Results Reproducibility: you cannot test whether the same methods give the same results if the methods cannot be reconstructed in the first place. Any study that was discontinued because the methods were inadequately described is therefore a failure of Methods Reproducibility.
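The distinction can be made concrete with a toy computational "experiment" (invented numbers throughout): if the write-up pins down every detail of the procedure, an independent lab can rerun it exactly; if even one detail (here, a random seed standing in for an unreported parameter) is missing, the replication diverges before results can meaningfully be compared. A minimal sketch:

```python
import random

def run_experiment(seed, n=100, effect=0.2):
    """A toy experiment: n noisy measurements around a small true effect."""
    rng = random.Random(seed)
    measurements = [effect + rng.gauss(0, 1) for _ in range(n)]
    return sum(measurements) / n   # the reported outcome: the mean

# Methods Reproducibility: with the complete recipe (seed included),
# an independent rerun recovers exactly the same outcome.
original = run_experiment(seed=42)
replication = run_experiment(seed=42)
assert replication == original

# Omit one under-described detail and the "same" nominal procedure
# produces a different outcome -- a methods failure, not a results failure.
underspecified = run_experiment(seed=7)
assert underspecified != original
```

Wet-lab protocols obviously have no literal random seed; the seed here stands in for any unreported detail, whether reagent batch, mouse strain, or timing.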

As the Science article notes, some of these studies have been successfully replicated by independent labs. One possibility is that these successful replications relied on information that was not provided in the original reports.

However, the lesson from psychology’s replication “crisis” is that there are often ways of tweaking the analyses to get the desired or expected results. If researchers assume that a failure to replicate means they themselves have done something wrong, the research literature can become flooded with apparent replications of an effect that doesn’t exist.

This, of course, is where a resource like protocols.io can help. By providing a much more detailed, step-by-step guide to conducting the experiment, it improves Methods Reproducibility. It also means that, when replication attempts achieve different results from the original study, we can begin to work out why.

Lessons from history

The conversation with Lenny reminded me of an essay on tumour virology by Daniel Kevles in “Hidden Histories of Science” — a story that hinges on failed and successful replication.

It’s now widely accepted that some viruses can cause cancer. HPV — the human papillomavirus — is a notable example. The idea has its roots in the work of Peyton Rous, whose experiments on tumours in chickens, published in 1911, implicated some form of infectious agent. But throughout the first part of the 20th century, Rous’s work was considered irrelevant to human cancer.

Ludwik Gross

In 1966, Rous received the Nobel prize for his work. But the hero of Kevles’ story is Ludwik Gross who, in the 1940s, conducted a series of studies from a makeshift lab in the Veterans Hospital in the Bronx. (According to Wikipedia, Gross had to keep the laboratory mice in the trunk of his car.) The experiments involved grinding up the organs of mice that had leukaemia, passing the material through a filter, and then injecting the filtrate into newborn mice. The filtration process removes cellular material (including bacteria) but viruses are small enough to pass through. Gross’s prediction, then, was that the mice receiving the injection would develop leukaemia. And this is exactly what happened.

Gross’s colleagues, however, remained unconvinced. It wasn’t just that the idea was controversial or that Gross himself lacked a stellar reputation in the field; nobody could replicate his findings. The problem, Kevles notes, was that the scientists attempting these replications failed to stick closely to Gross’s methods. They’d used filtrates from a different leukaemia. They’d tested a different strain of mice. Or they’d waited some time after the mice were born before injecting them.

It wasn’t until the mid-1950s, when Jacob Furth, a respected cancer scientist at Cornell University, performed an exact replication of Gross’s methods, that anyone successfully replicated his results. Suddenly credible, the field of tumour virology took off. By systematically tweaking Gross’s methods, scientists were able to identify a range of viruses that provoked tumours, not just in mice but in rats, hamsters, apes, and cats. The success of today’s HPV vaccine, for example, can be traced back to Gross’s original work and — crucially — Furth’s faithful replication.

The lesson, of course, is that details matter. Changing the parameters of an experiment can lead to changes in the outcome. What looks at first glance like a failure of results replication may in fact be a failure of methods replication. If those methods aren’t carefully described or aren’t carefully followed, then we shouldn’t expect the results to replicate. And so when we ask whether science has a reproducibility problem, we really need to break that question down. Methods reproducibility has to be addressed before we can get a true sense of how reproducible the results might be.

Science in crisis?

The other lesson is that the problems facing science are not new. When we talk about a reproducibility “crisis”, the implication is that reproducibility today is demonstrably worse than it has been historically. But we don’t know that this is true. And, as Lenny points out, whatever its problems, science is still making progress. Cancer research again stands as a good example — five-year survival rates for many forms of cancer have been inching upwards for several decades.

What matters, Lenny says, is that we can do better. Science doesn’t need to be in crisis for us to recognise the importance of improving reproducibility, both of methods and results. protocols.io is a step in that direction — and it’s where we’re aiming with Frankl too.

In science, we tend to think of progress in terms of the new ideas, the flashes of brilliant inspiration, or the serendipitous discovery that changes everything. But as Sagan reminds us, that’s only half of the story. The other half, equally important, is the rigorous process of sorting the wheat from the chaff, deciding which ideas are worth pursuing further.

After all, the science stories we remember are the stories that turned out to be true. The heretical ideas that survived scrutiny. The chance observation that, once scientists knew what to look for, replicated again and again.

At Frankl, our mission is to make open science easy and rewarding for scientists. If you’d like to know more, you can read our whitepaper, check out our website, follow us on Facebook and Twitter, or join our Telegram channel.