Most research I publish will be wrong. And I’m OK with that
A computational guy’s take on the “reproducibility crisis”
Science is in the grip of a crisis. A “reproducibility” crisis. Big results, published in big journals by big names, cannot be reproduced by others. Results considered “facts” have turned out to be an illusion. Large-scale team efforts to reproduce scores of studies in psychology and of cancer drugs have failed to reproduce the majority of the original results. These failures have caused a sharp intake of breath across science: “Is it us too? Can we reproduce our results?”
This all makes for great headlines, editorials, and think-pieces in the glamour journals of science. And, increasingly, in national newspapers. The fervent rhetoric often invokes the thought that the Death of Science (*) will visit us all, one by one…
Into the lab swoops a towering, hooded figure, face obscured in darkness, long fingers, white bones, twirling a sharpened pipette.
“YOU.” The Death of Science points at the lab head. “YOU HAVEN’T BEEN USING ENOUGH SAMPLES IN YOUR EXPERIMENTS. MOST OF THE EFFECTS YOU’VE FOUND DON’T EXIST.”
“Really?” squeaks the lab head. “I didn’t know, you see, the P-value was less than…”
“I’LL STOP YOU RIGHT THERE. YOUR TIME HAS COME. I’VE COME TO TAKE YOU TO THE OTHER SIDE.”
“NO, YOU IVORY TOWER FOOL. THE OTHER OTHER SIDE.”
“Oh thank god. Cambridge is a dump.”
But why on earth do we expect science to be right all the time? Imagine if all published papers were right! Imagine how fast society would progress! (For good or for ill, who knows.) It seems we have forgotten the basic perspective on any scientific result: don’t believe it is true. Don’t take published papers at face value.
How basic is this perspective? So basic that it is one of the first things we teach our undergraduate students. When first exposed to journal papers, students have two reactions: (1) I don’t understand a word of this, please can I have my textbook back? (Answer: no, because the textbook is wrong.) And (2) it’s published, so it must be true. We teach them that the second reaction is nonsense. We teach them how to think about papers: that they are, in all probability, wrong. Not to defer to authority, but to be constructively sceptical, and to ask of a paper basic questions: “Do the claims follow from the actual results?”; “What could have been done better?”; “What could be done next?”
And not because we think the authors of the paper are lying to us. Or crap. But because science is hard. Experiments are hard. There are many uncontrolled, often unknown variables. So we teach that one, individual paper on a set of experiments cannot be taken as the truth. For that, we need to reproduce the results.
Have we forgotten these lessons ourselves? Is science undergoing some form of collective amnesia?
Perhaps because I’m a computational scientist – building models of neurons and bits of brain using maths and computer code – I have a different view: I expect research to be wrong. Because I expect my research to be wrong. Before I’ve even started it. (Incidentally, this is why the computational researchers in any field, whether systems biology, neuroscience, or evolution, are such a miserable bunch.)
Models are wrong from the outset. A model tries to capture some essence of an interesting problem – how neurons turn their inputs into outputs, for example. A model does not capture every detail of the system it tries to emulate – to do so would be folly, as this would be like creating a map of a country by building a perfect scale model of its every bump, nook, and cranny. Pretty to look at; useless for navigation. So models are wrong by design: they leave out the detail. They aim to be useful. And because our models are guesses at the true, underlying essence of the problem, most of them will be completely wrong: not useful at all. We expect our models to fail, especially when dealing with something as complex as the brain, or as human behaviour.
Here models also means statistical models. Statistics is, in essence, how to summarise the complex, messy real world into a small set of numbers, small enough that we can hope to understand them. When a study announces a decreased risk of dementia if you spend less time hitting yourself in the head as a child (watching my toddler whack his teddy on his head), then that is a single number, a single summary of the relationship between two quantities (amount of dementia; amount of juvenile bonce-bashing). That single number boils down the messy complexity of hundreds or thousands of data-points – of measurements of hundreds of people’s dementia and, separately, of how much they hit themselves in the head as a child. That number is wrong. We know it’s wrong. We don’t know how much it is wrong. If a little bit wrong, then the effect of decreased risk is still there. If a lot wrong, then the effect is not there.
Here’s the important bit: it doesn’t matter what the statistical test says. The test tells us that the number is likely to fall into some range, given the amount of data we have, and our assumptions about the data. Which include, among other things, that the data were measured correctly (if not, then the number is, of course, wrong). And that we used the right way of simplifying the data down to simple numbers (if we didn’t, then the number has no relationship with reality). And, most importantly, that there is actually, truly an effect in the real world. That it actually exists. If it doesn’t exist, it doesn’t matter how great our statistics are.
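To make that last point concrete, here is a small simulation; it is a sketch for illustration, not anything from my own analyses. It assumes two groups drawn from exactly the same distribution (so there is truly no effect), a known standard deviation of 1 (which lets it use a simple z-test rather than the more usual t-test), 20 samples per group, and the conventional p < 0.05 threshold. The function names are made up.

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in group means, assuming a known
    standard deviation `sigma` in both groups (a z-test, for simplicity)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_studies=10_000, n_per_group=20, alpha=0.05, seed=1):
    """Simulate many studies in which there is NO real effect: both groups
    come from the same distribution. Return the fraction of studies that
    still come out 'significant' at the chosen alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p_value(a, b) < alpha:
            hits += 1
    return hits / n_studies

print(false_positive_rate())  # roughly 0.05
```

Roughly one “study” in twenty finds an effect that does not exist. Nothing went wrong in those studies; under these assumptions, that is simply what the threshold means.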
The mundane truth is that much of what cannot be reproduced is down to the original studies either using crap statistics or being unlucky. (And, in some cases, to the inability of the original authors to understand their own results.) No cheating, no faking data, no outright lying. Because of the crisis, we have now had many warnings: about doing the wrong statistical tests; about doing the right tests, but without enough samples; and about our unconscious desire to torture the data until it confesses to a result. All need heeding. All will improve science.
(It is, after all, very important that tests of experimental drugs are reproducible, because lives depend on those tests. And in the case of drugs tested on particular types of cell, the failure to reproduce the results seems at least partly down to the cells being mislabelled: they are not what it said on the tin they came in.)
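So how many samples is “enough”? It depends entirely on how big an effect you assume is there. Here is a rough sketch using the textbook normal approximation for comparing two group means; again it uses a z-test rather than a t-test, the function names are hypothetical, and the effect size d is measured in units of the standard deviation.

```python
from math import ceil
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test to detect a true
    standardised effect d (difference in means / SD) with n samples per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected z statistic if the effect is real
    # Probability that the observed statistic lands beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

def n_per_group_needed(d, power=0.8, alpha=0.05):
    """Normal-approximation sample size per group to reach the target power."""
    z = NormalDist()
    n = 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2
    return ceil(n)

print(round(power_two_sample(0.3, 20), 2))  # about 0.16
print(n_per_group_needed(0.3))              # 175
```

With a modest effect (d = 0.3) and 20 samples per group, the chance of detecting an effect that really is there is only about one in six; reaching the conventional 80% needs around 175 per group. And if you don’t know d in advance, you can’t say whether you have enough samples at all.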
Even then, the crisis seems to be largely one of perspective, not statistics. We need to remember a simple truth: Every paper is an idea – it says “hey look at this, this could be cool”. Not “This is The Truth.”
The Death of Science popped in to have a word with me a few weeks back.
“I WAS TOLD YOU DON’T HAVE ENOUGH SAMPLES.”
“Mate, I’m analysing some pre-existing data to test ideas, search for clues, get some angle of attack on the brain. I can’t know whether I have enough samples until I know the effect size I’m looking for. Besides, I know that whatever I find is wrong. So, you know, I’m fine.”
“DO YOU HAVE ANY CUPCAKES?”