I’m going to use a liberal definition of the word “reproducible”. In my mind reproducibility and replicability have the same relation as accuracy and precision. Just as you can precisely miss your target and be inaccurate, you can have an experiment or code that reliably gives incorrect results. When I think of something that is reproducible I think of something that is done correctly. If you tell me you used a certain algorithm to solve a problem it shouldn’t matter what language I implement it in. But if the algorithm is wrong then even if you provide me the code I won’t get the answer I want.
I often see scientists afraid to put their code on GitHub because they are embarrassed about its quality. I’ve never had this fear. I’ve never claimed my code runs as fast as possible or follows any standard conventions. But I do try to guarantee that my code does what it is supposed to do, and I get scared when I think it might not.
There is nothing more terrifying than using your own software for a project. Any time you get unexpected results you can’t help but think there is a colossal error somewhere, that you need to warn everyone about the mistake and notify the journal that a correction must be issued. Luckily, more often than not, you look into the problem and it turns out nothing is wrong.
Using your own software is kind of like driving a car you designed. Obviously you believe in your design, so you would feel safer driving your own car than someone else’s, but that doesn’t mean you don’t assume the worst when you hear a strange noise. If you weren’t driving your own car you probably wouldn’t think anything of it, but the fact that you could have made a mistake that affects the lives of thousands will cause you to immediately pull over and get to the bottom of that noise.
If, however, you built a car that no one else drives, you probably wouldn’t care if it experiences some issues once in a while. And this is exactly what currently happens in scientific research. Each lab has its own way of doing things: its own reagents, its own protocols, its own software. Results in the lab are consistently consistent with previous results in the lab, because the protocols and reagents and software, and sometimes even the people, have never changed. If someone in the lab suggests there might be something wrong, they are told everything is okay: this is how it has always been done.
Scientists are expected to be more trusting than the average citizen. Would we trust restaurants to follow regulations without health inspections? Yet every lab is isolated and free to do things however it thinks best. Never is there a time when someone outside a lab inspects that lab’s practices. And no, radiation safety checks don’t count.
If a lab is making errors, or worse, just making up data, it is impossible for researchers outside of the lab to tell. The protocols aren’t available. The data isn’t available. Just some high-profile publications with pretty pictures. And do you really think researchers don’t realize this? I’m sure researchers love not having their methods scrutinized. Even if they mess up, no one will ever know or be able to prove it.
All of this changes when researchers share their data, protocols, and code. And no, “as performed in reference X” is not a protocol. Only when researchers provide enough information to reproduce a study does the study become reproducible. The chance of mistakes goes down, because the fear of having other researchers find mistakes will cause researchers to be more careful. And if there are mistakes, researchers can notify the authors and a correction can be issued.
Yes, there will be a lot of false alarms where someone thinks there is a mistake when there isn’t. Most of the time when researchers contact me about errors in OncoLnc there aren’t any. But one time there was an issue, and the quality of the data in OncoLnc improved as a result. Isn’t that worth it?