The bioRxiv Wall of Shame

If bioRxiv’s relationship with their policies were on Facebook, it would be “It’s Complicated”. Sometimes bioRxiv allows reviews, sometimes they don’t. Sometimes bioRxiv allows scientific criticism, sometimes they don’t. They aren’t supposed to allow protocols, but this looks like a protocol. They didn’t allow opinion pieces, but after some prominent authors posted one they changed their policies to allow “white papers”. If a Nobel Laureate clearly violates policies it’s swept under the rug, but if a grad student tries to get credit for their work they are given a week’s notice before their preprint is taken down.

Preprints are meant to be a way for authors to quickly share their work. When someone reads a preprint there is an expectation that it is new research not already published, and that any comments made on the preprint may be incorporated into a revision. The preprint community revolves around these assumptions.

Consistent with this, bioRxiv has a policy that preprints must be submitted before acceptance to a journal, although there is some debate about whether that has always been the policy. I happened to come across an article in violation of this policy, and notified bioRxiv about it. Not only did they not do anything about the article, but they wouldn’t approve my comment on the article alerting readers that the preprint had violated bioRxiv policies. It’s quite a system they’ve got there: not only do they not enforce their policies, they prevent people from pointing out that policies are being ignored.

Although I wasn’t able to post at bioRxiv, I was able to post at PubPeer. But having recently become aware that there are numerous other preprints which are also violating this policy, and that there is a method to identify them, I realized it wasn’t fair to single out this one preprint. So the only solution was to identify these other preprints and also alert the community about them.

Unfortunately, the Crossref data doesn’t have the date of acceptance, which is what I was interested in. So the best I could do was to look at which preprints were posted near their publication date, then manually check the date of acceptance, which sometimes required downloading the PDF. This was tedious work, but on the bright side I’m really familiar with journal websites now.
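The screening step described above could be sketched roughly like this. It is not the actual script used for this post: the record fields, the 30-day window, and the example entries are all assumptions made up for illustration. It only narrows down candidates, and the acceptance date still has to be checked by hand at the journal.

```python
from datetime import date

def flag_candidates(records, window_days=30):
    """Return records whose preprint was posted within `window_days`
    before publication, or at any time after publication.

    `window_days` is an arbitrary illustrative threshold, not the cutoff
    used for the list in this post.
    """
    flagged = []
    for rec in records:
        # Days between preprint posting and journal publication.
        # A negative gap means the preprint went up after publication.
        gap = (rec["published"] - rec["posted"]).days
        if gap <= window_days:
            flagged.append({**rec, "gap_days": gap})
    # Smallest gaps first: these are the most suspicious candidates.
    return sorted(flagged, key=lambda r: r["gap_days"])

# Hypothetical records; real ones would come from linked preprint/paper metadata.
records = [
    {"preprint": "example-A", "posted": date(2018, 1, 10), "published": date(2018, 1, 20)},
    {"preprint": "example-B", "posted": date(2017, 3, 1), "published": date(2018, 2, 1)},
    {"preprint": "example-C", "posted": date(2018, 5, 5), "published": date(2018, 5, 1)},
]

candidates = flag_candidates(records)
for rec in candidates:
    print(rec["preprint"], rec["gap_days"])
```

A preprint that passes this filter is only a candidate. It counts as a violation only if the manually checked acceptance date turns out to be earlier than the posting date.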

I have a list of 205 preprints which appear to be in clear violation of bioRxiv’s current policy. I went about posting all of these at PubPeer, which served the dual purpose of alerting both the community and, since I filled out their email addresses, the authors to the transgressions. But people complained about the precious PubPeer feed getting cluttered, so I’m going to work with PubPeer to get these posted via an automated process.

This number could be just a fraction of the number of preprints which are actually postprints. There are a lot of preprints which were posted near the publication date, but for which I couldn’t find an acceptance date at the publisher’s website. I’m fairly sure these preprints were posted after acceptance, but I can’t be positive so I didn’t include those. Also, with my method I’ll miss any preprints which were published months after acceptance. And of course I’ll miss any preprints which aren’t linked to their paper at Crossref, and I have no way of knowing what percentage this might be. So taken together, there could be a considerable number of people using bioRxiv as a postprint server…and getting away with it.

It seems that one reason bioRxiv doesn’t view these postprints as a problem is that they consider them to be rare.

Maybe 205 is a small percentage, but that’s just how many I could find in a quick pass. How many would I have to find for them to think it’s a problem?

The real problem is that they themselves don’t even know how many there are. You can post a postprint and they will have no idea. And worse, even if they find out about it they’ll just say it’s not a big deal.

But it is a big deal. Open science is something that employers are beginning to take into account when hiring. The fact that you can post a bunch of postprints at bioRxiv, then list them on your resume as evidence of your dedication to open science is disgusting. Your preprint should be posted early in the publication process, not after it was accepted, not when it is about to be accepted, and definitely not after it was published. Our preprint has already been covered by The Economist but still hasn’t been submitted to a journal. Our goal wasn’t to publish a paper, it was to share our results and code, which is what open science is supposed to be.

I’m sure some will claim most of these cases simply represent mistakes, or unfortunate coincidences where a preprint just happened to be posted right when it was accepted at a journal. But some authors are habitual line steppers, so I don’t think it’s a coincidence that authors are doing this. I genuinely think authors are waiting for their paper to get accepted, and then, and only then, are posting a preprint as a means to generate interest in their soon-to-be-published paper.

And a lot of people will probably claim that this is a great way for authors to make their work open access. You know what else is a great way of doing that? Posting your work on your website, Figshare, or literally anywhere that doesn’t involve you lying about your manuscript not being peer reviewed.

I do have some sympathy for authors whose paper is accepted, and upon realizing how long it will take for the journal to publish it decide they might as well post it to a preprint server. But not much sympathy. If they were truly interested in sharing their results before publication they would have posted the preprint at the time of submission to the journal. And again, work can be shared without violating the policies of a server.

And by exposing these authors some might claim I’m hurting the preprint movement since authors will be discouraged from posting future preprints. Good, because these authors aren’t posting preprints, they are posting manuscripts which are about to appear online. Their “preprints” won’t be missed.

You can’t expect a cop to catch bank robbers then get upset when they give you a speeding ticket. Data thugs simply have the ability to notice transgressions, and then go the extra step of documenting them, reporting on them, and if necessary, publicizing them. I’m sure Brian Wansink never thought anyone would actually check if his pizza slices added up and thought he had successfully conned the government, Cornell, and seemingly all of food psychology. Similarly, I’m sure authors posting preprints which aren’t actually preprints never thought someone would notice or call them out, and bioRxiv probably thought no one would go to the trouble of doing their job for them. I know it’s hard to believe, but not everyone in academia is a sheep. Some of us are actually paying attention.

Authors who are pretending to do open science should be embarrassed. Unfortunately, my previous attempt at trying to make people feel shame just resulted in conspiracy theories for why clowns were at a Nazi rally.

And the conspiracy theories for why I’m doing this have already started.

Creator of PrePubMed and OncoLnc