BioRxiv: A postprint server?

Europe PMC recently started indexing preprints, which is great. I made PrePubMed over 2 years ago because no one seemed to be keeping track of preprints, even though their numbers were growing exponentially and some of the best work was being posted as preprints.

What caught my attention in Europe PMC’s post was this image:

Source: http://blog.europepmc.org/2018/07/preprints.html

This image shows the time between preprint posting and journal publication for bioRxiv and PeerJ Preprints. What stood out to me were the negative times, because bioRxiv appears to have a clear policy on this:

This policy seems pretty clear, but someone on Twitter had previously alerted me to an article that seemed to violate it. I notified bioRxiv and posted about it at PubPeer. I was told that because the version posted to bioRxiv wasn’t the version accepted by the journal, it was okay. That didn’t seem to make sense, but I left it alone for the time being.

Recently I saw a discussion on Twitter that seemed to reaffirm the policy:

Richard is responding to someone asking why he can’t update his preprint with the final, accepted version. Richard’s response doesn’t quite make sense, because updating the preprint won’t change the date associated with its DOI, but he at least made it clear that bioRxiv doesn’t want postprints, which matches how I had interpreted its policies.

So we’re back to those negative times Europe PMC discovered. If their analysis is correct, it would seem that a number of preprints violate bioRxiv’s policy.

For their analysis Europe PMC is using data from Crossref, so one possibility is that the dates in Crossref are not correct. For example, maybe the dates in Crossref are for revisions to preprints and the negative times just indicate revisions which occurred after publication. I was interested enough to see for myself.

I queried the Crossref API for all the bioRxiv DOIs I had and compared its dates against mine. For preprints posted after I started daily indexing, I generally have the date of the first version, and these are also the dates Crossref has.
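For anyone who wants to repeat this check, here is a minimal Python sketch of pulling one bioRxiv record from the Crossref REST API. It assumes the layout Crossref returns for posted content: the posting date under `posted`, the titles under `title`, and the linked journal version (when bioRxiv has deposited one) under `relation` / `is-preprint-of`. The helpers `fetch_work` and `summarize_preprint` are hypothetical names for illustration, not PrePubMed's code.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str) -> dict:
    """Fetch the Crossref metadata record (the "message" object) for one DOI."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)  # "/" is left unescaped by default
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["message"]


def summarize_preprint(message: dict) -> dict:
    """Pull out the fields compared in this post from a posted-content record."""
    # Crossref stores dates as {"date-parts": [[year, month, day]]}; parts may be missing.
    posted = message.get("posted", {}).get("date-parts", [[None]])[0]
    titles = message.get("title", [])
    # Assumption: the journal version is deposited as relation -> is-preprint-of.
    published = [
        r["id"]
        for r in message.get("relation", {}).get("is-preprint-of", [])
        if r.get("id-type") == "doi"
    ]
    return {
        "doi": message.get("DOI"),
        "posted": tuple(posted),
        "title": titles[0] if titles else None,  # may be a later version's title
        "published_doi": published[0] if published else None,
    }
```

Usage would be something like `summarize_preprint(fetch_work("10.1101/…"))` with a real bioRxiv DOI in place of the ellipsis.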

***An interesting observation I made: bioRxiv allows the titles of preprints to change with each revision, and while Crossref has the date of the first version, it seems to have the title of the latest version.***

So Crossref has the correct dates for bioRxiv preprints, but maybe the dates for the published versions were wrong. I went ahead and grabbed the dates of the published versions of bioRxiv preprints and manually checked some of them. This data also looks good.
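Once both dates are in hand, the interval itself is simple date arithmetic. Here is a sketch, assuming Crossref's `date-parts` format and treating a journal article's publication date as the earlier of `published-online` and `published-print`. That choice is my assumption; Europe PMC may have used a different field. Missing month or day parts default to 1, so a year-only record can shift an interval by up to a year. The function names are hypothetical.

```python
from datetime import date
from typing import Optional, Sequence


def to_date(parts: Sequence[Optional[int]]) -> Optional[date]:
    """Turn Crossref date-parts like [2016, 3] into a date (missing month/day -> 1)."""
    if not parts or parts[0] is None:
        return None
    year = parts[0]
    month = parts[1] if len(parts) > 1 and parts[1] else 1
    day = parts[2] if len(parts) > 2 and parts[2] else 1
    return date(year, month, day)


def published_date(message: dict) -> Optional[date]:
    """Earliest of the online/print publication dates in a journal-article record."""
    candidates = []
    for key in ("published-online", "published-print"):
        d = to_date(message.get(key, {}).get("date-parts", [[None]])[0])
        if d is not None:
            candidates.append(d)
    return min(candidates) if candidates else None


def days_to_publication(preprint_posted: date, published: date) -> int:
    """Positive: the preprint came first. Negative: it was posted after publication."""
    return (published - preprint_posted).days
```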

Looking at some of these negative times, it’s hard not to laugh a little. For example, this preprint was posted over 2 years after being published. (Yes, I know the titles and text of the two versions are different, but the figures are mostly the same. I’m not sure how similar the versions have to be to be considered the same manuscript and get linked, or how this happens.)

BioRxiv has apparently discovered time travel. Someone get the Nature editors on the phone.

Update 20180717:

A day after my post, bioRxiv updated the published link. As my screenshot above shows, it previously went to this article, but now it goes here. The Crossref API does not yet reflect this change:

The new 2017 article linked by bioRxiv is actually a review that cites the authors’ 2014 article. Some of the review’s figures are identical to those in the 2014 article, and the review states that the authors received permission to reuse them.

This preprint actually committed a double whammy of violations: bioRxiv recommends posting to only one preprint server, and this preprint was also posted to arXiv at almost the same time.

So is any of this a big deal? There’s an argument that allowing postprints at bioRxiv would give authors a way to make their closed access articles open access. In fact, OSF Preprints seems not only to allow postprints but to contain a large number of them.

As an indexer of preprints, though, I do find it annoying that some of the preprints I’m indexing are not actually preprints. When people subscribe to an RSS feed at PrePubMed, they expect to see new, unpublished research, not work published 2, 5, or 10 years ago. If preprint servers get flooded with postprints that aren’t marked as postprints, I’m not sure how people will search specifically for preprints. That’s why I had to give up on indexing Figshare preprints: articles labeled as preprints were not actually preprints.

There’s another thing that bothers me about these postprints. Looking at the time to publication, quite a few intervals are not negative but are still less than a month. It seems people are posting a preprint as soon as a paper is accepted, sometimes to multiple servers. This not only violates bioRxiv’s policies, but is also sort of gaming the system. A preprint is supposed to be about getting your research out early, but these researchers are basically posting preprints at the time of publication for what I suspect is a boost in views, citations, and open science karma.
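Flagging these cases from a list of intervals is just a bucketing step. A sketch with hypothetical helper names, using 30 days as a stand-in for "less than a month" and counting same-day postings (0 days) as non-negative but early:

```python
from collections import Counter
from typing import Iterable


def classify(days: int, window: int = 30) -> str:
    """Bucket one preprint by how long before journal publication it was posted."""
    if days < 0:
        return "posted after publication"
    if days < window:
        return f"posted less than {window} days before publication"
    return "posted well before publication"


def tally(intervals: Iterable[int], window: int = 30) -> Counter:
    """Count how many preprints fall into each bucket."""
    return Counter(classify(d, window) for d in intervals)
```

The intervals would come from something like the date comparison between Crossref's preprint and journal records; the window is a parameter because "less than a month" is a judgment call.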

I can kind of understand why a lot of authors are doing this. In biology, researchers are terrified of being scooped, and once an article is accepted at a journal they are magically safe from being scooped, so they might as well post a preprint.

Anyway, I’m curious what bioRxiv plans to do about these preprints that violate its policies. If they do nothing, that seems like an open invitation to post postprints. While they may not have cared that I complained about indexing preprints that aren’t actually preprints, maybe they’ll care if Europe PMC views this as a problem.

***For what it’s worth, a journal not enforcing its policies is nothing new. Many journals have data sharing policies which authors treat like speed limits in California. And journals presumably have policies against plagiarism but can’t seem to respond to emails once plagiarism is detected.***

Also, now that preprints have made it to the big leagues, some additional quality control steps should be added. For example, none of these bioRxiv DOIs are in Crossref or resolve (they all start with “10.1101/”):

324533, 357814, 066480, 355040, 357152, 357327, 353508, 340091, 357376, 363317, 109900, 361592, 357681, 135814, 118836, 357285, 338376, 357301, 357871, 299925, 358010, 357673, 279323, 357350, 356774, 121137, 342568, 358036, 357897, 365700, 165415, 358085, 351593, 086702, 347146, 357889, 356816, 363150, 357343, 357731, 357004, 357772, 341297, 357806, 356824, 355628, 357707, 361345, 357665, 353557, 358101
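A check like this can be scripted. Here is a minimal sketch, assuming that Crossref's REST API answers with HTTP 404 for a DOI it has no record of. `missing_from_crossref` and the other helpers are hypothetical names, and running this now may give different results than when the list above was compiled. The suffixes stay as strings so leading zeros such as `066480` survive.

```python
import urllib.error
import urllib.request

BIORXIV_PREFIX = "10.1101/"


def full_doi(suffix: str) -> str:
    """Rebuild a bioRxiv DOI from a suffix like the ones listed above."""
    return BIORXIV_PREFIX + suffix.strip()


def in_crossref(doi: str) -> bool:
    """True if Crossref has a record for this DOI; a 404 is taken to mean it doesn't."""
    try:
        with urllib.request.urlopen("https://api.crossref.org/works/" + doi, timeout=30):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other errors (rate limits, outages) shouldn't count as "missing"


def missing_from_crossref(suffixes: str) -> list:
    """From a comma-separated list of suffixes, return the full DOIs Crossref lacks."""
    dois = [full_doi(s) for s in suffixes.split(",") if s.strip()]
    return [d for d in dois if not in_crossref(d)]
```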

P.S. The data in PrePubMed isn’t perfect either; sometimes preprints slip through the cracks (and those are only the ones I know about).