Why bioRxiv can’t be the Central Service

So it’s happened again: yet another discussion on Twitter about the Central Service for preprints, with people once more asking why bioRxiv isn’t already the Central Service.

I already wrote two blog posts on this topic the last time this happened, but I mainly focused on my concerns about the Center for Open Science and only briefly touched on the problems with bioRxiv. I guess the third time is the charm, and to make sure a fourth post isn’t necessary I’ll try to make the problems with bioRxiv as clear as possible.

1. BioRxiv software is not open source

2. Incompatible licenses

This is problematic for a Central Service because it means these preprints can’t be redistributed, or even archived, by a third party. How these preprints with restrictive licenses are going to be archived in a Central Service is an open question. Whether third parties will be able to text mine these articles is also an open question.

3. No standards

This is a preprint that passed the screening process at bioRxiv. I assume it was submitted by Aaron Ellison, given that he is the only author with a full name listed, and the only author with an email listed when you look at the metadata.

Who are these other authors?

How are we supposed to identify them?

Do they even exist?

When you submit a preprint at PeerJ, every author has to make an account, and everyone gets an email when the preprint is submitted, just like at a journal. The submission system at bioRxiv is more lax than those of predatory journals. I could literally submit a preprint to bioRxiv right now, list a bunch of fictional characters as authors, and no one would notice. Or even worse, a PhD student could post a preprint without their mentor getting a notification.
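To make the kind of check being asked for concrete, here is a minimal sketch of a metadata screen that flags authors listed without a full given name or an email address. It is only an illustration: the `Author` record, the `author_issues` function, and the example names are all hypothetical, and nothing here reflects how bioRxiv or PeerJ actually screen submissions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Author:
    # Hypothetical record; real preprint metadata has more fields.
    name: str
    email: Optional[str] = None


def author_issues(author: Author) -> List[str]:
    """Return a list of problems with one author entry (empty if none)."""
    issues = []
    parts = author.name.replace(".", " ").split()
    given = parts[:-1]  # everything before the last word is treated as given names
    # Flag entries that have only initials (or nothing) before the surname.
    if not given or all(len(p) == 1 for p in given):
        issues.append("no full given name")
    if not author.email:
        issues.append("no email")
    return issues


def screen(authors: List[Author]) -> dict:
    """Map each flagged author's name to its list of issues."""
    return {a.name: author_issues(a) for a in authors if author_issues(a)}
```

Calling `screen` on a list containing `Author("Jane Doe", "jane@example.org")` and `Author("J. Doe")` would flag only the second entry. A name heuristic this crude will misfire on mononyms and some naming conventions, so it would be a prompt for a human screener rather than an automatic rejection.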

4. Unclear policies

And what are their policies about retracting articles? They have removed articles in the past, but they didn’t leave any sort of note explaining why, just a dead link. I know this because I’ve indexed articles that are no longer available. It appears that an author submitted a very similar article multiple times, and the extra submissions were taken down. It is unclear whether this was a mutual decision between bioRxiv and the author. It is also unclear why bioRxiv didn’t catch the similar submissions during their screening process.
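Catching near-identical submissions does not require anything exotic. As a rough sketch, assuming titles are the only thing compared, Python's standard `difflib.SequenceMatcher` can flag pairs that are almost the same. The function names, the 0.9 threshold, and the example titles in the note below are made up for illustration; this is not a description of any screening bioRxiv actually runs.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(title.lower().split())


def similar_pairs(titles: List[str], threshold: float = 0.9) -> List[Tuple[int, int, float]]:
    """Return (i, j, ratio) for every pair of titles at or above the threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(titles), 2):
        ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if ratio >= threshold:
            pairs.append((i, j, ratio))
    return pairs
```

Given the invented titles "A survey of ant species in New England", "A survey of ant species in New England." and "Deep learning for protein folding", only the first two are reported as a pair. Comparing every pair is quadratic in the number of titles, so a real screen would more likely compare each new submission against recent ones only.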

5. Terrible software

One of the strengths of bioRxiv that I keep hearing is that the site is always up. That may be true, but the preprints are not always available. As you can see from my error logs, during my daily indexing I sometimes encounter an “Access Denied” page or a “500 Internal Server Error”.
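For anyone who wants to track the same thing, here is a minimal sketch of how an indexer might tally these failures over a run. It is not the actual indexing code behind the logs mentioned above; the function names and category labels are assumptions, and it works on already-fetched (status code, page body) pairs so that it makes no network calls itself.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_response(status_code: int, body: str) -> str:
    """Bucket one fetch result into a coarse category."""
    if status_code >= 500:
        return "server_error"  # e.g. 500 Internal Server Error
    # An "Access Denied" page can arrive with a 403 or inside a normal-looking response.
    if status_code == 403 or "Access Denied" in body:
        return "access_denied"
    if status_code == 404:
        return "not_found"  # e.g. a preprint that was removed
    if 200 <= status_code < 300:
        return "ok"
    return "other"


def summarize_run(responses: Iterable[Tuple[int, str]]) -> Counter:
    """Count how many fetches in one indexing run fell into each category."""
    return Counter(classify_response(code, body) for code, body in responses)
```

Logging the counter once per day gives a simple record of how often preprints were actually reachable, which is a different number from how often the home page was up.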

6. Missing metadata

To be clear, I’m not anti-bioRxiv; I’m pro all preprint servers, and each person should use what they like (if you have 100 authors, it could be annoying to get all 100 people to make accounts at PeerJ). I just wanted to make it clear that being the most popular preprint server doesn’t make bioRxiv the best. You’d think that would be obvious, but stuff like this shows up in my Twitter feed:

Creator of PrePubMed and OncoLnc http://www.omnesres.com/