Why bioRxiv can’t be the Central Service

So it’s happened again: yet another discussion on Twitter about the Central Service for preprints, with people once more asking why bioRxiv isn’t already the Central Service.

I already wrote two blog posts on this topic, here and here, the last time this happened, but I mainly focused on my concerns about the Center for Open Science and only briefly touched on the problems with bioRxiv. I guess the third time is the charm, and to make sure a fourth post isn’t necessary I’ll try to make the problems with bioRxiv as clear as possible.

1. BioRxiv software is not open source

BioRxiv is immediately disqualified from being the Central Service because it runs on a proprietary platform licensed from HighWire Press. ASAPbio makes it clear in its proposal that all technology for the Central Service must be open source.

2. Incompatible licenses

When you submit a preprint to bioRxiv you have several license choices, including “All Rights Reserved”. Daniel Himmelstein and I previously analyzed the licenses chosen at bioRxiv and found that 30% of preprints were “All Rights Reserved”.

This is problematic for a Central Service because it means these preprints can’t be distributed, or even archived, by a third party. How preprints with such restrictive licenses are going to be archived in a Central Service is an open question. Whether third parties will be able to text mine these articles is also an open question.

3. No standards

One goal of the Central Service is to establish standards for preprints, which includes making sure there is complete metadata. I’ve used this image before, but it is ridiculous enough to post again.

This is a preprint that passed the screening process at bioRxiv. I assume it was submitted by Aaron Ellison, given that he is the only author with a full name listed, and the only author with an email listed when you look at the metadata.

Who are these other authors?

How are we supposed to identify them?

Do they even exist?

When you submit a preprint at PeerJ, every author has to make an account, and everyone gets an email when a preprint is submitted, just like at a journal. The submission system at bioRxiv is more lax than those of predatory journals. I could literally submit a preprint to bioRxiv right now, list a bunch of fictional characters as authors, and no one would notice. Or even worse, a PhD student could post a preprint without their mentor getting a notification.
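A minimal sketch of the kind of screening check that would catch a preprint like the one above, assuming each author record is a dict with hypothetical `name` and `email` fields (this is not bioRxiv’s or PeerJ’s actual data model). It flags any author who is listed only by initials or a single word, or who has no email address. The name test is a crude heuristic and would need tuning for real data.

```python
from typing import Dict, List


def has_full_name(name: str) -> bool:
    """True if the name has a given name longer than an initial, plus a surname."""
    parts = [p.strip(".") for p in name.split()]
    parts = [p for p in parts if p]
    return len(parts) >= 2 and len(parts[0]) > 1


def find_incomplete_authors(authors: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Return one entry per author whose metadata is incomplete.

    The `name` and `email` keys are assumed field names, used for illustration.
    """
    problems = []
    for author in authors:
        name = author.get("name", "").strip()
        email = author.get("email", "").strip()
        issues = []
        if not has_full_name(name):
            issues.append("no full name")
        if "@" not in email:
            issues.append("no email")
        if issues:
            problems.append({"name": name, "issues": issues})
    return problems
```

A submission system could refuse to post a preprint until this list is empty, or at least show the list to a human screener.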

4. Unclear policies

It isn’t really clear what types of articles bioRxiv does or does not accept. Sometimes they’ll take reviews, sometimes they won’t. Sometimes they’ll accept an article with a lot of authors, sometimes they won’t.

And what are their policies about retracting articles? They have removed articles in the past, but they didn’t leave any sort of note explaining why, just a dead link. I know this because I’ve indexed articles that are no longer available. It appears that an author submitted a very similar article multiple times, and the extra submissions got taken down. It is unclear whether this was a mutual decision between bioRxiv and the author. It is also unclear why bioRxiv didn’t catch the similar submissions during their screening process.

5. Terrible software

I already mentioned that the bioRxiv platform is disqualified from being the Central Service because it is proprietary, but even if it were open source I still wouldn’t want it to be the Central Service. If you compare the user experience at bioRxiv vs PeerJ, it’s no contest.

One of the strengths of bioRxiv that I keep hearing is that the site is always up. That may be true, but the preprints are not always available. As you can see from my error logs, during my daily indexing I sometimes encounter an “Access Denied” page or a “500 Internal Server Error”.
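For anyone running a similar daily indexer, here is a rough sketch of how these failures could be caught and logged instead of crashing the run. It uses only Python’s standard library. The function names and the check for the literal text “Access Denied” in the page body are assumptions made for illustration, not a description of how the indexer mentioned above actually works.

```python
import logging
import urllib.error
import urllib.request
from typing import Callable, Optional

logger = logging.getLogger("preprint_indexer")


def fetch_page(url: str, timeout: float = 30.0) -> str:
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def index_preprint(url: str, fetch: Callable[[str], str] = fetch_page) -> Optional[str]:
    """Return the page HTML, or None (after logging) if the preprint is unavailable."""
    try:
        html = fetch(url)
    except urllib.error.HTTPError as err:
        # Covers responses such as "500 Internal Server Error".
        logger.error("HTTP %s for %s", err.code, url)
        return None
    except urllib.error.URLError as err:
        # Covers DNS failures, refused connections, and similar.
        logger.error("Could not reach %s: %s", url, err.reason)
        return None
    # Assumption: a block page may arrive with a normal status code,
    # so the body text is checked as well.
    if "Access Denied" in html:
        logger.error("Access Denied page for %s", url)
        return None
    return html
```

Passing `fetch` as a parameter keeps the error handling testable without touching the network. URLs that return `None` can be queued for another attempt the next day.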

6. Missing metadata

This is somewhat of a minor problem, but a problem nonetheless. When a bioRxiv preprint is initially posted, for some reason the subject area is missing. As a result, when I do my daily indexing of bioRxiv preprints I miss the subject information, and if I really want it I have to go back and reindex the preprints at a later date. I assume this is a problem with the HighWire platform. But does bioRxiv even know this is an issue? They don’t even know how their own software works, so it’s unlikely.
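The workaround amounts to keeping a list of preprints to revisit. A small sketch, assuming each indexed record is a dict with hypothetical `id` and `subject` keys (the real index may be shaped differently):

```python
from typing import Dict, List, Optional


def missing_subject(records: List[Dict[str, Optional[str]]]) -> List[Optional[str]]:
    """Return the ids of records whose subject area is absent or blank."""
    return [r["id"] for r in records if not (r.get("subject") or "").strip()]
```

Running this over the index after each daily pass gives the set of preprints worth reindexing once the subject area has appeared.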

To be clear, I’m not anti-bioRxiv. I’m pro all preprint servers, and each person should use what they like (if you have 100 authors it could be annoying to get all 100 people to make accounts at PeerJ). I just wanted to make it clear that being the most popular preprint server doesn’t make bioRxiv the best. You’d think that would be obvious, but stuff like this shows up in my Twitter feed: