How many preprints should be published?

One thing I’m growing increasingly interested in is how different people interpret the same information completely differently. I’m also growing interested in cases where the same people interpret the same information differently, but that’s a story for another day.

A new preprint came out which analyzes…preprint data. One tidbit in the preprint that caught people’s attention is that around two thirds of bioRxiv preprints end up getting published. Most people trumpeted this figure in a positive light, but then you had people like this:

There’s a lot to keep in mind here. First of all, is that 40% number even accurate? I noticed some issues with Crossref metadata, and a certain percentage of preprints which have been published have not been linked to the published version.
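To make the undercount concrete, here’s a minimal sketch of the arithmetic. It assumes the only problem is missing links, so the observed published share equals the true share multiplied by the fraction of published preprints that actually got linked. The function name and the link-coverage values are hypothetical. I haven’t measured how many links are missing, so the numbers just show the direction and rough size of the effect.

```python
def corrected_published_fraction(observed_fraction, link_coverage):
    """Estimate the true published share when some preprint-to-paper links are missing.

    observed_fraction: share of preprints with a recorded link to a published version.
    link_coverage: ASSUMED share of truly published preprints that carry such a link.
    """
    if not 0 <= observed_fraction <= 1:
        raise ValueError("observed_fraction must be between 0 and 1")
    if not 0 < link_coverage <= 1:
        raise ValueError("link_coverage must be in (0, 1]")
    # observed = true * coverage, so true = observed / coverage (capped at 100%).
    return min(1.0, observed_fraction / link_coverage)


if __name__ == "__main__":
    observed = 0.60  # taking the 40% unpublished figure at face value
    for coverage in (1.0, 0.95, 0.90, 0.80):  # hypothetical coverage values
        estimate = corrected_published_fraction(observed, coverage)
        print(f"link coverage {coverage:.0%}: estimated published share {estimate:.1%}")
```

The point is only that any missing links push the true published share above the reported one. How far depends on a coverage number nobody has pinned down.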

At this point, if I had a Harvard affiliation and was a stan for preprints I’d say something like “given the available data, the percentage of preprints eventually published is indistinguishable from 100%”.

I don’t think whether a preprint is published is a good proxy for quality, but assuming it is, I think there are a lot of holes we can poke in this 40% fake news number.

If you want to publish something you can find a journal that will publish it. Quilt plots got published at Plos One, the Kardashian index got published at Genome Biology, a preprint search engine which can’t find its own preprint got published at F1000Research…

The point is, if you want your work published, you will likely be able to find a publisher that will gladly take your money. Here is probably the most egregious example I’ve ever seen. I can’t understand a single sentence. It reads as if the text was put into Google Translate, then reverse translated, then retranslated about 100 times. A quick glance at the journal is all you need to see it’s predatory. The link for references is spelled wrong and doesn’t work.

So you can get your work published if you really want to, which suggests that preprints which aren’t published may never have been sent out for publication. In fact, some people have gone on record saying that they don’t send some of their work to journals. So I don’t think this 40% number should be interpreted as an indication that 40% of bioRxiv preprints are unpublishable.

With that said, a certain percentage are in fact unpublishable in the sense that I do think it would be difficult to find a respectable journal that will publish the work. For example, I think this comment article by the famous Craig Venter is unpublishable, as is this comment article by the famous George Church, as is this comment article by the famous Eric Lander. The reason is that publishing scientific comments is notoriously difficult.

I’m not that familiar with the arXiv, but it is my understanding that comment articles on preprints get directly linked to the preprint being commented on, and comments on these comments are linked, and so on. As a result, I suspect a good number of arXiv preprints (primarily the comment articles) are never published, and yet I don’t see people complaining about how the arXiv has a fake news problem.

There are also other biology preprints which I think are unpublishable due to them not being long enough to be considered a full article worthy of journal space. For example, some people see preprints as a way to report small findings:

I’ve actually already done that. Ideally, three of my papers would have actually just been one paper, but when I wrote the first paper I didn’t know how to make websites, and when I wrote the second paper I didn’t know about Fisher’s method. The third paper could have been a blog post, but seemed like it might be important enough to put in the scientific record.

Ok, so we’ve now identified three categories of preprints which may not get published:
1. The author didn’t send the preprint to a journal for whatever reason.
2. The preprint is a comment and not original research.
3. The preprint reports a small finding.

This leaves the category which I think the fake news guy had in mind: those preprints which were sent out for review but got rejected from every journal. I think this is also what the authors had in mind when they wrote:

any preprint that has not found its way into the peer-reviewed published literature after a certain period of time (we recommend 18 months) should be amended with an explanation of the circumstances that have precluded its review and publication

So then where are the fake news preprints at? Peer review apparently makes something real news, but any preprint that meets the format of a journal article can get published.

When I hear fake news with regard to science I think of low quality, possibly pseudoscience or fraud. The concern is that authors could use bioRxiv to make their work seem legitimate when it’s not. Unfortunately, this is a legitimate concern. Let me explain.

When it comes to original research, I don’t think we have to worry. If someone posts something crazy like vaccines cause autism, I don’t think respectable news outlets will cover it without first discussing the preprint with experts or looking into the reputation of the authors. Will questionable websites use the preprint as evidence? Probably, but they would use anything on the internet as evidence, and you can’t create policies around what the worst of society might do.

I think the main concern is people posting comment articles, like the ones by Craig Venter, George Church, and Eric Lander. When someone posts legitimate scientific criticism of your work it’s in your best interest to take the time to respond, but a problem arises when delusional PubPeer trolls start to post their criticisms as individual bioRxiv articles, as happened here.

Just as I don’t think people should be required to waste time replying to crazy Twitter trolls spreading lies about them, I don’t think authors should have to deal with crazy bioRxiv comment articles. Obviously these trolls could just instead post their criticisms on PubPeer or Twitter, or in a blog post, but given that bioRxiv has a screening process I do think they unnecessarily give these trolls some legitimacy. As a result, I think bioRxiv should think about subjecting comment articles to additional screening to ensure they are legitimate criticisms. Given how many reader comments they reject I’d advise they just screen articles as strictly as they screen reader comments.

P.S. The preprint which started this discussion has been getting a lot of attention, which I find interesting since most of this info has already been out there. If you wanted to know how long it takes preprints to get published you could have read this post by Europe PMC. If you wanted to know where preprints get published you could have read this post by Crossref. I guess the novel thing in the preprint is that they looked at downloads and impact factor, but I could have basically guaranteed you that highly read preprints get published in more selective journals.

The really useful thing about the work of these authors is their website, which lets you see which preprints the cool kids are talking about if you aren’t on Twitter.