The different flavors of preprints
My Twitter timeline is filled with comments from like-minded people that reinforce my (correct) worldview as well as cause a surge of dopamine that Twitter hopes will get me to log in more often. As a result, I’m typically only exposed to positive comments about preprints. But sometimes I see a couple of haters refer to preprints as “garbage” publications, or faculty claim that they would have to take preprints listed on CVs with a grain of salt, or even people ask what preprints are.
Just like peer-reviewed literature, preprints range in quality. For example, if you want to find some questionable peer-reviewed work, just check out PLOS ONE (how you doin’?). But that doesn’t mean preprints are simply a random sampling of the literature. If you gave me a pile of preprints and non-preprints, there’s a chance I could correctly identify which is which.
Because I index thousands of preprints for PrePubMed, I’m exposed to a range of preprints and have mentally grouped them into several categories. It is important to note that I mainly index and read biology preprints, so these categories may not apply to preprints in other disciplines.
BADA BOOM, REALEST GUYS IN THE ROOM
When writing a paper, most people have a specific journal in mind for the first submission and therefore present their work accordingly. This means restricting oneself to the layout the journal uses for articles and observing the word limits for each section, along with any limits on tables, figures, and supplemental data. This also means writing for a certain audience, for example a field-specific journal versus a multidisciplinary one.
Do you know what it doesn’t mean?
It doesn’t mean the science is presented in the best way possible.
Preprints in this category don’t give a damn about formatting or word limits; they just present the work the way the authors envision it. They may contain detailed explanations or examples, and may use colorful language or humor to make a point. These preprints will likely get published eventually, but will require significant editing before submission to meet the journal’s policies.
This category encompasses the majority of biology preprints. Most people who submit preprints simply post an exact copy of their paper as a preprint as soon as they submit it to a journal. These preprints are fairly easy to identify since they look identical to published papers. They will have multiple figures, supplemental data, and a long reference list. These can be difficult to distinguish from the first category.
This category of preprints really bothers me, although I suppose it shouldn’t. These preprints attempt to present themselves as legitimate research, but they will never be submitted to a journal since the authors know they have no chance of being accepted. Although with PLOS ONE you never know (how you doin’?).
These preprints are essentially just blog posts, but with LaTeX formatting and some references. They are usually very short, typically one or two pages, contain no figures, analyses, data, or code, and frankly there are blog posts out there of much higher quality.
So why do these preprints bother me so much? Ironically, it’s the same reason that journals exist and scientists value their brands. I don’t want a preprint that I spent months on sitting next to a preprint that is so bad I would be embarrassed to include it on my blog. It is precisely this type of preprint that leads some to believe that preprints are not serious work, and I don’t want my work to be viewed that way.
I’m all about sharing one’s work, but I believe work should be shared in the correct medium. When I am perusing preprints, I want to be perusing work that would pass peer review. That isn’t a statement about quality so much as about whether the work looks like it might provide some value. It is important to remember that preprints get indexed by Google Scholar, and as a result these works dilute the literature and potentially cause someone to miss important research. They also dilute preprints, but there are so few preprints published that it is difficult to miss an important preprint in your field if you are specifically searching preprints. But this could become a problem in the future.
Look, I understand that it is cool to have your blog post indexed by Google Scholar, and it will likely be read by more people than if you posted it on your blog. But you have to remember that preprints are permanent; they can’t be taken down. As a result, you are forever attaching your scientific reputation to these questionable articles. I would never hire or collaborate with someone who thinks these questionable articles are worthy of being posted as a preprint. If this is work they are proud of, I don’t want to see any more of their work.
These are even worse than the blog posts. They contain scientific content, but are short enough to just be an abstract. They basically just report a single finding or experiment. These are fairly common at Figshare, which does not perform any sort of quality control. I actually don’t mind people using Figshare in this way, since it is meant as a data repository and not as a preprint server; I just wish they wouldn’t be labeled as “Paper”. The problem is that because of their prevalence on Figshare it is hard to imagine anyone perusing Figshare for preprints, and I have had to put restrictions in place on PrePubMed so that only a fraction of the thousands of preprints that are added to Figshare each day get indexed.
Like the blog posts, I do view these as a problem for preprint servers and would prefer they get posted to a repository such as Figshare and labeled as “Figure” or “Dataset” instead of “Paper”.
Some people claim they see a lot of preprints that are scientifically unsound or contain numerous errors. I haven’t encountered these, so I can’t confirm they exist.