Why we preprint.
Our lab’s goal is to develop and enable methodological advances and computational systems that make analysis of genomic “big data” as routine in biology labs as PCR. We focus on developing computational methods and performing analyses that are robust, reproducible, and extensible, which we define as:
- robust: our analytical code can be re-run and re-applied to new or updated datasets.
- reproducible: we provide source code to directly replicate the analyses performed in our manuscripts.
- extensible: we provide code in a manner and of a quality that others can build on it to make new discoveries.
Preprints address an important problem.
Preprints allow our lab to provide a timely description of results to accompany source code repositories that we release. For our submitted papers, we have focused on providing a parallel open-source code repository that reproduces the analyses described within the manuscript. This code is provided through a hosted version control system such as GitHub or Bitbucket. To put these methods into biology labs, we’ve also provided webservers that make these methods and results available to users without programming knowledge.
As a result, we’ve found that our methods and webservers are being used, and sometimes these uses are even being published [1, 2], before our own papers are accepted for publication. In part, this may reflect a lengthening peer review and publication process, but it also seems that people are adopting our code more quickly. For example, another researcher contacted us after using ADAGE to analyze a unique dataset shortly after the code was made available. With the widespread use of hosted version control systems like Bitbucket, code is readily discoverable and researchers seem more apt to use existing code. This should be a good thing for science, and it increases the need for preprints.
Preprints are now widely accepted.
Thanks to the hard work of others who recognized the value of preprints before we did [Desjardins-Proulx et al. PLOS Biology, Vale, PNAS], we aren’t alone in using preprints. Preprints are now widely accepted at journals and publishers that focus on both traditional [Nature, Science, PNAS, Royal Society] and open [PLOS, eLife] distribution models.
Our positive preprint experiences.
We just started submitting preprints for our papers last month. It’s really wonderful to be able to talk in detail about work that’s still under review. I recently gave a talk at UC Davis that was blogged before we started uploading preprints. It was wonderful to read about excitement for our work but frustrating that we hadn’t shared the paper even though the underlying source code was available. Now with the preprint online, it’s much easier to provide context for the code.
I’ve also been very happy with how our recent preprints have been discussed. In our analysis of cross-platform normalization methods for microarray and RNA-seq data [R package, evaluation code] we evaluated a few straightforward techniques to address a concern that reviewers frequently expressed to us on papers and grants. Our analysis showed that, for machine learning applications, cross-platform normalization is feasible. We’ve gotten detailed feedback on the preprint. From our experience thus far, readers of preprints seem to be particularly interested in the topic of the manuscript. The comments that I’ve received have been as valuable as some reviews obtained through the traditional peer review system. Addressing these comments, as well as reviewer comments, will make our paper stronger than addressing reviewer comments alone.
Our negative preprint experiences.
While our experiences have been largely positive, I don’t want to leave the impression that it’s now smooth sailing with preprints in biology. I’ll give a vignette of the type of difficulties that you can encounter.
Four molecular subtypes of high-grade serous ovarian cancer have been observed [Tothill, TCGA]. These subtypes were observed through unsupervised clustering, though some samples did not cluster cleanly and were filtered out. Subsequent analyses revealed that over 80% of samples belonged to more than one subtype. We performed a rigorous, systematic, and comprehensive unsupervised analysis of subtypes across five populations including American, Australian, and Japanese women, which represents the largest such analysis to date. We identified subtypes in each population and comprehensively characterized correlations between subtype-specific gene expression both within and between populations. We found that two subtypes were highly robust across populations, but the two remaining described subtypes were not as consistent. We also uploaded the preprint and provided source code.
When we were selecting journals that would be appropriate venues for this work, we identified a candidate AACR journal. AACR did not explicitly define a preprint policy, though it opposed posting on authors’ and sponsors’ websites, and it did not appear in the list of journals by preprint policy on Wikipedia. We expected that AACR, whose mission states that it “accelerates the dissemination of new research findings among scientists and others dedicated to the conquest of cancer,” would allow preprints. After contacting the provided e-mail address [firstname.lastname@example.org], we got the bad news:
Despite difficulties, we’re going to keep using preprints.
From our preprint experiences, I’ve been convinced that this method of dissemination complements the standard peer review process and enhances the scientific value of both. The pre-publication review process provides an additional opportunity for the vetting and review of our manuscripts. My experience has been that these self-selected reviewers have strong and detailed domain knowledge. Their feedback complements the standard review process, where editor-selected peers provide feedback.
What changes can we make to improve preprint acceptance?
As scientists, we serve as both the generators and the gatekeepers of content for publishers. We decide where to submit our manuscripts, and we review manuscripts without compensation. Because these are the levers that we have available to us, I’m going to take the following steps moving forward:
- I will now decline invitations to review manuscripts from journals that lack an affirmative preprint policy.
- I will contact journals to inform them if their preprint policy is the reason that I choose not to submit a paper to them.
UPDATE: On 11/18/2015 I was contacted by AACR in an e-mail indicating that they are considering changing their preprint policy at a leadership meeting early next year. If you’d like to provide input, the e-mail provided email@example.com as a contact point for preprint feedback.
UPDATE 2: On 6/21/2017 AACR contacted me to say that they have changed their preprint policy. Thanks to everyone who contacted them!
Casey Greene is an Assistant Professor of Systems Pharmacology in the Perelman School of Medicine at the University of Pennsylvania and a Moore Investigator in Data-Driven Discovery. He occasionally tweets as @GreeneScientist.