Some thoughts on #ASAPBio

State Space
4 min read · Feb 14, 2017


#ASAPBio aims to accelerate dissemination of scientific knowledge by making it easier to share preprints. You can read their mission statement on their website.

I have cursorily followed ASAPBio since its inception. Their view on a “Central Service (CS)” (http://asapbio.org/benefits-of-a-cs) caught my attention, and this post is about that idea. It is obvious, even to me, that the ASAPBio folks are doing good work in an area that is becoming more and more important as the number and complexity of scientific results grows and the classical journal model struggles to scale.

That being said, it is unclear to me why building a CS (ostensibly from scratch) is a good use of this opportunity.

First, I think it is better to build an entirely distributed preprint ecosystem on top of existing infrastructure. For example, “anybody” should be able to run a preprint server — even a small lab. Conferences, symposia, summer schools, lab retreats, etc. could all run their own servers. Such radical decentralization is common in most other technology, and it should, and likely will, become common in the scientific literature too as current solutions continue to face scaling problems. Mirroring and synchronization protocols would keep all content current, versioned, and highly available.
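
To make the mirroring idea concrete, here is a minimal sketch of how one server might pull updates from a peer. Everything here is an assumption: the /records endpoints, the JSON shape, and the in-memory store are placeholders, since no such protocol has been specified yet.

```python
# A minimal mirroring sketch. The endpoints (/records, /records/<id>) and the
# JSON shape are hypothetical; no such ASAPBio protocol exists yet.
import requests

LOCAL_STORE = {}  # record_id -> {"version": int, "document": dict}

def sync_from_peer(peer_url):
    """Pull any records the peer has that we are missing, or that are newer."""
    index = requests.get(f"{peer_url}/records", timeout=30).json()
    for entry in index:  # e.g. {"id": "prot-0001", "version": 3}
        local = LOCAL_STORE.get(entry["id"])
        if local is None or local["version"] < entry["version"]:
            doc = requests.get(f"{peer_url}/records/{entry['id']}", timeout=30).json()
            LOCAL_STORE[entry["id"]] = {"version": entry["version"], "document": doc}

# A real deployment would run this against every peer on a mirror list, keep old
# versions instead of overwriting them, and verify checksums or signatures.
```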

Second, while decentralized, an ASAPBio preprint system should provide a common API for communicating with (and amongst) such preprint servers, much like the WWW runs on a (mostly) standardized IP + DNS + HTTP + HTML + browser stack. A standardized, simple, and accessible API (i.e., one with robust support in many popular programming languages) would enable people to write better search engines, UIs, topical content aggregators, and other unforeseeable apps. If done well, it would enable direct and optimal use of the data generated by the scientific community, data that today sits behind paywalls and in opaque file formats. It would also steer content producers toward providing content in an appropriately consumable form.
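
As a rough illustration of what such a common API could feel like from a client’s point of view, here is a small sketch. The endpoint names, parameters, and response shapes are purely illustrative assumptions, not an existing specification.

```python
# Sketch of a thin client for a standardized preprint-server API.
# Endpoint names and fields are illustrative assumptions, not a spec.
import requests

class PreprintClient:
    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def search(self, query, content_type="preprint"):
        """Metadata/full-text search; every conforming server answers the same way."""
        r = requests.get(f"{self.base_url}/search",
                         params={"q": query, "type": content_type}, timeout=30)
        r.raise_for_status()
        return r.json()

    def fetch(self, record_id):
        """Retrieve one record (manuscript, protocol, dataset) with its metadata."""
        r = requests.get(f"{self.base_url}/records/{record_id}", timeout=30)
        r.raise_for_status()
        return r.json()

# The same client would work against any server in the network, e.g.:
# PreprintClient("https://preprints.example-lab.org").search("CRISPR C. elegans")
```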

Third, this will force ASAPBio and interested parties to think carefully about how to host heterogeneous content and make it a “first-class” citizen of the literature. For example, biological lab protocols, which are currently presented (and repeated) as free-form text in manuscripts, could be represented as controlled-vocabulary documents, compiled into a methods catalog, and referred to from papers. There are likely other data types with specific needs as well. Figuring out this data model is perhaps where the real hard work lies.
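
For instance, a protocol record in such a methods catalog might look something like the sketch below. The field names, and the idea of pointing each step at an ontology term, are my assumptions, meant only to show the flavor of a controlled-vocabulary data model.

```python
# Sketch of one possible record type for a methods catalog. Field names and the
# controlled-vocabulary source (an ontology term ID) are assumptions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProtocolStep:
    action: str       # term from a controlled vocabulary, e.g. "centrifuge"
    ontology_id: str  # identifier for that action in some methods ontology
    parameters: dict  # machine-readable settings: {"speed_rpm": 13000, "minutes": 10}

@dataclass
class ProtocolRecord:
    protocol_id: str  # stable identifier that papers cite instead of re-describing
    title: str
    steps: List[ProtocolStep] = field(default_factory=list)
    supersedes: str = ""  # previous version, so derived protocols stay linked

# A manuscript would then reference ProtocolRecord.protocol_id rather than
# repeating the method as free-form text.
```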

I see ASAPBio as a catalyst for bringing all stakeholders to the table and accelerating a standardization process. I would like its mission to be not only technical (building or hosting preprint servers) but also to continue enabling and protecting free and open dissemination of scientific results. This might initially involve building a community of stakeholders, then building or embracing standards, then nurturing the development of useful software tools, while always protecting unencumbered and speedy access to reliably curated, taxpayer-funded research results.

Ultimately, when ASAPBio is successful, a scientific content generator would download the ASAPBio software build, spin up a server, and install apps — preferred UI, connections to various bots, etc., as necessary. Immediately, any content shared would be available across the “ASAPBioNet” to all users. Apps interested in a specific science domain (“CRISPR in C. elegans”), a specific data type (“all genomic editing sequence results on WNT pathway genes”), or both would begin including the new server in their “crawl” list. Perhaps there would also be apps implementing a new, better, and more open peer-review process. Attribution to authors and funding agencies would be automatic and standardized. Commercial entities could also support the effort and participate, providing valuable and better-targeted products and services. Corporations would not need to reinvent literature-search software (“a bot that tracks glyphosate studies in bees”) and could plug directly into all the data and know-how embedded in the literature. Academic software could also find a place in this network, if it doesn’t have a better place already. And none of this rules out the idea of a Central Service: you can do both in this model.
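
A topical aggregator in that picture might be as simple as the following sketch: keep a crawl list of known servers and filter what they return by tags. The server URLs, the “tags” field on records, and the reuse of the hypothetical PreprintClient from above are all assumptions.

```python
# Sketch of a topical aggregator app on top of the hypothetical client above.
# The server URLs and the "tags" field on records are illustrative assumptions.
SERVERS = [
    "https://preprints.example-lab.org",
    "https://symposium-2017.example.edu/preprints",
]

def collect(topic_tags, client_factory):
    """Gather records matching any of the given tags from every server on the crawl list."""
    matches = []
    for url in SERVERS:
        client = client_factory(url)  # e.g. the PreprintClient sketched earlier
        for record in client.search(" OR ".join(topic_tags)):
            if set(topic_tags) & set(record.get("tags", [])):
                matches.append(record)
    return matches

# collect(["CRISPR", "C. elegans"], PreprintClient) would pull everything tagged
# with either term from across the network.
```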

Some technical questions remain: the need for perpetual archival of everything, the difficulty of building a standard model that covers all types of data and documents (but see, e.g., the semantic web), etc. These questions will need to be addressed by any such project at scale, and they aren’t impossibly difficult today. There is also the problem, with so many servers, of curating content, filtering out junk, and boosting high-quality content to the top. I personally like the MathOverflow model of community policing. (MO is an online Q&A community for research-level questions in mathematics.) I believe that a similar model may be good enough and will scale well. I like the idea of rewarding community members with badges and some sort of reputation score; if done right, it could become a useful additional measure of contributions.
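
To show how lightweight a first cut at such a reputation score could be, here is a toy sketch; the event types and point values are invented for illustration only.

```python
# A toy sketch of MathOverflow-style scoring, just to make the idea concrete.
# The event types and point values are made up for illustration.
POINTS = {"preprint_posted": 2, "review_written": 5, "review_marked_helpful": 10,
          "flag_upheld": 1}

def reputation(events):
    """Sum community feedback into a single score; badges could be thresholds on it."""
    return sum(POINTS.get(kind, 0) for kind in events)

# reputation(["preprint_posted", "review_written", "review_marked_helpful"]) -> 17
```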

Finally, because I am unfamiliar with the #bioRxiv software, it is unclear to me whether it could be leveraged as a starting point — I believe that it could be. At the very least, its governance structure, user base, lessons learned, and brand identity would be a great place to sow new ASAPBio seeds. User adoption is difficult, so any leg up would be useful.

I welcome thoughtful comments here or on Twitter.
