My concerns regarding the ASAPbio Central Service and Center for Open Science

Jordan Anaya
Feb 26, 2017


Note: I have a follow-up post here.

Preprints are clearly the future of scientific communication, but they currently face multiple obstacles. To address these problems, ASAPbio has proposed the creation of a Central Service that at a minimum will aggregate and archive preprints and make them easily searchable and readable, and at most will be a Web 2.0 nirvana that performs complex file conversions and automated screening.

This is currently a topic of conversation because ASAPbio recently released a Request for Applications (RFA) and obtained its first funding commitment. Twitter quickly erupted with misguided anxiety, as it is prone to do. Many people appeared to be under the impression that the Central Service was going to replace bioRxiv and the other preprint servers. I'm not sure whether that is their fault for being misinformed or ASAPbio's fault for not being clearer about the goals of the Central Service.

However, some legitimate concerns about the Central Service were raised. Casey Greene has articulated some of his. As the creator of PrePubMed, and someone who keeps tabs on the other available options for finding preprints, I have my own set of concerns.

To understand my perspective, some background about PrePubMed helps. It currently indexes new preprints bidaily, provides a strict search engine that, like PubMed, tries to identify author names, and offers other tools such as an RSS feed for custom search strings.
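The post doesn't describe how PrePubMed's author matching is implemented. As a rough illustration of what a strict, PubMed-style author search can mean, here is a minimal Python sketch. It assumes PubMed's "Lastname Initials" convention (e.g. "Anaya J"), and the function names `to_pubmed_author` and `author_matches` are hypothetical, not PrePubMed's actual code.

```python
def to_pubmed_author(full_name: str) -> str:
    """Convert 'Jordan M. Anaya' into PubMed-style 'Anaya JM' (hypothetical helper)."""
    parts = full_name.replace(".", " ").split()
    if not parts:
        return ""
    last = parts[-1]
    # Every name part before the surname contributes one initial.
    initials = "".join(p[0].upper() for p in parts[:-1])
    return f"{last} {initials}".strip()


def author_matches(query: str, authors: list[str]) -> bool:
    """Strict match: surname must be equal (case-insensitive) and the query's
    initials must be a prefix of the author's initials, so 'Anaya J' matches
    'Jordan M. Anaya' but 'Anaya JX' does not."""
    q_parts = query.split()
    if not q_parts:
        return False
    q_last = q_parts[0].lower()
    q_initials = q_parts[1].upper() if len(q_parts) > 1 else ""
    for name in authors:
        last, _, initials = to_pubmed_author(name).partition(" ")
        if last.lower() == q_last and initials.startswith(q_initials):
            return True
    return False
```

A design note: this simple version treats the last word as the surname, so multi-word surnames such as "van der Berg" would need extra handling.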

I am not a software engineer. I simply had some interest in web development, read some tutorials, and watched some YouTube videos. I had recently developed OncoLnc and wanted to learn more web development by working on a more complicated site. At the time there didn't appear to be an index of preprints or a clear method to find them other than Google Scholar, so I thought I would try to make a site similar to PubMed and learn as I went. I'm fairly happy with the result.

I specifically made PrePubMed similar to PubMed to try to put pressure on PubMed to index preprints, but there are no indications that this will happen any time soon. As such, I never intended PrePubMed to be a permanent solution for indexing and finding preprints, and simply hoped it would be a temporary way to find them until an institutional solution arose. I also hoped that an index of preprints would lend them more credibility.

Since I launched PrePubMed, the Center for Open Science (COS) has launched its own service for finding preprints.

The COS also announced their intention to apply for the RFA, which is when I started to get concerned about the Central Service.

I suspect the COS often escapes criticism because it is synonymous with open science, and an attack on the COS might be mistaken as an attack on open science. But it has been done before. It is possible to criticize the individual without criticizing the movement.

I support ASAPbio's creation of the Central Service, and hope it replaces PrePubMed so that I don't have to continue maintaining the site. I just worry that the Central Service will discourage competition, and with it the incentive to respond to criticism. If I and others entrust the Central Service to index preprints, we are putting our eggs in one basket, and if that basket turns out to be a single developer such as the COS, we could end up with an unreliable and unresponsive system.

Currently there is no motivation to develop a preprint search engine, which is why I believe the Central Service is indeed needed. For example, what motivation do I have to continue maintaining or adding to PrePubMed? I do implement simple feature requests such as dates in search results or an RSS feed, and I index new life science preprint servers, but I don't have any reason to spend a lot of time learning more web development and adding complex features. I only continue to maintain PrePubMed because people use it, and I believe it is the best option for finding life science preprints.

Similarly, what motivation does the COS have to maintain its preprint search tool and ensure it is accurately indexing preprints? I assume they don't have funding specifically for a preprint search tool, although I do think they have funding for SHARE, of which preprints are a part. As I will describe, it is clear they are not motivated to have an accurate index or to respond to requests in a timely manner. Hopefully these criticisms will serve as a motivation.

OSF Preprints by the COS represents an important advance in this area. They index servers which I do not and will not index, and have a state-of-the-art technology stack with Elasticsearch and a modern front-end framework. Although they built a fancy race car, they forgot to fill the tank with gas.

When they launched their tool back in September people quickly pointed out they were missing large numbers of preprints. When questioned, Brian Nosek responded that they were still in the process of “harvesting”.

That drip must be of molasses, because months later they are still missing large numbers of preprints from bioRxiv and PeerJ Preprints, and appear to be missing preprints from other servers as well. I have five preprints scattered between bioRxiv and PeerJ Preprints, and to this day not a single one is indexed by OSF Preprints.

You can also tell they are missing preprints because bioRxiv should have over 7,000 preprints, but OSF Preprints indicates it has only around 4,000 of them indexed.

Why does this matter?

Just think about it. How can you launch a tool, suggesting it should be the go-to tool for finding preprints, when you know you are missing preprints? What if someone is exclusively using this tool to find preprints? They could be missing extremely important work, such as mine ;). It is borderline unethical to promote a tool as ready for use when you know it is defective.

I could understand if they had somehow accidentally missed some preprints, but when they say it will be fixed and it doesn't get fixed, what are we supposed to think? Do they not think it is important that they are missing preprints? Or is this indicative of their skill as developers?

Not only are they missing preprints from outside preprint servers, but they are missing preprints from their OWN preprint servers.

Even more concerning is that they don't reply when confronted with these problems.

And when a new outside preprint server, preprints.org, asked to be indexed by OSF Preprints, it took them months. Once it was finally indexed, of course they were missing many of its preprints, preprints.org does not show up in their list of providers, and the preprints from preprints.org that are indexed don't contain hyperlinks back to preprints.org.

They actually might not be aware of this problem, but as we’ve seen, it is unlikely notifying them will result in a swift fix.

As is customary for the COS, when it comes to promoting their own products, there is no delay. You will notice they happily list all the preprint servers they host as providers, but conspicuously leave out preprints.org. It's almost as if they don't want you to know that there is a server out there that accepts preprints from any subject area.

When an organization has these clear conflicts of interest, I don’t know if I would feel comfortable having them lead the development of the Central Service. Why wouldn’t they give priority to the preprint servers they host, which they have already shown a tendency to do?

I think the COS can make useful contributions to the Central Service, but if an organization that hosts its own preprint server(s) is involved in the Central Service, there need to be checks and balances to ensure it doesn't use the Central Service to push out its competitors.

And when handing out funding for the Central Service, I also wonder if ASAPbio will consider the financial need of the provider. The COS already has millions in funding; I don’t know if yet more funding should be concentrated in a single organization.

A key feature that ASAPbio wants for the Central Service is for it to be open and for developers to easily build upon the data and tools. In theory, this is also a stated goal of the COS. But in practice not so much.

Take their SHARE API as an example. As an indexer of preprints, I was curious about their API since they index servers I don't. After playing around with it for a while, I managed to get a call that worked. I thought maybe it took me a while to get it working because I'm not a professional developer. It turns out I wasn't alone.

Okay, an N of 2 isn’t much, but consider this. I wrote this API call which used to work. Now you have to remove “.raw” to get the call to work. Umm, did they announce this change somewhere? If it wasn’t for their stated mission you might think they don’t want people to successfully use their API.
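For readers unfamiliar with the ".raw" convention: Elasticsearch mappings have commonly exposed an un-analyzed copy of a text field as a sub-field named "raw" for exact matching, while the base field is analyzed for full-text search. The post doesn't give the SHARE endpoint or schema, so the minimal Python sketch below only builds a query body; the field name "sources" and the function `build_source_query` are hypothetical. It shows why a client hard-coded to "sources.raw" breaks silently if the index mapping drops that sub-field.

```python
import json


def build_source_query(source: str, field: str = "sources",
                       use_raw: bool = False, size: int = 10) -> str:
    """Return an Elasticsearch query body (JSON) filtering records by source.

    The field name 'sources' is a hypothetical placeholder, not SHARE's schema.
    """
    if use_raw:
        # Exact, case-sensitive match against the un-analyzed ".raw" sub-field.
        # Only works if the index mapping actually defines that sub-field.
        clause = {"term": {f"{field}.raw": source}}
    else:
        # Full-text match against the analyzed base field.
        clause = {"match": {field: source}}
    body = {"size": size, "query": {"bool": {"filter": [clause]}}}
    return json.dumps(body)
```

The body would be sent to whatever search endpoint the service documents; nothing here assumes a particular URL.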

Couldn’t they have a running blog describing how to use their tools, and any changes they make? I wrote a blog post describing how PrePubMed works. I can only hope the Central Service has an easier learning curve for developers wanting to use the tools they develop.

Preprints are essential for speeding up scientific communication, democratizing research, and ending our reliance on journal brands to judge research, a reliance that has been the single most detrimental development in modern science history.

Competition is good for fostering innovation, but we need to work together at all levels to advance the preprint movement. For example, bioRxiv is the most popular biology preprint server, but I much prefer the technology of PeerJ. Look at this: you can click on your metrics and see all of your referrers.

I wish PeerJ would share their technology with bioRxiv (bioRxiv also has a tendency for articles to not be accessible for short periods of time).

I know the Central Service is only meant as an aggregator, but I think the Central Service should require providers to meet a minimum technology requirement. If the provider cannot meet that requirement with their own technology, the Central Service should provide them with the technology (preferably with something like PeerJ’s). Seriously, can you get detailed metrics from any publisher other than PeerJ?

And the COS already has good technology for storing and searching preprints; I just don't understand why they are missing so many and don't seem bothered by this.

So if all the Central Service did was coordinate the sharing of technology that already exists, that would be a huge step forward. If every preprint server were as nice as PeerJ, and every single preprint were indexed in the COS Elasticsearch database, scientific publishing might actually start to look modern.
