A visit to the Center for Open Science

I’ve never been a fan of critics. It is easier to be a critic than a creator. But criticism is not inherently bad. Critics don’t want whatever product they are consuming to be flawed; they want it to be great, and providing criticism is the only way they can effect change.

This was the goal of my last post about the Center for Open Science (COS) and their preprint search tool OSF Preprints. And to make sure change happened, I trekked down to the COS and met with Jeffrey Spies. After our discussion I now have a broader perspective of the goals of the COS and the obstacles they face.

Because of the success and resulting size of the COS, it is easy to view them as an evil empire that is hogging all of the funding for open science.

Yes, they constantly promote their own products like other companies, but the COS is not a company. It is a non-profit that makes all of its tools freely available and open source. No one at the COS has any equity in the organization, and while I’m sure the developers are paid well, no matter how successful the COS becomes, their only reward is knowing that they are contributing to open science.

I guess one reason I am uneasy about the COS is that it is the only organization of its kind. As we saw with the recent AWS S3 outage, there are dangers in relying on a single service. I don’t want there to be a single COS, I want there to be 10 COSs, 100 COSs. But there aren’t, and we need to hope that the COS continues to be successful. The COS recognizes the importance of its mission, and knows it can’t afford to fail, which is why they must spend a lot of resources on securing funding and ensuring they are following legal guidelines.

Perspective colors everyone's thinking, and as an indexer of preprints I expected a fellow preprint indexer and someone applying to be the Central Service to have an accurate preprint index. But preprints are only a small component of the OSF SHARE database, and OSF SHARE is only one of the many things the COS works on. Even with their team of developers, maybe indexing preprints wasn’t high enough on their priority list to devote any resources to quality control of their indexing.

Our perspective is also affected by who we interact with and what we can see. I mainly check the pulse of the scientific community through Twitter, but Twitter is a small and skewed slice of scientists. Some researchers on Twitter may be experiencing problems with OSF Preprints and not appear to be getting answers to their questions, but there could be many researchers not on Twitter who are happy with OSF’s service, and people may be getting their questions answered by email or phone calls.

There are many ways to contact the COS. I prefer Twitter because it is basically a public form of email: if your question gets answered, other people can see the solution. But Twitter may not be the best way to contact the COS. I think they might want to consider making sure threads on Twitter receive a resolution, but I also understand it takes time to monitor Twitter. I’m not really sure what the best solution is for efficiently answering questions and ensuring others also have access to those questions and answers.

We need organizations like the COS. Even my favorite publisher PeerJ, which offers a free preprint service, gave me lifetime publishing for 99 bucks, and has excellent technology, is a for-profit company, and their technology is not open source. If someone wanted to set up a preprint service with PeerJ’s technology, they couldn’t.

When bioRxiv was launched they had to license an ancient HighWire platform. If OSF Preprints had been around back then, maybe bioRxiv could have just used that for free, and it would be more reliable than what they have (although the OSF platform could use a visual makeover to look more like a preprint server instead of an OSF project).

Side note for those who think bioRxiv is the best thing since sliced bread: my web server has been consistently encountering a problem while indexing bioRxiv. Here are some recent error logs:

What seems to be happening is that when you try to view a new preprint, sometimes you get an “Access Denied” page. If you follow preprint discussions on Twitter you will see many other users also find bioRxiv articles to be temporarily unavailable. This isn’t that big of a deal. I’ve taken some steps to make sure my indexing of bioRxiv is more robust, and I’ll take more steps if necessary.
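One common way to make an indexer tolerate this kind of temporary failure is to retry the request a few times with increasing delays. The sketch below is not PrePubMed's actual code. The names `TemporaryFailure`, `fetch_with_retries`, and the `fetch` callable are all hypothetical, and it assumes the caller's `fetch` function raises `TemporaryFailure` when it sees something like an “Access Denied” page.

```python
import time


class TemporaryFailure(Exception):
    """Hypothetical error: the page was temporarily unavailable."""


def fetch_with_retries(fetch, url, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on TemporaryFailure.

    `fetch` is any callable supplied by the caller (an assumption of this
    sketch); it should return the page content or raise TemporaryFailure.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except TemporaryFailure:
            # Give up after the final attempt and let the caller log the error.
            if attempt == attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function easy to test without actually waiting. A preprint that is still unavailable after the last attempt can simply be picked up on the next indexing run.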

There are also some other minor problems that I encounter with bioRxiv indexing, but this post isn’t about the problems with bioRxiv. I support bioRxiv since it is the primary driving force behind the uptick in preprint use by biologists:

But it is always good to point out these problems, just in case they aren’t aware of them and have the ability to fix them.

We must be careful, however, in assuming that these problems can be fixed. It is dangerous to assume that they are hearing about these problems from users and are simply choosing to do nothing about them. Maybe they are doing the best they can and the HighWire platform is shitting the bed once in a while; I don’t know. When complaining about a service we should at least entertain the possibility that the maintainers are doing the best they can and there could be a reason why things are done the way they are.

Take PrePubMed for example. I don’t allow users to use any non-ASCII characters when searching names. Why? Because ‘Merica, that’s why:

No, the truth is when I developed PrePubMed I didn’t understand Unicode. I still don’t really understand Unicode, but now I have a little better grasp, and I think I might be able to handle names with non-ASCII characters. I’m just waiting for someone to complain about how xenophobic I am for not allowing them to make use of their fancy foreign keyboard. And maybe I’ll fix it, depends on my mood and who asks. Or maybe I’ll just tell them to type their name how God intended.
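For what it's worth, one common approach to this problem is to fold names to an accent-free, case-insensitive form before comparing them, so that a search for "Jose" also finds "José". The sketch below is only an illustration under that assumption, not how PrePubMed works; `fold_name` and `names_match` are hypothetical helper names, and the only library used is Python's standard `unicodedata` module.

```python
import unicodedata


def fold_name(name):
    """Return an accent-free, case-insensitive form of a name for matching."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return stripped.casefold()


def names_match(query, indexed):
    """Compare a search query with an indexed author name after folding."""
    return fold_name(query) == fold_name(indexed)
```

This is lossy by design and has limits: letters such as "ø" or "ł" have no Unicode decomposition, so they pass through unchanged and would need a separate mapping. Folding should only be applied for matching; the name as the author wrote it should still be what gets stored and displayed.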

All of this just points to the need for the Central Service. The physics community has all their preprints at arXiv, which has an API and some interesting offshoots such as GitXiv, Arxiv Sanity Preserver, and SciRate. Why do physicists get these nice things but biologists don’t? Is it because more physicists have web development experience than biologists? Or is it because biology preprints aren’t centralized and there isn’t a single API to access them?

There still seem to be a lot of biologists who think we don’t need a Central Service. They are happy with bioRxiv, and think we should just support bioRxiv. To those people I present this paper:

If the authors connected their ORCID IDs maybe this would be acceptable, but come on, how are we supposed to identify who authored this paper? There’s no way a journal would allow this.

Which is precisely why we need the Central Service. We need standards for posting preprints. We need an API with all the preprints so developers can easily add value. We need plagiarism and other quality control checks so that funders can confidently allow preprints to be included in grants.

Preprints have a lot of skeptics, and I’m sure these skeptics are just waiting for a valid excuse for why we need journals and shouldn’t take preprints seriously. Preprints will at some point become standard practice in biology, but if the Central Service fails we could suffer a major setback.

If I could develop the Central Service on my own I would, but I can’t. We need organizations like the COS that have the expertise and resources to tackle these problems. Although we may not agree on what the Central Service should be, or how the COS devotes its resources, we need to throw our support behind the Central Service and COS. For my part, instead of tweeting to the OSF I’ll be emailing them problems I experience and pointing out what data they are missing. If others have opinions on what the Central Service should be, I would advise sending emails to ASAPbio or sharing your vision in a blog post instead of tweeting into the void.