Peer review is about drawing informed but often arbitrary lines between what can and cannot be shared.

A modern vision for peer review

Amy J. Ko
Bits and Behavior
13 min read · Dec 8, 2019


For much of the 18th and 19th centuries, academic publishing was a wild west of editorial practices. Some journals used editors as gatekeepers, while others practiced early forms of peer review, with editors asking for opinions on work submitted for publication. The function of peer review was wildly different from what it is today. It wasn’t until the early 20th century that scientists started formally distributing submissions for peer review, asking other scientists with appropriate expertise to evaluate them. As this practice spread, peer review ultimately became institutionalized across academia and its publications. And, of course, academia and the world more broadly now treat peer review as the gold standard for vetting the quality of scholarly work.

Of course, peer review is far from perfect. Like most academics, I like to complain about its failings. I’ve written previously about my peer review wishlist, enumerating some of its many problems:

  • Expertise is scarce, which often leads to reviews that poorly judge the quality or significance of a discovery.
  • Time is scarce, which often leads to reviews that are rushed, poorly considered, or terse.
  • Review criteria are often unspecified, which leads to wild variation, unreasonable requests, and unpredictable outcomes.
  • Reviewers can be mean, which discourages new researchers from continuing careers in academia.
  • Software for peer review is often unusable, making it harder to have meaningful discussion with reviewers, metareviewers, associate editors, and editors.
  • Senior scholars gate-keep the contributions of newcomers, applying old standards of evidence and old ideas to new perspectives and evidence.
  • Reviewers often set unreasonably high bars to protect the reputation of conferences and journals, slowing progress.
  • Peer review only happens once, prior to publication; all other critique happens in publications that cite the work (where it is hard to find), in private (hidden from public view), or through retractions.

While my prior post suggested many process changes that might improve peer review (explicit criteria, reviewer training, more liberal publishing), I think those are only incremental improvements. Lately I’ve begun to think we need more radical changes to the entire paradigm of peer review. So in this post, I’m going to share a vision for peer review that throws almost all of our practices out, going back to the essence of peer review, while leveraging our modern, connected world.

Peer review is about evaluating specific minimum criteria for publication, but it is often used for more.

What is peer review for?

In essence, peer review has always been about a few basic things:

  1. Verifying the soundness of a publication’s methods for answering a research question,
  2. Ensuring the answer can be replicated with similar methods,
  3. Ensuring the work builds upon prior work, so there is an accurate record of the history of discovery, and
  4. Ensuring the work is comprehensible to a broad audience of scholars, if not more broadly.

Evaluating these criteria involves understanding the research questions the publication poses, the methods used to answer them, and the validity of the analysis. Flaws in any of these, if left unchecked, can erode trust in science and scholarship. After all, what most distinguishes science from other forms of discovery is our careful attention to validity and sound reasoning.

Of course, as we all know, no single paper can be “true.” In science and every other form of scholarship, truth—whatever that might mean for a discipline’s collection of epistemologies—emerges over time, after people discuss, debate, replicate, argue, and verify. Discovery is a collective effort that happens over time and across academia, not a solo effort that happens one paper at a time, siloed in disciplines.

There are many other criteria often evaluated as part of peer review, but that I don’t believe are essential to the four goals of peer review above:

  • Relevance. I believe that researchers are quite poor at judging whether anyone will care about a discovery in the future. We try to predict this with conference paper awards in some disciplines, but we’re usually wrong: highly cited work is often a surprise. And why are we asking peer reviewers to predict who will be interested anyway? Just let people read the work and decide for themselves. Lastly, relevance is an artifact of the notion of a “venue” for publication, such as a journal or conference. In our connected world where search is the front door to finding new work, not centralized curation, what is the value in segregating publications by venue? We have many more options for curation.
  • Writing. This clearly matters, as it influences other important criteria like soundness and replicability. Incomprehensible writing, or even writing that’s missing some key details, shouldn’t be considered ready for publishing. However, writing is implicit in an assessment of soundness and replicability. Many judgements about writing that appear in peer review are really just matters of opinion or style (complaints about section ordering, preferences about additional topics to discuss, arguments about passive voice, etc.).
  • Significance. Discovery should move forward. But is significance really essential for publication? The replication crisis would suggest that only valuing novelty is problematic, and that we actually need more work that verifies rather than just advances new ideas. And can other researchers really judge how significant something is? Major awards in science often come decades after a discovery because everyone knows we can’t judge significance until much later. Why not publish all the incremental work too? Finally, much of the judgement of significance is about whether a work is “good enough” for a high-status journal or conference, which is much more about reputation than discovery.

In essence, I view peer review far more narrowly: as judging validity and clarity, leaving other qualities to be judged by the future collective use of publications.

We have the internet now. Separate publishing from peer review, integrate community, open the process, and unify all of academia.

A modern vision for peer review

If the core purpose of peer review is to verify soundness, replicability, grounding in prior work, and comprehensibility, what would a modern peer review process look like that focuses only on these, while avoiding all of the problems I enumerated earlier? To begin, we have to start from some different principles:

  • Openness. Current peer review models use a gatekeeping model, which leads to delay, discouragement, and concentration of power. Starting from a principle of open publishing would ensure that anyone can share anything—just as anyone already can on the internet—and then commence with peer review. Like arXiv.org, this would make sharing immediate, eliminating delays. But unlike arXiv, peer review would be built into the platform.
  • Transparency. Current peer review models keep peer review private, hiding the critiques that reviewers make, hiding earlier versions of publications, and hiding editorial judgements. All of this confidentiality leads to many of the problems I listed earlier, disincentivizing both high quality initial submissions (since authors face no penalty for sharing unpolished work) and constructive critique. We should pursue the opposite principle, keeping the entire peer review process open and archived, incentivizing high quality work and high quality review.
  • Unity. Current peer review models, which are aligned with specific journals and conferences, segregate discovery, reinforcing unhelpful disciplinarity. We should return to the original state of academia, where there was a single, unified community, with no artificial barriers between one community and another, where new work builds upon prior work, regardless of its community of origin.

To implement these principles, I envision a single digital library for all of academia, where anyone can submit a paper for publication, and anyone can review submitted works. It would be the authors’ responsibility to solicit reviews, but there would also be many incentives for unbiased, high quality review. The reviews themselves would be public, and the judgement of whether something is considered “peer reviewed” would be collectively determined, incremental, and revocable. Let’s call this hypothetical digital library Academe.
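
To make this slightly more concrete, here is a minimal sketch of the kinds of records a platform like Academe might store. Everything below (the type names, the fields, the idea of a derived peerReviewed flag) is an illustrative assumption about a hypothetical system, not a design.

```typescript
// Hypothetical record types for the imagined Academe platform. All names,
// fields, and rules here are assumptions for illustration only.

type ExpertiseTag = string; // e.g., "Interview", "Multivariate Regression"

interface CriterionJudgement {
  name: string;        // e.g., "Soundness", "Replicability", "Groundedness"
  definition: string;  // drawn from a shared library or written by the reviewer
  met: boolean;        // the reviewer's judgement against this criterion
}

interface Review {
  reviewerId: string;
  reviewerExpertise: ExpertiseTag[];
  judgements: CriterionJudgement[];
  publishedAfterMetareview: boolean; // reviews go live only after a metareviewer approves them
}

interface Submission {
  authorIds: string[];               // hidden from readers until the paper earns its badge
  conflictIds: string[];             // accounts Academe would bar from reviewing
  requiredExpertise: ExpertiseTag[]; // set by authors, extendable by reviewers
  versions: { html: string; postedAt: Date }[]; // full, visible version history
  reviews: Review[];
  peerReviewed: boolean;             // recomputed whenever reviews are added or revised
}
```

The key design choice the scenario implies is that peer-reviewed status is derived state: it can be earned, lost, and regained as the body of reviews evolves.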

How would this work? Let’s walk through a hypothetical example:

My Ph.D. student and I had a manuscript we were preparing for review. We knew it was going to be reviewed in public, so we were very diligent about making sure everything was just so, because we wanted it peer reviewed quickly, without any revision. That included ensuring the paper wasn’t too verbose, because otherwise we wouldn’t be able to find reviewers willing to read it. But it also had to have enough detail to be replicable and sufficiently grounded in prior work.

We published the manuscript to Academe. We entered our names, affiliations, and conflicts so Academe could disallow conflicted reviewers from reviewing. We also entered a set of expertise tags (Interview, Survey, Multivariate Regression, Communities of Practice, Interest Development) to indicate the expertise required to review the paper. The manuscript appeared immediately online, open to review, but with no peer reviewed badge, because we did not yet have at least 3 reviews from peer experts. However, to avoid implicit bias, the manuscript appeared anonymously to everyone but Academe itself, so that readers and potential reviewers could not know our identities.

As part of submitting, we selected two dozen potential reviewers who were tagged with the expertise tags we chose. Because the Ph.D. student and I had high Academe karma—I because I review a lot, and she because newcomers are granted enough karma to last the duration of their Ph.D. program—we decided to spend a lot of that karma on review requests, because we knew who was qualified to review and wanted to get the right people. Each person we selected received a notification about a request to review, just as they would in current peer review models for a journal or conference. Of the two dozen we invited, only two thought the article was worth their time to read, and the rest declined. And of those two, only one seriously considered it, because she was low on Academe karma and needed more for a manuscript she was preparing. She accepted, and we had one review pending.

We were still short two reviewers, but we remained hopeful that others would stumble upon our article in the #computingeducation channel. After a few days, two more people found the article when browsing the channel’s feed for papers to review to increase their karma, and they decided to review. Over the course of a month, all three reviewers read the work and, as part of the review form, were asked to explicitly declare the criteria they were using to evaluate the work. They chose some criteria from an existing library of criteria: “Soundness,” defined as a logical connection between the research questions and analyses; “Replicability,” defined as the ability of others with expertise comparable to the authors’ to replicate the results; and “Groundedness,” defined as the extent to which the paper adequately addresses prior work. Two of the reviewers also chose their own unique criteria. One chose “Novelty,” defined as how surprising the results are to someone familiar with the literature, and the other chose “Provocativeness,” defined as the extent to which the paper might engender debate in the #computingeducation channel. The three reviewers wrote their evaluations of the paper against each of their criteria, decided for each whether the paper met it sufficiently, and then submitted.
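
As a sketch, one of those reviews might look something like the data below. The field names and the wording of the definitions are assumptions based only on the scenario, not a real schema.

```typescript
// A purely illustrative review object, using criteria from a shared library
// plus one reviewer-defined criterion, as in the scenario above.

const criteriaLibrary: Record<string, string> = {
  Soundness: "A logical connection between the research questions and analyses",
  Replicability: "Others with expertise comparable to the authors' can replicate the results",
  Groundedness: "The extent to which the paper adequately addresses prior work",
};

const exampleReview = {
  reviewerExpertise: ["Survey", "Multivariate Regression", "Interrater Reliability"],
  judgements: [
    { name: "Soundness", definition: criteriaLibrary.Soundness, met: true },
    { name: "Replicability", definition: criteriaLibrary.Replicability, met: false }, // revision requested
    { name: "Groundedness", definition: criteriaLibrary.Groundedness, met: true },
    {
      // A criterion this reviewer defined themselves; informative, but not
      // required for the peer review badge.
      name: "Novelty",
      definition: "How surprising the results are to someone familiar with the literature",
      met: true,
    },
  ],
};
```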

But the reviewers’ reviews weren’t posted publicly yet. To actually earn their Academe karma, other members of the #computingeducation community, also in search of karma, needed to act as metareviewers, reviewing our three reviews for quality. Over the course of a week, the three reviews were evaluated, and two were deemed sufficiently sound and constructive to publish. The other review was a little nasty, so the metareviewer provided some feedback and asked for revisions. The third reviewer eventually improved their review, and it was published.
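
The karma economy implied here could be as simple as the following sketch. The costs and rewards are invented numbers; the one rule taken from the scenario is that reviewing only pays off after a metareviewer approves the review.

```typescript
// A minimal sketch of the karma flow described above. All numbers are
// invented for illustration; only the ordering (metareview before credit)
// comes from the scenario.

const REQUEST_COST = 5;      // karma an author spends per targeted review request
const REVIEW_REWARD = 10;    // karma a reviewer earns for an approved review
const METAREVIEW_REWARD = 2; // karma for evaluating someone else's review

function spendOnRequests(authorKarma: number, numRequests: number): number {
  const cost = numRequests * REQUEST_COST;
  if (cost > authorKarma) {
    throw new Error("Not enough karma for that many review requests");
  }
  return authorKarma - cost;
}

function creditReviewer(reviewerKarma: number, approvedByMetareviewer: boolean): number {
  // Reviews sent back for revision earn nothing until the revised review
  // is approved and published.
  return approvedByMetareviewer ? reviewerKarma + REVIEW_REWARD : reviewerKarma;
}

function creditMetareviewer(metareviewerKarma: number): number {
  return metareviewerKarma + METAREVIEW_REWARD;
}
```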

Only two of the reviewers believed the paper had met the Soundness, Replicability, and Groundedness criteria, the minimum requirement for peer review. The third reviewer requested revisions related to Replicability, asking for more details about our methods. Additionally, they noted that the paper also required expertise in Interrater Reliability, which meant that the paper could not be peer reviewed until it had also been reviewed by someone with that expertise. Fortunately, one of the other reviewers already had that expertise, so another review wouldn’t be required.

The requests were reasonable, so we quickly prepared a revision including the additional details and posted it to Academe. The HTML submission format allowed the work to be easily authored, easily revised, and easily compared to the original submission, showing a clear history of changes for reviewers and future visitors. Our original submission and its reviews became archived but still visible, and the new version, with no reviews, became live. All three reviewers were notified of the revision and could quickly see the delta between the previous and new submissions. The reviewer who had noted the paper didn’t meet the Replicability criterion updated their review, granted the criterion, and submitted. The other two reviewers read the revised review and made minor updates to their own reviews as well. All three reviews were live, all indicating that the paper met the minimum criteria for peer review in the channel.

Immediately after, three things happened. Our names and affiliations were revealed to all future viewers of the paper on Academe. My Ph.D. student and I received a notification that our paper had met the minimum criteria for peer review. And a big beautiful badge appeared on the page for our publication, so that everyone knew it had been peer reviewed. The Ph.D. student and I added it to our C.V.s, and shared links to it on social media.

Over time, as more people read our work, more people wrote reviews. Some reviews added additional criteria. Others disagreed with prior reviewers’ criteria. Each time a new review was posted, all the previous reviewers were notified so they could reconsider their own reviews. At one point, we even lost our Replicability criterion, and therefore our peer review badge, because someone tried to replicate our findings but was missing a critical detail. We once again revised the paper, and regained the peer review badge. The paper had a long, vibrant life over the course of a decade, and was revised in minor ways to ensure a more complete archive.
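
Putting the scenario’s rules together, the peer reviewed badge might be recomputed with something like the sketch below: at least three published expert reviews, every one of them granting the minimum criteria, and the reviewers collectively covering the paper’s required expertise. This is one possible reading of the scenario; the names and threshold are assumptions.

```typescript
// One possible reading of the badge rules in the scenario above; the names,
// threshold, and rule details are assumptions, not a specification.

const MINIMUM_CRITERIA = ["Soundness", "Replicability", "Groundedness"];
const MINIMUM_REVIEWS = 3;

interface LiveReview {
  reviewerExpertise: string[];
  criteriaMet: string[]; // names of criteria the reviewer judged as met
}

function isPeerReviewed(liveReviews: LiveReview[], requiredExpertise: string[]): boolean {
  if (liveReviews.length < MINIMUM_REVIEWS) return false;

  // Every published review must grant all of the minimum criteria.
  const minimumGranted = liveReviews.every(review =>
    MINIMUM_CRITERIA.every(criterion => review.criteriaMet.includes(criterion))
  );

  // The reviewers' combined expertise must cover everything the paper requires,
  // including expertise tags added by reviewers (like Interrater Reliability).
  const covered = new Set(liveReviews.flatMap(review => review.reviewerExpertise));
  const expertiseCovered = requiredExpertise.every(tag => covered.has(tag));

  return minimumGranted && expertiseCovered;
}
```

Because this would be recomputed whenever a review is added, revised, or withdrawn, the badge can be revoked and later regained, as in the Replicability episode above.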

This vision for peer review has many potential benefits beyond this scenario, especially for anyone viewing a paper who was not part of the review process:

  • The platform could support pre-registration of studies, creating a single place to share hypotheses and study designs that are later tested.
  • Name changes and other minor revisions and corrections could be easily supported.
  • Articles would have entire version histories, rather than just the single, unchangeable versions in current digital libraries.
  • Visitors to an article could see who reviewed it, what their reviewing history is, and what their expertise is, to make an interpretation of the quality of peer review.
  • Visitors would see the entire discourse about the paper and its strengths and weaknesses, not only at the time of publication, but over its entire history of being cited.
  • It would be easier to determine what work is popular, contentious, or uninteresting, because the activity of sharing work and the activity of publishing it would be in the same place.

Academe would therefore be more than just a peer review platform; it would be a new platform for transparent scholarship.

What about…

This platform wouldn’t solve everything. In fact, it would raise some new questions, amplify old problems, and lead to dramatic change in the conduct of scholarship in academia and beyond.

  • What about conferences? In computing, we’ve long used conferences as a place for archival work. This would completely eliminate that practice. What would we use conferences for? Anything we want, other than boring conference presentations: curating and discussing the best work from Academe, planning new research, and building community, like the rest of academia does. We might even have fewer conferences, better utilizing travel funding and reducing carbon emissions.
  • What about journals? They would shut down. Yes, that would make it harder to decide what work is good, because we wouldn’t be able to use journals (and conferences) as proxies for quality. In my view, good riddance. Let the work stand on its own, and let people read it, and reviews of it, if they want to judge its merits.
  • What about tenure and promotion? Nothing really needs to change here. We’d find exciting new ways of signaling rigor, quality, and impact, like how many reviews a paper attracts, or the tags that respected researchers give to papers. If anything, merging peer review and academic discourse into one platform would create many more and better signals, rather than eliminate them.
  • What about older publications? I’m sure there’s a way we could manage the existing archive. It would involve lots of really complicated IP negotiations with existing publishers to allow content to appear in the new unified platform. It might unfold like the move from physical to digital media for music, with labels slowly opening their catalogs and finding new ways to monetize their content (or perhaps better yet, just open their content and shut down).
  • What about publishers? Do we really need them? Everything would be open and free, as scholarship should be. We’d have to start a global not-for-profit that would have some serious bandwidth bills, but I suspect that charging a nominal publication fee of $1 per paper, at millions of publications per year, would be more than enough revenue to pay for the traffic and a team of designers and engineers to maintain the site (a rough back-of-envelope follows this list).
  • What about new researchers? They might have a harder time getting their work reviewed because reviews would no longer be guaranteed by conferences and journals, but that’s what advisors and collaborators are for. Independent researchers might have the hardest time with this system, but they could always review others’ work to gain status.
  • What about gaming the system? There are some interesting challenges here about junk science. It’d be easy for people in a system like this to post something, make some fake accounts, and write reviews that confer the peer reviewed badge. I’m sure we’d invent some good ways of detecting fake reviews and certifying researchers as reputable. But it’d probably be the same mess that it is now, with fake journals publishing fake work. Perhaps bringing that into a single system would make it easier to detect.
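
As for the rough back-of-envelope promised in the publishers item above: every figure here is an invented assumption; the point is only that the arithmetic is plausible.

```typescript
// Made-up figures for a rough feasibility check of the $1-per-paper model.
const papersPerYear = 3_000_000;        // "millions of publications per year"
const feePerPaperUSD = 1;

const annualRevenue = papersPerYear * feePerPaperUSD; // ~$3M

const teamSize = 10;                    // designers and engineers
const costPerPersonUSD = 150_000;
const bandwidthAndHostingUSD = 500_000; // mostly serving HTML and PDFs

const annualCosts = teamSize * costPerPersonUSD + bandwidthAndHostingUSD; // ~$2M

console.log(annualRevenue > annualCosts); // true, under these assumptions
```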

There’s no better time than now

I’m under no illusion that any of this change would be easy. In fact, I’m highly skeptical that such a monolithic vision would ever be implemented. But science, the broader enterprise of scholarship, and the academic institutions that are most concerned with supporting these endeavors, cannot be complacent. We need to invest in making our practices more transparent, more trustworthy, and more open to the world. We can’t let short-term goals, like yet another publication, get in the way of our long-term need for better publishing infrastructure. Let’s no longer invest in small, incremental changes to improve our current practices. Let’s build the future we need, even if it means radical change. What’s academia for, if not pursuing radical ideas?


Amy J. Ko
Bits and Behavior

Professor, University of Washington iSchool (she/her). Code, learning, design, justice. Trans, queer, parent, and lover of learning.