My peer review wishlist

Amy J. Ko
Published in Bits and Behavior
Nov 12, 2018

It’s been about a month since I’ve posted here. Why? I’ve been reading instead of writing. I’ve been reviewing SIGCSE papers, ICSE papers, IEEE Transactions on Software Engineering papers, and ACM Transactions on Computing Education papers. I’ve been editing my 7 wonderful doctoral students’ submissions. I’ve been guiding two students’ NSF graduate research fellowship applications and another undergraduate’s graduate school applications. And I’ve been reading a few dozen CRA Outstanding Undergraduate nominations.

Now, some people don’t like reviewing. I happen to love it, especially peer review! I (occasionally) learn about new ideas, I help my academic community improve its scholarship, and I get to solve a very specific kind of puzzle: what can I say in 500–1000 words that will transform the author’s perspective on their own work? The teacher in me savors the chance to infer from someone’s writing how they’re thinking, and then find just the right words to shift their thinking toward my own research aesthetic. I don’t always succeed (especially in busy months like this), but I usually have fun doing it.

Unfortunately, many of the researchers in my communities don’t have as much fun reviewing. And I don’t blame them: we get short timelines, the volunteer work is mostly thankless (due to anonymity), and there’s hardly any interaction with authors or other reviewers, aside from highly asynchronous and highly impersonal discussions on poorly maintained, poorly designed groupware like PCS or EasyChair (no slight against the developers who maintain these: you have hard, highly under-resourced jobs just keeping these sites running).

After 15 years of reviewing, however, there are a few things that I think might make people actually enjoy it, and would make me want to do even more of it. Here’s my wishlist for academic peer review.

Evaluate against explicit criteria

Nothing drives me more crazy than a box that says “Write your review here.” These completely unguided peer review processes fail in two ways: they result in widely varying opinions about research, and they fail to signal to new reviewers what aspects of research a community values or how to evaluate them. There isn’t just one kind of reviewing; there are infinitely many, and it’s an editor’s job to narrow the scope.

The International Conference on Software Engineering has made some progress on this recently, adding specific criteria such as:

  • Soundness. Are the work’s claims supported by the arguments and evidence presented?
  • Novelty. How much does the work advance our knowledge?
  • Clarity. How clear is the writing and presentation of the work?
  • Replicability. With some domain expertise, could the work (technical, empirical, or otherwise) be replicated?

What’s great about these is that they reduce many forms of implicit bias in the review process, compelling reviewers to address each dimension. I’ve used them to structure my reviews, and to make sure I’m being fair to what ICSE wants to select for.

Now, the ICSE process still leaves a lot to be desired. These criteria don’t apply to all types of scholarship, which excludes some types of novel work. Reviewers have widely varying ideas about these criteria, which still results in a lot of diversity in their assessments. And there’s no guidance from program chairs about where the “bar” is: could a paper meet three of these pretty well, but fail at novelty, and still be published? That last gap is pretty easy to fix: post these criteria in the call for papers so everyone knows what they’re being judged against.

Despite these limitations, I think all peer review processes should have explicit criteria. Communities should come together to craft and evolve them over time.

Train reviewers

One of the issues with explicit criteria above is that different reviewers don’t have consistent ability to judge each of them. Why not? Because we don’t teach researchers to review research.

I’ve always found this to be the biggest gap in doctoral education. Ph.D. students need as many opportunities to practice reviewing research against explicit criteria as possible, but few are invited to review until they’re senior. And if they are invited to review, conferences and journals rarely give them any training.

I remember the first time I reviewed for a conference. It was a CHI paper, probably in 2006 or 2007, just before I’d finished my Ph.D. I got a box that said “Write your review here.” And my first thought was, “What am I supposed to review? Everything? Just share my opinion? I have a lot of opinions, are you sure all of them matter?” Of course, they did all matter, and so did everyone else’s, and so the paper got wildly varying scores, which probably only confused the authors. The wildly varying reviews I had always received at CHI (and still do to this day) continue to be disorienting.

As a Ph.D. advisor, I fail at training my students too. When do I teach my doctoral students to evaluate others’ work? Whose work do I have them evaluate if reviewing is confidential? Reading groups are one place to do this, but they’re often focused on reading the best and most relevant work in the field, not on the work that needs help. Most reviewing is of papers that need help.

Personally, I don’t think it’s an advisor’s job to decide the criteria by which papers get reviewed. I think that’s an academic community’s job, and that conferences and journals are where we make these criteria explicit and train on them. The next time I serve as program chair of a conference or editor of a journal, I’m going to run training for all reviewers.

Publish everything, including reviews

When I tell people about this idea, they think I’m crazy. Before you think the same, hear me out.

Here’s the basic idea: when we submit something for publication, we should conduct our normal review processes, but then publish anything that authors want published, on the condition that all reviews of their work are published as well, and open to further public review by everyone in academia. Alternatively, the authors could decide to withdraw their work and improve it further. The future of peer review should be open, ongoing, and transparent, with every work in the world subject to (moderated) eternal critique. For example, I should be able to go back to my older papers published in the ACM Digital Library and post a comment, saying, “This paper is rubbish. This other paper I published has a much better argument.” And I should be able to do the same to my peers. And everyone, including the public, should be able to view our comments on all published work, allowing scientific communication to proceed in public.

Why do this?

  • First, we’re past a time when we have to worry about printing costs. Storage is cheap and our documents are small.
  • Second, we already read and evaluate everything. What’s the point in doing all of that writing and reviewing, only to let 75% of it go unread by academia?
  • Third, why artificially gatekeep work we believe has value, even if the work isn’t perfect?
  • Fourth, by rejecting so much work, we reject our communities’ ideas and efforts, which is demoralizing.
  • Fifth, the opacity of peer review is harming the public’s opinion of science. Show them how the sausage is made.

Most arguments against the vision above boil down to this: academics want to use conference and journal publications to signal merit, in addition to furthering research. “We must reject papers in order for accepted papers to have value,” they say, “Otherwise, how would we know what work is good and bad?” The ridiculous thing about this argument is that in neither the short nor the long term do we exclusively use publication to decide what work is good and bad. We use letters of recommendation, we use best paper awards, we use 10-year most influential paper awards, we look at citations. We all know that conferences and journals — even the top ones — are full of papers that aren’t that great, because our peer review processes aren’t that great. There’s little harm in losing one signal when we already have so many.

Moreover, we’d gain new signals. What kind of dialog about a published work has ensued since publication? Who’s talking about the work? How are they talking about it? Just imagine how evaluating a faculty candidate would change: it wouldn’t just be a list of published papers, but instead, a vibrant set of threads about the meaning and significance of a publication — or silence, which says something else.

Other arguments against this vision focus on fears that people won’t review anymore if their reviews become public. To me, this is just a sign that we aren’t properly educating public intellectuals. Are our egos so fragile that someone seeing our well-reasoned critiques would shatter our reputations? Or are we afraid that our critiques aren’t so well-reasoned?

Obviously, there’d be a lot to figure out to make such a model work. If there’s any community that can do it, it’s computing and information sciences, especially human-computer interaction.

Bandwidth limits our vision

Of course, implementing any of the changes above requires time. And given that we’re already volunteering, time is limited. Some of us, especially tenured professors like me, need to commit some of our time to make these changes a reality. Let’s show the rest of academia what peer review can be. Who’s with me?
