A very tall stack of documents, hoisted by a group of 10 very small stick figures, each using a different strategy to keep the stack lifted. Some use their strength, some use tools, some use a stool, some help others be tall enough.
We can only solve this problem together.

Sustainable peer review via incentive aligned markets

Amy J. Ko
Bits and Behavior

--

For those stumbling upon this, note that peer review is not my main jam. That would be design-oriented scholarly envisioning about more equitable computing futures. Consider this a passionate digression into academic wonkery in service of that mission.

I occasionally write about peer review (A Modern Vision for Peer Review, My Peer Review Wish List). For those who’ve read my thoughts, you already know I think it’s very broken and have lots of ideas for how we might fix it. And I’m not alone: it’s been a perennial topic since the earliest days of science, and a site of collective imagining around many different competing goals of fairness, discovery, equity, and learning. And it has long been a site of experimentation, with new reviewing processes, new platforms, new training, and new policy.

But as much experimentation as there is, in my 20 years of peer review experience, I generally don’t see experiments that tackle the root cause of most problems in peer review: the incentives.

The incentive problem is pretty simple, and has its roots in academic institutional goals and the broader social structure of collective discovery. Researchers are incentivized to make progress and disincentivized to evaluate progress. There are many ways this misalignment has been characterized: research over service, publish or perish, etc. No matter how we describe it, the basic economic dynamic is the same: we write about more discoveries than we care to read, creating a perpetual labor scarcity that taxes our limited time.

This isn’t a property of culture, or practices, or training, or other variations in the implementation of peer review. It is inevitable: until the volume of work we create is linked to the volume of work we evaluate, we will always struggle with scarcity.

Misalignment Problems

This fundamental misalignment between reading and writing causes numerous problems in peer review:

  • Difficulty recruiting expert reviewers because they have net negative incentives to agree. This is of course the most fatal consequence of the misalignment, because it defeats the whole purpose of peer review.
  • Review quality is often poor because there is no consistent incentive to write high quality reviews, other than reputational risk, which appears to be insufficient.
  • Editors and program committee chairs can’t ask for improved reviews because they have little leverage over their volunteer reviewers: they are desperate for any reviews at all.
  • Reviewers often say racist, sexist, hegemonic things because there is little consequence when they do. This is due partly to anonymity, but also because editors can’t push back: if they do, they might not get the review and have to spend even more time seeking volunteer reviews.
  • Reviewer skill is often lacking because of the lack of learning contexts for growing it. And those learning contexts don’t exist because there’s no incentive to create them.
  • Some people end up doing far more work than they create because of a sense of duty, or worse, obligation, out of fear of retribution from more senior people.
  • Graduate students are exploited for review labor because they have little power to say no in times of labor shortage.
  • Few people want to take editorial roles because most of the labor is begging others to do thankless, disincentivized work.
  • Some researchers indiscriminately generate peer review labor because there is no cost to submitting (myself included). But they often do not do an equal amount of labor to compensate, or cannot, because they aren’t invited to review, or are otherwise overcommitted.

Some might quibble that there are some incentives to review. And there are: reading a community’s work, being recognized for your expertise, a sense of duty and altruism, and the reputation bump of being selected for an editorial board or conference program committee are all modest incentives. But most of them are tiny relative to publishing incentives, and come with a cost of overwork, burnout, and guilt, because time is a fixed resource for each individual.

In the case of senior faculty like myself, it is even worse, because the incentives basically disappear. As a full professor, I don’t need to read the work (because I see what’s coming through peer review in fundraising), I don’t need a reputation bump, and my sense of duty and altruism is more than met by my other service and administrative commitments. And yet, my lab is more productive than ever, creating more reviewing labor than ever.

So if these modest incentives don’t work, why has peer review lasted centuries without collapsing? In essence, our current incentives mirror a Ponzi scheme. Senior researchers, whether they intend it or not, cannot be said no to, because they offer the promise of resources such as jobs, promotion, and tenure inherent to our rank-based hiring and promotion structures. But senior faculty themselves can easily say no without consequence, even though we often have the expertise. This dynamic is a socially destructive one, because it requires senior people to demand free labor, and requires junior people to say yes to maintain opportunity and employment, at least until they themselves become senior, and inevitably exploit those below them, even when they don’t want to. Everyone engaged in peer review knows this to be true; some are more comfortable with exploitation being at the center of this work than others. And I don’t see all of academia abandoning rank any time soon.

Ponzi schemes eventually fail, and I believe we are seeing peer review’s Ponzi scheme fail, triggered by the pandemic. Most of those in editorial positions, myself included, have watched this unsustainable dynamic slowly crumble. The number of declines has increased by an order of magnitude and has not gone back down. Review quality, in my experience as a conference program chair and journal editor, has gone down as senior reviewers focus on publishing and administrative crises and junior researchers are further exploited to fill the gaps. We are watching the slow failure of a system that has always been unsustainable, broken by a collective realization of its exploitative unfairness. And I know, from countless emails from declining reviewers and editors, that many are loath to return to the previous status quo, and sometimes wracked with guilt at having to say no. It’s not working anymore, and everyone knows it.

Incremental Solutions Fail

Since these problems have long been known, there have been many proposals for how to fix them. But all those I’ve heard, implemented, and seen attempted ultimately are iterative tweaks at the edges, only slowing peer review’s inevitable collapse:

  • Pay reviewers. In nearly all cases, this does not create more time, but does create warped incentives for reviewing. It would certainly help junior reviewers who are under-compensated. But it doesn’t fix the root cause, and it assumes there’s some revenue stream to pay for it (maybe for-profit publishers have one, but much of academia is self-subsidized and fueled by not-for-profit digital libraries). The $200–800 NSF pays certainly doesn’t motivate me to spend 40 hours on panel reviews, flights, and meetings.
  • Charge for submission. This creates regressive inequities in who can submit, and comes with all kinds of global complications around currency exchange.
  • Create more community, mentorship, and training around reviewing. This is necessary, but doesn’t change the economics of people’s time. So the effect is palliative.
  • Recognize reviewers. All this does is create a tiny reputational incentive alongside a massive disincentive. And in some cases, it threatens reputational incentives, when it creates a perception that someone is better at evaluating discoveries than making them. There’s also a disincentive to receive such recognition: it might mean that someone is invited to do even more reviewing than they were before.
  • Create reviewing standards. Again, yes, but without any incentive or ability to enforce them, they won’t change behavior. As someone who has written many of these guidelines, I also know that most reviewers don’t read them. Standards also risk erecting epistemic barriers to discovery. We need them, but they don’t address the labor scarcity.

An Anti-Capitalist Proposes a Market

So what do we do? I actually think the problem is relatively simple, at least conceptually. And as someone who is by default anti-capitalist, it’s one I dislike having to propose. But here it is: we need a market.

Before my fellow market-skeptic comrades balk, here’s how it would work:

  • You review at a level of quality that an editor deems meets their reviewing standards, and you get a token.
  • You want to submit something for review? You pay some tokens. This is probably 3 in most cases, to align the amount of work a submission creates with the amount of reviewing work done. But this can obviously vary by community.
  • Junior folks get some number of free tokens to last however many years a community decides they might need to learn to be a good reviewer. Maybe enough to get through a PhD, or close to it. And advisors could gift more tokens to them if they run out, exchanging their expertise for their students’ ability to publish.
  • Editors, associate editors, program committee chairs and members all get tokens proportional to their editing workload. I edit ~150 submissions a year for ACM TOCE for example, each taking 10–60 minutes, so maybe I’d get one token per 10 papers.
  • People who take on other peer review related service (publications boards, steering committees, etc.), might get some tokens for their service, to incentivize peer review maintenance work.
  • Communities could devise all other kinds of equitable token distributions, for parental leave or for the invisible labor of mentorship and marginalization. For example, my community might give extra tokens to Black women who take on a disproportionate amount of mentorship, diversity, equity, and inclusion service, and emotional labor caused by generations of racist, colonial structures built into academia.
  • Researchers could freely gift their tokens to help people facing crises, gifting time and freedom to focus on the crisis without having to deal with the guilt of declining reviews.
  • Non-student newcomers to a community could also receive tokens when they first join, to reduce barriers to interdisciplinary work.

All this would require is some centralized research community database of tokens, perhaps linked to ORCID IDs (and explicitly not a public ledger like a blockchain, so that gifting can stay private). Every time someone reviews, the editor in charge credits a token, and every time someone submits, tokens are deducted. No guilting or shaming of freeloaders, no burnout for altruists, just a simple exchange and accounting of expert labor. To ensure accountability for editors granting and deducting tokens, there could be some kind of independent auditing process run annually, looking at things like anomalous gifting or reviewing outliers.

A More Placid Peer Review

I predict a market like this would quickly improve all of the problems I noted above:

  • Reviewer recruiting instantly becomes easier because everyone suddenly needs tokens to meet their highest institutional priority: to publish. Researchers would eagerly volunteer to be on program committees and editorial boards to preserve their ability to publish. All of the fraught manipulation that goes into compelling or shaming reviewers to volunteer goes away, and editors would likely have expanded choices of experts from the surplus.
  • Editors could set standards for reviews and enforce them because they can withhold tokens until a reviewer meets basic reviewing standards. This includes removing hateful, oppressive, dehumanizing language, or whatever standards a community commits to, since including it might threaten someone’s ability to publish in the future.
  • Reviewer skill would increase because researchers would need to review in order to publish, and would have an incentive to closely examine reviewer training and participate in reviewing community building to ensure they successfully meet reviewing standards.
  • Reviewers driven by altruism and duty would no longer face burnout because it would be perfectly transparent how much work they need to do to participate, which they would not have to exceed.
  • Graduate students would be less likely to be exploited because advisors and community members would not be forced to compel students to review to address reviewing labor shortages.
  • Incentives to take on service roles such as editor positions would greatly increase, because they would come with guaranteed tokens for publishing and free them from other reviewing work.
  • Researchers would have a strong disincentive to submit too many papers because there would be a cost to it. No more freeloading.

Such a market would also bring new benefits that aren’t possible now:

  • It would change the job of editing from one of begging for the best experts available to one of judging and selecting appropriate expertise from an abundance of volunteers. “You’re a great fit for one of my submissions, but I already have the reviews I need for my submissions. I’ll let you know if one falls through.”
  • Reviews could be more on time because editors could have policies that make tokens depend on timely reviews, or offer incentives for quick turnaround. “One more week to get me that review. Do you need an extension?”
  • A market would enable pro-social gifting of time to friends and colleagues. “I did three reviews for you for your birthday and also learned about some cool new work! Win-win!”
  • Scholars would have increased agency over when they review, allowing them to bank reviewing labor to make space for things like sabbatical, vacation, etc, without threat to reputation. “I’m getting married in June, so I did all my reviewing earlier this year”.
  • This system would enable communities to raise or lower the token cost of publishing to respond to shifts in community size and productivity, raising costs to reduce community workload and lowering costs to encourage more work. “We just got a bunch of newcomers to large language model ethics and need to increase the price of submission temporarily to address a deficit in reviewing labor.”
  • Communities could create spaces to donate tokens to community members in need. I could imagine many people with excess tokens giving philanthropically to address equity gaps. “Hey y’all, I was going to review to submit this paper, but we just lost our house to flooding. Can you chip in so I can submit?”
  • In some cases, this market might help nudge communities to shift from quantity of publications to quality. It creates an incentive to not waste tokens by submitting work that isn’t quite done, and disincentivizes splitting contributions across multiple papers. “Do we submit this as two, and take on a bunch of reviewing labor, or just do one, so we can focus on that grant proposal deadline?”
  • Communities could come up with fun names for tokens that reflect their values and interests. “We’re particle physicists, and so obviously our tokens are called atoms.”

Challenges for Market Design

I’m not going to foolishly claim this would be perfect. A market like this would come with all the problems of any market. Some would try to exploit it, some would try to circumvent it, and some on the boundaries of a field would not be able to participate equally if their publishing and reviewing are split across communities that do and don’t use a system like this, or that use different currencies. It could create some warped incentives to review work for which one has no expertise, just to get a token. And so it would need all the things that markets need to work: thoughtfully designed regulation to prevent abuse, collapse, etc.

Of course, many of these problems exist already in the status quo. They’re just not as visible and are considered acceptable because our currency is one of power and exploitation. Having an explicit accounting of reviewing labor would allow us to identify these problems and give us explicit ways of trying to address them, unlike the power-driven interpersonal system we have now, which requires fraught interpersonal pleas, cat herding, and rigid power structures built into longstanding academic hierarchies.

There is also the lurking expertise matching problem, which peer review has always struggled with. There are never enough experts and they are never available at the right time, and so papers often do not get the reviews they need. In communities with closed reviewing committees (e.g., many conferences in CS that only allow PC members to review), this could be worsened, as some experts would have more agency to cap their reviewing labor, creating expertise shortages. One way around this might be giving editors and PC chairs the ability to offer higher token compensation for areas with expertise shortages. The dynamics of this could be tricky, but no more tricky than they are now, where we just settle for mismatched expertise and try to solve it through coercion and pleading. A currency would at least give us a transparent tool to entice experts.

A related problem is marginalized research topics. There could be situations where someone wants to publish, has no tokens, but also cannot find any opportunities to review, since their expertise is not needed. This might be unavoidable: when it happens now, it’s also hard to publish, because reviewing communities often lack the expertise to properly review topics at the margins. But authors could try to persuade others to gift tokens toward a submission (and maybe even drum up reviewers in the process), generating interest in the topic. And they can always do what happens now: start a new venue and build a new community of reviewing, creating reviewing opportunities where none existed.

Aligning reviewing incentives with academic publishing incentives doesn’t work for every institution. For example, some industry experts may not have an incentive to publish, or to review. So they wouldn’t be affected by this change, and would still need to donate their time. (They might even choose to donate their tokens, since they don’t need them.) But if they do want to publish, they can’t do so without contributing some reviewing labor.

Another challenge is spanning interdisciplinary boundaries. For example, suppose a scholar does a bunch of reviewing in another community, but it doesn’t use a system like this. It would be necessary to offer such newcomers, just like new graduate students, some initial tokens to encourage interdisciplinary participation. But newcomers would eventually have to contribute reviewing labor to keep participating. If other communities do use a system, but with a different currency, then there’s also a problem of exchange rates. I think academic norms don’t vary that much across communities though, so scholars might just tolerate that their tokens in one community might be worth less in a different community.

There’s also nothing about a market itself that would improve review quality. That requires standards and enforcement (regulation, essentially), and accountability of those charged with enforcement. Program committee chairs and editors-in-chief would ultimately have to be charged with evaluating the editing work of committee members and associate editors in order to ensure quality. And if reviews are still confidential, then there would still be no accountability for review quality. See my broader visions for peer review about how we might deal with that.

Finally, peer review is facing looming threats of large language model fraud, with people generating research papers with AI. The same risks exist for AI-generated reviews. Creating a market could increase incentives to generate reviews with AI, because a review would now be worth something. But I actually think the opposite would happen, because there would also be a cost on submission. People looking to generate fake reviews so they can submit real papers would have to get past motivated and compensated human evaluators with expertise, rather than the status quo, where editors are uncompensated.

Let’s Pilot!

I want to try this. And it wouldn’t take much to prototype and experiment with something like it. For example, the conference I just attended, the ACM SIGCSE Technical Symposium, is run by the ACM SIGCSE Board. The board could commit to a three year community-wide experiment:

Year 1

  • All SIGCSE conferences and the ACM TOCE journal would create a shared database of reviewing contributions for a year. We might need to build a simple web app with ORCID-linked accounts, an audit trail of reviews completed and papers submitted, a way to transfer tokens to others, and a special editor role for entering completed reviews (see the sketch after this list). We could eventually build ways of importing exports from reviewing systems.
  • Conference steering committees and journal editorial boards would commit to the pilot formally.
  • Everyone would get some number of tokens to start with. The first year would be spent defining equitable rules, soliciting community input about the types of compensation they believe are fair for scholars’ different roles, contexts, and seniority, and optionally accounting for some amount of historical publishing and reviewing data.

Year 2

  • Pilot in all ACM SIGCSE conferences and the ACM journal, TOCE, crediting review tokens and debiting submission tokens as described above. Non-ACM journals like CSE could opt in.
  • Each conference would track the problems that emerge through community feedback after each review process. The journal, ACM TOCE, would provide biannual data and feedback.

Year 3

  • Implement improvements based on year 2 feedback.
  • Hold an advisory vote at the end of year 2 to decide whether to revert to the old system or keep the new system and improve it.
  • Write a public report to communicate to academia more broadly the outcome of the trial and the rationale for the keep/abandon decision.

By the end, I predict a vast majority would vote to keep the new system, and that many other communities in academia would be eager to hear our experiences and adapt them. In 5–10 years, I could imagine much of academia and public funding agencies shifting to this system, and we could finally be done with our three-century-old exploitative model.

Let’s Goooooooooooo

Ultimately, this change is about shifting labor and reducing unnecessary labor. The system would ensure that everyone contributes enough work to cover the work they create, reduce our overall workload by eliminating unhelpful work, and give us agency over when we do reviewing work. These are things we all want; we just have to agree on a way to do it.

Markets aren’t a solution to everything, but I think they are a solution to this. History has proven that, when regulated well, they are great ways to make the most of a scarce resource, and certainly better than a system that centers rank, seniority, and exploitation.

Did I miss some fatal flaw? Do you also want to try this? What other benefits or risks do you see? Share your thoughts or run with the idea; I don’t own it and don’t care if I get credit. But in computing education or HCI, I am more than happy to help create this, if only out of self-interest: as Editor-in-Chief of ACM TOCE, I’m tired of begging people to review, my board is tired of begging people to review, and I’m sure our community is tired of constantly declining. And I’m happy to take the blame if it goes horribly wrong; that’s how desperate I am as an editor.

P.S.

The computing ed tokens should be called taterthoughts, because tater tots are tasty. And also, potatoes are hardy roots rich with assets that can grow in many conditions and in many varieties with gentle guidance from gardeners and good soil — a quirky metaphor for students, teachers, and schools — and I like the idea of reframing reviews as the thoughts of a potato, like a kind of passive, curmudgeonly, and lumpy form of sustenance to keep resource poor academics alive in times of famine.

See how fun this is???

--

Amy J. Ko
Bits and Behavior

Professor, University of Washington iSchool (she/her). Code, learning, design, justice. Trans, queer, parent, and lover of learning.