Scholarship should be open, inclusive and slow

Emily M. Bender
Sep 6, 2023


This tweet of mine from a few days ago has been getting a lot of negative attention this week, I think because of some hostile quote tweets, and largely in the context of other people debating ACL’s anonymity policy.

As a result, I thought I should take some time to lay out my thoughts on peer review and access to scholarly publishing, in a format that has more room for nuance.

I should acknowledge that I am on the ACL’s Executive Committee (this year’s VP, next year’s President) and that the ACL currently has a working group looking into updating/revising the anonymity policy. I am not on that working group, though it will almost certainly bring proposals forward to the Exec for a vote, at which point I will have a vote. In this blog post, I am writing in my personal capacity and not speaking for the ACL.

I’d also like to clarify that what I describe here is what I see as the ideal we should be striving towards. This blog post is not presenting any policy proposals.

[Image: redwood trees and a lush forest understory, with fog between the tree trunks. Photo by Michael Schweppe, CC BY-SA 2.0, via Wikimedia Commons.]

Values

There are two values in this debate that I think have broad, though perhaps not universal, agreement:

Scholarship should be open: The results of scientific and other scholarly work should be accessible to the broad public, and not locked up behind paywalls. This is important for both the goal of scholarship (often publicly funded) benefiting society and the goal of research communities becoming more diverse.

Scholarship should be inclusive: A diverse research community does better research because it benefits from more perspectives AND no one should be prevented from participating in the research they want to do because of racism, sexism, ableism, classism, xenophobia, etc. (We are a long way from achieving this goal.)

A third value I think is less widely held, but it is important to me and I hope many others:

Scholarship should be slow: We engage in science and other scholarship to learn about our world and to serve our communities. In the best cases, we develop and substantiate new ideas firmly rooted in what has gone before and our new ideas respond to and uplift human values.

If we are under constant pressure to churn papers out quickly — either to appease the bean counters’ publication metrics or to lay claim to being the first to have some idea (or even less meaningfully, to being the first to hit some high water mark on some leaderboard) — we don’t have time to:

  • Thoroughly understand how our work connects to what has gone before;
  • Explore the ethical implications of our work, including seeking input from impacted people and groups as necessary;
  • Engage in research that crosses disciplinary boundaries, which requires building understanding across diverse vantage points;
  • Perform thorough and careful evaluation (including error analysis);
  • Provide and maintain infrastructure for genuine reproducibility.

(Fast scholarship also tends to be uninclusive, being accessible primarily to those who can drop everything in pursuit of deadlines and likely therefore don’t have caregiving responsibilities, community mutual aid commitments, disabilities or language barriers, and probably do have a wife at home to look after their needs.)

In this context, I highly recommend Min-Yen Kan’s COLING 2018 keynote:

[Link for slides]

The reason I think this third value is less widely held is the number of people responding to my tweet lauding the “fast progress” of work in machine learning and “AI”, which they argue is fueled by the ability for anyone, at any time, to toss their non-peer-reviewed work onto arXiv.

What peer review gets us

Scholarship is a conversation. We develop and test ideas and then write up the ideas and how we tested them to share with other scholars (and the broader community). Publications build on other publications and respond to still others. Peer review, when it is working well, doesn’t guarantee truth or correctness, but it does mean: This was examined critically and thoughtfully by 2–4 independent people with relevant knowledge who found it sufficiently solid (given their own knowledge) and worthy of others’ attention.

I’ve frequently seen it claimed that putting something up on arXiv for everyone to see functions as a kind of post-publication peer review: the idea being that good papers will get traction while bad ones will be ignored. Unfortunately, it doesn’t work this way: When we serve as reviewers, we are charged with critical analysis of the work. When we are looking for work to build on, we are frequently much less critical, and end up seeking out papers that say what we want to hear.

As a case in point, consider these three papers up on arXiv, which have all been cited to justify prompting ChatGPT to output what looks like scores in the form of NLP evaluation metrics as if they were actual results.

(As I write, these papers have been cited 45, 9 and 95 times respectively. 9 citations might be the equivalent of being ignored. 45 and 95 are not.)

Reply guys also like to point to papers that were rejected from conferences and then put up on arXiv and went on to be (in their telling, rightfully) highly influential. And I don’t doubt that this happens! But it’s a fallacy to argue that just because some worthy papers were rejected by peer-reviewed venues, every paper that is rejected is worthy.

What anonymous review gets us

Anonymous review (where reviewers do not have information about authors’ identities or institutional affiliations) is important in at least two ways:

  1. It raises the chances that the papers are reviewed on their own merits. Sloppy work by famous researchers should be viewed as sloppy work, not work by famous researchers.
  2. It levels the playing field for outsider researchers (somewhat): if all papers are anonymous at review time, an unknown researcher will have their ideas evaluated in the same way as a famous one.

I say “somewhat” under point 2 there because there are many, many shibboleths that indicate insider status, even without author identity or affiliation in play. Writing skill, writing style, etc. will also impact reviewing. (At COLING 2018, we attempted to address this with an authorship mentoring program.)

Where preprints have a place

The most obvious positive use case for preprints is right in the name: It’s access to a paper that will be printed (published) but hasn’t been yet. In other words, when a paper has passed peer review but won’t appear for a while, posting a preprint gives access to other scholars and the broad public without further delay. (And if the publication venue is paywalled, it also circumvents the paywall.) For all I know, this is what arXiv was originally meant for or even how it’s used in other fields. In the parts of CS that are closest to my research (computational linguistics) or feed the current AI hype (AI/ML), that’s emphatically not how it’s being used.

A second use case is for papers that are under review, but have truly urgent information. I’m thinking of things like studies about the transmissibility of SARS-CoV-2 early in the pandemic, or other such truly time-sensitive information. In this case, it is very important (especially given intense public interest) for it to be clear that these papers have not yet been peer reviewed. (And if the peer review is meant to be anonymous, then the manuscript should be shared anonymously.)

A third use case is for papers that have failed to find a home, but that the authors believe provide a useful point of reference anyway. Here, a personal example is “Towards better interdisciplinary science: Learnings from COLING 2018”, which Leon Derczynski and I published as a technical report through ITU Copenhagen after it was rejected from a journal. It’s a report-back from our experience as PC co-chairs for COLING 2018 which we wrote in the hopes of informing future conference organizers and also demystifying the conference review and associated processes for junior scholars. By publishing it as a technical report, we give it a permanent home but also clearly indicate that it has not passed peer review. (The reviewers in this case thought the audience was too narrow; we disagree.)

A fourth case involves a hybrid of #2 and #3: papers that are taking a while to find a home or are newly submitted but might be of interest to the community while under review, crucially with the fact that they are still under review clearly indicated.

How arXiv does harm

I see at least three different kinds of harm that come from arXiv and the way it is used in the ML/AI parts of CS.

  1. Encouraging flag planting: Many people seem to use arXiv (and object strongly to being told they shouldn’t or can’t) to timestamp certain results. But why would someone who values their research time work on such perishable topics? If a paper wouldn’t be interesting three months from now, that seems like a pretty clear indication that it’s not all that interesting now. Another way of putting this is: if you’re doing work that you’re worried will get scooped, you’re probably not asking very interesting or original questions.
  2. Leaning into biases: ArXiv doesn’t allow anonymous preprints, so of course the papers by big names and from big labs are going to get more attention and of course all of the usual well-documented biases against minoritized groups are going to come into play. In other words, arXiv as an end-run around peer review works against inclusivity.
  3. Eroding the value of peer review: When arXiv becomes such a center of gravity that even published papers are also put on arXiv for attention, and when everyone just cites arXiv versions of papers without checking if they have appeared somewhere with peer review, then we lose the value of the (expensive yet important) peer review process. The point of peer review is gatekeeping (of papers, not authors, obviously): so that someone coming from outside can find papers that have been critically appraised by knowledgeable peers.

These also have knock-on effects: the flag-planting culture leads to ever more pressure to “move fast”, impeding our ability to engage in slow scholarship. It also means that everyone’s attention is focused only on the latest papers, thinning the fabric of the scholarly conversation. The erosion of the value of peer review also allows what are effectively fraudulent citation rings (equivalent of fraudulent review rings) to hide in plain sight. Alex Hanna has some choice words to say about this in Episode 11 of Mystery AI Hype Theater 3000.

Alternatives that are available

Rejecting arXiv (and arXiv culture) doesn’t mean rejecting open and inclusive science. There are several things we can do instead:

  1. Publish in non-predatory, peer-reviewed, open venues. The ACL Anthology is a sterling example of this, hosting content not only from ACL events but also from many other venues in computational linguistics/NLP.
  2. Explore anonymous preprints for papers under review. OpenReview does some of this. It could do more. ArXiv could allow anonymous preprints, too.
  3. Continue to use preprints but with clear indication of the status of the paper.
  4. Use careful citational practices, always indicating the publication venue for something, even if it is also found on arXiv. (This takes more effort than it should, not least because Google Scholar privileges arXiv in its results.)

And what of synthetic text?

Finally, it’s worth spending a few moments thinking about the spectre of synthetic text (i.e. the output of LLMs) and what that will do. People making plausible-looking papers cheaply and throwing them at peer-reviewed venues is going to put a lot of strain on the peer review process. At the same time, it’s going to make peer review more important than ever. We are already facing pollution of our information ecosystem and the need to level up our critical evaluation of everything that arrives as text or electronic audio/video.

I noted above that people who are accessing papers in the course of their research are often looking for papers that say what they want to hear. And what is ChatGPT designed for if not outputting exactly what we want to hear?
