A potential approach to address the explosion of NLP paper submissions

The number of paper submissions to *CL conferences is constantly growing. It is overwhelmingly difficult to keep up with the current literature, and the reviewer workload is becoming heavier with each conference. I would like to propose a different point of view on the problem and a potential way to mitigate it: creating a new submission (and publication) type which undergoes a different reviewing process.

The explosion of (a specific type of) submissions

Many of the publications in the field describe incremental work in the form of making small changes to existing models, using standard architectures and evaluating on some common dataset for the task. Let’s call them task-specific SOTA papers for the sake of this discussion. The task-specific SOTA papers contribute not just to the increase in the number of submissions, but also to the pace of research, subsequently making the field highly competitive. Many researchers, including myself, feel we are in a rat race to publish our models before someone else publishes better performance. Essentially, a rejection is often a death sentence for the paper, as it is very likely that the paper will no longer be relevant by the next submission deadline. At the same time, submitting an analysis paper to *CL conferences may get you reviews such as “it’s more like a class project than a research paper”.

Although task-specific SOTA papers are often considered more “engineery” than “sciency”, such papers, whose main contribution is improved performance, have value for real-world use and especially for industrial purposes. The real issue is not whether they should or shouldn’t be published but rather whether they should be published at the expense of the more “sciency” papers. Having all the papers in the same pile (or, more accurately, piled by areas) pushes aside many high-quality submissions of other types.

A proposed solution: a new paper type

My inspiration comes from both the paper types in COLING 2018 and the SemEval shared task model. The COLING 2018 program chairs Emily Bender and Leon Derczynski defined 6 different paper types, one of them being the “NLP Engineering Experiment” category. It was defined as “testing a hypothesis about the effectiveness of a technique for a task” (a broader definition than my definition of task-specific SOTA; this category included more than half of the submissions to COLING 2018). In shared tasks, the task and the data are common among all the participating systems. All model description papers are accepted, making the review process a quick sanity check for readability.

My suggestion is to hold task-specific SOTA papers to different standards than other types of papers. Let’s break the “intro-background-model-results-analysis” structure. We don’t need to read for the millionth time how important natural language inference or machine comprehension are. We don’t need an almost identical description of all well-known baselines. Instead, we do need a technical description of the model, well-documented attached code, and the model’s performance, which can be reported in a leaderboard.

Authors may still choose to submit an “NLP Engineering Experiment” as a regular conference submission rather than as a task-specific SOTA paper. In this case, they would be judged by the regular standards. In order to have their paper accepted, they would have to go the extra mile, for instance by providing an insightful analysis or by reporting performance gains across tasks and datasets.

I expect this change to have several benefits:


  1. Task-specific SOTA submissions will become shorter and faster to review, reducing reviewer workload.
  2. The review process of task-specific SOTA papers will focus on specific things like reproducibility and readability, stressing these important issues and at the same time alleviating the authors’ need to write redundant literature reviews.
  3. The acceptance rate for task-specific SOTA papers will be high (similar to shared task submissions), reducing the number of submissions that lose relevance between one rejection and the next conference deadline.
  4. The acceptance rate for task-specific SOTA papers will not affect the acceptance rate of the other conference submissions, making the program more diverse.

Open Questions:

  1. Should task-specific SOTA publications be part of the conferences, i.e. included in the proceedings and presented at the conferences?
  2. Relatedly, should we tie the submission of task-specific SOTA papers to the conference deadlines, or make it possible to submit such papers on any day of the year (similarly to the reviewing model recently suggested by Omer Levy)?
  3. Can a previously published task-specific SOTA paper be extended to a full paper (similarly to non-archival workshop papers or extending a conference paper and submitting it to a journal)?
  4. What would incentivize researchers to submit a task-specific SOTA paper? Would it count as a conference paper, a workshop paper, a shared task system description (higher acceptance rates) or a pre-print (no peer review)?

COLING 2018 took a step in a positive direction, with the goal “to create a program of high quality papers which represent diverse approaches to and applications of computational linguistics written and presented by researchers from throughout our international community”. I suggest we take it a step further and define a clearer distinction between types of publications. As a field, we need to decouple the immediate progress in specific NLP applications from the potential future progress that insightful papers have to offer to our field.

Postdoctoral researcher at the Allen Institute for Artificial Intelligence (AI2) and the University of Washington, working on Natural Language Processing.
