SemEval discussions @ NAACL 2019

Ted Pedersen
Jun 15

The SemEval workshop took place during the last two days of NAACL 2019 in Minneapolis, and included quite a bit of discussion on both days about the future of SemEval. I enjoyed this conversation (and participated in it), so I wanted to try to share some of what I think was said.

A few general concerns were raised about SemEval. One is that many teams participate without going on to submit papers describing their systems. Relatedly, some participants never really identify themselves to the task organizers, and in effect remain anonymous throughout the event. In both cases the problem is that SemEval aspires to be an academic event where participants describe what they have done in a form that can be easily shared with other participants (and papers are a good way to do that).

My own informal estimate is that maybe half of participating teams submit a paper, and then half of those go on to attend the workshop and present a poster. So if you see a task with 20 teams, perhaps 10 of them submit a paper and maybe 5 present a poster. SemEval is totally ok with teams that submit a paper but do not attend the workshop to present a poster. That has long been the case, and this was confirmed again in Minneapolis. The goal then is to get more participating teams to submit papers. There was considerable discussion of two related questions: why don’t more teams submit papers, and how can we encourage (or require) the submission of more papers?

One point made is that SemEval participants are sometimes new to our community and don’t have a clear idea of what a “system description paper” should consist of. They might not submit papers because they believe it will be too difficult or time-consuming, or because they just don’t know what to do and fear immediate rejection. There was considerable support for the idea of providing a paper template that would help new authors know what is expected.

It was also observed that when teams have disappointing results (not top ranked) they might feel like a paper isn’t really necessary or might even be a bad idea. This tied into a larger discussion about the reality that some (many?) participants in SemEval tasks focus on their overall ranking and less on understanding the problem that they are working on. There was discussion at various points about how to get away from the obsession with the leaderboard, and to focus more on understanding the problem that is being presented by the task. A carefully done analysis of a system that doesn’t perform terrifically well can shed important light on a problem, while simply describing a model and hyperparameter settings that might lead to high scores may not be too useful in understanding that same problem.

One idea was for each task to award a “best analysis paper” and potentially award the authors of that paper an oral presentation during the workshop. Typically nearly all presentations at SemEval are posters, and so the oral slots are somewhat coveted and are often (but not always) awarded to the team with the highest rank. Shifting the focus of prizes and presentations away from the leaderboard might tend to encourage more participants to carry out such analysis and submit papers.

That said, a carefully done analysis paper can be fairly time-consuming to create and may require more pages than the typical 4-page limit. It was suggested that we be more flexible with page limits, so that teams could either submit fairly minimal descriptions or go into more depth on their systems and analysis. A related idea was to allow analysis papers to be submitted to the SemEval year X+1 workshop based on system participation in year X. This might be a good option to provide, since SemEval timelines tend to be pretty tight as it stands.

Papers sometimes focus more on the horse race or bake-off (and so analysis is limited to reporting a rank or score in the task). However, if scores or rankings were not released until after papers were submitted, this could certainly change the nature of such papers. In addition, a submitted paper could be made a requirement for appearing on the leaderboard.

There is of course a trade-off between increasing participation and increasing the number of papers submitted. If papers are made a requirement, then some teams won’t participate. The larger question for SemEval to consider is perhaps how to increase the number of papers without driving away too many participants.

Another observation was that some teams never identify themselves, and so participate in the task but are never really involved beyond being on the leaderboard. These could of course be shadow accounts created by teams who are already participating (to get past submission limits?), or they could be accounts created by teams who may only want to identify themselves if they end up ranking highly. Should anonymous teams be allowed to participate? I don’t know that there was a clear answer to that question. While anonymous participation could be a means to game the system in some way, it might also be something done by those who are participating contrary to the wishes of an advisor or employer. If teams are reluctant to identify themselves for fear of being associated with a “bad” score, perhaps teams could be allowed to remove their scores from the leaderboard.

To summarize, I got the sense that there is some interest in both increasing the number of papers submitted to SemEval, and also in making it clear that there is more to the event than the leaderboard. I think there were some great ideas discussed, and I fear I have done a somewhat imperfect job of trying to convey those here, but I don’t want to let the perfect be the enemy of the good enough, so I’m going to go ahead and send this around and hope that others who have ideas will join in the conversation in some way.

Ted Pedersen

Computer Science professor at the University of Minnesota, Duluth. Natural Language Processing and Computational Linguistics. http://www.d.umn.edu/~tpederse