10 Things We Learned at NAACL’21
The Virtual NAACL 2021 Conference took place last week with more than 2500 attendees and 500 papers about recent advances in computational linguistics and natural language processing. If you’d like to dive deep into the research papers, check out the full proceedings — but you can also just start with a summary of 10 short lessons we learned at the conference!
1. Reformulating tasks with [MASK]
In his keynote, Hinrich Schütze talked about how humans often learn from task descriptions rather than from a large number of training samples. This insight raises the question of whether we can integrate task descriptions into the training of NLP models. The idea is to translate task descriptions into something that can be used as a task for masked language model training. For example, the task of anaphora resolution, a subtask of coreference resolution, can be expressed as the task of finding the masked word in the following three sentences: “Sally arrived. Nobody saw her. In the previous sentence, the pronoun “her” refers to MASK.”
The pattern that describes the task is “In the previous sentence, the pronoun “p” refers to MASK.” The authors call their approach PET (Pattern-Exploiting Training) and show that PET leverages task descriptions for better few-shot learning. Interestingly, the added benefit of task descriptions becomes smaller with an increasing number of training samples; with 1000 samples, there is almost no difference. The slides used in the talk are published here.
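To make this concrete, here is a minimal sketch of the cloze-style reformulation using the fill-mask pipeline from Hugging Face transformers; this is plain inference rather than the authors’ PET training procedure, and the choice of bert-base-uncased is ours:

```python
# Cloze-style reformulation of anaphora resolution, following the pattern
# from the keynote (plain fill-mask inference, not PET's training procedure).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

text = (
    "Sally arrived. Nobody saw her. "
    'In the previous sentence, the pronoun "her" refers to [MASK].'
)

# The model's top candidates for the masked word are the resolved referents.
for prediction in fill_mask(text, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
```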
2. Relevant, Irrelevant and the Third Class of Document in Ranking
In the tutorial “Pretrained Transformers for Text Ranking: BERT and Beyond” by Andrew Yates, Rodrigo Nogueira and Jimmy Lin, we learned that a ranking problem can involve more types of documents than just relevant and irrelevant ones. Ranking documents about COVID-19, as in the TREC-COVID collection, should consider a third type of document: misleading documents that spread misinformation. Showing these misleading documents as top-ranking results would be even worse than showing irrelevant documents. This insight needs to be taken into account when training and evaluating ranking models.
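As a toy illustration of the evaluation side, one could give misleading documents a negative gain in a DCG-style metric, so that ranking misinformation highly hurts the score more than ranking something merely irrelevant (our own sketch, not a metric prescribed in the tutorial):

```python
import math

# Hypothetical gain values: misleading documents are worse than irrelevant.
GAIN = {"relevant": 1.0, "irrelevant": 0.0, "misleading": -1.0}

def dcg(ranking):
    """Discounted cumulative gain over a ranked list of document labels."""
    return sum(GAIN[label] / math.log2(rank + 2)
               for rank, label in enumerate(ranking))

print(dcg(["relevant", "irrelevant", "relevant"]))  # 1.5
print(dcg(["misleading", "relevant", "relevant"]))  # ~0.13: misinformation on top hurts
```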
The tutorial also made us aware of “Approximate Nearest Neighbor Negative Contrastive Estimation” (ANCE), which is an alternative to Dense Passage Retrieval (DPR). Both approaches use positive samples and hard negative samples to train a retrieval model. DPR chooses hard negatives that are semantically similar to positives, for example, with BM25. In contrast, ANCE chooses hard negatives that are specific to the model being trained. This selection of hard negatives is updated during the learning process so that the model is shown harder negatives as training progresses. A 155-page written summary of the tutorial is on arXiv. Great work!
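Schematically, that refresh loop looks like the following sketch. Everything here is a toy stand-in: a hash-based featurizer instead of a BERT encoder, brute-force dot products instead of an approximate nearest-neighbor index, and a random perturbation instead of a gradient step:

```python
import hashlib
import numpy as np

corpus = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e"]
train = [("query_1", "doc_a"), ("query_2", "doc_d")]  # (query, positive doc)

def features(text):
    # Deterministic toy featurizer standing in for a trainable text encoder.
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    return np.random.default_rng(seed).standard_normal(4)

weights = np.random.default_rng(0).standard_normal((4, 8))  # toy "retriever"

for refresh in range(3):
    # 1. Periodically re-encode the corpus with the *current* model and
    #    rebuild the nearest-neighbor index.
    index = np.stack([features(d) @ weights for d in corpus])
    for query, positive in train:
        scores = index @ (features(query) @ weights)
        ranked = [corpus[i] for i in np.argsort(-scores)]
        # 2. Top-ranked non-positives become the hard negatives for the next
        #    training round: they are hard *for this specific model*.
        hard_negatives = [d for d in ranked if d != positive][:2]
        print(refresh, query, hard_negatives)
    # 3. Training step (stand-in): updating the model shifts the embedding
    #    space, so the hard negatives change at the next index refresh.
    weights += 0.1 * np.random.default_rng(refresh).standard_normal(weights.shape)
```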
3. Registered Reports for Slow Science in NLP
The paper “Preregistering NLP research” by Emiel van Miltenburg et al. won the Best Thematic Paper Award — congratulations! The paper starts a discussion on preregistration, which means describing the steps and expectations of a research project before carrying it out. Preregistration is rarely seen in NLP so far, but it would be great if our community could change that. A preregistration answers questions that are good to ask oneself when starting a research project, and we think it can be especially helpful for PhD students who are just getting started. Example questions are “How will you define outliers in your data and what rules will you apply for excluding observations?”, “How will you measure output quality?”, and “What kinds of errors do you expect to find with an error analysis?”. If you are interested in this topic, we recommend watching the video on YouTube. Don’t forget to check out the winners of the other awards at NAACL 2021 here.
4. What if Companies Had Their Own Word Embedding?
Besides the main research track, NAACL also had an industry track. “Query2Prod2Vec: Grounded Word Embeddings for eCommerce” by Federico Bianchi et al. won the Best Industry Paper Award in this track. It’s about language grounding: the meaning of product names is grounded in eCommerce interactions. Search queries in an online shop and the items a user clicks in the search results help to learn better product embeddings than using only the product names and word2vec. As a result, similar products are closer to each other in the embedding space. Unfortunately, code and data haven’t been published yet.
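Since the code is unpublished, the following is only a rough approximation of the general idea using gensim: treat products clicked in the same session as a “sentence” for a word2vec-style model, then ground a query in the embeddings of the products users clicked after issuing it. The sessions and the click log below are made up:

```python
import numpy as np
from gensim.models import Word2Vec

# Products browsed/clicked together, one shopping session per list.
sessions = [
    ["sneaker_a", "sneaker_b", "socks_a"],
    ["sneaker_b", "sneaker_c"],
    ["kettle_a", "teapot_a", "kettle_b"],
]
prod2vec = Word2Vec(sessions, vector_size=16, window=3, min_count=1, epochs=50)

# Ground a query in behavior: average the embeddings of the products that
# users clicked after issuing it (made-up click log).
clicks_for_query = {"running shoes": ["sneaker_a", "sneaker_b"]}

def query_embedding(query):
    return np.mean([prod2vec.wv[p] for p in clicks_for_query[query]], axis=0)

print(prod2vec.wv.most_similar("sneaker_a", topn=2))
print(query_embedding("running shoes")[:4])
```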
5. Ask Questions on Tables, Not Just Text
Dense retrieval and question answering not only work on texts but also on tables. “Open Domain Question Answering over Tables via Dense Retrieval” by Jonathan Herzig et al. presents an efficient dense table retriever based on the TAPAS encoder by the same authors and demonstrates that end-to-end question answering can be improved from 33.8 to 37.7 exact match compared to a BERT-based retriever. It’s great to see that they published code and models, and we are sure that someone, somewhere is already building on their results and continuing this research direction.
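The paper’s contribution is the retriever, but you can get a quick feel for TAPAS-style table QA with a pretrained checkpoint from the transformers library. A minimal sketch, assuming the publicly available google/tapas-base-finetuned-wtq checkpoint (a TAPAS reader fine-tuned on WikiTableQuestions, not the paper’s retrieval model) and a made-up table:

```python
import pandas as pd
from transformers import TapasForQuestionAnswering, TapasTokenizer

# Note: depending on your transformers version, TAPAS may additionally
# require the torch-scatter package.
name = "google/tapas-base-finetuned-wtq"
tokenizer = TapasTokenizer.from_pretrained(name)
model = TapasForQuestionAnswering.from_pretrained(name)

# TAPAS expects the table as a DataFrame of strings.
table = pd.DataFrame({
    "Team": ["Croatia", "France"],
    "World Cup final appearances": ["1", "3"],
})
queries = ["How many World Cup finals has Croatia reached?"]

inputs = tokenizer(table=table, queries=queries, return_tensors="pt")
outputs = model(**inputs)
coords, _ = tokenizer.convert_logits_to_predictions(
    inputs, outputs.logits.detach(), outputs.logits_aggregation.detach()
)
# Map the predicted cell coordinates back to table cells.
print([table.iat[row, col] for row, col in coords[0]])
```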
6. Decontextualized Sentences
Splitting up a text document into passages and interpreting them separately sometimes creates the problem where context from one passage is needed to understand another passage. Eunsol Choi et al. address this problem with their paper “Decontextualization: Making Sentences Stand-Alone”. For example, imagine the following paragraph, where the last sentence needs to be decontextualized.
Paragraph: “Croatia national football team have appeared in the FIFA World Cup on five occasions. Their best result thus far was reaching the 2018 final.”
Decontextualized Sentence: “The Croatia national football team’s best result thus far in the FIFA World Cup was reaching the 2018 final.”
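The authors frame decontextualization as a sequence-to-sequence rewriting task. As a minimal sketch, a single training pair for such a model could be laid out as follows; the “decontextualize:”/“context:” markers are our own illustration, not the paper’s actual preprocessing:

```python
# One hypothetical training pair for a seq2seq decontextualization model.
context = ("Croatia national football team have appeared in the FIFA World "
           "Cup on five occasions.")
sentence = "Their best result thus far was reaching the 2018 final."
target = ("The Croatia national football team's best result thus far in the "
          "FIFA World Cup was reaching the 2018 final.")

# The input carries the sentence to rewrite together with its context;
# the target is the stand-alone version.
source = f"decontextualize: {sentence} context: {context}"
print(source)
print(target)
```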
The paper was published earlier in the TACL journal, which gives authors the chance to also present their work at a conference. TACL thereby combines the advantages of a journal’s reviewing process, including options for revisions beyond a strict accept/reject decision, with being part of a conference.
7. Fill the Blank
Guanghui Qin and Jason Eisner presented their paper “Learning How to Ask: Querying LMs with Mixtures of Soft Prompts”, which relates to the patterns mentioned in the keynote by Hinrich Schütze. The paper describes how to extract knowledge from language models with the fill-in-the-blank paradigm. They extend the idea of using words in prompts (questions to the language model) to “soft prompts” that do not necessarily consist of real words but of “soft words.” These “soft words” are embeddings that do not correspond to an actual word in the vocabulary, and the most effective soft prompts can be learned. A picture is worth a thousand words, so have a look at this slide. Last but not least, congratulations on the Best Short Paper Award!
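In PyTorch terms, the core trick can be sketched in a few lines: a handful of trainable vectors are prepended in embedding space while the language model itself stays frozen. The dimensions and the toy loss below are illustrative stand-ins, not the paper’s setup:

```python
import torch
import torch.nn as nn

vocab_size, hidden, prompt_len = 30522, 768, 5

# Frozen stand-in for a pretrained LM's input embedding table.
token_embeddings = nn.Embedding(vocab_size, hidden)
token_embeddings.requires_grad_(False)

# The "soft words": free vectors in embedding space, not tied to any
# vocabulary entry, and the only parameters being optimized.
soft_prompt = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

input_ids = torch.tensor([[101, 2054, 2003, 103, 102]])  # toy token ids
embedded = token_embeddings(input_ids)                   # (1, seq, hidden)
with_prompt = torch.cat(
    [soft_prompt.unsqueeze(0), embedded], dim=1          # (1, prompt+seq, hidden)
)

# `with_prompt` would be fed to the frozen LM via `inputs_embeds`; here a
# toy loss stands in for the LM's fill-in-the-blank objective.
loss = with_prompt.sum()
loss.backward()
optimizer.step()  # updates only the soft prompt vectors
print(with_prompt.shape)
```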
8. What Really Goes on Inside BERT?
The survey paper “A Primer in BERTology: What We Know About How BERT Works” by Anna Rogers et al. summarizes the findings from more than 150 studies on BERT models. For example, “BERT takes subject-predicate agreement into account when performing the cloze task” is a finding that is then put into context with references to related work. What’s also great about this paper is that all key findings are highlighted in bold font, making it easier to read.
9. Explainability and Interpretability of Question Answering Models
There were several birds-of-a-feather sessions and meetups on question answering. A personal highlight was a discussion with Greg Durrett, Behzad Golshan and Yufang Hou about the explainability and interpretability of question answering models. Techniques like pruning, perturbation, integrated gradients, and saliency maps have been used to make text classifiers interpretable, and we talked about how to apply these techniques to question answering models. We agreed that they have some obvious limitations but are worth trying out. This is a nice research topic, and we hope to see more papers on it at upcoming conferences.
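As a starting point for such experiments, here is a minimal gradient-saliency sketch for an extractive QA model. The checkpoint and the choice to inspect the top start logit are our own illustrative decisions, not something agreed on in the session:

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

name = "distilbert-base-cased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)
model.eval()

inputs = tokenizer("Who arrived?", "Sally arrived. Nobody saw her.",
                   return_tensors="pt")

# Run the model on embeddings instead of token ids so we can take
# gradients with respect to the input.
embeddings = model.get_input_embeddings()(inputs["input_ids"])
embeddings.retain_grad()
outputs = model(inputs_embeds=embeddings,
                attention_mask=inputs["attention_mask"])
outputs.start_logits.max().backward()

# Tokens whose embeddings most influence the predicted answer start.
saliency = embeddings.grad.norm(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, score in sorted(zip(tokens, saliency.tolist()), key=lambda t: -t[1])[:5]:
    print(tok, round(score, 4))
```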
10. Virtual Conferences — Here to Stay?
Most of you have probably already had your own experiences with a variety of virtual conferences during the past year. It was great to see how nicely everything worked out at NAACL. With Whova, Underline, gather.town, and Zoom, the main challenges were the time zone differences and avoiding getting lost in the virtual conference space while searching for the right session. With SIGIR and ACL, there are two more fully virtual conferences coming up in the next few months. With the right tools, we are pretty confident these events will be a success. We’re also very much looking forward to attending EMNLP in a “hybrid” format in November.