Our Highlights and Best Papers of EMNLP 2021 Conference
The hybrid EMNLP 2021 conference took place last week, with 491 on-site and 3,156 online attendees, among them three members of the deepset team presenting our research. EMNLP is a well-established conference on Empirical Methods in Natural Language Processing that has taken place every year since 1996. This year, it received submissions from 11,425 authors and was a hybrid event for the first time. Here are our highlights, along with a selection of the award-winning papers and those we found most interesting.
1. Keynote on Cross-Document NLP
The keynote “Where next? Towards multi-text consumption via three inspired research lines” by Ido Dagan from Bar-Ilan University, Israel, presented three research directions for advancing multi-text consumption: interacting with NLP applications, modeling multi-text information, and representing minimal information units. Interesting topics for future work include evaluation methods for interactive summarization, multi-hop question answering, cross-document language modeling (predicting masked tokens based on multiple documents with Longformer), proposition-level alignment across documents, and using question-answer pairs as representations of information.
The slides have not been published, but Ido Dagan is a co-author of several recent publications on the topic. Further, one of the attendees shared her notes here. Thanks, Zhijing Jin!
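As a toy illustration of the cross-document language modeling idea, here is a minimal sketch that predicts a masked token over two concatenated documents, assuming the base Longformer checkpoint from the HuggingFace Hub. The actual research trains dedicated cross-document models; this only shows the basic setup.

```python
# A minimal sketch, assuming the base Longformer checkpoint: predict a masked
# token while attending to a second, related document. Dedicated cross-document
# pretraining (as in the research above) is not reproduced here.
import torch
from transformers import LongformerForMaskedLM, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerForMaskedLM.from_pretrained("allenai/longformer-base-4096")

doc1 = "EMNLP is a conference on empirical methods in natural language processing."
doc2 = f"EMNLP 2021 was held as a hybrid {tokenizer.mask_token} in Punta Cana and online."

# Concatenating both documents lets the masked token in doc2 attend to doc1.
inputs = tokenizer(doc1 + " " + doc2, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

mask_position = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
predicted_id = logits[0, mask_position].argmax().item()
print(tokenizer.decode([predicted_id]))
```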
2. Answer Similarity For Evaluation of Question Answering
The evaluation of question answering models relies on ground-truth annotations. However, if the correct answer to a question is the name of an entity, its aliases are typically not annotated and thus won’t be recognized as correct answers. In their paper “What’s in a Name? Answer Equivalence For Open-Domain Question Answering”, Si et al. mined aliases from knowledge bases and used them as additional ground-truth answers in evaluation and training. It was inspiring to see that Si et al. had the same motivation as our team for our paper on semantic answer similarity (SAS) but ended up with a very different approach. We had a nice chat about the connections between the two approaches. Enjoy watching this funny 7-minute video summary of the paper by Si et al. and reading this blog post about SAS.
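For readers curious how SAS works under the hood: it scores a predicted answer against each ground-truth answer with a cross-encoder trained on semantic textual similarity and keeps the maximum. Here is a minimal sketch using the sentence-transformers library; the specific model is an illustrative choice that may differ from the paper's exact setup.

```python
# A minimal sketch of semantic answer similarity (SAS) scoring with an STS
# cross-encoder from sentence-transformers. The model name is an assumption
# for illustration, not necessarily the paper's exact configuration.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/stsb-roberta-large")

prediction = "Sagarmatha"
gold_answers = ["Mount Everest", "Everest"]

# Score the prediction against every annotated answer and keep the maximum,
# so that a correct alias still counts as a match.
scores = model.predict([(prediction, gold) for gold in gold_answers])
print(f"SAS score: {max(scores):.3f}")
```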
3. Multi-Domain Multilingual Question Answering
The tutorial by Avirup Sil and Sebastian Ruder was two-fold, with one part covering multi-domain question answering and the other multilingual question answering. Our highlights were the overview of the many multi-domain datasets and the mention of our semantic answer similarity (SAS) metric as an alternative evaluation method. The slides have been shared here.
The tutorial was also refreshing because it took place just one day after the Workshop on Machine Reading for Question Answering (MRQA). After the technical deep dive into the most recent papers there, it was good to get a well-structured summary of the findings of the last few years.
4. EMNLP Best Paper Awards
As in every year, several papers were selected for best paper awards or honorable mentions.
The best long paper award went to Liu et al. for “Visually Grounded Reasoning across Languages and Cultures”. They created a multilingual dataset for Multicultural Reasoning over Vision and Language (MaRVL), which consists of statements from native speakers about pairs of images; the reasoning task is to discriminate whether each grounded statement is true or false. It can be downloaded here.
Yang et al. won the best short paper award for their paper “CHoRaL: Collecting Humor Reaction Labels from Millions of Social Media Users”. They collected and analyzed a dataset of COVID-19-related Facebook posts with humor reaction labels. It’s great to see the effort that went into this data-focused work and the acknowledgment it received. While there is no link to the dataset in the paper, the authors promised to share the data and labels freely with academia, so you will need to contact them directly.
However, our interest was caught by one of the papers that received an honorable mention as an outstanding paper:
“SituatedQA: Incorporating Extra-Linguistic Contexts into QA” by Zhang and Choi. They argue that open-retrieval QA benchmarks should incorporate extra-linguistic context, such as temporal or geographical context, and found that roughly 16.5% of the questions in NQ-Open have context-dependent answers.
5. Topical Diversity at the EMNLP Conference and Its Workshops
Natural language processing is such a diverse field, and this conference and its workshops showed it once again! There was a paper on “Cartography Active Learning”, in which the most informative instances for labeling are identified during training on a text classification dataset. Another paper, “Low Resource Quadratic Forms for Knowledge Graph Embeddings”, presents a computationally efficient approach for link prediction between the entities and relations of knowledge graphs.
“Rethinking the Objectives of Extractive Question Answering” addressed the independence assumption in extractive question answering, where the probability of an answer span is modeled as the product of independently predicted start- and end-index probabilities. It’s hard to do this diversity justice, so have a look at the proceedings and see for yourself.
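To make that independence assumption concrete before you dive into the proceedings: here is a minimal sketch with an off-the-shelf extractive QA model from the HuggingFace Hub, using a deliberately simplified decoding (no start <= end or maximum-length constraints). The model choice is an illustrative assumption.

```python
# A minimal sketch of the independence assumption in extractive QA: the span
# score is the sum of separately predicted start and end logits. Decoding is
# simplified here (no start <= end or max-length constraints).
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "deepset/roberta-base-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "Where did EMNLP 2021 take place?"
context = "EMNLP 2021 was a hybrid conference held in Punta Cana and online."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# P(span) = P(start) * P(end): in log space, pick the argmax start and end logits.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))
```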
6. Table Extraction, Table Retrieval and Table Question Answering
Having integrated Table Retrieval and Table Question Answering into Haystack, we’re glad to see these topics gaining more and more attention in the research community as well. In particular, there is “Topic Transferable Table Question Answering” by Chemmengath et al., who automatically generate question-answer pairs from tables using a T5 SQL-to-Question model. Another paper at EMNLP is “FinQA: A Dataset of Numerical Reasoning over Financial Data”, which builds on the FinTabNet dataset of tables extracted from financial reports.
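If you would like to try table question answering yourself, a TAPAS checkpoint queried through the transformers pipeline is a quick way to start; Haystack wraps comparable readers behind its own API. A minimal sketch, with an illustrative model and toy table:

```python
# A minimal sketch of table question answering with a TAPAS checkpoint via the
# transformers pipeline. The model and the toy table are illustrative choices.
import pandas as pd
from transformers import pipeline

table = pd.DataFrame(
    {
        "Paper": ["Topic Transferable Table QA", "FinQA"],
        "Year": ["2021", "2021"],  # TAPAS expects all cells as strings
    }
)

table_qa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")
result = table_qa(table=table, query="Which year was FinQA published?")
print(result["answer"])
```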
7. Model Robustness and Out-of-Domain Performance
Bartolo et al. presented their paper “Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation”, in which they address model robustness and out-of-domain performance by using diversity-promoting question-answer generation to mitigate the sparsity of training sets. Their approach leads to better robustness on twelve different datasets, and they find that using the synthetic data improves out-of-domain performance across all MRQA tasks (including the challenges of domain corpora, variation in questions, adversarial examples, and noise) by about 10%.
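One common ingredient of such synthetic data pipelines is round-trip filtering: a generated question-answer pair is kept only if a QA model recovers the intended answer from the passage. Here is a minimal sketch; the model choice and the exact-match criterion are our simplifying assumptions, not the paper's setup.

```python
# A minimal sketch of round-trip filtering for synthetic QA pairs. The model
# and the exact-match criterion are simplifying assumptions for illustration.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = "EMNLP 2021 took place in Punta Cana and online as a hybrid conference."
synthetic_pairs = [
    ("Where did EMNLP 2021 take place?", "Punta Cana"),
    ("What kind of conference was EMNLP 2021?", "online"),
]

kept = []
for question, answer in synthetic_pairs:
    prediction = qa(question=question, context=context)
    # Keep a pair only if the model's predicted answer matches the synthetic one.
    if prediction["answer"].strip().lower() == answer.lower():
        kept.append((question, answer))
print(kept)
```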
8. Interactions at Hybrid Conferences — Best of Both Worlds?
We decided against traveling to the Caribbean for just a few days, although we are definitely looking forward to attending conferences on-site again. Still, the hybrid setup made it possible to attend panel discussions and tutorials live and to watch recordings of other talks at any time. We met so many people, especially at the poster sessions in Gather! For example, we had a chat with:
- Patrick Lewis about Dense Passage Retrieval and Table Retrieval (tutorials are available here and here)
- Maxime De Bruyn (congratulations on the best paper award!), who attended one of our Open NLP Meetup events this year
- Martin Fajcik (congratulations on the honorable mention!)
- Joumana Ghosn, who had an inspiring idea for evaluating GermanQuAD: machine-translate the data to English and then back to German
- Chenglei Si, who has been working on answer similarity in question answering
- Alisha Zachariah, who works in the data science team of an insurance company
- Pranav Maneriker, a Ph.D. student at The Ohio State University
- Toni Kukurin, a Senior Research Engineer at Bloomberg
If we didn’t have the chance to meet at EMNLP and you would like to chat about neural search, feel free to reach out to us, for example by joining our Discord community or our Open NLP Group, whose next virtual meetup takes place in January 2022. If you are interested in trying out Haystack, our NLP framework for neural search and question answering, check out our repo on GitHub (and hopefully give us a star too)!