Insights from ACL 2018
The 56th Annual Meeting of the Association for Computational Linguistics (ACL) was held this year between July 15–20 in Melbourne, Australia. As the top conference in computational linguistics and natural language processing, ACL continues to witness notable growth. This year the conference received a total of 1,544 paper submissions (1,018 long papers and 526 short papers), of which only 256 long papers and 125 short papers were accepted, resulting in an overall acceptance rate of 24.7%.
This year’s conference delved into a plethora of areas. Topics that gained particular focus this year include information extraction and text mining, machine learning, machine translation, and document analysis. Compared to last year, the percentage of information extraction papers has decreased, while the percentages of machine learning and machine translation papers remain stable.
Trending this Year at ACL
The NLP field has witnessed exponential growth, stated Marti Hearst, the president of ACL, in her opening remarks. Current models are capable of performing tasks that were unimaginable decades ago, such as recognizing and producing human speech. In other areas, however, we are still far from the goal of human-level language comprehension, and there is much room for progress, particularly in deep natural language understanding, which requires multi-sentence contextualization.
Deep Learning is Ubiquitous!
Not surprisingly, deep learning models continue to gain traction and were the dominant machine learning method at ACL. We can even see that deep learning approaches in NLP are becoming more and more mature. We see fewer papers trying to invent new and fancier neural networks and more papers attempting to dig “deeper” into the understanding of deep learning and its limitations. Specifically, in the area of understanding deep learning representations, this year’s conference featured more works investigating the properties of learned language embeddings by probing them for specific syntactic or semantic properties, e.g., quantifiers, syntactic constituents, and subject-verb agreement.
NLP as an Engineering Discipline
In his talk, Mark Johnson urged researchers to strive towards making NLP a more rigorous engineering discipline where we could make reliable estimates about model performance and data requirements:
“Just imagine if engineers would build bridges like this. We just keep pouring concrete into the river until the bridge looks stable enough.”
For example, NLP researchers usually cannot estimate in advance how much data they require to solve a particular task at a given accuracy level. Another trend observed at ACL was the presence of experiments evaluating how state-of-the-art models perform when advanced linguistic understanding is required. There were also papers calling for stronger baselines to assess improvements more reliably.
ACL Community Growth
The ACL community is growing at a rapid pace with an increasing number of submissions every year. The business meeting, which was part of the conference’s program, addressed this issue and highlighted the need for more reviewers and sponsors. This year, the conference had 1,443 reviewers and around 30 sponsors, and both numbers need to grow to keep up with this pace.
SAP Presence at ACL
SAP was a supporting sponsor at ACL this year and had a strong presence, with two accepted papers co-authored by SAP Ph.D. students that feature some of our research projects in the NLP field. The two papers are Recursive Neural Structure Correspondence Network for Cross-domain Aspect and Opinion Co-Extraction and Exploiting Document Knowledge for Aspect-level Sentiment Classification.
Recursive Neural Structure Correspondence Network for Cross-domain Aspect and Opinion Co-Extraction — Wenya Wang, Sinno Jialin Pan
This paper is co-authored by Wenya Wang from Nanyang Technological University (NTU) and SAP Innovation Center in Singapore. The paper proposes a novel approach for domain adaptation for aspect-based sentiment analysis. Aspect-based sentiment analysis aims at extracting fine-grained sentiment expressed in free text (typically reviews). An example of this is the sentence, “The fish burger at this restaurant is the best in town, but the service is slow.” In this sentence, there is both a positive sentiment towards the aspect term “fish burger” (expressed by the opinion term “best”) and a negative sentiment towards the aspect term “service” (expressed by the opinion term “slow”). Aspect-based sentiment analysis has the potential to extract detailed information from consumer reviews and goes beyond the classic document-level or sentence-level sentiment classification task.
However, aspect-based sentiment analysis requires token-level supervised training data, which is expensive to obtain for new domains. Wenya’s work reduces the need for labeled in-domain training data through an unsupervised domain adaptation method. The method assumes the availability of labeled training data in the source domain, plus unlabeled data in the target domain. Wenya’s work highlights how the syntactic relations between aspect and opinion words are quite robust across domains (e.g., adjective-noun modifier relations). Her method creates auxiliary tasks to predict the type of relations in a syntactic dependency tree to “pivot” information from the source domain to the target domain. The method was tested on three benchmark datasets and demonstrates state-of-the-art results.
Exploiting Document Knowledge for Aspect-level Sentiment Classification — Ruidan He, Wee Sun Lee, Hwee Tou Ng, Daniel Dahlmeier
The second SAP paper, co-authored by Ruidan He from the National University of Singapore (NUS) and SAP Innovation Center Singapore, also focuses on aspect-based sentiment analysis and the problem of overcoming the lack of labeled training data. The idea of Ruidan’s paper is to use document-level sentiment classification (for which plenty of labeled data exists) as a resource to improve aspect-level sentiment analysis.
The paper presents two approaches to achieve this goal. The first is a transfer learning approach, in which the document-level sentiment classification task is used to pre-train an LSTM classifier before fine-tuning the model on the word-level aspect-based sentiment task. The second is a multi-task learning approach that trains the LSTM jointly on the word-level aspect classification and the document-level classification. The experimental results show that both approaches improve performance and that combining the two methods achieves the best results in most experiments.
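The multi-task setup can be summarized as a weighted joint objective. The formulation below is a generic sketch, and λ is an assumed trade-off hyperparameter, not necessarily the paper’s exact formulation:

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{aspect}} + \lambda \, \mathcal{L}_{\text{doc}}
```

where \(\mathcal{L}_{\text{aspect}}\) is the word-level aspect classification loss, \(\mathcal{L}_{\text{doc}}\) is the document-level sentiment classification loss, and both are computed on representations from the shared LSTM.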
Our Top Picks of ACL Papers
ACL featured this year many interesting papers, tackling a wide range of topics. Our team selected a few papers, representing key subjects across the NLP community.
Style Transfer Through Back-Translation — Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, Alan W. Black
The technique of style transfer has recently gained a lot of traction for images and is now catching up in the text space as well. This paper applies style transfer to rephrase text so that it exhibits specific stylistic properties without changing its meaning. The technique is evaluated on three different style transformations: sentiment, gender, and political slant.
A latent representation is derived for each input sentence using a neural machine translation model, as shown in Figure 1. The assumption is that passing the sentence through a language translation model yields a latent representation that attenuates the current stylistic properties while preserving the meaning. A bi-directional LSTM network is used as a decoder to generate a sequence of tokens, and two such decoders regenerate the input sentence in two different styles from the latent representation. Finally, a convolutional neural network-based classifier distinguishes between the two styles output by the decoders (Figure 2). The classification loss is used as feedback to guide the decoders.
This work could further be applied in dialogue systems to generate responses in a uniform style and to reduce stylistic bias.
Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum — Omer Levy, Kenton Lee, Nicholas FitzGerald, Luke Zettlemoyer
Long Short-Term Memory (LSTM) networks were introduced to overcome the vanishing gradient problem of simple RNNs, which the gated additive recurrent connections within the LSTM enable them to do. This paper presents an alternative perspective on gates, claiming that they also have modeling strength in their own right and are not just limited to mitigating the vanishing gradient problem.
This work conducts a range of experiments on different ablations of LSTM, where its gating mechanism is decoupled from the embedded simple RNN in one form or the other:
- LSTM — GATES: the LSTM unit with the gating mechanism removed entirely
- LSTM — SRNN: the embedded simple RNN (supposedly the recurrent content layer) replaced with a simple linear transformation of the input
- LSTM — SRNN — OUT: the second variant with the output gate also removed
- LSTM — SRNN — HIDDEN: the second variant with the hidden state also removed from each gate
The authors re-ran experiments on a few mainstream NLP tasks that use LSTM units, such as language modeling, question answering, dependency parsing, and machine translation, replacing the LSTM units with the ablated variants listed above. All experiments reused the hyperparameters that were tuned for the standard LSTM.
The results of these experiments show that performance drops substantially only for the variant in which the gates are removed; the other ablated variants perform on par with a standard LSTM-based network. This strongly indicates that the recurrence can be computed as an element-wise weighted sum of context-independent functions of the input. In other words, the gating mechanism is doing the heavy lifting in modeling the context within the text.
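The observation can be made concrete with the standard LSTM memory-cell update, where \(i_t\) and \(f_t\) are the input and forget gates, \(\tilde{c}_t\) is the candidate cell value, and \(\circ\) denotes element-wise multiplication:

```latex
c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t
\quad\Longrightarrow\quad
c_t = \sum_{j=0}^{t} \left( i_j \circ \prod_{k=j+1}^{t} f_k \right) \circ \tilde{c}_j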
The Natural Language Decathlon: Multitask Learning as Question Answering — Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
The advancement of deep learning has revolutionized the NLP field. Deep learning models have been developed for multiple NLP tasks and frequently outperform earlier state-of-the-art models. The paper takes on the challenge of developing a unified, task-agnostic model by phrasing the following ten NLP tasks as question-answering tasks and solving them with the same universal model: question answering, machine translation, summarization, natural language inference, sentiment analysis, semantic role labeling, relation extraction, goal-oriented dialogue, semantic parsing, and commonsense pronoun resolution. Accordingly, the paper introduces a dataset of pure context-question-answer triplets spanning all ten tasks and consisting of open source datasets designed for the individual tasks. It also presents a new model architecture consisting of multiple BiLSTMs that learns all tasks without any task-specific parameters: the multi-task question answering network (MQAN).
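To illustrate the unifying idea, the sketch below casts two different tasks into the shared context-question-answer format. The exact question phrasings used in the paper may differ from the assumed examples here:

```python
# Illustrative only: casting different NLP tasks into a shared
# (context, question, answer) format, as in the decathlon setup.
# The question wordings below are assumptions, not the paper's exact ones.

def as_qa_triple(context, question, answer):
    """Represent any task example as a context-question-answer triple."""
    return {"context": context, "question": question, "answer": answer}

# Sentiment analysis becomes question answering over the review text.
sentiment = as_qa_triple(
    context="The fish burger is the best in town, but the service is slow.",
    question="Is this review positive or negative?",
    answer="positive",
)

# Summarization: the question asks for a summary of the context.
summarization = as_qa_triple(
    context="<full news article text>",
    question="What is the summary?",
    answer="<reference summary>",
)

print(sentiment["question"], "->", sentiment["answer"])
```

Because every task shares this interface, a single model with no task-specific parameters can in principle be trained on all of them at once.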
This work will undoubtedly stimulate the development of further unified models on that dataset as well as trigger investigations about which cluster of tasks gives rise to most synergies.
Probabilistic FastText for Multi-Sense Word Embeddings — Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar
Word embeddings such as Word2Vec and GloVe, trained on large unlabeled corpora, have become a crucial resource for NLP. To cope with unusual words or misspellings, FastText extends the former methods by decomposing each word into its character n-grams. Those methods, however, still struggle when words with multiple semantic meanings are essential. The paper introduces Probabilistic FastText (PFT), which combines FastText with an approach that represents words by multimodal Gaussian distributions. The distribution captures uncertainty, and the different modes represent the multiple semantic meanings of a word. In the PFT model, each word is represented by a Gaussian mixture density, with the mean of each component derived from the sum of its n-gram vectors.
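In rough terms, the representation of a word \(w\) is a mixture density of the following form; the number of components \(K\), the covariance structure, and the exact construction of the means are simplified here relative to the paper:

```latex
f_w(x) = \sum_{i=1}^{K} p_{w,i} \, \mathcal{N}\!\left(x;\, \mu_{w,i}, \Sigma_{w,i}\right),
\qquad
\mu_{w,i} \;\text{built from}\; \sum_{g \in \mathrm{NG}(w)} z_g
```

where \(p_{w,i}\) are the mixture weights, and each component mean \(\mu_{w,i}\) is assembled from the vectors \(z_g\) of the word’s character n-grams \(\mathrm{NG}(w)\), so that different components can settle on different senses of the same surface form.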
A Neural Architecture for Automated ICD Coding — Pengtao Xie, Haoran Shi, Ming Zhang, Eric P. Xing
The paper presents applied work on assigning ICD diagnosis codes to patients based on free-text diagnosis descriptions. Some hospitals employ teams of coders to review the diagnosis descriptions prepared by the physicians and subsequently assign the standardized ICD codes to them. A classification model could help automate this task. The model is developed using the publicly available MIMIC-III dataset containing 59K patient visits, including diagnosis descriptions and ICD codes. ICD codes are organized in a hierarchical structure, and every code has a formally and precisely worded textual description. The model uses an LSTM encoder to translate each diagnosis description into a latent vector. Latent vectors for the ICD codes are created by a tree-of-sequence LSTM, which combines an LSTM encoder over each code’s textual description with a bi-directional tree LSTM that incorporates the hierarchical structure. To compensate for stylistic differences, the experiment uses adversarial techniques to make the two types of latent vectors (diagnosis descriptions vs. codes) indistinguishable. Finally, attention mechanisms handle the one-to-one and one-to-many mappings between descriptions and codes. The paper reports on a work in progress with a current sensitivity/specificity of 0.29/0.33, which leaves room for improvement.
A Simple End-to-End Question Answering Model for Product Information — Tuan Lai, Trung Bui, Sheng Li and Nedim Lipka
This paper was part of the 1st Workshop on Economics and Natural Language Processing. It proposes a simple deep learning question-answering model that can assist shoppers in their purchase decisions. The model is planned to be used in a generic web service to support retailers. Shoppers usually have various questions about product descriptions and specifications, and this model can automatically answer these questions. The model was developed using a dataset of 7,119 questions created with Amazon Mechanical Turk and specifications from 153 different products. The model takes a question-specification pair as input, creates latent vectors from both using a BiLSTM, and derives a matching score from the two vectors using a fully-connected layer.
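The pairwise scoring idea can be sketched with a much-simplified stand-in: averaged toy word vectors and a dot product take the place of the paper’s BiLSTM encoders and fully-connected scoring layer, and all vectors below are made up:

```python
# Much-simplified stand-in for the paper's model: it illustrates only the
# overall flow (encode the question and each specification, then score the
# pair), not the actual BiLSTM + fully-connected architecture.

TOY_VECTORS = {
    "battery": [1.0, 0.0], "life": [0.8, 0.2],
    "hours": [0.9, 0.1], "color": [0.0, 1.0], "red": [0.1, 0.9],
}

def encode(text):
    """Average the toy vectors of known words (stand-in for a BiLSTM encoder)."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]

def match_score(question, specification):
    """Dot product of the two encodings (stand-in for the FC scoring layer)."""
    q, s = encode(question), encode(specification)
    return sum(a * b for a, b in zip(q, s))

# The specification most relevant to the question gets the highest score.
specs = ["Battery life: 10 hours", "Color: red"]
best = max(specs, key=lambda s: match_score("How long does the battery last", s))
print(best)  # -> Battery life: 10 hours
```

At answering time, the model ranks a product’s specifications by their matching score against the shopper’s question and returns the best-matching one.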
This year, ACL delved into a breadth of hot topics in the NLP domain. Clearly, deep learning remains the dominant approach to solving NLP tasks. However, there is an increased focus on developing a better understanding of deep learning embeddings and evolving deep learning into a robust engineering technology. Topics related to testing deep learning models for specific linguistic capabilities, applying state-of-the-art models to harder tasks, and establishing strong baselines also gained significant attention. Several papers also introduced new versions of network architectures and novel methods for question answering, sentiment analysis, and style transfer. ACL 2018 also witnessed the launch of the Asia-Pacific Chapter of the Association for Computational Linguistics (AACL), which will mainly focus on conferences and regional events in the Asia-Pacific region.
ACL 2018 bore witness to the rapid strides being made in the NLP field. More sponsorships therefore become crucial to keep pace with the continuous progress in the field and the speedy growth of the ACL community. SAP had a significant presence at ACL this year and, given the large number of NLP-based use cases in our enterprise solutions, we will certainly maintain close ties to the NLP community.