Interesting Stuff at EMNLP (part II)
This is the second part of my notes; for the first one, please refer to part I.
The papers that drew my attention are described in what I call Twitter style, i.e. just the main idea and/or the important features of the work in question. For this kind of content I actually have a Twitter account: http://twitter.com/madrugad0, so if you like this post, consider following me there. As before, all illustrations belong to their rightful owners, mostly the papers' authors.
Learning To Split and Rephrase From Wikipedia Edit History
https://arxiv.org/pdf/1808.09468.pdf
A new paraphrase dataset from Google; it would be nice to see an analogue for Russian as well. The dataset contains examples of splitting long sentences into shorter ones; dataset: http://goo.gl/language/wiki-split
Neural Latent Extractive Document Summarization
https://arxiv.org/pdf/1808.07187.pdf
Authors from Microsoft use latent variables to represent whether a sentence is included in the summary and train the whole model with REINFORCE; a seq2seq model serves as a so-called compression model, producing the probability that a summary sentence is derived from a given sentence of the original document.
Improving Reinforcement Learning Based Image Captioning with Natural Language Prior
https://arxiv.org/pdf/1809.06227.pdf
Simple yet brilliant idea: to improve the readability of RL-generated captions, add a term rewarding generated n-grams that are found in the training corpus; code for the paper: https://github.com/tgGuo15/PriorImageCaption
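My toy reading of that extra term (illustrative code and names, not the authors' implementation; they use a language-model prior rather than a plain set lookup):

```python
def build_ngram_set(corpus_sentences, max_n=4):
    """Collect all n-grams (orders 1..max_n) seen in the training corpus."""
    ngrams = set()
    for sent in corpus_sentences:
        tokens = sent.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                ngrams.add(tuple(tokens[i:i + n]))
    return ngrams

def ngram_prior_reward(caption, corpus_ngrams, max_n=4):
    """Fraction of the caption's n-grams that also occur in the corpus;
    a weighted version of this is added to the usual RL reward (e.g. CIDEr)."""
    tokens = caption.split()
    hits, total = 0, 0
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            total += 1
            hits += tuple(tokens[i:i + n]) in corpus_ngrams
    return hits / max(total, 1)
```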
Joint Aspect and Polarity Classification for Aspect-based Sentiment Analysis with End-to-End Neural Networks
https://arxiv.org/pdf/1808.09238.pdf
A joint solution for sentiment and aspect classification; the authors present results on GermEval. It is based on a simple CNN over GloVe embeddings.
QBLink: A dataset and baselines for sequential open-domain question answering
http://aclweb.org/anthology/D18-1134
A new dataset of interconnected question-answer pairs, with baselines; dataset: http://sequential.qanta.org
Semantic Linking in Convolutional Neural Networks for Answer Sentence Selection
http://aclweb.org/anthology/D18-1133
The authors propose using CNNs to produce embeddings enriched with semantic information, showing near-top results with a single simple architecture.
Disambiguated skip-gram model
http://aclweb.org/anthology/D18-1174
A really nice idea: extend skip-gram to handle multiple meanings of a word; inference is linear-time, which improves on Bartunov's previous work on the topic.
Memory, Show the Way: Memory Based Few Shot Word Representation Learning
http://aclweb.org/anthology/D18-1173
A key-value memory stores abstract knowledge; retrieving a prototype from the memory by context and averaging it with the actual context embedding gives a few-shot word embedding.
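My rough understanding of the mechanism in NumPy (illustrative names and interface, not the paper's API):

```python
import numpy as np

def few_shot_embedding(context_vec, memory_keys, memory_values, alpha=0.5):
    """context_vec: (d,) average embedding of the observed context;
    memory_keys/memory_values: (m, d) key-value memory.
    Soft-retrieve a prototype by key similarity and mix it with the context."""
    scores = memory_keys @ context_vec
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    prototype = weights @ memory_values
    return alpha * prototype + (1.0 - alpha) * context_vec
```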
Leveraging Gloss Knowledge in Neural Word Sense Disambiguation by Hierarchical Co-Attention
http://aclweb.org/anthology/D18-1170
A co-attention trick for word sense disambiguation; the authors also use a hierarchical structure to further improve on the task.
Streaming word similarity mining on the cheap
http://aclweb.org/anthology/D18-1172
A simple linear method to measure word similarity by counting second-order co-occurrences; it shows performance comparable to fastText and GloVe.
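The gist, as I read it, in a few lines (a toy batch version; the paper does this in a streaming fashion on the cheap):

```python
import math
from collections import Counter

def second_order_similarity(cooc, w1, w2):
    """cooc: dict mapping a word to a Counter of its context-word counts.
    Two words are similar if their first-order co-occurrence profiles are
    similar -- a second-order notion of similarity."""
    a, b = cooc.get(w1, Counter()), cooc.get(w2, Counter())
    dot = sum(cnt * b[ctx] for ctx, cnt in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```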
Dissecting Contextual Word Embeddings: Architecture and Representation
https://arxiv.org/pdf/1808.08949.pdf
A paper about ELMo, describing why contextual representations are better than plain word embeddings.
Structured Alignment Networks for Matching Sentences
The authors learn latent dependency trees and a latent matching between them; they use a special case of decomposable attention.
Universal Sentence Encoder
https://arxiv.org/pdf/1803.11175.pdf
USE from Google; there was a demo at EMNLP.
Learning Sentiment Memories for Sentiment Modification without Parallel Data
https://arxiv.org/pdf/1808.07311.pdf
The authors detect sentiment words using attention and then use this vocabulary to check that the sentiment is inverted while the meaning is preserved; code for the paper: https://github.com/lancopku/SMAE
Cross-Pair Text Representations for Answer Sentence Selection
http://aclweb.org/anthology/D18-1240
Context and Copying in Neural Machine Translation
http://aclweb.org/anthology/D18-1339
An exhaustive study of copying in machine translation from Philipp Koehn's group.
Training Deeper Neural Machine Translation Models
https://arxiv.org/pdf/1808.07561.pdf
So-called transparent attention from Google; this is essentially attention along the depth dimension, i.e. attention over the outputs of the encoder layers; improves SotA.
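As I understood it, the trick is a learned softmax-weighted mix of all encoder layer outputs, one mix per decoder layer; a sketch (my parameterisation, not necessarily theirs):

```python
import torch
import torch.nn as nn

class TransparentAttention(nn.Module):
    """Learned softmax weights over the outputs of all encoder layers
    (embeddings included), one weight vector per decoder layer."""
    def __init__(self, n_enc_layers, n_dec_layers):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_dec_layers, n_enc_layers + 1))

    def forward(self, enc_layer_outputs):
        # enc_layer_outputs: list of (batch, src_len, d) tensors,
        # the embedding layer plus every encoder layer
        stacked = torch.stack(enc_layer_outputs, dim=0)   # (L+1, B, S, d)
        weights = torch.softmax(self.logits, dim=-1)      # (n_dec, L+1)
        # one mixed encoder representation per decoder layer
        return torch.einsum('dl,lbsh->dbsh', weights, stacked)
```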
Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation
https://arxiv.org/pdf/1808.09582.pdf
A study of stopping criteria and scoring methods for beam search; a new criterion inspired by the BLEU definition gains 2 BLEU points on Zh-En MT.
Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation
https://arxiv.org/pdf/1808.07374.pdf
The temperature in this work is an additional gate that makes the attention softmax sharper or smoother at each decoding step; code for the paper: https://github.com/lancopku/SACT
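Roughly how I picture the gate (a sketch under my own assumptions; `temp_proj` is a hypothetical linear layer predicting the temperature from the decoder state):

```python
import torch

def temperature_attention(query, keys, values, temp_proj, beta=2.0):
    """query: (d,), keys/values: (src_len, d); temp_proj: an assumed
    nn.Linear(d, 1). The predicted temperature sharpens (tau < 1) or
    smooths (tau > 1) the attention distribution at each decoding step."""
    scores = keys @ query                          # (src_len,)
    tau = beta * torch.sigmoid(temp_proj(query))   # temperature in (0, beta)
    attn = torch.softmax(scores / tau, dim=-1)
    return attn @ values
```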
Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion
https://arxiv.org/pdf/1804.07745.pdf
A retrieval-based objective function (a convex relaxation, in fact) for learning bilingual word mappings; code available: https://github.com/facebookresearch/fastText/tree/master/alignment/
Multi-Head Attention with Disagreement Regularization
https://arxiv.org/pdf/1810.10183.pdf
Regularization encouraging disagreement between attention-head outputs actually improves performance; simple and neat.
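The output-disagreement variant is almost a one-liner; a sketch of how I would add it to the loss (not the authors' code; the paper also discusses variants over subspaces and attended positions):

```python
import torch
import torch.nn.functional as F

def disagreement_penalty(head_outputs):
    """head_outputs: (n_heads, batch, seq_len, d).
    Mean pairwise cosine similarity between head outputs; adding it to the
    training loss pushes the heads to produce different representations."""
    h = F.normalize(head_outputs, dim=-1)
    sims = []
    n = h.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            sims.append((h[i] * h[j]).sum(-1).mean())
    return torch.stack(sims).mean()
```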
Generating Natural Language Adversarial Examples
https://arxiv.org/pdf/1804.07998.pdf
A simple algorithm for attacking NLP models by replacing words with their synonyms, with a context check based on GloVe; code for the paper: https://github.com/nesl/nlp_adversarial_examples
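Just the substitution step, roughly (the actual attack wraps this in a genetic search plus a language-model check of the context; variable names are mine):

```python
import numpy as np

def candidate_substitutes(word, words, vectors, k=8, sim_threshold=0.7):
    """words: vocabulary list; vectors: (V, d) embedding matrix (e.g. GloVe)
    aligned with it. Returns the nearest neighbours of `word` above a
    similarity threshold as candidate replacements."""
    v = vectors[words.index(word)]
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(v) + 1e-8
    sims = vectors @ v / norms
    best = np.argsort(-sims)[1:k + 1]              # skip the word itself
    return [words[i] for i in best if sims[i] >= sim_threshold]
```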
Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
https://arxiv.org/pdf/1808.08859.pdf
A number of tricks from Microsoft to improve convergence for NMT; unfortunately, without code.
Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation
https://arxiv.org/pdf/1810.03975.pdf
A new two-dimensional unit with two inputs for NMT.
End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification
http://aclweb.org/anthology/D18-1336
The authors add CTC (widely used in ASR; essentially dynamic programming over a matrix of alignment hypotheses) on top of a non-autoregressive translation model; improves SotA for En-Ro.
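How the loss plugs in, using PyTorch's built-in CTC (a toy illustration with random tensors, not the paper's model):

```python
import torch
import torch.nn as nn

# The non-autoregressive decoder emits a distribution over target tokens plus
# a blank symbol at every output position; CTC marginalises over all monotonic
# alignments to the reference.
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

T, B, V = 24, 4, 1000                              # output slots, batch, vocab (blank = id 0)
log_probs = torch.randn(T, B, V).log_softmax(-1)   # would come from the decoder
targets = torch.randint(1, V, (B, 12))             # reference token ids, no blanks
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 12, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```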
Prediction Improves Simultaneous Neural Machine Translation
http://aclweb.org/anthology/D18-1337
In simultaneous translation, one can predict the upcoming words of the source sentence and thereby improve the translation.
Iterative Document Representation Learning Towards Summarization with Polishing
https://arxiv.org/pdf/1809.10324.pdf
Iterative refinement of the document representation, analogous to the hops in memory networks; code for the paper: https://github.com/yingtaomj/Iterative-Document-Representation-Learning-Towards-Summarization-with-Polishing
APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning
https://arxiv.org/pdf/1808.09658.pdf
Code for the paper: https://github.com/UKPLab/emnlp2018-april
Controlling Length in Abstractive Summarization Using a Convolutional Neural Network
http://aclweb.org/anthology/D18-1444
A gate in a CNN decoder to control the output length in summarisation.
Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization
https://arxiv.org/pdf/1808.06218.pdf
Frustratingly Easy Model Ensemble for Abstractive Summarization
http://aclweb.org/anthology/D18-1449
An output selector surprisingly improves summarisation quality; the authors release the 128 pretrained models used in this work: https://research-lab.yahoo.co.jp/en/software/
MSMO: Multimodal Summarization with Multimodal Output
http://aclweb.org/anthology/D18-1448
Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks
https://arxiv.org/pdf/1810.02851.pdf
Great work on unsupervised text summarisation; unfortunately, without code.
Improving Neural Abstractive Document Summarization with Structural Regularization
http://aclweb.org/anthology/D18-1441
The architecture's structural features exploit the fact that a summary is built from words and sentences at the same time; the authors use joint attention over words and sentences together with a dedicated structural loss.
Paragraph-level Neural Question Generation with Maxout Pointer and Gated Self-attention Networks
http://aclweb.org/anthology/D18-1424
Unsupervised Natural Language Generation with Denoising Autoencoders
https://arxiv.org/pdf/1804.07899.pdf
Interesting work from Google on structure-to-text natural language generation using a classic seq2seq model; they train a denoising auto-encoder whose input is corrupted only at frequent words.
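My reading of the noising step, as a toy function (the exact corruption scheme in the paper may differ):

```python
import random

def corrupt_input(tokens, frequent_words, drop_prob=0.5):
    """Keep rare/content words intact and randomly drop frequent (function)
    words; the decoder then learns to restore a fluent sentence around the
    surviving content words, which at test time come from the structured input."""
    return [t for t in tokens
            if t not in frequent_words or random.random() > drop_prob]
```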
Towards a Better Metric for Evaluating Question Generation Systems
https://arxiv.org/pdf/1808.10192.pdf
In this work the authors present several techniques for question generation and measure their quality by answerability, i.e. whether the question can be answered from the original text.
Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation
http://www.aclweb.org/anthology/D18-1428
The authors propose using a language-model-based reward from the discriminator, which is initially trained on real text; code for the paper: https://github.com/lancopku/DPGAN
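A rough sketch of such a reward, with a hypothetical LM interface (whether the paper uses the raw cross-entropy or a bounded transform of it, check the original):

```python
import torch

def lm_reward(discriminator_lm, token_ids):
    """discriminator_lm: assumed to return logits of shape (T, V) for an input
    of T token ids; token_ids: (T+1,) generated sequence. The per-token
    cross-entropy under the real-text LM serves as the generator's reward,
    favouring diverse (less predictable) text."""
    logits = discriminator_lm(token_ids[:-1])
    log_probs = torch.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, token_ids[1:].unsqueeze(-1)).squeeze(-1)
    return nll.mean()
```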
QuaSE: Sequence Editing under Quantifiable Guidance
https://arxiv.org/pdf/1804.07007.pdf
Text style transfer guided by quantifiable attribute values; code for the paper: https://bitbucket.org/leoeaton/quase
Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity
https://arxiv.org/pdf/1809.06873.pdf
Paraphrase Generation with Deep Reinforcement Learning
https://arxiv.org/pdf/1711.00279.pdf
Inverse reinforcement learning for paraphrase generation; the key idea is training an evaluator for matching.
AirDialogue: An Environment for Goal-Oriented Dialogue Research
http://aclweb.org/anthology/D18-1419
Google's take on something like ParlAI; I hope it will be more convenient. It must be said that ParlAI is conceptually more universal: all its tasks (like NER or chit-chat) are presented as dialogues between a teacher and a learner, while AirDialogue, in contrast, is devoted solely to goal-oriented dialogues.
A Self-Attentive Model with Gate Mechanism for Spoken Language Understanding
http://aclweb.org/anthology/D18-1417
An application of self-attention to slot filling (and intent recognition).
Stylistic Chinese Poetry Generation via Unsupervised Style Disentanglement
http://nlp.csai.tsinghua.edu.cn/~yangcheng/publications/emnlp2018.pdf
A combination of poetry generation and controllable text generation.
Bottom-Up Abstractive Summarization
https://arxiv.org/pdf/1808.10792.pdf
The key idea is a content-selection mask over the source tokens; the mask is trained via a smooth relaxation.
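Roughly how such a mask enters the copy attention, in the hard-threshold form (the smooth relaxation mentioned above would soften the mask; names and interface are mine):

```python
import torch

def masked_copy_attention(attn, selection_probs, threshold=0.5, eps=1e-10):
    """attn: (src_len,) copy-attention distribution from the abstractor;
    selection_probs: (src_len,) per-token probabilities from a content
    selector. Attention mass on unselected tokens is suppressed and the
    distribution renormalised."""
    mask = (selection_probs >= threshold).float()
    masked = attn * mask
    return masked / (masked.sum() + eps)
```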
Word Sense Induction with Neural biLM and Symmetric Patterns
https://arxiv.org/pdf/1808.08518.pdf
A study of applying bidirectional recurrent language models to word sense induction; code for the paper: https://github.com/asafamr/SymPatternWSI
Similarity-Based Reconstruction Loss for Meaning Representation
http://aclweb.org/anthology/D18-1525
The authors present a similarity-based reconstruction loss for auto-encoder training, shown to be better suited to the paraphrase detection task; interestingly, according to the authors, this paper was written from scratch in 10 days.
Conditional Word Embedding and Hypothesis Testing via Bayes-by-Backprop
https://www.nyu.edu/projects/spirling/documents/emnlp18.pdf
A new word embedding model from Cho et al.
Grammar Induction with Neural Language Models: An Unusual Replication
https://arxiv.org/pdf/1808.10000.pdf
Latent syntactic tree learning with neural networks, claimed to be the first successful approach; code for the paper: https://github.com/yikangshen/PRPN
That's it! Of course I have missed many great works, but I hope I have given a reasonable overview that encourages you to explore additional materials from the conference and/or arXiv. Dive in!