Interesting Stuff at EMNLP (part II)

Valentin Malykh
Nov 22, 2018

This is the second part of my notes; for the first one, please refer to part I.

The papers that drew my attention are described in what I call Twitter style, i.e. the main idea and/or the important features of the work in question. For such stuff I actually have a Twitter account, so if you like this post, consider following me there. As before, all illustrations belong to their rightful owners, mostly to the papers' authors.

Learning To Split and Rephrase From Wikipedia Edit History

New paraphrase dataset from Google; it would be nice to see an analog for Russian as well; the dataset contains data for splitting long sentences into shorter ones; dataset

Neural Latent Extractive Document Summarization

Authors from Microsoft use latent variables to represent whether each sentence is included in the summary and train the whole model with REINFORCE; a seq2seq model serves as a so-called compression model that produces the probability of a summary sentence being derived from a sentence of the original document.
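To make the REINFORCE part concrete, here is a minimal sketch, not the authors' model: every sentence gets a Bernoulli "include in summary" variable parameterised by a logit, and the score-function gradient pushes the logits towards selections with high reward. The reward `toy_reward` is a hypothetical stand-in; in the paper the signal comes from the compression model, which is not reproduced here.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reinforce_step(logits, reward_fn, rng, lr=0.1, baseline=0.0):
    """One REINFORCE update for independent Bernoulli 'sentence i is in the summary' variables."""
    probs = [sigmoid(logit) for logit in logits]
    # Sample the latent selection z_i ~ Bernoulli(p_i).
    sample = [1 if rng.random() < p else 0 for p in probs]
    reward = reward_fn(sample)
    # d/d(logit) of log Bernoulli(z | sigmoid(logit)) is (z - p).
    grads = [(reward - baseline) * (z - p) for z, p in zip(sample, probs)]
    # Gradient ascent on the expected reward.
    return [logit + lr * g for logit, g in zip(logits, grads)], reward

# Hypothetical reward: +1 for each "good" sentence selected, -1 for any other selected sentence.
GOOD = {0, 2}

def toy_reward(sample):
    return sum(1 if i in GOOD else -1 for i, z in enumerate(sample) if z)

rng = random.Random(0)
logits = [0.0] * 4
for _ in range(2000):
    logits, _ = reinforce_step(logits, toy_reward, rng)
# Good sentences drift towards probability 1, the others towards 0.
print([round(sigmoid(logit), 2) for logit in logits])
```

A learned baseline (instead of `baseline=0.0`) is the usual way to reduce the variance of this estimator.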

Improving Reinforcement Learning Based Image Captioning with Natural Language Prior

simple yet brilliant idea: to improve the readability of RL-generated captions, add a loss term based on whether the generated n-grams are found in the original corpus; code for paper
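A sketch of one way such a term could be computed, under my own assumptions: count the fraction of generated n-grams that never occur in the training captions, and subtract it (times some weight) from the RL reward. The helper names and the exact penalty form are hypothetical; the paper's formulation may differ.

```python
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_ngram_set(corpus, n):
    """Collect every n-gram that occurs in the reference captions."""
    seen = set()
    for sentence in corpus:
        seen.update(ngrams(sentence.split(), n))
    return seen

def unseen_ngram_penalty(caption, seen, n):
    """Fraction of the caption's n-grams that never occur in the corpus (0.0 = all seen)."""
    grams = ngrams(caption.split(), n)
    if not grams:
        return 0.0
    return sum(g not in seen for g in grams) / len(grams)

corpus = ["a man riding a horse", "a dog on the beach"]
seen = build_ngram_set(corpus, 3)
print(unseen_ngram_penalty("a man riding a horse", seen, 3))   # 0.0, every trigram is known
print(unseen_ngram_penalty("a a man riding riding", seen, 3))  # 2 of 3 trigrams are unseen
```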

Joint Aspect and Polarity Classification for Aspect-based Sentiment Analysis with End-to-End Neural Networks

joint solution for sentiment and aspect classification; the authors present results on GermEval; the model is based on a simple CNN over GloVe embeddings

QBLink: A dataset and baselines for sequential open-domain question answering

new dataset of interconnected question-answer pairs with baselines; dataset

Semantic Linking in Convolutional Neural Networks for Answer Sentence Selection

The authors propose using CNNs to produce embeddings enriched with semantic meaning, showing results close to the top with one simple architecture

Disambiguated skip-gram model

Really nice idea to use skip-gram to handle multiple meanings of a word; inference is linear-time, which is better than Bartunov's previous work on the topic
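A rough illustration of why sense selection can be linear-time, assuming (my assumption, not necessarily the paper's exact mechanism) that the sense is chosen by comparing each sense vector with the averaged context vector: one pass over K senses, no iterative inference. The toy 2-d vectors are made up.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def average(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def pick_sense(sense_vectors, context_vectors):
    """Index of the sense vector that best matches the mean context vector.

    Cost is O(K * d) for K senses of dimension d: a single linear scan.
    """
    ctx = average(context_vectors)
    scores = [dot(s, ctx) for s in sense_vectors]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical senses of "bank": 0 ~ finance, 1 ~ river.
bank_senses = [[1.0, 0.0], [0.0, 1.0]]
money, loan, water, shore = [0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]
print(pick_sense(bank_senses, [money, loan]))   # 0
print(pick_sense(bank_senses, [water, shore]))  # 1
```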

Memory, Show the Way: Memory Based Few Shot Word Representation Learning

key-value storage for abstract knowledge; retrieving a prototype from memory by context and averaging it with the actual context embedding gives a zero-shot word embedding
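A minimal sketch of the read-and-average step as I understand it from the description above; `keys`, `values` and the plain averaging are assumptions, and in the paper the memory is learned rather than hand-filled like here.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def read_memory(keys, values, query):
    """Soft key-value lookup: attention over the keys, weighted sum of the values."""
    weights = softmax([dot(k, query) for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def rare_word_embedding(context_embedding, keys, values):
    """Average the retrieved prototype with the context embedding itself."""
    prototype = read_memory(keys, values, context_embedding)
    return [(p + c) / 2 for p, c in zip(prototype, context_embedding)]

# Hypothetical 2-slot memory.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 2.0]]
print(rare_word_embedding([10.0, 0.0], keys, values))  # close to [6.0, 0.0]
```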

Leveraging Gloss Knowledge in Neural Word Sense Disambiguation by Hierarchical Co-Attention

co-attention trick for word sense disambiguation; the authors also use a hierarchy to achieve an improvement on the task

Streaming word similarity mining on the cheap

simple linear method to measure word similarity by counting second-order co-occurrences; shows performance comparable to fastText and GloVe
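To show what "second-order" means here: two words are similar when their first-order co-occurrence vectors are similar, even if they never appear together. The sketch below counts co-occurrences in one streaming pass and compares the count vectors with cosine; the window size and the cosine choice are my assumptions, not necessarily the paper's exact (cheaper) estimator.

```python
import math
from collections import defaultdict

def count_cooccurrences(sentences, window=2):
    """Streaming pass: first-order co-occurrence counts within a symmetric window."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = sentence.split()
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[w][tokens[j]] += 1
    return counts

def second_order_similarity(counts, a, b):
    """Cosine between the co-occurrence vectors of a and b: high when they share neighbours."""
    va, vb = counts[a], counts[b]
    num = sum(c * vb[k] for k, c in va.items() if k in vb)
    den = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return num / den if den else 0.0

counts = count_cooccurrences(["i drink hot coffee", "i drink hot tea", "the cat sat"])
# "coffee" and "tea" never co-occur, but share the neighbours "drink" and "hot".
print(second_order_similarity(counts, "coffee", "tea"))
print(second_order_similarity(counts, "coffee", "cat"))
```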

Dissecting Contextual Word Embeddings: Architecture and Representation

paper about ELMo describing why contextual representations are better than simple word embeddings

Structured Alignment Networks for Matching Sentences

the authors learn latent dependency trees and a latent matching between them; they use a specific case of decomposable attention
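For reference, the "attend" step of standard decomposable attention looks roughly like this; the projection F is taken as the identity for brevity (an assumption), and the latent tree structure of the paper is not modelled at all.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_align(a, b):
    """Attend step with F = identity: e_ij = a_i . b_j.

    beta_i is a softmax-weighted summary of b for token a_i; alpha_j is the
    softmax-weighted summary of a for token b_j.
    """
    e = [[dot(ai, bj) for bj in b] for ai in a]
    beta = []
    for i in range(len(a)):
        w = softmax(e[i])
        beta.append([sum(w[j] * b[j][d] for j in range(len(b))) for d in range(len(b[0]))])
    alpha = []
    for j in range(len(b)):
        w = softmax([e[i][j] for i in range(len(a))])
        alpha.append([sum(w[i] * a[i][d] for i in range(len(a))) for d in range(len(a[0]))])
    return beta, alpha

beta, alpha = soft_align([[10.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
print(beta)  # the single token of a aligns almost entirely with the first token of b
```

In the full model these aligned pairs are then compared and aggregated by small feed-forward networks.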

Universal Sentence Encoder

USE (Universal Sentence Encoder) from Google; there was a demo at EMNLP

Learning Sentiment Memories for Sentiment Modification without Parallel Data

the authors detect sentiment words using attention and then use the vocabulary to check that the sentiment is inverted while the meaning is unchanged; code for paper

Cross-Pair Text Representations for Answer Sentence Selection

Context and Copying in Neural Machine Translation

exhaustive study of copying in machine translation from Philipp Koehn

Training Deeper Neural Machine Translation Models

So-called transparent attention from Google; this is actually attention in depth, i.e. attention over the outputs of encoder layers at different depths; improves SotA
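A sketch of the idea as I understand it: instead of attending only to the top encoder layer, each decoder layer attends to a softmax-weighted combination of all encoder layer outputs (including the embedding layer), with its own trainable scalar weights. The nested-list layout and the hypothetical logits below are purely illustrative.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def transparent_combine(layer_outputs, layer_logits):
    """Weighted sum of encoder layers; layer_outputs[i][t][d] = layer i, position t, feature d.

    layer_logits are the (learned) scalars of ONE decoder layer, one per encoder layer.
    """
    weights = softmax(layer_logits)
    num_pos = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    return [[sum(w * layer[t][d] for w, layer in zip(weights, layer_outputs))
             for d in range(dim)] for t in range(num_pos)]

# Embedding layer + one encoder block, a single position, 2 features.
encoder_layers = [[[0.0, 0.0]], [[2.0, 4.0]]]
# Hypothetical logits for two decoder layers: the second one leans on the top encoder layer.
decoder_logits = [[0.0, 0.0], [-2.0, 2.0]]
contexts = [transparent_combine(encoder_layers, logits) for logits in decoder_logits]
print(contexts[0])  # [[1.0, 2.0]]
```

The decoder's usual encoder-decoder attention then runs over these combined states instead of the top layer alone.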

Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation

a study of stopping criteria and scoring methods for beam search; new criteria inspired by the BLEU definition gain 2 BLEU points on Zh-En MT
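As a simplified illustration of why length handling matters (not the paper's BLEU-inspired criterion itself): adding a constant per-word reward to the log-probability counteracts beam search's bias towards short outputs. The hypotheses, scores and `word_reward` value are made up.

```python
def rescore(hypotheses, word_reward=0.0):
    """Pick the hypothesis maximising log-probability plus a per-word reward.

    hypotheses: list of (tokens, log_prob). With word_reward=0 the shortest
    plausible output tends to win, because every extra token adds a negative log-probability.
    """
    return max(hypotheses, key=lambda h: h[1] + word_reward * len(h[0]))

hyps = [(["the", "cat"], -1.5), (["the", "cat", "sat", "down"], -2.6)]
print(rescore(hyps)[0])                   # ['the', 'cat']
print(rescore(hyps, word_reward=0.8)[0])  # ['the', 'cat', 'sat', 'down']
```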

Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation

temperature in this work is an additional gate that makes the attention softmax sharper or smoother; code for paper
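A minimal illustration of what the temperature does to attention weights; as I understand the paper, the temperature is produced by a learned gate at each decoding step, whereas here it is simply passed in as a number (an assumption for brevity).

```python
import math

def softmax_with_temperature(scores, temperature):
    """Lower temperature -> sharper (more peaked) attention; higher -> smoother."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(scores, 0.5)
smooth = softmax_with_temperature(scores, 5.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in smooth])
```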

Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion

retrieval objective function (in fact, a convex relaxation) for learning bilingual mappings; code available
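For context, RCSLS relaxes the CSLS retrieval criterion (cross-domain similarity local scaling, introduced with MUSE). The sketch below computes only CSLS scoring for already-mapped vectors, not the training objective; `k` is the neighbourhood size and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def mean_top_k(values, k):
    k = min(k, len(values))
    return sum(sorted(values, reverse=True)[:k]) / k

def csls_matrix(src, tgt, k=1):
    """CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y).

    r_T(x): mean cosine of x to its k nearest targets; r_S(y): mean cosine of y to its
    k nearest sources. Subtracting them penalises "hub" vectors close to everything.
    """
    cos = [[cosine(x, y) for y in tgt] for x in src]
    r_t = [mean_top_k(row, k) for row in cos]
    r_s = [mean_top_k([cos[i][j] for i in range(len(src))], k) for j in range(len(tgt))]
    return [[2 * cos[i][j] - r_t[i] - r_s[j] for j in range(len(tgt))]
            for i in range(len(src))]

def translate(src, tgt, k=1):
    """Index of the best target for every (already mapped) source vector."""
    return [max(range(len(tgt)), key=row.__getitem__) for row in csls_matrix(src, tgt, k)]

print(translate([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]]))  # [0, 1]
```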

Multi-Head Attention with Disagreement Regularization

regularization that encourages disagreement between head outputs actually improves performance; simple and shiny
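A sketch of the output-disagreement term in plain Python, as I understand one of the paper's variants: average cosine similarity over all pairs of head outputs, added (with some weight) to the training loss so that minimising it pushes heads apart. The weight and any per-position handling are left out.

```python
import math

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def mean_pairwise_cosine(head_outputs):
    """Average cosine over all H * H ordered head pairs (i == j included).

    Adding lambda * this value to the loss penalises heads that produce similar outputs.
    """
    h = len(head_outputs)
    total = sum(cosine(head_outputs[i], head_outputs[j]) for i in range(h) for j in range(h))
    return total / (h * h)

print(mean_pairwise_cosine([[1.0, 0.0], [1.0, 0.0]]))  # identical heads: 1.0
print(mean_pairwise_cosine([[1.0, 0.0], [0.0, 1.0]]))  # orthogonal heads: 0.5
```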

Generating Natural Language Adversarial Examples

simple algorithm for perturbing the inputs of NLP models by replacing words with their synonyms, with a context check, using GloVe; code for paper

Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation

A number of tricks from Microsoft to improve convergence in NMT; unfortunately, without code

Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation

new unit with two inputs for NMT

End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification

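the authors add CTC (widely used in ASR; essentially a dynamic-programming sum over all alignments in the hypothesis matrix, similar in spirit to the Viterbi algorithm) to NAT (non-autoregressive translation); improves SotA for En-Ro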
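To make the CTC part concrete: the decoder emits a longer sequence of frame-level tokens in parallel (including a blank symbol), and CTC maps it to the output by merging repeats and dropping blanks; the loss sums over all frame sequences that collapse to the reference. Below is only that collapse map plus greedy decoding, with a hypothetical blank token `<b>` and toy probabilities.

```python
BLANK = "<b>"

def ctc_collapse(frames, blank=BLANK):
    """CTC output rule: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for tok in frames:
        if tok != prev and tok != blank:
            out.append(tok)
        prev = tok
    return out

def greedy_ctc_decode(frame_probs, vocab, blank=BLANK):
    """Take the argmax token at every frame, then apply the collapse rule."""
    best = [vocab[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    return ctc_collapse(best, blank)

print(ctc_collapse(["a", "a", BLANK, "a", "b", "b", BLANK]))  # ['a', 'a', 'b']
vocab = [BLANK, "a", "b"]
print(greedy_ctc_decode([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]], vocab))  # ['a', 'b']
```

Note how the blank lets CTC emit a genuinely repeated token ("a a"), which plain repeat-merging could not.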

Prediction Improves Simultaneous Neural Machine Translation

in online (simultaneous) translation, we can predict the upcoming words of the source sentence and thereby improve its translation

Iterative Document Representation Learning Towards Summarization with Polishing

iterative improvement of text representation, analogous to memory networks with their hops; code for paper

APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning

code for paper

Controlling Length in Abstractive Summarization Using a Convolutional Neural Network

gate for CNN to control output length for summarisation

Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization

Frustratingly Easy Model Ensemble for Abstractive Summarization

a simple output selector surprisingly improves summarisation quality; the authors release the 128 pretrained models used in this work
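A sketch of what such an output selector could look like, under my assumptions: pick the candidate summary that agrees most, on average, with the outputs of the other models. Unigram F1 as the similarity is a hypothetical choice here, not necessarily the one used in the paper.

```python
from collections import Counter

def unigram_f1(a, b):
    """Hypothetical similarity: F1 of unigram overlap between two token lists."""
    ca, cb = Counter(a), Counter(b)
    overlap = sum((ca & cb).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(a), overlap / len(b)
    return 2 * p * r / (p + r)

def select_output(candidates):
    """Return the candidate with the highest mean similarity to all other candidates."""
    if len(candidates) == 1:
        return candidates[0]

    def support(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(unigram_f1(candidates[i], o) for o in others) / len(others)

    return candidates[max(range(len(candidates)), key=support)]

outs = [
    "police arrest man after robbery".split(),
    "police arrest suspect after robbery".split(),
    "weather is sunny today".split(),
]
print(" ".join(select_output(outs)))  # the outlier third summary is never picked
```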

MSMO: Multimodal Summarization with Multimodal Output

Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks

great work on unsupervised text summarisation; unfortunately without code

Improving Neural Abstractive Document Summarization with Structural Regularization

specific structural features of the architecture exploit the fact that a summary consists of words and sentences at the same time; the authors use joint attention over words and sentences and a specific structural loss

Paragraph-level Neural Question Generation with Maxout Pointer and Gated Self-attention Networks

Unsupervised Natural Language Generation with Denoising Autoencoders

interesting work from Google on natural language generation from structured data to text using a classic seq2seq model; they train a denoising auto-encoder that corrupts its input only for frequent words
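A minimal sketch of a corruption function that only touches frequent words, so that rarer content words always survive; `top_k`, `drop_prob` and dropping (rather than another kind of noise) are my assumptions. A training pair would then be `(corrupt(x), x)`, with the auto-encoder learning to restore the missing words.

```python
import random
from collections import Counter

def make_noiser(corpus, top_k, drop_prob, seed=0):
    """Build a corruption function that may drop only the top_k most frequent tokens."""
    freq = Counter(tok for sent in corpus for tok in sent.split())
    frequent = {w for w, _ in freq.most_common(top_k)}
    rng = random.Random(seed)

    def corrupt(sentence):
        # Frequent tokens are dropped with probability drop_prob; all other tokens are kept.
        return [t for t in sentence.split() if t not in frequent or rng.random() >= drop_prob]

    return corrupt, frequent

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
corrupt, _ = make_noiser(corpus, top_k=3, drop_prob=0.5)
print(corrupt("the cat sat on the mat"))  # "cat" and "mat" are always kept
```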

Towards a Better Metric for Evaluating Question Generation Systems

in this work the authors present several techniques for question generation and measure their quality by answerability, i.e. whether the question can actually be answered from the original text

Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation

the authors propose a language-model-based reward from the discriminator, which is initially trained on real text; code for paper

QuaSE: Sequence Editing under Quantifiable Guidance

style transfer for text with measurable parameter values; code for paper

Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity

Paraphrase Generation with Deep Reinforcement Learning

inverse reinforcement learning for paraphrase generation; the key idea is training an evaluator for matching

AirDialogue: An Environment for Goal-Oriented Dialogue Research

Google's take on a ParlAI-like environment; I hope it will be more convenient. It should be said that ParlAI is conceptually more universal: all tasks there (like NER or chit-chat) are presented as dialogs between a teacher and a learner, whereas AirDialogue is solely devoted to goal-oriented dialogs

A Self-Attentive Model with Gate Mechanism for Spoken Language Understanding

self-attention application for slot-filling (and intent recognition)

Stylistic Chinese Poetry Generation via Unsupervised Style Disentanglement

combination of poetry generation and controllable text generation

Bottom-Up Abstractive Summarization

the key idea is to learn a mask for content selection; the masks are trained as a smooth relaxation
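A sketch of how a content-selection mask can be applied to copy attention; both the soft variant (scale by selection probability) and the hard one (threshold) are shown, and the threshold value and toy numbers are hypothetical. Training the selector itself is not shown.

```python
def mask_attention(attention, select_probs, threshold=None):
    """Restrict copy attention to source tokens the content selector keeps.

    threshold=None -> soft mask (multiply by the selection probability);
    otherwise      -> hard mask (zero out tokens whose probability is below threshold).
    The result is renormalised to sum to one.
    """
    if threshold is None:
        masked = [a * q for a, q in zip(attention, select_probs)]
    else:
        masked = [a if q >= threshold else 0.0 for a, q in zip(attention, select_probs)]
    total = sum(masked)
    if total == 0.0:
        return list(attention)  # nothing selected: fall back to the unmasked attention
    return [m / total for m in masked]

attention = [0.5, 0.3, 0.2]
select_probs = [0.9, 0.1, 0.8]
print(mask_attention(attention, select_probs, threshold=0.5))  # second token can no longer be copied
print(mask_attention(attention, select_probs))                 # soft version: second token down-weighted
```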

Word Sense Induction with Neural biLM and Symmetric Patterns

a study of WSI applications of bidirectional recurrent language models; code for paper

Similarity-Based Reconstruction Loss for Meaning Representation

the authors present a specific similarity loss for auto-encoder training, shown to be better suited to the paraphrase detection task; interestingly, according to the authors, this paper was written from scratch in 10 days

Conditional Word Embedding and Hypothesis Testing via Bayes-by-Backprop

new model of word embedding from Cho et al.

Grammar Induction with Neural Language Models: An Unusual Replication

latent syntactic tree learning with neural networks, claimed to be the first successful approach; code for paper

That's it! Of course I have missed many great works, but I hope I gave a reasonable overview that encourages you to explore additional materials from the conference and/or arXiv. Dive in!