Interesting Stuff at EMNLP (part II)
This is the second part of my notes; for the first one, please refer to part I.
The papers that drew my attention are described in what I call Twitter style, i.e. the main idea and/or the important features of the work in question. I actually have a Twitter account for this kind of content, so if you like this post, consider following me there. As before, all illustrations belong to their rightful owners, mostly to the papers’ authors.
Learning To Split and Rephrase From Wikipedia Edit History
A new paraphrase dataset from Google containing data for splitting long sentences into shorter ones; it would be nice to see an analog for Russian as well; dataset
Neural Latent Extractive Document Summarization
Authors from Microsoft use latent variables to represent whether each sentence is included in the summary and train the whole model with REINFORCE; a seq2seq model serves as a so-called compression model that estimates the probability that a summary sentence was derived from a given sentence of the original document.
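To make the training signal concrete, here is a minimal sketch of the REINFORCE (score-function) gradient for independent Bernoulli sentence-inclusion variables. It assumes a plain logistic parametrisation per sentence and a user-supplied reward (in a real system something like ROUGE of the selected sentences); `reinforce_grad` and its arguments are hypothetical names, not the authors’ code.

```python
import math
import random
from typing import Callable, List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_grad(
    logits: List[float],
    reward_fn: Callable[[List[int]], float],
    n_samples: int = 1000,
    baseline: float = 0.0,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo estimate of d E[reward] / d logits, where each sentence i
    is included independently with z_i ~ Bernoulli(sigmoid(logits[i]))."""
    rng = random.Random(seed)
    probs = [sigmoid(l) for l in logits]
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        # Sample which sentences go into the summary.
        z = [1 if rng.random() < p else 0 for p in probs]
        advantage = reward_fn(z) - baseline
        # Score function of a Bernoulli w.r.t. its logit: z - sigmoid(logit).
        for i, (zi, p) in enumerate(zip(z, probs)):
            grad[i] += advantage * (zi - p)
    return [g / n_samples for g in grad]


# Toy reward: 1 if the first sentence is selected, else 0.
# The exact gradient at logits [0, 0] is [0.25, 0.0].
g = reinforce_grad([0.0, 0.0], lambda z: float(z[0]), n_samples=5000)
print(g)
```

In practice a baseline (for example a running average of recent rewards) is subtracted to reduce the variance of this estimator.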
Improving Reinforcement Learning Based Image Captioning with Natural Language Prior
simple yet brilliant idea: to improve the readability of RL-generated captions, add a loss term that depends on whether the generated n-grams occur in the original corpus; code for paper
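A minimal sketch of such an n-gram prior, assuming it is just the set of n-grams seen in the training captions and that the penalty is the fraction of generated n-grams never seen there. The paper may use a different form (for example an n-gram language-model score); `shaped_reward` and `weight` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[NGram]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_corpus_ngrams(sentences: Iterable[List[str]], n: int) -> Set[NGram]:
    """Collect every n-gram that occurs in the reference caption corpus."""
    seen: Set[NGram] = set()
    for s in sentences:
        seen.update(ngrams(s, n))
    return seen


def shaped_reward(caption: List[str], task_reward: float,
                  corpus_ngrams: Set[NGram], n: int = 2,
                  weight: float = 1.0) -> float:
    """Subtract a penalty proportional to the share of unseen caption n-grams."""
    grams = ngrams(caption, n)
    if not grams:
        return task_reward
    unseen = sum(1 for g in grams if g not in corpus_ngrams)
    return task_reward - weight * unseen / len(grams)


corpus = build_corpus_ngrams([["a", "man", "riding", "a", "horse"]], n=2)
print(shaped_reward(["a", "man", "riding", "a", "horse"], 1.0, corpus))  # 1.0
print(shaped_reward(["a", "a", "a", "horse"], 1.0, corpus))  # about 0.33
```

Degenerate captions like "a a a horse" keep their task reward only if their n-grams look like real language, which is the intuition behind the prior.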
Joint Aspect and Polarity Classification for Aspect-based Sentiment Analysis with End-to-End Neural Networks
a joint model for aspect and sentiment polarity classification; the authors present results on GermEval; it is based on a simple CNN over GloVe embeddings
QBLink: A dataset and baselines for sequential open-domain question answering
a new dataset of interconnected question-answer pairs, with baselines; dataset
Semantic Linking in Convolutional Neural Networks for Answer Sentence Selection
The authors use CNNs to produce embeddings enriched with semantic information, achieving results close to the state of the art with a single simple architecture
Disambiguated skip-gram model
A really nice idea to use skip-gram to handle multiple meanings of a word; inference runs in linear time, which is better than Bartunov’s previous work on the topic
Memory, Show the Way: Memory Based Few Shot Word Representation Learning
a key-value memory stores abstract knowledge; retrieving a prototype from the memory by context and averaging it with the actual context embedding gives an embedding for a rare or unseen word
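The retrieval step can be sketched as dot-product attention over memory keys, followed by averaging the read-out with the context vector. This assumes plain dot-product scoring and a simple 50/50 average; the paper’s scoring function, dimensions and training procedure are not reproduced, and `memory_word_embedding` is a hypothetical name.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def memory_word_embedding(context: Vector, keys: List[Vector],
                          values: List[Vector]) -> Vector:
    """Attend over memory keys with the context vector, read the weighted
    value (the 'prototype') and average it with the context embedding."""
    weights = softmax([dot(context, k) for k in keys])
    prototype = [sum(w * v[d] for w, v in zip(weights, values))
                 for d in range(len(context))]
    return [(p + c) / 2 for p, c in zip(prototype, context)]


keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 1.0], [-1.0, -1.0]]
# The context matches the first key, so the prototype is almost [1, 1].
emb = memory_word_embedding([1.0, 0.0], keys, values)
print(emb)
```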
Leveraging Gloss Knowledge in Neural Word Sense Disambiguation by Hierarchical Co-Attention
a co-attention trick between the context and the sense glosses for word sense disambiguation; the authors also make the attention hierarchical to improve results further
Streaming word similarity mining on the cheap
a simple linear method to measure word similarity by counting second-order co-occurrences; it shows performance comparable to fastText and GloVe
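The general idea of second-order co-occurrence can be sketched in a few lines: one streaming pass builds each word’s co-occurrence counts, and two words are similar when their neighbour profiles are similar, even if they never appear next to each other. This is the generic count-based technique under assumed window and cosine choices, not the paper’s own streaming data structures.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def cooccurrence_counts(sentences: Iterable[List[str]],
                        window: int = 2) -> Dict[str, Counter]:
    """One streaming pass: for each word, count words within `window` positions."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[w][tokens[j]] += 1
    return counts


def second_order_similarity(counts: Dict[str, Counter], a: str, b: str) -> float:
    """Cosine between the co-occurrence vectors of a and b."""
    va, vb = counts.get(a, Counter()), counts.get(b, Counter())
    num = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    den = math.sqrt(sum(v * v for v in va.values()) * sum(v * v for v in vb.values()))
    return num / den if den else 0.0


counts = cooccurrence_counts([["the", "cat", "sat"], ["the", "dog", "sat"]], window=1)
print(second_order_similarity(counts, "cat", "dog"))  # 1.0
print(second_order_similarity(counts, "cat", "the"))  # 0.0
```

"cat" and "dog" never co-occur, yet they share the same neighbours, which is exactly what second-order statistics capture.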
Dissecting Contextual Word Embeddings: Architecture and Representation
a paper about ELMo-style contextual embeddings, explaining why contextual representations work better than plain word embeddings
Structured Alignment Networks for Matching Sentences
the authors learn latent dependency trees and a latent matching between them; they use a specific case of decomposable attention
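For reference, the "attend" step of decomposable attention soft-aligns every token of one sentence with the other sentence. This sketch assumes identity projections (the original model first passes tokens through a feed-forward network) and leaves out the latent trees entirely; `attend` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(a: Matrix, b: Matrix) -> Matrix:
    """For every token vector a_i, return beta_i: a convex combination of the
    rows of b weighted by softmax_j(a_i . b_j)."""
    out = []
    for ai in a:
        weights = softmax([sum(x * y for x, y in zip(ai, bj)) for bj in b])
        out.append([sum(w * bj[d] for w, bj in zip(weights, b))
                    for d in range(len(b[0]))])
    return out


beta = attend([[5.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
print(beta)  # mostly aligned with the first token of b
```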
Universal Sentence Encoder
Universal Sentence Encoder (USE) from Google; there was a demo at EMNLP
Learning Sentiment Memories for Sentiment Modification without Parallel Data
the authors detect sentiment words using attention and then use the vocabulary to check that the sentiment is inverted while the meaning is preserved; code for paper
Cross-Pair Text Representations for Answer Sentence Selection
Context and Copying in Neural Machine Translation
an exhaustive study of copying in neural machine translation from Philipp Koehn
Training Deeper Neural Machine Translation Models
So-called transparent attention from Google; this is essentially attention across depth, i.e. the decoder attends to a learned weighted combination of the outputs of all encoder layers rather than only the last one; improves SotA
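The combination step can be sketched as a softmax-weighted sum of the encoder layer outputs. This assumes, as I understand the paper, one trainable scalar per encoder layer (including the embedding layer) for each decoder layer; here those scalars are simply passed in, and `transparent_memory` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]  # tokens x hidden_dim


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def transparent_memory(layer_outputs: List[Matrix], scalars: List[float]) -> Matrix:
    """Memory seen by one decoder layer: sum_i softmax(scalars)_i * H_i,
    where H_i is the output of encoder layer i."""
    weights = softmax(scalars)
    n_tok, dim = len(layer_outputs[0]), len(layer_outputs[0][0])
    return [[sum(w * layer[t][d] for w, layer in zip(weights, layer_outputs))
             for d in range(dim)]
            for t in range(n_tok)]


# Two encoder layers, one token, equal scalars: the memory is the layer mean.
memory = transparent_memory([[[1.0, 2.0]], [[3.0, 4.0]]], [0.0, 0.0])
print(memory)  # [[2.0, 3.0]]
```

Because gradients flow directly into every encoder layer through these weights, deeper encoders become easier to train.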
Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation
a study of stopping criteria and scoring methods for beam search; new criteria inspired by the definition of BLEU (its brevity penalty) gain 2 BLEU points on Zh-En MT
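A rescoring rule in this spirit can be sketched as length-normalised log-probability plus the logarithm of BLEU’s brevity penalty, with the expected output length guessed from the source length. This is a sketch under assumptions: `ratio` is a hypothetical source-to-target length ratio, and the paper’s exact formulas and stopping criteria may differ.

```python
import math
from typing import List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens, total log-probability)


def brevity_penalty(hyp_len: int, ref_len: float) -> float:
    """BLEU's brevity penalty: 1 if long enough, else exp(1 - r / c)."""
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


def bp_norm_score(log_prob: float, hyp_len: int, src_len: int,
                  ratio: float = 1.0) -> float:
    """log BP + log p / |y|, with the reference length estimated as ratio * src_len."""
    return math.log(brevity_penalty(hyp_len, ratio * src_len)) + log_prob / hyp_len


def rescore(hyps: List[Hypothesis], src_len: int, ratio: float = 1.0) -> Hypothesis:
    return max(hyps, key=lambda h: bp_norm_score(h[1], len(h[0]), src_len, ratio))


short = (["w"] * 5, -2.0)
long_ = (["w"] * 10, -6.0)
# Raw log-probability prefers the short hypothesis; BP-normalised scoring does not.
best = rescore([short, long_], src_len=10)
print(len(best[0]))  # 10
```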
Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation
the temperature in this work is an additional gate that makes the attention softmax sharper or smoother; code for paper
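The effect of the temperature is easy to see on a plain softmax. In the paper the temperature is, as I understand it, predicted at every decoding step from the decoder state; in this sketch it is simply an explicit argument, and `attention_with_temperature` is a hypothetical name.

```python
import math
from typing import List


def attention_with_temperature(scores: List[float], temperature: float) -> List[float]:
    """softmax(scores / temperature): temperature < 1 sharpens, > 1 smooths."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.0]
sharp = attention_with_temperature(scores, 0.5)   # concentrates on the top score
plain = attention_with_temperature(scores, 1.0)   # ordinary softmax
smooth = attention_with_temperature(scores, 2.0)  # diverts attention more evenly
print(sharp[0], plain[0], smooth[0])
```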
Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion
a retrieval objective function (in fact, a convex relaxation of one) for learning bilingual word mappings; code available
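The retrieval criterion being relaxed is CSLS, which discounts "hub" words that are close to everything. Below is a sketch of the CSLS score itself, assuming the source vectors have already been mapped into the target space; the relaxed training loss from the paper is not shown, and the function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def cos(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def knn_mean(v: Vec, pool: List[Vec], k: int) -> float:
    """Average cosine between v and its k nearest neighbours in pool."""
    k = min(k, len(pool))
    sims = sorted((cos(v, p) for p in pool), reverse=True)
    return sum(sims[:k]) / k


def csls(mapped_src: List[Vec], tgt: List[Vec], i: int, j: int, k: int = 10) -> float:
    """CSLS(Wx_i, y_j) = 2 cos(Wx_i, y_j) - r_T(Wx_i) - r_S(y_j)."""
    x, y = mapped_src[i], tgt[j]
    return 2 * cos(x, y) - knn_mean(x, tgt, k) - knn_mean(y, mapped_src, k)


src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[1.0, 0.0], [0.0, 1.0]]
print(csls(src, tgt, 0, 0, k=1), csls(src, tgt, 0, 1, k=1))  # 0.0 -2.0
```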
Multi-Head Attention with Disagreement Regularization
a regularization term that encourages disagreement between the outputs of attention heads actually improves performance; simple and shiny
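One form of such a term is the negative average pairwise cosine similarity between head outputs, added with some weight to the objective being maximised. The paper considers several places to apply the regularizer; this sketch covers only head outputs, and `output_disagreement` is a hypothetical name.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def output_disagreement(head_outputs: List[List[float]]) -> float:
    """-(1 / H^2) * sum_{i,j} cos(O_i, O_j); larger means more diverse heads."""
    h = len(head_outputs)
    total = sum(cosine(a, b) for a in head_outputs for b in head_outputs)
    return -total / (h * h)


print(output_disagreement([[1.0, 0.0], [1.0, 0.0]]))  # -1.0 (identical heads)
print(output_disagreement([[1.0, 0.0], [0.0, 1.0]]))  # -0.5 (orthogonal heads)
```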
Generating Natural Language Adversarial Examples
a simple algorithm for attacking NLP models by replacing words with their synonyms (nearest neighbours in GloVe space) and checking that the replacements fit the context; code for paper
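A deliberately simplified greedy variant of synonym-substitution attacks is sketched below: synonyms are nearest neighbours in an embedding space, and each step applies the single swap that most lowers the model’s confidence. The paper uses a population-based search and a context check, both omitted here; the classifier, embeddings and thresholds are toy assumptions.

```python
import math
from typing import Callable, Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def nearest_synonyms(word: str, emb: Dict[str, List[float]],
                     k: int = 5, min_sim: float = 0.5) -> List[str]:
    """Up to k embedding neighbours of `word` with cosine >= min_sim."""
    if word not in emb:
        return []
    cands = [(cosine(emb[word], v), w) for w, v in emb.items() if w != word]
    return [w for s, w in sorted(cands, reverse=True)[:k] if s >= min_sim]


def greedy_attack(tokens: List[str], emb: Dict[str, List[float]],
                  prob_true_class: Callable[[List[str]], float],
                  max_swaps: int = 3) -> List[str]:
    """Repeatedly apply the synonym swap that most lowers the true-class probability."""
    tokens = list(tokens)
    for _ in range(max_swaps):
        best, best_p = None, prob_true_class(tokens)
        for i, w in enumerate(tokens):
            for s in nearest_synonyms(w, emb):
                cand = tokens[:i] + [s] + tokens[i + 1:]
                p = prob_true_class(cand)
                if p < best_p:
                    best, best_p = cand, p
        if best is None:
            break
        tokens = best
    return tokens


emb = {"good": [1.0, 0.1], "fine": [1.0, 0.2], "movie": [0.0, 1.0], "film": [0.1, 1.0]}
toy_classifier = lambda toks: 0.9 if "good" in toks else 0.4  # P(positive)
print(greedy_attack(["good", "movie"], emb, toy_classifier))  # ['fine', 'movie']
```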
Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
A number of tricks from Microsoft to improve convergence of asynchronous SGD for NMT; unfortunately, without code
Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation
a new two-dimensional unit with two inputs for NMT
End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification
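the authors add CTC (connectionist temporal classification, widely used in ASR; essentially a dynamic program over the lattice of hypotheses that sums over all possible alignments) to non-autoregressive translation (NAT); improves SotA for En-Ro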
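The core of CTC is the forward (alpha) recursion that sums the probabilities of all frame-level paths collapsing to the target after merging repeats and removing blanks. The pure-Python sketch below is textbook CTC in probability space (real implementations work in log space), not the NAT model from the paper; `ctc_probability` is a hypothetical name.

```python
from typing import List, Sequence


def ctc_probability(probs: Sequence[Sequence[float]], target: List[int],
                    blank: int = 0) -> float:
    """P(target | input) under CTC. probs[t][v] is the probability of symbol v at frame t."""
    # Interleave blanks: [blank, y1, blank, y2, ..., yL, blank].
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S = len(ext)
    # alpha[s]: total probability of all path prefixes ending in state ext[s].
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]  # stay in the same state
            if s >= 1:
                total += alpha[s - 1]  # advance by one state
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


# Two frames, vocabulary {blank=0, "a"=1}, uniform probabilities:
# paths (a,a), (a,-), (-,a) all collapse to "a", so P = 3 * 0.25 = 0.75.
print(ctc_probability([[0.5, 0.5], [0.5, 0.5]], [1]))  # 0.75
```

Replacing the sums with maxima gives the Viterbi-style best-path alignment instead of the marginal probability.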
Prediction Improves Simultaneous Neural Machine Translation
in simultaneous (online) translation, one can predict the upcoming words of the source sentence and use them to improve the translation
Iterative Document Representation Learning Towards Summarization with Polishing
iterative refinement of the document representation, analogous to the hops in memory networks; code for paper
APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning
code for paper
Controlling Length in Abstractive Summarization Using a Convolutional Neural Network
a gate in a CNN that controls the output length in summarisation
Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization
Frustratingly Easy Model Ensemble for Abstractive Summarization
an output selector surprisingly improves summarisation quality; the authors release the 128 pretrained models used in this work
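A selector of this kind can be sketched as picking the candidate summary that is, on average, most similar to the outputs of the other models (a "majority-like" output). This assumes bag-of-words cosine as the similarity; the paper’s actual similarity function may differ, and `select_output` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List


def bow_cosine(a: str, b: str) -> float:
    va, vb = Counter(a.split()), Counter(b.split())
    num = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    den = math.sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return num / den if den else 0.0


def select_output(candidates: List[str]) -> str:
    """Return the candidate with the highest average similarity to the others."""
    def avg_sim(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(bow_cosine(candidates[i], o) for o in others) / max(len(others), 1)
    return candidates[max(range(len(candidates)), key=avg_sim)]


print(select_output(["a b c", "a b c d", "a b x"]))  # a b c
```

No model needs retraining: each model decodes independently, and only the final choice looks at all outputs.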
MSMO: Multimodal Summarization with Multimodal Output
Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks
great work on unsupervised text summarisation; unfortunately, without code
Improving Neural Abstractive Document Summarization with Structural Regularization
the architecture exploits the fact that a summary is structured at the word and sentence levels at the same time; the authors use joint attention over words and sentences and a dedicated structural loss
Paragraph-level Neural Question Generation with Maxout Pointer and Gated Self-attention Networks
Unsupervised Natural Language Generation with Denoising Autoencoders
interesting work from Google on natural language generation from structured data to text using a classic seq2seq model; they train a denoising auto-encoder that corrupts its input only on frequent words
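A corruption function of this kind might look as follows. The note only says that corruption targets frequent words; dropping them at random with a fixed probability is my assumption, as are the names `frequent_words`, `corrupt`, `top_k` and `drop_prob`.

```python
import random
from collections import Counter
from typing import Iterable, List, Set


def frequent_words(corpus: Iterable[List[str]], top_k: int) -> Set[str]:
    counts = Counter(w for sent in corpus for w in sent)
    return {w for w, _ in counts.most_common(top_k)}


def corrupt(tokens: List[str], frequent: Set[str], drop_prob: float,
            rng: random.Random) -> List[str]:
    """Randomly drop frequent (mostly function) words; rarer content words are
    always kept, so the decoder learns to rebuild fluent text around them."""
    return [w for w in tokens if w not in frequent or rng.random() >= drop_prob]


corpus = [["the", "cat", "is", "on", "the", "mat"], ["the", "dog", "is", "here"]]
freq = frequent_words(corpus, top_k=2)  # {"the", "is"}
print(corrupt(corpus[0], freq, drop_prob=1.0, rng=random.Random(0)))  # ['cat', 'on', 'mat']
```

At generation time, the input is a bag of content words taken from the structured data, which resembles the corrupted training inputs.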
Towards a Better Metric for Evaluating Question Generation Systems
in this work the authors consider several question generation techniques and measure their quality by answerability, i.e. whether the question can actually be answered from the original text
Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation
the authors propose a language-model-based discriminator, initially trained on real text, whose cross-entropy provides the reward; code for paper
QuaSE: Sequence Editing under Quantifiable Guidance
style transfer for text guided by measurable (quantifiable) target values; code for paper
Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity
Paraphrase Generation with Deep Reinforcement Learning
inverse reinforcement learning for paraphrase generation; the key idea is training an evaluator that scores how well a generated sentence matches the input
AirDialogue: An Environment for Goal-Oriented Dialogue Research
Google’s take on something like ParlAI; I hope it will be more convenient. It must be said that ParlAI is conceptually more universal: all tasks there (like NER or chit-chat) are framed as dialogues between a teacher and a learner, whereas AirDialogue is devoted solely to goal-oriented dialogues
A Self-Attentive Model with Gate Mechanism for Spoken Language Understanding
an application of self-attention to slot filling (and intent recognition)
Stylistic Chinese Poetry Generation via Unsupervised Style Disentanglement
a combination of poetry generation and controllable text generation
Bottom-Up Abstractive Summarization
the key idea is to build a content-selection mask over the source tokens; the masks are trained as a smooth relaxation
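At inference time such a mask can be applied by zeroing the copy attention on source tokens whose selection probability falls below a threshold and renormalising the rest. This sketch shows only that hard-masking step, not the training of the selector; `bottom_up_mask` and `threshold` are hypothetical.

```python
from typing import List


def bottom_up_mask(copy_attention: List[float], select_prob: List[float],
                   threshold: float = 0.5) -> List[float]:
    """Zero copy attention on tokens the content selector rejects, then renormalise."""
    masked = [a if p >= threshold else 0.0 for a, p in zip(copy_attention, select_prob)]
    total = sum(masked)
    if total == 0.0:
        return list(copy_attention)  # nothing selected: fall back to the unmasked attention
    return [m / total for m in masked]


out = bottom_up_mask([0.5, 0.3, 0.2], [0.9, 0.1, 0.8])
print(out)  # the middle token can no longer be copied
```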
Word Sense Induction with Neural biLM and Symmetric Patterns
a study of word sense induction (WSI) with bidirectional recurrent language models; code for paper
Similarity-Based Reconstruction Loss for Meaning Representation
the authors present a specific similarity-based loss for auto-encoder training, shown to suit the paraphrase detection task better; interestingly, according to the authors, this paper was written from scratch in 10 days
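One simple way to build a similarity-based reconstruction loss is to replace the one-hot target with a soft distribution over words similar to the true word in embedding space, then take cross-entropy against it. This is an illustrative assumption, not necessarily the paper’s exact formulation; the vocabulary, embeddings and `temperature` are toy values.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def similarity_targets(target: str, emb: Dict[str, List[float]],
                       vocab: List[str], temperature: float = 0.1) -> List[float]:
    """Soft target: softmax over cosine similarity of each vocab word to the true word."""
    sims = [cosine(emb[target], emb[w]) / temperature for w in vocab]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]


def similarity_loss(pred: List[float], target: str, emb: Dict[str, List[float]],
                    vocab: List[str], temperature: float = 0.1) -> float:
    """Cross-entropy of the predicted distribution against the soft target."""
    q = similarity_targets(target, emb, vocab, temperature)
    return -sum(qi * math.log(max(pi, 1e-12)) for qi, pi in zip(q, pred))


vocab = ["good", "great", "bad"]
emb = {"good": [1.0, 0.0], "great": [0.9, 0.1], "bad": [-1.0, 0.0]}
loss_great = similarity_loss([0.1, 0.8, 0.1], "good", emb, vocab)  # predicts a synonym
loss_bad = similarity_loss([0.1, 0.1, 0.8], "good", emb, vocab)    # predicts an antonym
print(loss_great < loss_bad)  # True
```

Unlike plain cross-entropy, reconstructing "great" instead of "good" is penalised much less than reconstructing "bad".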
Conditional Word Embedding and Hypothesis Testing via Bayes-by-Backprop
a new word embedding model from Cho et al.
Grammar Induction with Neural Language Models: An Unusual Replication
latent syntactic tree learning with neural language models, claimed to be the first successful approach of its kind; code for paper
That’s it! Of course, I have missed many great works, but I hope I have given a reasonable overview that encourages you to explore more material from the conference and/or arXiv. Dive in!