Interesting Stuff at EMNLP (part II)
This is the second part of my notes; for the first one, please refer to part I.
The papers that drew my attention are described in what I call Twitter style, i.e. just the main idea and/or the important features of the work in question. For this kind of content I actually have a Twitter account: http://twitter.com/madrugad0, so if you like this post, consider following me there. As before, all illustrations belong to their rightful owners, mostly the papers' authors.
Learning To Split and Rephrase From Wikipedia Edit History
https://arxiv.org/pdf/1808.09468.pdf
A new paraphrase dataset from Google; it would be nice to see an analogue for Russian as well. The dataset contains examples of splitting long sentences into shorter ones; dataset: http://goo.gl/language/wiki-split
Neural Latent Extractive Document Summarization
https://arxiv.org/pdf/1808.07187.pdf
Authors from Microsoft use latent variables to represent whether a sentence is included in the summary and train the whole model with REINFORCE; a seq2seq model serves as a so-called compression model, producing the probability that a summary sentence is derived from a given sentence of the original document.
Improving Reinforcement Learning Based Image Captioning with Natural Language Prior
https://arxiv.org/pdf/1809.06227.pdf
Simple yet brilliant idea: to improve the readability of RL-generated captions, add a term rewarding generated n-grams that are found in the training corpus; code for the paper: https://github.com/tgGuo15/PriorImageCaption
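My toy reading of that extra term (illustrative code and names, not the authors' implementation; they use a language-model prior rather than a plain set lookup):

```python
def build_ngram_set(corpus_sentences, max_n=4):
    """Collect all n-grams (orders 1..max_n) seen in the training corpus."""
    ngrams = set()
    for sent in corpus_sentences:
        tokens = sent.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                ngrams.add(tuple(tokens[i:i + n]))
    return ngrams

def ngram_prior_reward(caption, corpus_ngrams, max_n=4):
    """Fraction of the caption's n-grams that also occur in the corpus;
    a weighted version of this is added to the usual RL reward (e.g. CIDEr)."""
    tokens = caption.split()
    hits, total = 0, 0
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            total += 1
            hits += tuple(tokens[i:i + n]) in corpus_ngrams
    return hits / max(total, 1)
```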
Joint Aspect and Polarity Classification for Aspect-based Sentiment Analysis with End-to-End Neural Networks
https://arxiv.org/pdf/1808.09238.pdf
A joint solution for sentiment and aspect classification; the authors present results on GermEval. It is based on a simple CNN over GloVe embeddings.
QBLink: A dataset and baselines for sequential open-domain question answering
http://aclweb.org/anthology/D18-1134
A new dataset of interconnected question-answer pairs, with baselines; dataset: http://sequential.qanta.org
Semantic Linking in Convolutional Neural Networks for Answer Sentence Selection
http://aclweb.org/anthology/D18-1133
The authors propose using CNNs to produce embeddings enriched with semantic information, showing near-top results with a single simple architecture.
Disambiguated skip-gram model
http://aclweb.org/anthology/D18-1174
A really nice idea: extend skip-gram to handle multiple meanings of a word; inference is linear-time, which improves on Bartunov's previous work on the topic.
Memory, Show the Way: Memory Based Few Shot Word Representation Learning
http://aclweb.org/anthology/D18-1173
A key-value memory stores abstract knowledge; retrieving a prototype from the memory by context and averaging it with the actual context embedding gives a few-shot word embedding.
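My rough understanding of the mechanism in NumPy (illustrative names and interface, not the paper's API):

```python
import numpy as np

def few_shot_embedding(context_vec, memory_keys, memory_values, alpha=0.5):
    """context_vec: (d,) average embedding of the observed context;
    memory_keys/memory_values: (m, d) key-value memory.
    Soft-retrieve a prototype by key similarity and mix it with the context."""
    scores = memory_keys @ context_vec
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    prototype = weights @ memory_values
    return alpha * prototype + (1.0 - alpha) * context_vec
```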
Leveraging Gloss Knowledge in Neural Word Sense Disambiguation by Hierarchical Co-Attention
http://aclweb.org/anthology/D18-1170
A co-attention trick for word sense disambiguation; the authors also use a hierarchical structure to further improve on the task.
Streaming word similarity mining on the cheap
http://aclweb.org/anthology/D18-1172
A simple linear method to measure word similarity by counting second-order co-occurrences; it shows performance comparable to fastText and GloVe.
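The gist, as I read it, in a few lines (a toy batch version; the paper does this in a streaming fashion on the cheap):

```python
import math
from collections import Counter

def second_order_similarity(cooc, w1, w2):
    """cooc: dict mapping a word to a Counter of its context-word counts.
    Two words are similar if their first-order co-occurrence profiles are
    similar -- a second-order notion of similarity."""
    a, b = cooc.get(w1, Counter()), cooc.get(w2, Counter())
    dot = sum(cnt * b[ctx] for ctx, cnt in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```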
Dissecting Contextual Word Embeddings: Architecture and Representation
https://arxiv.org/pdf/1808.08949.pdf
A paper about ELMo, describing why contextual representations are better than plain word embeddings.
Structured Alignment Networks for Matching Sentences
The authors learn latent dependency trees and a latent matching between them; they use a special case of decomposable attention.
Universal Sentence Encoder
https://arxiv.org/pdf/1803.11175.pdf
USE from Google; there was a demo at EMNLP.
Learning Sentiment Memories for Sentiment Modification without Parallel Data
https://arxiv.org/pdf/1808.07311.pdf
The authors detect sentiment words using attention and then use this vocabulary to check that the sentiment is inverted while the meaning is preserved; code for the paper: https://github.com/lancopku/SMAE
Cross-Pair Text Representations for Answer Sentence Selection
http://aclweb.org/anthology/D18-1240
Context and Copying in Neural Machine Translation
http://aclweb.org/anthology/D18-1339
An exhaustive study of copying in machine translation from Philipp Koehn's group.
Training Deeper Neural Machine Translation Models
https://arxiv.org/pdf/1808.07561.pdf
So-called transparent attention from Google; this is essentially attention along the depth dimension, i.e. attention over the outputs of the encoder layers; improves SotA.
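As I understood it, the trick is a learned softmax-weighted mix of all encoder layer outputs, one mix per decoder layer; a sketch (my parameterisation, not necessarily theirs):

```python
import torch
import torch.nn as nn

class TransparentAttention(nn.Module):
    """Learned softmax weights over the outputs of all encoder layers
    (embeddings included), one weight vector per decoder layer."""
    def __init__(self, n_enc_layers, n_dec_layers):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_dec_layers, n_enc_layers + 1))

    def forward(self, enc_layer_outputs):
        # enc_layer_outputs: list of (batch, src_len, d) tensors,
        # the embedding layer plus every encoder layer
        stacked = torch.stack(enc_layer_outputs, dim=0)   # (L+1, B, S, d)
        weights = torch.softmax(self.logits, dim=-1)      # (n_dec, L+1)
        # one mixed encoder representation per decoder layer
        return torch.einsum('dl,lbsh->dbsh', weights, stacked)
```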
Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation
https://arxiv.org/pdf/1808.09582.pdf
A study of stopping criteria and scoring methods for beam search; a new criterion inspired by the BLEU definition gains 2 BLEU points on Zh-En MT.
Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation
https://arxiv.org/pdf/1808.07374.pdf
The temperature in this work is an additional gate that makes the attention softmax sharper or smoother at each decoding step; code for the paper: https://github.com/lancopku/SACT
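Roughly how I picture the gate (a sketch under my own assumptions; `temp_proj` is a hypothetical linear layer predicting the temperature from the decoder state):

```python
import torch

def temperature_attention(query, keys, values, temp_proj, beta=2.0):
    """query: (d,), keys/values: (src_len, d); temp_proj: an assumed
    nn.Linear(d, 1). The predicted temperature sharpens (tau < 1) or
    smooths (tau > 1) the attention distribution at each decoding step."""
    scores = keys @ query                          # (src_len,)
    tau = beta * torch.sigmoid(temp_proj(query))   # temperature in (0, beta)
    attn = torch.softmax(scores / tau, dim=-1)
    return attn @ values
```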
Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion
https://arxiv.org/pdf/1804.07745.pdf
A retrieval-based objective function (a convex relaxation, in fact) for learning bilingual word mappings; code available: https://github.com/facebookresearch/fastText/tree/master/alignment/
Multi-Head Attention with Disagreement Regularization
https://arxiv.org/pdf/1810.10183.pdf
Regularization encouraging disagreement between attention-head outputs actually improves performance; simple and neat.
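The output-disagreement variant is almost a one-liner; a sketch of how I would add it to the loss (not the authors' code; the paper also discusses variants over subspaces and attended positions):

```python
import torch
import torch.nn.functional as F

def disagreement_penalty(head_outputs):
    """head_outputs: (n_heads, batch, seq_len, d).
    Mean pairwise cosine similarity between head outputs; adding it to the
    training loss pushes the heads to produce different representations."""
    h = F.normalize(head_outputs, dim=-1)
    sims = []
    n = h.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            sims.append((h[i] * h[j]).sum(-1).mean())
    return torch.stack(sims).mean()
```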
Generating Natural Language Adversarial Examples
https://arxiv.org/pdf/1804.07998.pdf
A simple algorithm for attacking NLP models by replacing words with their synonyms, with a context check based on GloVe; code for the paper: https://github.com/nesl/nlp_adversarial_examples
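Just the substitution step, roughly (the actual attack wraps this in a genetic search plus a language-model check of the context; variable names are mine):

```python
import numpy as np

def candidate_substitutes(word, words, vectors, k=8, sim_threshold=0.7):
    """words: vocabulary list; vectors: (V, d) embedding matrix (e.g. GloVe)
    aligned with it. Returns the nearest neighbours of `word` above a
    similarity threshold as candidate replacements."""
    v = vectors[words.index(word)]
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(v) + 1e-8
    sims = vectors @ v / norms
    best = np.argsort(-sims)[1:k + 1]              # skip the word itself
    return [words[i] for i in best if sims[i] >= sim_threshold]
```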
Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
https://arxiv.org/pdf/1808.08859.pdf
A number of tricks from Microsoft to improve convergence for NMT; unfortunately, without code.
Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation
https://arxiv.org/pdf/1810.03975.pdf
A new two-dimensional unit with two inputs for NMT.
End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification
http://aclweb.org/anthology/D18-1336
The authors add CTC (widely used in ASR; essentially dynamic programming over a matrix of alignment hypotheses) on top of a non-autoregressive translation model; improves SotA for En-Ro.
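How the loss plugs in, using PyTorch's built-in CTC (a toy illustration with random tensors, not the paper's model):

```python
import torch
import torch.nn as nn

# The non-autoregressive decoder emits a distribution over target tokens plus
# a blank symbol at every output position; CTC marginalises over all monotonic
# alignments to the reference.
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

T, B, V = 24, 4, 1000                              # output slots, batch, vocab (blank = id 0)
log_probs = torch.randn(T, B, V).log_softmax(-1)   # would come from the decoder
targets = torch.randint(1, V, (B, 12))             # reference token ids, no blanks
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 12, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```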
Prediction Improves Simultaneous Neural Machine Translation
http://aclweb.org/anthology/D18-1337
In simultaneous translation, one can predict the upcoming words of the source sentence and thereby improve the translation.
Iterative Document Representation Learning Towards Summarization with Polishing
https://arxiv.org/pdf/1809.10324.pdf
Iterative refinement of the document representation, analogous to the hops in memory networks; code for the paper: https://github.com/yingtaomj/Iterative-Document-Representation-Learning-Towards-Summarization-with-Polishing
APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning
https://arxiv.org/pdf/1808.09658.pdf
Code for the paper: https://github.com/UKPLab/emnlp2018-april
Controlling Length in Abstractive Summarization Using a Convolutional Neural Network
http://aclweb.org/anthology/D18-1444
A gate in a CNN decoder to control the output length in summarisation.
Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization
https://arxiv.org/pdf/1808.06218.pdf
Frustratingly Easy Model Ensemble for Abstractive Summarization
http://aclweb.org/anthology/D18-1449
An output selector surprisingly improves summarisation quality; the authors release the 128 pretrained models used in this work: https://research-lab.yahoo.co.jp/en/software/
MSMO: Multimodal Summarization with Multimodal Output
http://aclweb.org/anthology/D18-1448
Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks
https://arxiv.org/pdf/1810.02851.pdf
Great work on unsupervised text summarisation; unfortunately, without code.
Improving Neural Abstractive Document Summarization with Structural Regularization
http://aclweb.org/anthology/D18-1441
The architecture's structural features exploit the fact that a summary is built from words and sentences at the same time; the authors use joint attention over words and sentences together with a dedicated structural loss.
Paragraph-level Neural Question Generation with Maxout Pointer and Gated Self-attention Networks
http://aclweb.org/anthology/D18-1424
Unsupervised Natural Language Generation with Denoising Autoencoders
https://arxiv.org/pdf/1804.07899.pdf
Interesting work from Google on structure-to-text natural language generation using a classic seq2seq model; they train a denoising auto-encoder whose input is corrupted only at frequent words.
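My reading of the noising step, as a toy function (the exact corruption scheme in the paper may differ):

```python
import random

def corrupt_input(tokens, frequent_words, drop_prob=0.5):
    """Keep rare/content words intact and randomly drop frequent (function)
    words; the decoder then learns to restore a fluent sentence around the
    surviving content words, which at test time come from the structured input."""
    return [t for t in tokens
            if t not in frequent_words or random.random() > drop_prob]
```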
Towards a Better Metric for Evaluating Question Generation Systems
https://arxiv.org/pdf/1808.10192.pdf
In this work the authors present several techniques for question generation and measure their quality by answerability, i.e. whether the question can be answered from the original text.
Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation
http://www.aclweb.org/anthology/D18-1428
The authors propose using a language-model-based reward from the discriminator, which is initially trained on real text; code for the paper: https://github.com/lancopku/DPGAN
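A rough sketch of such a reward, with a hypothetical LM interface (whether the paper uses the raw cross-entropy or a bounded transform of it, check the original):

```python
import torch

def lm_reward(discriminator_lm, token_ids):
    """discriminator_lm: assumed to return logits of shape (T, V) for an input
    of T token ids; token_ids: (T+1,) generated sequence. The per-token
    cross-entropy under the real-text LM serves as the generator's reward,
    favouring diverse (less predictable) text."""
    logits = discriminator_lm(token_ids[:-1])
    log_probs = torch.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, token_ids[1:].unsqueeze(-1)).squeeze(-1)
    return nll.mean()
```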
QuaSE: Sequence Editing under Quantifiable Guidance
https://arxiv.org/pdf/1804.07007.pdf
Text style transfer guided by quantifiable attribute values; code for the paper: https://bitbucket.org/leoeaton/quase
Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity
https://arxiv.org/pdf/1809.06873.pdf
Paraphrase Generation with Deep Reinforcement Learning
https://arxiv.org/pdf/1711.00279.pdf
Inverse reinforcement learning for paraphrase generation; the key idea is training an evaluator for matching.
AirDialogue: An Environment for Goal-Oriented Dialogue Research
http://aclweb.org/anthology/D18-1419
Google's take on something like ParlAI; I hope it will be more convenient. It must be said that ParlAI is conceptually more universal: all its tasks (like NER or chit-chat) are presented as dialogues between a teacher and a learner, while AirDialogue, in contrast, is devoted solely to goal-oriented dialogues.
A Self-Attentive Model with Gate Mechanism for Spoken Language Understanding
http://aclweb.org/anthology/D18-1417
An application of self-attention to slot filling (and intent recognition).
Stylistic Chinese Poetry Generation via Unsupervised Style Disentanglement
http://nlp.csai.tsinghua.edu.cn/~yangcheng/publications/emnlp2018.pdf
A combination of poetry generation and controllable text generation.
Bottom-Up Abstractive Summarization
https://arxiv.org/pdf/1808.10792.pdf
The key idea is a content-selection mask over the source tokens; the mask is trained via a smooth relaxation.
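Roughly how such a mask enters the copy attention, in the hard-threshold form (the smooth relaxation mentioned above would soften the mask; names and interface are mine):

```python
import torch

def masked_copy_attention(attn, selection_probs, threshold=0.5, eps=1e-10):
    """attn: (src_len,) copy-attention distribution from the abstractor;
    selection_probs: (src_len,) per-token probabilities from a content
    selector. Attention mass on unselected tokens is suppressed and the
    distribution renormalised."""
    mask = (selection_probs >= threshold).float()
    masked = attn * mask
    return masked / (masked.sum() + eps)
```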
Word Sense Induction with Neural biLM and Symmetric Patterns
https://arxiv.org/pdf/1808.08518.pdf
A study of applying bidirectional recurrent language models to word sense induction; code for the paper: https://github.com/asafamr/SymPatternWSI
Similarity-Based Reconstruction Loss for Meaning Representation
http://aclweb.org/anthology/D18-1525
The authors present a similarity-based reconstruction loss for auto-encoder training, shown to be better suited to the paraphrase detection task; interestingly, according to the authors, this paper was written from scratch in 10 days.
Conditional Word Embedding and Hypothesis Testing via Bayes-by-Backprop
https://www.nyu.edu/projects/spirling/documents/emnlp18.pdf
A new word embedding model from Cho et al.
Grammar Induction with Neural Language Models: An Unusual Replication
https://arxiv.org/pdf/1808.10000.pdf
Latent syntactic tree learning with neural networks, claimed to be the first successful approach; code for the paper: https://github.com/yikangshen/PRPN
That's it! Of course I have missed many great works, but I hope I have given a reasonable overview that encourages you to explore additional materials from the conference and/or arXiv. Dive in!