Interesting Stuff in EMNLP (part I)

Nov 15, 2018

I recently attended EMNLP in Brussels, Belgium, and here is my biased and incomplete view of its contents. The papers that drew my attention are described in what I call Twitter style, i.e. the main idea and/or important features of the work in question. For such stuff I actually have a Twitter account: http://twitter.com/madrugad0, so if you like this post, consider following me there.

The first of the mentioned papers is from the main track presentations; the others are almost entirely from posters. The first paper also caused a funny Q&A session between Rico Sennrich and some guy from Microsoft. It was hilarious, and I hope there will be a recording of it.

All illustrations belong to their rightful owners, mostly to paper authors. And without further ado:

Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation

https://arxiv.org/pdf/1808.07048.pdf

Microsoft's famous paper on achieving human parity on Zh-En news translation is examined in this work; the conclusion is that MT evaluation measures are imperfect and could be improved by judging each sentence in its document context

Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation

http://www.cs.jhu.edu/~apoliak1/papers/COLLECTING-DIVERSE-NLI-PROBLEMS--EMNLP-2018.pdf

A new dataset for natural language inference, available online at www.decomp.net; it covers 7 phenomena in natural language, combining 13 existing datasets (including CoNLL and FactEval 2017) into one of 570 sentence pairs.

Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models

https://arxiv.org/pdf/1804.09299.pdf

https://seq2seq-vis.io: as the name says, a universal tool for visualisation of seq2seq models; judging by the existing code, it should work with PyTorch and Torch.

An Analysis of Encoder Representations in Transformer-Based Machine Translation

http://aclweb.org/anthology/W18-5431

A study of the internal representations of the transformer architecture; in particular, it shows that different layers capture different features, which are more relevant to some tasks than to others

Aiming to Know You Better Perhaps Makes Me a More Engaging Dialogue Partner

https://arxiv.org/pdf/1808.07104.pdf

novel metric for engagement in dialog; the task is simplified to just retrieving the most relevant candidates from a pre-existing pool; the study was done on ConvAI2 data
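To make the "retrieve from a pre-existing pool" setup a bit more concrete, here is a minimal sketch of candidate ranking. It is not the authors' model: the cosine scoring, the 2-dimensional toy vectors, and the names `cosine` and `retrieve` are all my own assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(context_vec, pool, top_k=2):
    """Return the top_k candidate responses whose vectors are closest to the context."""
    ranked = sorted(
        pool.items(),
        key=lambda item: cosine(context_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Hypothetical pool: response text -> toy embedding (made up for this example).
pool = {
    "I love hiking too!": [0.9, 0.1],
    "What is your job?": [0.1, 0.9],
    "Mountains are great.": [0.8, 0.3],
}
best = retrieve([1.0, 0.0], pool, top_k=2)
```

In a real system the vectors would come from a trained encoder and the scoring function would be learned; the point here is only that generation is replaced by ranking a fixed set of candidates.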

Team GESIS Cologne: An all in all sentence-based approach for FEVER

http://aclweb.org/anthology/W18-5524

Textual entailment with a commonly used IR engine and a decomposable attention model for classification into four classes (true, false, unknown, irrelevant)

Vivisect: Portable, layer-wise task performance monitoring for NLP models

http://aclweb.org/anthology/W18-5445

A simplified but more universal TensorBoard, working with PyTorch and other frameworks

Semi-Autoregressive Neural Machine Translation

https://arxiv.org/pdf/1808.08583.pdf

A simple idea from Alibaba researchers: make the transformer semi-autoregressive, so that it produces several words at once; the results show that when producing 2 words in parallel, the quality drops by only 1%; code for the paper: https://github.com/chqiwang/sa-nmt
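A rough sketch of what "several words at once" means in practice. This is my own toy illustration, not code from the linked repository: I assume a block-wise ("relaxed") causal mask where positions in the same group share the same visible context, and the helper names and the hard-coded toy "model" are hypothetical.

```python
def relaxed_causal_mask(length, group_size):
    """mask[i][j] is True when position i may attend to position j.

    With group_size == 1 this is the usual lower-triangular causal mask;
    with larger groups, all positions of a group see the same context.
    """
    return [
        [j // group_size <= i // group_size for j in range(length)]
        for i in range(length)
    ]


def semi_autoregressive_decode(predict_group, group_size, max_len, eos="</s>"):
    """Decode group by group; returns (tokens, number_of_decoder_steps).

    max_len is a soft limit: the last group may overshoot it slightly.
    """
    output, steps = [], 0
    while len(output) < max_len:
        group = predict_group(output, group_size)  # one parallel step
        steps += 1
        if not group:
            break
        for token in group:
            output.append(token)
            if token == eos:
                return output, steps
    return output, steps


def toy_predict_group(prefix, group_size):
    """Stand-in for a trained model: reads the next tokens off a fixed sentence."""
    target = ["the", "cat", "sat", "down", "</s>"]
    return target[len(prefix):len(prefix) + group_size]


tokens, steps = semi_autoregressive_decode(toy_predict_group, group_size=2, max_len=10)
```

With `group_size=2` the toy sentence of five tokens is produced in three decoder steps instead of five, which is where the speed-up comes from; the price is that tokens inside one group cannot condition on each other.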

Multi-Domain Neural Machine Translation with Word-Level Domain Context Discrimination

http://aclweb.org/anthology/D18-1041

Simple yet effective idea: use domain classification for better prediction of domain-specific words; the authors use two attentions, one for common words and one for domain-specific words; code for the paper: https://github.com/DeepLearnXMU/WDCNMT

Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation

https://arxiv.org/pdf/1808.09006.pdf

tricky sampling for difficult (rare or high-loss) words; this strategy improves translation in some cases; interestingly, at least some of the difficult words are subword units, and translation with context helps in these cases

Improving the Transformer Translation Model with Document-Level Context

https://arxiv.org/pdf/1810.03581.pdf

the authors add one transformer encoder block to get a document embedding and feed it as an additional attention input during encoding and decoding in the actual translation; this gives an improvement of about 1% on EnFr & EnZh; code for the paper: https://github.com/Glaceon31/Document-Transformer

Unsupervised Multilingual Word Embeddings

https://arxiv.org/pdf/1808.08933.pdf

adversarial training for a multilingual word embedding model; code for the paper: https://github.com/ccsasuke/umwe

Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization

https://arxiv.org/pdf/1809.04163.pdf

adversarial method for training a function that injects linguistic knowledge into pre-trained word vectors, so we can use already existing lexicons that cover only a small subset of all words; code for the paper: https://github.com/nmrksic/attract-repel

CLUSE: Cross-Lingual Unsupervised Sense Embeddings

http://aclweb.org/anthology/D18-1025

bilingual word2vec for word senses; the key idea is that different senses of a word can be expressed by different words in another language; code for the paper: http://github.com/MiuLab/CLUSE

Improving Cross-Lingual Word Embeddings by Meeting in the Middle

https://arxiv.org/abs/1808.08780

by using two transformations, we can achieve better results when combining two word vector models for different languages

Personalized microblog sentiment classification via adversarial cross-lingual learning

http://aclweb.org/anthology/D18-1031

adversarial training for user-specific classification; code for the paper: https://github.com/githubfordata/data (the name is a really nice one!)

NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings

http://aclweb.org/anthology/D18-1047

learning a mapping between distant languages by using an intermediate one

A Syntactically Constrained Bidirectional-Asynchronous Approach for Emotional Conversation Generation

https://arxiv.org/pdf/1806.07000.pdf

a simple approach to generating emotionally rich dialog responses

Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning

http://aclweb.org/anthology/D18-1072

a system for automatic slot-value and intent labelling from Microsoft; the authors use hierarchical clustering and achieve 84% of human labelling quality

An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation

https://arxiv.org/pdf/1808.08795.pdf

two auto-encoders to embed utterances and a fully connected (FC) layer for matching, i.e. finding the dependency between them; code for the paper: https://github.com/lancopku/AMM

Out-of-domain Detection based on Generative Adversarial Network

http://aclweb.org/anthology/D18-1077

Interesting work from Samsung colleagues; they propose training a GAN as an out-of-domain classifier; the GAN they use matches the statistical properties of the data

Supervised and Unsupervised Methods for Robust Separation of Section Titles and Prose Text in Web Documents

http://aclweb.org/anthology/D18-1099

The authors present a system that distinguishes titles from prose text in web documents using features and clustering; code for the paper: https://github.com/abhijith-athreya/ASDUS

This is the end of the first part of my notes on EMNLP; more interesting stuff is coming! Check it out here.