Interesting Stuff in EMNLP (part I)
I recently attended EMNLP in Brussels, Belgium, and here is my biased and incomplete view of its contents. The papers that drew my attention are described in what I call Twitter style, i.e. just the main idea and/or important features of the work in question. I actually do have a Twitter account for such stuff: http://twitter.com/madrugad0, so if you like this post, consider following me there.
The first of the papers mentioned here is from the main-track presentations; the others are almost entirely from posters. The first paper also caused a funny Q&A exchange between Rico Sennrich and someone from Microsoft; it was hilarious, and I hope there will be a recording of it.
All illustrations belong to their rightful owners, mostly the paper authors. And without further ado:
Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation
https://arxiv.org/pdf/1808.07048.pdf
This work inspects the famous Microsoft paper on achieving human parity in Zh→En news translation; the conclusion is that current MT evaluation measures are imperfect and could be improved by taking the context around a sentence into account.
Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation
http://www.cs.jhu.edu/~apoliak1/papers/COLLECTING-DIVERSE-NLI-PROBLEMS--EMNLP-2018.pdf
A new dataset for natural language inference, available online at www.decomp.net; it covers 7 phenomena in natural language, combining 13 existing datasets (including CoNLL and FactEval 2017) into one of 570 sentence pairs.
Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models
https://arxiv.org/pdf/1804.09299.pdf
https://seq2seq-vis.io, as it is named, is a universal tool for visualising seq2seq models; judging by the existing code, it should work with PyTorch and Torch.
An Analysis of Encoder Representations in Transformer-Based Machine Translation
http://aclweb.org/anthology/W18-5431
A study of the internal representations of the Transformer architecture; in particular, it shows that different layers capture different features, each more relevant to certain tasks.
Aiming to Know You Better Perhaps Makes Me a More Engaging Dialogue Partner
https://arxiv.org/pdf/1808.07104.pdf
A novel metric for engagement in dialogue; the task is simplified to just retrieving the most relevant candidates from a pre-existing pool; the study was carried out on ConvAI2 data.
Team GESIS Cologne: An all in all sentence-based approach for FEVER
http://aclweb.org/anthology/W18-5524
Textual entailment using a commonly used IR engine and a decomposable attention model for classification into four classes (true, false, unknown, irrelevant).
Vivisect: Portable, layer-wise task performance monitoring for NLP models
http://aclweb.org/anthology/W18-5445
A simplified but more universal TensorBoard that works with PyTorch and other frameworks.
Semi-Autoregressive Neural Machine Translation
https://arxiv.org/pdf/1808.08583.pdf
A simple idea from Alibaba researchers makes the non-autoregressive Transformer semi-autoregressive, so that it produces several words at once; the results show that when producing 2 words in parallel, quality drops by only 1%; code for the paper: https://github.com/chqiwang/sa-nmt
Multi-Domain Neural Machine Translation with Word-Level Domain Context Discrimination
http://aclweb.org/anthology/D18-1041
A simple yet effective idea: use domain classification for better prediction of domain-specific words; the authors use two attentions, one for common words and one for domain-specific words; code for the paper: https://github.com/DeepLearnXMU/WDCNMT
Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation
https://arxiv.org/pdf/1808.09006.pdf
Tricky sampling for difficult (rare or lossy) words; this strategy improves translation in some cases; interestingly, at least some difficult words are subword units, and translation with context helps in these cases.
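The sampling strategy could look roughly like this. The difficulty score below (inverse frequency of a sentence's rarest word, with unseen words scored as maximally difficult) is an illustrative simplification I chose; the paper's actual criteria are more involved.

```python
from collections import Counter

# Rough sketch of back-translation sampling that targets difficult
# (here: rare) words: prefer monolingual sentences rich in words that
# are rare in the existing parallel data.

def word_frequencies(corpus):
    return Counter(w for sent in corpus for w in sent.split())

def difficulty(sentence, freqs):
    """Score a sentence by the inverse (smoothed) frequency of its
    rarest word; unseen words get the maximum score of 1.0."""
    return max(1.0 / (freqs[w] + 1) for w in sentence.split())

def select_for_backtranslation(monolingual, parallel_target, n):
    """Pick the n monolingual sentences with the highest difficulty."""
    freqs = word_frequencies(parallel_target)
    scored = sorted(((difficulty(s, freqs), s) for s in monolingual),
                    reverse=True)
    return [s for _, s in scored[:n]]

parallel = ["the cat sat", "the dog sat", "the cat ran"]
mono = ["the cat sat", "the zebra ran"]
picked = select_for_backtranslation(mono, parallel, n=1)
```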
Improving the Transformer Translation Model with Document-Level Context
https://arxiv.org/pdf/1810.03581.pdf
The authors add one Transformer encoder block to obtain a document embedding, feed it as an additional input to attention during both encoding and decoding of the actual translation, and get an improvement of about 1% in En→Fr and En→Zh; code for the paper: https://github.com/Glaceon31/Document-Transformer
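A minimal numpy sketch of the general idea: sentence-level states additionally attend over document-level context states from an extra encoder. The shapes, the fixed `gate` mixing constant, and the combination rule are my illustrative assumptions; in the paper the context is integrated inside the Transformer layers with learned gating.

```python
import numpy as np

# Sketch: let each sentence state read from a document-level context
# via standard attention, then mix the result back in.

def attention(queries, keys, values):
    """Standard scaled dot-product attention."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

def contextualize(sentence_states, doc_states, gate=0.5):
    """Mix each sentence state with what it reads from the document
    context; the fixed `gate` stands in for a learned gate."""
    doc_read = attention(sentence_states, doc_states, doc_states)
    return (1 - gate) * sentence_states + gate * doc_read

rng = np.random.default_rng(0)
sent = rng.normal(size=(5, 8))   # 5 source-token states, dim 8
doc = rng.normal(size=(12, 8))   # 12 document-context states
out = contextualize(sent, doc)
```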
Unsupervised Multilingual Word Embeddings
https://arxiv.org/pdf/1808.08933.pdf
Adversarial training for a multilingual word embedding model; code for the paper: https://github.com/ccsasuke/umwe
Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization
https://arxiv.org/pdf/1809.04163.pdf
An adversarial method for training a function that injects linguistic knowledge into pre-trained word vectors, so we can use already existing lexicons covering only a small subset of all words; code for the paper: https://github.com/nmrksic/attract-repel
CLUSE: Cross-Lingual Unsupervised Sense Embeddings
http://aclweb.org/anthology/D18-1025
Bilingual word2vec for word senses; the key idea is that different senses may be expressed by different words in different languages; code for the paper: http://github.com/MiuLab/CLUSE
Improving Cross-Lingual Word Embeddings by Meeting in the Middle
https://arxiv.org/abs/1808.08780
Using two transformations, one in each direction, we can achieve better results when combining two word vector models for different languages.
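The "meeting in the middle" idea can be sketched like this: learn a linear map in each direction and let every aligned pair meet at the average of its two cross-lingual images. Plain least squares is my simplification; the paper's actual training objective is more refined.

```python
import numpy as np

# Sketch: two linear maps (X -> Y and Y -> X) instead of one, with
# aligned pairs meeting halfway between the mapped points.

def fit_linear_map(src, tgt):
    """Least-squares W such that src @ W approximates tgt."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W

def meet_in_the_middle(X, Y):
    """Return middle-ground representations for the aligned rows of
    X and Y (one array per direction)."""
    Wxy = fit_linear_map(X, Y)   # X -> Y direction
    Wyx = fit_linear_map(Y, X)   # Y -> X direction
    middle_from_x = (X @ Wxy + Y) / 2
    middle_from_y = (Y @ Wyx + X) / 2
    return middle_from_x, middle_from_y

# Sanity check on an exactly linearly related pair of spaces:
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
W_true = rng.normal(size=(5, 5))
Y = X @ W_true
mid_x, mid_y = meet_in_the_middle(X, Y)
```

When the two spaces really are related by one linear map, both middles recover the original embeddings exactly; the interesting case is when they are not, and averaging smooths out the disagreement between the two directions.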
Personalized microblog sentiment classification via adversarial cross-lingual learning
http://aclweb.org/anthology/D18-1031
Adversarial training for classification personalised to a user; code for the paper: https://github.com/githubfordata/data (the name is a really nice one!)
NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings
http://aclweb.org/anthology/D18-1047
Learning a mapping between distant languages by using an intermediate one.
A Syntactically Constrained Bidirectional-Asynchronous Approach for Emotional Conversation Generation
https://arxiv.org/pdf/1806.07000.pdf
A simple approach to generating emotionally rich dialogue responses.
Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning
http://aclweb.org/anthology/D18-1072
A system from Microsoft for automatic slot-value and intent labelling; the authors use hierarchical clustering and achieve 84% of human labelling quality.
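A toy sketch of the unsupervised intent-labelling idea: cluster utterances bottom-up until no pair of clusters is similar enough, then treat each cluster as one intent. Jaccard word overlap and the greedy single-link merging below are my stand-ins for the richer features and clustering used in the paper.

```python
# Greedy single-link agglomerative clustering of utterances by word
# overlap; each resulting cluster plays the role of one intent label.

def jaccard(a, b):
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def cluster_intents(utterances, threshold=0.2):
    clusters = [[u] for u in utterances]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(jaccard(a, b)
                          for a in clusters[i] for b in clusters[j])
                if sim >= threshold:   # single link: closest pair decides
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

groups = cluster_intents([
    "book a flight to paris",
    "book me a flight to rome",
    "what is the weather today",
    "weather forecast for today",
])
```

On this toy input the flight requests and the weather questions end up in two separate clusters, i.e. two discovered intents.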
An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation
https://arxiv.org/pdf/1808.08795.pdf
Two auto-encoders to embed utterances and a fully-connected layer for matching, i.e. finding a dependency between them; code for the paper: https://github.com/lancopku/AMM
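The data flow can be sketched as follows: one encoder embeds the post, another the response, and a small fully-connected scorer judges how well the two codes match. The linear layers with random weights are purely illustrative; the real model is trained end-to-end on dialogue data.

```python
import numpy as np

# Sketch of the matching idea: two (here untrained) autoencoders give
# utterance codes; a fully-connected layer scores the pair of codes.

rng = np.random.default_rng(0)

def make_autoencoder(dim_in, dim_code):
    enc = rng.normal(size=(dim_in, dim_code))
    dec = rng.normal(size=(dim_code, dim_in))
    return enc, dec

def match_score(post_vec, resp_vec, post_ae, resp_ae, W):
    """Encode both utterances and score the concatenated codes."""
    post_code = np.tanh(post_vec @ post_ae[0])
    resp_code = np.tanh(resp_vec @ resp_ae[0])
    pair = np.concatenate([post_code, resp_code])
    return float(1 / (1 + np.exp(-(pair @ W))))   # sigmoid match score

dim, code = 16, 4
post_ae, resp_ae = make_autoencoder(dim, code), make_autoencoder(dim, code)
W = rng.normal(size=(2 * code,))
score = match_score(rng.normal(size=dim), rng.normal(size=dim),
                    post_ae, resp_ae, W)
```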
Out-of-domain Detection based on Generative Adversarial Network
http://aclweb.org/anthology/D18-1077
Interesting work from Samsung colleagues; they propose training a GAN as an out-of-domain classifier, where the GAN they use matches the statistical properties of the data.
Supervised and Unsupervised Methods for Robust Separation of Section Titles and Prose Text in Web Documents
http://aclweb.org/anthology/D18-1099
The authors present a system to distinguish section titles from prose text in web documents using features and clustering; code for the paper: https://github.com/abhijith-athreya/ASDUS
This is the end of the first part of my notes on EMNLP; more interesting stuff is coming! Check it out here.