🌻The Best and Most Current of Modern Natural Language Processing
Over the last two years, the Natural Language Processing community has witnessed an acceleration in progress on a wide range of different tasks and applications. 🚀 This progress was enabled by a paradigm shift in how we classically build an NLP system: for a long time, we used pre-trained word embeddings such as word2vec or GloVe to initialize the first layer of a neural network, followed by a task-specific architecture trained in a supervised way on a single dataset.
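To make that classical setup concrete, here is a minimal sketch of a text classifier whose first layer is initialized from pre-trained static word vectors. The vocabulary size, dimensions, and LSTM encoder are illustrative placeholders, not a reference implementation.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, num_classes = 20_000, 300, 2

# Placeholder for real GloVe/word2vec vectors aligned with your vocabulary.
pretrained_vectors = torch.randn(vocab_size, embed_dim)

class ClassicTextClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # First layer initialized from pre-trained word embeddings (often kept frozen).
        self.embedding = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True)
        # Task-specific architecture trained from scratch, in a supervised way, on a single dataset.
        self.encoder = nn.LSTM(embed_dim, 256, batch_first=True)
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.encoder(embedded)   # last hidden state summarizes the sentence
        return self.classifier(hidden[-1])        # (batch, num_classes)
```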
Recently, several works demonstrated that we can learn hierarchical contextualized representations on web-scale datasets 📖 by leveraging unsupervised (or self-supervised) signals such as language modeling, and transfer this pre-training to downstream tasks (Transfer Learning). Excitingly, this shift has led to significant advances on a wide range of downstream applications, from Question Answering to Natural Language Inference to Syntactic Parsing…
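By contrast, here is a minimal sketch of the transfer-learning recipe these works popularized: load a Transformer pre-trained with a language-modeling objective and fine-tune it end-to-end on the downstream task. The example assumes the 🤗 Transformers library; the checkpoint name, toy batch, and hyperparameters are purely illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Pre-trained encoder plus a small, randomly initialized task head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy batch; in practice this comes from your downstream dataset.
batch = tokenizer(["a great read", "not my cup of tea"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

outputs = model(**batch, labels=labels)  # loss computed on the downstream task
outputs.loss.backward()                  # gradients flow through the whole pretrained network
optimizer.step()
```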
“Which papers can I read to catch up with the latest trends in modern NLP?”
A few weeks ago, a friend of mine decided to dive into NLP. He already had a background in Machine Learning and Deep Learning, so he genuinely asked me: “Which papers can I read to catch up with the latest trends in modern NLP?” 👩‍🎓👨‍🎓
That’s a really good question, especially when you factor in that NLP conferences (and ML conferences in general) receive an exponentially growing number of submissions: +80% at NAACL 2019 vs. 2018, +90% at ACL 2019 vs. 2018, …
I compiled this list of papers and resources 📚 for him, and I thought it would be great to share it with the community since I believe it can be useful for a lot of people.
Disclaimer: this list is not intended to be exhaustive, nor to cover every single topic in NLP (for instance, there is nothing on Semantic Parsing, Adversarial Learning, or Reinforcement Learning applied to NLP, …). It is rather a selection of the most impactful recent works of the past few years/months (as of May 2019), mostly influenced by what I read.
Generally speaking, a good way to start is to read introductory or summary blog posts that give you a high-level view and enough context ✋ before actually spending time on a paper (for instance this post or this one).
🌊 A new paradigm: Transfer Learning
These references cover the foundational ideas in Transfer Learning for NLP:
- Deep contextualized word representations (NAACL 2018)
  Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
- Universal Language Model Fine-tuning for Text Classification (ACL 2018)
  Jeremy Howard, Sebastian Ruder
- Improving Language Understanding by Generative Pre-Training
  Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
- Language Models are Unsupervised Multitask Learners
  Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019)
  Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
- Cloze-driven Pretraining of Self-attention Networks (arXiv 2019)
  Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli
- Unified Language Model Pre-training for Natural Language Understanding and Generation (arXiv 2019)
  Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
- MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML 2019)
  Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu
🖼 Representation Learning:
- What you can cram into a single vector: Probing sentence embeddings for linguistic properties (ACL 2018)
  Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni
- No Training Required: Exploring Random Encoders for Sentence Classification (ICLR 2019)
  John Wieting, Douwe Kiela
- GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (ICLR 2019)
  Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
  and
  SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems (arXiv 2019)
  Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
- Linguistic Knowledge and Transferability of Contextual Representations (NAACL 2019)
  Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (arXiv 2019)
  Matthew Peters, Sebastian Ruder, Noah A. Smith
🗣 Neural Dialogue:
- A Neural Conversational Model (ICML Deep Learning Workshop 2015)
  Oriol Vinyals, Quoc Le
- A Persona-Based Neural Conversation Model (ACL 2016)
  Jiwei Li, Michel Galley, Chris Brockett, Georgios P. Spithourakis, Jianfeng Gao, Bill Dolan
- A Simple, Fast Diverse Decoding Algorithm for Neural Generation (arXiv 2017)
  Jiwei Li, Will Monroe, Dan Jurafsky
- Neural Approaches to Conversational AI (arXiv 2018)
  Jianfeng Gao, Michel Galley, Lihong Li
- TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents (NeurIPS 2018 CAI Workshop)
  Thomas Wolf, Victor Sanh, Julien Chaumond, Clement Delangue
  Disclaimer: I am an author on this publication.
  Step-by-step explanation blog post
- Wizard of Wikipedia: Knowledge-Powered Conversational Agents (ICLR 2019)
  Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston
- Learning to Speak and Act in a Fantasy Text Adventure Game (arXiv 2019)
  Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston
🍱 Various picks:
- Pointer Networks (NIPS 2015)
  Oriol Vinyals, Meire Fortunato, Navdeep Jaitly
- End-To-End Memory Networks (NIPS 2015)
  Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus
- Get To The Point: Summarization with Pointer-Generator Networks (ACL 2017)
  Abigail See, Peter J. Liu, Christopher D. Manning
- Supervised Learning of Universal Sentence Representations from Natural Language Inference Data (EMNLP 2017)
  Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, Antoine Bordes
- End-to-end Neural Coreference Resolution (EMNLP 2017)
  Kenton Lee, Luheng He, Mike Lewis, Luke Zettlemoyer
- StarSpace: Embed All The Things! (AAAI 2018)
  Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, Jason Weston
- The Natural Language Decathlon: Multitask Learning as Question Answering (arXiv 2018)
  Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
- Character-Level Language Modeling with Deeper Self-Attention (arXiv 2018)
  Rami Al-Rfou, Dokook Choe, Noah Constant, Mandy Guo, Llion Jones
- Linguistically-Informed Self-Attention for Semantic Role Labeling (EMNLP 2018)
  Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, Andrew McCallum
- Phrase-Based & Neural Unsupervised Machine Translation (EMNLP 2018)
  Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc’Aurelio Ranzato
- Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning (ICLR 2018)
  Sandeep Subramanian, Adam Trischler, Yoshua Bengio, Christopher J Pal
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (arXiv 2019)
  Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
- Universal Transformers (ICLR 2019)
  Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser
- An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models (NAACL 2019)
  Alexandra Chronopoulou, Christos Baziotis, Alexandros Potamianos
- … for older papers, the number of citations is generally a reasonable proxy when choosing what to read.
As a good rule of thumb, read the papers that you find interesting and that spark joy! 🤷‍♂️🌟
🌍 General resources
There are plenty of amazing resources you can use that are not necessarily papers. Here are a few:
Books:
- Speech and Language Processing (3rd ed. draft)
  Dan Jurafsky and James H. Martin
- Neural Network Methods for Natural Language Processing
  Yoav Goldberg
Course materials:
- Natural Language Understanding and Computational Semantics with Katharina Kann and Sam Bowman at NYU
- CS224n: Natural Language Processing with Deep Learning with Chris Manning and Abigail See at Stanford
- Contextual Word Representations: A Contextual Introduction from Noah A. Smith’s teaching material at UW
Blogs/podcasts:
- Sebastian Ruder’s blog
- Jay Alammar’s illustrated blog
- NLP Highlights hosted by Matt Gardner and Waleed Ammar
Others:
- Papers With Code
- Twitter 🐦
- arXiv daily newsletter
- Survey papers
- …
🎅 Last advice
That’s it for the pointers! Reading a few of these resources should already give you a good sense of the latest trends in contemporary NLP and, hopefully, help you build your own NLP system! 🎮
One last thing that I did not talk about much in this post, but that I find extremely important (and sometimes neglected): reading is good, but implementing is better! 👩‍💻 You’ll often learn much more by supplementing your reading with a dive into the code that (sometimes) accompanies a paper, or by trying to implement some of it yourself. Practical resources include the amazing blog posts and courses from fast.ai or our 🤗 open-source repositories.
What about you? What are the works that had the most impact on you? Tell us in the comments! ⌨️
As always, if you liked this post, give us a few 👏 to let us know and share the news around you!
Many thanks to Lysandre Debut, Clément Delangue, Thibault Févry, Peter Martigny, Anthony Moi and Thomas Wolf for their comments and feedback.