Published in


John Christian Fjellestad –Distant road

📚The Current Best of Universal Word Embeddings and Sentence Embeddings

Recent trend in Universal Word/Sentence Embeddings. In this post, we describe the models indicated in black. Reference papers for all indicated models are listed at the end of the post.
  • strong/fast baselines: FastText, Bag-of-Words
  • state-of-the-art models: ELMo, Skip-Thoughts, Quick-Thoughts, InferSent, MILA/MSR’s General Purpose Sentence Representations & Google’s Universal Sentence Encoder.

Recent Developments in Word Embeddings

A wealth of possible ways to embed words have been proposed over the last five years. The most commonly used models are word2vec and GloVe which are both unsupervised approaches based on the distributional hypothesis (words that occur in the same contexts tend to have similar meanings).

Elmo knows quite a lot about words context
  • ELMo’s inputs are characters rather than words. They can thus take advantage of sub-word units to compute meaningful representations even for out-of-vocabulary words (like FastText).
  • ELMo are concatenations of the activations on several layers of the biLMs. Different layers of a language model encode different kind of information on a word (e.g. Part-Of-Speech tagging is well predicted by the lower level layers of a biLSTM while word-sense disambiguation is better encoded in higher-levels). Concatenating all layers allows to freely combine a variety of word representations for better performances on downstream tasks.

The Rise of Universal Sentence Embeddings

A plot of HuggingFace’s dialogs Bag-of-Words. Bag-of-Words approaches loose words ordering but keep a surprising amount of semantic and syntactic content. Interesting insights in Conneau et al. ACL 2018.
Quick-thoughts classification task. The classifier has to chose the following sentence from a set of sentence embeddings. Source: “An efficient framework for learning sentence representations” by Logeswaran et al.
A supervised sentence embeddings model (InferSent) to learn from a NLI dataset. Source: “Supervised Learning of Universal Sentence Representations from Natural Language Inference Data” by A. Conneau et al.

Which supervised training task would learn sentence embeddings that better generalize on downstream tasks?

Multi-task learning can be seen as a generalization of Skip-Thoughts, InferSent, and the related unsupervised/supervised learning schemes, that answer this question by trying to combine several training objectives in one training scheme.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Thomas Wolf

Natural Language Processing, Deep learning and Computational Linguistics – Science Lead @Huggingface |