Multi-Task Learning for Sentence Embeddings

Edward Ma
3 min read · Jun 10, 2019

Universal Sentence Encoder

“Mount Fuji” by Edward Ma on Unsplash

Cer et al. demonstrated that sentence embeddings outperform word embeddings on transfer learning tasks. The traditional way of building a sentence embedding is to average, sum, or concatenate a set of word vectors. This approach is easy to compute but loses a lot of information, such as word order. Cer et al. evaluated two well-known network architectures: a Transformer-based model and a deep averaging network (DAN) based model.
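To make that baseline concrete, here is a minimal sketch of the averaging approach. The function name is mine, and the random 300-dimensional vectors stand in for real pretrained word vectors such as GloVe or word2vec:

```python
import numpy as np

def average_sentence_embedding(word_vectors):
    """Baseline sentence embedding: element-wise mean of the word vectors.

    word_vectors: list of 1-D numpy arrays, one per token.
    Word order is discarded, which is the information loss noted above.
    """
    return np.mean(np.stack(word_vectors), axis=0)

# Toy usage: random vectors stand in for real pretrained word vectors.
tokens = [np.random.rand(300) for _ in ("the", "cat", "sat")]
sentence_vector = average_sentence_embedding(tokens)  # shape: (300,)
```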

Sentence similarity scores (Cer et al., 2018)
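For pairwise scores like those in the figure above, Cer et al. convert cosine similarity into an angular similarity, which they report works better than raw cosine similarity for highly similar embeddings. A minimal sketch, assuming the sentence embeddings are already computed (the random array below is a placeholder):

```python
import numpy as np

def angular_similarity(u, v):
    """Cosine similarity converted to angular distance, as used for the
    paper's semantic textual similarity scores."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    cos = np.clip(cos, -1.0, 1.0)  # guard against floating-point drift
    return 1.0 - np.arccos(cos) / np.pi

# Pairwise score matrix like the heatmap above; `embeddings` is a
# hypothetical (n, 512) array of sentence embeddings.
embeddings = np.random.rand(3, 512)
scores = [[angular_similarity(a, b) for b in embeddings] for a in embeddings]
```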

This story discusses the Universal Sentence Encoder (Cer et al., 2018) and covers the following:

  • Data
  • Architecture
  • Implementation

Data

As the encoder is designed to support multiple downstream tasks, multi-task learning is adopted. Cer et al. therefore train the model on multiple data sources, including movie reviews, customer reviews, sentiment classification, question classification, semantic textual similarity, and Word Embedding Association Test (WEAT) data.
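The key idea in multi-task learning is a single shared encoder whose parameters receive gradients from every task. This is not the authors' training code; the sketch below only illustrates the shared-encoder pattern in Keras, and the layer sizes and task heads are hypothetical:

```python
import tensorflow as tf

# Hypothetical stand-in for the Transformer/DAN encoder: any network
# that maps an input representation to a fixed-size sentence vector.
inputs = tf.keras.Input(shape=(300,))  # e.g. averaged word vectors
embedding = tf.keras.layers.Dense(
    512, activation="tanh", name="sentence_embedding")(inputs)

# One lightweight head per training task; all of them backpropagate
# into the shared embedding layer.
sentiment = tf.keras.layers.Dense(2, name="sentiment")(embedding)
question_type = tf.keras.layers.Dense(6, name="question_type")(embedding)

model = tf.keras.Model(inputs, [sentiment, question_type])
model.compile(
    optimizer="adam",
    loss={
        "sentiment": tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True),
        "question_type": tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True),
    },
)
```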

Architecture

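Both encoder variants are published as pretrained TensorFlow Hub modules: per the TF Hub model pages, the base module uses the DAN encoder and the "-large" module uses the Transformer encoder. A minimal usage sketch, assuming a TensorFlow 2.x environment with the tensorflow_hub package installed:

```python
import tensorflow_hub as hub

# Loads the DAN-based encoder; swap in
# "https://tfhub.dev/google/universal-sentence-encoder-large/5"
# for the Transformer-based variant.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = ["How old are you?", "What is your age?"]
embeddings = embed(sentences)  # (2, 512) tensor of sentence embeddings
print(embeddings.shape)
```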

Edward Ma

Focused on Natural Language Processing and Data Science Platform Architecture. https://makcedward.github.io/