Multi-Task Learning for Sentence Embeddings

Edward Ma
3 min read · Jun 10, 2019

Universal Sentence Encoder

“Mount Fuji” by Edward Ma on Unsplash

Cer et al. demonstrated that sentence embeddings outperform word embeddings on transfer learning tasks. The traditional way of building a sentence embedding is to average, sum, or concatenate a set of word vectors. This approach is easy to compute but loses a lot of information, such as word order. Cer et al. evaluated two well-known network architectures: a Transformer-based model and a deep averaging network (DAN) based model.
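To make that baseline concrete, here is a minimal sketch of the averaging approach. The function name is mine, and the random 300-dimensional vectors stand in for real pretrained word vectors such as GloVe or word2vec:

```python
import numpy as np

def average_sentence_embedding(word_vectors):
    """Baseline sentence embedding: element-wise mean of the word vectors.

    word_vectors: list of 1-D numpy arrays, one per token.
    Word order is discarded, which is the information loss noted above.
    """
    return np.mean(np.stack(word_vectors), axis=0)

# Toy usage: random vectors stand in for real pretrained word vectors.
tokens = [np.random.rand(300) for _ in ("the", "cat", "sat")]
sentence_vector = average_sentence_embedding(tokens)  # shape: (300,)
```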

Sentence similarity scores (Cer et al., 2018)
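For pairwise scores like those in the figure above, Cer et al. convert cosine similarity into an angular similarity, which they report works better than raw cosine similarity for highly similar embeddings. A minimal sketch, assuming the sentence embeddings are already computed (the random array below is a placeholder):

```python
import numpy as np

def angular_similarity(u, v):
    """Cosine similarity converted to angular distance, as used for the
    paper's semantic textual similarity scores."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    cos = np.clip(cos, -1.0, 1.0)  # guard against floating-point drift
    return 1.0 - np.arccos(cos) / np.pi

# Pairwise score matrix like the heatmap above; `embeddings` is a
# hypothetical (n, 512) array of sentence embeddings.
embeddings = np.random.rand(3, 512)
scores = [[angular_similarity(a, b) for b in embeddings] for a in embeddings]
```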

This story discusses the Universal Sentence Encoder (Cer et al., 2018) and covers the following:

  • Data
  • Architecture
  • Implementation

Data

As the encoder is designed to support multiple downstream tasks, multi-task learning is adopted. Cer et al. therefore train the model on multiple data sources, including movie reviews, customer reviews, sentiment classification, question classification, semantic textual similarity, and Word Embedding Association Test (WEAT) data.
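The key idea in multi-task learning is a single shared encoder whose parameters receive gradients from every task. This is not the authors' training code; the sketch below only illustrates the shared-encoder pattern in Keras, and the layer sizes and task heads are hypothetical:

```python
import tensorflow as tf

# Hypothetical stand-in for the Transformer/DAN encoder: any network
# that maps an input representation to a fixed-size sentence vector.
inputs = tf.keras.Input(shape=(300,))  # e.g. averaged word vectors
embedding = tf.keras.layers.Dense(
    512, activation="tanh", name="sentence_embedding")(inputs)

# One lightweight head per training task; all of them backpropagate
# into the shared embedding layer.
sentiment = tf.keras.layers.Dense(2, name="sentiment")(embedding)
question_type = tf.keras.layers.Dense(6, name="question_type")(embedding)

model = tf.keras.Model(inputs, [sentiment, question_type])
model.compile(
    optimizer="adam",
    loss={
        "sentiment": tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True),
        "question_type": tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True),
    },
)
```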

Architecture

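Both encoder variants are published as pretrained TensorFlow Hub modules: per the TF Hub model pages, the base module uses the DAN encoder and the "-large" module uses the Transformer encoder. A minimal usage sketch, assuming a TensorFlow 2.x environment with the tensorflow_hub package installed:

```python
import tensorflow_hub as hub

# Loads the DAN-based encoder; swap in
# "https://tfhub.dev/google/universal-sentence-encoder-large/5"
# for the Transformer-based variant.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = ["How old are you?", "What is your age?"]
embeddings = embed(sentences)  # (2, 512) tensor of sentence embeddings
print(embeddings.shape)
```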

Edward Ma

Focused on Natural Language Processing and Data Science Platform Architecture. https://makcedward.github.io/