Unified Language Model Pre-training for Natural Language Understanding and Generation
Using UNILM to tackle natural language understanding (NLU) and natural language generation (NLG)
Recent state-of-the-art pre-trained NLP models use language modeling to learn contextualized text representations. From ELMo (Peters et al., 2018) and GPT (Radford et al., 2018) to BERT (Devlin et al., 2018), all of them rely on a language model (LM) objective to achieve better results.
Dong et al. present a new model, the Unified Language Model (UNILM), to tackle both natural language understanding (NLU) and natural language generation (NLG); it is trained on English Wikipedia and BookCorpus. Unlike ELMo (Peters et al., 2018), GPT (Radford et al., 2018), and BERT (Devlin et al., 2018), UNILM implements a unidirectional language model (LM), a bidirectional LM, and a sequence-to-sequence LM for different tasks.
The following figure shows the models' architectures. ELMo applies unidirectional (left-to-right and right-to-left) LSTMs, GPT leverages a left-to-right Transformer, while BERT uses a bidirectional Transformer to learn text representations.
UNILM uses the Transformer (Vaswani et al., 2017) as its backbone network and offers three language model (LM) objectives for different NLP downstream tasks.
As in BERT, an input sequence (text) is converted to token embeddings, position embeddings, and segment embeddings.
For token embeddings, an SOS token is inserted at the beginning of the input and an EOS token at the end of each segment. Position embeddings encode the position of each token, while the segment embedding marks whether a token belongs to the first segment (0) or the second segment (1).
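To make the input representation concrete, here is a minimal PyTorch sketch of the three embeddings being summed. The class name, vocabulary size, and hidden size are my own illustrative choices, not taken from the paper or its code.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the actual model follows BERT-style settings.
VOCAB_SIZE = 30000
HIDDEN = 1024
MAX_LEN = 512

class UniLMEmbeddings(nn.Module):
    """Token + position + segment embeddings, summed (as in BERT)."""
    def __init__(self):
        super().__init__()
        self.token = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.position = nn.Embedding(MAX_LEN, HIDDEN)
        self.segment = nn.Embedding(2, HIDDEN)  # 0 = first segment, 1 = second segment

    def forward(self, token_ids, segment_ids):
        # token_ids / segment_ids: (batch, seq_len); the sequence is laid out as
        # SOS <segment 1 tokens> EOS <segment 2 tokens> EOS
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        positions = positions.unsqueeze(0).expand_as(token_ids)
        return self.token(token_ids) + self.position(positions) + self.segment(segment_ids)
```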
The model uses a multi-layer Transformer to learn contextualized text representations from these features. Depending on the use case, you can choose the unidirectional LM, bidirectional LM, or sequence-to-sequence LM for different downstream tasks.
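Continuing the sketch above, the key point is that a single shared Transformer serves all three objectives; only the self-attention mask passed at call time changes (mask construction is sketched after the next paragraph). The layer and head counts below are illustrative, not the paper's exact configuration.

```python
# One shared backbone for all three LM objectives; the attention mask decides
# which objective is in effect.
encoder_layer = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=16, batch_first=True)
backbone = nn.TransformerEncoder(encoder_layer, num_layers=12)
embeddings = UniLMEmbeddings()

def encode(token_ids, segment_ids, attention_mask):
    # attention_mask: (seq_len, seq_len) bool tensor; True means "may not attend"
    # (PyTorch convention for boolean masks).
    x = embeddings(token_ids, segment_ids)
    return backbone(x, mask=attention_mask)
```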
To train the model, a word is randomly picked from the text and masked; the training objective is to predict this masked word from the other tokens. For the bidirectional LM, all tokens except the masked one are used as features. For the left-to-right LM, only the tokens to the left of the masked word become input features. For the sequence-to-sequence LM, all tokens of the first sentence plus the tokens to the left of the masked word in the second sentence become features.
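This control over which tokens can serve as context is implemented with a self-attention mask. Below is a small sketch of how such masks could be built for the three objectives; the function name and the boolean-mask convention (True = blocked) are my own choices for illustration, not the paper's code.

```python
def unilm_attention_mask(seq_len, first_len, objective):
    """Build a (seq_len, seq_len) bool mask; True means the position is blocked.
    `first_len` is the length of the first segment (used by seq-to-seq only)."""
    if objective == "bidirectional":
        # Every token may attend to every other token (BERT-style).
        return torch.zeros(seq_len, seq_len, dtype=torch.bool)
    if objective == "left-to-right":
        # Each token may attend only to itself and tokens on its left (GPT-style).
        return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    if objective == "seq-to-seq":
        # The first segment attends bidirectionally within itself; the second
        # segment attends to the whole first segment plus its own left context.
        blocked = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        blocked[:first_len, :first_len] = False
        return blocked
    raise ValueError(f"unknown objective: {objective}")
```

For example, calling `encode(token_ids, segment_ids, unilm_attention_mask(seq_len, first_len, "seq-to-seq"))` runs the same backbone in sequence-to-sequence mode.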
Three LM objectives
Depending on the downstream task, you can choose any one of the architectures from UNILM. The available LM objectives are the bidirectional LM, the unidirectional LM, and the sequence-to-sequence LM.
As with other well-known NLP models, it is better to fine-tune the generalized pre-trained model on your own domain data to achieve better results. Since the model is pre-trained on a very large corpus, you only need to provide a relatively small dataset.
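To illustrate how little extra machinery fine-tuning needs, here is a hedged PyTorch sketch of a classification (NLU) fine-tuning loop, continuing the earlier sketches. `model` and `train_loader` are hypothetical placeholders: a pre-trained backbone with a small classification head, and batches from your own domain dataset.

```python
from torch.optim import AdamW

# Hypothetical fine-tuning loop; `model` wraps the pre-trained backbone with a
# classification head and `train_loader` yields (token_ids, segment_ids, labels).
optimizer = AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):  # a few epochs over a small domain dataset are often enough
    for token_ids, segment_ids, labels in train_loader:
        logits = model(token_ids, segment_ids)  # bidirectional mask for NLU tasks
        loss = loss_fn(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```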
UNILM reaches state-of-the-art results on SQuAD, CoQA, and GLUE.
Like to learn?
I am a Data Scientist in the Bay Area, focusing on the state of the art in data science and artificial intelligence, especially NLP and platform-related areas. Feel free to connect with me on LinkedIn or follow me on Medium or Github.
- L. Dong, N. Yang, W. Wang, F. Wei, X. Liu, Y. Wang, J. Gao, M. Zhou and H.-W. Hon. Unified Language Model Pre-training for Natural Language Understanding and Generation. 2019
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin. Attention Is All You Need. 2017