Several large-scale text generation models have been released recently, with GPT-2, XLNet and XLM among the examples. Each of them tells a different story: GPT-2 (Radford et al., 2019) generates text autoregressively, XLNet (Yang et al., 2019) absorbs ideas from GPT-2, Transformer-XL and BERT to achieve state-of-the-art results, and XLM (Lample and Conneau, 2019) focuses on the cross-lingual problem.
Besides the fact that most of them use the transformer (Vaswani et al., 2017) neural network architecture, they share another trait: we are unable to control the scope of generation. We can only provide a prompt, and the model generates text based on whatever appeared in its training data. To address this, Keskar et al. from Salesforce proposed the Conditional Transformer Language Model (CTRL), which accepts a control code to generate text according to a specific domain's training data.
CTRL: Conditional Transformer Language Model
CTRL (Keskar et al., 2019) is a conditional language model that takes a control code (i.e. the target domain) into account when learning the distribution of text. Like other models based on the transformer (Vaswani et al., 2017) architecture, it considers both token embeddings and positional embeddings.
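Concretely, the control code c is prepended to the sequence, and the model learns the conditional distribution

```latex
p(x \mid c) = \prod_{i=1}^{n} p(x_i \mid x_{<i}, c)
```

so every next-token prediction is conditioned on the control code as well as on the previously generated tokens.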
The training data are about 140GB, selected out of roughly 180GB collected from Wikipedia, OpenWebText, Amazon Reviews, etc. Rather than using all of the data (would more data be more accurate?), Keskar et al. leverage BPE to tokenize the text, obtaining a vocabulary of around 250K tokens plus an unknown token, and filter out any training sequence in which more than 2 unknown tokens are found.
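As a rough illustration of that filtering step, here is a minimal sketch in Python; `UNK_ID` and the toy tokenized corpus are assumptions for illustration, not taken from the released CTRL code.

```python
# Hypothetical id for the unknown token; only used to illustrate the
# "drop sequences with more than 2 unknown tokens" filter described above.
UNK_ID = 0

def keep_sequence(token_ids, max_unknown=2):
    """Keep a training sequence only if it has at most `max_unknown` unknown tokens."""
    return sum(1 for t in token_ids if t == UNK_ID) <= max_unknown

tokenized_corpus = [[5, 7, 0, 9], [0, 0, 0, 3], [1, 2, 3, 4]]
filtered = [seq for seq in tokenized_corpus if keep_sequence(seq)]
print(filtered)  # the second sequence has 3 unknown tokens and is dropped
```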
Temperature is one way to obtain varied output given identical input: the logits are divided by a temperature before the softmax, so a higher temperature flattens the distribution and makes sampling more diverse, while a lower temperature pushes it closer to greedy decoding.
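A minimal sketch of temperature sampling over a vector of logits (NumPy only; the logit values and the temperature below are made up for illustration):

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    """Sample one token id after scaling the logits by a temperature.

    temperature < 1 sharpens the distribution (closer to greedy decoding),
    temperature > 1 flattens it, so repeated calls with the same logits
    produce more varied tokens.
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return np.random.default_rng().choice(len(probs), p=probs)

# The prompt (and hence the logits) is identical, yet the sampled tokens differ.
logits = [2.0, 1.0, 0.5, 0.1]
print([sample_with_temperature(logits, temperature=1.2) for _ in range(5)])
```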
Penalized sampling is proposed for decoding: given the list of tokens generated so far, the probability of generating those tokens again is discounted. After reviewing the results, the authors suggest a discount rate of 1.2 for better performance, balancing faithful generation against repetition.
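Below is a simplified sketch of that penalized sampling; it follows the divide-by-penalty idea described above and ignores the special handling of negative scores that real implementations may add.

```python
import numpy as np

def penalized_sampling(logits, generated_ids, penalty=1.2, temperature=1.0):
    """Discount tokens that were already generated before sampling the next one.

    penalty=1.2 is the discount rate the authors recommend; this sketch simply
    divides the scaled score of every previously generated token by the penalty.
    """
    scores = np.asarray(logits, dtype=np.float64) / temperature
    for token_id in set(generated_ids):
        scores[token_id] /= penalty  # make repeated tokens less likely
    probs = np.exp(scores - scores.max())  # numerically stable softmax
    probs /= probs.sum()
    return np.random.default_rng().choice(len(probs), p=probs)

# Token 0 has the highest raw score, but it was generated already,
# so its probability of being sampled again is reduced.
print(penalized_sampling([3.0, 1.0, 0.5], generated_ids=[0]))
```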
Like to learn?
- CTRL Implementation (TensorFlow, PyTorch (by Salesforce, by Hugging Face))
- Transformer Explanation
- GPT-2 Explanation
- XLNet Explanation
- XLM Explanation
- 3 Subwords Explanation
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin. Attention Is All You Need. 2017.
- J. Devlin, M.-W. Chang, K. Lee and K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018.
- A. Radford, J. Wu, R. Child, D. Luan, D. Amodei and I. Sutskever. Language Models are Unsupervised Multitask Learners. 2019.
- Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le and R. Salakhutdinov. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. 2019.
- Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov and Q. V. Le. XLNet: Generalized Autoregressive Pretraining for Language Understanding. 2019.
- G. Lample and A. Conneau. Cross-lingual Language Model Pretraining. 2019.
- N. S. Keskar, B. McCann, L. R. Varshney, C. Xiong and R. Socher. CTRL: A Conditional Transformer Language Model for Controllable Generation. 2019.