A Controllable Framework for Text Generation — CTRL

Text generation models

Edward Ma
Oct 19 · 3 min read

Several large-scale text generation models have been released recently; GPT-2, XLNet, and XLM are a few examples. Each has a different story: GPT-2 (Radford et al., 2019) generates text autoregressively, XLNet (Yang et al., 2019) combines ideas from GPT-2, Transformer-XL, and BERT to achieve state-of-the-art results, and XLM (Lample and Conneau, 2019) focuses on the cross-lingual setting.

Controllable Scope

Besides sharing the transformer (Vaswani et al., 2017) architecture, these models have another trait in common: the scope of generation cannot be controlled. We can only provide a prompt, and the model generates text according to whatever it saw during training. To address this, Keskar et al. from Salesforce proposed the Conditional Transformer Language Model (CTRL), which accepts a control code and generates text conditioned on the corresponding domain of its training data.

CTRL: Conditional Transformer Language Model

CTRL (Keskar et al., 2019) is a conditional language model that takes a control code (i.e. the target domain) into account and learns the distribution of text conditioned on it. Like other transformer-based (Vaswani et al., 2017) models, it combines token embeddings and positional embeddings.

CTRL’s Language Modeling (Keskar et al., 2019)
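The objective shown in that figure factorizes the probability of a sequence over the previous tokens and the control code c, and training minimizes the usual negative log-likelihood over the dataset (written here in my own notation):

```latex
% Conditional language modeling objective of CTRL (Keskar et al., 2019)
p(x \mid c) = \prod_{i=1}^{n} p(x_i \mid x_{<i}, c),
\qquad
\mathcal{L}(D) = -\sum_{k=1}^{|D|} \log p_\theta\!\left(x^{k} \mid c^{k}\right)
```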

The training data amount to 140GB drawn from a 180GB collection covering Wikipedia, OpenWebText, Amazon Reviews, and other sources. Rather than using all of the data, Keskar et al. apply BPE tokenization, yielding a vocabulary of roughly 250k tokens (including an unknown token), and filter out any sequence that contains more than two unknown tokens.
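As a rough illustration of that filtering rule, the sketch below drops any tokenized sequence containing more than two unknown tokens; the token IDs and the UNK_ID constant are made up for the example and are not from the paper.

```python
# Minimal sketch of the unknown-token filter described above.
# UNK_ID and the example sequences are hypothetical, for illustration only.
UNK_ID = 0  # assumed id of the <unk> token in a ~250k BPE vocabulary

def keep_sequence(token_ids, max_unknown=2):
    """Keep a tokenized sequence only if it has at most `max_unknown` unknown tokens."""
    return sum(1 for t in token_ids if t == UNK_ID) <= max_unknown

corpus = [
    [17, 42, 0, 531],    # one unknown token -> kept
    [0, 0, 0, 99, 7],    # three unknown tokens -> filtered out
]
filtered = [seq for seq in corpus if keep_sequence(seq)]
print(len(filtered))  # 1
```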

Introducing a temperature is one way to obtain varied outputs for an identical input. CTRL goes further and proposes penalized sampling: given the list of previously generated tokens, the scores of those tokens are discounted when sampling the next token, which discourages repetition. After reviewing the results, the authors suggest a discount rate of around 1.2 for better performance.

Penalized Sampling (Keskar et al., 2019)
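Below is a minimal NumPy sketch of penalized sampling; the function and variable names are my own, and the penalty of 1.2 follows the value suggested in the paper. Scores of tokens that already appear in the generated text are divided by the penalty (on top of the temperature) before the softmax.

```python
import numpy as np

def penalized_sample(logits, generated_tokens, temperature=1.0, penalty=1.2, rng=None):
    """Sample the next token id, discounting tokens that were already generated.

    logits: 1-D array of unnormalized scores over the vocabulary.
    generated_tokens: list of previously generated token ids.
    penalty: the discount rate; 1.2 is the value suggested by Keskar et al.
    """
    rng = rng or np.random.default_rng()
    scores = logits / temperature
    # Discount every token that already appears in the generated text.
    for t in set(generated_tokens):
        scores[t] = scores[t] / penalty
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy usage: a 5-token vocabulary where token 2 has already been generated.
logits = np.array([1.0, 2.0, 3.0, 0.5, 0.1])
next_token = penalized_sample(logits, generated_tokens=[2])
```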
Generated example when the control code is “Horror” and the input is “A knife” (Keskar et al., 2019)
Generated example when the control code is “Review” and the input is “A knife” (Keskar et al., 2019)
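Outputs like the ones above can be reproduced with a short sketch, assuming the pretrained CTRL checkpoint exposed through the Hugging Face transformers library (a tooling assumption on my part, not code from the paper): the control code is simply prepended to the prompt.

```python
# A minimal sketch, assuming the "ctrl" checkpoint from Hugging Face transformers
# (and PyTorch) are installed; this is not the authors' original training code.
from transformers import CTRLTokenizer, CTRLLMHeadModel

tokenizer = CTRLTokenizer.from_pretrained("ctrl")
model = CTRLLMHeadModel.from_pretrained("ctrl")

# The control code ("Horror") is prepended to the prompt ("A knife").
input_ids = tokenizer.encode("Horror A knife", return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=50,
    do_sample=True,
    temperature=0.7,          # sampling temperature
    repetition_penalty=1.2,   # corresponds to the penalized-sampling discount above
)
print(tokenizer.decode(output[0]))
```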

Like to learn?

I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science, especially NLP, data augmentation, and platform architecture. Feel free to connect with me on LinkedIn or GitHub.

