The Story of RLHF: Origins, Motivations, Techniques, and Modern Applications
How learning from human feedback revolutionized generative language models…
For a long time, the AI community has leveraged different styles of language models (e.g., n-gram models, RNNs, and transformers) to automate generative and discriminative natural language tasks. This area of research experienced a surge of interest in 2018 with the proposal of BERT [10], which demonstrated that the transformer architecture, self-supervised pretraining, and supervised transfer learning form a powerful combination. In fact, BERT set new state-of-the-art performance on every benchmark to which it was applied at the time. Although BERT could not be used for generative tasks, we saw with the proposal of T5 [11] that supervised transfer learning is effective in this domain as well. Despite these accomplishments, however, such models pale in comparison to the generative capabilities of LLMs like GPT-4 that we have today. To create such a model, we need training techniques that go far beyond supervised learning.
“Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole.” — OpenAI Founding Statement (Dec. 2015)