RLHF: Reinforcement Learning from Human Feedback
ChatGPT’s success ingredient: The Instruction Data.
ChatGPT has captivated the world with its impressive capabilities. But how did it get so smart?
I recently spoke with a former coworker, a software engineer I respect a lot, who believes ChatGPT is a manifestation of AGI, pointing to its ability to simplify complex topics to a six-year-old’s level of understanding as evidence. While I don’t entirely disagree with him about its seemingly unreasonable intelligence, I felt compelled to write down my thoughts. In this article, I’d like to emphasize that the magic of ChatGPT relies heavily on its training data.
Carefully curated instruction data is the key to ChatGPT’s human-like abilities. Capabilities like explaining concepts to a six-year-old, turning a resume into a LinkedIn profile, or brainstorming ideas with you didn’t just emerge: they were deliberately encoded into the model in the form of training data.
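To make "instruction data" concrete, here is a minimal sketch of what supervised instruction examples might look like. The field names and examples are my own illustration, not OpenAI's actual dataset format; the point is simply that each desired behavior appears as an (instruction, response) pair the model is fine-tuned to imitate.

```python
# Hypothetical instruction-tuning examples (illustrative only;
# not OpenAI's actual data or schema).
instruction_data = [
    {
        "instruction": "Explain what a neural network is to a six-year-old.",
        "response": "A neural network is like a big team of tiny helpers "
                    "that learn to guess answers by practicing a lot.",
    },
    {
        "instruction": "Turn this resume bullet into a LinkedIn headline: "
                       "'Built data pipelines processing 1TB daily.'",
        "response": "Software engineer building data pipelines at terabyte scale.",
    },
]

def to_training_text(example):
    """Flatten one example into the prompt-plus-completion text
    a language model would be fine-tuned on."""
    return f"Instruction: {example['instruction']}\nResponse: {example['response']}"

for ex in instruction_data:
    print(to_training_text(ex))
    print("---")
```

During supervised fine-tuning, the model sees thousands of such pairs, which is why behaviors like "explain it to a six-year-old" feel built in: someone wrote demonstrations of exactly that behavior.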
Like everyone else, I am experiencing closed research for the first time. Since I was in college, all frontier research had been open and peer-reviewed, until recently. And I believe openness ultimately advances science more than…