
Basics of Reinforcement Learning for LLMs

Cameron R. Wolfe, Ph.D. · Published in TDS Archive · 18 min read · Jan 31, 2024


(Photo by Ricardo Gomez Angel on Unsplash)

Recent AI research has revealed that reinforcement learning — more specifically, reinforcement learning from human feedback (RLHF) — is a key component of training a state-of-the-art large language model (LLM). Despite this, most open-source research on language models heavily emphasizes supervised learning strategies, such as supervised fine-tuning (SFT). This lack of emphasis on reinforcement learning can be attributed to several factors, including the need to curate human preference data and the amount of data needed to perform high-quality RLHF. However, one undeniable factor that likely underlies skepticism towards reinforcement learning is simply that it is not as commonly used as supervised learning. As a result, AI practitioners (including myself!) avoid reinforcement learning due to a simple lack of understanding — we tend to stick with the approaches that we know best.

“Many among us expressed a preference for supervised annotation, attracted by its denser signal… However, reinforcement learning proved highly effective, particularly given its cost and time effectiveness.” — from [8]
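To make the contrast between these two training paradigms concrete, below is a minimal PyTorch sketch (not from the article; the toy model, placeholder rewards, and tensor shapes are all illustrative assumptions). It compares the SFT objective, a standard cross-entropy loss on human demonstrations, with a REINFORCE-style policy-gradient loss of the kind that RLHF builds upon. Real RLHF trains a reward model from human preference data and optimizes the LLM with PPO; this sketch only shows the basic shape of each update.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a language model: a context embedding -> next-token logits.
vocab_size, hidden, batch = 100, 32, 4
model = torch.nn.Linear(hidden, vocab_size)

context = torch.randn(batch, hidden)                    # fake context embeddings
human_tokens = torch.randint(0, vocab_size, (batch,))   # "demonstration" tokens

# Supervised fine-tuning (SFT): maximize the likelihood of human demonstrations
# via cross-entropy, i.e., a dense supervised signal at every position.
sft_loss = F.cross_entropy(model(context), human_tokens)

# REINFORCE-style policy gradient (the idea underlying RLHF): sample tokens
# from the model itself, score the samples with a reward signal, and upweight
# the log-probabilities of high-reward samples.
dist = torch.distributions.Categorical(logits=model(context))
sampled = dist.sample()
reward = torch.randn(batch)  # placeholder for reward-model scores of the samples
rl_loss = -(dist.log_prob(sampled) * reward).mean()

print(f"SFT loss: {sft_loss.item():.3f} | RL loss: {rl_loss.item():.3f}")
```

Note the key difference: SFT needs labeled target tokens, while the policy-gradient loss only needs a scalar reward for model-generated samples, which is exactly why human preference data (used to train a reward model) can stand in for expensive token-level annotation.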

This series. In the next few overviews, we will aim to eliminate this problem by building…

