What is the Hype About DeepSeek-R1 and What is Important to Understand?
Why is DeepSeek-R1 Making Waves?
DeepSeek-R1 has entered the AI landscape with bold claims about enhancing the reasoning capabilities of Large Language Models (LLMs). Unlike conventionally fine-tuned models, it leans on Reinforcement Learning (RL) to improve logical reasoning and decision-making. But what makes this approach unique? And what should we really take away from the hype?
Before diving into the technical details, let’s outline the key aspects that make DeepSeek-R1 a significant development in LLMs. These points will serve as the guiding structure for this series of posts. In this introductory post, I explain the basic ideas of reinforcement learning and proximal policy optimization:
- Reinforcement Learning (RL) in LLMs — How RL has been used in language models and why DeepSeek-R1 relies heavily on it.
- Proximal Policy Optimization (PPO) — Understanding how PPO fine-tunes LLMs to align with human preferences (a minimal code sketch follows this list).
- Group Relative Policy Optimization (GRPO) — DeepSeek-R1’s novel RL approach that improves upon PPO.
- DeepSeek-R1-Zero: Pure RL without Supervised Fine-Tuning — What happens when an LLM is trained only with RL?
- DeepSeek-R1’s Multi-Stage Training Strategy — How DeepSeek combined supervised learning and RL to optimize reasoning.
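
Since PPO comes up repeatedly in the posts that follow, here is a minimal sketch of its clipped surrogate objective in PyTorch. This is not DeepSeek's implementation; the function name, the toy tensors, and the default clip range of 0.2 are illustrative assumptions, and in actual RLHF the advantages would come from a reward model and a value function rather than from hand-picked numbers.

```python
import torch

def ppo_clipped_objective(logprobs_new, logprobs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017).

    ratio = pi_new(a|s) / pi_old(a|s), computed in log space for stability.
    Taking the minimum of the unclipped and clipped terms discourages the
    updated policy from drifting far from the policy that generated the data.
    """
    ratio = torch.exp(logprobs_new - logprobs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes this objective, so the training loss is its negative mean.
    return -torch.min(unclipped, clipped).mean()

# Toy usage: per-token log-probabilities and advantages for a tiny batch.
logprobs_old = torch.tensor([-1.2, -0.8, -2.0])
logprobs_new = torch.tensor([-1.0, -0.9, -1.5])
advantages = torch.tensor([0.5, -0.3, 1.2])
print(ppo_clipped_objective(logprobs_new, logprobs_old, advantages))
```

The clipping keeps the probability ratio within [1 − ε, 1 + ε], so a single noisy advantage estimate cannot push the policy arbitrarily far from the one that produced the samples. That stability is a large part of why PPO became the standard algorithm for aligning LLMs with human preferences, and it is the baseline against which GRPO is best understood in the later posts.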