
Understanding DeepSeek-R1: Insights and Perspectives

Florian June
6 min read · Feb 5, 2025


DeepSeek-R1, a recently released LLM with deep reasoning capabilities, is making waves — reminding me of the early days of ChatGPT.

DeepSeek-R1 has rapidly gained popularity thanks to its open-source release, low cost, and performance comparable to OpenAI’s o1.

Figure 1: Benchmark performance of DeepSeek-R1. [Source].

DeepSeek-R1 has made powerful LLMs more accessible. Many people, even those with little technical knowledge, have downloaded and explored it for the first time, truly experiencing the power of LLMs.

After reviewing DeepSeek-R1’s technical report, I share some perspectives and insights below.

Training Process

Figure 2: Training Process of DeepSeek-R1-Zero and DeepSeek-R1. Image by author.

Figure 2 shows the training process:

  1. Training DeepSeek-R1-Zero (Pure RL Training): The model is trained with reinforcement learning only, without any supervised fine-tuning, to develop reasoning abilities. During training, it learns self-verification and reflection, and generates long Chains of Thought (CoT); a minimal sketch of the kind of reward signal driving this stage follows after this list. However, its output lacks readability, often mixing languages and hurting the user experience.
  2. Cold-Start Fine-Tuning: Stabilizes early RL training, improves readability, and enhances reasoning ability. One source of data comes from…
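To make the RL-only stage more concrete, here is a minimal sketch of the rule-based reward the technical report describes for DeepSeek-R1-Zero: an accuracy reward for verifiably correct answers plus a format reward for wrapping reasoning in think tags, with the policy then optimized against this scalar signal (the report uses GRPO). The function name and the exact reward values below are my own illustrative assumptions, not the paper’s implementation.

```python
import re

# Illustrative weights; the report uses rule-based accuracy and format
# rewards, but these specific values are assumptions for the sketch.
ACCURACY_REWARD = 1.0
FORMAT_REWARD = 0.5

# Expected output template: reasoning in <think> tags, result in <answer> tags.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)

def rule_based_reward(completion: str, ground_truth: str) -> float:
    """Score a completion with simple, verifiable rules (no learned reward model).

    (1) Format reward: reasoning and answer are properly delimited.
    (2) Accuracy reward: the extracted answer matches the ground truth,
        e.g., a math result that can be checked deterministically.
    """
    reward = 0.0
    match = THINK_PATTERN.search(completion)
    if match:
        reward += FORMAT_REWARD  # output follows the required template
        answer = match.group(2).strip()
        if answer == ground_truth.strip():
            reward += ACCURACY_REWARD  # verifiably correct final answer
    return reward

# Example: a well-formatted, correct completion earns both rewards.
completion = "<think>2 + 2 equals 4 because ...</think> <answer>4</answer>"
print(rule_based_reward(completion, "4"))  # 1.5
```

The appeal of this design is that the reward is cheap to compute and hard to game compared with a neural reward model, which is part of why pure RL training at this scale is feasible.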
