
DeepSeek-R1: Best Open-Source Reasoning LLM Outperforms OpenAI-o1

How to use DeepSeek-R1 and DeepSeek-R1-Zero?

Mehul Gupta
Data Science in your pocket



After the sensational DeepSeek-V3 release a few days back, DeepSeek has now released DeepSeek-R1 and DeepSeek-R1-Zero, which outperform OpenAI-o1 and Claude 3.5 Sonnet on various benchmarks.

What are DeepSeek-R1 and DeepSeek-R1-Zero?

These are two reasoning model series released by DeepSeek, where DeepSeek-R1 is trained using data and techniques derived from DeepSeek-R1-Zero.

Model Specifications (R1 & R1-Zero)

Total Params: 671B.

Activated Params: 37B.

Context Length: 128K.

Base Model: DeepSeek-V3-Base.

DeepSeek-R1-Zero, DeepSeek-R1, and six distilled models are open-sourced on HuggingFace.

Training Approach

  • DeepSeek-R1-Zero:

Trained purely via large-scale reinforcement learning (RL), without any supervised fine-tuning (SFT) as a preliminary step (a minimal sketch of such rule-based rewards appears after this list).

Relies entirely on RL to develop reasoning capabilities, a notable departure from the usual SFT-then-RL pipeline.

Naturally develops self-verification, reflection, and long chain-of-thought (CoT) reasoning behaviors.

Challenges: issues like endless repetition, poor readability, and language mixing.

  • DeepSeek-R1:

Incorporates cold-start data before applying RL: the model is first initialized with a small amount of supervised fine-tuning (SFT) data before it is trained further with reinforcement learning. This addresses the challenges faced by models trained purely with RL (like DeepSeek-R1-Zero) and improves overall performance.

Includes two SFT stages to seed both reasoning and non-reasoning capabilities.

Uses a two-stage RL pipeline:

  • Discovers improved reasoning patterns.
  • Aligns the model with human preferences.
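
To make the RL-only recipe concrete, here is a minimal sketch of the kind of rule-based rewards the R1 report describes: an accuracy reward that checks the final answer and a format reward that enforces a think-then-answer template. The tag names and matching logic below are illustrative assumptions, not DeepSeek's exact implementation.

```python
import re

# Illustrative think/answer template; DeepSeek's actual prompt format differs.
TEMPLATE = r"^<think>.*?</think>\s*<answer>.*?</answer>$"

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the think-then-answer template."""
    return 1.0 if re.match(TEMPLATE, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the text inside <answer> tags matches the reference answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == ground_truth.strip() else 0.0

def total_reward(completion: str, ground_truth: str) -> float:
    # Rule-based scalar reward: no learned reward model in the loop.
    return accuracy_reward(completion, ground_truth) + format_reward(completion)
```

Because the reward is a cheap, deterministic rule rather than a learned reward model, it scales to the enormous number of rollouts that RL-only training needs.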

Distillation

  • Smaller Models: Reasoning patterns from larger models (like DeepSeek-R1) can be distilled into smaller models, yielding better performance than training those small models with RL directly (see the sketch at the end of this section).

Open-Source Distilled Models:

DeepSeek-R1-Distill-Qwen series: 1.5B, 7B, 14B, 32B.

DeepSeek-R1-Distill-Llama series: 8B, 70B.

Performance: Distilled models (e.g., DeepSeek-R1-Distill-Qwen-32B) outperform OpenAI-o1-mini across benchmarks, achieving state-of-the-art results for dense models.
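
The distillation step here is plain supervised fine-tuning on reasoning traces sampled from the teacher, not RL on the small model. Below is a hypothetical sketch using the transformers Trainer; the student checkpoint, toy corpus, and hyperparameters are illustrative assumptions.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

STUDENT = "Qwen/Qwen2.5-1.5B"  # assumed student base; illustrative only

# Hypothetical corpus: prompts paired with reasoning traces sampled from
# the teacher (e.g., DeepSeek-R1). The real recipe uses ~800k samples.
pairs = [
    {"text": "Q: What is 12 * 7?\n<think>12 * 7 = 84</think>\nAnswer: 84"},
]

tokenizer = AutoTokenizer.from_pretrained(STUDENT)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(STUDENT)

dataset = Dataset.from_list(pairs).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="r1-distill-student",
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    # mlm=False -> standard causal-LM objective on the teacher traces
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```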

Research Impact

  • DeepSeek-R1-Zero validates the potential of RL-only training for reasoning capabilities.
  • DeepSeek-R1 pipeline introduces a structured approach to improving reasoning and alignment with human preferences.
  • Distillation techniques demonstrate that smaller models can achieve high performance, benefiting resource-constrained applications.

Metrics Breakdown

AIME 2024 (Pass@1): Measures the percentage of correct responses on a math competition dataset. Higher scores show better single-response accuracy (a Pass@k estimator sketch follows this list).

Codeforces (Percentile): Indicates percentile ranking on competitive programming problems. A higher percentile implies better performance.

GPQA Diamond (Pass@1): Tests graduate-level, Google-proof science questions. Pass@1 shows accuracy on the first response.

MATH-500 (Pass@1): Evaluates performance on advanced math problems. Pass@1 measures correctness in a single attempt.

MMLU (Pass@1): Tests massive multitask language understanding across a wide range of subjects. Pass@1 reflects single-response accuracy.

SWE-bench Verified (Resolved): Assesses real-world software engineering tasks; Resolved measures the fraction of issues the model actually fixes.
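
For reference, Pass@k is usually computed with the unbiased estimator from the Codex paper (Chen et al., 2021). A short sketch, with made-up sample counts:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: probability that at least one of k
    samples, drawn from n generations of which c are correct, solves
    the problem."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example with made-up numbers: 16 samples per problem, 12 correct.
print(pass_at_k(n=16, c=12, k=1))  # 0.75 -- with k=1 this is just c/n
```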

Why DeepSeek-R1 Performs Better:

  • Overall Scores: DeepSeek-R1 consistently outperforms or matches OpenAI-o1 across most benchmarks, especially AIME 2024 (79.8%) and SWE-bench Verified (49.2%).
  • Specialization: DeepSeek-R1 likely benefits from targeted training on specific datasets or domains, leading to higher accuracy in challenging areas like math (MATH-500, MMLU).
  • Adaptability: Higher performance in diverse benchmarks, such as SWE-bench Verified, suggests it is better at handling domain-specific tasks compared to OpenAI-o1.
  • Scalability: The DeepSeek-R1-Distill-Qwen-32B variant achieves near-competitive performance with far fewer computational resources, indicating efficient scaling.

How to use DeepSeek-R1 for free?

Two easy options:

  1. Go to deepseek.com and switch to DeepThink mode.

  2. Use the open-sourced distilled models from HuggingFace. A quick-start sketch for the 1.5B model is added below:
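
A minimal sketch using the transformers library, assuming the distilled 1.5B checkpoint's HuggingFace repo id is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B; the generation settings are illustrative (temperature around 0.6 is the commonly recommended value for the R1 family):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 30?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024,
                         do_sample=True, temperature=0.6)
# Print only the newly generated tokens (the model's reasoning + answer).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```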

DeepSeek-R1 and R1-Zero set a new standard for reasoning LLMs, outperforming competitors like OpenAI-o1 across key benchmarks. With innovative training techniques and open-source availability, they empower developers to leverage cutting-edge AI for diverse applications. Whether tackling advanced reasoning tasks or scaling down with distilled models, DeepSeek offers flexibility and performance for all.

Definitely give the models a try!
