DeepSeek-R1: Best Open-Source Reasoning LLM Outperforms OpenAI-o1
How to use DeepSeek-R1 and DeepSeek-R1-Zero?
After the sensational DeepSeek-V3 release a few days back, DeepSeek has now released DeepSeek-R1 and DeepSeek-R1-Zero, which outperform OpenAI-o1 and Claude 3.5 Sonnet on various benchmarks.
What are DeepSeek-R1 and DeepSeek-R1-Zero?
These are two different model series released by DeepSeek, where DeepSeek-R1 builds on DeepSeek-R1-Zero: outputs from R1-Zero help seed R1's cold-start training data.
Model Specifications (R1 & R1-Zero)
- Total Params: 671B.
- Activated Params: 37B (a toy sketch of this MoE split follows below).
- Context Length: 128K.
- Base Model: DeepSeek-V3-Base.
DeepSeek-R1-Zero, DeepSeek-R1, and six distilled models are open-sourced on HuggingFace.
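The 671B-total vs. 37B-activated split reflects the Mixture-of-Experts (MoE) design inherited from DeepSeek-V3: a router activates only a few experts per token, so most parameters sit idle on any single forward pass. Here is a toy sketch of the idea; the sizes, the softmax-then-top-k routing, and the naive per-token loop are illustrative assumptions, not DeepSeek's actual architecture:

```python
# Toy MoE sketch: only the routed experts' weights participate per token,
# which is why "activated params" is far smaller than "total params".
import torch

n_experts, top_k, d_model = 16, 2, 64  # illustrative sizes, not DeepSeek's
router = torch.nn.Linear(d_model, n_experts)
experts = torch.nn.ModuleList(
    torch.nn.Linear(d_model, d_model) for _ in range(n_experts)
)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """Route each token to its top-k experts and mix their outputs."""
    scores = router(x)                                   # (tokens, n_experts)
    weights, idx = scores.softmax(dim=-1).topk(top_k, dim=-1)
    out = torch.zeros_like(x)
    for t in range(x.size(0)):                           # naive per-token dispatch
        for slot in range(top_k):
            expert = experts[int(idx[t, slot])]
            out[t] += weights[t, slot] * expert(x[t])
    return out

print(moe_forward(torch.randn(3, d_model)).shape)        # torch.Size([3, 64])
```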
Training Approach
- DeepSeek-R1-Zero:
Trained purely via large-scale reinforcement learning (RL) without any supervised fine-tuning (SFT) as a preliminary step.
Relies entirely on RL to develop reasoning capabilities, making it a unique and groundbreaking approach.
Self-verification, reflection, and long chain-of-thought (CoT) reasoning behaviors emerge naturally during training.
Challenges: issues like endless repetition, poor readability, and language mixing.
- DeepSeek-R1:
Incorporates cold-start data before applying RL: the model is first initialized with supervised fine-tuning (SFT) data before further training with reinforcement learning. This addresses the challenges faced by models trained purely with RL (like DeepSeek-R1-Zero) and improves overall performance.
Includes two SFT stages to seed both reasoning and non-reasoning capabilities.
Uses a two-stage RL pipeline (a sketch of the rule-based reward signal behind this kind of RL training follows this list):
- Discovers improved reasoning patterns.
- Aligns the model with human preferences.
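For intuition, here is a minimal sketch of the kind of rule-based reward used in this RL training: an accuracy reward that deterministically checks the final answer, plus a format reward that enforces reasoning tags. The tag names, the answer-extraction rule, and the equal weighting are illustrative assumptions, not DeepSeek's exact recipe:

```python
import re

# Sketch of a rule-based RL reward: accuracy (answer check) + format
# (<think>...</think> reasoning tags). Details here are assumptions.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the reasoning is wrapped in <think> tags, else 0.0."""
    return 1.0 if THINK_RE.search(completion) else 0.0

def accuracy_reward(completion: str, gold_answer: str) -> float:
    """1.0 if the text after </think> matches the reference answer, else 0.0."""
    tail = completion.split("</think>")[-1].strip()
    return 1.0 if tail == gold_answer.strip() else 0.0

def total_reward(completion: str, gold_answer: str) -> float:
    """Scalar reward fed to the RL optimizer (e.g. a GRPO-style trainer)."""
    return accuracy_reward(completion, gold_answer) + format_reward(completion)

print(total_reward("<think>2*3=6, plus 1 is 7</think>7", "7"))  # 2.0
```

Because both signals are computed by simple rules rather than a learned reward model, they are cheap to evaluate at scale and hard for the policy to game.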
Distillation
- Smaller Models: Reasoning patterns from larger models (like DeepSeek-R1) can be distilled into smaller models, resulting in better performance compared to RL-trained small models.
Open-Source Distilled Models:
DeepSeek-R1-Distill-Qwen series: 1.5B, 7B, 14B, 32B.
DeepSeek-R1-Distill-Llama series: 8B, 70B.
Performance: Distilled models (e.g., DeepSeek-R1-Distill-Qwen-32B) outperform OpenAI-o1-mini across benchmarks, achieving state-of-the-art results for dense models.
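Conceptually, the distillation recipe is simple: sample reasoning traces from the teacher and use them as plain SFT targets for the student. A minimal sketch follows; `teacher_generate` is a hypothetical stub standing in for sampling from DeepSeek-R1 itself:

```python
# Sketch of reasoning distillation: teacher chain-of-thought completions
# become supervised fine-tuning targets for a smaller student model.

def teacher_generate(prompt: str) -> str:
    """Placeholder for sampling a <think>...</think> trace from the teacher."""
    return "<think>...worked reasoning...</think>\nfinal answer"

def build_sft_dataset(prompts: list[str]) -> list[dict[str, str]]:
    """Pair prompts with teacher traces; low-quality traces would be filtered."""
    return [{"prompt": p, "completion": teacher_generate(p)} for p in prompts]

dataset = build_sft_dataset([
    "What is 12 * 13?",
    "Prove that the square root of 2 is irrational.",
])
# These (prompt, completion) pairs feed a standard SFT run on a Qwen or Llama
# student checkpoint -- notably, no RL stage is applied to the student.
print(len(dataset), dataset[0]["prompt"])
```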
Research Impact
- DeepSeek-R1-Zero validates the potential of RL-only training for reasoning capabilities.
- The DeepSeek-R1 pipeline introduces a structured approach to improving reasoning and alignment with human preferences.
- Distillation techniques demonstrate that smaller models can achieve high performance, benefiting resource-constrained applications.
Metrics Breakdown
AIME 2024 (Pass@1): Measures the percentage of correct responses on problems from the American Invitational Mathematics Examination, a competition-level math dataset. Higher scores show better single-response accuracy (a sketch of how Pass@1 is computed follows this list).
Codeforces (Percentile): Indicates percentile ranking on competitive programming problems. A higher percentile implies better performance.
GPQA Diamond (Pass@1): Tests graduate-level, "Google-proof" science QA. Pass@1 shows accuracy on the first response.
MATH-500 (Pass@1): Evaluates performance on advanced math problems. Pass@1 measures correctness in a single attempt.
MMLU (Pass@1): Tests knowledge and reasoning across 57 academic and professional subjects. Pass@1 reflects single-response accuracy.
SWE-bench Verified (Resolved): Assesses software engineering ability on real GitHub issues; Resolved is the percentage of issues the model successfully fixes.
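A note on Pass@1: it is typically computed by sampling several responses per problem, scoring each for correctness, and averaging the per-problem accuracy (with one sample per problem it reduces to plain accuracy). A minimal sketch, assuming a boolean correctness judgment per sample:

```python
# Minimal sketch of Pass@1 as commonly reported: average the per-problem
# fraction of correct samples across the benchmark.

def pass_at_1(per_problem_correct: list[list[bool]]) -> float:
    """per_problem_correct[i] holds the correctness of each sample for problem i."""
    per_problem = [sum(samples) / len(samples) for samples in per_problem_correct]
    return sum(per_problem) / len(per_problem)

# Example: 2 problems with 4 samples each -> (3/4 + 1/4) / 2 = 0.5
print(pass_at_1([[True, True, True, False], [False, True, False, False]]))
```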
Why DeepSeek-R1 Performs Better:
- Overall Scores: DeepSeek-R1 consistently outperforms or matches OpenAI-o1 across most benchmarks, especially in AIME 2024 (79.8%) and SWE-bench Verified (49.2%).
- Specialization: DeepSeek-R1 likely benefits from targeted training on specific datasets or domains, leading to higher accuracy in challenging areas like math (MATH-500, MMLU).
- Adaptability: Higher performance in diverse benchmarks, such as SWE-bench Verified, suggests it is better at handling domain-specific tasks compared to OpenAI-o1.
- Scalability: The distilled DeepSeek-R1-Distill-Qwen-32B variant also achieves near-competitive performance with far fewer computational resources, indicating efficient scaling.
How to use DeepSeek-R1 for free?
It's very easy:
1. Go to deepseek.com and switch to DeepThink mode.
2. Use the open-sourced distilled models from HuggingFace; for example, the 1.5B model is available at https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B (a minimal loading sketch follows below).
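For option 2, here is a minimal sketch of running the 1.5B distilled checkpoint locally with Hugging Face transformers; the prompt, dtype, and generation settings are illustrative choices:

```python
# Minimal sketch: run the 1.5B distilled checkpoint with transformers.
# Requires `pip install transformers torch accelerate` and enough memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fall back to float32 on CPUs without bf16
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve step by step: 17 * 24 = ?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```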
DeepSeek-R1 and R1-Zero set a new standard for reasoning LLMs, outperforming competitors like OpenAI-o1 across key benchmarks. With innovative training techniques and open-source availability, they empower developers to leverage cutting-edge AI for diverse applications. Whether tackling advanced reasoning tasks or scaling down with distilled models, DeepSeek offers flexibility and performance for all.
Definitely give the models a try!