How Deepseek R1 Stacks Up Against OpenAI’s Leading Reasoning Models
An Exploration of Performance, Creativity, and Practicality
In the fast-paced world of artificial intelligence, the competition between tech giants and emerging innovators has never been fiercer. Deepseek, a Chinese AI company, recently unveiled its reasoning model, R1, which has already sparked excitement and anxiety across the global AI landscape. Promising competitive performance against OpenAI’s top-tier o1 models, R1 has achieved this with reportedly lower training costs — a claim that challenges the prevailing assumptions in the AI industry.
To gauge R1’s capabilities, we conducted a series of comparative tests against OpenAI’s ChatGPT models, including both the $20/month o1 model and the $200/month o1 Pro variant. The tests spanned various categories: creative writing, problem-solving, instruction following, and handling complex prompts. Here’s what we found.
Key Highlights
- Creative Writing and Humor
When tasked with generating dad jokes, all models showed improvements over previous iterations in humor quality. However, R1 stood out with its unique humor style, such as the “bicycle that doesn’t like to spin its wheels” quip. Despite some hits and misses, R1 demonstrated notable progress in creative tasks.
- Historical Imagination
A whimsical prompt about Abraham Lincoln inventing basketball revealed R1’s flair for absurd creativity. R1 combined humor with historical references, including Lincoln’s secretary and insomnia issues. This response edged out the more straightforward narratives from OpenAI’s models.
- Precision in Problem-Solving
On technical queries, such as identifying the billionth prime number, R1 outperformed OpenAI’s models by providing a precise answer sourced from reputable datasets. In contrast, OpenAI’s models focused on theoretical explanations, showcasing their training in conceptual reasoning but lacking the same specificity.
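Claims like this are easy to spot-check for small n. Below is a minimal, illustrative sketch (the function name and the upper-bound heuristic are my own choices, not anything either model produced) that finds the n-th prime with a simple sieve. The billionth prime is far beyond what this naive approach can reach in reasonable time; verifying it requires prime-counting algorithms such as Meissel–Lehmer or a trusted precomputed table.

```python
import math

def nth_prime(n: int) -> int:
    """Return the n-th prime (1-indexed) via a basic Sieve of Eratosthenes."""
    # Over-estimate an upper bound on the n-th prime
    # (Rosser's bound n*(ln n + ln ln n) holds for n >= 6).
    limit = 15 if n < 6 else int(n * (math.log(n) + math.log(math.log(n)))) + 1
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(limit**0.5) + 1):
        if sieve[p]:
            # Mark all multiples of p starting at p*p as composite.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    count = 0
    for i, is_prime in enumerate(sieve):
        if is_prime:
            count += 1
            if count == n:
                return i
    raise ValueError("upper bound too small")  # unreachable for valid n
```

For example, `nth_prime(1000)` returns 7919. A check like this is a practical way to audit a model’s numerical claims instead of trusting its reasoning trace.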
ChatGPT vs. Deepseek: Which AI Builds a Better Gaming PC?
Beyond creative writing and logical reasoning, we tested Deepseek R1 against ChatGPT in a practical domain: building a budget-friendly gaming PC. The test aimed to determine which AI could provide a better $1,000 PC build based on current market conditions.
First Impressions: Baseline Recommendations
Without Deepseek R1’s advanced reasoning activated, both AI models suggested relatively outdated hardware, including RTX 30-series GPUs, which are poor choices in 2025 due to their pricing and availability. They also recommended Ryzen 5 processors, a B550 motherboard, and 16GB of DDR4 RAM, but with some questionable storage choices. Notably, ChatGPT recommended the Kingston NV2 SSD, known for its inconsistent performance.
Deepseek R1’s Reasoning Model in Action
Once activated, Deepseek R1’s reasoning model provided a more detailed breakdown of its choices, justifying each component and evaluating compatibility. However, it also admitted that its reasoning was not an actual internal thought process but a structured sequence of responses.
Here’s how the two models stacked up:
Areas for Improvement
While R1 shone in several areas, it faltered on certain prompts requiring meticulous adherence to instructions, such as embedding a hidden code within the second letters of sentences. Arithmetic accuracy also proved problematic in a task involving digit summation, where R1 miscalculated despite providing detailed reasoning.
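Both failure modes are trivial to verify mechanically, which makes them useful regression checks on model output. The sketch below (the helper names are mine, for illustration) computes a digit sum and reads out the second letter of each sentence, so a claimed hidden code can be checked directly rather than taken on faith:

```python
import re

def digit_sum(n: int) -> int:
    # Sum of the decimal digits of n, the kind of arithmetic
    # the digit-summation prompt tested.
    return sum(int(d) for d in str(abs(n)))

def second_letters(text: str) -> str:
    # Second character of each sentence; a code embedded in
    # second letters should read out directly from the result.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return "".join(s[1] for s in sentences if len(s) > 1)
```

For instance, `second_letters("That works. Times up.")` returns `"hi"`, so a model’s claim to have hidden a message can be validated in one call.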
Expand Your Knowledge on Large Language Models
If you’re interested in diving deeper into how large language models work and how they can be built, optimized, and deployed in real-world applications, consider exploring these recommended books:
- Build a Large Language Model (from Scratch) — A comprehensive guide to constructing your own LLM, covering data collection, model architecture, and training techniques.
- Building LLMs for Production: Enhancing LLM Abilities and Reliability with Prompting, Fine-Tuning, and RAG — Learn how to make LLMs more effective for production use with advanced fine-tuning and retrieval-augmented generation (RAG).
- Building LLM Powered Applications: Create intelligent apps and agents with large language models — Ideal for developers looking to integrate LLMs into applications, with real-world use cases and coding examples.
- Prompt Engineering for LLMs: The Art and Science of Building Large Language Model–Based Applications — Master the techniques of effective prompt engineering to improve LLM responses and usability.
These books provide invaluable insights into the current state and future potential of LLMs, making them essential reading for AI enthusiasts and professionals alike.
Final Takeaway
ChatGPT 4o provided a slightly more budget-conscious build, whereas Deepseek aimed for longevity and performance. However, both models made critical mistakes. Deepseek recommended a GPU with a minimum 700W power-supply requirement but paired it with a 600W unit, a major oversight. ChatGPT included a CPU contact frame, which is unnecessary for AMD chips and only adds cost.
Ultimately, while Deepseek’s build was deemed superior when fed into ChatGPT for comparison, neither AI provided a flawless recommendation. This highlights the limitations of current AI models in practical decision-making, emphasizing the need for human oversight.
Deepseek’s R1 is a formidable competitor, especially considering its lower development costs compared to OpenAI’s models. Its ability to blend creative writing with technical precision highlights its potential to disrupt the industry. However, the model’s occasional lapses in precision suggest there’s room for refinement before it can claim dominance.
For users and industry stakeholders, R1 represents an exciting shift in the AI landscape, proving that smaller-scale operations can compete with tech giants through innovation and efficiency.