(How) Do LLMs Reason and Plan?
Large Reasoning Models (LRMs) have been all the rage for the past few months. The age of LLMs is over; now it is the LRMs' time. Be it Gemini 2.5, Claude's thinking mode, or the GPT o-series models, all of them have moved toward reasoning. Fundamentally, they are still LLMs, but suddenly these models feel much better and smarter. Their ability to reason and plan seems to have increased manyfold. Every week a new benchmark is crushed, but as responsible researchers and AI enthusiasts, we must ask: how much of this development is real, and how much is just hype, a marketing gimmick?
Table of Contents
- Why You Shouldn’t Trust Benchmarks
- Types of Large Reasoning Models (Test Time Scaling & Post Training Methods)
- Confusion About LLMs’ Reasoning Capabilities
- How good are LRMs?
- Is RL Overhyped In Making Machines Smarter?
- Conclusion
Why You Shouldn’t Trust Benchmarks
I feel that most benchmarks are compromised in one way or another. Personally, I no longer trust any benchmark; they are mere indicators, not measures of absolute performance.
There have been many benchmark leaks in AI over the past few years, where test questions ended up in models' training data, inflating scores without any real gain in capability.
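To make this concrete, here is a minimal sketch of one common contamination check: flagging benchmark items that share long word n-grams with the training corpus. The function names, the n-gram length, and the toy data are all illustrative assumptions, not any lab's actual pipeline, and real checks run over the full pretraining corpus with more careful normalization.

```python
from typing import Iterable, Set, Tuple

def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of word-level n-grams in a text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(train_docs: Iterable[str], test_items: list, n: int = 8) -> float:
    """Fraction of test items sharing at least one n-gram with the training corpus."""
    train_grams: Set[Tuple[str, ...]] = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    flagged = sum(1 for item in test_items if ngrams(item, n) & train_grams)
    return flagged / len(test_items) if test_items else 0.0

# Toy example: the first test item appears verbatim in the "training" corpus.
train = ["the quick brown fox jumps over the lazy dog near the river bank today"]
test = [
    "the quick brown fox jumps over the lazy dog near the river bank today",
    "a completely unrelated benchmark question about prime factorization",
]
print(contamination_rate(train, test))  # 0.5: one of the two items overlaps
```

Even a crude check like this catches verbatim leakage, which is exactly why reported benchmark numbers should be treated as indicators rather than ground truth.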