When it comes to using an LLM, knowing how its thought processes work is a big advantage
Since DeepSeek disrupted the AI scene a few weeks ago, I’ve been experimenting with it by repeating many of the prompts I’ve used with ChatGPT, Claude, Mistral or Grok, the LLMs I normally use. I’ve observed DeepSeek’s reasoning process to try to work out how many times I would have had to modify each prompt, using it as feedback on the original response generated by the algorithm.
The reason for my curiosity is to explore the fundamental role of that feedback in the use of LLMs, a topic discussed in a Washington Post article, “The Hottest New Idea in AI? Chatbots That Look Like They Think”, which analyzes these reasoning processes, known as chain of thought, and frames them as a “trend” in the development of AI.
The “trend” is, in fact, a way to improve the performance of these models by partially using reinforcement learning to fine-tune the process in the second phase of training, creating a reward function that optimizes replies. This is precisely the method DeepSeek used to achieve a model superior to its predecessors, in its case relying mainly on ChatGPT as a sparring partner rather than on reinforcement learning from human feedback (RLHF). My impression is that this is one of the keys to achieving more powerful and adaptive models in areas…