AIGuys

Deflating the AI hype and bringing real research and insights on the latest SOTA AI research papers. We at AIGuys believe in quality over quantity and are always looking to create more nuanced and detail-oriented content.

Understanding DeepSeek’s Internal Mechanisms & Algorithms

I know you have already seen plenty of posts and articles on DeepSeek. Many blogs talk about how impressive DeepSeek is, but only a handful go into the details of how it actually works.

Many people call it China's ploy to beat the USA. DeepSeek is no longer just a model; it has become a major geopolitical tool. People are making plenty of allegations, and many political and investment discussions are happening in the USA about how to keep the technological lead. But as an AI research blog, we are more interested in the technical side of things.

Table of Contents:

  • Understanding MDP
  • LLM-MDP (as used in DeepSeek R1)
  • PPO vs GRPO
  • Training Stage Involving RL Post-Training

Understanding MDP

A Markov decision process (MDP) is a mathematical framework for modeling sequential decision-making in a dynamic system whose outcomes are partly random and partly under the control of a decision maker, who makes decisions over time. The "Markov" part means that the next state depends only on the current state and the chosen action, not on the full history of how the system got there.
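To make the definition concrete, here is a minimal Python sketch of a hypothetical two-state MDP. The states "A" and "B", the actions "stay" and "move", the probabilities and rewards, and the helpers `step` and `rollout` are all invented for illustration; none of them come from DeepSeek. The policy makes a decision at every step, while the environment samples the outcome at random, which is the mix of control and randomness described above.

```python
import random

# Hypothetical toy MDP (illustrative only):
# P[state][action] -> list of (probability, next_state, reward).
P = {
    "A": {
        "stay": [(1.0, "A", 0.0)],
        "move": [(0.8, "B", 1.0), (0.2, "A", 0.0)],
    },
    "B": {
        "stay": [(0.9, "B", 2.0), (0.1, "A", 0.0)],
        "move": [(1.0, "A", 0.0)],
    },
}

def step(state, action, rng):
    """Sample one stochastic transition and return (next_state, reward)."""
    outcomes = P[state][action]
    probs = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=probs, k=1)[0]
    return next_state, reward

def rollout(policy, start, horizon, seed=0):
    """Make `horizon` sequential decisions and return the visited trajectory."""
    rng = random.Random(seed)
    state, trajectory = start, []
    for _ in range(horizon):
        action = policy[state]                          # decision maker's choice
        next_state, reward = step(state, action, rng)   # random outcome
        trajectory.append((state, action, reward, next_state))
        state = next_state                              # Markov: only the current state carries over
    return trajectory

policy = {"A": "move", "B": "stay"}  # a fixed, hand-picked policy
for transition in rollout(policy, "A", horizon=5):
    print(transition)
```

Changing `seed` changes which random outcomes occur, but the chosen actions always follow the policy.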

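An MDP is described by the environment's states, the actions available to the agent, the probabilities of moving between states, and the rewards those transitions yield; the agent uses these to choose the optimal next action. They are classified into four types: finite…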
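Given states, actions, transition probabilities, and rewards, the optimal action in each state can be computed with value iteration, a standard dynamic-programming algorithm that repeatedly applies the Bellman optimality update V(s) ← max_a Σ_s' P(s' | s, a) [r + γ V(s')]. The sketch below reuses the same hypothetical toy MDP; all names, numbers, and the discount γ = 0.9 are assumptions for illustration. It only shows how an MDP pins down an optimal action and is not the training procedure used for DeepSeek R1.

```python
# Same hypothetical toy MDP (illustrative only):
# P[state][action] -> list of (probability, next_state, reward).
P = {
    "A": {"stay": [(1.0, "A", 0.0)],
          "move": [(0.8, "B", 1.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(0.9, "B", 2.0), (0.1, "A", 0.0)],
          "move": [(1.0, "A", 0.0)]},
}

def q_value(P, V, s, a, gamma):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(P, gamma=0.9, tol=1e-8):
    """Apply Bellman optimality updates until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # Greedy policy: in each state, pick the action with the highest Q-value.
    policy = {s: max(P[s], key=lambda a, s=s: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy

V, policy = value_iteration(P)
print(policy)
print({s: round(v, 3) for s, v in V.items()})
```

With these numbers the greedy policy moves out of "A" and stays in "B", because "B" keeps paying a reward of 2 with high probability.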

Published in AIGuys
