Reinforcement Learning and the Markov Decision Process

Sebastian Dittert
Published in Analytics Vidhya
7 min read · Apr 11, 2020

In this article, I want to introduce the Markov Decision Process in the context of Reinforcement Learning.

The Markov Decision Process (MDP) is a mathematical framework for defining sequential decision problems, and it is the standard framework for describing Reinforcement Learning problems. An MDP is intended as a simple representation of the problem of learning from interaction to achieve a goal.

To understand the algorithms in Reinforcement Learning and the theory behind them, you need a solid understanding of the MDP.

The basic idea of Reinforcement Learning, which the MDP tries to describe, is that an agent and an environment continuously interact with each other. The agent receives a state from the environment and selects an action. The environment responds to the action by presenting a new state to the agent and giving a reward that depends on how good the action was. Further information about this basic concept can be found in an article I wrote earlier: Basic Formalisms of Reinforcement Learning
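
To make the interaction loop described above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `CorridorEnv` environment, its reward of 1.0 at the right end, and the random policy are hypothetical and do not come from any particular library. It only shows the cycle of state, action, new state, and reward.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells with the goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start every episode in the leftmost cell and return that state.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, any other action moves left; walls clip the move.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        # Assumed reward scheme: 1.0 for reaching the goal, 0.0 otherwise.
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def random_policy(state, rng):
    # The agent ignores the state here and picks left (0) or right (1) at random.
    return rng.choice([0, 1])


def run_episode(env, policy, rng, max_steps=1000):
    state = env.reset()                          # environment presents the first state
    total_reward = 0.0
    steps = 0
    while steps < max_steps:
        action = policy(state, rng)              # agent selects an action
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


if __name__ == "__main__":
    rng = random.Random(0)
    print(run_episode(CorridorEnv(), random_policy, rng))
```

A learning agent would replace `random_policy` with a policy that improves from the rewards it observes; the loop itself stays the same.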

Back to the MDP. The concept of the MDP was defined to describe these interactions between the agent and the environment mathematically. The MDP is the final, summarizing concept, and it builds on several individual elements:

  • The Markov Chain
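
As a rough illustration of the Markov Chain named above, the following Python sketch samples a sequence of states in which the next state depends only on the current one. The weather states and their transition probabilities are made up for this example, and the helper names `next_state` and `sample_chain` are hypothetical.

```python
import random

# Assumed transition probabilities: TRANSITIONS[s][s2] = P(next = s2 | current = s).
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def next_state(state, rng):
    # Sample the successor using only the current state (the Markov property).
    successors = list(TRANSITIONS[state])
    weights = [TRANSITIONS[state][s] for s in successors]
    return rng.choices(successors, weights=weights, k=1)[0]


def sample_chain(start, n_steps, rng):
    # Build a trajectory of n_steps transitions starting from `start`.
    chain = [start]
    for _ in range(n_steps):
        chain.append(next_state(chain[-1], rng))
    return chain


if __name__ == "__main__":
    print(sample_chain("sunny", 10, random.Random(0)))
```

Note that there are no actions and no rewards here: the chain only describes how states follow one another.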

About the author: Sebastian Dittert, Ph.D. student at UPF Barcelona for Deep Reinforcement Learning