Reinforcement Learning and the Markov Decision Process

Published in Analytics Vidhya, Apr 11, 2020

In this article, I want to introduce the Markov Decision Process in the context of Reinforcement Learning.

The Markov Decision Process (MDP) is a formal framework for defining sequential decision problems, and it is the standard framework for describing Reinforcement Learning problems. An MDP is intended as a simple representation of the problem of learning from interaction to achieve a goal.

A solid understanding of the MDP is necessary to understand Reinforcement Learning algorithms and the theory behind them.

The basic idea of Reinforcement Learning, which the MDP describes, is that an agent and an environment continuously interact with each other. The agent receives a state from the environment and selects an action. The environment responds to the action by presenting a new state to the agent and giving a reward that depends on how good the action was. Further information about this basic concept can be found in an article I wrote earlier: Basic Formalisms of Reinforcement Learning
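To make the interaction loop described above more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and does not come from a specific library: the `ToyEnvironment` class (a short corridor with a goal at the right end), the `random_agent` policy, and the `run_episode` helper are all hypothetical names. The sketch only shows the cycle of state, action, reward and new state.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D corridor with states 0..n_states-1; the goal is the last state."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); the position is clipped to the corridor.
        self.state = min(max(self.state + action, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        # Reward of 1.0 only when the goal state is reached, otherwise 0.0.
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def random_agent(state, rng):
    # Illustrative policy: ignores the state and picks left or right at random.
    return rng.choice([-1, 1])


def run_episode(env, policy, rng, max_steps=100):
    """Run one episode and return the total reward and the visited transitions."""
    state = env.reset()
    total_reward = 0.0
    trajectory = []
    for _ in range(max_steps):
        action = policy(state, rng)                   # agent selects an action
        next_state, reward, done = env.step(action)   # environment responds
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward, trajectory
```

Calling `run_episode(ToyEnvironment(), random_agent, random.Random(0))` returns the collected reward together with a list of `(state, action, reward, next_state)` tuples, which is exactly the kind of interaction data an agent learns from.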

Back to the MDP. The MDP was defined to describe these interactions between the agent and the environment mathematically. It is the final, summarizing concept and is built up from individual elements:

The Markov Chain


Written by Sebastian Dittert