Reinforcement Learning: Value Function and Policy

Sebastian Dittert · Analytics Vidhya · Apr 13, 2020

In the last article, I described the fundamental framework of Reinforcement Learning, the Markov Decision Process (MDP), and its specifications. With the help of the MDP, Deep Reinforcement Learning problems can be described and defined mathematically.

Since, as described in the MDP article, an agent interacts with an environment, a natural question comes up: how does the agent decide what to do, i.e. what is his decision-making process? Further, the agent might want to know how good his actions have been and evaluate his current situation in the environment, in the sense of wanting to solve the problem.

This is exactly what the following article will deal with: the concrete interaction between the agent and the environment. How does the agent evaluate his current situation in the environment, and how does he decide what action to take?

For this purpose there are two concepts in Reinforcement Learning, each answering one of these questions: the value function, which covers the evaluation of the agent's current situation in the environment, and the policy, which describes the decision-making process of the agent. Both are explained below.

Policy

A policy (π) describes the decision-making process of the agent. In the simplest case, the policy is a deterministic mapping from states to actions, a = π(s); more generally, a stochastic policy π(a|s) gives the probability of selecting action a in state s.
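To make this concrete, here is a minimal Python sketch of both cases. The toy environment with three states and two actions is an illustrative assumption, not tied to any specific library or to the article's later examples.

```python
import numpy as np

# Hypothetical toy environment: states 0..2, actions 0..1.

# Deterministic policy: a direct mapping from state to action, a = pi(s)
deterministic_policy = {0: 1, 1: 0, 2: 1}  # state -> action

def act_deterministic(state):
    return deterministic_policy[state]

# Stochastic policy: pi(a|s), a probability distribution over actions per state
stochastic_policy = {
    0: [0.9, 0.1],   # in state 0: 90% action 0, 10% action 1
    1: [0.5, 0.5],
    2: [0.2, 0.8],
}

def act_stochastic(state):
    probs = stochastic_policy[state]
    return np.random.choice(len(probs), p=probs)

print(act_deterministic(0))  # always returns action 1
print(act_stochastic(0))     # returns action 0 most of the time
```

The deterministic case always picks the same action for a given state, while the stochastic case samples from a distribution, which is what most learning algorithms work with in practice.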
