Relationship between state (V) and action (Q) value functions in Reinforcement Learning

Dhanoop Karunakaran
Intro to Artificial Intelligence
May 21, 2021
Source: [1]

A value function gives the expected return an agent can obtain from a given state. There are two types of value functions in RL: state-value and action-value. It is important to understand the relationship between these functions to understand RL better.

State value function

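It is the expected return (cumulative discounted reward) obtained when starting from state s and following policy π thereafter:

$$V_\pi(s) = \mathbb{E}_\pi\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s \right]$$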

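γ is the discount factor, a value between 0 and 1 that determines how strongly future rewards count toward the return: values near 0 favor immediate rewards, while values near 1 weight distant rewards almost as much as immediate ones.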

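The discounted sum of rewards from timestep t onward is called the return, G_t. Using it, the state-value function can be written compactly as shown below:

$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad V_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]$$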

Source: [2]
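As a rough illustration of the return, the sketch below computes G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ... for a finite list of rewards. The function name `discounted_return` and the sample rewards are assumptions made up for this example, not something taken from the references.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k] for a finite reward sequence."""
    g = 0.0
    # Work backwards so each step applies G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical rewards R_{t+1}, R_{t+2}, R_{t+3} observed after timestep t.
rewards = [1.0, 0.0, 2.0]
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

Averaging this quantity over many episodes that start in state s and follow π would give a sample-based estimate of the state value of s.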

Action value function

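It is the expected return (cumulative discounted reward) obtained when starting from state s, taking action a, and following policy π thereafter:

$$Q_\pi(s, a) = \mathbb{E}_\pi\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s, A_t = a \right]$$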

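In terms of the return G_t, the action-value function can be rewritten as:

$$Q_\pi(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s, A_t = a \right]$$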

Source: [2]

Relationship between V & Q

Under a stochastic policy π, V can be written in terms of Q as follows:

$$V_\pi(s) = \sum_{a} \pi(a \mid s) \, Q_\pi(s, a)$$

Source: [3]

In other words, the state value is the sum over all actions of π(a|s), the probability that the policy chooses action a in state s, multiplied by the action value Q(s, a) of that action.
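The weighted sum V(s) = Σ_a π(a|s) Q(s, a) can be sketched in a few lines. The action names, probabilities, and Q-values below are invented for illustration, and `state_value` is a hypothetical helper rather than part of any library.

```python
def state_value(policy_probs, q_values):
    """V(s) = sum over actions of pi(a|s) * Q(s, a), for a single state s."""
    return sum(policy_probs[a] * q_values[a] for a in policy_probs)


# Hypothetical stochastic policy and action values in one state s.
pi_s = {"left": 0.25, "right": 0.75}   # pi(a|s), sums to 1
q_s = {"left": 2.0, "right": 4.0}      # Q(s, a)

v_s = state_value(pi_s, q_s)
print(v_s)  # 0.25*2.0 + 0.75*4.0 = 3.5
```

For a deterministic policy, all of the probability sits on one action, so V(s) simply equals the Q-value of that action.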

Q can be written in terms of V as follows:

$$Q_\pi(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V_\pi(s')$$

Source: [3]

P is the state transition matrix that gives the probability of reaching the next state s′ from state s when action a is taken, R is the immediate reward for taking action a in state s, γ is the discount factor, and V is the state value of the next state s′.
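The one-step lookahead Q(s, a) = R(s, a) + γ Σ_{s′} P(s′|s, a) V(s′) can be sketched the same way. The helper `action_value`, the state names, and all numbers below are assumptions chosen only to make the arithmetic easy to follow.

```python
def action_value(reward, transition_probs, v_next, gamma):
    """Q(s, a) = R(s, a) + gamma * sum over s' of P(s'|s, a) * V(s')."""
    expected_next_value = sum(
        prob * v_next[next_state]
        for next_state, prob in transition_probs.items()
    )
    return reward + gamma * expected_next_value


# Hypothetical quantities for a single state-action pair (s, a).
p_sa = {"s1": 0.5, "s2": 0.5}   # P(s'|s, a), sums to 1
v = {"s1": 2.0, "s2": 6.0}      # V(s') for each possible next state

q_sa = action_value(reward=1.0, transition_probs=p_sa, v_next=v, gamma=0.5)
print(q_sa)  # 1.0 + 0.5 * (0.5*2.0 + 0.5*6.0) = 3.0
```

Feeding these Q-values back into the policy-weighted sum over actions gives V(s) in terms of the values of successor states, which is the recursive form commonly known as the Bellman expectation equation.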

References

  1. http://www.damiankolmas.com/rl/Bellman-Equations-Introduction/#
  2. https://datascience.stackexchange.com/questions/9832/what-is-the-q-function-and-what-is-the-v-function-in-reinforcement-learning
  3. https://www.cs.cmu.edu/~mgormley/courses/10601-s17/slides/lecture26-ri.pdf
