Relationship between State (V) and Action (Q) Value Functions in Reinforcement Learning
A value function gives the expected return (cumulative discounted reward) an agent can obtain from a given state, or from a given state-action pair. There are two types of value functions in RL: the state-value function V and the action-value function Q. Understanding how these two functions relate to each other is key to understanding RL.
State value function
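It is the expected return (cumulative reward) starting from state s and then following policy π:
$$V_\pi(s) = \mathbb{E}_\pi\left[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s\right]$$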
γ ∈ [0, 1] is the discount factor. It controls how much weight future rewards carry in the return, and the smaller γ is, the less distant rewards count.
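The total discounted reward from timestep t is called the return, G_t. Using it, the state value can be written more compactly:
$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$
$$V_\pi(s) = \mathbb{E}_\pi\left[G_t \mid S_t = s\right]$$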
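To make the return concrete, here is a minimal sketch that computes G_t for a finite, hypothetical list of rewards R_{t+1}, R_{t+2}, …. It assumes the episode terminates, so the infinite sum becomes finite. It also works backwards using the identity G_t = R_{t+1} + γ G_{t+1}. The function name `discounted_return` is just an illustrative choice.

```python
def discounted_return(rewards, gamma):
    """Compute G_t = sum_k gamma**k * rewards[k], where rewards[k] is R_{t+k+1}.

    Assumes a finite (terminating) episode, so the sum has len(rewards) terms.
    """
    g = 0.0
    # Walk backwards through the rewards using G_t = R_{t+1} + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical rewards R_{t+1}, R_{t+2}, R_{t+3}
rewards = [1.0, 0.0, 2.0]
print(discounted_return(rewards, gamma=0.9))  # 1 + 0.9*0 + 0.81*2, about 2.62
print(discounted_return(rewards, gamma=0.0))  # only the immediate reward counts: 1.0
```

With γ = 0 the agent is fully myopic and only R_{t+1} matters. As γ approaches 1, later rewards count almost as much as the immediate one.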
Action value function
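It is the expected return (cumulative reward) starting from state s, taking action a, and following policy π thereafter:
$$Q_\pi(s, a) = \mathbb{E}_\pi\left[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s, A_t = a\right]$$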
We can rewrite this in terms of the return, G_t:
$$Q_\pi(s, a) = \mathbb{E}_\pi\left[G_t \mid S_t = s, A_t = a\right]$$
Relationship between V & Q
Under a stochastic policy π, V can be written in terms of Q as follows:
$$V_\pi(s) = \sum_{a} \pi(a \mid s)\, Q_\pi(s, a)$$
In words, the value of a state is a weighted sum over all actions. Each action's value Q_π(s, a) is weighted by π(a|s), the probability that the policy chooses that action. So V is the expected value of Q under the policy's action distribution.
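The weighted sum can be sketched in a few lines of plain Python. The state name "s", the action names, and the numbers below are all hypothetical. The policy is stored as a dict mapping each state to its action probabilities π(a|s), and Q is a dict keyed by (state, action).

```python
def state_value_from_q(policy, q, state):
    """V_pi(s) = sum_a pi(a|s) * Q_pi(s, a)."""
    return sum(prob * q[(state, action)] for action, prob in policy[state].items())


# Hypothetical example: one state "s" with two actions
policy = {"s": {"left": 0.25, "right": 0.75}}  # pi(a|s)
q = {("s", "left"): 4.0, ("s", "right"): 8.0}  # Q_pi(s, a)

print(state_value_from_q(policy, q, "s"))  # 0.25*4 + 0.75*8 = 7.0
```

If the policy were deterministic (all probability on "right"), V(s) would simply equal Q(s, "right") = 8.0.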
Conversely, Q can be written in terms of V:
$$Q_\pi(s, a) = R_s^a + \gamma \sum_{s'} P_{ss'}^a\, V_\pi(s')$$
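Here P is the state transition matrix, whose entry $P_{ss'}^a$ gives the probability of reaching the next state s’ from state s after taking action a. $R_s^a$ is the expected immediate reward for taking action a in state s, γ is the discount factor, and $V_\pi(s')$ is the state value of the next state s’. Intuitively, taking action a earns the immediate reward. The agent then lands in some next state s’ and follows π from there, which is worth γV_π(s’) in discounted terms.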
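The two relationships can be checked against each other on a small example. The sketch below uses NumPy and a made-up 2-state, 2-action MDP; all transition probabilities, rewards, and policy values are hypothetical. It assumes P is stored as `P[a, s, s_next]`, R as `R[s, a]`, and the policy as `pi[s, a]`.

To get an exact V_π to start from, the sketch solves the linear system V = R_π + γ P_π V. This solve is an extra step, not something described above, and it is only practical for tiny MDPs. The code then computes Q from V and confirms that feeding Q back into the V-from-Q formula recovers V.

```python
import numpy as np

gamma = 0.9
# Hypothetical MDP. P[a, s, s_next]: probability of s -> s_next after taking action a
P = np.array([
    [[0.8, 0.2],   # action 0 taken in state 0
     [0.1, 0.9]],  # action 0 taken in state 1
    [[0.5, 0.5],   # action 1 taken in state 0
     [0.0, 1.0]],  # action 1 taken in state 1
])
# R[s, a]: expected immediate reward for taking action a in state s
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
# pi[s, a]: stochastic policy, probability of choosing action a in state s
pi = np.array([[0.5, 0.5],
               [0.3, 0.7]])


def q_from_v(V, P, R, gamma):
    """Q(s, a) = R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')."""
    return R + gamma * np.einsum("ast,t->sa", P, V)


def v_from_q(Q, pi):
    """V(s) = sum_a pi(a | s) * Q(s, a)."""
    return (pi * Q).sum(axis=1)


# Exact V_pi for this tiny MDP: solve (I - gamma * P_pi) V = R_pi
P_pi = np.einsum("sa,ast->st", pi, P)  # state-to-state transitions under pi
R_pi = (pi * R).sum(axis=1)            # expected one-step reward under pi
V = np.linalg.solve(np.eye(len(R_pi)) - gamma * P_pi, R_pi)

Q = q_from_v(V, P, R, gamma)
print("V:", V)
print("Q:", Q)
# Plugging Q back into the V-from-Q relationship recovers the same V
print(np.allclose(v_from_q(Q, pi), V))  # True
```

Substituting one relationship into the other gives a recursive equation for V (or Q) alone. That is why the round trip above returns the V we started from.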