Relationship between state (V) and action (Q) value functions in Reinforcement Learning

Published in Intro to Artificial Intelligence · 2 min read · May 21, 2021

A value function gives the expected return an agent can obtain from a given state. There are two types of value functions in RL: state-value and action-value. Understanding the relationship between these functions is important for understanding RL.

State value function

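It is the expected return (cumulative discounted reward) when starting from state s and following policy π:

$$V_\pi(s) = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s\right]$$

γ is the discount factor, a value between 0 and 1 that determines how strongly future rewards count toward the return.

The total cumulative reward from timestep t is the return, G_t, and the state value can be written in terms of it:

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$

$$V_\pi(s) = \mathbb{E}_\pi\left[G_t \mid S_t = s\right]$$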

Source: [2]
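To make the return concrete, here is a minimal sketch that computes the discounted sum of a finite list of rewards. The function name `discounted_return` and the reward values are hypothetical and chosen only for illustration; the sketch also assumes the episode ends after the last listed reward.

```python
def discounted_return(rewards, gamma):
    """Return G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ..."""
    g = 0.0
    # Work backwards so each step applies G_t = R_{t+1} + gamma * G_{t+1}.
    for reward in reversed(rewards):
        g = reward + gamma * g
    return g


# Hypothetical rewards R_{t+1}, R_{t+2}, R_{t+3} (illustrative values only).
rewards = [1.0, 0.0, 2.0]
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

A smaller γ shrinks the contribution of later rewards; with γ = 0 only the first reward counts.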

Action value function

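It is the expected return (cumulative discounted reward) when starting from state s, taking action a, and then following policy π:

$$Q_\pi(s, a) = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s, A_t = a\right]$$

Then we can rewrite it in terms of the return G_t, the total discounted reward from timestep t:

$$Q_\pi(s, a) = \mathbb{E}_\pi\left[G_t \mid S_t = s, A_t = a\right]$$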

Source: [2]

Relationship between V & Q

For a stochastic policy π, V can be written in terms of Q as:

$$V_\pi(s) = \sum_{a} \pi(a \mid s)\, Q_\pi(s, a)$$

In other words, the state value is the sum over all actions of the probability of choosing each action under the policy, multiplied by the action value of taking that action.
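As a rough illustration of computing V from Q, the sketch below weights each action value by the policy's probability of choosing that action in a single state. The function name `state_value`, the action names, and all numbers are assumptions made up for this example.

```python
def state_value(policy_probs, q_values):
    """V(s) = sum over actions a of pi(a|s) * Q(s, a), for one state s."""
    return sum(policy_probs[a] * q_values[a] for a in policy_probs)


# Hypothetical policy probabilities pi(a|s) and action values Q(s, a).
policy_probs = {"left": 0.25, "right": 0.75}
q_values = {"left": 2.0, "right": 4.0}

v = state_value(policy_probs, q_values)
print(v)  # 0.25*2.0 + 0.75*4.0 = 3.5
```

For a deterministic policy, one action has probability 1, so V(s) simply equals the Q value of that action.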

Q can be written in terms of V as:

$$Q_\pi(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_\pi(s')$$

Here P is the state transition probability of reaching the next state s' from state s after taking action a, R is the immediate reward, and V is the state value of the next state s'.
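As a rough illustration of computing Q from V, the sketch below adds the immediate reward to the discounted, probability-weighted value of the possible next states, for one state-action pair. The function name `action_value`, the state names, and all numbers are hypothetical, and the reward is assumed to depend only on the current state and action.

```python
def action_value(reward, gamma, transition_probs, next_state_values):
    """Q(s, a) = R(s, a) + gamma * sum over s' of P(s'|s, a) * V(s')."""
    expected_next_value = sum(
        prob * next_state_values[s_next]
        for s_next, prob in transition_probs.items()
    )
    return reward + gamma * expected_next_value


# Hypothetical transition probabilities P(s'|s, a) and next-state values V(s').
transition_probs = {"s1": 0.5, "s2": 0.5}
next_state_values = {"s1": 2.0, "s2": 6.0}

q = action_value(
    reward=1.0,
    gamma=0.5,
    transition_probs=transition_probs,
    next_state_values=next_state_values,
)
print(q)  # 1.0 + 0.5 * (0.5*2.0 + 0.5*6.0) = 3.0
```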



Written by Dhanoop Karunakaran