Day 169 (RL) — Bellman equation of the value function

Nandhini N
Jul 19 · 2 min read

In one of the previous posts, we discussed the value function. To recap quickly: the value function represents the sum of rewards obtained from a specific state onwards. Let’s see how the value function can be written in the form of the Bellman equation.
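In standard notation, with γ as the reduction (discount) factor, the deterministic form of the Bellman equation described below can be written as:

```latex
V(s) = R(s, a, s') + \gamma \, V(s')
```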

The value function for the considered state s is the sum of the immediate reward, denoted R(s, a, s'), and the next-state value function V(s') scaled by some reduction factor. V(s') in turn depends on its own successor state, and the same logic applies recursively to all succeeding states. The above equation fits perfectly when the environment is deterministic, i.e., when the agent makes a transition, it can only go to one next state.

But the state space (environment) can also be stochastic: from the current state, multiple next states may be possible, each with its own transition probability. To bring this stochastic property into the equation, we include the transition probability as well. Consider a scenario where, from state s1, there is a 0.7 probability of moving into s5 and a 0.3 probability of moving into s7. Rewriting the above equation, we have:
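In standard notation, the equation becomes an expectation over next states, weighted by the transition probability P(s'|s, a):

```latex
V(s) = \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma \, V(s') \bigr]
```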

The additional term P(s'|s,a) captures the stochastic environment, and the summation ensures we aggregate the values across the different possible next states, weighted by their transition probabilities.
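A one-step Bellman backup for a stochastic environment can be sketched in a few lines of Python. The tiny MDP below (states s1, s5, s7, the 0.7/0.3 transition split, and the reward values) is a made-up illustration matching the scenario in the text, not an example from the original post.

```python
GAMMA = 0.9  # reduction (discount) factor

# transitions[state][action] -> list of (probability, next_state, reward)
# From s1, taking action "a" leads to s5 with prob 0.7 and s7 with prob 0.3.
transitions = {
    "s1": {
        "a": [(0.7, "s5", 1.0), (0.3, "s7", 0.0)],
    },
}

# Current estimates of the next-state values V(s')
V = {"s5": 2.0, "s7": 5.0}

def bellman_backup(state, action):
    """V(s) = sum over s' of P(s'|s,a) * [R(s,a,s') + gamma * V(s')]."""
    return sum(
        prob * (reward + GAMMA * V[next_state])
        for prob, next_state, reward in transitions[state][action]
    )

value = bellman_backup("s1", "a")
# 0.7 * (1.0 + 0.9 * 2.0) + 0.3 * (0.0 + 0.9 * 5.0) = 1.96 + 1.35 = 3.31
```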

We also know that not only the state space (environment) but also the policy can be stochastic. In a particular state, multiple actions may be possible, and the policy is essentially a probability distribution over those actions.
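In standard notation, with π(a|s) denoting the probability that the policy selects action a in state s, the equation becomes:

```latex
V(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma \, V(s') \bigr]
```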

The first term, π(a|s), expresses the stochastic characteristics of the policy. Summing over the different actions ensures the formula covers the full probability distribution of the policy.

References:

https://github.com/sudharsan13296/Deep-Reinforcement-Learning-With-Python/tree/master/03.%20Bellman%20Equation%20and%20Dynamic%20Programming

Nerd For Tech

From Confusion to Clarification

NFT is an Educational Media House. Our mission is to bring the invaluable knowledge and experiences of experts from all over the world to the novice. To know more about us, visit https://www.nerdfortech.org/.
