Simple Reinforcement Learning with Tensorflow Part 8: Asynchronous Actor-Critic Agents (A3C)

Arthur Juliani


Hi Arthur,

I have a question about the loss function defined in AC_network(). When you define the loss, I think you make the ‘advantage’ a tf.placeholder, so AC_network() receives the advantage values from the Worker(). As I understand it, the ‘advantage’ then acts as a constant when the gradient of the loss is computed. Is this OK?
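For reference, a policy loss of the form described above, with the advantage fed in as a constant, can be sketched as follows. The notation is assumed here and not taken from the post's code: θ stands for the network parameters and π(a_t | s_t; θ) for the policy output.

```latex
L_{\pi}(\theta) = -\log \pi(a_t \mid s_t; \theta)\, A_t,
\qquad
\nabla_\theta L_{\pi}(\theta) = -A_t \, \nabla_\theta \log \pi(a_t \mid s_t; \theta)
```

Because A_t is a number supplied from outside the graph, it only scales the gradient of the log-probability, and no derivative is taken through it.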

My reasoning is that the ‘advantage’ is A_t = R_t − V_t, where V_t is an output of the neural network. So when computing the gradient, I would expect you to also need the gradient of V_t (inside A_t) with respect to the network parameters θ. If ‘advantages’ is fed in as a constant, will TensorFlow do that for you automatically? I am a freshman. Thank you for your help!
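If A_t = R_t − V(s_t; θ) were differentiated instead of being held fixed, the product rule would add one extra term to the policy-loss gradient: log π(a_t | s_t; θ) ∇_θ V(s_t; θ). The sketch below illustrates this numerically under simplified, hypothetical assumptions. It uses a two-action softmax policy whose logits are free parameters, a separate scalar value parameter v, and central finite differences in place of TensorFlow's autodiff. It compares the policy-loss gradient with the advantage held fixed (the way a fed placeholder, or tf.stop_gradient, holds it) against the gradient with the advantage recomputed from v. It also shows the separate value loss 0.5 (R − V)^2, which is the usual way the critic receives its gradient in actor-critic methods.

```python
import math

# Hypothetical parameter layout for this sketch (not the post's network):
#   params[0], params[1] = policy logits for actions 0 and 1
#   params[2]            = scalar state-value estimate V(s)

def log_softmax(logits, a):
    """Numerically stable log pi(a | s) for a softmax policy."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[a] - log_z

def policy_loss(params, a, R, fixed_advantage=None):
    """-log pi(a|s) * A.

    If fixed_advantage is given, A is a constant (like a fed placeholder).
    Otherwise A = R - V is recomputed from params, so differentiation
    also passes through V.
    """
    logits, v = params[:2], params[2]
    A = fixed_advantage if fixed_advantage is not None else R - v
    return -log_softmax(logits, a) * A

def value_loss(params, R):
    """Critic loss 0.5 * (R - V)^2."""
    return 0.5 * (R - params[2]) ** 2

def numerical_grad(f, params, eps=1e-6):
    """Central finite-difference gradient of f at params."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads

params = [0.2, -0.1, 0.5]   # two logits and V(s); arbitrary example values
a, R = 0, 1.0               # action taken and observed return
A_const = R - params[2]     # advantage computed once, then treated as a number

g_fixed = numerical_grad(lambda p: policy_loss(p, a, R, A_const), params)
g_through_v = numerical_grad(lambda p: policy_loss(p, a, R), params)
g_value = numerical_grad(lambda p: value_loss(p, R), params)

print("policy grad, A fixed:    ", [round(g, 4) for g in g_fixed])
print("policy grad, A through V:", [round(g, 4) for g in g_through_v])
print("value-loss grad:         ", [round(g, 4) for g in g_value])
```

With A fixed, the policy loss gives the value parameter zero gradient, and the logits get A_t times (π − one-hot(a)). With A recomputed from v, the logit gradients are unchanged, but the value parameter picks up log π(a | s), which is the extra term described above. In this toy setup the critic's own gradient, −(R − V), comes only from the value loss.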