Sep 6, 2018 · 1 min read
Hi Thomas. Hope you are doing well. Great tutorials! I have been following the RL series from the beginning. Thanks for the clear explanations.
I have a question about the PPO implementation, specifically in model.py, where the total loss is calculated. Part of the total loss is vf_loss * vf_coef, where vf_coef = 0.5. But vf_loss itself is already multiplied by 0.5 when it is computed (vf_loss = 0.5 * tf.reduce_mean(tf.maximum(value_loss_unclipped, value_loss_clipped))).
Is that correct? If so, why do we need to multiply by 0.5 twice? Thanks in advance for your help.
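
To make the arithmetic behind the question concrete, here is a minimal sketch in plain Python rather than TensorFlow. The function name raw_value_loss, the sample numbers, and clip_range = 0.2 are assumptions made up for illustration; only the 0.5 factor and vf_coef = 0.5 come from the code quoted above. It only shows that the two factors combine into an effective weight of 0.25 on the unscaled value loss, not whether that is what the author intended.

```python
def raw_value_loss(values, old_values, returns, clip_range=0.2):
    """Mean of max(unclipped, clipped) squared errors, with no 0.5 factor.

    Hypothetical plain-Python stand-in for
    tf.reduce_mean(tf.maximum(value_loss_unclipped, value_loss_clipped)).
    """
    per_sample = []
    for v, v_old, r in zip(values, old_values, returns):
        # Keep the new value prediction within clip_range of the old one.
        delta = max(-clip_range, min(clip_range, v - v_old))
        v_clipped = v_old + delta
        value_loss_unclipped = (v - r) ** 2
        value_loss_clipped = (v_clipped - r) ** 2
        per_sample.append(max(value_loss_unclipped, value_loss_clipped))
    return sum(per_sample) / len(per_sample)


# Made-up sample data, for illustration only.
values = [1.0, 2.0, 3.0]
old_values = [1.1, 1.8, 3.5]
returns = [0.5, 2.5, 2.0]

raw = raw_value_loss(values, old_values, returns)

vf_loss = 0.5 * raw           # first 0.5: inside the vf_loss definition
vf_coef = 0.5                 # second 0.5: the coefficient in the total loss
weighted = vf_loss * vf_coef  # value term as it enters the total loss

# Ratio of the weighted value term to the unscaled value loss.
effective_coef = weighted / raw
print(effective_coef)
```

In other words, with these two factors the value term in the total loss equals 0.25 times the unscaled clipped value loss, which is the same as folding both factors into a single coefficient.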