Simple Reinforcement Learning with Tensorflow: Part 2 - Policy-based Agents

Arthur Juliani

Hi Arthur, first of all, thanks for your fantastic work!!

I followed your instructions and read every line of code from Part 0, and I learned a lot. I've read plenty online, but nobody else's lessons are like yours. You really rock! Thanks!!

Now that I've reached this part, I have some questions about the code. I really hope you can shed some light on them.

1. loss = -tf.reduce_mean((tf.log(input_y - probability)) * advantages)

I guess this is the cross-entropy. Are you using (input_y - probability) to estimate (and get close to) the advantage? What is the meaning behind (input_y - probability)?

2. newGrads = tf.gradients(loss,tvars): this takes the derivative of the loss function with respect to tvars. Does it mean we try both directions, W1Grad and W2Grad? How do we apply these newGrads?
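One way to read the first question, as a minimal plain-Python sketch rather than the tutorial's TensorFlow code: the log term is meant to be the log-probability of the action the agent actually took, and multiplying it by the advantage gives a REINFORCE-style policy-gradient loss. For a two-action policy this equals the binary cross-entropy between the taken action and `probability`, weighted by the advantage, so it is related to cross-entropy rather than an estimate of the advantage itself. The sketch assumes (this is not stated in the question) that `probability` is P(action = 1) and that `input_y` is a "fake label" equal to 1 when action 0 was taken and 0 when action 1 was taken. Under that assumption the bare `input_y - probability` is negative whenever `input_y` is 0, so its log is undefined; the helper `chosen_action_prob` (a hypothetical name) uses a two-case form instead.

```python
import math

# Assumption: `probability` is the network's output P(action = 1), and
# `input_y` is a fake label: 1 if action 0 was taken, 0 if action 1 was taken.

def chosen_action_prob(input_y, probability):
    """Return the probability the policy assigned to the action actually taken."""
    # input_y == 1 -> action 0 was taken -> P(action 0) = 1 - probability
    # input_y == 0 -> action 1 was taken -> P(action 1) = probability
    return input_y * (input_y - probability) + (1 - input_y) * (input_y + probability)

def policy_loss(input_ys, probabilities, advantages):
    """Negative mean of log pi(a|s) * advantage over a batch of steps."""
    terms = [
        math.log(chosen_action_prob(y, p)) * adv
        for y, p, adv in zip(input_ys, probabilities, advantages)
    ]
    return -sum(terms) / len(terms)

# With input_y = 0, math.log(0 - 0.3) would raise ValueError;
# the two-case form above always yields a valid probability.
print(chosen_action_prob(1, 0.3))  # 0.7
print(chosen_action_prob(0, 0.3))  # 0.3
print(policy_loss([1, 0], [0.3, 0.3], [1.0, -1.0]))
```

Minimizing this loss raises the log-probability of actions with a positive advantage and lowers it for actions with a negative one.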
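On the second question: `tf.gradients(loss, tvars)` returns a list with one gradient per variable in `tvars`. For a two-layer network that means one gradient for the first weight matrix and one for the second, not two search directions. The plain-Python sketch below assumes a batched update: per-episode gradients are summed into a buffer, one slot per variable, and then applied in a single step. Feeding the W1Grad and W2Grad placeholders is presumably the TensorFlow counterpart of handing those summed values to the optimizer. The functions `accumulate` and `apply_and_reset` and the toy weights are hypothetical. Plain gradient descent stands in for whatever optimizer the tutorial actually uses.

```python
# Hypothetical stand-ins: each trainable variable (think W1, W2) is a flat
# list of floats, and each gradient list has the same shape and order,
# mirroring how tf.gradients(loss, tvars) returns one gradient per variable.

def accumulate(grad_buffer, new_grads):
    """Add one episode's gradients into the running buffer, variable by variable."""
    for buf, grad in zip(grad_buffer, new_grads):
        for i, g in enumerate(grad):
            buf[i] += g

def apply_and_reset(tvars, grad_buffer, learning_rate):
    """Take one gradient-descent step with the summed gradients, then clear the buffer.

    Plain SGD is swapped in for readability; an optimizer such as Adam would
    consume the same gradients but apply a different update rule.
    """
    for var, buf in zip(tvars, grad_buffer):
        for i in range(len(var)):
            var[i] -= learning_rate * buf[i]
            buf[i] = 0.0

tvars = [[0.5, -0.2], [1.0]]        # toy "W1" (2 weights) and "W2" (1 weight)
grad_buffer = [[0.0, 0.0], [0.0]]   # one zeroed buffer per variable

accumulate(grad_buffer, [[0.1, 0.2], [0.3]])   # gradients from episode 1
accumulate(grad_buffer, [[0.1, -0.2], [0.1]])  # gradients from episode 2
apply_and_reset(tvars, grad_buffer, learning_rate=0.5)
print(tvars)
```

Summing over several episodes before updating gives a less noisy gradient estimate than updating after every single episode.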

Looking forward to your reply.

Larry