Hi Arthur, first of all, thanks for your fantastic work!!
I followed your instructions and read every line of code starting from Part 0, and I learned a lot. I've read plenty of material online, but nobody else's lessons compare to yours. You really rock! Thanks!!
Now I'm at this part and have a couple of questions about the code. I really hope you can shed some light on them:
1. loss = -tf.reduce_mean((tf.log(input_y - probability)) * advantages)
I guess this is the cross entropy. Are you using (input_y - probability) to estimate (and get close to) the advantage? What is the underlying meaning of (input_y - probability)?
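To make my question concrete, here is a tiny numpy sketch of how I currently read that loss: score the log-likelihood of the action actually taken (under some label convention for the fake label input_y), and weight it by the advantage. All the names and values below are my own made-up illustration, not your code, so please correct me if I've misread the label convention:

```python
import numpy as np

# Fake labels for three timesteps (my guess: y = 1 when action 0 was taken)
y = np.array([1.0, 0.0, 1.0])
# Network output: probability of taking action 1 (sigmoid output)
p = np.array([0.7, 0.4, 0.2])
# Discounted (and normalized) rewards for those timesteps
advantage = np.array([1.5, -0.5, 2.0])

# Log-likelihood of the action actually taken, for a Bernoulli policy
# under the label convention above:
#   y = 1 -> log(1 - p),  y = 0 -> log(p)
loglik = y * np.log(1.0 - p) + (1.0 - y) * np.log(p)

# Policy-gradient loss: minimizing it pushes up the log-likelihood
# of actions with positive advantage, and down for negative advantage
loss = -np.mean(loglik * advantage)
print(loss)
```

If that reading is right, then (input_y - probability) would just be a compact way of selecting the probability of the chosen action before taking the log, which is what I'm hoping you can confirm.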
2. newGrads = tf.gradients(loss, tvars): this takes the derivative of the loss with respect to tvars. Does that mean we get one gradient per variable, i.e. both W1Grad and W2Grad? And how do we actually apply newGrads?
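For question 2, here is my current mental model of how newGrads would be used, sketched in plain numpy: accumulate one gradient per trainable variable into a buffer over a batch of episodes, then apply the buffer to the weights and reset it. All names here are my own stand-ins, not your code, so I may have the flow wrong:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 8))   # stand-ins for the two weight matrices
W2 = rng.standard_normal((8, 1))
grad_buffer = [np.zeros_like(W1), np.zeros_like(W2)]

def fake_episode_grads():
    # Stand-in for newGrads = tf.gradients(loss, tvars):
    # one gradient per trainable variable, same shape as that variable.
    return [rng.standard_normal(W1.shape), rng.standard_normal(W2.shape)]

learning_rate = 0.01
for episode in range(5):
    # Accumulate this episode's gradients into the buffer
    for buf, g in zip(grad_buffer, fake_episode_grads()):
        buf += g

# After a batch of episodes, take one gradient-descent step per variable
W1 -= learning_rate * grad_buffer[0]
W2 -= learning_rate * grad_buffer[1]

# Reset the buffer for the next batch
for buf in grad_buffer:
    buf[:] = 0.0
```

Is that batching-then-applying pattern the reason the gradients are computed with tf.gradients instead of just calling the optimizer's minimize directly?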
Looking forward to your reply!