Simple Reinforcement Learning with Tensorflow: Part 2 - Policy-based Agents
Arthur Juliani

Hi Arthur,

I am a bit confused by the part of the code below.

From "Setting up our Neural Network agent":
loss = -tf.reduce_mean((tf.log(input_y - probability)) * advantages)

From "Running the Agent and Environment":
y = 1 if action == 0 else 0 # a "fake label"
ys.append(y)

If I understand your intention correctly, are you calculating the loss based on the probability of the action that was actually taken?
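
To make sure I'm reading it right, here is how I currently picture the loss, written out as a self-contained sketch in the same TF 1.x style as the post. This is not your exact code: the network is collapsed to one linear layer, and the shapes, the names, and my assumption that probability means P(action == 1) are all guesses on my part.

import tensorflow as tf

D = 4  # CartPole observation size (my assumption)

observations = tf.placeholder(tf.float32, [None, D], name="input_x")
W = tf.Variable(tf.random_normal([D, 1], stddev=0.1))
probability = tf.nn.sigmoid(tf.matmul(observations, W))  # guessed to be P(action == 1)

input_y = tf.placeholder(tf.float32, [None, 1], name="input_y")        # fake labels
advantages = tf.placeholder(tf.float32, [None, 1], name="advantages")  # discounted rewards

# My reading: with y = 1 when action 0 was taken and y = 0 when action 1 was
# taken, loglik below is the log-probability of the action actually taken,
# and the loss weights that log-probability by the advantage.
loglik = input_y * tf.log(1.0 - probability) + (1.0 - input_y) * tf.log(probability)
loss = -tf.reduce_mean(loglik * advantages)

If that reading is wrong, please ignore the rest of my question.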

Would it be the same if it were coded as below?

From "Setting up our Neural Network agent":
loss = tf.reduce_mean((tf.log(input_y)) * advantages)

From "Running the Agent and Environment":
y = (1 - tfprob) if action == 0 else tfprob # a "fake label"
ys.append(y)

Thanks.
