Simple Reinforcement Learning in Tensorflow: Part 1 - Two-armed Bandit
Arthur Juliani

Hello Arthur,

Thanks for the great post.

Is there any intuition behind sometimes not following the chosen_action picked by the network in this case?

# Choose either a random action or one from our network.
if np.random.rand(1) < e:
    action = np.random.randint(num_bandits)
else:
    action = sess.run(chosen_action)

I was wondering whether this is common practice (any citations?), or whether we do it here just to add more randomness to the model, since the problem is very straightforward to learn.
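
To make the question concrete, here is a minimal standalone sketch of the same selection rule (often called epsilon-greedy selection) without the TensorFlow session. It assumes a plain array of estimated values stands in for the network's output; the names choose_action, estimates, and rng are hypothetical and not from the post, and the values are made up.

```python
import numpy as np

def choose_action(estimates, e, rng):
    """With probability e pick a random arm, otherwise the highest-valued arm."""
    if rng.random() < e:
        # Exploration branch: ignore the estimates and pick any arm uniformly.
        return int(rng.integers(len(estimates)))
    # Greedy branch: this plays the role of sess.run(chosen_action) in the post.
    return int(np.argmax(estimates))

rng = np.random.default_rng(0)
# Made-up stand-in for the network's learned weights, one value per arm.
estimates = np.array([0.2, -0.1, 0.9, 0.0])

# Sample many choices with e = 0.1 and measure how often the greedy arm is picked.
actions = [choose_action(estimates, 0.1, rng) for _ in range(1000)]
greedy_fraction = actions.count(2) / len(actions)
print(greedy_fraction)
```

With e = 0.1 and four arms, the greedy arm should be picked roughly 0.9 + 0.1/4 = 92.5% of the time, so the other arms still get tried occasionally even when the estimates say they are worse.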

Cheers!
