Simple Reinforcement Learning in Tensorflow: Part 1 - Two-armed Bandit
Arthur Juliani

Hello Arthur,

Thanks for the great post.

Is there any intuition behind sometimes not following the chosen_action picked by the network in this case?

#Choose either a random action or one from our network.
if np.random.rand(1) < e:
    action = np.random.randint(num_bandits)
action =

I was wondering whether this is common practice (any citations?), or whether it is done here only to add some randomness to the model, since the problem is very straightforward to learn.
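
The pattern in the snippet is commonly called epsilon-greedy action selection: with probability e a random action is taken, otherwise the action the model currently prefers. A minimal sketch of that idea follows. It assumes Python's standard `random` module in place of NumPy, and a hypothetical `estimates` list standing in for whatever the network outputs; neither comes from the post.

```python
import random


def choose_action(estimates, e, rng=random):
    """Epsilon-greedy: explore with probability e, otherwise exploit.

    `estimates` is a hypothetical list of per-arm value estimates,
    used here only as a stand-in for the network's preferences.
    """
    if rng.random() < e:
        # Explore: pick any arm uniformly at random.
        return rng.randrange(len(estimates))
    # Exploit: pick the arm with the highest current estimate.
    return max(range(len(estimates)), key=lambda i: estimates[i])


estimates = [0.2, -0.1, 0.7, 0.0]  # made-up numbers for illustration
greedy = choose_action(estimates, e=0.0)   # never explores
explore = choose_action(estimates, e=1.0)  # always explores
```

With e close to 0 the agent almost always follows its current best guess; a larger e gives up some immediate reward so that an arm is not ignored just because its early estimates happened to look bad.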

Cheers!
