Thanks for the great post.
Is there any intuition behind sometimes not following the chosen_action picked by the network in this case?
# Choose either a random action or one from our network.
if np.random.rand(1) < e:
    action = np.random.randint(num_bandits)
else:
    action = sess.run(chosen_action)
I was wondering whether this is common practice (any citations?), or whether it is done here just to add more randomness to the model, since the problem is very straightforward to learn.
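
To check that I am reading the snippet correctly, here is a minimal standalone sketch of the same branching without TensorFlow. The names choose_action and greedy_action are my own (hypothetical): greedy_action stands in for the result of sess.run(chosen_action), and Python's random module stands in for np.random.

```python
import random


def choose_action(greedy_action, num_bandits, e, rng=random):
    """With probability e pick a uniformly random arm, otherwise keep the given arm.

    greedy_action is a stand-in (assumption) for whatever the network picks.
    """
    if rng.random() < e:
        # Ignore the network and pick any arm uniformly at random.
        return rng.randrange(num_bandits)
    # Follow the network's choice.
    return greedy_action


# Hypothetical setup: 4 arms, the network always picks arm 2, e = 0.1.
rng = random.Random(0)
picks = [choose_action(2, 4, 0.1, rng) for _ in range(10000)]
greedy_fraction = picks.count(2) / len(picks)
```

If I have this right, with e = 0.1 and 4 arms the network's pick should end up being used about 92.5% of the time: 90% directly, plus the quarter of the random draws that happen to land on the same arm.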