Simple Reinforcement Learning in Tensorflow: Part 1 - Two-armed Bandit
Arthur Juliani

Alpha Shuro
Aug 27, 2017

Following on my previous comment, I have a question: what was the motivation behind the chosen loss function?