Jul 25, 2017 · 1 min read
Nice tutorial. In the code, action = np.argmax(actions == a): should this follow the same logic as action = np.argmin(actions == a)? I tried it, and the two turn out not to give the same result. I'm also quite confused about why you use argmax here rather than just taking the index directly.
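For anyone else puzzling over this, here is a minimal sketch of how the two calls behave on a boolean mask. The values of actions and a below are made up for illustration and are not taken from the tutorial. Since actions == a is an array of True/False values, np.argmax returns the position of the first True, while np.argmin returns the position of the first False, so they generally differ.

```python
import numpy as np

# Hypothetical action set and chosen action; the tutorial's real values may differ.
actions = np.array([10, 20, 30, 40])
a = 30

# Elementwise comparison gives a boolean mask: [False, False, True, False]
mask = actions == a

# argmax treats True as 1 and False as 0, so it finds the first True (the match).
idx_argmax = int(np.argmax(mask))

# argmin finds the first False instead, i.e. the first non-matching position.
idx_argmin = int(np.argmin(mask))

# A more explicit way to "take the index directly": positions where mask is True.
idx_where = int(np.where(mask)[0][0])

print(idx_argmax, idx_argmin, idx_where)  # prints: 2 0 2
```

One caveat with the argmax idiom: if a does not appear in actions, the mask is all False and np.argmax quietly returns 0, whereas np.where(mask)[0][0] raises an IndexError, which may be the safer failure mode.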