Michael Megliola
1 min readMar 18, 2020

--

I understand it now. By accumulating negative rewards, you force an algorithm to explore the unexplored (ie, not-yet-penalized) space. As negative rewards accumulate, you will tend to attempt new solutions, until you happen to succeed… or something like that :)

--

--