Jul 10, 2017 · 1 min read
The idea basically goes: you let deep reinforcement learning (DRL) optimize first, since it's sample efficient, then let genetic algorithms (as in the OpenAI evolution strategies paper) optimize, then DRL again, interleaved and repeated ad nauseam.
The intuition behind it is that genetic algorithms and DRL can converge to different optima, but in similar regions, so alternating between them may eventually reach an even better optimum — somewhat like the cyclic learning rates idea in the "Snapshot Ensembles" paper.
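A minimal sketch of what that interleaving loop could look like, on a toy quadratic objective rather than a real RL environment. The `grad_step` function stands in for a gradient-based DRL update (here just analytic gradient descent), and `es_step` is a simple evolution-strategies update standing in for the genetic phase — both are illustrative assumptions, not from any DRL library:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    # Toy objective standing in for (negative) expected return;
    # optimum at theta = 3 in every dimension.
    return np.sum((theta - 3.0) ** 2)

def grad_step(theta, lr=0.1):
    # Gradient-based phase (stand-in for a DRL update):
    # analytic gradient of the toy quadratic loss.
    grad = 2.0 * (theta - 3.0)
    return theta - lr * grad

def es_step(theta, pop=32, sigma=0.1, lr=0.05):
    # Genetic/evolutionary phase: a basic evolution-strategies update
    # that estimates a search direction from random perturbations.
    eps = rng.standard_normal((pop, theta.size))
    rewards = np.array([-loss(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return theta + lr / (pop * sigma) * eps.T @ rewards

theta = rng.standard_normal(5)
for cycle in range(10):              # interleave the two optimizers
    for _ in range(20):
        theta = grad_step(theta)     # DRL-style phase first
    for _ in range(20):
        theta = es_step(theta)       # then the genetic/ES-style phase

print(loss(theta))                   # ends up near zero on this toy problem
```

On a real problem the two phases would of course share the same policy parameters, with the hope that each escapes the local structure the other got stuck in.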
