Tapabrata Ghosh
Jul 10, 2017 · 1 min read

The idea basically goes: have DRL optimize first, since it is sample-efficient, then hand the parameters to a genetic/evolutionary algorithm (like the OpenAI evolution strategies paper) to optimize, then back to DRL, interleaved and repeated ad nauseam.

The intuition is that genetic algorithms and DRL tend to converge to different optima, but in similar regions, so alternating between them can eventually land on an even better optimum. It's somewhat similar in spirit to the cyclic learning rates used in the "Snapshot Ensembles" paper.
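The alternating scheme can be sketched on a toy problem. This is only an illustration, not anything from the original note: the "DRL" phase is stood in for by plain gradient ascent on a toy fitness function, and the genetic phase by an OpenAI-style evolution strategies update (Gaussian perturbations weighted by normalized fitness). The function names, the fitness landscape, and all hyperparameters here are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(theta):
    # Toy stand-in for an RL return: smooth, with a global optimum at 0.
    return -np.sum(theta ** 2) + np.sum(np.cos(3 * theta))

def grad_phase(theta, steps=50, lr=0.05, eps=1e-5):
    # Stand-in for the DRL phase: finite-difference gradient ascent.
    for _ in range(steps):
        g = np.zeros_like(theta)
        for i in range(len(theta)):
            e = np.zeros_like(theta)
            e[i] = eps
            g[i] = (fitness(theta + e) - fitness(theta - e)) / (2 * eps)
        theta = theta + lr * g
    return theta

def es_phase(theta, steps=50, pop=40, sigma=0.1, lr=0.05):
    # Evolution strategies phase: sample perturbations, move toward
    # the fitness-weighted average direction (as in OpenAI's ES).
    for _ in range(steps):
        noise = rng.standard_normal((pop, len(theta)))
        returns = np.array([fitness(theta + sigma * n) for n in noise])
        adv = (returns - returns.mean()) / (returns.std() + 1e-8)
        theta = theta + lr / (pop * sigma) * (noise.T @ adv)
    return theta

# Interleave the two optimizers, repeated for a few cycles.
theta = rng.standard_normal(5)
for cycle in range(3):
    theta = grad_phase(theta)  # "DRL" phase
    theta = es_phase(theta)    # genetic/ES phase
```

In this sketch the gradient phase exploits the local basin quickly, while the ES phase explores a neighborhood of the current solution, which is the hand-off the note describes.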
