Introduction
Traditional scalable reinforcement learning framework, such as IMPALA and R2D2, runs multiple agents in parallel to collect transitions, each with its own copy of model from the parameter server(or learner). This architecture imposes high bandwidth requirements…