Trainability of Badger — Why is Badger so hard to train?
Badger [Rosa] is a multi-agent meta-learning architecture with homogeneous agents called experts. In practice, Badger turns out to be quite hard to train: we typically observe multiple plateaus during training, and it generally takes many thousands of iterations to converge.
We want to understand whether Badger behaves like other RL and meta-learning systems, and to suggest some ways to improve its learning.
Badger is a general meta-learning, multi-agent framework with a homogeneous expert policy. It usually works in an outer/inner loop setting. The outer loop is currently used to learn the policy that the inner-loop experts follow to solve the task, and the current inner-loop policy is trained only for one specific task. In future iterations of Badger we aim for a more powerful inner loop that is able to learn to solve a range of different tasks.
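To make the outer/inner loop structure concrete, here is a minimal sketch, not GoodAI's actual implementation: the task (a toy quadratic target), the `inner_loop`/`outer_loop` names, and the evolution-strategies-style outer update are all our own illustrative assumptions. The point is only the nesting: the outer loop adjusts shared policy parameters so that the inner loop, which runs those parameters on one specific task, solves it better.

```python
import numpy as np

def inner_loop(policy_params, task, steps=10, lr=0.1):
    """Hypothetical inner loop: adapt a state to one specific task
    by following the shared policy parameters. Returns the task loss."""
    state = np.zeros_like(policy_params)
    for _ in range(steps):
        # the policy controls how fast the state moves toward the target
        state += lr * policy_params * (task - state)
    return float(np.mean((task - state) ** 2))

def outer_loop(tasks, iterations=200, lr=0.05, sigma=0.1, seed=0):
    """Hypothetical outer loop: improve the shared parameters with an
    antithetic finite-difference (evolution-strategies-style) update,
    so the inner loop solves all tasks better on average."""
    rng = np.random.default_rng(seed)
    params = np.ones(4)  # shared policy parameters, trained here only
    for _ in range(iterations):
        eps = sigma * rng.normal(size=params.shape)
        loss_plus = np.mean([inner_loop(params + eps, t) for t in tasks])
        loss_minus = np.mean([inner_loop(params - eps, t) for t in tasks])
        grad_est = (loss_plus - loss_minus) * eps / (2 * sigma ** 2)
        params -= lr * grad_est
    return params

tasks = [np.full(4, v) for v in (0.5, 1.0, 2.0)]
params = outer_loop(tasks)
final_loss = np.mean([inner_loop(params, t) for t in tasks])
```

Note that the inner loop here adapts only to the single task it is given, mirroring the current limitation described above, while the outer loop is the only place where learning transfers across tasks.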
Micro-Badger is one of the existing implementations of Badger with some specific features. The outer loop learns the weights of a recurrent neural network, and each expert in the inner loop is a recurrent neural network with those same shared weights. The topology of connections between the experts is created randomly: each expert sends its output to some other experts and receives the outputs of other experts in turn. This fixed random topology is specific to Micro-Badger; Badger's long-term goal is to discover the policy inside the inner loop itself.
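The Micro-Badger setup described above can be sketched as follows. This is an illustrative toy, not GoodAI's code: the class and function names, the Elman-style recurrent update, the fan-in parameter, and the shared observation vector are all our own assumptions. What it shows is the two defining features: one weight set shared by all experts (the thing the outer loop would train), and a random, fixed directed topology over which the experts exchange outputs.

```python
import numpy as np

class SharedExpert:
    """One recurrent expert. All experts share the same weight arrays,
    but each keeps its own hidden state."""
    def __init__(self, weights, hidden_size):
        self.W_in, self.W_h = weights  # shared across all experts
        self.h = np.zeros(hidden_size)

    def step(self, inputs):
        # simple Elman-style recurrent update (our assumption)
        self.h = np.tanh(self.W_in @ inputs + self.W_h @ self.h)
        return self.h

def random_topology(n_experts, fan_in, rng):
    """Each expert receives the outputs of `fan_in` randomly chosen
    other experts: a random directed graph, fixed for the whole run."""
    return [rng.choice([j for j in range(n_experts) if j != i],
                       size=fan_in, replace=False)
            for i in range(n_experts)]

rng = np.random.default_rng(0)
n_experts, hidden, fan_in, obs_dim = 6, 8, 2, 4
# one shared weight set -- this is what the outer loop would learn
weights = (rng.normal(scale=0.5, size=(hidden, fan_in * hidden + obs_dim)),
           rng.normal(scale=0.5, size=(hidden, hidden)))
experts = [SharedExpert(weights, hidden) for _ in range(n_experts)]
topology = random_topology(n_experts, fan_in, rng)
obs = rng.normal(size=obs_dim)  # a fixed toy task observation

# a few synchronous inner-loop communication steps
outputs = [e.h for e in experts]
for _ in range(5):
    outputs = [experts[i].step(np.concatenate(
                   [outputs[j] for j in topology[i]] + [obs]))
               for i in range(n_experts)]
```

Because all experts read from the same `weights` tuple, any outer-loop update to those arrays changes every expert at once, which is what makes the policy homogeneous.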
Read the full article here.
Originally published at https://www.goodai.com on May 20, 2020.