DeepMind & UCL’s Stochastic MuZero Achieves SOTA Results in Complex Stochastic Environments
Introduced in 2020, MuZero (Schrittwieser et al.) is a model-based general reinforcement learning agent that combines a learned model of the environment dynamics with a Monte Carlo tree search planning algorithm. Although MuZero has achieved state-of-the-art results across domains ranging from board games to visually rich environments, it is limited to deterministic models and thus struggles in real-world environments that are inherently stochastic.
In the new paper Planning in Stochastic Environments with a Learned Model, a research team from DeepMind and University College London proposes Stochastic MuZero, which extends the approach to learning stochastic models. The novel method achieves performance comparable or superior to state-of-the-art approaches in complex single- and multi-agent environments while maintaining the superhuman performance of the original MuZero in deterministic environments such as the game of Go.
Stochastic MuZero combines a learned stochastic transition model of the environment with a Monte Carlo tree search (MCTS) variant, enabling it to plan in stochastic environments…
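To make the idea concrete, the transition can be factored into a deterministic step followed by a sampled discrete chance outcome, as the paper describes via afterstates. The sketch below is a hypothetical toy illustration of that factorisation, not DeepMind's implementation; the function names, the codebook size, and the stand-in networks are all assumptions for illustration.

```python
# Toy sketch of one stochastic transition step in the afterstate
# factorisation: state + action -> afterstate -> (sampled chance
# outcome) -> next state. All networks are replaced by trivial
# stand-ins; names and sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
NUM_CHANCE_CODES = 4  # size of the discrete chance-outcome codebook (assumption)

def afterstate_dynamics(state, action):
    """Deterministic part: maps (state, action) to an afterstate."""
    return state + 0.1 * action

def chance_prior(afterstate):
    """Stand-in for a learned distribution over discrete chance outcomes."""
    logits = np.ones(NUM_CHANCE_CODES)
    probs = np.exp(logits)
    return probs / probs.sum()

def dynamics(afterstate, chance_code):
    """Stochastic part: resolves the afterstate given a sampled outcome."""
    return afterstate + 0.01 * chance_code

def stochastic_transition(state, action):
    """One decision-node -> chance-node -> next-state step."""
    afterstate = afterstate_dynamics(state, action)
    code = rng.choice(NUM_CHANCE_CODES, p=chance_prior(afterstate))
    return dynamics(afterstate, code)

next_state = stochastic_transition(np.zeros(2), np.ones(2))
print(next_state.shape)
```

During search, the tree would interleave decision nodes (where the agent picks an action) with chance nodes (where an outcome is drawn from the learned prior), so planning averages over possible stochastic outcomes rather than assuming a single deterministic successor.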