On leveraging imitation learning to train bots

Alex Borghi, WildMeta
Nov 10, 2021

The team is now back from Develop:Brighton 2021, the UK's major game developer conference, where we had the opportunity to meet many studios in person and give a technical presentation. While we could have simply shared Alex's slides after his talk, we realised that could be a bit bland, so we decided instead to explore some of the elements briefly mentioned in the presentation over a couple of blog posts, starting today with imitation learning.

In our previous technical blog post we mostly talked about our reinforcement learning system. Imitation learning is another powerful tool at our disposal to train bots for video games. Indeed, some games (RTS and MOBA games, for example) record large amounts of gameplay generated by their players so that past matches can be replayed. This creates an opportunity to use that data to train bots and have them learn techniques developed by players. Of course, a dataset created in such a way poses its own challenges: two players might react in different ways to the same situation, some actions or observations might be underrepresented, and the quality of the data might be highly variable. But as we will see, there are also many advantages to using human data.

Two distinct approaches to imitation learning are behavioural cloning and reinforcement learning with demonstrations. In simple terms, the former uses supervised learning to train a neural network that maps observations to actions, while the latter uses human data as a guide during the RL training process (learning from both human samples and agent-generated samples) to train faster than pure RL. Both approaches can produce agents that exceed the performance of their demonstrations. To learn more about RL with demonstrations, check out these articles: Deep Q-learning from Demonstrations and Policy Optimization with Demonstrations.
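To make behavioural cloning concrete, here is a minimal PyTorch sketch. The network architecture, the dimensions and the random placeholder dataset are illustrative assumptions, not our production setup; in practice the (observation, action) pairs would come from recorded player matches.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

OBS_DIM, NUM_ACTIONS = 32, 8  # hypothetical observation size and action count

# Policy network: maps an observation to logits over discrete actions.
policy = nn.Sequential(
    nn.Linear(OBS_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, NUM_ACTIONS),
)

# Placeholder demonstrations; real data would come from player replays.
observations = torch.randn(1024, OBS_DIM)
actions = torch.randint(0, NUM_ACTIONS, (1024,))
loader = DataLoader(TensorDataset(observations, actions), batch_size=64, shuffle=True)

optimiser = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Standard supervised learning: predict the action the player actually took.
for _ in range(10):
    for obs, act in loader:
        loss = loss_fn(policy(obs), act)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()

torch.save(policy.state_dict(), "bc_policy.pt")  # reused later as an RL starting point
```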

In pure RL we typically start from a randomly initialised neural network (more precisely, each operation/layer has a set of weights initialised using a heuristic that involves sampling from a specific probability distribution). This means the initial behaviour of the agent is erratic and poor at exploring the environment, so training can be especially slow at the very beginning. Using supervised learning to pretrain a model for RL is a way to train much faster. This approach is at the core of AlphaStar, an AI trained by DeepMind for the game StarCraft II.
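Continuing the sketch above, pretraining then amounts to loading the cloned weights into the RL policy before training starts; `train_with_rl` below is a hypothetical placeholder for whatever RL algorithm is actually used.

```python
import torch
import torch.nn as nn

OBS_DIM, NUM_ACTIONS = 32, 8  # must match the behavioural-cloning network

def make_policy():
    # Same architecture as the supervised pretraining network.
    return nn.Sequential(
        nn.Linear(OBS_DIM, 64),
        nn.ReLU(),
        nn.Linear(64, NUM_ACTIONS),
    )

# Start RL from the behavioural-cloning weights instead of a random init,
# so the agent behaves sensibly from the very first episode.
rl_policy = make_policy()
rl_policy.load_state_dict(torch.load("bc_policy.pt"))

# train_with_rl(rl_policy, env)  # hypothetical RL fine-tuning entry point
```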

AlphaStar visualisation. Source: DeepMind

Moreover, there are several other advantages of using human data rather than pure RL:

  • Human-generated data can help escape pathological behaviours. Indeed, RL can sometimes find suboptimal policies and get stuck.
  • Human data provides a variety of behaviours. This diversity can greatly help, as agents learn to become more robust.
  • Agents learn more human-like behaviours since the data comes from humans, in contrast to rule-based/scripted bots, which appear more rigid and therefore less believable.

It is also possible to leverage existing rule-based bots (when available) to generate data. Writing good scripted bots is difficult, sometimes near impossible (which is why we work on ML-based bots), but writing simple ones is often feasible, even if watching them in action for long would quickly betray their nature. Data generated in such a way is of course of lower quality than human data, but rule-based bots can still improve the very beginning of training, help exploration, or serve as opponents to train against in competitive games. The sketch below shows the general idea.
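Here is one way such a dataset could be collected. Both `SimpleScriptedBot` and the gym-style environment interface (`reset`/`step` returning `(obs, reward, done, info)`) are illustrative assumptions rather than our actual tooling.

```python
import numpy as np

class SimpleScriptedBot:
    """Toy rule-based bot: step towards the nearest enemy."""

    def act(self, obs):
        # Assume the first two observation components hold (dx, dy)
        # to the nearest enemy; pick a cardinal move towards it.
        dx, dy = obs[0], obs[1]
        if abs(dx) > abs(dy):
            return 0 if dx > 0 else 1  # move right / left
        return 2 if dy > 0 else 3      # move up / down

def collect_demonstrations(env, bot, num_steps):
    """Roll the bot in the environment, recording (observation, action) pairs."""
    observations, actions = [], []
    obs = env.reset()
    for _ in range(num_steps):
        action = bot.act(obs)
        observations.append(obs)
        actions.append(action)
        obs, _, done, _ = env.step(action)
        if done:
            obs = env.reset()
    return np.array(observations), np.array(actions)
```

The resulting arrays can feed the same behavioural-cloning loop shown earlier, or seed the replay buffer of an RL-with-demonstrations method.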

We are often asked about the technology we use. Machine learning offers a number of tools we can choose from, depending on the type of game, what the AI has to learn and how the game is being developed. Each game is different, which is why the best way to find out more and explore how we can help is to get in touch at contact@wildmeta.com.

You can also follow us on Twitter or LinkedIn!

WildMeta, AI for video games.

Alex Borghi, CTO at WildMeta | Machine learning research scientist | Ex Graphcore, Imagination Technologies & Feral Interactive