Zombox — Multiplayer Demonstrative Learning in Unity ML Agents in 48 hours.

Changbai Li
Dec 17, 2019


By Jan Dornig and Changbai Li.

It’s almost the anniversary of the last game jam we participated in, so just in time and way too late, we finally wrote up how we tackled our one question: how can machine learning be used as a game mechanic?

A 48-hour timeframe and a two-man team that can’t write machine learning algorithms from scratch would make this quite a hard sell, but thanks to Unity ML-Agents there is hope.

Unity has made amazing progress in enabling machine learning (ML) experiments in its engine through the Unity ML-Agents platform, which provides different ML models for programmers to deploy fairly easily in games. When looking through the examples online, we were searching for possibilities to put the player in touch with the learning properties of the ML models. We wanted the player to

  1. Witness meaningful learning in the algorithm
  2. Have as much impact as possible on the ML performance

Reinforcement Learning vs. Demonstrative Learning

Unity ML-Agents offers different learning models to choose from. What was interesting for us were these two categories: reinforcement learning (RL) and demonstrative learning (DL).

Reinforcement learning

RL is the method most often associated with games. In fact, games are typically used to test RL algorithms, rather than the method being used to support games.

RL is done by rewarding the agents for behaving the way we want them to, and punishing them when they don’t. In short: I get points for accomplishing a goal and negative rewards when I fail. Since that’s the logic of a lot of games, the connection is pretty obvious. Through this, the algorithm slowly learns, through trial and error, the best strategies to maximize rewards/points/coins/wins. Since games often have these mechanics built in, they can often be utilized directly. But more difficult game scenarios also call for customized reward systems that help the AI find its way, rather than stumble in the dark for too long.
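This trial-and-error loop is easy to see in miniature. The sketch below is not how the toolkit implements it (ML-Agents trains neural networks with methods like PPO); it is a toy tabular Q-learner on a five-cell strip, with all names and numbers made up, just to show reward-driven learning:

```python
import random

random.seed(0)

N_STATES, GOAL = 5, 4        # a strip of 5 cells; the goal is the last one
ACTIONS = [-1, +1]           # move left / move right
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # value estimate per (state, action)
alpha, gamma, eps = 0.5, 0.9, 0.2

def step(state, action):
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    # reward for reaching the goal, small penalty for every other step
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

for episode in range(200):
    s, done = 0, False
    while not done:
        # explore sometimes, otherwise take the currently best-looking action
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda i: Q[s][i])
        nxt, r, done = step(s, a)
        target = r + (0.0 if done else gamma * max(Q[nxt]))
        Q[s][a] += alpha * (target - Q[s][a])  # nudge the estimate toward the reward
        s = nxt

policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES)]
print(policy)  # learned greedy action per state
```

After a couple hundred episodes the greedy policy is “always move right,” purely because that is what maximizes reward; nobody ever demonstrated the behavior.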

Apart from the actual math behind it all, this involves setting up the reward-punishment system and, importantly, defining what the model/agents can sense about the environment.

These are some of the demos available for RL from Unity. They are usually set up so that there is no human interference after the stage and model have been set.

Now, for an actual game, we of course want the player to interact in meaningful ways and enjoy the experience. For this to happen, we would have to look at different ways to create a human-in-the-loop system, or something close to what is being called interactive machine learning. These systems are mostly defined by having human agency within them. And to make a good system with human agency, it needs to have observable progression, adjustable components, and feedback mechanisms.

Now some things we could imagine:

  1. Player as Architect: e.g. a platformer game where the player constructs different levels with increasing difficulty to teach the AI increasingly complex movements, with a given “end level” that it has to master. A bit like the recent Unity challenge where participants were asked to write an AI that can solve simple puzzles that get more difficult from one level to the next, but rather than writing the AI directly, you would act as a trainer by preparing challenges for it.
  2. Player as Judge/Reward system: e.g. have the player actually provide the rewards and punishments in real time. You might remember training your creature in Molyneux’s Black & White. In these instances, you are the trainer.

These examples hint at possible game designs and mechanics, and are things we would certainly like to try, but within the 48-hour sprint it seemed unclear whether we would be able to make such a system work, since the interactive parts would have to be implemented from scratch. At the same time, it was very unclear whether the AI would be able to learn in a meaningful way from the limited amount of feedback a human can give.

When it comes to judging how well the AI is doing, most Unity examples simply look at how much progress is made within a certain time frame. We would really love to see what happens when a human gives the feedback on whether an agent is on the right track. Would this approach perform better? If you know of examples that tried this, let us know in the comments.

Demonstrative learning

To our surprise, Unity ML-Agents also provides demonstrative learning. Instead of training the agents via rewards and punishments, this method gives them examples of how to execute the task. If you want the agent to push a box, for example, you can record footage of yourself pushing the box, and the agent will follow. While we don’t need to set up a reward-punishment system, we still need to define what the agents can sense about the environment.
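In miniature, the record-then-imitate loop looks like this. This is not the toolkit’s implementation (ML-Agents trains a neural network on the recorded demonstration); it’s a toy nearest-neighbour version with made-up observations and action names, just to show the shape of the idea:

```python
# Behavioral cloning in miniature: record (observation, action) pairs from a
# demonstration, then have the agent pick the action whose recorded
# observation is closest to what it currently sees.
demo = []  # list of (observation, action) pairs

def record(observation, action):
    demo.append((observation, action))

def imitate(observation):
    # 1-nearest-neighbour lookup over the demonstration
    closest = min(demo, key=lambda pair: sum((a - b) ** 2
                                             for a, b in zip(pair[0], observation)))
    return closest[1]

# "Demonstration": when the box is offset to the right, push right, and so on.
record((2.0,), "push_right")
record((-2.0,), "push_left")

print(imitate((1.5,)))  # prints push_right: resembles the first demo
```

The real model generalizes far better than a lookup table, but the player-facing loop is the same: whatever you do while recording is what the agents try to reproduce.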

This training process resembles real-life coaching: the coach first demonstrates how to do something, then asks the students to repeat it. This hints at a game mechanic where players take on the role of a coach and teach agents via demonstration, interacting with the training process in a meaningful way. Having an easy-to-understand metaphor is also great, because it means players will understand how to play very quickly, even when they have no experience with machine learning.

One of the most important aspects of using this in a game is the speed with which the algorithm can learn, so we ran some first tests. This was both to ensure technical feasibility and to see how effective the training is, i.e. how long it takes for the demonstrations to affect the agents’ behavior. And it seemed to work.

The agents learned to push the box toward the goal area after just one demonstration.

Game design

Now that we had a core mechanic that resembles one player training another, our minds naturally turned to sports: two coaches getting their teams to play better than the other. To win our game, you’d simply have to be better at training your team than the other player!

We started out thinking about all the things we could do with our newly trained and therefore smart little underlings. Building on the existing Unity demo where the AI pushes boxes around, the first idea was to use our minions to collectively build a tower, with the two players judged on the height of their towers. Ideally the whole thing would be built by player and AI together, which might lead to some funny situations where the AI keeps building in really unstable ways and the player has to help out as an architect.

But a bit into testing, reality came crashing down, and it became increasingly clear that what we had on our hands, with the limited training time, was definitely closer to mindless AI zombies than a box-pushing Terminator.

With a big part of the game jam time now gone, we focused on what we got from Unity and tried to make the best of it. The mindlessness of the henchmen-in-training was actually a cute aspect of the whole thing, reminiscent of the fun of watching the iconic yellow Minions half help, half destroy their master with their actions.

At this point we arrived at ZOMBOX: two teams fight over a battlefield with randomly spawning boxes, which they have to push back to their lair to receive points. The team with more points wins.

Players control the Zombox Kings, and through them they show the minions how to push crates into the goal properly, as well as score points themselves.

The system records the player’s actions and the input of the virtual sensors that track what the character sees around it. Part of the game mechanic is that players control when their movement is recorded, and therefore used for training the zomboxes, and when it is not. They can toggle training mode on and off at any time to choreograph their demonstrations carefully. Train the minions to attack the enemy, or train them to collect boxes; it’s up to the player. At a later stage this might expand to training different groups of minions to perform different actions, but for now we implemented a single group with shared behavior.

As with any good AI system, we added a feature that lets the players wipe the agents’ memory clean at any time, in case they want to restart training for the agents.

Having players pitted against each other means that the challenges they face will be quite dynamic — it will be totally up to how the opponents behave. Optimal strategies might change throughout the game, corresponding to how the opponent is playing. Therefore the player might have to teach the agents a new strategy to defeat the opponent’s strategy. The opponent could then re-train the agent with another new strategy. The feedback loop could go on forever!


The ML-Agents toolkit is split into two parts: a Python environment (managed here through Anaconda) that trains the machine learning models, and a Unity SDK that runs the simulations.

mlagents-learn config/online_bc_config.yaml --train --slow

Unity’s ML-Agents toolkit uses “brains” to link the machine learning models and the agents in the game. The brains can record what the player is doing, use it to train the model, and let the model control agents in the game, all at the same time.

To train the neural network with this toolkit, we first have to start the Python training process in a command prompt, then jump back to the Unity Editor and press play within 30 seconds. Admittedly, this is not a very friendly gaming experience; but with 48 hours and an avant-garde technology, we had to compromise.

In the newer version of this toolkit, Unity added support for the training to run with an executable. Nevertheless, we still have to start the training process manually. We hope this process can be further automated so that games like this can be ready for shipping.

As long as online_bc_config.yaml contains the brain’s configuration, and there’s an agent in the active game that uses that brain, the toolkit will start training.
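For reference, a trainer entry in that file looked roughly like this in the 0.x toolkit (the key names below are reconstructed from memory and vary between versions, and the brain names are our own stand-ins):

```yaml
# Hypothetical sketch of an online behavioral cloning entry; exact keys
# differ between ML-Agents versions, so treat this as illustrative only.
ZomboxTeamBrain:
    trainer: online_bc            # online behavioral cloning
    brain_to_imitate: PlayerBrain # the brain the human player controls
    batch_size: 64
    max_steps: 5.0e4
    summary_freq: 1000
```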

The Unity examples only train a single brain at a time, but there didn’t seem to be any restriction, so we tried putting in both teams’ brain configurations and… hurrah! Both teams were learning from their respective players. You can have a look at how we set up the training configuration here.


To let the agents gather information about their surroundings, we tagged everything in the scene and set the agents to read those tags using raycasting. Much like a spider, each zombox has 8 “eyes” that see in different directions; unlike a spider, one of the eyes looks directly behind the zombox. These raycasts are the core input for the neural network, which learns what action to take based on what it sees.
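The idea can be sketched outside Unity, too. The toy below is not the toolkit’s ray perception code; the arc layout, hit test, and all names are our own stand-ins. It marches 8 rays out from an agent and records the distance to the first object each ray hits:

```python
import math

# Toy version of the zombox "eyes": cast 8 rays from the agent and report,
# for each ray, the distance to the first object it hits (or max_range if
# nothing is hit). Rays 0-6 cover the front arc; ray 7 looks straight back.
def sense(agent_pos, facing_deg, objects, max_range=10.0, steps=100):
    readings = []
    for i in range(8):
        angle = facing_deg + (i * 30 - 90 if i < 7 else 180)
        dx, dy = math.cos(math.radians(angle)), math.sin(math.radians(angle))
        hit = max_range
        for s in range(1, steps + 1):
            t = max_range * s / steps
            x, y = agent_pos[0] + dx * t, agent_pos[1] + dy * t
            # crude hit test: within half a unit of any object's center
            if any(abs(x - ox) < 0.5 and abs(y - oy) < 0.5 for ox, oy in objects):
                hit = t
                break
        readings.append(hit)
    return readings  # 8 numbers: the network's entire view of the world

# A box 3 units straight ahead shows up on the forward ray only.
obs = sense((0.0, 0.0), 0.0, objects=[(3.0, 0.0)])
```

In the real game each reading also carries the tag of what was hit (box, enemy, wall, edge), so the network can react differently to each.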

Game Assets

To keep it simple, we stuck with the overall box theme and expanded it into a pixel/voxel style, making the game assets in MagicaVoxel before exporting to Unity, plus a couple of basic pixel graphics for the UI.

Apart from the characters, we prepared some indicators to communicate back to the players what the zomboxes are focused on, and to make the whole scene more cartoonish.


Disappointing Zombies

The most important takeaway was that zombies are stupid, and that demonstrative learning does not produce results fast enough for long-term engagement. While it is possible to see first results, with the zomboxes seemingly starting to head in the right directions, improvement quickly feels like it bottoms out at a stage that is still very much useless. Since our initial testing produced more useful-looking results, it might be that with enough optimization of variables and environment this could still improve. The zombox theme and game design already try to balance this out with the PvP mechanic, but it can’t be helped that players might look at their running boxes with the regretful expression of disappointed parents.

Combining RL and DL

At one point we looked for ways to speed up the training for better results and considered using DL and RL in combination. This is a practice found in other areas, especially robotics: a human provides initial demos to give a baseline of “right” input, which noticeably shortens training time, since otherwise algorithms start off with random actions and take a long time to arrive at anything useful. That is fine for simulated/virtual training but problematic with real-world robotics.

After some googling (we found a number of people asking the same thing), we learned that the Unity framework did not allow this combination at the time. “Did,” because it seems much has improved over the last year: Unity has adopted a now-popular GAN-inspired approach, GAIL, that allows for continuous improvement.

From the unity blog: “In v0.9, we introduced GAIL, which addresses both of these issues, based on research by Jonathan Ho and his colleagues. …Because GAIL simply gives the agent a reward, leaving the learning process unchanged, we can combine GAIL with reward-based DRL[Deep Reinforcement Learning] by simply weighting and summing the GAIL reward with those given by the game itself.” https://blogs.unity3d.com/2019/11/11/training-your-agents-7-times-faster-with-ml-agents/
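The “weighting and summing” in that quote is literally a per-step line like the following (the weights here are made-up examples; in the toolkit, reward-signal strengths of this kind are set in the trainer configuration):

```python
# Combine an imitation (GAIL) reward with the game's own reward by a
# simple weighted sum, as described in the Unity blog quote above.
def combined_reward(env_reward, gail_reward, env_strength=1.0, gail_strength=0.5):
    return env_strength * env_reward + gail_strength * gail_reward

print(combined_reward(1.0, 0.2))  # 1.0 * 1.0 + 0.5 * 0.2 = 1.1
```

Early in training the GAIL term dominates behavior (the agent mostly imitates); as the game reward starts flowing, reinforcement learning can push the policy beyond what was demonstrated.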

A brave new world full of boxes

One of the aspects we loved about watching the little ones flit around is that you start guessing their level of smartness and what they might be thinking. It’s an anticipation of their learning progress, and thinking about what you should or shouldn’t do to improve it. So sometimes you see a zombox running towards a box and you triumphantly think you’ve finally achieved a little mini-me, only for it to run straight over the edge and disappear into the dead-bot-filled ether.

We are still naively optimistic about the prospect of using a mix of machine learning techniques to create engaging game mechanics. Naively, because it’s unclear whether complex behavior can reasonably be learned in a short time; optimistic, because, as with so much in games, there might be a workaround: half-baked AI models as a starting point, different mixes of learning strategies, optimized games and game environments, plus the rapid progression of the technology itself.

Concepts like a “wolf whisperer”/alpha wolf that acts as leader and trainer, giving rewards and punishments to a horde of wolves accomplishing tasks around him, could lead to interestingly different behaviors and new game mechanics, where you might spend time training a group before pitting it against others.

One reference we keep coming back to is more reminiscent of the aforementioned Black & White, a sort of simulation/god genre: G. R. R. Martin’s Sandkings tells the story of a collector of curiosities who finds something like the most entertaining indoor ant farm ever, where different tribes of bug-like creatures compete for the favor of their god (the collector) and slowly evolve, showing ever more sophisticated behaviors and capabilities. Experimental games might explore the generative possibilities of letting a bunch of agents run around and create their own habitats, and see how this fits in with things like god games and base building. Games like Spore have explored similar concepts without the machine learning, and “ecosystem” environments with AI agents have been explored by a number of people, including Andrej Karpathy and Larry Yaeger’s Polyworld. While these are more rudimentary and evolve creatures in what is basically a sandbox, we may soon see new forms this can take in more constructed and complex environments, with companies like Klang and their game Seed.io on the horizon.

More broadly, the power of AI for generative purposes will most likely lead to many new ways for people to create game content, with on-the-spot generation of characters and props that make those worlds feel more alive and dynamic.

If you work on something like that, or have other comments on this, let us know in the comments! And head over to Github to try our demo, make a fork and show us how it’s actually done!