Teaching AI To Play a Platform-Fighting Game Using Genetic Neural Networks

Mike Azzinaro
Nov 21, 2017


With AI becoming more prevalent in the field of games, let's see what happens when we apply some classic machine learning approaches to a complex game.

This is an ongoing project, and there will be subsequent follow-ups, hopefully with better and better results.

The Game

The game we will be using for this is one I'm personally developing called Melee Masters. Melee Masters is a fighting game, though it is more in the realm of a "platform fighter," a genre derived from the popular "Super Smash Brothers" series. The rules of the game are simple: attacking the enemy deals damage, and the more damage you do to someone, the further they fly when hit. Each player has four lives, and the goal is to be the last one alive by damaging opponents and knocking them off the stage. Like most fighting games, the characters are 3D but move on a 2D plane. Where it differs from most fighting games, and what defines the "platform fighter" genre, is the freedom of movement within that plane. Most traditional fighting games do not give the character control of horizontal movement once they jump. Platform fighters usually have multiple vertical levels and a wide variety of movement options to keep fights interesting.

What the game is all about

The key aspect to consider with this kind of fighting game is the stage.

Fissure, The Main Stage

Unlike in traditional 2D fighting games, the goal is to knock players off the stage. This means that you yourself can walk off the stage, which leads to many issues that will be touched on later. Every attack in the game has a damage value; that damage accumulates, and how far enemies are sent flying is based on this number. In this game, characters typically die at around 120%. Players can also die a lot sooner from moves with higher knockback, or from moves that send them at angles that are difficult to recover from. For a full-length gameplay video of the game, see below.

The Goal

The goal here is to replicate basic-level gameplay by combining evolutionary learning with neural networks. I chose evolutionary learning because, in a game as complex as this, local maxima would be very common. Additionally, these types of games have very varied play-styles, and in order to get bots that produce varied play-styles, a "survival of the fittest" approach seemed the best way to produce variants that still trend towards the overall goal of winning. All aspects of THIS video on evolutionary learning were implemented in hopes of achieving this goal. The genome in this case is the neural network's weights. In this project, two of these AI are set to fight against each other until four lives are lost, then the next species repeats, until an entire generation of 14 species finishes. The best-performing species of each generation is used to generate the neural network weights for the subsequent generation, in hope of finding mutations that lead towards better performance.
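
To make the loop concrete, here is a rough sketch of that evaluate-select-mutate cycle in C#. The class, the way matches are paired, and the mutation numbers are illustrative assumptions, not the project's actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch of the generation loop: 14 genomes per generation,
// each genome (a set of network weights) is evaluated in a 1v1 match,
// and the best performer seeds the next generation through mutation.
public class EvolutionLoop
{
    const int PopulationSize = 14;
    const float MutationRate = 0.05f;
    static readonly Random Rng = new Random();

    public float[] Evolve(Func<float[], float[], float> runMatch, int weightCount, int generations)
    {
        // Start with random weights for every genome.
        var population = new List<float[]>();
        for (int i = 0; i < PopulationSize; i++)
            population.Add(RandomWeights(weightCount));

        float[] best = population[0];
        for (int gen = 0; gen < generations; gen++)
        {
            // Evaluate each genome in a match (here: against the current best)
            // and sort by the fitness the match reports.
            var scored = population
                .Select(w => (weights: w, fitness: runMatch(w, best)))
                .OrderByDescending(s => s.fitness)
                .ToList();
            best = scored[0].weights;

            // Seed the next generation with mutated copies of the winner.
            population = Enumerable.Range(0, PopulationSize)
                .Select(_ => Mutate(best))
                .ToList();
        }
        return best;
    }

    float[] RandomWeights(int n) =>
        Enumerable.Range(0, n).Select(_ => (float)(Rng.NextDouble() * 2 - 1)).ToArray();

    float[] Mutate(float[] parent) =>
        parent.Select(w => Rng.NextDouble() < MutationRate
            ? w + (float)(Rng.NextDouble() * 0.2 - 0.1)   // small random nudge
            : w).ToArray();
}
```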

Since this is a game that must run at 60 FPS, constantly training deep neural networks during play seemed out of the question initially, as running too many operations per frame would slow the game down to unplayable levels.

This isn't a tutorial on neural nets, so I won't really be going over the particular implementation; this will be more of a higher-level discussion of what was implemented. I did, however, write everything involved with the neural networks from scratch. This was all done in C# using the Unity3D engine.

Starting Out

The first task was selecting inputs. I wanted the AI to learn as organically as possible. This means giving it information such as whether the enemy is in an attack, but not which specific attack. No visual learning is used; it is all based on internal positions and script data. In order to train, two bots using the same machine learning approach are dropped into a stage and forced to fight each other.

The hardest task here was making sure the inputs were all critical information that the network should train around. The input layer of the neural network ended up being 25 nodes.

Inputs

The green nodes represent a duplicate of all the blocked data, but pertaining to the enemy character

This is the information I felt was most critical for the network to decide upon: your position relative to the center of the stage, the enemy's position, and various variables that dictate the state of the character. A character moving fast while stunned is very different from a character moving fast while not stunned.

All inputs are normalized to values between -1 and 1. Position is normalized based on the maximum distance you can move without dying to the blast zone. Data is normalized to help the neural network train more easily. If you'd like further details on data standardization for neural networks, check out this article HERE.
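
As a rough illustration of that normalization (the distance and speed constants here are placeholder values, not the game's real numbers):

```csharp
// Sketch of the kind of normalization described above (names and constants
// are hypothetical). Positions are scaled by the distance from stage centre
// to the blast zone so every input lands in [-1, 1].
public static class InputNormalization
{
    const float BlastZoneDistance = 40f;   // assumed maximum survivable distance
    const float MaxSpeed = 15f;            // assumed maximum character speed

    public static float NormalizePosition(float position) =>
        Clamp(position / BlastZoneDistance);

    public static float NormalizeSpeed(float speed) =>
        Clamp(speed / MaxSpeed);

    public static float NormalizeBool(bool flag) => flag ? 1f : -1f;

    static float Clamp(float v) => v < -1f ? -1f : (v > 1f ? 1f : v);
}
```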

A single hidden layer of size 15 was used here. I used a layer smaller than the number of inputs because not all information is always necessary, and learning what to use and when would be critical for decision making.
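
For reference, a minimal forward pass through such a 25-to-15 network might look like the sketch below; the weight layout and tanh activation are assumptions rather than the project's exact implementation:

```csharp
using System;

// Minimal feedforward pass for a 25-input, 15-hidden-node network.
public class FeedForwardNet
{
    readonly float[,] inputToHidden;   // [25, 15]
    readonly float[,] hiddenToOutput;  // [15, outputCount]

    public FeedForwardNet(float[,] ih, float[,] ho)
    {
        inputToHidden = ih;
        hiddenToOutput = ho;
    }

    public float[] Forward(float[] inputs)
    {
        int hiddenCount = inputToHidden.GetLength(1);
        int outputCount = hiddenToOutput.GetLength(1);

        var hidden = new float[hiddenCount];
        for (int h = 0; h < hiddenCount; h++)
        {
            float sum = 0f;
            for (int i = 0; i < inputs.Length; i++)
                sum += inputs[i] * inputToHidden[i, h];
            hidden[h] = (float)Math.Tanh(sum);   // squash to [-1, 1]
        }

        var outputs = new float[outputCount];
        for (int o = 0; o < outputCount; o++)
        {
            float sum = 0f;
            for (int h = 0; h < hiddenCount; h++)
                sum += hidden[h] * hiddenToOutput[h, o];
            outputs[o] = 10f * (float)Math.Tanh(sum);   // scale to [-10, 10], as described under Outputs
        }
        return outputs;
    }
}
```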

Outputs

The outputs are the controls that will be sent directly to the character and executed the next frame. They represent all the possible inputs a human player could make on a controller. Button outputs have an activation threshold of 5.0, with a minimum value of -10.0 and a maximum value of 10.0. All outputs are scaled to values between -10 and 10.
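
A hedged sketch of how those raw outputs could be decoded into a controller state each frame; the output ordering and button names are hypothetical:

```csharp
// Hypothetical mapping from raw network outputs (in [-10, 10]) to the
// controller state fed to the character on the next frame. Button
// outputs use the 5.0 activation threshold described above; stick
// outputs are rescaled to [-1, 1].
public struct ControllerState
{
    public float StickX;
    public float StickY;
    public bool Attack;
    public bool Special;
    public bool Jump;
    public bool Shield;
}

public static class OutputDecoder
{
    const float ButtonThreshold = 5f;

    public static ControllerState Decode(float[] outputs)
    {
        // Output ordering here is an assumption for illustration.
        return new ControllerState
        {
            StickX  = outputs[0] / 10f,
            StickY  = outputs[1] / 10f,
            Attack  = outputs[2] > ButtonThreshold,
            Special = outputs[3] > ButtonThreshold,
            Jump    = outputs[4] > ButtonThreshold,
            Shield  = outputs[5] > ButtonThreshold,
        };
    }
}
```

Each frame, the network's output array would be run through something like Decode, and the resulting controller state handed to the character for the next frame.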

Fitness

Evolutionary learning requires a fitness value at the end of the evaluation to determine how well each species did. This is something that took a LONG time to nail down, and I'd like to go over the mistakes made initially so we can learn from the naivety of some of the approaches.

So how do you know how well you are doing in this game? At the most basic level, if you have more lives than your opponent, you are generally ahead. And if they have a higher damage percentage than you, you are also doing well. The more damage your character has on them, the further they fly from enemy attacks, which means you are a lot more likely to die at higher damage. This is obviously something we want to avoid, so if our AI is taking damage, we want to punish the behavior that resulted in it. Pretty simple, right? I thought so.

Characters traditionally die around 120% damage, so fStockValue here is 120. We add to our fitness the number of lives we have left times the stock value, then subtract our opponent's lives times the stock value. The same goes for damage: if we have a lot of damage, we subtract it from our fitness, and if the enemy has a lot of damage on them, then they are doing "worse", so we increase our own fitness.
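
In code, that first fitness function boils down to something like this sketch (the names other than fStockValue are illustrative):

```csharp
// Sketch of the initial fitness calculation described above. fStockValue
// is the 120% value at which characters typically die.
public static class Fitness
{
    const float fStockValue = 120f;

    public static float Evaluate(int myLives, float myDamage,
                                 int enemyLives, float enemyDamage)
    {
        float fitness = 0f;
        fitness += myLives * fStockValue;      // reward stocks we still have
        fitness -= enemyLives * fStockValue;   // punish the enemy keeping stocks
        fitness -= myDamage;                   // punish taking damage
        fitness += enemyDamage;                // reward dealing damage
        return fitness;
    }
}
```

With this, a character that keeps all four lives while the opponent loses all of theirs earns 4 × 120 = 480 from stocks alone, before damage is factored in.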

Initial Results

They are doing things!

Things were working! The characters would jump around and attack randomly. Those with the highest weighted peak and average fitness were chosen as the best from that generation, and their network weights were the basis for the next generation. Two things became obvious fairly quickly, though. Firstly, in this game, if you just move your character around randomly, there is a high probability you are going to run off the stage. Secondly, basing your own performance on the enemy's performance was a large mistake.

The Idle Problem

The characters quickly learned the best strategy was simply not to move! Because of the nature of this game, moving usually resulted in death, which meant lower average fitness, which meant they were highly unlikely to reproduce.

Dealing with idle characters

My first idea to deal with idle characters was to add another input: a ground timer. This is a timer that accumulates every frame while the character is grounded, and resets to 0 when they are not. This, however, did not end up working well: the networks that reproduced usually had very low weights for this value, so it produced little or no actual effect in later generations, though it did show noticeable changes in idle behavior (characters began moving slightly after staying still for a while).
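
The ground timer itself is a very small piece of state, something along these lines (the normalization cap is an assumed value):

```csharp
using System;

// Sketch of the ground-timer input: it accumulates while the character
// is grounded, resets to 0 in the air, and is normalized like every
// other input before being fed to the network.
public class GroundTimer
{
    const float MaxIdleSeconds = 10f;   // assumed cap used for normalization
    float timer;

    public void Tick(bool isGrounded, float deltaTime)
    {
        timer = isGrounded ? timer + deltaTime : 0f;
    }

    // Value fed into the network, clamped to [0, 1].
    public float NetworkInput => Math.Min(timer / MaxIdleSeconds, 1f);
}
```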

My second idea for dealing with idle characters was implementing rank-space diversity. Rank-space diversity means ranking the most successful species not only according to their fitness, but also according to how different they are from the previous generation. This encourages diverse groups that achieve similar or slightly worse results, in order to climb out of local maxima.

The X axis represents average fitness, and the Y axis represents diversity from the previous generation. The reddest circles are selected as the basis for the next generation. This is all done in an effort to reduce stagnation, by favoring a character that may have performed slightly worse but is sufficiently different that the next generation is likely to overcome the local maximum.
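
One simple way to implement this kind of rank-space selection is to rank candidates twice and combine the ranks, roughly as sketched below. The equal weighting of the two ranks and the Euclidean diversity measure are assumptions, not necessarily what the project uses:

```csharp
using System;
using System.Linq;

// Rough sketch of rank-space selection: each candidate is ranked by
// fitness and by how different its weights are from the previous
// generation's winner; the best combined rank seeds the next generation.
public static class RankSpaceSelection
{
    public static float[] SelectBest((float[] weights, float fitness)[] candidates,
                                     float[] previousWinner)
    {
        var byFitness = candidates.OrderByDescending(c => c.fitness).ToList();
        var byDiversity = candidates
            .OrderByDescending(c => Distance(c.weights, previousWinner))
            .ToList();

        // Lower combined rank is better.
        return candidates
            .OrderBy(c => byFitness.IndexOf(c) + byDiversity.IndexOf(c))
            .First()
            .weights;
    }

    // Euclidean distance between weight vectors as a simple diversity measure.
    static float Distance(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        return (float)Math.Sqrt(sum);
    }
}
```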

Reevaluating Fitness

After running the program overnight, I woke up to find both characters running off either side of the stage very quickly. Part of the problem with idle characters was also a problem with fitness relying on the opponent's performance. Often a very poor species would move on simply because its rival species would kill itself very quickly. And since the victorious party's species was used for BOTH characters in the next generation, it became a race to see who could kill themselves just slightly slower than the other. This led to a re-evaluation of fitness.

Now, fitness relies purely on the character's own actions. Only the damage they deal and their current game-state are relevant. This eliminates the cases where generations would just repeatedly kill themselves. However, this brought back the issue of stagnant characters, so we also want to reward the player for moving, but only if that movement is not the result of being attacked.
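
The revised fitness then looks roughly like this sketch; the movement bonus weight and the property names are assumed for illustration:

```csharp
// Sketch of the revised fitness: it now depends only on the character's
// own actions. Damage dealt and stocks kept add to fitness, and voluntary
// movement (distance covered while not in hit-stun) earns a small bonus.
public static class RevisedFitness
{
    const float fStockValue = 120f;
    const float MovementWeight = 0.1f;   // assumed weight on the movement bonus

    public static float Evaluate(int myLives, float damageDealt,
                                 float voluntaryDistanceMoved)
    {
        return myLives * fStockValue
             + damageDealt
             + MovementWeight * voluntaryDistanceMoved;
    }
}
```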

However, this still was not enough; a massive local maximum kept forming where remaining still was the optimal choice. Occasionally the characters would attack each other, but they would end up walking off the stage and thereby eliminating the advantage they gained by attacking.

the optimal strategy

Recurrent Neural Networks

These networks still ran into a problem. Once they hit a spot where the math worked out to make them hold down (crouch, and therefore not move) or hold no horizontal direction at all, they would not move from that spot unless the other AI did. And since both operate off similar networks, both would eventually walk to certain parts of the map and stop moving.

So I figured recurrent neural networks would be the next step in solving this. The recurrence should ideally prevent situations where the same inputs (both characters not moving) keep producing the same output of doing nothing.
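
Conceptually, the change is just feeding the previous hidden state back in alongside the current inputs, as in this sketch (sizes and activation are assumptions):

```csharp
using System;

// Sketch of the recurrence idea: the previous hidden state is fed back
// in with the current inputs, so identical game states can still
// produce different outputs over time.
public class SimpleRecurrentNet
{
    readonly float[,] inputToHidden;    // [inputCount, hiddenCount]
    readonly float[,] hiddenToHidden;   // [hiddenCount, hiddenCount] recurrent weights
    float[] previousHidden;

    public SimpleRecurrentNet(float[,] ih, float[,] hh)
    {
        inputToHidden = ih;
        hiddenToHidden = hh;
        previousHidden = new float[hh.GetLength(0)];
    }

    public float[] Step(float[] inputs)
    {
        int hiddenCount = previousHidden.Length;
        var hidden = new float[hiddenCount];
        for (int h = 0; h < hiddenCount; h++)
        {
            float sum = 0f;
            for (int i = 0; i < inputs.Length; i++)
                sum += inputs[i] * inputToHidden[i, h];
            for (int p = 0; p < hiddenCount; p++)
                sum += previousHidden[p] * hiddenToHidden[p, h];
            hidden[h] = (float)Math.Tanh(sum);
        }
        previousHidden = hidden;   // carry state to the next frame
        return hidden;
    }
}
```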

The Looping Problem

While this sort of worked, it created a new problem of "loops," where the depth of the recurrence would lead to chains of actions that ultimately got the character nowhere. This is better than before, but still a problem. Often these loops led to the character walking off the stage and dying as well.

Long Short-Term Memory

LSTMs are an attempt to overcome the shortcomings of traditional RNNs. LSTMs try to solve the issue of short-term results being lost in the long run in RNNs: over time the math converges, and what was done far in the past is no longer relevant. In a game as complex as this, one critical mistake could lead towards a very poor game-state for that player. However, due to implementation issues, this ultimately proved inconclusive after running to generation 100 a few times. For more information on how LSTMs try to solve RNN problems, check out this article.
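
For a sense of what an LSTM adds over the simple recurrence above, here is a very compact, single-unit sketch of the gating math; it is simplified and not the project's implementation:

```csharp
using System;

// Compact sketch of a single LSTM cell step, showing how the gates let
// the cell keep or forget information across frames.
public class LstmCell
{
    // One weight row per gate: [forget, input, candidate, output]. Each row
    // holds the input weights, then one recurrent weight, then a bias.
    readonly float[][] weights;
    float cellState, hiddenState;

    public LstmCell(float[][] gateWeights) => weights = gateWeights;

    public float Step(float[] inputs)
    {
        float forget    = Sigmoid(Dot(weights[0], inputs, hiddenState));
        float input     = Sigmoid(Dot(weights[1], inputs, hiddenState));
        float candidate = (float)Math.Tanh(Dot(weights[2], inputs, hiddenState));
        float output    = Sigmoid(Dot(weights[3], inputs, hiddenState));

        cellState = forget * cellState + input * candidate;  // keep vs. overwrite memory
        hiddenState = output * (float)Math.Tanh(cellState);
        return hiddenState;
    }

    static float Dot(float[] w, float[] inputs, float recurrent)
    {
        float sum = w[w.Length - 1];                 // bias term
        for (int i = 0; i < inputs.Length; i++)
            sum += w[i] * inputs[i];
        sum += w[inputs.Length] * recurrent;         // recurrent connection
        return sum;
    }

    static float Sigmoid(float x) => 1f / (1f + (float)Math.Exp(-x));
}
```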

We Are Here

After everything we've done, we've arrived at another local maximum. The probability that random weights produce actions that simply run the player off the edge is too high for evolutionary learning alone to overcome. While sometimes successful, this approach does not consistently improve performance beyond a certain point. At later stages, the players devolve into simply standing still in certain positions, as there is no punishment for doing nothing: they are not being attacked, and they are surviving for the maximum amount of time.

Moving forward, I plan to implement reinforcement learning in combination with the existing fitness function from the evolutionary approach, to slowly move towards a more controlled learning approach than simply hoping evolution would lead to success. The initial weighting would be done with a set of network weights that have the player moving and attacking, rather than the ones the evolutionary approach trained to stand still. Additionally, I'll pit the neural network AI against the hard-coded AI, so that it doesn't only learn to play against another AI that will likely use a poor strategy in the first place.

While generally unsuccessful, I still have some hope for this. The most enjoyable part of this entire project was when I woke up one morning to see how the AI had progressed, and after being knocked off stage, he used his double jump and up-special ability to return to the stage! Through some combination of everything above, he learned to save himself. After landing on stage he didn't do much, but he had the basics to survive!

Conclusion

All in all, things did not work out as hoped. However, a lot was learned about making things as easy as possible for the machines to learn, not crowding them with too much or too little data, and avoiding training toward the wrong behavior. I will continue to test different methods to get something more representative of a traditional hard-coded AI, and then eventually something close to a normal-looking player.

If you are interested in the game and following its development, check out the website or follow the Twitter.

Thanks for reading!
