CrAIg: Using Neural Networks to learn Mario
Joe Crozier and I recently came back from YHack, a 36-hour, 1,500-person hackathon held by Yale University. This is our second year in a row attending, and for the second time we managed to place in the top 8!
Our project, named “crAIg”, is a self-teaching algorithm that learns to play Super Mario Bros. for the Nintendo Entertainment System (NES). It begins with absolutely no knowledge of what Mario is, how to play it, or what winning looks like, and using neuroevolution, slowly builds up a foundation for itself to be able to progress in the game.
My focus on this project was the gritty details of the implementation of crAIg’s evolution algorithm, so I figured I’d make a relatively in-depth blog post about it.
crAIg’s evolution is based on a paper titled Evolving Neural Networks through Augmenting Topologies, specifically an algorithm titled “NEAT”. The rest of the blog post is going to cover my implementation of it, hopefully in relatively layman’s terms.
Before we jump right into the algorithm, I’m going to lay a foundation for the makeup of crAIg’s brain. His “brain” at any given point in the game is made up of a collection of “neurons” and “synapses”, alternatively titled nodes and connections/links. Essentially, his brain is a directed graph.
Above is the second part of this project, a Node.js server that displays the current state of crAIg’s brain, or what he is “thinking”. Let’s go through it quickly to understand what it’s representing.
On the left you see a big grid of squares. This is what the game looks like right now, or what crAIg can “see”. He doesn’t know what any of the squares mean, but he knows that an “air” tile is different from a “ground” tile in some way. Each of the squares is actually an input neuron.
On the right side you can see the 4 “output neurons”, or the buttons that crAIg can press. You can also see a line going from one of the black squares on the left grid to the “R” neuron, labelled “1”. This is a synapse, and when the input neuron fires on the left, it will send a signal down the synapse and tell crAIg to press the “R” button. In this way, crAIg walks right. As crAIg evolves, more neurons and synapses are created until his brain might look something more like this:
In this one I’ll just point out a couple things. First of all, the green square on the left is a goomba. Second, you can see another neuron at the very bottom (labelled 176). This is called a hidden neuron, and represents a neuron that is neither input nor output. They appear in crAIg’s brain for added complexity as he evolves. You can also see that at his time of death (Mario just died to a goomba), he was trying to press the “R” and “B” buttons.
Why Neuroevolution is so cool
While learning Mario is a neat application of neural networks and neuroevolution, it serves mostly as a means to demonstrate the power of these self-evolving neural networks. In reality, the applications for neural networks are endless. While crAIg only learned how to play a simple NES game, the exact same algorithm that was implemented could also be applied to a robot that cleans your house, works in a factory, or even paints beautiful paintings.
crAIg is a cool peek into the future where machines no longer need to be programmed to complete specific tasks, but are instead given guidelines and can teach themselves and learn from experience. As the tasks we expect machines to complete become more and more complex, it becomes less possible to “hard code” their tasks in. We need more versatile machines to work for us, and evolving neural networks are a step in that direction.
NEAT (NeuroEvolution of Augmenting Topologies)
If you’re curious about the history of the problems encountered by neuroevolution, I highly recommend reading the paper this algorithm is based on. The first section of the paper covers many different approaches to neuroevolution and their benefits.
NEAT is a genetic algorithm that puts every iteration of crAIg’s brain to the test and then selectively breeds them in a very similar way to the evolution of species in nature. The hierarchy is as follows:
Synapse/Neuron: Building blocks of crAIg’s brain.
Genome: An iteration of crAIg’s brain. Essentially a collection of neurons and synapses.
Species: A collection of Genomes.
Generation: An iteration of the NEAT algorithm. This is repeated over and over to evolve crAIg.
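The hierarchy above can be pictured as a handful of small data structures. Here’s an illustrative Python sketch; the names and fields are my assumptions, not crAIg’s actual code:

```python
from dataclasses import dataclass, field

# Illustrative sketch of the NEAT hierarchy; names and fields are
# assumptions, not crAIg's actual code.

@dataclass
class Synapse:
    source: int              # id of the neuron this synapse leaves
    target: int              # id of the neuron it feeds into
    weight: float
    enabled: bool = True
    marking: int = 0         # "historical marking", used during mating

@dataclass
class Neuron:
    id: int
    kind: str                # "input", "hidden", or "output"

@dataclass
class Genome:
    neurons: list = field(default_factory=list)
    synapses: list = field(default_factory=list)
    fitness: float = 0.0

@dataclass
class Species:
    genomes: list = field(default_factory=list)
```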
1. Calculating Fitness
The first step every generation is to calculate the fitness of every individual genome from the previous generation. This involves running the same function on each genome so that NEAT knows how successful each one is. For crAIg, this means running through a Mario level using a particular genome, or “brain”. After running through the level, we determine the “fitness” of the genome by this function:
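The original snippet isn’t reproduced here, but a fitness function of this shape typically rewards horizontal progress, penalizes wasted time, and grants a bonus for finishing the level. All the constants in this Python sketch are placeholders, not crAIg’s real numbers:

```python
def fitness(distance_traveled, frames_elapsed, finished_level):
    # Hypothetical fitness: reward rightward progress, lightly penalize
    # elapsed time, and grant a large bonus for finishing the level.
    # The constants are illustrative placeholders.
    score = distance_traveled - frames_elapsed / 10.0
    if finished_level:
        score += 1000
    return max(score, 0.0)
```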
Once the fitness of every genome has been calculated, we can move on to the next portion of the algorithm.
2. Calculating Adjusted Fitness
This part of the algorithm is probably the least intuitive. The reason for this “adjusted fitness” is to discourage species from growing too big. As the population in a species goes up, their “adjusted fitness” goes down, forcing the genetic algorithm to diversify.
The proper implementation of this algorithm is relatively intensive, so for crAIg’s implementation we simplified it to the following:
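As a sketch of that simplification (a standard choice called explicit fitness sharing, which I’m assuming crAIg’s version resembles):

```python
def adjusted_fitness(genome_fitness, species_size):
    # Fitness sharing: every member of a species splits credit with its
    # peers, so a big species sees each genome's score shrink, pushing
    # the algorithm toward diversity.
    return genome_fitness / species_size
```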
The important part here is that each genome now has an adjusted fitness value associated with it.
3. Survival of the Fittest
Here’s where the natural selection part comes in! The “Survival of the fittest” portion is all about determining how many genomes survive another generation, as well as how many offspring will be born in each species. The algorithms used here aren’t outlined directly in the paper, so most of them were created through trial and error.
The first step is to determine how many of a species’ genomes will die to make room for more babies. This is done proportionally to the species’ adjusted fitness: the higher the adjusted fitness, the more die off to make room for babies.
The second step is to determine how many children should be born in the species. This is also proportional to the adjusted fitness of the species.
By the end of these two functions, the species will have a certain number of genomes left as well as a “baby quota” — the difference between the number of genomes and the populationSize.
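Hedging on the exact proportions, which aren’t spelled out above, the two steps might look something like this sketch:

```python
def cull_and_allocate(species_list, population_size):
    # Sketch of steps one and two (the exact proportions are assumptions):
    # a species' share of the next generation is proportional to its total
    # adjusted fitness; it keeps its best genomes and the rest of its
    # slots become the "baby quota".
    total = sum(g["adj_fitness"] for s in species_list for g in s)
    plan = []
    for s in species_list:
        species_fitness = sum(g["adj_fitness"] for g in s)
        slots = round(population_size * species_fitness / total)
        ranked = sorted(s, key=lambda g: g["adj_fitness"], reverse=True)
        survivors = ranked[:max(1, slots // 2)]   # keep at least one genome
        plan.append({"survivors": survivors,
                     "babies": max(0, slots - len(survivors))})
    return plan
```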
4. Extinction
This algorithm is necessary to allow for species to be left behind. Sometimes a species will go down completely the wrong path, and there’s no point in keeping it around. The algorithm works in a very simple way: if a species is in the bottom __% of the entire generation, it is marked for extinction. If a species is marked for extinction __ times in a row, then all genomes in the species are killed off.
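A sketch of that rule, with the blanked-out thresholds left as placeholder parameters:

```python
def apply_extinction(species_list, bottom_fraction=0.25, strikes_allowed=3):
    # bottom_fraction and strikes_allowed are placeholders -- the exact
    # thresholds are left blank above. Species in the bottom slice of the
    # generation collect a "strike"; enough consecutive strikes and every
    # genome in the species is killed off.
    ranked = sorted(species_list, key=lambda s: s["fitness"])
    cutoff = max(1, int(len(ranked) * bottom_fraction))
    for s in ranked[:cutoff]:
        s["strikes"] += 1
    for s in ranked[cutoff:]:
        s["strikes"] = 0          # a good generation resets the streak
    return [s for s in species_list if s["strikes"] < strikes_allowed]
```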
5. Mating Season
Now comes the fun genetics part! Each species should have a certain number of genomes as well as a certain number of allotted spots for new offspring. Those spots now need to be populated.
Each empty population spot can be filled through either “asexual” or “sexual” reproduction: an offspring can result either from a mutation of a single genome in the species or from two genomes in the species being merged. Before I discuss the process of “merging” two genomes, I’ll first discuss mutations.
There are three kinds of mutations that can happen to a genome in NEAT. They are as follows:
1. Mutate Synapse Weights
This involves a re-distribution of all synapse weights in a genome. They can be either completely re-distributed or simply “perturbed”, meaning changed slightly.
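A minimal sketch of this mutation; the probabilities and perturbation step are illustrative, not crAIg’s actual values:

```python
import random

def mutate_weights(synapses, perturb_chance=0.9, step=0.1):
    # Most of the time a weight is "perturbed" (nudged by a small random
    # amount); occasionally it is thrown away and re-rolled entirely.
    # perturb_chance and step are illustrative placeholders.
    for syn in synapses:
        if random.random() < perturb_chance:
            syn["weight"] += random.uniform(-step, step)
        else:
            syn["weight"] = random.uniform(-2.0, 2.0)
```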
2. Mutate Add Synapse
Adding a synapse means finding two previously unconnected nodes and connecting them with a synapse. This new synapse is given a random weight.
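Sketched in Python (the field names are assumptions):

```python
import random

def mutate_add_synapse(genome, next_marking):
    # Find a pair of neurons with no synapse between them and connect
    # them with a new, randomly weighted synapse. next_marking is the
    # next unused unique synapse id.
    existing = {(s["source"], s["target"]) for s in genome["synapses"]}
    pairs = [(a["id"], b["id"])
             for a in genome["neurons"] for b in genome["neurons"]
             if a["id"] != b["id"] and (a["id"], b["id"]) not in existing]
    if not pairs:
        return  # every pair is already connected
    source, target = random.choice(pairs)
    genome["synapses"].append({"source": source, "target": target,
                               "weight": random.uniform(-2.0, 2.0),
                               "enabled": True, "marking": next_marking})
```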
3. Mutate Add Node
This is the trickiest of the mutations. When adding a node, you need to split an already existing synapse into two synapses and add a node in between them. The weight of the original synapse is copied onto the second synapse, while the first synapse is given a weight of 1. One important fact to note is that the first synapse (bright red in the above picture) is not actually deleted, but merely “disabled”. This means that it still exists in the genome, but is marked as inactive.
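The same mutation as a Python sketch, following the weight-copying and disable rules described above (field names are assumptions):

```python
import random

def mutate_add_node(genome, new_neuron_id, next_marking):
    # Split an enabled synapse in two. The original synapse is disabled,
    # not deleted; the incoming half gets weight 1 and the outgoing half
    # inherits the old weight, so the genome initially behaves the same.
    candidates = [s for s in genome["synapses"] if s["enabled"]]
    old = random.choice(candidates)
    old["enabled"] = False
    genome["neurons"].append({"id": new_neuron_id, "kind": "hidden"})
    genome["synapses"].append({"source": old["source"],
                               "target": new_neuron_id,
                               "weight": 1.0, "enabled": True,
                               "marking": next_marking})
    genome["synapses"].append({"source": new_neuron_id,
                               "target": old["target"],
                               "weight": old["weight"], "enabled": True,
                               "marking": next_marking + 1})
```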
Synapses added in either Mutate Add Node or Mutate Add Synapse are given a unique “id” called a “historical marking”, that is used in the crossover (mating) algorithm.
When two genomes “mate” to produce an offspring, there is an algorithm detailed in the NEAT paper that must be followed. The intuition behind it is to match up common ancestor synapses (remember we’ve been keeping their “historical marking”s), then take the mutations that don’t match up and mix and match them to create the child. Once a child has been created in this way, it undergoes the mutation process outlined above. I won’t go into too much detail on this algorithm, but if you’re curious you can find a more detailed explanation in section 3.2 of the original paper, or you can see the code I used to implement it here.
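As a rough sketch of just the matching step (the paper’s full algorithm also handles disabled and excess genes more carefully; parent_a is assumed here to be the fitter parent):

```python
import random

def crossover(parent_a, parent_b):
    # Synapses sharing a historical marking are inherited from either
    # parent at random; synapses unique to the fitter parent (assumed
    # to be parent_a) are inherited as-is.
    b_by_marking = {s["marking"]: s for s in parent_b["synapses"]}
    child_synapses = []
    for syn in parent_a["synapses"]:
        match = b_by_marking.get(syn["marking"])
        if match is not None:
            child_synapses.append(dict(random.choice([syn, match])))
        else:
            child_synapses.append(dict(syn))
    return {"synapses": child_synapses}
```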
Once all the babies have been created in every species, we can finally progress to the final stage of the genetic algorithm: Respeciation. Essentially, we first select a “candidate genome” from each species. This genome is now the representative for the species. All genomes that are not selected as candidates are put into a generic pool and re-organized. The re-organization relies on an equation called the “compatibility distance equation”.
This equation determines how similar (or different) any two given genomes are. I won’t go into the gritty details of how the equation works, as it is well explained in section 3.3 of the original paper, as well as here in crAIg’s code.
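Sketched from the paper’s definition (the coefficients are tunable, and the values and field names here are my guesses):

```python
def compatibility_distance(genome_a, genome_b, c1=1.0, c2=1.0, c3=0.4):
    # Sketch of the equation from section 3.3 of the paper: excess (E)
    # and disjoint (D) gene counts, normalized by the larger genome's
    # size (N), plus the average weight difference (W) of matching genes.
    a = {s["marking"]: s for s in genome_a}
    b = {s["marking"]: s for s in genome_b}
    matching = a.keys() & b.keys()
    max_a, max_b = max(a), max(b)
    excess = sum(1 for m in a if m > max_b) + sum(1 for m in b if m > max_a)
    disjoint = len(a) + len(b) - 2 * len(matching) - excess
    n = max(len(a), len(b))
    w = sum(abs(a[m]["weight"] - b[m]["weight"]) for m in matching)
    w /= max(1, len(matching))
    return c1 * excess / n + c2 * disjoint / n + c3 * w
```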
If a genome is too different from any of the candidate genomes, it is placed in its own species. Using this process, all of the genomes in the generic pool are re-placed into species.
Once this process has completed, the generation is done, and we are ready to re-calculate the fitness of each of the genomes.
While creating crAIg meant getting very little sleep at YHack, it was well worth it for a couple reasons.
First of all, the NEAT algorithm is a very complex one. Learning how to implement a complex algorithm without losing myself in its complexity was an exercise in code cleanliness, despite being pressed for time because of the hackathon.
It was also very interesting to create an algorithm based mostly on a paper, as opposed to one for which I had example code to work with. Often this meant carefully looking into the wording used in the paper to determine whether I should be using a > or a >=, for example.
One of the most difficult parts of this project was that I was unable to test as I was programming. I essentially wrote all of the code blind, then tested and debugged it once it was complete. This was partly because of the time constraints of the hackathon, and partly because the algorithm has so many interlocking parts that all of them needed to be working before I could tell whether the algorithm worked at all.
Overall, I’m happy with and proud of how Joe and I were able to deal with the stress of creating such a deep and complex project from scratch in a short 36-hour period. Not only did we enjoy ourselves and place well, but we also managed to teach crAIg some cool skills, like jumping over the second pipe in Level 1: