Super Mario Bros. Level Generation Using Torch-RNN

I recently came across a popular blog post on generating levels for Super Mario Bros, and it was really cool.

It gives an excellent high-level overview of how to generate levels for the game, from feature selection to training, with impressive results. For someone like me who had never worked with machine learning before, it did a great job of demystifying the process. At the same time, it left out a lot of low-level details that I hope to fill in.

In this post, I will describe all of the steps that I took to generate levels and play them in the original game, from extracting raw level data to getting reasonable output. A demo of the level that the network generated is available here, as well as on Super Mario Maker. Full source code is available here.

The Level Format

The level format of the game is fairly straightforward, although it has a few quirks designed to save space on the cartridge. I couldn’t find any documentation for the original game, although the Super Mario All Stars port of it has a fairly similar level format that is well-documented.

The easiest way to find the locations in rom that need to be modified if you want to edit levels is to open a hex editor and do a search. Looking at a disassembly of the game, all of the level data (and their corresponding world/level number) are documented. Then, you can search for the first few bytes of the level in the rom to find it. For example, the level objects for W1–1 are stored at offset 0x269E and the enemy objects are stored at 0x1f11 in the USA version of the rom file.

I took the level and enemy data from the disassembly and put them in one big array to output (see source). The level format itself is actually fairly easy to parse. Here is W1–1:

0x50, 0x21,
0x07, 0x81, 0x47, 0x24, 0x57, 0x00, 0x63, 0x01, 0x77, 0x01,
0xc9, 0x71, 0x68, 0xf2, 0xe7, 0x73, 0x97, 0xfb, 0x06, 0x83,
0x5c, 0x01, 0xd7, 0x22, 0xe7, 0x00, 0x03, 0xa7, 0x6c, 0x02,
0xb3, 0x22, 0xe3, 0x01, 0xe7, 0x07, 0x47, 0xa0, 0x57, 0x06,
0xa7, 0x01, 0xd3, 0x00, 0xd7, 0x01, 0x07, 0x81, 0x67, 0x20,
0x93, 0x22, 0x03, 0xa3, 0x1c, 0x61, 0x17, 0x21, 0x6f, 0x33,
0xc7, 0x63, 0xd8, 0x62, 0xe9, 0x61, 0xfa, 0x60, 0x4f, 0xb3,
0x87, 0x63, 0x9c, 0x01, 0xb7, 0x63, 0xc8, 0x62, 0xd9, 0x61,
0xea, 0x60, 0x39, 0xf1, 0x87, 0x21, 0xa7, 0x01, 0xb7, 0x20,
0x39, 0xf1, 0x5f, 0x38, 0x6d, 0xc1, 0xaf, 0x26,
0xfd,

And here is the enemy data:

0x1e, 0xc2, 0x00, 0x6b, 0x06, 0x8b, 0x86, 0x63, 0xb7, 0x0f, 0x05,
0x03, 0x06, 0x23, 0x06, 0x4b, 0xb7, 0xbb, 0x00, 0x5b, 0xb7,
0xfb, 0x37, 0x3b, 0xb7, 0x0f, 0x0b, 0x1b, 0x37,
0xff,

The level data starts off with a two byte header with information about timing, background type and how Mario starts the level. Next, every two bytes represent a level object, such as a question block or a row of bricks.

The format for a level object is XXXXYYYY POOOOOOO . The first nibble represents the x position within a 16 block page, then the y position. In the next byte, the most significant bit represents whether or not the object is the start of a new page. The remainder specifies what type of object it is.

For example, 0x07 0x81 has x=0, y=7, p=1, and o=1. That is, it is a question block 7 blocks from the top of the screen and 16 blocks into the level.

The enemy object format is similar, excluding the header. For more detailed information about the level format, check out the Super Mario All Stars documentation.

The hardest part by far of reading in these levels, however, is a limit of 2–3 objects per y coordinate. I can’t seem to find a definitive rule, but putting more than 2 objects sometimes corrupts the level, causing the next page not to load. The original game gets around this by having “groupable” objects. For example, if o=0x21, that would be two horizontal bricks. If o=0x22, that would be three.

In order to load levels from a text file, the emulator first searches for objects that can be grouped vertically, and then horizontally. While perhaps not optimal, it seems to get around the restrictions.

Using Torch-RNN

I followed the steps at https://github.com/jcjohnson/torch-rnn, using the CUDA docker images. I passed in -v /home/justin/ml:/data to docker to map the folder ~/ml in my home folder to /data in the vm, that way I could get training data in and level data out.

First Attempts at Training

I initially tried to output the levels with their block type as the first character of the line, followed by the rest of the objects for that vertical slice of the level:

1             
1
1
1
1 ?
1
1
1
1 b
1 !
1 g b ?
1 ?
1 b
1
1
1
1 pp
1
1
1
1
1
1
1
1
1
1 ppp
1
1 g
1
1
1
1
1
1 pppp
1
1
1
1
1
1
1
1 g
1 g
1
1 pppp

While this format was easy to read in, it was much more difficult for the neural network to pick up on patterns. The levels generated were less than ideal:

Clearly this format was not going to work, and so I tried to stay as close as possible to the original article. My next attempt at a level format looked like this:

==............
==............
==............
==............
==............
==...?........
==............
==............
==............
==...b........
==...!........
==g..b...?....
==...?........
==...b........
==............
==............
==............
==pp..........
==pp..........
==............
==............
==............
==............
==............
==............
==............
==............
==ppp.........
==ppp.........
==g...........
==............
==............
==............
==............
==............
==pppp........
==pppp........

Instead of reading in the block type directly, the code looks for the block type that covers the most number of blocks, without going over. It then fills the rest of the = blocks with bricks. While this produced better results, I still couldn’t get anything playable.

I reached out to Adam Geitgey, the author of the original blog post, who suggested that I try to simplify the format and play around with various parameters on torch-rnn. Specifically, he suggested that I try using a smaller network, since I didn’t have much data. I tried many different combinations of network size, number of training epochs, and sampling temperature, but nothing seemed to work. I just didn’t have enough data.

One of Adam’s other suggestions was to use a GPU to do the training. This saved me so many days of training.

Success

I was finally successful by repeating levels multiple times in random order in the training data. While this normally tends to cause overfitting, in this case it seemed to work out fine. Perhaps someone more knowledgeable on the subject might have a better approach — it is entirely likely that some combination of network parameters could have given the same effect with less repetition.

The key factors that seemed to improve results were:

  1. Fewer training epochs. I kept increasing the amount of training, hoping that it would make the net better. I went as far as 20000 epochs, but in the end, it seems that somewhere between 20 and 80 is the magic number. More training != better results.
  2. More repetition. I’ve read online that repeating data in your training set is usually bad, as it tends to cause overfitting. In this case, it seemed to work out, but in an ideal world more data would be preferable. There are a whole slew of Mario games and user-generated content to learn from.
  3. More data. I added some levels from the second game, Super Mario Bros: The Lost Levels. This gave the output a distinctly harder flavor, although it seemed to help avoid overfitting.

The final settings that I used to generate the level above were ~20 epochs (I used the 10000th checkpoint), every level repeated 20 times and randomly shuffled, with all of torch-rnn’s default options. The raw output looked like this:

I.......
.Ik...........
..I...........
..I...........
..I.........K.
..I...........
..I.....0.....
.....I..0.....
.....I........
.....I........
.....I........
.....I........
.....I........
.....I........
.....I........
.....I...I....
.....I........
.....I........
.....I........
.....I........
.....I........
.....I........
.....I........
.....I........
.....I........
..............
..............
..............
..............
.........<....
.........<....
.........I....
.........I....
.........I....
.........I....
.........I....
.........I.g..
..............
..............
..............
==-...........
==............
==............
==............
==............
==............
==............
==............
==............
..............
..............
......0.......
......00......
.........0...
==...?........
==...?........
==............
==............
==............
==k.......0...
==............
==............
==............
==----........
==............
==............
==...?........
==...?........
==...?........
==..........b.
==............
==pp.....b....
==pp.....b....
==.......b....
==.......b....
==.......b....
==............
==g...........
==g...........
==............
==............
==............
==----........
==------......
==------......
==............
==............
==k...........
==............
==............
==............
==............
==............
==..8.........
==............
==............
==............
==............
==F...........
==............
==............
==A...........
==A...........
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
==............
..............
==............
==............
==............
==.8..........
==............
==............
==............
==............
==............
==............
==............
==............
==............
==8.8.........

It continued for the remainder of the level. I manually removed the first incomplete level, added a small level header to specify the level type, and added a flag pole at the end. Everything else is completely unmodified.

Next Steps

There are tons of avenues to explore for generating levels. There are many more games with much more data to harvest. In particular, the New Super Mario Bros series of games have many levels in the same general style, which would be a perfect application of machine learning. In addition, other algorithms such as genetic algorithms, procedural generation, and Markov chains could provide hours of fun new content.

If you want to play around with torch-rnn yourself, I have the full source code to generate the training data and play the generated levels available on Github. It also includes the checkpoint files that I used to generate the level above, if you want to skip training (the funnest part).

I have never done any kind of machine learning before this, so if you find any gross errors, please let me know. Best of luck!