Running Advent of Code on a $2 microcontroller
This year for the first time, I completed Advent of Code. It’s 25 fun little programming exercises, or at least that’s how it starts out.
Warning: THIS BLOG POST HAS SPOILERS FOR ADVENT OF CODE 2022!
At some point the exercises get quite a lot harder. Not as tough as Project Euler, but I’m not just banging them out. Of course I am doing the exercises in Toit, the language for IoT microcontrollers that I help build.
The nice thing about being a developer of the language you are using is that you can add and fix stuff that annoys you as you go. For example we already had the any
and every
methods on the collection classes like List (aka array) and Set. But we didn’t have those on maps. Maps are not normal collections, being more like a collection of pairs in some ways, but it’s convenient to have any
and every
on maps too. Because we use command-line style optional arguments and flags on our methods I was able to add three variants: map.any --keys
, map.any --values
, and plain map.any
, where the predicate takes two arguments, a key and a value.
I also fixed some error messages to be more helpful, because I figured if I was slowed down by working out where exactly a bit of punctuation was missing, then our users would have similar issues.
I ran the puzzles on my Linux desktop, but after completing the 25 puzzles I got to wondering how many of them would run on the $2 microcontroller that we are focusing on for Toit. Although it is thought of as a microcontroller (and it’s certainly too small for Linux), it’s a proper little computer with two cores, and 520k of RAM. A lot of the RAM is gone by the time you boot up, though, used for things like network buffers, caches, interrupt handlers etc. The Toit heaps max out at about 130k. This includes the memory used by Jaguar, a tiny HTTP server that (during development) sits on your Wifi, ready to install programs from your workstation.
I haven’t been making a huge effort to save memory when solving the Advent of Code problems. I’ve made liberal use of small classes like Coord
, a 2-D coordinate class, and I’ve been putting the Coord
objects in sets and maps rather than constructing memory efficient data structures like bitmaps. After all, the focus is on solving the puzzles quickly, not hyper-optimizing the implementation. (Sadly I never made the leader board, which requires you to bang out a solution in about 10 minutes. Partly I’m just not fast enough at these sorts of tasks, and partly I’m too lazy to get up at 6 a.m. when they are released.)
First task was to convert them to use a large constant string as the input rather than a file. Although you can have a file system on an ESP32 it’s often simpler to have a self-contained program that carries its data with it. Toit goes to great lengths to keep the program and the initial data in flash, so it doesn’t cost any RAM to move the input files into the programs.
The first few days run with no trouble at all. On my workstation I use jaguar to run programs on the ESP32 over wifi, and write:
About 4 seconds later I can see on the serial port of the ESP32 that the program has run and printed its results:
Day 9 — too many coordinates
The first real issue is with day 9. In this puzzle a rope is dragged across a 2-D surface, and you must calculate how many of the squares the tail of the rope are dragged across. I had implemented this as a Set of Coord(inate) objects. Since a Set doesn’t store duplicate objects, inserting the coordinates into a set is a simple way to avoid counting squares twice.
However, a set with 5858 Coord instances is just too big for an ESP32. The Coord class has two integer fields and a header, so it’s 3 words.
The set adds at least two words per entry: One for the pointer to the Coord instance, and one for the index entry. See Hash maps that don’t hate you.
In practice the index that the set uses internally can only be of a size that is a power of two, and it needs some slack for efficiency, so the minimum size of the index is 112.5% of the number of entries, and the max is 225%.
In effect we need max 7 words per Coord for the set, which is 164k — too much for our poor ESP32.
Instead, we could encode the coordinate as an integer. In 32 bit Toit, integers up to about ±1 billion are tagged pointers that don’t cause independent objects to be created.
But I quite like having a little Coord class. It’s rather more convenient than having integers throughout the program that encode an x and a y value. So in order to minimize the changes to the program, I made a special CoordSet
that wraps a regular Set, encoding objects into integers before inserting them.
As it happens, we never take objects out of the set in this application, so for day 9 we don’t need to implement decoding the integers and reifying the Coord objects. This gets the memory use of the 5858-entry set down to a level where it can run on the ESP32.
Day 13 — too many Thyngs
For day 13 you have to construct some objects that look rather like Lisp s-expressions — lists of lists and integers. Then you must compare and sort them. Again, this takes up far too much memory.
I decided to use a similar approach to day 9, with a compact representation of the objects that can live in a collection and can be reified into a real object on demand. In this case the simplest compact form was the string form that serves as input to the program.
The input consists of strings like: [[[1,4,[9]],5,2],[1,0,1,2],[[8,[4,10,3,3,0]],[[],6,[2,10,4,10,1],[3,5,9,7,6],[2,7,5,9,5]],9,[10,[3,1,6,2,9],[2,0,2,8]],[[8,1,10,6],2,5]],[[[0,4,6,6,4]],[],9,[8,4,6],[9,[10],[],[5,3,7,7]]]] and of course there is a parser that makes objects out of this. I called the superclass Thyng
and it contains Lyst
and Ynt
objects. The parser can serve as a reifier, and the compact format is a string slice. A string slice acts exactly like a string, but it can be backed by part of another string — in this case the string slices are backed by the constant string that remains in flash.
This small change is enough to get day 13 to run on the ESP32, but it’s about to get much harder.
Day 14 — too much sand
Day 14 takes place in a 2D world again, and this time it’s grains of sand we are simulating. Grains of sand are known for being plentiful, and we have to simulate over 24 000 of them (at least with my input data — everyone gets a different input in this game).
As usual my desktop solution placed the objects in a (hash) set. It’s not necessarily the most efficient data structure, but it’s really simple to use. It’s clear that we need to make a huge reduction in space object now. A conventional approach might be to use a rectangular bitmap to manage the grains of sand. But then we would either have to restrict the size of the world, or somehow implement resizing of the bitmap when sand spills over the edge. It’s all a bit annoying and sounds like quite a rewrite.
So instead, we will implement a special kind of set. It has methods for adding Coord
objects, but internally it’s more like a map from (rounded) coordinates to tiny 16x1 bitmaps. The changes to the actual dayE.toit
are extremely minimal:
- set := {}
+ set := PixelSet
The nice thing about PixelSet is that we can use it again on day 23, without having to worry too much about the exact extent of the 2D world we are moving around in. With this change, we can have a set of 24 000 pixels, using the familiar Set interface ofadd
contains
and do
(for iteration).
The hard days: 15–20
We now come to 5 days that I haven’t yet managed to press into the ESP32’s memory. Day 15 has a puzzle with transmitters and beacons that currently uses over 100Mbytes on Linux and runs for half a minute. I’ll need a completely different approach.
Day 16 is all about the best path to open valves when assisted by elephants. I programmed this as a breadth-first search, which is completely impractical for a small memory device. It also took hours to run on a fast desktop system. Perhaps it can be squeezed into the memory with a depth-first search, but I’ll need to come up with some more tricks if I want it to finish before Advent of Code 2023.
Day 17 is a sort of Tetris, but with a trillion rocks. I solved it on the desktop without simulating all 1 000 000 000 000 dropped shapes, but it’s a big world. Again, a different approach would be needed.
Day 18 is a 3D flood fill that works fine on the ESP32. Yay! But at day 19 it’s again a path search, building machines that can build machines in the right order. I did this as depth first, and it seems to work, but it doesn’t finish in a reasonable time.
Day 20 is all about an array of 5000 rotating numbers. It almost fits in memory, and I have some hope that it can be made to work. My solution takes about a minute and a half on the desktop, so it’s within the realms of possibility.
Day 21 — pulling out all the stops
Day 21’s puzzle is all about monkeys shouting numbers to each other. There are about 2000 monkeys and initially they each took about 60 bytes of memory. No way that was going to fit, especially with a hash map to manage them that can add a peak of about 24 bytes per monkey when rehashing. (While writing this I realized our implementation could be improved here.)
I started by splitting the monkeys into two classes. One of the classes just shouts numbers, the other does calculations (yes, it’s a wacky world the Advent of Code takes place in). The shouters don’t need as many fields.
Although it’s not specified in the puzzle description, each monkey only has at most one other monkey that listens to its shouts. I had a list (growable array) of listeners on each monkey object, but a growable array is really two objects — a holding object with two fields, and a backing object with one size field and some number of array slots. With the headers, a one-element growable list can’t be less than 6 words — 7 if you include the field in the monkey object that points at it. That’s 28 bytes per monkey we can’t afford to spend.
Since Toit is dynamically typed we can use a field in the monkey object to contain either null (no listeners) a monkey reference (one listener) or a list reference (more than one listener). Handling this trickery is all encapsulated in the Monkey class. There’s now a getter that constructs a list of listeners on the fly so the interface is unchanged. (In Toit you can replace a field with a getter and a setter, making it easier to keep a class interface stable when you change the implementation — this is also the reason we don’t have gratuitous private fields with getters and setters: If you later decide that’s what you wanted you can do it with no interface change.)
This still wasn’t enough. Each monkey has a 4-letter alphabetic name. We parse this as a 4-digit base 36 number because integers are more compact than small strings.
That still wasn’t enough. The map from names (numbers) to monkeys was replaced by a custom set where you can look up entries by their name. It’s rather like a map, but the key is contained in the value, saving a word per entry.
Aaand… that still wasn’t enough. There was a work-list to keep track of which monkeys need to redo their arithmetic. I could have abolished that, and just repeatedly scanned the entire set of monkeys to find out which ones had updated inputs to their formula, but that felt a bit dirty. I didn’t want to make the big-O runtime worse. So I took a leaf from the book of Quicksort implementations.
Quicksort doesn’t normally have a work-list. Instead, it uses recursion, which is a bit like using your stack as a work-list. But it has the same issue — we don’t want the stack/work-list to grow without limit. Every time Quicksort partitions its array it has two sub-arrays to sort. It could just recurse on both sub-arrays, but then there’s no limit to how deep the recursion (and thus the memory use) could go. Instead, almost all Quicksort implementations use recursion to take care of one sub-array, and iteration to take care of the other. Because recursion takes the smaller one we know this only requires log(n) recursion depth.
Similarly, when a monkey starts shouting a number we need to update the listeners. But if we have n listeners we can push n-1 onto the work-list and take care of the last one with iteration, saving memory. As it happens, there is only one listener per monkey, so in practice this gets rid of the work-list completely.
With this change, day 21 fits in the memory of the ESP32, and the run-time is quite short.
Declaring victory and going home
Days 22–25 seem to work fine. Day 23 (rotating hurricanes) is quite slow, taking 17 seconds to run on the desktop, so I haven’t actually run it to completion on the ESP32, but I think it works.
In all there are 19 days that work, 2 days that are too slow, and 4 days that need too much memory.
These programs aren’t very embedded. They don’t use the support for GPIO, or the SPI bus that we put in the Toit platform. They don’t drive tiny displays, sensors or actuators. These programs aren’t even very Internet-of-Things-like: They don’t even connect to JSON-based HTTPS end-points. But the idea of Toit is that it’s a real 2020s programming language, with garbage collection, a nice standard library, data structures, crash-proofing, and rapid prototyping. When something goes wrong it throws exceptions with stack traces instead of sullenly crashing. So since it’s a real language you can write anything you want in it. Even silly Xmas puzzles.
If you want to take a look, the code is all at https://github.com/erikcorry/advent-of-code-2022. The PixelSet
class is in the utility file, aoc.toit
, along with some functional methods that haven’t made it into the standard library yet, but that are nice time savers for the sorts of tasks that Advent of Code sets.
Till next year!