How I optimized Conway’s Game Of Life
Use Chrome’s Developer Tools to speed up your code
Last month my brother reinvigorated my love for Conway’s Game of Life with some ridiculous videos on YouTube. I loved it — I’d heard of the Game of Life while I was at university, but had never really looked at it in depth. I started doing a little research and, as I learned more about the famous cellular automata, I decided to build my own version.
Optimizing code can be tricky. Identifying performance issues in a complex system can be hard to do. Deciding how to fix them can be a challenge, too. In this article I want to share my journey through optimizing my code, some tactics I used, and tricks I learned to help you optimize your own webapps.
Identifying the problems
The first rule of performance optimization is: test your assumptions. If you built the system, you probably have some good ideas of where to start looking for problems . But, before you start changing code, you should prove yourself right (or wrong). Nothing is quite as unsatisfying as making painful and complex changes to your code only to discover it runs at exactly the same speed as before.
The best place to start is with a measurement of some kind. Decide what “optimized” is, then measure the “optimizedness” of your app. I wanted to improve frame rate, so I decided to aim for a 20ms clock cycle on my laptop for a “pretty high resolution grid”. Once I decided on my metric, I had to measure my app’s current (slow) performance.
My version of the Game of Life is a front-end web application. Chrome and Firefox both have some fantastic code profiling tools built in for measuring web app performance. My screenshots show Chrome’s version — the “Performance” tab.
The “Performance” tab allows you to collect a performance profile of your website.
- Open this tab on the page in question and push “record”.
- Wait a few seconds to get a good sample of overall performance.
- Then hit stop.
Lots of information!
Within the “Performance” tab you can see:
- images of individual frames
- information about rastering
- data regarding GPU use
- and more.
So now I want to find out which code. The “Bottom-up” and the “Call Tree” tabs can help us. These two views are really two ways to view the same information — how much time each function took to execute.
Call Tree tab
Take a look at the “Call Tree” pane:
A good rule in computer optimization is to “make the common case fast.” First, find out what things your program does all the time, then make those parts of the code better. If you make a function dramatically faster, but it only runs 1% of the time, you haven’t actually made your application faster — just that one function. This “Call Tree” view helps us identify “the common case”.
The top line of this view tells me that an event called “Timer Fired” is spawning processes that take up 99.6% of the processing time during my recording.
Notice the two columns on the left labeled “Self Time” and “Total Time”. “Self Time” is the amount of time a function spends within its own scope. “Total Time” includes calls to subroutines. The “Timer Fired” event (part of the JS event loop) only uses 0.1% of the “Self Time”, but its subroutines are taking 99.6% of the total time.
The “Call Tree” orders function calls by total time, and helps us identify which code paths are hogging the CPU. In my case, there was exactly one code path taking up all the time — the call to
setInterval which advances the round and repaints the canvas. Not much of a revelation. But, drilling down, I can see that the “true” CPU hogs are the
fillRect function during painting, and some garbage collection during
There are some other heavy functions too —
setPaintStyles is taking a lot of time as is
getNeighbors. These are functions that I’ll try to optimize opportunistically. If I’m going to be changing the code surrounding how they are used while I’m fixing
fillRect and whatever my garbage collection issue is, then I will try and fix it all at once. But my primary concern is
Using the “Bottom-Up” view, we can see the same information, but ordered differently. The “Call Tree” orders functions by total time, and drilling in shows which functions were called by that outer function. “Bottom-Up” is the reverse view — functions are ordered by self time, and drilling in shows us which functions called the function in question (as opposed to showing us which functions were called by the function in question). Here is that view for my code:
One last helpful tip — although I didn’t really need it during this debugging session — you can select/zoom-in on select portions of the full recorded snapshot. In the chart below, I’ve zoomed in on one firing of
setInterval — this chart is called a flame chart, and it’s a visualization of the call stack over time.
You can see in this chart that
repaint both make subroutine calls, and the placement of the garbage collection calls. This helps us get a sense for how the code is executing over time, rather than just how much time everything is taking in aggregate. If you apply this zoom in the main tab , the “Bottom-Up” and “Call Tree” tabs will update to reflect just your selected time slice. This is helpful when one particular function is slow.
Alright, so I know where to look for problems. Now, how do we fix them?
Fixing the problems
In the broadest sense, we really only have two optimization strategies:
- make the slow thing faster
- do the slow thing less often
The performance of
fillRect and garbage collection are both out of my hands unless I want to open a PR to the Chrome/V8 project. So making things faster at the “leaf nodes” of my call tree is out. Option 2 it is. My goal now is to paint fewer rectangles, and trigger fewer garbage collection events.
Let’s take a look at the key components of the offending code path — the two functions taking all of my time are
repaint. Some of the code snippets below have been slightly modified, to help us focus on what matters for this post.
In this code I’m just looping over the grid and painting each pixel. This happens once every round. I like how simple this code is. It’s unfortunate that I had to change it, but painting every pixel every round is just too wasteful.
My first idea was to paint one large rectangle (the background) then only paint individual pixels if they are alive. This would definitely reduce the number of calls to
fillRect each cycle.
I decided not to go this route, because I already support a case where different pixels within the same grid are given unique “alive” and “dead” colors. The Creeping Ivy simulation is one example (you might have to refresh a couple times to get the multi-color scheme).
My next idea was a common optimization in rendering: only repaint the pixels that have changed. It’s often faster to check if a pixel’s value changed this round than it is to repaint that pixel. In order to do this, I would need to know the previous value of the pixel then, in the
repaint function, I can check if I need to actually repaint this pixel or not.
I want my code to look like this:
Unfortunately, my pixel class didn’t have a
previousState variable at the time.
Now, let’s take a look at the other slow function,
advanceRound. This function had to change in order to enable a
previousState of some kind. I also had to address the garbage collection issues this function had.
Can you see why this code triggers a lot of garbage collection?
Every time I computed a frame, I used a brand new array. This means the old array had to be garbage collected. You can’t tell from this code, but it was even worse than that. The function
entity.update() was actually returning a new instance of the pixel class — which was honestly a strange thing for it to do. Because I was replacing the old grid with a brand new grid full of brand new entities, the garbage collector had to clean up the old stuff.
I had to change a lot in the
ConwayPixel class to allow
previousState information to be saved, and prevent
entity.update from creating a new
ConwayPixel. The result is a lot fewer objects being created and discarded.
I also noticed that
getNeighbors was a little bit expensive. Not only was it taking a reasonable amount of time, but it returned an array of pointers which would have to be garbage collected just like the grids. I wanted to address that too.
It would have been easier for me to change the code less. For example, I could have created the
nextState grid like I was already doing and then sent both of the grids through to the
repaint function. I didn’t do that, because it wouldn’t have solved the garbage collection issue.
Okay, at this point in the investigation, I finally felt that I understood the heart of the problem and how to solve the problem. Although it’s
repaint that was slow, most of the changes to support the faster version actually had to happen between
advanceRound, and the
ConwayPixel class. Here is the plan I came up with:
- Add member variables to the
- Stop creating and discarding all the entities in the grid each update.
- Use those new member variables to
repaintas little as possible.
advanceRound looks like now:
Let’s also look at the relevant supporting functions. Here is the new constructor for a pixel:
Simple, just making a place for the new data.
nextState are modified during the new update functions (below) and the
pixel.neighbors array is populated right after the main grid is constructed.
Here are the two instance methods on individual pixels you saw being called in
advanceRound. This two-step update process lets the whole grid update based on the same information. Changing a pixel then using it’s new value to compute the neighboring pixel’s
nextState would create errors in the simulation.
And, finally, the new version of
repaint, which contains one more small canvas optimization. Instead of painting each pixel in order from top-left to bottom-right, I do a preprocessing step that groups each pixel by the color it will be painted. I learned that setting a canvas context’s fill color is a little expensive, so grouping by color can improve render speeds.
After making the change, it’s time to measure again:
Notice that the flame chart isn’t as deep. That’s good, because it means I’m making fewer nested function calls. Another sign that these optimizations worked is the lack of a garbage collection entry in my call tree — adding persistent data to the individual pixels means there is less clean up to do. The last thing to note is that each
repaint combo is happening in under 20ms (instead of 200ms!). For my effort I was able to achieve a nearly 10x increase in frame rate.
I’m sure it can still be faster… but I want to get back to feature development now.
This article is produced by Teb’s Lab. To learn more about my journey to graduate school, and learn what I’m learning about machine learning, genetics, and bioinformatics, sign up for the Weekly Lab Report, become a patron on Patreon, or just follow me here on Medium.