Chapter 3: Building a Self-Replicating Cellular Automaton with Top-Down Programming

Phillip Compeau
Programming for Lovers
27 min readDec 1, 2019

“But where is everybody?” — Enrico Fermi

Can a machine replicate itself? Given enough construction materials, such a robot could multiply exponentially — one robot copies itself into another robot, two robots become two, four robots become eight, and so on.

Potential applications for a self-replicating robot are humbling. Imagine an unmanned space probe that could explore the galaxy while also harvesting raw materials from passing asteroids and planets to build copies of itself. We would only need a few dozen generations of these probes before we could assign a single probe to each of the 250 billion stars in the Milky Way, and in so doing explore every planet in our galaxy for signs of life. Such probes are called von Neumann probes in honor of our friend John von Neumann from our work with pseudorandom number generation.

Many scientists believe that a technologically advanced society could produce von Neumann probes. They also believe that the conditions for life almost certainly exist elsewhere in our galaxy. Why, then, have we not seen evidence of alien life ourselves? This conundrum, inspired by Enrico Fermi’s alleged quotation at the start of this chapter, is known as the Fermi paradox.

Robots, and especially self-replicating robots, are difficult to build (see video below). Despite major advances in robotics research, we are still far from manufacturing self-replicating robots that can also perform a task competently.

The state of the art for self-replicating robots is still quite primitive. These robots, produced in 2005 by a Cornell University team and recorded with a potato, can self-replicate given spare blocks but cannot perform any other task.

In an effort to understand self-replication, perhaps we should look for the simplest self-replicating system that we can find. This was one of John von Neumann’s goals in the 1940s, which is why self-replicating space probes are named in his honor. When Stanislaw Ulam pointed out to von Neumann that cells, the basic unit of all living things, are themselves self-replicators, the strange and wonderful world of cellular automata was born.

The Game of Life

The rules of the game

A cellular automaton is a grid of (typically square) cells, along with a collection of simple rules that allow the cells to change from one state to another. An automaton has finitely many states, and the rules for a cell changing from one state to another are based only on the cell’s current state and the states of the cells in their immediate surroundings. We would never mistake such a system for an intelligent one.

Our goal is to build a cellular automaton that can display self-replication. Yet first, we will illustrate how cellular automata work with an example.

We will consider a very simple cellular automaton that was popularized in the 1970s, called the Game of Life. Each cell of the two-dimensional grid in the Game of Life has one of two states: alive (black) and dead (white). A sample two-state cellular automaton is shown in the figure below.

A sample two-state cellular automaton. Black cells are in one state, which we think of as alive, and white cells are in the other state, which we think of as dead.

A cellular automaton also includes rules describing how cells can change states from one generation to the next. We first define the Moore neighborhood of a cell as the eight cells that border either an edge or a corner of a given cell (see figure below).

A cell (shown in green) along with the eight cells forming its Moore neighborhood (shown in blue).

The Game of Life rules dictating state changes from one generation of the automaton to the next are summarized as follows. These rules never change, are universal across the entire grid of cells, and are based only on the Moore neighborhood of a given cell.

A: If a cell is alive and has either two or three live neighbors, then it remains alive.

B: If a cell is alive and has zero or one live neighbors, then it dies out.

C: If a cell is alive and has four or more live neighbors, then it dies out.

D: If a cell is dead and has more than or fewer than three live neighbors, then it remains dead.

E: If a cell is dead and has exactly three live neighbors, then it becomes alive.

In the figure below on the left, we label each cell in the automaton previously considered with “A”, “B”, “C”, “D”, or “E”, according to which of the above five rules if falls under. We then apply these rules to produce the next generation of this board, shown on the right.

A sample Game of Life board (left) along with its updated board after one generation (right). We label each cell in the board on the left with the rule label that applies to it when updating that cell. We assume that the board stretches infinitely in every direction and that all cells beyond the boundaries shown are dead.

STOP: Find the next generation of the Game of Life board on the right in the figure above.

Despite the simplicity of the Game of Life rules, we can apply a loose biological meaning to each of them. We call rule A “Propagation”, since it allows life to continue in the next generation if it has enough suitable mates nearby. We call rules B and C “Lack of mates” and “Overpopulation”, respectively, since the cell will die out if it has too few or too many neighbors. We call rule D “Rest in peace”, since in most cases dead cells will not spontaneously regenerate.

If we had only these four rules, then the number of live cells could never increase, since any time that a cell died, there would be no way for it to come back alive. To avoid such a boring cellular automaton, we call rule E “Zombie”, as it allows a cell to arise from the dead if it has exactly three live neighbors. (“Birth” offers a more appropriate, albeit less fun, biological metaphor.)

Even with these biological interpretations, you may be wondering why the Game of Life applies this particular set of rules. Why should the “Propagation” step demand exactly two or three live neighbors? Why not one or two live neighbors? Or two, three, or four live neighbors? What makes these rules interesting?

Stable forms and oscillators

It turns out that most of the two-state cellular automata we could consider are boring. Either the live cells die out quickly, or they propagate outward wildly, without any ultimately interesting behavior.

Some of the initial configurations of the Game of Life are just as predictable. The figure below shows a few stable forms (also known as still lifes), patterns whose cells’ states will not change from one generation to the next.

A few stable forms.

STOP: Carry out the next few generations of the board below. What happens?

The animation below shows three oscillators, or patterns that after a certain number of generations repeat to their original configuration. The period of an oscillator is the number of generations that it takes for the pattern to repeat. The oscillators below have period equal to 2; we will also say that stable forms are oscillators that have period equal to 1.

In fact, 19, 23, 38 and 41 are the only positive integers that are not known to be the period of any oscillator. Below is a more complicated pattern called the “dinner table”, which has period 12.

The curious case of the R pentomino

Most simple patterns for the Game of Life either “die out” or eventually produce an oscillator. Perhaps the Game of Life would have been just a quirky novelty were it not for the curious case of the R pentomino, a five-cell pattern shown in the figure below.

The R pentomino.

Animating the R pentomino provides a surprise. How can something so apparently complicated derive from five simple rules?

The R pentomino animated over 1,200 generations. Five gliders are produced early on, and after several hundred additional generations, a sixth glider is produced. At this point, the board converges to a state of stable forms and period-2 oscillators.

The above animation shows that after over a thousand generations, the interior of the automaton converges to a collection of stable forms and oscillators of period 2. However, in the meantime, six interesting patterns called gliders are produced (five early on and one toward the end of the animation). A glider is shown in the figure below.

A five-cell glider pattern.

When we animate the glider, what appears to be a swimming five-cell automaton emerges, shown in the animation below. Without outside interference, a glider is immortal, swimming off into infinity. Note also that the glider above is very similar to the R pentomino, as it is formed by moving only a single live cell to an adjacent cell.

A lone glider, animated over 50 generations.

The R pentomino is interesting, but at the same time, the growth of the number of live cells for this pattern is finite. After many generations, we are left with six gliders along with a collection of stable forms and oscillators. Is it possible for the growth of an automaton to be infinite?

Gosper’s gun

This question was unresolved until 1970, when Bill Gosper found the following pattern that became known as the Gosper glider gun.

The Gosper glider gun.

The reason why Gosper’s pattern is called a “gun” owes to the fact that when we animate the pattern, it looks as if the two sides of the pattern are communicating in an effort to fire off gliders (see below). As the number of generations increases, the number of gliders grows without bound, and so the Gosper glider gun exhibits infinite growth of cells.

The Game of Life Problem and the case for planning

Although many more amazing patterns have been discovered for the Game of Life, none of these patterns exhibits von Neumann’s desired principle of self-replication. You can view a Gosper gun as a factory that produces gliders one at a time. If the gliders instead could produce copies of themselves, then the gliders would grow not just infinitely but exponentially.

We will place our search for a self-replicating cellular automaton on hold for the current time, since there is much to be learned in being able to implement the Game of Life. We state our work as a computational problem below, which is the most complicated problem we have encountered thus far.

Game of Life Problem

Input: An initial configuration of the Game of Life board initialBoard, and an integer numGens.

Output: All numGens + 1 configurations of this board over numGens generations of the Game of Life, starting with initialBoard.

You may be familiar with the ridiculous Hollywood trope of a hacker who enthusiastically saves the day by hammering away at a keyboard at 100 words per minute. In practice, starting with coding is the last thing that we should do when presented with a challenging computational problem. If we try to organize hundreds of lines of code in our heads without planning in advance, we are doomed to produce buggy, poorly organized code. How, then, should we plan to solve a computational problem?

A Brief Introduction to Top-Down Programming

In this section, we will explore a paradigm for solving computational problems called top-down programming. In this approach, we first write a function solving the problem, assuming any subroutines that we need. We then write these subroutines, assuming any functions that they call as subroutines in turn.

We have already used top-down programming to solve problems. The simplest example was in our exploration of bacterial genomes, when we wrote a function ReverseComplement to take the reverse complement of a DNA string by assuming that we had two simpler functions Reverse and Complement.

ReverseComplement(text)
xReverse(text)
yComplement(x)
return y

We also saw top-down programming in our work with simulating an election, when we simulated many elections by assuming that we had a function SimulateElection that would handle simulating a single election for us.

SimulateMultipleElections(polls, electoralVotes, numTrials, marginOfError)
winCount1 ← 0
winCount2 ← 0
tieCount ← 0
for numTrials total trials
votes1,votes2 SimulateElection(polls, electoralVotes, marginOfError)
if votes1 > votes2
winCount1
winCount1 + 1
else if collegeVotes2 > collegeVotes1
winCount2
winCount2 + 1
else (tie!)
tieCount tieCount + 1
probability1 winCount1/numTrials
probability2
winCount2/numTrials
probabilityTie
tieCount/numTrials
return probability1, probability2, probabilityTie

The SimulateOneElection function in turn called an AddNoise subroutine, which is where the gritty details of the program were hiding in the form of (pseudo)random number generation, which required its own set of subroutine calls.

As we move toward larger programs with many functions, we can visualize how these functions are organized using a function hierarchy, or a network in which functions are connected to subroutines that they call by arrows. In a function hierarchy, the “root” function (i.e., the one at the top of the diagram that has no incoming arrows) solves the computational problem at hand. Function hierarchies are shown in the figure below for the functions SimulateMultipleElections and ReverseComplement.

Function hierarchies for simulating multiple elections (left) and reverse complementing a string (right).

STOP: Review the Game of Life Problem above. Plan the highest-level function in the hierarchy that will solve this problem. What subroutine(s) will be helpful?

Top-Down Programming and the Game of Life

The first function in the Game of Life hierarchy

To solve the Game of Life Problem, we will apply top-down programming to write a concise function PlayGoL. This function will need to take the initial configuration of the board, initialBoard, as well as the number of generations, numGens. To play the Game of Life over numGens generations, we simply need to know how to update a given pattern into its next generation, and apply this approach numGens times.

PlayGoL(initialBoard, numGens)
boards ← array of numGens + 1 game boards
boards[0] ← initialBoard
for every integer i from 1 to numGens
boards
[i] ← UpdateBoard(boards[i–1])
return boards

Note that we have not specified what the type of the Game of Life “game board” should be, and yet we can already write the highest-level function PlayGoL. We already have a notion of an array as a table of values; arrays can also be two-dimensional (or higher-dimensional) as well. In this case, it makes sense to use a two-dimensional array of boolean variables in which true corresponds to “alive” and false corresponds to “dead”.

A bit about arrays

Before continuing with implementing UpdateBoard, we should make a few remarks about working with two-dimensional arrays. Given such an array a, the notation a[r] typically refers to all of the elements in row r of a, and a[r][c] refers to the single element in row r and column c. Continuing 0-based indexing, for an array with n rows and m columns, the rows are indexed from 0 to n-1, and the columns are indexed from 0 to m-1.

A 7 x 4 array (denoted a) with seven rows and four columns. The rows are indexed 0 to 6, and the columns are indexed 0 to 3; as a result, the highlighted element is referred to as a[1][2]. This array can be thought of as a one-dimensional array of length 7, where every element is a one-dimensional array of length 4.

To range over all the rows of an array a with n rows and c columns, we would encounter the following pseudocode.

for every integer r between 0 and n-1
access a[r]

We will think of the row a[r] as a one-dimensional array. To range over all the elements of a[r], we would let an integer c range from 0 to m-1 and access a[r][c]. This provides us with an insight into how to access all of the elements of a: we range over every row r, and then for each value of r we range over every column value c.

for every integer r between 0 and n-1
for every integer c between 0 and m-1
access a[r][c]

Updating a Game of Life board

Now that we know how to range over the elements of a two-dimensional array, we will apply what we have learned to write UpdateBoard. Continuing our top-down design, just as we play the Game of Life one generation at a time, we update a game board one cell at a time. In the pseudocode below, InitializeBoard takes integers numRows and numCols as input and returns a two-dimensional game board in which all values are set to false. (We will write this function when implementing our work in a specific language.)

UpdateBoard(currentBoard)
numRows CountRows(currentBoard)
numColsCountCols(currentBoard)
newBoardInitializeBoard(numRows, numCols)
for every integer r between 0 and numRows – 1
for every integer c between 0 and numCols – 1
newBoard[r][c] ← UpdateCell(currentBoard, r, c)
return newBoard

STOP: Note that UpdateBoard uses subroutines CountRows and CountCols. These functions are usually built into languages, but how could you write them yourself?

Note that we have not yet applied any of the rules of the Game of Life. We have also not handled any of the cells on the boundary of the board. Delaying critical details like this tends to cause introductory programmers unease; isn’t it prudent to focus on the trickiest details first? Yet we will see that these details often wind up being easier than anticipated if we delay them to lower levels of the function hierarchy.

We now will write UpdateCell, which takes a game board currentBoard as input along with integers r and c. It returns true or false depending on whether currentBoard[r][c] will be alive or dead in the next generation, respectively. We will apply the rules of the Game of Life in UpdateCell by calling a subroutine CountLiveNeighbors. Note that we still have not handled the boundaries of the board!

UpdateCell(currentBoard, r, c)
numNeighborsCountLiveNeighbors(currentBoard, r, c)
if currentBoard[r][c] = true (current cell is alive)
if numNeighbors = 2 or numNeighbors = 3 (propagation)
return true
else
(lack of mates/overpopulation)
return false
else
(current cell is dead)
if numNeighbors = 3 (zombie!)
return true
else
(rest in peace)
return false

We are now three levels beneath the PlayGoL function that we will use to solve the Game of Life Problem. We will have solved this problem if we can write CountLiveNeighbors, which returns the number of live neighbors in the Moore neighborhood of currentBoard[r][c].

Counting live cells in a neighborhood

The cells in the Moore neighborhood of the cell in row r and column c, with indices labeling the rows and columns.

The Moore neighborhood of a cell in row r and column c contains cells between rows r-1 and r+1 and between columns c-1 and c+1. To range over these cells, we should apply the same idea that we used when ranging over an entire array: range over the row indices using a for loop, and within this for loop range over the column indices.

CountLiveNeighbors(currentBoard, r, c)
count ← 0
for every integer i between r–1 and r+1
for every integer j between c–1 and c+1
if currentBoard[i][j] = true
countcount + 1
return count

STOP: Unfortunately, this function is wrong. How can we fix it?

Consider the two for loops in the CountLiveNeighbors function as currently written.

for every integer i between r – 1 and r + 1
for every integer j between c – 1 and c + 1

At some point in these for loops, i will be equal to r and j will be equal to c. Yet this is the cell currentBoard[r][c] under consideration! To avoid including a cell in its own Moore neighborhood, we should only consider currentBoard[i][j] in CountLiveNeighbors if this cell is not currentBoard[r][c]; that is, if ir or j c. This produces the updated function shown below.

CountLiveNeighbors(currentBoard, r, c)
count ← 0
for every integer i between r – 1 and r + 1
for every integer j between c – 1 and c + 1
if ir or jc
if currentBoard[i][j] = true
countcount + 1
return count

STOP: This function is still wrong. What else is missing?

You are now permitted to be concerned that we have not handled the boundary of the board. Consider what would happen if we call CountLiveNeighbors for r = 0. Because we range i between r-1 and r+1, i will start as -1, and we will be trying to access cells of the form currentBoard[-1][c], which don’t exist! As a result, our program will encounter an error. We will encounter similar problems if c is equal to 0 or if r or c is equal to their maximum possible values.

It may seem that the issue of handling the cells on the boundary of the board will cause our pseudocode to come crumbling down around us. Will weneed to change the entire function hierarchy?

We will be able to patch the hole in our pseudocode by adding a subroutine. The following function called InField takes currentBoard along with row and column indices i and j; it returns false if the cell currentBoard[i][j] lies outside of the board and true otherwise.

InField(currentBoard, i, j)
numRowsCountRows(currentBoard)
numColsCountCols(currentBoard)
if i < 0 or i > numRows-1 or j < 0 or j > numCols-1
return false
else
return true

STOP: The InField function above tests whether a cell is not in the field, returning false if it is not and true if it survives this test. Rewrite InField by adjusting the if statement to test if currentBoard[i][j] lies within the boundaries of the board; if so, return true, and if not, return false.

Now that we have written InField, we will simply add it asa subroutine call to CountLiveNeighbors to ensure that before we try to access the cell currentBoard[i][j], this cell is within the board.

CountLiveNeighbors(currentBoard, r, c)
count ← 0
for every integer i between r – 1 and r + 1
for every integer j between c – 1 and c + 1
if (ir or jc) and InField(currentBoard, i, j) = true
if currentBoard[i][j] = true
countcount + 1
return count

And just like that, we have solved the Game of Life problem, whose complete function hierarchy is shown in the figure below. One of the drawbacks of top-down design is that because we delay the details until the end of our work, there may be a sense of anticlimax when we have solved the problem at hand.

The function hierarchy for the Game of Life.

Printing Game of Life boards

Top-down design is powerful becauses it leverages modularity of writing code with many functions in an organized manner. Another important strength of code modularity is the division between data generation and data visualization. Beginner programmers sometimes conflate these two concepts, trying to visualize data at the same time as it is being generated. In this case, we can generate the data corresponding to a collection of Game of Life boards, and once these boards are generated, we will handle the visualization of the generations as a separate process.

We will leave more sophisticated image rendering to individual programming languages, but we will point out that languages do share a built-inPrint function that prints a given string to the console. If we wanted to print a collection of Game of Life boards to the console one at a time, then we could proceed top-down again, dividing the task into many calls to a PrintBoard subroutine that prints a single board.

PrintBoards(boards)
for every integer i from 0 to len(boards) - 1
PrintBoard(boards[i])

In turn, the PrintBoard function ranges over all the rows of a given two-dimensional array board, and prints each row by calling a separate PrintRow subroutine. We add a blank line after printing the board so that if we are printing multiple boards, we can distinguish them.

PrintBoard(board)
for every integer j from 0 to CountRows(board) - 1
PrintRow(board[j])
Print(new line)

We now write our function PrintRow, which ranges over every cell in a given row and prints the corresponding cell. It then prints a new line symbol so that if we call PrintRow multiple times as part of printing a board, we can distinguish the rows.

PrintRow(row)
for every integer k from 0 to len(row) - 1
PrintCell(row[k])
Print(new line)

Finally, we must write a function to print a single cell, which simply tests whether the cell is true or false and prints a black square or white square accordingly.

PrintCell(value)
if value = true
Print("⬛")
else
Print("⬜")

To build a more sophisticated visualization and produce an animated GIF for a given initial Game of Life configuration, please join us for the coding lecture linked below. In the meantime, we will return to von Neumann’s original goal of finding a self-replicating cellular automaton. Does such an automaton exist?

Langton’s Loops

A short history of self-replicating automata

In 1952, almost two decades before the discovery of the Gosper gun for the Game of Life, von Neumann discovered a self-replicating cellular automaton. Yet the Game of Life enjoys widespread popularity, while von Neumann’s cellular automaton is largely unknown. Why would this be?

Part of the reason is that the Game of Life exhibits an enormous amount of complex behavior from five very simple rules and while only allowing two states for cells (alive and dead). In contrast, von Neumann’s self-replicating cellular automaton had 29 states and 130,622 cells. This automaton hardly meets von Neumann’s desired simplicity in self-replication; can we find a smaller pattern with fewer states?

This question remained open for over three decades. In the meantime, Edgar Codd produced a self-replicating cellular automaton with only eight states in 1968, but the pattern exhibiting the self-replication had nearly 300 million cells. Only in 1984 did Christopher Langton manage to find an 8-state self-replicating cellular automaton with only 86 cells that has come to be called Langton’s loop. The initial pattern is the header image of this chapter.

Formalizing the rules of an automaton

In what follows, we wish to implement Langton’s loop and visualize its self-replication for ourselves. But it seems silly to start the whole process over. Certainly, some of our work on solving the Game of Life Problem will generalize to another cellular automaton?

Better yet, why don’t we devise an algorithm that implements any two-dimensional cellular automaton? We can state this goal as a computational problem.

Cellular Automaton Problem

Input: An initial configuration of a cellular automaton initialBoard, an integer numGens, and a collection of rules rules for the automaton.

Output: All numGens + 1 configurations of the automaton over numGens generations starting with initialBoard and following the rules indicated in rules.

This chapter focuses on top-down design, but solving the Cellular Automaton Problem will allow us to broaden a solution that we have for one problem when presented with a more general version of the problem. It is common in practice to get a simple version of a program working before attempting to extend it to full generality.

As we move from two states to an arbitrary number of states, we will need to generalize the type of the automaton from a two-dimensional array of boolean variables to a two-dimensional array of integers. This generalization is the easy part, but it is unclear how to represent the rules of a cellular automaton so that there is some set of rules encoding any possible automaton.

STOP: How could we succinctly summarize the rules of the Game of Life? What about the rules of an arbitrary cellular automaton?

One way that we could succinctly represent the rules of a two-state automaton like the Game of Life would be with what is known as BySx format. In this format, y is the number of live neighbors needed for “birth” (i.e., for a dead cell to become alive), and x is the number of live neighbors for “survival” (i.e., for an alive cell to remain alive). The BySx format for the Game of Life is B3S23; if we know that birth can only happen with three live neighbors, and that survival can only happen with two or three live neighbors, then the conditions for a cell dying or remaining dead can be inferred immediately.

Yet in a generalized cellular automaton we will have more states, not to mention rules that may be even more general. A rule may consider not just the number of cells with a given state in the neighborhood of a cell, but their configuration as well. For example, a two-state automaton might update a given live cell with two live neighbors according to the conditions shown in the figure below.

STOP: How can we represent each of the two rules in the figure below succinctly?

(Left) Two different configurations of two live cells in the Moore neighborhood of a given live cell in a hypothetical two-state cellular automaton. (Right) The top rule keeps this cell alive, whereas the bottom rule kills it. The other eight cells in the Moore neighborhood are labeled with question marks since their own neighborhoods are unknown.

To represent all of the different ways that a given cell can be updated based on its Moore neighborhood, we will use a rule string containing ten characters. The first character encodes the state of the cell under consideration. The next eight characters encode the states of the neighbors of the cell (which we will read clockwise from top left), and the final character encodes the new state of the central cell. For example, using “0” to denote “dead” and “1” to denote “alive”, the rule strings in the figure above are represented as “1000010011” and “1001000010”, respectively.

STOP: Interpret the rule string “0111001100”. How many rule strings does a given two-state cellular automaton have? How many different possible two-state cellular automata “games” are there?

There are eight cells in the Moore neighborhood of a given cell. In a two-state cellular automaton, each of these cells could be either alive or dead, with each possible assignment of the two statestrue and false to a cell and its Moore neighborhood corresponding to a different rule. As a result, a given two-state automaton has 2⁹ = 512 total rule strings.

For each rule string, the automaton has one of two outcomes based on whether the cell will be alive or dead in the next generation. Because these choices are independent, we obtain 2⁵¹² total two-state cellular automata, an enormous number (for comparison, there are about 2³⁰⁰ atoms in the known universe). There are undoubtedly many fascinating two-state automata that we will never discover.

STOP: How many different possible k-state cellular automata “games” are there?

von Neumann neighborhoods

We have moved our target from programming a single cellular automaton to programming any of a cosmically large number of automata. And before we undertake this goal, we will make the problem even more general by introducing a new neighborhood used by many automata, including Langton’s loop. In the von Neumann neighborhood of a cell, we will only consider a cell’s four adjacent cells (see figure below).

A cell’s eight-state Moore neighborhood of a cell (left) and its four-state von Neumann neighborhood (right).

We should also adapt our rule strings to accommodate cellular automata with von Neumann neighborhoods. These rule strings will have six elements, where the first and last elements encode the current and updated state of the cell under consideration, respectively. The middle four symbols encode the states in the four cells of the von Neumann neighborhood of the given cell. We will read these cells clockwise, starting with the top neighbor (see figure below).

Two hypothetical rules and their associated rule strings for a two-state cellular automaton whose rules are based on the von Neumann neighborhood of a given cell.

STOP: How many different possible k-state cellular automata are there if we assume a von Neumann neighborhood?

Now that we have made rigorous the notion of rule strings for two types of neighborhoods, we can state an even more general Cellular Automaton Problem.

Cellular Automaton Problem

Input: An initial configuration of a cellular automaton initialBoard, an integer numGens, a string neighborhood indicating the type of neighborhood considered, and a collection of strings ruleStrings indicating the rule strings of the automaton.

Output: All numGens + 1 configurations of this board over numGens generations starting with initialBoard, following the rules indicated in ruleStrings and using the indicated neighborhood.

Implementing an automaton builder

We need to generalize our function hierarchy for PlayGoL to a function hierarchy for a function PlayAutomaton solving the Cellular Automaton Problem. This function will need to take two additional parameters: nbrhood is a string that will either be “vonNeumann” or “Moore”, and ruleStrings is a list (i.e., array) of strings containing the rule strings for the automaton being considered. Using PlayGoL as a model, we obtain the following function, where we pass nbrhood and ruleStrings down to the UpdateBoard subroutine.

PlayAutomaton(initialBoard, numGens, nbrhood, ruleStrings)
boards ← array of numGens + 1 game boards
boards[0] ← initialBoard
for every integer i from 1 to numGens
boards
[i] ← UpdateBoard(boards[i – 1], nbrhood, ruleStrings)
return boards

As for UpdateBoard, we will apply the same idea, passing both of the new parameters down into an appropriate subroutine and delaying details for as long as we can. For this implementation of UpdateBoard, we will only update cells that are on the interior of the board, meaning that we will start row and column indexing at 1 instead of 0 and end it one row/column early. We ignore boundary cells because the rule strings for boundary cells will need to be shorter than rule strings for interior cells; even Langton ignored dealing with boundary cells in his automaton, assuming instead that the board would be large enough.

UpdateBoard(currentBoard, nbrhood, ruleStrings)
numRows CountRows(currentBoard)
numColsCountCols(currentBoard)
newBoardInitializeBoard(numRows, numCols)
for every integer r between 1 and numRows – 2
for every integer c between 1 and numCols – 2
newBoard[r][c] ← UpdateCell(currentBoard, r, c, nbrhood, ruleStrings)
return newBoard

STOP: we could also handle boundary cells by having neighborhoods “wrap around” the sides of the board. For example, cells on the bottom of the board will have cell(s) in the top row as their neighbor. (Note: this is a challenging exercise.)

To update a single cell, we will range through all of the rule strings, looking for the rule string that applies to the cell and its neighborhood. We will check whether a given string matches the cell’s neighborhood by passing the work to a subroutine called RuleMatch. This subroutine will need to know the neighborhood type as well as the given rule we are checking, as well as the current board and the row and column index of the cell under consideration. If we find a matching rule string rule, then we know that its final symbol (occupying position len(rule)-1 of rule) is the state that the current cell should have in the next generation, and so we will return this value.

UpdateCell(currentBoard, r, c, nbrhood, ruleStrings)
for every integer i from 0 to len(ruleStrings) – 1
rule ruleStrings[i]
if RuleMatch(currentBoard, r, c, nbrhood, rule) = true
return
rule[len(rule) – 1]

STOP: How could we have organized the rule strings using a dictionary instead?

We have made our work much more manageable since we have moved from needing to consider many rules to just a single rule in RuleMatch. The format of this rule will differ depending on the neighborhood type, so let’s write a function that simply splits RuleMatch into two subroutines based on the neighborhood that we are considering. We could have made this split much earlier (i.e., higher in the function hierarchy), but it would have meant duplicating much of our work for both von Neumann and Moore neighborhoods unnecessarily.

RuleMatch(currentBoard, r, c, nbrhood, rule)
if nbrhood = "vonNeumann"
return RuleMatchVonNeumann(currentBoard, r, c, rule)
else if nbrhood = "Moore"
return RuleMatchMoore(currentBoard, r, c, rule)

Writing each of these subroutines is an exercise in boredom and in understanding the order in which we consider the elements of a rule string. In either case, if the initial element of a given rule string rule does not match the current cell board[r][c], then we should return false. If we find that any of the next four (von Neumann) or eight (Moore) cells do not match the appropriate next four elements of rule,then we should also return false. Much like InField, if we survive all of these tests, then we can return true.

RuleMatchVonNeumann(board, r, c, rule)
if board[r][c] ≠ rule[0]
return false
if
board[r–1][c] ≠ rule[1] or board[r][c+1] ≠ rule[2] or board[r+1][c] ≠ rule[3] or board[r][c–1] ≠ rule[4]
return false
return true

The RuleMatch function for Moore neighborhoods, which is even more utilitarian, is shown below.

RuleMatchMoore(board, r, c, rule)
if board[r][c] ≠ rule[0]
return false
if
board[r–1][c–1] ≠ rule[1] or board[r–1][c] ≠ rule[2] or board[r–1][c+1] ≠ rule[3] or board[r][c+1] ≠ rule[4] or board[r+1][c+1] ≠ rule[5] or board[r+1][c] ≠ rule[6] or board[r+1][c–1] ≠ rule[7] or board[r][c–1] ≠ rule[8]
return false
return true

We will pause to make the point that it would be easy to incorporate bugs into a function like RuleMatchMoore or RuleMatchVonNeumann because of a typo. One of the many positive benefits of writing well-planned, modular functions is that it is much easier to test and debug a short function than a long one. In this case, you could imagine testing RuleMatchMoore and RuleMatchVonNeumann by giving them some sample rules for a few small 3 x 3 game boards, and ensuring that they return the correct values.

Moreover, when testing your PlayAutomaton code, you should start by testing RuleMatchMoore and RuleMatchVonNeumann since they do not call any subroutines (i.e., they are at the bottom of the function hierarchy), and so any bug must be present within the confines of the function. If you find that the lowest-level functions in the hierarchy are working, then proceed upwards in the hierarchy until you find a function that is not working. In other words, whereas we often program top-down, it is helpful to debug and test our code bottom-up.

Visualizing Langton’s loop

Now that we can build any cellular automaton, what of Langton’s loops? If you are interested in coding all this yourself in Go, join us in the following video. We also provide a GIF of Langton’s loop beneath the video.

You might be surprised that we can implement all this in two hours. Join us and code along!

Take a moment to reflect that this simple pattern took 30 years to discover. Notice also that the cells not only self-replicate, but after a period of time, they enter a “death” state in which they no longer have cells that change state. In this sense also they mimic life.

Langton’s lovely loops.

Conclusion

One of the enduring qualities of Langton’s loop is how lifelike the loops seem. We would not be unreasonable to notice that the loop itself vaguely resembles a cell, or that its three-cell wide outer shell is reminiscent of a biological molecule like DNA that has a two-sided exterior with information represented inside the molecule.

In fact, Langton’s work in the mid-1980s helped launch an entire field of research called “artificial life” that examined whether we can use models to mimic the complex systems of life. For example, could we build a computational model that simulates the conditions on Earth four billion years ago and that produces early life forms only from chemical reactions?

Unfortunately, the aims of artificial life are so grand that most of the work in the field may not be possible. You likely have never heard of this field because after some early excitement, it largely failed to produce any substantive research results and mostly flamed out in the 1990s. Artificial life offers an excellent example of what can happen to a perfectly reasonable area of scientific interest when it becomes oversold and conflated with ideas that it should not include. Artificial intelligence offers a comparable potential example of such a field in the present day, where genuine research at the forefront of the field sometimes suffers because most of what is being sold to the general public as artificial intelligence is either elementary statistics or unadulterated bullshit.

Finally, while philosophizing, we would return to the start of this chapter and the Fermi paradox. Perhaps we have not seen self-replicating space probes not because alien life isn’t out there, but because other civilizations find self-replication just as difficult to achieve as we do. If it took 30 years just to discover a simple self-replicating cellular automaton, maybe there is something magical about the fact that the trillions of cells in your body achieve self-replication every day without fanfare.

Or maybe the resolution to the Fermi paradox is something much simpler. Maybe our lust for exploration makes us a peculiar species. Maybe the aliens know how to build von Neumann probes, but they are too busy enjoying existence to be bothered.

--

--

Phillip Compeau
Programming for Lovers

Associate Teaching Professor and Asst Dept Head for Education in the Computational Biology Department in Carnegie Mellon University’s School of Computer Science