# Coding Stories: Playing with Patterns

Finding an idea that combines artistic beauty with meaningful computation can be challenging. Some ideas lead to evocative visuals but lack a clear narrative of learning and discovering. Other ideas are conceptually meaningful but don’t seem to lend themselves to artistic expression. Finding this connection is often about exploring an interesting pattern. I think this is the case because strong programming involves noticing patterns and the same can be said about successful works of art.

A few weeks ago a student came by my office to ask about enrolling in my advanced computer science class. I asked him what sparked his interest. He told me about a research project he was doing in math class on the Collatz conjecture. He said that he used programing to “prove” the conjecture and he wanted to do more work like this by taking my class. I said that sounded like a great reason to take the course and inquired further about the problem. It turns out that the Collatz conjecture involves following a sequence of numbers that are produced by a simple rule.

`If n is even n->n/2If n is odd  n->3*n+1`

The conjecture’s name comes from Lothar Collatz who made the claim that regardless what number you start with, if you follow the sequence it will eventually generate 1. Unfortunately, it has yet to be proven mathematically. In fact Paul Erdős said about the Collatz conjecture: “Mathematics may not be ready for such problems.” at least according to the wikipedia page. To give you a better sense of the sequence here is an example:

`18, 9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1, (4, 2, 1) repeating`

I started searching the problem online and realized that it is included in a set of problems that I really enjoy called project Euler. It is listed as problem #14. In this version of the problem it asks you to find the longest sequence that starts with one of the integers between one and one million.

I decided that this would be a good assignment for my CS class. The students were able to develop an algorithm to construct the sequence for each of the million starting numbers. We call this approach brute force because computers can accomplish trillions of calculations every second and therefore what at first appears to be an onerous task is suddenly quite doable. But to get a computer to do this work for you it needs to be written in a language that a computer can understand. Here is the Python code we used.

`def collatz_len(n): count=1 while n!=1: n = n//2 if n%2 == 0 else n*3+1 count+=1 return countmax(collatz_len(n) for n in range (1,10**6))`

Solving a problem using brute force sometimes feels like cheating because the computer is doing all the work. However, as the programmer you need to figure out how to set things up so that the computer can do the work. The function collatz_len() steps through each number in the Collatz sequence and returns the number of steps. Solving challenges like this help you improve your fluency with the computer language. Organizing code ultimately requires clear thinking. It also allows you to look at a problem in new ways. In class we had a debate about the line

`n = n//2 if n%2 == 0 else n*3+1`

I argued that it might be more clear to write it as follows

`if n%2 == 0: n=n//2else: n=n*3+1`

But the students voted to keep the code on one line. There are many different ways to code the same algorithm. But unlike the English language there is only one correct interpretation of what it will do on your computer. Computer code is a very precise form of communication because you can test your interpretation of it against the output. This ability to engage in trial and error and test your ideas is something I find particularly compelling about computer programming.

I often engage students in debates about syntax to push them to be more observant and think deeply about algorithm design. For example, I appreciated that the students used the max() function to identify the longest sequence, but as it is written you don’t know what the starting value is. However, that can easily be fixed as follows

`max((collatz_len(n),n) for n in range (1,10**6))`

The code does find the correct answer to the problem

`(525, 837799)`

Indeed 837,799 produces the longest Collatz sequence in the given range. The problem with the program is that when you run it on Google Colab, a Jupyter notebook server space that I frequently use in class, it takes between 19s and 28s. That does not invalidate the approach but I could not help thinking that we could increase the efficiency of our algorithm.

So we started thinking about shortcuts. This is my favorite part of computational problem solving. Most algorithms can be improved and the key step is to look for patterns. A student had the clever idea to print out some sequences to see if there are patterns. So we revised our code to output more info.

`def collatz(n): while n!=1: print(n,end=”, “) n = n//2 if n%2 == 0 else n*3+1 print(n)for i in range(1,10): collatz(i)OUTPUT:12, 13, 10, 5, 16, 8, 4, 2, 14, 2, 15, 16, 8, 4, 2, 16, 3, 10, 5, 16, 8, 4, 2, 17, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 18, 4, 2, 19, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1`

We noticed that the sequence often repeats itself. For example 6 -> 3. The sequence for 3 was already computed previously and the 6 sequence repeats the same string of numbers. Similarly 9 -> 7 which had been computed above. In fact the 9 sequence captures most of the numbers listed above

`9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1`

The exception is the first 2 numbers in the 6 sequence

`6, 3, 10 … (same as 9 sequence starting at 10)`

On the board we started playing with this idea using graphs. Graphs are a collection of nodes with connections between them called edges. Here is a graph visualization of all the sequences above. I created this visual using a python package called networkx which works very nicely in Google Colab. I could not figure out how to prevent the graph from overlapping itself but you get the idea.

To improve our algorithm we can leverage prior calculations that have already been computed. It turns out that repetition is a very common occurrence in the world of programming and there is a classic way to take advantage of this. Once we calculate the length of a Collatz sequence we can store that value to be used later. This is called caching. We decided to store the sequence lengths in a list where the index of the list is the starting value of the sequence. This is how we initialized the list.

`cache = *10**6`

Now as we compute a new sequence we can check each step to see if the number is already in our cache. Then when it is computed it can be placed in the list. Here are the changes to the code that we made. With this optimization the answer to the problem can be computed in 1 second.

`def collatz_len(num): n=num count=1 while n!=1:  if n<len(cache) and cache[n]: # check if n is in the cache count += cache[n] — 1 # compute count based on cache break n = n//2 if n%2 == 0 else n*3+1 count+=1 cache[num] = count # add the current calculation to the cache return countmax((collatz_len(n),n) for n in range (1,10**6))`

The class was impressed by how much this small change improved the speed of our algorithm. I was excited to share such an important algorithm design idea without even having planned to do so. I would recommend doing more project Euler problems as each one has an important lesson to teach. I was feeling satisfied with the Python program we had created, but it felt like there was more to explore. I looked on wikipedia and I saw some inspirational visualizations of the sequence. I also found a simplification of the sequence. When an odd number maps to 3n+1 that always results in an even number. So the rule can skip a step jumping immediately to (3n+1) / 2

As I thought more about the problem I wondered if I could take an entirely different approach to solving it. In the current implementation I was starting at various numbers and following the sequence until it reached the number 1. Another common approach to solving computational problems is to break problems down into sub-problems. These sub-problems can then be reassembled into the solution. This is sometimes accomplished by working through the solution in reverse. This approach is called dynamic programming. In this case I will apply the Collatz rule in reverse determining which numbers would land on the current number next in the sequence. Since every trip to 1 must go through the number 8 I will begin there.

Proceeding backwards from number 8 you know that the number 16 would get divided by 2 and land on 8. So essentially I am multiplying the current number by 2. Similarly (5*3+1)/2 results in 8. In other words both 16 and 5 will have the number 8 as the next number in the sequence.

From those 2 numbers I can expand out. 10 and 3 both have 5 as the next number in the sequence and similarly 32 leads to 16.

And from there the graph expands. By solving the problem backwards I got a very different perspective on the sequence. Using Javascript, I programmed the following animation of this dynamic programming approach to the problem. Now I had found an aesthetic way to represent the problem that honors the caching of solutions to build up new pathways. I included an easing animation causing the the numbered discs to slide into place. I also decided to align the angles of the rays of doubling numbers that emerge. The distance of a number from the center indicates the number of steps in the sequence it takes to get to that number.

Programming an animation is a fun challenge because it pushes me to clarify my thinking about how I want to visualize the problem and what computational structures I will need to accomplish that goal. In this case I made each node or number an object with a .show() and .update() method. The object also stores the node that it links to so that I could draw the lines. I decided to grow the pattern one layer at a time so that I could space the outer most nodes evenly around a circle.

The amazing thing about visualizations is that they lead to important insights about a problem. Moving backwards through the sequence each number can either generate one number or maximally 2 numbers. In many of the branches it alternates between generating 2 numbers and 1 number. In branches that are divisible by 3 there is only one branch creating a single line of numbers. The beauty of the pattern is not only pleasing to the eye but tells you about the nature of the pattern and perhaps about the nature of numbers themselves.

I spent a fair amount of time watching the pattern grow. I even tried starting at different numbers. For example if you start with the number 5252 which was 19 steps back from 8 you get a very similar pattern of numbers

The Collatz conjecture is a great problem to get thinking deeply about math. It turns out that a number of teachers at my school have introduced this idea in their math classes from 4th to 12th grade. Computation is one tool that can be used to explore mathematical ideas. I find it to be a compelling one. The process of trying to program a computational idea expands my thinking in new and interesting directions. Perhaps through computation and programming we can expand everyone’s love and excitement with mathematical ideas.

--

--