Prologue: Ancient Greek Mathematics and the Foundations of Computational Thinking

Phillip Compeau
Programming for Lovers
36 min read · Oct 21, 2019

If you enjoy this work, please join my ongoing programming class for charity: https://canvas.instructure.com/enroll/RKFKKP

A Brief Introduction to Algorithms

I want this work chiefly to build fun things with programming, from an election simulator to a self-replicating cellular automaton and from a physics engine modeling gravity to an algorithm building an evolutionary Tree of Life. I have worked to choose examples that are not just fun and approachable to introductory programmers but also feel modern; after all, the concept of “computer science” as its own rigorous discipline arose recently, alongside the advent of modern computers in the middle of the 20th Century. Yet the foundations of computer science, and therefore this work, lie much farther in the past.

Specifically, this prologue will take place 2,300 years ago and center on the stories of two problems solved in ancient Greece. What makes these problems special is not that they are particularly complex, but that their solutions were beautiful, and these solutions led to two of the world’s first “nontrivial algorithms”. You may rightly be wondering what those two words mean, so before we get ahead of ourselves, let us provide some context.

An algorithm is a sequence of steps used to solve a problem. In a very broad sense, an algorithm could be as non-computational as the sequence of loops and knots that you use to tie your shoes, or the steps that you follow to bake your favorite cake. An admittedly lame party trick of the author is an algorithm that allows him to quickly provide the day of the week that anyone’s birthday falls on in the current year.

STOP: How might a human be able to quickly determine the day of the week corresponding to a day and month, without needing to memorize a great deal of information?

For a more quantitative example of an algorithm, long ago you learned a method to divide any two integers by hand, called “long division”. This algorithm, which dates to the 16th Century, exemplifies that many algorithms from before the computational era were developed to help humans perform computations by hand. Long division is so ingrained in most of us that it may seem strange to think of it, or any other arithmetical method, as an algorithm.

A particularly cute display of a learned mathematical algorithm.

A mathematician would call a standard algorithm for arithmetic computations like long division “trivial”, an unfortunate word meaning that this algorithm is based upon relatively simple or obvious mathematics. Its antonym, “nontrivial”, is reserved for less evident approaches. An example of a nontrivial algorithm will have to wait for the moment.

For now, say that we would like to compute the greatest common divisor (GCD) of two integers. Although we are currently discussing algorithms that we can complete by hand, computing a GCD is a natural task for us to assign to a computer because computers are much faster at basic computations than we are, especially as the two input numbers grow. Nevertheless, computers can only follow very precise instructions — at least for the time being! So we will need to be exact when it comes time to convert our human ideas into instructions that a computer can understand; that is, program the computer. Therefore, the first step that we will take is to specify exactly what we intend the computer to take as input, and exactly what it should return as its output; this leads us to the following computational problem, which is a specification of input data as well as a desired output that solves the problem on the input data and that can be interpreted in only one way.

GCD Problem

Input: Integers a and b.

Output: The greatest common divisor of a and b, denoted GCD(a, b).

Our goal is to solve this problem, by which we mean developing an algorithm that produces the desired output from any input. Note that there is only one way to interpret the GCD of two integers, and so this is a well-defined computational problem.

The input integers a and b are called input variables, since they can change depending on what values we want them to have. We note that although we chose a and b, the input variables can be named essentially anything that we like. We could just as well have named them x and y, or airplane and rocket.

What we might call the trivial algorithm for computing GCD(a,b), and the one you may have learned as a student, proceeds according to the following steps.

  1. Set our largest known common divisor d equal to 1.
  2. For every integer n between 1 and the minimum of a and b, we ask two questions: Is n a divisor of a? and Is n a divisor of b? If the answer to both of these questions is “Yes”, then update the largest identified common divisor d to be equal to n.
  3. After ranging through all these integers, the current value of d must be GCD(a,b).

The trivial algorithm is also summarized in the table below.

Divisors of 378 (top row) and 273 (bottom row). Common divisors are shown in green, and we can see that the largest such common divisor is 21.

However, for very large integers, this process will quickly become impossible for a human — imagine having to test every possible divisor of two 100-digit numbers — and time-intensive even for a powerful computer.

This point, that computers have limitations, is a critical one. We will discuss the limitations of computers in greater detail soon, but for now, suffice it to say that the better the algorithms that we provide computers with, the faster they can solve our problems.

What does it mean for one algorithm to be “better” than another? Is it possible to find a GCD of two numbers without having to consider all of their divisors? And how do the ancient Greeks fit into all this? We hope that we have intrigued you sufficiently to read on and find out!

A Painless Introduction to Pseudocode and Control Flow

Finding the minimum of two numbers

We will soon return to the problem of computing a GCD, but for now we should provide more context of how we are going to program a computer. There are dozens of programming languages that allow us to convert our ideas into computer instructions, and each language has its own quirks; a unifying thread of these languages is that they all take time to learn. We want to get started teaching computational ideas as quickly as we can, and so we will place learning a specific language on hold for now. Instead, we will use a way of describing algorithms called pseudocode that has many of the features of programming languages without getting mired in details. It is best to illustrate pseudocode with a simple example in the form of the following computational problem.

Minimum of Two Numbers Problem

Input: Numbers a and b.

Output: The minimum value of a and b.

You might think that this problem cannot possibly be of any computational interest — we simply need to look at the two numbers and see which one is smaller! We will soon see that even finding the minimum value in a collection of numbers can indicate important computational principles.

We will think of an algorithm solving a computational problem as a function that takes in the input variables specified by the problem, processes these variables in some way, and then returns the output solving the problem. This notion of a function parallels the mathematical view of a function, which is commonly taught as a “machine” that takes one or more variables as input and maps the input variables to one or more output values. For example, the mathematical function f(x,y) = (x + 3, y − 4, x · y) takes two input variables and returns a triple of values.

For the Minimum of Two Numbers Problem, we are looking for a function that we will call Min2 that takes two input number variables a and b and returns their minimum. A seminal programming tool is the ability to test a condition and branch a computer program into different possibilities based on this condition. In this case, we want to test if a is smaller, in which case we want to return it; otherwise, we want to return b. We can represent such a test in pseudocode by an if statement, where we take one action if a statement is true and another action if it is not. For example, below is pseudocode for the function Min2.

Min2(a, b)
    if a > b
        return b
    else
        return a

The first line of Min2 names the function and names the input variables. The second line tests if a is larger than b. If this is true, then the program enters line 3 and returns b. An important point is that the program will now terminate, as a program ends as soon as it encounters a return statement. On the other hand, if a is not larger than b, then the program does not execute line 3 but instead continues to line 4, where the else indicates that we should enter the indented line following the else. Since a is not larger than b, we know that a is less than or equal to b, and so we can safely return a (and halt the program).

STOP: Does Min2 still return the desired answer if a and b are equal?
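Since we have not yet committed to a programming language, here is a sketch of how Min2 might look in Python (an illustrative choice of language; the lowercase name min2 avoids clashing with Python’s built-in min):

```python
def min2(a, b):
    # Return the smaller of a and b; when a equals b,
    # the else branch returns a, which is the same value as b.
    if a > b:
        return b
    else:
        return a
```

Note that when a and b are equal, the test a > b fails, so the function returns a; since a = b, this is still the correct minimum.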

The if statement in Min2 offers an instance of a commonly occurring way of writing if statements, which we represent below. The following hypothetical function first executes instructions A. If condition X is true, then the function executes instructions Y (called the if block); if the function doesn’t encounter a return statement, then it continues onward to execute instructions B. On the other hand, if condition X is false, then the function executes instructions Z (called the else block); if it doesn’t encounter a return statement in those instructions, then it executes instructions B.

SomeFunction(parameters)
    execute instructions A
    if condition X is true
        execute instructions Y
    else
        execute instructions Z
    execute instructions B

Although Min2 is a simple function, it allows us to emphasize control flow, or the sequence of steps that we take through a function. Understanding control flow is central to a mastery of computer programming because computer programs can only execute a single instruction at a time (more on this much later in this work). In fact, implementing more complicated algorithms in this book will require us to understand the control flow of a program with hundreds of lines.

You may also be curious why certain words are typeset in bold in pseudocode. These words are called keywords; in most programming languages, they have reserved meanings. For example, a programming language would never let us use the word if to represent a variable, or return as a function name.

Subroutines: functions as building blocks

We will now tweak our ongoing problem by adding just a single input variable; try writing a function in pseudocode to solve it!

Minimum of Three Numbers Problem

Input: Numbers a, b, and c.

Output: The minimum value of a, b, and c.

We provide a solution to the preceding exercise below in the form of a function Min3. If the control flow of this function is unclear, don’t worry — we will walk through it line by line.

Min3(a, b, c)
    if a < b
        if a < c
            return a
        else
            return c
    else
        if b < c
            return b
        else
            return c

After line 1 names the function and specifies the three input variables, line 2 provides an if statement testing whether a or b is smaller.

STOP: If a < b, then what can we conclude about the minimum of a, b, and c?

If a is smaller than b, then we can immediately deduce that b cannot be the minimum of a, b, and c. As a result, we only need to determine which of a and c is smaller: the result must be the minimum of all three parameters. Lines 3–6 of Min3 represent the if block; in this block, we check which of a and c is smaller and return the smaller value:

    if a < c
        return a
    else
        return c

These lines represent a nested if statement within another if block in the Min3 function. This example indicates how powerful branching can be within a computer, since we can produce as many different possibilities as we like.

On the other hand, if a is not smaller than b, then we know that the minimum value must be contained by b or c. If this is the case, then we enter the else block on lines 8–11 of Min3 and return the smaller of b and c:

    if b < c
        return b
    else
        return c

This is a perfectly reasonable way to write a function finding the minimum of three numbers, but we point out that it has become somewhat involved for what should be such a simple task. We also would ask you to look again at the if block and the else block in Min3; what do they remind you of?

Part of the reason why we have separately discussed the if block and the else block in Min3 is that they are both performing the same task, that of finding the minimum of two numbers. But we have already completed this task when we wrote Min2! How can we use what we have already done?

A fundamental programming paradigm is the ability to use functions as building blocks to construct more complicated functions. Our ongoing example illustrates this concept well. We know that Min3 needs to know the minimum of two numbers in the two highlighted blocks above. We can instead greatly shorten Min3 by using the value returned by the function Min2. In general, a function that occurs within the body of another function is known as a subroutine, and in this case we say that we call Min2 as a subroutine.

Min3(a, b, c)
    if a < b
        return Min2(a, c)
    else
        return Min2(b, c)

We should take a moment to explain how the control flow works when one function calls another as a subroutine. Say that we enter the if block in Min3 and reach the following line:

        return Min2(a,c)

Although a function would normally immediately stop at a “return” statement, Min3 must wait when it encounters the call to Min2. In the meantime, the control flow passes to Min2, with input variables of a and c. If a is smaller, then this subroutine returns the value of a; otherwise, it returns the value of c. Only when this subroutine returns the value are we able to use this value and continue on in Min3. In this case, this value is used in the return statement and Min3 terminates.

STOP: Write a function in pseudocode to find the minimum values of four input number variables. How can we use any of the functions that we’ve already written as a subroutine? If you’re up for a challenge, try writing your function using only two lines.

One answer to the previous problem is to note that the minimum of four numbers a, b, c, and d can be found by first taking the minimum of a, b, and c, and then comparing the result to d. After this comparison, the smaller of the two numbers must be the minimum of a, b, c, and d.

Min4(a, b, c, d)
return Min2(Min3(a, b, c), d)
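To see the subroutine idea in running code, here is a Python sketch of all three functions (the lowercase names min2, min3, and min4 mirror the pseudocode; Python is an illustrative choice):

```python
def min2(a, b):
    # The smaller of two numbers.
    return b if a > b else a

def min3(a, b, c):
    # If a < b, then b cannot be the minimum, so compare a and c;
    # otherwise a cannot be the minimum, so compare b and c.
    if a < b:
        return min2(a, c)
    else:
        return min2(b, c)

def min4(a, b, c, d):
    # Functions compose: the value returned by min3 feeds into min2.
    return min2(min3(a, b, c), d)
```

The one-line body of min4 shows how the output of one subroutine can be passed directly as the input of another.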

The Doomsday algorithm, and else if

Before continuing, we will use the author’s party trick from the beginning of the chapter to introduce an important variant of the if statement. When students are asked how they think this algorithm might work, one of their most common ideas is to memorize the first day of the year, and then count how many days have elapsed between this first day and a given birthday. The problem with this approach is that although it would not take long to compute the day of the week for a birthday in January or February, birthdays later in the year would become intensive to calculate.

Students then often say that we could memorize the day of the week for the first day of every month, and then use this information to count forward in a given month. For example, if April 1st falls on a Sunday, then we would automatically know that the 8th, 15th, 22nd, and 29th also fall on Sundays, so that it would be easy to compute that April 24th falls on a Tuesday by counting two days ahead from the 22nd.

This idea is on the right track, but there is still an inefficiency with this approach, which is that the first day of every month will fall on different days of the week, meaning that we have to remember quite a lot of information in any given year. The trick is to memorize a day in each month such that all these days always fall on the same day of the week.

  • January 3rd (or January 4th in leap years)
  • February 28th (or February 29th in leap years)
  • March 0th (we pretend that this day exists)
  • April 4th
  • May 9th
  • June 6th
  • July 11th
  • August 8th
  • September 5th
  • October 10th
  • November 7th
  • December 12th

These days are called, with an unnecessary sense of foreboding, doomsdays. In 2019, they all fall on a Thursday (in 2018, they fell on a Wednesday).

These particular doomsdays are chosen because they can be memorized very quickly. We remember January 3rd, and February 28th and March 0th correspond to the same day. Each of the five other even-numbered months (April, June, August, October, and December) has a doomsday whose day equals the number of the month (4/4, 6/6, 8/8, 10/10, 12/12). As for the remaining four months, their doomsdays can be remembered via the mnemonic device “I work 9-to-5 at 7–11” (yielding 5/9, 9/5, 7/11, and 11/7). As a result, we only need to memorize four pieces of information: the January and February dates, the even-month rule, and the mnemonic.

The corresponding algorithm for finding the day of the week for a given day and month is called the Doomsday algorithm. An abbreviated pseudocode version of this algorithm is shown below.

Doomsday2019(day, month)
    if month = 1
        if day = 3, 10, 17, 24, or 31
            return "Thursday"
        else
            if day = 4, 11, 18, or 25
                return "Friday"
            else
                (and so on...)
    else
        if month = 2
            if day = 7, 14, 21, or 28
                return "Thursday"
            else
                if day = 1, 8, 15, or 22
                    return "Friday"
                else
                    (and so on...)
        else
            if month = 3
                (and so on...)
This function is undoubtedly ugly. You will note that the test for whether the month variable is equal to 2 is buried within the first else statement, and the test for whether month is equal to 3 is buried one level lower. In practice, we don’t need the “else” statements, because if month is not equal to 1, we can automatically proceed past the if block and test if month is equal to 2. This reasoning leads us to the following streamlined version of Doomsday2019.

Doomsday2019(day, month)
    if month = 1
        if day = 3, 10, 17, 24, or 31
            return "Thursday"
        if day = 4, 11, 18, or 25
            return "Friday"
        (and so on...)
    if month = 2
        if day = 7, 14, 21, or 28
            return "Thursday"
        if day = 1, 8, 15, or 22
            return "Friday"
        (and so on...)
    if month = 3
        (and so on...)

In practice, however, if we have 12 different possibilities to check, we will typically indent them all at the same level and use the keywords else if for all 11 possibilities after the first. This makes our (pseudo)code much more readable, as we can imagine jumping directly to the block corresponding to the value of month, and within that block jumping straight to the block corresponding to the value of day.

Doomsday2019(day, month)
    if month = 1
        if day = 3, 10, 17, 24, or 31
            return "Thursday"
        else if day = 4, 11, 18, or 25
            return "Friday"
        (and so on...)
    else if month = 2
        if day = 7, 14, 21, or 28
            return "Thursday"
        else if day = 1, 8, 15, or 22
            return "Friday"
        (and so on...)
    else if month = 3
        (and so on...)
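For the curious, the same idea can be sketched in Python without a long chain of else if statements; the dictionary-based design and the names below are our own illustrative choices, and the code assumes the fact stated above that every doomsday in 2019 falls on a Thursday:

```python
# Doomsday (day of the month) for each month in a common year like 2019.
DOOMSDAY = {1: 3, 2: 28, 3: 0, 4: 4, 5: 9, 6: 6,
            7: 11, 8: 8, 9: 5, 10: 10, 11: 7, 12: 12}

# In 2019, every doomsday falls on a Thursday, so an offset of 0 days
# from the month's doomsday means "Thursday".
WEEKDAYS = ["Thursday", "Friday", "Saturday", "Sunday",
            "Monday", "Tuesday", "Wednesday"]

def doomsday_2019(day, month):
    # Count forward (mod 7) from the given month's doomsday.
    return WEEKDAYS[(day - DOOMSDAY[month]) % 7]
```

Replacing the branching logic with a table lookup is a common design choice when the branches differ only in their data, not in their structure.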

Loops and a Trivial GCD Algorithm

While loops, intermediate variables, and a factorial function

To write a function to find the GCD of two numbers, we call your attention to step 2 of our proposed algorithm, which starts “For every integer n between 1 and the minimum of a and b.” How can we perform a task “for every integer” in a range? To implement this idea in pseudocode, we will first solve a simpler problem.

Factorial Problem

Input: An integer n.

Output: n! = n·(n−1)·(n−2)···2·1.

The following algorithm, which we will explain line-by-line in what follows, solves the Factorial Problem. It makes use of a control structure called a while loop.

Factorial(n)
    p ← 1
    i ← 1
    while i ≤ n
        p ← p · i
        i ← i + 1
    return p

Before describing the while loop, let us first focus on lines 2 and 3 of Factorial:

    p ← 1
    i ← 1

We have thus far worked with variables that are given as input to a function, which are sometimes called parameters. However, we can also introduce new variables as part of instructions in a function. On the above two lines, we “declare” two variables, p and i, both having value 1. We read each of these lines like an equation, in which “←” is a variant on the equals symbol; the variable on the left side of “←” receives the value of the right side of the equation. Once we have declared these two variables, we can use them: p will store the product of n numbers that we are building, and i will help us keep track of how many numbers we have multiplied thus far.

Now we will discuss the while loop, which in general takes the following form:

    while condition X is true
        execute instructions Y
    execute instructions B

The statement while X is much like if X, with one small difference. In an if statement, if X is true, then we enter the if block, and then continue on with the program. In a while loop, if X is true, then after completing the while block and executing instructions Y, we return to test X again. If X is true, then we enter the while block again. We only stop entering the while block and continue on with the rest of the function (i.e., instructions B) when X stops being true.

In the case of Factorial, say that our input variable n is equal to 4. We know that p and i both start equal to 1, and so when we reach while i ≤ n, this statement is true and we enter the while block. Within the while block, we encounter the following instructions:

    p ← p · i
    i ← i + 1

These statements may seem perplexing — how can we set a variable equal to itself plus one?

We should clarify how we read a variable assignment with ←. On the right side of ←, we are merely accessing the values of any variables we encounter; we then set the variable on the left side of ← equal to the resulting value. In this case, when p and i are both 1, we are setting p equal to 1 · 1 = 1, and we are setting i equal to 1 + 1 = 2.

We then check again if i is at most n; it is, and so we enter the while block again, setting p equal to 1·2 = 2 and i equal to 2+1 = 3. We continue in this fashion, as summarized in the table below, until p = 24 and i = 5; at this point, it is no longer true that i ≤ n, and rather than entering the while block, we skip the while block and continue on in the function. All that is left to do is return the value of p, which has the (correct) value 24 since we have multiplied 1 by every number between 1 and 4.

A table indicating the result of updating variables as we proceed through Factorial for the input value n = 4. In each row, we summarize the current values of p and i, then check if i is less than or equal to n. If so, we update p and i according to the formulas within the while block in Factorial. Once i is equal to 5, we no longer enter the while loop and return the current value of p, which is 24.

STOP: What would happen if we forgot to add the line i ← i + 1 to the Factorial function?

The answer to this exercise is quite important, so we discuss it here. If we neglected to add the line i ← i + 1, then the test i ≤ n would always be true. As a result, we would continue entering the while block over and over again, never hitting a return statement, and the algorithm would never terminate! Such a loop is called an infinite loop and is a common error; we need to exercise caution when writing loops.
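As a concrete illustration, the Factorial pseudocode above translates almost line for line into Python (an illustrative choice of language):

```python
def factorial(n):
    # p accumulates the product; i tracks the next factor to multiply in.
    p = 1
    i = 1
    while i <= n:
        p = p * i
        i = i + 1
    return p
```

Deleting the line `i = i + 1` here would produce exactly the infinite loop described above.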

For loops and another way of writing a factorial function

The concept of initializing a variable to 1 and performing some action for every value of this variable up to some upper bound appears frequently in programming. As a result, we commonly use a special loop structure called a for loop that can be easier to work with in practice. An example of a for loop is shown in the modified factorial function below.

AnotherFactorial(n)
    p ← 1
    for every integer i between 1 and n
        p ← p · i
    return p

After initializing p equal to 1, the for loop states that we are going to perform an action for every value of i between 1 and n, inclusively. This means that the first time through the for loop, i will be equal to 1, and we will increase i every time we perform the action inside the for block, which in this case just consists of setting p equal to the product of the values of p and i. We only stop entering the for block and move on to return p once we have performed the instructions in the for block n times, which means that the steps taken by AnotherFactorial are the same as those taken by Factorial from above. However, AnotherFactorial is a bit easier to parse, not to mention that this new function avoids the possibility of the infinite loop that we demonstrated before because we explicitly assume that i increases by 1 each time through the loop.
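Here is the same computation with a for loop in Python, where range(1, n + 1) plays the role of “every integer i between 1 and n”:

```python
def another_factorial(n):
    # range(1, n + 1) yields i = 1, 2, ..., n; the loop itself
    # advances i, so the infinite-loop mistake is impossible here.
    p = 1
    for i in range(1, n + 1):
        p = p * i
    return p
```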

In general, a for loop takes the following form, where we execute the instructions Y contained within the for block for every value of a variable satisfying some condition X:

    for every value of some variable satisfying condition X
        execute instructions Y

Although for loops make ranging through a given collection of values convenient, we should note that while loops apply to a wider class of situations, since they can test any condition. For example, we need a while loop to conceptualize the following function.

PittsburghFebruary()
    while current temperature is below freezing
        daydream about moving south

Returning to the trivial GCD algorithm

We have now learned enough about pseudocode to return to the GCD Problem and implement the “trivial” GCD algorithm that we illustrated earlier in this chapter.

STOP: Take a shot at writing pseudocode for a function TrivialGCD that takes two positive integers a and b as input and returns their GCD (you may assume any subroutines that you like).

The pseudocode below implements the trivial GCD algorithm, for which it is appropriate to use a for loop. Note that this function allows us to reuse our Min2 function as a subroutine.

TrivialGCD(a, b)
    d ← 1
    m ← Min2(a, b)
    for every integer p between 1 and m
        if p is a divisor of both a and b
            d ← p
    return d

We can see how this pseudocode clearly implements the sequence of steps that we outlined when describing the trivial GCD algorithm in English. All that remains is to discuss how to test whether one number is a divisor of another.

Given integers n and p, the integer division of n by p is performed by keeping only the integer part of the division and “throwing away” the remainder. For example, if n=14 and p=3, then the integer division of n by p is 4 (and the remainder is 2).

STOP: How can we use the remainder of integer division to test when p is a divisor of a?

STOP: Write pseudocode for functions IntegerDivision(n, p) and Remainder(n, p) corresponding to the integer division and remainder of n divided by p, with your only allowable arithmetic functions being addition, subtraction, and multiplication. Hint: use IntegerDivision as a subroutine when writing Remainder.
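Without spoiling the exercise above, we note that most languages provide built-in operators for integer division and remainder; in Python these are // and %. Using the remainder operator, the trivial GCD algorithm might be sketched as follows (the name trivial_gcd and the use of Python’s built-in min are our own choices):

```python
def trivial_gcd(a, b):
    # Try every integer p up to the smaller input, remembering the
    # largest p that leaves remainder 0 when dividing both a and b.
    d = 1
    for p in range(1, min(a, b) + 1):
        if a % p == 0 and b % p == 0:
            d = p
    return d
```

For example, 14 // 3 evaluates to 4 and 14 % 3 evaluates to 2, matching the integer division described above; a remainder of 0 is exactly the test for divisibility.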

Euclid’s Insight Leads to the World’s First Nontrivial Algorithm

Proving Euclid’s Theorem

If you are fortunate enough to be familiar with Euclid, it is because of his work in laying the foundations for much of what we know as introductory geometry 2,300 years ago. However, Euclid also devised an ingenious method for computing GCDs. In part to aid with calculations by hand, Euclid’s idea was to avoid having to compute a GCD by instead reducing the problem to an easier computation. We state his observation as a theorem, or a mathematical statement that can be proven definitively.

Euclid’s Theorem. If a > b, then GCD(a, b) = GCD(a − b, b).

There are two immediate questions that we might have about Euclid’s theorem. First, how can we demonstrate that this statement is true for all possible pairs of integers? Second, why do we really care whether it is true? That is, how can it help us implement an algorithm for computing the GCD of a and b? We will address these questions one at a time.

Rather than demonstrating that GCD(a, b) is equal to GCD(a − b, b) when a > b, we will prove a different statement: that the set of divisors shared by a and b must be the same as the set of divisors shared by a − b and b. If these two sets are the same, then certainly the largest elements in these sets, which are GCD(a, b) and GCD(a − b, b), must be the same as well.

It may seem that we have replaced one challenging task (proving that two pairs of numbers have the same largest common divisor) with a more challenging one (proving that the two pairs of numbers share all of the same divisors)! Yet we can prove this seemingly harder statement if we prove the following two statements:

  1. Any shared divisor of a and b must also be a divisor of a − b.
  2. Any shared divisor of a − b and b must also be a divisor of a.

To prove the first statement, assume that some arbitrary integer (which we will call d) is a shared divisor of a and b; we will demonstrate that d must also be a divisor of a − b. Because d is a divisor of both a and b, there must be some integers k and m such that k·d = a and m·d = b. Subtracting the two equations gives us that k·d − m·d = a − b. Factoring out d from the left side yields that (k − m)·d = a − b, and so d is also a divisor of a − b, which is what we wanted to demonstrate. We assumed nothing about d other than that it was a shared divisor of a and b, so we can conclude that every such shared divisor is also a divisor of a − b.

To prove the second statement, assume that some integer e is a shared divisor of a − b and b; we will demonstrate that e must also be a divisor of a. As before, there must be some integers p and q such that p·e = a − b and q·e = b. Certainly a is equal to (a − b) + b; so if we plug in our values for a − b and b, we obtain that a = p·e + q·e = (p + q)·e. Thus, e must be a divisor of a as well.

We can now safely conclude that the set of divisors of a and b is the same as the set of divisors of a − b and b. Therefore, GCD(a, b) = GCD(a − b, b), which is what we wanted to demonstrate.
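Although a computation is no substitute for the proof, we can sanity-check the shared-divisor argument on an example pair in Python (the helper divisors and the chosen pair are our own):

```python
def divisors(n):
    # The set of all positive divisors of n.
    return {d for d in range(1, n + 1) if n % d == 0}

# The divisors shared by a and b coincide with those shared by a - b and b.
a, b = 378, 273
shared_before = divisors(a) & divisors(b)
shared_after = divisors(a - b) & divisors(b)
```

Here both sets come out to {1, 3, 7, 21}, whose largest element is the GCD, 21.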

Euclid’s algorithm for computing a GCD

Now that we have proven Euclid’s theorem, we move to our second question — how is this old piece of mathematics helpful for designing an algorithm to compute a GCD? Note before we continue that Euclid’s theorem equally applies to the case that b > a, where GCD(a, b) = GCD(a, b − a). The idea of our updated algorithm, which we will call Euclid’s algorithm, is therefore to continually apply the statement of the theorem to obtain a GCD of smaller and smaller numbers.

We will illustrate Euclid’s algorithm with an example; say that our two input numbers are 378 and 273. We can now repeatedly apply Euclid’s theorem to find that the GCD of these two numbers is 21, as shown below:

GCD(378, 273) = GCD(105, 273)
             = GCD(105, 168)
             = GCD(105, 63)
             = GCD(42, 63)
             = GCD(42, 21)
             = GCD(21, 21)
             = 21

This example illustrates why Euclid’s algorithm is considered “nontrivial”. It prevents us from having to consider every possible integer up to the minimum of a and b; instead, we just make a handful of subtractions to obtain that the GCD is 21. We can also see how this algorithm generalizes for any input values a and b. We repeatedly apply the theorem until we obtain the same two input numbers x on the right side, since GCD(x,x) is always equal to x.

You might be wondering what the big deal is: Euclid’s algorithm was undoubtedly helpful to the ancient Greeks to compute GCDs by hand, but aren’t computers lightning-fast super-machines? Why does it matter what approach we use when programming a computer to find a GCD? We will allow you to time the two approaches and see that the speed-up provided by Euclid’s algorithm to compute the GCD of a pair of very large numbers is striking. For now, we would point you to a short lecture on the importance of speed from computer science pioneer Grace Hopper.

STOP: Brainstorm how we could write a function in pseudocode to compute the GCD of two numbers by repeatedly applying Euclid’s Theorem.

Our pseudocode implementation of Euclid’s algorithm, which we call EuclidGCD, is shown below and illustrated in the table following. Because we are repeating the same process of applying Euclid’s theorem multiple times, a while loop will be helpful. We want the algorithm to continue until a is equal to b, so we will use while a ≠ b as our loop condition. Inside the while block, we replace the larger of a or b with the positive difference of the two values. After the while loop completes, we know that a is equal to b, so we can return a, which must be the GCD of the original two values.

EuclidGCD(a, b)
    while a ≠ b
        if a > b
            a ← a − b
        else
            b ← b − a
    return a
A table indicating how two sample values of a and b are updated during EuclidGCD.
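Since the implementations accompanying this material are written in Go, here is a minimal Go sketch of EuclidGCD that mirrors the pseudocode line for line:

```go
package main

import "fmt"

// EuclidGCD computes the greatest common divisor of two positive
// integers by repeatedly replacing the larger value with the positive
// difference of the two, exactly as in Euclid's theorem.
func EuclidGCD(a, b int) int {
	for a != b {
		if a > b {
			a = a - b
		} else {
			b = b - a
		}
	}
	return a
}

func main() {
	fmt.Println(EuclidGCD(378, 273)) // prints 21
}
```

Go has no while keyword; a for loop with a single condition plays the same role.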

STOP: If we replaced the final line in EuclidGCD by returning b instead, how would it change the result of the algorithm?

Eratosthenes, Primes, and the World’s Second Nontrivial Algorithm

Who first computed the circumference of the Earth? When I ask this question in my class, I lament that at least one student will cheerily reply, “Christopher Columbus”! Yet it is rare to find someone who knows the correct answer: Eratosthenes of Cyrene, a Greek polymath living in Alexandria, who in the 3rd Century BCE devised an ingenious experiment. No one has better or more simply explained Eratosthenes’ work than the late Carl Sagan, and so we leave this task to him.

A trivial algorithm for finding prime numbers

Although this is a compelling story, and one that is not taught often enough, the reason we have introduced you to Eratosthenes is another of his interests: finding prime numbers. Recall that a positive integer larger than 1 is prime if its only divisors are 1 and itself; non-prime integers larger than 1 are called composite. The ancient Greeks were perennially looking for order in the world around them, and yet the primes stubbornly refused to follow any discernible pattern. What is so special about the number 1423 that makes it prime, while 1421 is kept out of the club?

Eratosthenes’ desire to categorize the prime numbers is a natural one to transform into a computational problem. We state the following problem for a single input integer.

Prime Number Problem

Input: An integer n.

Output: “Yes” if n is prime, and “No” otherwise.

The Prime Number Problem is an example of a decision problem, which is a computational problem that always returns a “Yes”/“No” answer. Decision problems may seem simple, but we will see later in this work that they lie at computer science’s dark heart. For now, we note that we will use the special keywords true and false to represent “Yes” and “No”. These keywords can be returned or assigned as values to a variable; a variable that can take either true or false as a value is called a boolean variable.

The following pseudocode uses the true/false keywords to solve the Prime Number Problem.

IsPrime(n)
    for every integer p between 2 and n − 1
        if p is a divisor of n
            return false
    return true

STOP: Say that we were to change the first line of the IsPrime function body to the following: for every integer p between 1 and n − 1. How would this change the IsPrime function? What if we change this line to for every integer p between 2 and n instead?

We can then run this algorithm for larger and larger input values, and in this way find as many primes as we like (see table below). However, this method would certainly be called the trivial algorithm for finding primes. Can we find a faster prime-finding algorithm?

A table of the positive integers up to n = 11; each number p is associated with “true” if p is prime and “false” otherwise.

One way to improve the trivial algorithm for finding prime numbers is to make a tweak to IsPrime. If a number x is a divisor of n, then we automatically know that there is some other divisor y such that x · y = n. If x < y, then we would always conclude that n is composite when considering x, before we ever need to consider its “partner” factor y. We also know that since x ≤ y, the largest that x can be is √n. As a result, we don’t need to consider all possible divisors of n up to n − 1; it suffices to stop at √n, and we can therefore revise IsPrime as follows.

IsPrime(n)
    for every integer p between 2 and √n
        if p is a divisor of n
            return false
    return true
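In Go, the revised IsPrime might look like the following sketch; the loop condition p*p <= n avoids computing a square root while still testing exactly the divisors up to √n:

```go
package main

import "fmt"

// IsPrime returns true if n is prime. Trial division only needs to
// test divisors up to √n: any factor larger than √n would pair with
// a factor smaller than √n that we would already have found.
func IsPrime(n int) bool {
	if n < 2 {
		return false
	}
	for p := 2; p*p <= n; p++ {
		if n%p == 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(IsPrime(1423), IsPrime(1421)) // prints true false
}
```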

STOP: Walk through the control flow of IsPrime for each integer n between 2 and 11.

Although this insight is an improvement to our approach, Eratosthenes’ idea was more clever. Yet before we describe his algorithm, which we call “the world’s second nontrivial algorithm”, we will first ask ourselves how we know that the prime numbers really do go on forever. After all, if we were to determine that there is some largest prime after which there are no more primes, then every number after it must be composite, and our hunt for primes would be complete.

It is rare to find someone who believes that the prime numbers eventually stop, but this idea is not such a radical one. If we count the prime numbers that we find up to a given point, we see that they become rarer and rarer as the integers become larger (see figure below). Who is to say that we don’t eventually run out of primes? And why, if your teacher told you that the primes go on forever, did you believe them?

A plot of integers n on the x-axis against the number of primes between 2 and n on the y-axis. The effect is subtle, but the curve has started to flatten out a bit, so that as integers get larger, the chance of finding a prime decreases.

The infinitude of the prime numbers

The proof that there are in fact infinitely many prime numbers is another one of Euclid’s gems, a result so beautiful that we must show it in case you have not seen it. (If you have seen this proof, feel free to skip this section.)

Before proving that there are infinitely many primes, we state a mathematical fact that we will use soon.

Fact: Every composite integer larger than 1 has at least one prime factor.

Why is this fact true? Consider any composite integer n; because n is composite, it has factors other than itself and 1. Consider the smallest of these factors, which we call p. There is no way that p can be composite, since if it were, it would itself have a divisor other than itself and 1, which would also be a factor of n. This cannot be true since p is the smallest factor of n.

You will see where this fact becomes useful as we replicate Euclid’s argument in the following theorem, which is just as beautiful as his first.

Theorem. There are infinitely many prime numbers.

To prove this theorem, we will employ an approach called proof by contradiction, in which we assume the opposite of what we want to demonstrate and show that it must be false. In a later chapter, we will see how proof by contradiction is vital when we travel to the aforementioned dark heart of computer science. In the case of the primes, the opposite of what we want to demonstrate is that there are finitely many prime numbers. How can we demonstrate that this is false?

Since we are assuming that there are finitely many primes, we can use the variable n to refer to the total number of these primes. Furthermore, we can label all of these primes using the notation p1 , p2 , . . . , pn . Consider the number q formed by multiplying all of the primes together:

q = p1 · p2 · · · · · pn

STOP: Is q composite? Why?

Certainly, q is composite because all of the primes pi must be divisors of q. However, consider the number p that is one more than q:

p = q + 1 = (p1 · p2 · ⋯ · pn) + 1

STOP: Is p composite? Why or why not?

Because p is clearly larger than all of the finitely many primes pi, it must be composite. Yet think about what happens when we divide each prime into p. When we divide p1 into p, we obtain p2 · p3 · ⋯ · pn with a remainder of 1. When we divide p2 into p, we obtain p1 · p3 · p4 · ⋯ · pn with a remainder of 1. And so on; any prime pi that we divide into p leaves a remainder of 1.

In short, none of the finitely many primes pi is a divisor of p. But according to the fact before this theorem, this cannot be true if p is composite. We can therefore conclude that p is prime! Since p is clearly larger than any of the primes pi, we have obtained our contradiction to there being only n primes, and we can conclude that the primes are infinite.

Factorials, arrays, and Eratosthenes’ insight

Now that we know that the primes do march on forever, we return to Eratosthenes and his desire to find primes quickly. To illustrate his insight, we will use a simpler example.

In a previous section, we wrote a factorial function. Say that we have used AnotherFactorial to compute 100!, and someone asks us what 101! is. We should feel silly if we had to start the algorithm over from scratch; after all, we have already computed 100!, and we can simply multiply 100! by 101. In other words, it makes more sense to store the intermediate factorials that we obtain in a table rather than discarding them along the way.

In general, a table or ordered list of n variables is called an array of length n. If the name of our array is a, then the first variable in the array is denoted a[0], the second variable is denoted a[1], and so on. The numbers 0, 1, and so on are called the indices of the array. You have probably noticed that we use 0-based indexing of the array, in which we start counting indices at 0 instead of 1, and which has been adopted by most modern programming languages.

STOP: What is the final index of an array of length n?

For the factorial example, we want a[0] to equal 0! = 1, a[1] to equal 1! = 1, a[2] to equal 2! = 2, a[3] to equal 3! = 6, and so on, up until a[n] = n!. Note that the length of this array is n + 1. We can state the generation of this array as a computational problem.

Factorial Array Problem

Input: An integer n.

Output: An array of length n + 1 containing the values 0!, 1!, 2!, . . . , n!

FactorialArray(n)
    a ← array of length n + 1
    a[0] ← 1
    for every integer k between 1 and n
        a[k] ← a[k−1] · k
    return a
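A minimal Go sketch of FactorialArray follows; Go's built-in slices serve as the arrays of the pseudocode:

```go
package main

import "fmt"

// FactorialArray returns a slice of length n+1 in which position k
// holds k!. Each entry reuses the previous one, so no factorial is
// ever recomputed from scratch. (Note: int overflows past 20!.)
func FactorialArray(n int) []int {
	a := make([]int, n+1)
	a[0] = 1
	for k := 1; k <= n; k++ {
		a[k] = a[k-1] * k
	}
	return a
}

func main() {
	fmt.Println(FactorialArray(5)) // prints [1 1 2 6 24 120]
}
```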

STOP: The Fibonacci numbers are the classical sequence of integers given by (1,1,2,3,5,8,13,21,…), in which the first two members of the sequence are equal to 1, and every subsequent number is equal to the sum of the two preceding numbers. Write a function FibonacciArray that takes an integer n and returns an array containing the first n Fibonacci numbers.

We can now state our problem of finding primes as an array problem.

Prime Number Array Problem

Input: An integer n.

Output: An array primeBooleans of length n + 1 such that for every positive integer p ≤ n, primeBooleans[p] is true if p is prime and primeBooleans[p] is false otherwise.

The following pseudocode implements the trivial algorithm for prime finding to solve the Prime Number Array Problem. Note that this function calls IsPrime as a subroutine.

TrivialPrimeFinder(n)
    primeBooleans ← array of n + 1 false boolean variables
    for every integer p from 2 to n
        if IsPrime(p) is true
            primeBooleans[p] ← true
    return primeBooleans

Much like our reasoning with FactorialArray, the duplication of effort in TrivialPrimeFinder is the motivation behind Eratosthenes’ idea for finding primes. The inefficiency in the trivial prime-finding approach is that once we find that a number is prime, we can automatically conclude that all multiples of this number are composite. For example, once we have determined that 1423 is prime, we automatically know that 2 · 1423 = 2846, 3 · 1423 = 4269, 4 · 1423 = 5692, and so on are all composite. In other words, every multiple of a prime number is composite. The generalization of this idea to an algorithm is called the Sieve of Eratosthenes, which is illustrated in the figure below and is sometimes still taught to students as a way of finding primes by hand.

An animation of the Sieve of Eratosthenes for n = 120. Courtesy: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#/media/File:Sieve_of_Eratosthenes_animation.gif

In this example, our largest number is n = 120. We start with p = 2 and cross off all multiples of 2, immediately concluding that almost half of the numbers considered are composite. We then look at the next number, p = 3, which is prime, and so we cross off all multiples of 3. Since p = 4 is composite, we skip it; multiples of 4 will already have been labeled composite, since they are also multiples of 2. In this manner, we cross off multiples of 5 and 7. Since 8, 9, and 10 are composite, we do not need to cross off their multiples. The next prime number would be 11, but note that 11 is larger than √120 ≈ 10.954. The multiples of 11 that do not exceed 120 stop at 11 · 10 = 110, and each of them has a smaller factor, so they will already have been labeled composite in a previous step. We can therefore halt the process and look at all remaining numbers that have not been crossed off; these numbers (11, 13, 17, 19, etc.) must all be prime.

The following pseudocode implementing the Sieve of Eratosthenes starts by assuming that every number (other than 1) is prime. At each step, it looks for the smallest prime that we have not already identified, and then crosses off all multiples of this prime (for which it employs a subroutine).

SieveOfEratosthenes(n)
    primeBooleans ← array of n + 1 true boolean variables
    primeBooleans[0] ← false
    primeBooleans[1] ← false
    for every integer p between 2 and √n
        if primeBooleans[p] = true
            primeBooleans ← CrossOffMultiples(primeBooleans, p)
    return primeBooleans

CrossOffMultiples(primeBooleans, p)
    for every multiple k of p (from 2p to n)
        primeBooleans[k] ← false
    return primeBooleans
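A Go sketch of the sieve follows the pseudocode closely. One detail worth noting: Go slices share underlying storage, so CrossOffMultiples actually updates the caller's slice in place; returning the slice simply mirrors the pseudocode.

```go
package main

import "fmt"

// SieveOfEratosthenes returns a slice primeBooleans of length n+1 in
// which primeBooleans[p] is true exactly when p is prime.
func SieveOfEratosthenes(n int) []bool {
	primeBooleans := make([]bool, n+1) // indices 0 and 1 stay false
	for p := 2; p <= n; p++ {
		primeBooleans[p] = true
	}
	for p := 2; p*p <= n; p++ { // only primes up to √n matter
		if primeBooleans[p] {
			primeBooleans = CrossOffMultiples(primeBooleans, p)
		}
	}
	return primeBooleans
}

// CrossOffMultiples marks every multiple of p from 2p onward as
// composite.
func CrossOffMultiples(primeBooleans []bool, p int) []bool {
	for k := 2 * p; k < len(primeBooleans); k += p {
		primeBooleans[k] = false
	}
	return primeBooleans
}

func main() {
	primeBooleans := SieveOfEratosthenes(11)
	fmt.Println(primeBooleans[7], primeBooleans[9]) // prints true false
}
```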

STOP: It is possible to make SieveOfEratosthenes even faster. If you are mathematically inclined, see if you can see how. (Hint: do we ever cross off integers that have already been crossed off?)

Forming a list of prime numbers

We now have two prime finding algorithms. Yet if someone were to ask us what the prime numbers up to some positive integer n are, we would currently only be able to give them an array of boolean variables, where a variable is true if the number is prime and false otherwise. It would be better to provide a list of the integers that are prime.

The list of primes can be represented by an array primes. Although there are ways of estimating the number of primes up to and including n, we will not know how long this array should be in advance. Instead, we will start primes with no elements; we will then append every prime number p found to the end of the array. After this process, primes will store the prime numbers between 2 and n in increasing order. The following pseudocode for a function ListPrimes implements this idea, using SieveOfEratosthenes as a subroutine as well as a function called append that is built into many languages. The notation primes ← append(primes, p) indicates that we take an array primes and an integer p as input, append p to primes, and then return this array, which we use to update primes.

ListPrimes(n)
    primes ← array of integers of length 0
    primeBooleans ← SieveOfEratosthenes(n)
    for every integer p from 0 to n
        if primeBooleans[p] = true
            primes ← append(primes, p)
    return primes
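Here is a self-contained Go sketch of ListPrimes; to keep it runnable on its own, the sieve is inlined rather than called as a separate subroutine. Go's built-in append grows a slice by one element and returns the (possibly relocated) slice, which is why we reassign primes on each call:

```go
package main

import "fmt"

// ListPrimes returns the primes between 2 and n in increasing order.
func ListPrimes(n int) []int {
	// Inline sieve: primeBooleans[p] is true exactly when p is prime.
	primeBooleans := make([]bool, n+1)
	for p := 2; p <= n; p++ {
		primeBooleans[p] = true
	}
	for p := 2; p*p <= n; p++ {
		if primeBooleans[p] {
			for k := 2 * p; k <= n; k += p {
				primeBooleans[k] = false
			}
		}
	}
	primes := []int{} // start with no elements, as in the pseudocode
	for p := 2; p <= n; p++ {
		if primeBooleans[p] {
			primes = append(primes, p)
		}
	}
	return primes
}

func main() {
	fmt.Println(ListPrimes(30)) // prints [2 3 5 7 11 13 17 19 23 29]
}
```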

Timing our two prime-finding algorithms

You may still be skeptical that two algorithms for something as straightforward as finding prime numbers could really be different in terms of performance. So we have already implemented TrivialPrimeFinder and SieveOfEratosthenes for you (in Go) for a variety of different input values of n, and we have compiled the results in the table below. Each row of this table also contains the speedup obtained from running SieveOfEratosthenes, which is the ratio of the running time of TrivialPrimeFinder against this faster approach.

Running times for TrivialPrimeFinder and SieveOfEratosthenes for a few different input values of n, along with the speedups provided for these values. Functions were implemented in Go 1.11 and run on a mid-2012 Macbook Pro with an Intel Core i5 Processor.
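To reproduce such a comparison yourself, you can time both approaches with Go's time package. The sketch below uses compact re-implementations of the two prime finders; the exact durations you see will depend on your machine.

```go
package main

import (
	"fmt"
	"time"
)

// isPrime is the square-root trial division test from earlier.
func isPrime(n int) bool {
	if n < 2 {
		return false
	}
	for p := 2; p*p <= n; p++ {
		if n%p == 0 {
			return false
		}
	}
	return true
}

// trivialPrimeFinder tests every integer up to n independently.
func trivialPrimeFinder(n int) []bool {
	primeBooleans := make([]bool, n+1)
	for p := 2; p <= n; p++ {
		primeBooleans[p] = isPrime(p)
	}
	return primeBooleans
}

// sieveOfEratosthenes instead crosses off multiples of each prime.
func sieveOfEratosthenes(n int) []bool {
	primeBooleans := make([]bool, n+1)
	for p := 2; p <= n; p++ {
		primeBooleans[p] = true
	}
	for p := 2; p*p <= n; p++ {
		if primeBooleans[p] {
			for k := 2 * p; k <= n; k += p {
				primeBooleans[k] = false
			}
		}
	}
	return primeBooleans
}

func main() {
	n := 1000000
	start := time.Now()
	trivialPrimeFinder(n)
	fmt.Println("TrivialPrimeFinder: ", time.Since(start))

	start = time.Now()
	sieveOfEratosthenes(n)
	fmt.Println("SieveOfEratosthenes:", time.Since(start))
}
```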

The table above manifests a general rule: the larger the input dataset we provide to two algorithms, the greater the speedup of the faster one. We hope that you now see why speed is such an important consideration when designing algorithms, especially in the modern era: the amount of data that we generate is growing at an ever-faster rate, so we need fast algorithms more than ever before.

Even given the clear need for efficient algorithms, it may not seem as though prime numbers have any relevance to modern computing. And yet the ability to find large prime numbers quickly is central to a fundamental development in computer science called public key cryptography that you probably use every day.

Conclusion: Public Key Cryptography

A fundamental property of the internet is that when you transmit a message (email, bank transaction, or even a private Google query), this message should be understood by the recipient, but not by anyone else who may be eavesdropping on the transmission. We therefore need to encrypt, or transform the message in some way, so that only the desired recipient can decrypt the transmission on the other end of the channel to reveal the original message.

Most encryption schemes that we might imagine are called symmetric, meaning that we use the same approach for encrypting and decrypting the message, called a key. For example, say that we encrypt a word by taking the next letter in the alphabet for each character, so that HELLO becomes encrypted as IFMMP. This key is symmetric because we can decrypt a message by taking the previous letter in the alphabet at each position. Although a code-breaker could quickly unravel this scheme, there is a much greater problem that arises for symmetric encryption approaches, which is that the key is private; the sender and receiver must agree upon the scheme in advance, since sending it on a public channel would defeat the purpose of encryption altogether. Private-key approaches are generally useless online, where we cannot meet the recipients of our messages in person. Private keys also present an apparent catch-22: the key itself is a message that we want to hide from eavesdroppers.
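The HELLO → IFMMP scheme is easy to sketch in Go; shift below is a hypothetical helper name, not anything from the text. Passing 1 encrypts and passing −1 decrypts with the same function, which is exactly what makes the scheme symmetric:

```go
package main

import "fmt"

// shift replaces each capital letter of word with the letter delta
// positions later in the alphabet, wrapping Z around to A. The +26
// keeps the remainder nonnegative for small negative deltas.
func shift(word string, delta int) string {
	out := []byte(word)
	for i := range out {
		out[i] = 'A' + byte((int(out[i]-'A')+delta+26)%26)
	}
	return string(out)
}

func main() {
	fmt.Println(shift("HELLO", 1))  // prints IFMMP
	fmt.Println(shift("IFMMP", -1)) // prints HELLO
}
```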

This straightforward problem of information-hiding seemed unsolvable for most of human history, until the late 1970s, when three mathematicians devised a new encryption scheme based on a public key. Their critical insight was that knowing the key used for encryption does not automatically imply that it is easy to decrypt the message, if the recipient knows certain additional details about the key.

This may sound mystical, but say that the recipient picks two very large prime numbers p and q (they may be around 300 digits long in practice) and forms n = p · q, which they publish as the public key. Without getting into the details of how the encryption is performed, the key will be asymmetric in that knowing n does not help an eavesdropper decrypt the encrypted message; the only way to decrypt the message is to know the primes p and q. No one has ever found an algorithm that can quickly factor a very large number n, and so someone listening in on the channel who sees the encrypted message will not be able to determine p and q; the recipient, on the other hand, knows these numbers and can decrypt the encrypted message quickly.

Public key cryptography exemplifies a seminal idea in computer science that rests on a foundation of beautiful mathematics. In the next chapter, we will see how computation can be used to find answers to biological questions without needing to run a single experiment, and we hope that you will join us.

In the meantime, I provide a video at the bottom showing how this can all be implemented in Go. You can learn how to install Go here:

You can find more information about Programming for Lovers at the course homepage on Canvas: https://canvas.instructure.com/enroll/RKFKKP.

A lecture introducing Go from the ground up and accompanying this material.

Phillip Compeau
Programming for Lovers

Associate Teaching Professor and Asst Dept Head for Education in the Computational Biology Department in Carnegie Mellon University’s School of Computer Science