# Why Would You Care About the Halting Problem?

I’m going to shamelessly declare in this very first sentence that without the halting problem there probably wouldn’t be computer science today. Or maybe there would, but it would look very, very differently.

Can you recall the last time you thought about the halting problem? For me, it came up a few times in my undergraduate computer science courses, but that was about it. In my day-to-day job as a software engineer, we do not hold meetings to discuss the halting problem. We do not brainstorm the limits of computation. Why would we? It is a busy world, and we have features, tests and scalability to work on. We are focused on the future, and we actively using *the foundation from the past* to advance to the future.

As a brief overview, the halting problem is the question that Alan Turing answered back in 1936, that in modern terms would sound like this: “Is there a program that determines whether another program halts or not?” As a quick example, the following Python program runs forever, i.e. does not halt:

def runForever():

while (True):

print("Do I halt?")

My goal in this story is two things. First, I want to cleanly explain the Halting Problem, Turing Machines, Universal Turing Machines, and the implication of these findings on the development of computer science and modern computers. Second, I want to make you excited about it.

When I first learned about Alan Turing and his epic research, I didn’t take enough time to think about it in detail and understand why it is really so important. I took many foundational concepts of computer science for granted. It wasn’t until later, when I watched “The Imitation Game”, a movie about life of Alan Turing as a codebreaker at Bletchley Park during WWII. The storyline is about Turing developing a machine that cracked the code the Nazis used as a main source of communication. Just a little bit of Hollywood-style exaggerated science (completely reasonable for the sake of good cinema, by the way) presented in this film was enough to spark my interest in his work, and start with an innocent Google search: “Turing”.

I later found myself in a reading marathon, digging through Turing’s work in the British Archives and his notes on digital computers and artificial intelligence, his biography, tens of articles on his legacy, and, of course, his groundbreaking paper *“On Computable Numbers with An Application To The Entscheidungsproblem”*. Lastly, I read a book called *“Turing’s Vision”* by Chris Bernhardt, which focuses almost exclusively on that paper, and explains it in great detail. This all made me very excited, and I cannot stay silent about it anymore, so I must share my learnings with you. Onward!

#### The Entscheidungsproblem

In late 1920’s the mathematician David Hilbert presents the following question to the mathematical community: “Is there a procedure that takes any first-order logic statement and decides whether it’s universally valid?” By universally valid we mean valid on *every* input. Now, what does a first-order logic statement even mean?

Consider a real-world statement: “If a person is driving, he/she is not texting” (which is, oh, so not true in reality). Let’s convert it to a first-order logic statement. Let’s first rephrase the sentence to “for every person, if a person is driving, he/she is not texting”, or . Suppose, “driving” and “texting” are functions C and D, respectively; then, the statement above would translate to “for every *y*, if *y *is C, *y* is not D”. Finally, using first-order logic symbols, we get ** ∀y(C(y) → !D(y)) **statement, where

**is a quantifier “for every”,**

*∀***y**is a person, “

**→**” is an implication (if left side is true, right is also true), and “

**!**” is a negation. First order logic allows us to formalize real-world statements via the use of quantifiers and variables . Can there be a procedure that determines whether the statement above is valid for every y? Does every human obey this rule?

Hilbert is convinced that such an algorithm exists. He calls it the Entscheidungsproblem, or “decision problem”. (Moreover: Hilbert actually had a list of 23 unsolved at that time problems, and Entscheidungsproblem was one of the questions that comprised Problem #2. Some of them were solved overtime, and some are still unsolved, but this is, unfortunately, outside of the scope of this article. For more, check out Hilbert’s Problems).

Fast-forward a few years later: in 1936 24-year old Alan Turing is looking for a topic for his PhD thesis at Princeton. He soon realizes that he wants to work on Entscheidungsproblem. For this proof, Turing needs to formally define an algorithm (you might have already realized Hilbert’s magical decision problem is certainly an algorithm, but at that time it was not really defined). Now, this is where the juicy stuff begins. Turing’s creative approach not only proved Hilbert wrong, but also introduced multiple powerful ideas that shaped what we now refer to as “computer science”.

**Turing Machine (aka Automatic Machine)**

Turing introduced a model of computation that he called “Automatic Machine”, which is defined as follows. The machine operates on an infinite tape, which is split into equally-sized cells. Each cell may or may not contain a symbol, 0 or 1. The head of the machine reads a symbol from a cell, then consults a pre-defined *finite* list of instructions. I should add that each instruction has the following information:

*current state A*(current cell and the symbol we just read from that cell)*action*(write new symbol to the current cell)*next state B*(next cell to move the head to)

When an instruction is executed by the machine, the machine learns the next instruction to execute (by checking next state field in the current instruction). Note that “next state” could also be a constant “halt”, which is the final state (or, as described by Turing, accepted state). The machine then ceases its execution. What we just described is essentially an algorithm (in the form of a user-defined instruction set) performed on a particular input (in the form of initial values in each cell of the tape).

The image below shows an example execution of a Turing Machine (TM). It is computing 2+1, or generally, a sum of two numbers in unary representation. The black triangle represents machine’s head. Letters A, B, and C to the left represent state names.

#### Universal Turing Machine (UTM)

Universal Turing Machine is a Turing Machine, which has an important capability. Recall that Turing machine consults pre-defined instruction set and computes on a given input. But what if a TM could simulate execution of *any* TM, by taking not only an input, but also a definition (instruction set) of any TM? This is exactly what Universal Turing Machine does. This simple, but powerful change is considered an inspiration behind modern stored-program computer: a computer design, which is defined by placing program instructions in memory (we will talk more about this later).

### UTM Instruction Set Representation

You might be wondering: how would a UTM actually take a regular TM as an input? We can come up with an encoding! We have previously discussed contents of a single instruction, but let’s inspect it in more detail. Each instructions contains the following symbols:

- Sᵢ: current state
- Cᵣ: symbol at current cell
- write symbol, where C₀ is blank, C₁ is 0 and C₂ is 1
- List of moves to perform, where “R” and “L” mean that machine would move to the right and left, respectively
- Sₓ: next state (instruction)

Let’s dissect a concrete example of an instruction, say, “S₁ C₁ C₂ L L S₂;”. Here, in state 1, we read the current cell value 0, write a value of 1, then move left twice before arriving at state 2. Additionally, in the end we have a symbol “;”, which is a separator for instructions (since there can be multiple of those).

Now let’s replace Sᵢ with B followed by S repeated i times, Cᵣ with B followed by C repeated r times and so on. In our example, such replacement would produce the following encoding: “BSBCBCCBLLBSS;”. Finally, we replace each letter with a unique single digit. For example, B=1, S=2, C=3, L=4, R=5, ;=6 would produce “12131331441226”. If there are multiple instructions, we do the same thing (recall this is why we have the “;” separator). In fact, both 1 and 6 are only needed to separate instructions and fields within an instruction.

This is a complete TM configuration. We have just uniquely described it by only using 6 digits! This is important; we are now able to encode any TM as an argument for a UTM!

### Entscheidungsproblem To Halting Problem

In his paper “On Computable Numbers, with an Application to the Entscheidungsproblem”, Turing asked the following question: “Is it possible to write a machine that determines whether another machine halts its computation?” (though he phrased it a bit differently, mostly around a machine printing a finite number of symbols). This is not exactly the question that Hilbert asked. How would the proof for the halting problem help us with the Entscheidungsproblem? Turns out that the Halting Problem reduces to the Entscheidungsproblem!

To understand this a little better, let us rewind back to 1930s. Before formulating the Entscheidungsproblem the way we know it today, Hilbert was convinced that a general procedure for determining validity of any theorem indeed exists. At that time, a young mathematician Kurt Godel storms in with his Incompleteness Theorem, and the following statement, which caused Hilbert to scratch his head: “This sentence is not provable”. Let’s inspect it closer: this statement can be either true or false. We must conclude that it cannot be false; otherwise, it would be provable, which is a contradiction. So it must be true, meaning we cannot prove the validity of this statement. Hilbert’s question boils down to a search algorithm, where given a list of accepted truths (axioms), we can either reach a terminal state (*halt*), i.e. prove that a statement is valid or find its contradiction, or we can search forever, never knowing that the statement is unprovable. Does this sound familiar? This is exactly the halting problem.

A slightly simpler way of thinking about it: if we want to find an algorithm to determine if a statement is valid, we should at least know whether the algorithm finished or not (if it does not, we would never get any decision back!). So if no algorithm to determine whether a program halts or not exists, then no solution to the Entscheidungsproblem must exist either!

**Cantor’s Diagonalisation**

We are now very close the grand finale of proving Hilbert wrong! For that, we only need to go over one essential piece of the proof, which is Cantor’s Diagonal Argument.

German mathematician Georg Cantor was interested in countable and uncountable infinities. A set is countable if it satisfies two requirements. First, every element in the set of natural numbers has a one-to-one mapping to an element in the given set. Second, every element in the given set must have such mapping, so no element is left uncovered. For example, all possible humans are countable. But if each human has a pair of shoes, are those pairs of shoes also countable? Let’s say these shoes are Allbirds (Airpods also participated in the example selection process, by the way). As long as we can find an owner for every pair of Allbirds, our shoe set is countable.

Essentially, we just showed that every positive even number (recall shoes come in pairs) maps to a natural number, which means that a set of positive even numbers is countable!

While we are on the topic on countable numbers… what about real numbers? Cantor showed that real numbers are uncountable via a truly creative approach. Let’s say the set of real numbers Z is indeed countable, starting with s₁, s₂, s₃… etc. This means that every natural number maps one-to-one to a real number, and every real number has such mapping. Let’s make a table with these mappings, and represent natural numbers in binary, just so that the proof is a bit more intuitive.

Let’s now construct a new real number, sᵢ, where each bit in the position *x* is the opposite of sₓ[x]. In other words, we construct our new number by flipping the first bit of s₁, the second bit of s₂ and so on (going diagonally).

Everything seems to work out until we reach the iᵗʰ bit. By the construction definition, iᵗʰ bit must be an inverse of itself, which is impossible. Hence, such number cannot have a one-to-one mapping to a natural number, which contradicts our original assumption that real numbers are countable. Set Z must then be uncountable!

### Diagonalisation and The Halting Problem

We’ve reached the point where the pieces come together. Let us now apply Cantor’s Diagonal Argument to the halting problem. Assume that a UTM that can compute whether a TM halts exists. To exhibit the results of such a machine, let us construct a table A, where horizontal axis ** k** represents all possible inputs i

*ᵢ*. Then, the vertical axis

**represents all possible definitions of Turing Machines, T**

*j**ᵢ*. A result in A

**represents whether the program halts (1) or not (0). The table now contains answers to whether each TM halts on any input.**

*[k][j]*Similarly to our previous proof that real numbers are uncountable, let us construct a Turing Machine *D*, such that on input* i1*, *D *does the opposite of T1, i.e. it does not halt. We apply this method to each input in our axis ** k**, until we hit input

*d*, where a problem we’ve seen before arises. I think, at this point, you know where I’m going with this. Just like with real numbers uncountability proof, D cannot be in the table A. This means that a set of Turing Machine definitions and its inputs must be uncountable. This also means that our original assumption about the existence of a machine that can decide whether another one halts must be wrong. Finally, the halting problem must be undecidable then.

**The Implications of the Halting Problem**

We just proved that the halting problem is unsolvable. But why is it really so important? There are multiple things here that are worth mentioning.

#### Understanding of computational limits

As computers are used everywhere these days, it is not uncommon to get an impression that they can compute anything. We have just learned some important computational limits. While we probably all agree that computers can’t compute the meaning of life, we just learned that they can’t decide whether a program halts. They can’t even decide if a program prints a letter ‘A’. The undecidability of the halting problem lets us reason about problems that are solvable, and problems that are virtually impossible. Knowing this allows us to make reasonable assumptions, and accept imperfections in our code.

A lot of problems in the world of software engineering are practically the halting problem in disguise. Take compiler warnings as an example. Suppose you’d like to have a warning every time you introduce a codepath that is never used. This is impossible, precisely because of the halting problem. Or maybe you decide to be a bit defensive with your program, and place several asserts. You’d like to ensure that these *asserts* will always assert. This is also precisely the halting problem!

I should mention that in the examples above, I omitted one important detail. I assumed that the programming languages we are talking about are Turing-complete. The definition of a Turing-complete language is simple: it is a language that can simulate any TM. Anything that a TM is able to compute, a Turing-complete language can compute too. That being said, there are languages that are not Turing-complete, and completely decidable (examples would be HTML or JSON). Such languages prohibit recursion and iteration. While it is pretty cool that they are fully decidable, I’d find my imagination rebel, and beg for more expressivity.

#### An inspiration for the modern stored-program computer.

The idea of keeping the instruction set in digital memory (just like TM description and input on the same tape of UTM) is implemented in virtually every personal computer today. In fact, vast majority of modern day computers are stored-program; For example, the common and popular Von Neumann computer design, is also stored-program. In Von Neumann architecture, both instruction set as well as data are kept in the same storage unit.

Why is this significant? You see, prior to a stored-program computer, computers weren’t viewed as reprogrammable machines (and even those that were, stored their programs externally, for example, on punch cards). They were hard-wired to perform a single duty.

On the other hand, by placing program instruction in memory, stored-program computers result in much more flexibility than before. It becomes fairly simple to change your program (or even do hacky things like writing programs that modify themselves!) This architecture really defines a general-purpose computer.

In his paper “*On Alan Turing and the Origins of Digital Computers*”, Brian Randell confirms the Turing/Von Neumann relation, and quotes Dr S. Frankel, who used to work with Von Neumann:

I know that in or about 1943 or ’44 von Neumann was well aware of the fundamental importance of Turing’s paper of 1936… Von Neumann introduced me to that paper and at his urging I studied it with care. Many people have acclaimed von Neumann as the “father of the computer” (in a modern sense of the term) but I am sure that he would never have made that mistake himself. He might well be called the midwife, perhaps, but he firmly emphasized to me, and to others I am sure, that the fundamental conception is owing to Turing — in so far as not anticipated by Babbage… Both Turing and von Neumann, of course, also made substantial contributions to the “reduction to practice” of these concepts but I would not regard these as comparable in importance with the introduction and explication of the concept of a computer able to store in its memory its program of activities and of modifying that program in the course of these activities.

#### A genuine celebration of creativity

The real beauty of Turing’s paper lies in the approach. Prior to Turing, in April 1936, his doctoral advisor, Alonzo Church went on to publish his own proof of unsolvability of the halting problem using lambda calculus. While his proof is also valid and universally accepted, it’s much more complex. The innovation of Turing machines, Universal Turing Machines and elegant application of Cantor’s diagonalisation makes Turing’s paper a truly remarkable piece of work, that I hope you get to think about a little more now that you have read this.

** Epilogue (A Brief History of **̶t̶i̶m̶e̶ …

*Me Writing this)*It’s a miracle! I wanted to write this article for months, and it is finally here. I wrote the initial version in July, but something felt really off about it. I then realized that I was just throwing a bunch of theorems together, without any particular structure. My essay turned into a beanbag of definitions, and non-segued explanations next to each other, which is not what I wanted. I later rewrote the essay almost entirely, fixing its immaturities and embracing critical thinking in every sentence I wrote. I’m more or less satisfied with the result; perhaps the writing style isn’t quite there — I write code for living daily, so I definitely did not expect any Shakespearian writing pop out of the blue. I also want to say special thanks to Leonard Apeltsin and Marina L. for the valuable feedback and support!