Working through Sipser, a textbook on Theoretical Computer Science

Robert L. Read


— with John K. Gibbons

Representation of a Turing Machine,

My friend John and I have been meeting weekly to workout, have dinner, and in some cases, to study together.

Recently, John and I decided to work through a textbook on theoretical computer science together. For John, this was new material, and he wanted to apply to graduate school. For me, it was old hat; I had taken one undergraduate and one graduate course on this, about 30 years ago. Both of us are older than most students, but we are both dedicated to perpetual learning.

More importantly, John was fascinated by the famous “P=NP” problem, and I am interested in the edges of “Church’s Thesis” (more on both below.)

We chose Introduction to the Theory of Computation (Second Edition) by Michael Sipser. According to the preface, “This book is intended as upper-level undergraduate or introductory text in computer theory.” I frankly have a hard time seeing how any student could absorb this material in a single semester. It took us 18 months of working a few hours each week. The book has 10 chapters; I think we did approximately 120 problems as homework.

We are glad we did. Both of us feel more confident and enlightened.
Most of these problems are proofs of one form or another. We probably got most of the right. Sometimes one of us would present a solution, and the other would say, “Oh, whoops, thank you, I didn’t think of that case, my solution is clearly wrong.” Usually we agreed. Both of us struggled with what constitutes a really good proof; for example, the solution Sipser gives to selected problems are very terse.

Every math student probably knows the feeling of being able to see why something is true, but not quite being able to express it in words. The act of writing down the solution that is the most beneficial mental exercise. We did not have the benefit of any grading of our work. On what percentage of problems would we have been given full credit? We can’t say. Of course, if we were taking the course formally, we could have adjusted to feedback, which is, after all, the purpose of grading.

So much for the experience of working through a textbook at our age. I’d like to now try to give you a glimpse of the beauty that we partially learned.

Euclid alone
Has looked on beauty bare. Fortunate they
Who, though once only and then but far away,
Have heard her massive sandal set on stone. — Edna St. Vincent Milay.

* * *

Theoretical computer science (TCS) exists in a nebulous relation to discrete mathematics and physics. Fundamentally it is about what can be computed by a machine, though machine must be understood as going beyond sprockets and camshafts, and even transistors. TCS studies what can be computed or calculated by fixed rules, whether mathematical or mechanical.

Until certain discoveries in the 1930s this might not have seemed terribly interesting. Perhaps it seemed that anything could be computed, although obviously some things would be harder than others. It turns out this is not so. There are things which no algorithm can ever accomplish. More astonishingly, there are in fact classes of problems that are clearly harder than others. The structure of these classes — what is strictly harder than what — is of tremendous practicality in our modern world. The elucidation of the fact that there is such a structure — a hierarchy of sorts, but not a strict ranking — is one of the great accomplishments TCS.

The book begins simply enough, however, with the simplest class of problems, those accepted by simple machines called “finite state machines” that can be beautifully represented as the transition between states. Inherent in formalizing this problem is something that I had seen, but was surprising to John: formalizing problems in terms of “languages”, which are particular sets of strings of symbols.

It might seem more basic to somehow start with numbers than symbols; but it is not so.

TCS defines computation as determination carried out by a machine whose input is a string of symbols is in a language or not, and whose possible outcomes are Accept (the string is in the language), Reject (it isn’t), or Stuck in a loop. Only unbounded languages are interesting; if there are only a finite number of strings in a language, there is always some simple machine that accepts it (though “simple” does not mean “humanly constructible” for sufficiently large languages.)

The simplest class of machines, the Finite State Automata, (FSA), can recognize interesting languages, but not every language. For example, it cannot compute if the parentheses are balanced in a expression of arbitrary length.

The nature of an FSA is that it processes each symbol in a string and is always in one and exactly one state. We might, however, ask a theoretical question inspired (today, but not historically) by quantum mechanics: what if we allowed the machine to be in a superposition of multiple states at once? Such a machine is called “non-deterministic”. Perhaps the first startling result of TCS is that non-deterministic Finite State Automata can compute nothing more (nor less) than deterministic Finite State Automata.

If we add a small amount of memory to a Finite State Automaton, in the form of stack which we can push things onto and take things off from the top, but not look inside, then we can compute things not computable with an FSA. This is a “push-down automata”, and it can recognize balanced parentheses easily. One fascinating result in TCS is the equivalence of seemingly different models of computation: a Finite State Automata is exactly as powerful as a Regular Expression, and a push-down automata is equivalent to a Context Free Grammar, both of which is valuable to working computer programmers.

If we expand the notion of the memory further to a “tape”, which we can read by moving a reading had back and forth across the tape, then we have the famous “Turing machine”. A Turing machine can compute much more than push-down automata. Modern electronics computers are almost always Turing machines, though they access memory differently, they can compute exactly what Turing machines can compute.

It is possible to create a Turing Machine that reads the description of a different Turing machine from the tape and simulates that different Turing machine. This “Universal Turning Machine” means that you can make a fully programmable computer. There is nothing computable by a computer
which cannot be computed by another computer simulating it (perhaps a bit more slowly.) This means that all Turing Machines have the same power, which is entirely nonobvious until demonstrated. Imagine an engineer in 1901 saying “Yes, but my machine is very precisely crafted and made from the
finest materials.” It matters not a whit if it is made from gossamer and gold or gumdrops and glowflies.

And what is that it can compute, exactly? Well, it turns out is not quite everything. The famous “Halting Problem” shows that nobody can create a computer program guaranteed to answer what might seem a simple
problem: Does a given Turing machine B executed on a given input w halt or not?

The ingenious proof is a “trippy loop”. Suppose you could build such a machine, call it M. M is supposed to take <B,w> and say whether B halts when run on w, not matter what B is. Then suppose that I steal
the plans for your machine, and build another machine, called A. By its action A simulates your machine M on an input, and if your machine M
says “Yes the input machine halts on that input” then A goes into an infinite loop, thus never halting. If your machine says “No the input machine does not halt on that input”, then A halts.

Now what does your machine M do when presented with this machine and an input string <A,w>?

Let us suppose that M reports that <A,w> halts. This produces a contradiction, because A was carefully constructed to enter an infinite loop if M reports that it A halts on w.

Let us suppose that M reports that <A,w> does not halt. This also produces a contradiction, because A was carefully constructed to halt on w if M reports that <A,w> does not halt.

This proof depends only on our ability to simulate one Turing Machine by another.

So there are languages which cannot be computed by any Turing Machine.

But perhaps the problem is with Turing Machines; maybe there is another model of computation that can compute more? Sipser’s textbook does not mention the lambda calculus, the other great model of computation, but in fact it is easy to prove that the lambda calculus and Turing Machines can each simulate each other, and therefore are of equal power. This is true of every other proposed model of computation that anyone has ever thought of, and is named Church’s Thesis. It is not quite a proof that no such more powerful model of computation can exist, but it a solidly entrenched result.

This may seem theoretical; perhaps you care more not about what can be computed but what can be computed efficiently with a modern electronic computer. Well, TCS has something for everyone.

With this formal machinery, we can easily define a class of problems called “P”: what languages can be accepted in time no greater than polynomial of the length of the input? (Such a class is much smaller than would could be computed by an exponential of the length of the input, for example.)

Well, there are many things you might want to compute which are “in P”, and quite a few things which provably cannot be computed that efficiently, to say nothing of what can by computed in “n squared” vs. “n cubed” vs. “n to the fourth”, all of which are different. In fact a great deal of practical research is discovering more efficient algorithms, a typical such paper would conclude by saying “language L which was previously known to be in O(n⁴) is now known to be in O(n^(5/2)) on the basis of this algorithm.”

But now it gets weird. Recall that every Turing Machine is just a Finite Automata bolted onto a tape used as storage. We previously mentioned Non-deterministic Finite Automata? What if we could build a non-deterministic Turing Machine? Would that be more powerful than a deterministic one or not?

This may seem stupid, but non-deterministic Turing Machines are very useful, even though we don’t know how to build one, for showing that a given problem is within a given complexity class. The class “NP” (non-deterministic polynomial) is the class of problems solvable in polynomial time by a non-deterministic TM.

It turns out this has a simpler way of thinking about it: Can the Turing Machine guess the correct answer and then check it in polynomial time? Perhaps it is clear that “guessing and checking” is sometimes easier to think about than figuring out how to find the correct answer.

The great question is does such a Turing Machine actually compute anything that cannot be computed by a deterministic polynomial time Turing Machine? In other word, is “P=NP?”

This is one of the most famous and studied problems in the world today, with tremendous practical implications.

TCS has not solved this problem, but has created important tools. One of them is complexity class “NP-complete”. There are dozens if not hundreds of named, important problems which are NP-complete. In practice what this means is that if a solution is found which solves one of them efficiently, humanity has an efficient solution to all of them.

And now, after much study, we understand this.

“The noblest pleasure is the joy of understanding.” — Leonardo da Vinci

As Freeman Dyson said, the Universe in infinite in all directions, whether we look at the at the small or the large scales. Theoretical Computer Science is a way of looking at the Universe that exposes beautiful new dimensions.



Robert L. Read

Public Inventor. Founder of Public Invention. Co-founder of @18F. Presidential Innovation Fellow. Agilist. PhD Comp. Sci. Amateur mathematician.