What does it mean for a computer to be “Universal”?

George McKee
9 min read · Feb 12, 2019


Part 3 of Is “Is the brain a computer?” even a good question?

It’s too late to change the answer to “Is the brain a computer?”, and it distracts from the really useful questions about the relations between computers and brains. Nevertheless, a deeper look finds that brains stretch the definition of computing, perhaps beyond the breaking point.

This is part 3 of a series of brief essays (sometimes very brief) on aspects of this question. Part 1 contains the introduction and an index to the whole series.

Universal computability

As the mathematical theory of computation developed alongside the technology of computing, a subfield concerned with the capabilities of various computing architectures emerged. This field of automata and computability theory studied many different kinds of computing architectures, sorted them into different capability classes, and discovered many properties of them that were interesting enough to be called “theorems”. Remarkably, as the systems grew in complexity and capability, at a certain level they all became equally capable: any of them could emulate any of the others, and do anything that the others could do.

In order to achieve universal capability, a system needs to have at least the following properties (a minimal code sketch of such a machine follows the list):

  1. Operate on unboundedly long sequences drawn from a finite set of symbols.
  2. Convert a start configuration into an end configuration, or fail to halt. Not every real, useful computer program terminates; an air traffic control system, for example, must keep running as long as there are planes in the air. The objects being converted range from single bits to mathematical reals to complex objects such as tensors of fixed-length integers.
  3. Change the course of succeeding operations depending on the results of previous operations.
  4. Provide for interconversion among code, data, and execution (converting execution results into data to be computed with is important for machine learning, for example).
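
As an illustration of these requirements, here is a toy machine with all four properties. It is my own sketch in Python, with hypothetical names like run_turing_machine, not anything from the formal literature: a finite alphabet on an unbounded tape, a run that may or may not halt, branching on the symbol just read, and a program that is itself ordinary data.

```
from collections import defaultdict

def run_turing_machine(program, tape_input, start="q0", halt="halt", max_steps=None):
    """program maps (state, symbol) -> (new_symbol, move, new_state).
    Because the program is an ordinary data structure, other code can build,
    inspect, or rewrite it (property 4)."""
    tape = defaultdict(lambda: "_", enumerate(tape_input))  # unbounded tape of symbols (property 1)
    head, state, steps = 0, start, 0
    while state != halt:                                    # may run forever (property 2)
        symbol = tape[head]
        new_symbol, move, state = program[(state, symbol)]  # branch on what was just read (property 3)
        tape[head] = new_symbol
        head += 1 if move == "R" else -1
        steps += 1
        if max_steps is not None and steps >= max_steps:
            break
    cells = sorted(tape)
    return "".join(tape[i] for i in range(cells[0], cells[-1] + 1))

# Example program: flip every bit of the input, then halt at the first blank.
flipper = {
    ("q0", "0"): ("1", "R", "q0"),
    ("q0", "1"): ("0", "R", "q0"),
    ("q0", "_"): ("_", "R", "halt"),
}
print(run_turing_machine(flipper, "1011"))  # prints 0100_
```

Because flipper is just a dictionary, another program can generate it, analyze it, or hand it to a second copy of run_turing_machine, which is the seed of the code-as-data interconversion that universality depends on.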

These properties are necessary for universality, but they’re not sufficient. To prove that a system is universal, you need to do one of two things. The first is to give a detailed method of enumerating every computation in the infinite set of possible computations, and show how the system you’re studying can perform every one of them; this is a complex and tedious process. Alternatively, you can choose an existing system that has already been laboriously shown to be universal, and show that your new system is equivalent to it. Alan Turing, Alonzo Church, and Emil Post each independently proved universality via the first method in the 1930s, while nearly every proof of universality since then has used the second method. In honor of their accomplishments, the idea that every possible computation can be computed by one of those three systems is called the Church-Turing Thesis.

The systems created by Church and Post were studied mostly by mathematicians, but the system created by Turing was easily seen to be a mathematical model of a physical device, and the study of its properties was taken up by engineering researchers as well as mathematicians. The engineers’ tinkering background led to a myriad of different architectures, and by the 1970s there were entire books devoted to cataloging and classifying the different kinds of computing automata, such as Marvin Minsky’s “Computation: Finite and Infinite Machines”.

The idea of universality, that every computationally universal system can emulate any of the others, shifts the level of analysis away from compiling lists of different kinds of systems and the properties they have. Once we’re in the realm of universal systems, arguments about which system is better are no longer arguments about which one can do more things, since they can all do the same things. What remains are distinctions in efficiency for various purposes, and in which systems are most convenient to think and reason about.

The study of the foundations of computational efficiency crystallized in the work of Gregory Chaitin and Andrei Kolmogorov as they invented algorithmic complexity theory, which, among other important results, established that absolute complexity measures do not exist: complexity is always relative to some specific set of hardware capabilities. This means that advances in engineering, such as the invention of memristors, quantum gates, or phase-change memory, can have wide-ranging effects on the landscape of practical computational capabilities. In biological systems, the evolution of specialized synaptic junctions between cells and of myelin insulation on neuronal axons had equally profound effects on the behavioral capabilities of the organisms that were able to exploit their advantages.
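
One standard way to make that relativity precise, stated here as my gloss rather than anything from this article, is the invariance theorem of algorithmic information theory. Writing K_U(x) for the length of the shortest program that makes machine U output the string x, for any two universal machines U and V there is a constant c, essentially the size of a V-emulator written for U, such that

K_U(x) ≤ K_V(x) + c for every string x.

Complexity is therefore well defined only up to an additive constant that depends on the reference machine; there is no machine-independent, absolute measure.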

In its purest form, universality is achieved by one system emulating another, that is, by implementing a virtual machine. Creating a virtual machine with more convenient properties for the task at hand has become pervasive in modern system architectures; apps on modern computers sometimes run 4 or 5 VMs deep. Without careful analysis, it’s often difficult to identify all the different VMs that may be involved in the operation of even a common PC word processor. What we normally think of as the CPU itself may be a “machine language” VM running on a hardware microcode layer. The app itself may be built on a language-specific VM such as the Java JVM. Sophisticated PC users may run multiple operating systems in VM environments such as Oracle VirtualBox, VMware Player, or Parallels. Applications that run in “the cloud” have major portions of their code that actually execute via system-sharing VMs such as Xen, VMware, or Microsoft Hyper-V as implemented by Amazon Web Services, Google Cloud, Microsoft Azure, or many smaller cloud providers, at virtualization ratios of 20 or more customers’ VMs per physical CPU. Varieties of virtualization blur the distinction between hardware and software using techniques such as paravirtualization, where code executes in the lower host machine most of the time, without the overhead of emulation by a VM, and only traps into the VM code for certain privileged instructions.
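
To make “one system emulating another” concrete, here is a toy virtual machine. It is my own illustration in Python, nothing like a production VM: a tiny stack machine whose “hardware” is just a loop in the host language.

```
def run_vm(code):
    """Interpret a list of (opcode, arguments...) tuples on a stack machine."""
    stack, pc = [], 0
    while pc < len(code):
        op, *args = code[pc]
        pc += 1
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "PRINT":
            print(stack.pop())
        elif op == "HALT":
            break
    return stack

# (2 + 3) * 4, computed by the emulated machine instead of directly by Python.
run_vm([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",), ("PRINT",), ("HALT",)])
```

Programs written for this little instruction set run unchanged wherever the interpreter runs, and nothing stops run_vm itself from being executed by yet another interpreter; stacking emulators this way is exactly how the 4-or-5-deep VM towers described above arise.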

What do functions do?

The stream of research launched by Alonzo Church was based on the idea of combining functions to create new functions. Functions had been around in mathematics for hundreds or thousands of years, but the notion of operations that create new functions seems to have appeared with the calculus of Newton and Leibniz.

In common usage, the function of something is its purpose. We routinely say things like “the function of the wings of an airplane is to provide lift.” In math, on the other hand, functions don’t “do” anything; they simply represent the relationship between a domain and a range, such as the sine function that represents the relationship between an angle and the ratio of the length of the opposite side to the length of the hypotenuse of a right triangle containing that angle. In the best-known styles of programming, a function is a section of code that returns a result, and that result is obtained by “computing” it.

However, there’s a style of computing called “functional programming” where there are only functions, variables, and constants. Results are obtained not by executing a sequence of steps, but by evaluating functions applied to the results of evaluating other functions, down to primitive data types like symbols and numbers. Most importantly, functions can return other functions as their results. When this style is followed strictly, the notion of a sequence of operations is no longer fundamental, just as in math the sequence of operations is not explicitly specified, and it becomes much easier to reason rigorously about the properties of functions, making programming errors significantly less likely.
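
A small sketch of that style, my own illustration in Python with hypothetical names like compose and twice: no loops and no mutable state, only functions built out of other functions.

```
def compose(f, g):
    """Return a brand-new function: f applied after g."""
    return lambda x: f(g(x))

def twice(f):
    """Return a function that applies f two times."""
    return compose(f, f)

def increment(n):
    return n + 1

add_four = twice(twice(increment))  # a function assembled entirely from other functions
print(add_four(10))                 # prints 14
```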

It’s even possible to represent numbers as nothing but compositions of other functions. These are called “Church numerals” after Alonzo Church, who invented the method of computing with functions called the lambda calculus. The lambda calculus was the first functional language, and was succeeded by many other functional languages, including Lisp, OCaml, Haskell, and Erlang.
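
The encoding itself is compact enough to show directly. The definitions below follow the standard lambda-calculus presentation, written here as Python lambdas for readability; the helper to_int is my own addition for checking the result. A Church numeral n is simply “the function that applies f n times.”

```
zero = lambda f: lambda x: x                     # apply f zero times
succ = lambda n: lambda f: lambda x: f(n(f)(x))  # one more application of f
plus = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    """Convert a Church numeral to an ordinary integer by counting applications."""
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(to_int(plus(two)(three)))  # prints 5
```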

In the lambda calculus, variables can be anything. That openness is hidden but still present even when working strictly with numbers, since the numeral zero is defined as a function that takes an arbitrary function as its argument and returns another function that leaves its own argument untouched. The mathematical form of the Church-Turing Thesis is concerned with recursive functions of integers, and mathematicians working on computability and the foundations of mathematics appear to have largely ignored the full implications of allowing variables to go beyond simple objects like integers, sets, and functions to real numbers and more complex mathematical structures, such as tensors or even topological manifolds. For designers of functional programming languages, however, facilities for building and managing complex data structures are a key concern.

It’s possible to take the notion of variables processed by functions even further, into the real world, and consider the world as streams of events, transformed by functions into other streams of events. This leads to very recent programming paradigms such as functional reactive programming (FRP). It also suggests that FRP systems may be a natural platform for modeling systems with complex behaviors such as mammalian brains, in a style where the comprehensibility and validity of intermediate levels of description matter, where the language the model is expressed in needs to be similar to the language describing the system being modeled, and where formally verifiable proofs of system properties provide confidence in the correctness and consistency of slippery cognitive and philosophical concepts.
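
As a hint of that stream-of-events view, here is a toy sketch of my own in Python; it is nothing like a real FRP framework, but it shows behavior defined by composing pure functions over a stream of timestamped events rather than by stepwise commands.

```
def spikes(events, threshold):
    """Transform a stream of (time, reading) events into a stream of spike events."""
    return (("spike", t) for t, reading in events if reading > threshold)

def count(stream):
    """Consume a stream and report how many events it carried."""
    return sum(1 for _ in stream)

sensor = [(0, 0.10), (1, 0.90), (2, 0.40), (3, 0.95)]  # (time, reading) pairs
print(count(spikes(sensor, threshold=0.8)))            # prints 2
```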

Non-procedural computing

A third stream of universal computing architectures was originated by Emil Post at about the same time that Turing and Church were working on the problem. The simplest of these architectures are called term rewriting systems or production systems, and their programming style is called declarative, or non-procedural, versus the sequential, procedural style of more conventional programming. Declarative programming systems are more common than many people realize: database query languages such as SQL are primarily non-procedural. The common spreadsheet embodies a non-procedural framework as well, since spreadsheet cells obtain their values in no particular order, depending not on any explicit program but on the data that is entered into other linked cells.
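
A minimal rewriting sketch, my own illustration in Python rather than a faithful Post system, shows the flavor: the “program” is just a set of rules, applied wherever they match, with no programmer-specified order of execution.

```
rules = [
    ("heat + water", "hot water"),
    ("hot water + tea leaves", "tea"),
]

def rewrite(state, rules):
    """Apply whichever rule matches, anywhere in the string, until none do."""
    changed = True
    while changed:
        changed = False
        for pattern, replacement in rules:
            if pattern in state:
                state = state.replace(pattern, replacement, 1)
                changed = True
                break  # restart the scan: the new state may enable other rules
    return state

print(rewrite("heat + water + tea leaves", rules))  # prints tea
```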

One aspect that made the discovery of the mechanisms of molecular genetics in the 1960s so exciting to many people was the way it functioned like a Turing Machine, even though many more biological processes operate non-procedurally. In contrast to a string of symbols on a tape that is modified at a single working locus, the way DNA and RNA are processed by polymerase enzymes or by ribosomes, Post systems operate on the “best match” symbol pattern wherever it occurs in the string of symbols being processed, in a way that resembles cellular differentiation in a tissue or the associative recall of a memory in a situational context.

Post systems apply one rule at a time; expanded systems that rewrite all the matches at once were introduced in 1968 by Aristid Lindenmayer, and are called L-systems. They can be used to produce quite realistic renderings of algae and plant growth. The L-systems that produce plant-like structures are not universal, however. Universality requires rules that delete symbols and that move symbols from one location to another, in a way similar to organisms whose cells are capable of apoptosis (“programmed cell death”) and migration. The space of body plans developmentally reachable by organisms with cells capable of apoptosis and migration is vastly larger than the space of body plans exhibited by algae and plants, and it has not been studied synthetically in the same way. Understanding the range of possible animals should be important not only for understanding sequencing constraints during the explosion of animal body plans that occurred in the Ediacaran and Cambrian periods, 650 million to 500 million years ago, but also for exobiology.
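
Lindenmayer’s original algae model is small enough to run as written. The rewrite rules below are the standard textbook ones; the Python wrapper is my own. Unlike a Post system, every symbol is rewritten simultaneously at each step.

```
rules = {"A": "AB", "B": "A"}  # Lindenmayer's algae rules

def l_system_step(state, rules):
    """Rewrite every symbol of the state in parallel, in a single step."""
    return "".join(rules.get(symbol, symbol) for symbol in state)

state = "A"
for generation in range(6):
    print(generation, state)
    state = l_system_step(state, rules)
# 0 A
# 1 AB
# 2 ABA
# 3 ABAAB
# 4 ABAABABA
# 5 ABAABABAABAAB
```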

Like functional computing, non-procedural computing provides a better model for the way brains compute than Turing or von Neumann computing. While the classic associative chaining originated by behaviorist psychologists and used by behavioral neuroscientists produces behavioral outcomes that are sequences of movements, the brain achieves those sequences by a radically different mechanism than a sequential program: the pattern-matching phase of associative sequencing is intrinsically parallel, and “program” elements are not executed in any explicit order. There’s nothing in any brain that corresponds to a Turing Machine’s tape head, or to the program counter of a von Neumann machine. The brain of a rat in a maze is not operating anything like a program on a stored-program computer. Non-procedural systems have been used for artificial intelligence since the 1980s, and continue to be used to model high-level cognitive processes today.

The dominance of computer hardware with linearly addressable RAM can be ascribed to historical reasons such as Alan Turing’s wartime assignment to cryptography work, to the mathematical importance of the “decision problem”, which overshadowed the development of analog computers such as the Mark I, and to the technical simplicity of a RAM memory cell, which created an economic cascade that makes it far less expensive to emulate a functional or declarative system on a procedural computer than to build an equivalently mature system of another type from the ground up. The result was that although hardware associative memories can be found in specialized roles such as the cache subsystems of conventional computers and the routing tables of network processors, textbooks in computer architecture such as that of Hennessy & Patterson effectively omitted more exotic functional and associative-memory systems like Lisp Machines and the Goodyear Aerospace STARAN, which represent hardware realizations of non-von Neumann computation.

Go on to Part 4

Go back to the Index


George McKee

Working on projects in cyber security strategy and computational neurophilosophy. Formerly worked at HP Inc. Twitter: @GMcKCypress