“What’s the next big trend in programming? Maybe it’s sloppy programming” — Guy Steele on the Future of Programming Languages
I want to discuss with you today four different kinds of programming. This framework should help you better understand the uniqueness of Deep Learning as a new way of programming. I characterize Deep Learning as a technique for programming massively parallel computer systems to perform useful tasks. We will explore here why this is revolutionary and is a new kind of programming.
Another kind of programming is known as declarative programming. This kind of programming does not burden the user with the details of the exact sequence of steps that must be performed. Rather, a user only needs to specify (or declare) the form of the final solution. The burden of figuring out the exact steps to execute to arrive at the specified solution is algorithmically discovered by the system. Spreadsheets are an example of this kind of programming. With Spreadsheets, you don’t specify how a computer should compute its results; rather you only need to specify the dependencies between the cells to compute a final conclusion. You could have a long chain of dependencies, and the spreadsheet will figure out which to calculate first. The query language SQL is also a well-known example of declarative programming. A SQL processor optimizes the kinds of retrievals it needs to execute to arrive at a table of data that satisfies the user-specified query. Other examples of declarative programming are Haskell (i.e., functional programming) and Prolog (i.e., logic programming). Code written in declarative programming languages are either interpreted by or compiled to imperative instructions. Non-constructive mathematics, of the symbolic computation kind, can also be classified as declarative programming.
A subclass of declarative programming comes from practices that predate mechanical computers. You may have heard of the subject “linear programming,” “integer programming,” and “non-linear programming. These are optimization algorithms where the use of the word programming was synonymous to planning. Just as the word ‘computer’ used to refer to people performing calculations, similarly ‘programming’ meant something different before the introduction of mechanical computers. In general, though, constraints are specified, and a solution is discovered that satisfies the constraints. A more general form of this is known as constraint programming. Yann LeCun has proposed the term “differential programming” as a substitute for Deep Learning.
Definition of RECOMBINANT
Relating to or exhibiting genetic recombination; relating to or containing genetically engineered DNA; produced by…
There is a fourth kind of programming that I would christen as “generative programming” (note: generative programming has been previously used to describe programming program generators) or “recombinant programming”. Recombinant is a word that originates in genetics. The generators of organisms and the process of evolution originates from the recombination of genetic material. Thus, the Recombinant Programming is an appropriate metaphor to describe this entirely new kind of proigramming.
This new kind of programming has as its origins methods in connectionist inspired artificial intelligence. In complexity sciences, dynamics is a consequence of progression. It derives from methods coming from Deep Learning, evolutionary algorithms, and reinforcement learning. Recombinant programming is best visually demonstrated by what is known as Generative Adversarial Networks. I will argue here that this should be classified differently from declarative programming or its subset constraint programming. One main distinction is the generality of the algorithm. However, it’s essential to make this distinction because it shifts the mindset from one of pursuing a reductionist approach to one that is based on ideas in the complexity sciences. The reductionist perspective is counterproductive in understanding the nature of this new kind of programming.
If you think imperative programming is hard, then you would be surprised that implementing a declarative programming language is even harder. To achieve something like SQL, a programmer must know how to translate a formal specification language into a sequence of instructions that correctly satisfies the different sentences that can be expressed in the language. A computer program has to be written that understand the language and the steps to translate the language into a solution. There is a massive cognitive bar that needs to be hurdled to implement a declarative language.
The advantage of declarative languages is that it is easier for humans to express their needs. Although many Domain Specific Languages (DSLs) have been invented, it is merely too difficult to create more of them. Even if you could implement these economically, it’s also difficult for the user to learn all the variety of DSLs.
Constraint programming, differential programming (i.e. Deep Learning) and recombinant programming share a common trait. The program or algorithm that discovers the solution is fixed. In other words, a programmer does not need to write a program that translates the specification into a solution. Unfortunately, though, the fixed program is applicable only in narrow domains. This deficiency is known as the “No Free Lunch” theorem in machine learning. You can’t use a linear programming algorithm to solve an integer programming problem. Deep Learning, however, has a unique kind of general capability that the same kind of algorithm (i.e., stochastic gradient descent) appears to apply to many problems.
There are distinct advantages to a fixed ‘master algorithm’ that solves all kinds of declarative specifications. This may appear to be an impossibility. However, if we relax the strictness requirements and prioritize the best effort approach then perhaps we can avoid the complexity. This prioritization is in fact what evolutional biology follows. The rich diversity that we find in biology and the evolution of general intelligence in humans is a testament to the viability of this approach.
Generative or recombinant programming is based on meta-learning algorithms. These are algorithms that are of the same fixed form, but in its construction of a solution, do not have the same level of cognitive understanding as required to construct the translations required of declarative programming language. This is what Daniel Dennett refers to as competence without comprehension. A GAN is capable of generating high fidelity realistic human faces but is unable to comprehend what it generated. It only knows if it is close to being realistic or not. Biological processes generate human beings from instructions encoded in our DNA but is unable to comprehend the process it is executing. In both cases, there is no programmer that specifies the sequence of steps required.
There exist only meta-learning algorithms that mindlessly generate solution one generative incremental action at a time.
Why is this classification important? It is important because it tells us about the kinds of models or reality that we can invent. One can think of the models that we create as belonging to two general classes. There are descriptive models and there are generative models. Descriptive models of reality are the mental models that we employ to describe the ‘shape’ of reality. Humans create causal models of the world to help reason about the world. All these models are descriptive models, some descriptive models with the greater predictive capability (i.e., Chemistry) than other models (i.e. Alchemy).
Reductionists will have the delusion that there are no limits to the descriptive models that we invent. However, we are aware of these limitations in our studies in the complexity sciences. Despite our precise understanding of physics and chemistry, we remain unable to formulate strong predictive models of biology. Yet, biology does what it does without the need for descriptive models. Biology only needs generative models, but how are generative models different from descriptive models?
This could be a universal characteristic of generative models. This is because generative models are ‘grown’ to favor robustness.
Generative models are compressed models of how to generate solutions. These compressed models, however, are not compressed in a way to facilitate comprehension. They are compressed to facilitate robust generation. A recent discovery in genetics known as the ‘omnigenic hypothesis’ reveals that coding instructions in DNA are not clustered in chunks but rather is spread out across the entire DNA. Deep Learning has similar characteristics in that if we attempt to create networks that generalize well, we lose interpretability (see: Deep Learning Uncertainty Principle). This uncertainty could be a universal characteristic of generative models. Generative models are ‘grown’ to favor robustness. Generative models in nature have no motivation to favor interpretability. Furthermore, robustness is enhanced through the use of redundancy and randomization.
As we explore generative models in deeper detail we will discover that the forces that lead to their creation are also the same forces that encourage its obfuscation. The execution of generative models inevitably works in mysterious ways. It’s important to realize the kind of information that a generative model captures. It captures information only generation instruction, this does not mean that it includes information that facilitates its interpretability. Following instructions requires lower intelligence than understanding instructions.
When Stephen Wolfram wrote his book “A New Kind of Science” that argued that simple models can lead to complex phenomena. What Wolfram was unable to present was a method to program simple models to create useful phenomena. I suspect the reason why Wolfram’s ideas did not gain much traction is that the narrative does not provide a hint of how to invent useful solutions. A new kind of programming, “Recombinant programming”, fills the missing hole in a new kind of science. For the first time in the history of human technology, we have discovered algorithms that we can “program” to generate useful behavior.