A Generative Model for Discovering the Unknown

Published in Intuition Machine, Nov 10, 2018

Previously, I wrote about an actor’s state of knowledge. Here, I will explore the limits of knowledge to better understand the nature of the attainability or unattainability of knowledge. This has pragmatic importance in that a framework tells us what is within the realm of possibility and what is not. This prunes our search space for knowledge to what is known to be attainable.

Here’s a schema of knowledge attainability:

Nescience is defined here as “unattainable knowledge”.

Trainability is the ability to learn how to discriminate what is already known. Generalization is the capability to process what was previously unknown. Universality is classified here under ‘known unknowables’ because systems that exhibit universality (i.e. Formal Systems, Turing Machines, Cellular Automata) are known to have dynamics that are undecidable or unknowable. The paper “Self-referential basis of undecidable dynamics: from The Liar Paradox and The Halting Problem to The Edge of Chaos” discusses the emergence of undecidable dynamics. This undecidability is a consequence of the self-referential features of computational systems:

Self-referential basis of undecidable dynamics: from The Liar Paradox and The Halting Problem to The Edge of Chaos

How does knowledge that is initially unknown become known? To illustrate, prior to Gödel, it was believed that a formal system that is both consistent and complete could be discovered. Gödel proved that any consistent formal system expressive enough to describe arithmetic contains true statements that cannot be proven within it. In short, Gödel discovered a previously unknown fact about formal systems. It just so happens that what he discovered was a definitive statement about the limits of knowability.

Pragmatically, we exclusively seek a solution to the problem of “unknown knowables”, using, of course, systems that exhibit universality. My thinking about navigating uncertainty and knowledge discovery is inspired by Stuart Kauffman’s “patterns in evolution”. It is best to adopt his vocabulary to better articulate my ideas on this. I recommend that you read:

Prolegomenon to patterns in evolution

Despite Darwin, we remain children of Newton and dream of a grand theory that is epistemologically complete and would…

www.sciencedirect.co

I propose a post-entailing law explanatory framework in which Actuals arise in evolution that constitute new boundary conditions that are enabling constraints that create new, typically unprestatable, adjacent possible opportunities for further evolution, in which new Actuals arise, in a persistent becoming. Evolution flows into a typically unprestatable succession of adjacent possibles.

Knowledge discovery follows the same post-entailing law proposed by Kauffman: what is newly known acts as an enabling constraint that creates new adjacent possible knowledge. When new knowledge is then discovered in that previously unknown adjacent possible, it emerges in a novel way that could not have been predicted beforehand. Radical emergence arises in evolution and in learning, and it is impossible to capture with present-day mathematics because the boundary conditions keep changing with each new discovery.

There is currently some mathematical understanding of the distribution of innovation. This is discussed in “Mathematical Model Reveals the Patterns of How Innovations Arise”.

https://www.technologyreview.com/s/603366/mathematical-model-reveals-the-patterns-of-how-innovations-arise/

The cited paper argues that innovation is enabled by “the adjacent possible”, that is, those patterns that are one step away from existing learned patterns. Rather than developing patterns that have no connection to what came before, new patterns are realized through existing patterns, and new areas of unexplored patterns are thereby discovered:

by providing the first quantitative characterization of the dynamics of correlated novelties, could be a starting point for a deeper understanding of the different nature of triggering events (timeliness, scales, spreading, individual vs. collective properties) along with the signatures of the adjacent possible at the individual and collective level, its structure and its restructuring under individual innovative events.

The simulations in this paper indicate that the discovery of novelties follows the same mechanism as evolutionary innovation:

The same model accounts for both phenomenon. It seems that the pattern behind the way we discover novelties — new songs, books, etc. — is the same as the pattern behind the way innovations emerge from the adjacent possible.
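One way to make the adjacent possible concrete is an urn simulation in the spirit of the urn models with triggering studied in the literature on correlated novelties. The sketch below is an illustration under assumptions, not code from the cited work: the function name `urn_with_triggering` and the default parameter values are hypothetical, `rho` is assumed to be the number of extra copies added whenever an element is drawn (reinforcement), and `nu + 1` is assumed to be the number of brand-new elements added the first time an element is drawn (triggering).

```python
import random


def urn_with_triggering(steps, rho=4, nu=3, n0=2, seed=0):
    """Simulate an urn whose contents expand whenever a novelty is drawn.

    Returns the sequence of drawn elements and, for each step, the number
    of distinct elements seen so far. All parameter names are illustrative.
    """
    rng = random.Random(seed)
    urn = list(range(n0))      # start with n0 distinct elements
    next_element = n0          # label for the next never-before-existing element
    seen = set()
    sequence = []
    distinct_counts = []

    for _ in range(steps):
        ball = rng.choice(urn)
        sequence.append(ball)

        # Reinforcement: what has been used becomes more likely to be reused.
        urn.extend([ball] * rho)

        # Triggering: a first-time draw opens up nu + 1 new possibilities.
        if ball not in seen:
            seen.add(ball)
            urn.extend(range(next_element, next_element + nu + 1))
            next_element += nu + 1

        distinct_counts.append(len(seen))

    return sequence, distinct_counts


if __name__ == "__main__":
    _, counts = urn_with_triggering(1000)
    print("distinct elements after 1000 draws:", counts[-1])
```

Each first-time draw enlarges the urn with elements that did not exist before, so the space of possible discoveries grows only when something new is actually discovered.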

The above research established the similarity of evolution and knowledge discovery through a descriptive model. This is an indication that the underlying mechanisms may be identical. There are however many kinds of models of the world:

[1810.04261] A Tale of Three Probabilistic Families

A discriminative model is what a machine learning system generates when training a classifier. It discovers how to recognize patterns that associate an input signal with a class label.

A descriptive model specifies the probability distribution of a signal based on descriptive feature statistics extracted from the signal. Thermodynamics is empirical in this sense: it characterizes systems through aggregate measurements of a large collection of particles. The motivation of fields like Statistical Mechanics is to derive the distributions from first principles. Many are unaware of the relationship between Statistical Mechanics and Thermodynamics, but it is important for understanding how probability is employed in real science. Probabilistic graphical models also attempt to generate distributions, not from first principles but from guesses drawn from a selection of prior distributions.

A generative model is similar to a descriptive model but is derived from the bottom up. It begins with the dynamics of many subcomponents and generates bulk behavior. The astonishing results of GANs are an example of this kind of model.
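The contrast between a descriptive and a generative model, in the sense used above, can be illustrated with a toy example. This is only a sketch under assumptions and is not taken from the cited paper: the random walker, the function names, and the sample sizes are all hypothetical. The generative part builds each signal bottom-up from many micro-steps; the descriptive part only summarizes the resulting population with aggregate statistics. The discriminative case is left out to keep the example short.

```python
import random
import statistics


def simulate_walker(steps, rng):
    """Generative, bottom-up: add up many independent +1/-1 micro-steps."""
    return sum(rng.choice((-1, 1)) for _ in range(steps))


def generate_population(walkers, steps, seed=0):
    """Run many walkers to obtain the bulk behavior of the system."""
    rng = random.Random(seed)
    return [simulate_walker(steps, rng) for _ in range(walkers)]


def describe(samples):
    """Descriptive: characterize the population by aggregate statistics only."""
    return statistics.fmean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    positions = generate_population(walkers=2000, steps=100)
    mean, spread = describe(positions)
    print(f"mean={mean:.2f} spread={spread:.2f}")
```

For independent ±1 steps, the spread of the final positions should come out close to the square root of the number of steps. The descriptive summary can report that number, while the bottom-up simulation is what produces it.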

This classification is pragmatically important because too many people conflate these three kinds of models (see Latent Variable Models). It is, however, not uncommon to employ all three classes of models in a single system.

The question I seek to answer is: how can we create generative models that exhibit intuitive ingenuity? Previously, I wrote about ingenuity and brought to attention the ideas of Christopher Alexander regarding “generative” systems. Generative systems lead to a holistic system with emergent properties, where the whole is greater than the sum of its parts. Emergence exists because each newly discovered component introduces an incremental set of combinations that can lead to capabilities that did not exist before its discovery.

What are the intrinsic characteristics of a generative component?

Here are three characteristics that I’ve identified:

Adaptive by default. Components can be used with minimal friction or cost.
Reconfigurable. Sub-components can be re-arranged to be valuable in different contexts.
Generative. Components can be linked with other components to create new kinds of generative components.

The generative model for computers is the logic gate (NAND or NOR), which can be replicated and interconnected. That is all that is fundamentally needed to create von Neumann computers, since memory can be simulated with a self-referential set of gates.
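As a minimal sketch of this idea, the code below builds other gates, a half adder, and a one-bit memory out of nothing but a NAND function. The function names are illustrative, and the latch is modeled by iterating the feedback loop until it settles, which is a simplification of how real hardware behaves.

```python
def nand(a, b):
    """The single primitive: output is 0 only when both inputs are 1."""
    return 1 - (a & b)


# Every gate below is just NAND gates wired together.
def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return xor(a, b), and_(a, b)


def sr_latch(set_n, reset_n, q=0, q_bar=1):
    """One bit of memory from two cross-coupled NAND gates.

    Inputs are active-low: set_n=0 stores 1, reset_n=0 stores 0, and
    set_n=reset_n=1 holds the previous state (q, q_bar).
    """
    for _ in range(4):  # let the self-referential loop settle
        new_q = nand(set_n, q_bar)
        new_q_bar = nand(reset_n, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar


if __name__ == "__main__":
    state = sr_latch(0, 1)            # set
    state = sr_latch(1, 1, *state)    # hold: the bit is remembered
    print("stored bit:", state[0])
```

The latch is the self-referential part: each gate’s output feeds the other gate’s input, and that loop is what lets a purely combinational component hold state.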

As another example of a generative system, we can look at software programs. Programs can be characterized as having a few components: assignment (assigning values to memory), sequencing (steps of instruction), conditionals (selecting instruction paths) and loops (repetition). The reconfiguration of these components leads to valuable programs. As Computer Science has advanced, we have added other conveniences such as call stacks (subroutines), naming (variables), etc. These are all, however, easily emulated by the core generative components. That is, higher-level generative components are created out of the core generative components. This kind of design modularization is the key to the rapid advances in Computer Science.
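To illustrate how a convenience such as the call stack can be emulated by the core components, here is a small sketch that computes a factorial twice: once with ordinary recursion, and once using only assignment, sequencing, conditionals, and loops together with an explicit list that stands in for the call stack. The function names are hypothetical and the example is deliberately simple.

```python
def factorial_recursive(n):
    """Relies on the language's built-in call stack."""
    if n <= 1:
        return 1
    return n * factorial_recursive(n - 1)


def factorial_with_explicit_stack(n):
    """Same computation with the call stack emulated by hand."""
    stack = []  # plays the role of the call stack

    # "Call" phase: push one frame for every pending multiplication.
    while n > 1:
        stack.append(n)
        n = n - 1

    result = 1  # value returned by the base case

    # "Return" phase: pop frames and combine results on the way back up.
    while stack:
        result = result * stack.pop()

    return result


if __name__ == "__main__":
    print(factorial_recursive(5), factorial_with_explicit_stack(5))
```

The list operations used here are themselves conveniences, but they reduce to assignments into memory plus an index, so nothing beyond the core components is required in principle.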

We already know how to create systems that manipulate instructions and information that is explicitly known via conventional software. As expressed here, a system that discovers the unknown will likely work similarly to an evolutionary process:

Here, modular cognitive components undergo a process of creation, competition, cooperation, and destruction. The algorithms for a cognitive system are of course more advanced than natural evolution. The difference is that evolution concerns the collective, while cognition concerns the individual. The same post-entailing law, however, is exhibited by both the collective and the individual. The individual differs from the collective because of the additional constraint of the notion of self. Understanding subjectivity is essential in making progress towards Artificial General Intelligence.

To understand this progression towards higher order cognitive capability, one can consult the capability maturity model described earlier:

What is the generative system that can at least lead to automation that can discover the implicitly known? (Level One)

What generative system can discover unknowns? (Level Two)

What generative system can discover what is unknown? (Level Three)

What generative system can discover new generative systems? (Level Four)

What generative system can discover what is unknowable? (Level Five)
