The Asymmetry of Information Discovery and Causal Entropy

Carlos E. Perez
Intuition Machine
Oct 14, 2018

View of the earth with the least amount of land. The Austronesians were the first to navigate and colonize this vast “water world.”

One of my favorite tools for discovering new ideas is Twitter (see: https://twitter.com/IntuitMachine). The brevity and informal nature of Twitter encourage the tweeting of original and nascent ideas that have not been fully fleshed out. This post is inspired by a set of tweets by David Manheim, who works in existential risk mitigation, computational modeling, and epidemiology. He tweets:

He writes that (1) even simple systems can be intractable, (2) persistent and slow systems seem to defeat fast and clever systems, and (3) preventing security breaches is, in practice, impossible. The reason I follow David is simply that he commented on a tweet of mine regarding Goodhart's law. It is also important to tweet about your ideas; it has the side effect of discovering other people who are thinking about similar ideas.

Anyway, David's tweets got me thinking more about this. David points out how frustratingly difficult it has become to secure computer systems. With the increasing reports of compromised systems, we are collectively developing an intuition about this: it is becoming obvious how difficult it is to secure increasingly complicated systems.

Queries that ask about what we already know are very different from questions that ask about what we don't know. We can enumerate what we know we don't know, but some unknowns lie outside our models entirely. This asymmetry in information discovery is fundamental.

Information discovery is asymmetric. It is easy to discover what you already know: you can query your memory. It is unknown how long it will take to discover what you don't know: querying reality is open-ended. This asymmetric cost of discovery is a fundamental truth that most rational thinkers seem to ignore.

One fundamental form of information asymmetry is found in public-key cryptography (i.e., asymmetric cryptography). The mechanism that makes this possible is that it is vastly less costly to decrypt or validate an encrypted message with the correct key than to recover the plaintext without the private key. There is just no way to avoid the massive computational cost of reconstructing an unknown private key.
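To make the asymmetry concrete, here is a toy RSA sketch with deliberately tiny textbook numbers (my own illustration, not real cryptography): decryption with the private key is one cheap modular exponentiation, while recovering the key without it means factoring the modulus, which is infeasible at real key sizes.

```python
# Toy RSA with tiny textbook numbers (illustration only, not secure).
# With the private key, decryption is one modular exponentiation.
# Without it, an attacker must factor n; at real key sizes that cost
# is astronomically larger than the legitimate decryption.

p, q = 61, 53                 # secret primes (real keys use huge primes)
n = p * q                     # public modulus: 3233
e = 17                        # public exponent
phi = (p - 1) * (q - 1)       # 3120
d = pow(e, -1, phi)           # private exponent: 2753 (Python 3.8+)

message = 65
ciphertext = pow(message, e, n)     # anyone can encrypt with (e, n)

# Cheap: decrypt with the private key.
assert pow(ciphertext, d, n) == message

# Expensive: recover the private key by factoring n (trial division here;
# for 2048-bit moduli no known method is remotely feasible).
def factor(n):
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("no factor found")

assert set(factor(n)) == {p, q}
```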

Another example of this information asymmetry, one that is becoming intuitive to many, is our inability to build systems that are impervious to hacking, an inability made worse by our need to create ever more complicated systems. It is easy to make something do something; it is impossible to predict all the permutations of what it can't do.

Developing bug-free systems requires enumerating our assumptions and writing tests to verify those assumptions. We can enumerate the assumptions we are aware of, but it is impossible to know ahead of time whether that set is complete. This is why the practice of Test-Driven Development (TDD) is effective: TDD acknowledges that testing is a continuous knowledge-discovery process.

Discovering errors and flaws in our systems is an iterative knowledge-discovery process. It is a process that requires more effort than the original development of new features and capabilities. This is why we always seem to overestimate our ability to deliver bug-free code. What is typically missing in our estimates is information about the unknown: the bugs that we aren't aware of.

TDD has the valuable by-product that it encodes our assumptions about the system in an explicit form. Many times our assumptions are implicit, and it is only through the process of writing a test that they become explicit, and thus assumptions that can be referred to, tracked, and analyzed.
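As a minimal sketch of what encoding assumptions as tests looks like in practice (using a hypothetical parse_price helper invented purely for illustration):

```python
# Each test makes one assumption about parse_price() explicit.
# The last test encodes an assumption we only discovered after a bug
# report; until then it was an unknown outside our test suite.

def parse_price(text: str) -> float:
    """Parse a price string like '$1,234.50' into a float."""
    return float(text.replace("$", "").replace(",", "").strip())

def test_plain_number():
    assert parse_price("19.99") == 19.99

def test_dollar_sign_is_stripped():
    assert parse_price("$19.99") == 19.99

def test_thousands_separator():
    assert parse_price("$1,234.50") == 1234.50

def test_surrounding_whitespace():
    # Added only after a real-world input surprised us: we never assumed
    # prices could arrive with whitespace until reality told us otherwise.
    assert parse_price("  $5.00 ") == 5.00

if __name__ == "__main__":
    for test in (test_plain_number, test_dollar_sign_is_stripped,
                 test_thousands_separator, test_surrounding_whitespace):
        test()
    print("all encoded assumptions hold")
```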

The limitation of any model (whether derived by induction or by counterfactual reasoning) is that its scope of validity is unknown. That is why intelligent systems must be able to introspect their models. Absent explicit introspective models, it is impossible to identify needed corrections. This is why an Inside-Out architecture derived from self-awareness models is essential for Artificial General Intelligence (AGI). You cannot reason about the inconsistencies of a model if you don't have an explicit version of that model.

The economic consequences of asymmetric information led to the Nobel prize in economics in 2001. That economic model concerns the asymmetry of information between parties. Here I propose a different asymmetry: the cost of acquiring information that is known versus information that is unknown. Discovering new information is more costly. What is not intuitively obvious, however, is that the unknown isn't a static amount. It is dynamic: the more knowledge that accumulates, the greater the possibility of new permutations of that knowledge. Said differently, the greater the possibility of technological innovation and thus the greater the uncertainty. Technological innovation is usually unknown before the discovery of the stepping stones required for its realization.

Information entropy assumes that the total amount of uncertainty is fixed and that whatever information you become certain of subtracts from that total uncertainty.

https://en.wikipedia.org/wiki/Mutual_information
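In the standard formulation this bookkeeping is the identity H(X) = I(X;Y) + H(X|Y): the information gained about X from observing Y subtracts from the uncertainty that remains, and the total stays fixed. Here is a small numeric check of that fixed-total ledger, using a toy joint distribution of my own choosing:

```python
# Verify H(X) = I(X;Y) + H(X|Y) for a toy joint distribution p(x, y).
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(axis):
    m = {}
    for xy, p in p_xy.items():
        m[xy[axis]] = m.get(xy[axis], 0.0) + p
    return m

H_x = entropy(marginal(0))            # uncertainty about X
H_y = entropy(marginal(1))
H_joint = entropy(p_xy)
H_x_given_y = H_joint - H_y           # what stays uncertain after seeing Y
I_xy = H_x - H_x_given_y              # what observing Y tells us about X

print(f"H(X) = {H_x:.3f}, H(X|Y) = {H_x_given_y:.3f}, I(X;Y) = {I_xy:.3f}")
# The ledger balances: certainty gained + uncertainty left = fixed total.
assert abs(H_x - (I_xy + H_x_given_y)) < 1e-12
```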

Entropy is defined against a fixed total; this contradicts the idea of asymmetric information discovery. In that view, a chunk of certainty removes a proportional chunk from uncertainty. Rather, it seems that a chunk of certainty just adds to one's knowledge but doesn't reduce the chunk of uncertainty. If we consider the discovery of knowledge as equivalent to the discovery of a new tool (i.e., a virtual tool), then new possibilities are created in the use of that tool that were previously inaccessible. In other words, the space of possibilities increases and therefore the solution space must also increase.

New knowledge inevitably creates new permutations of what is possible. I would like to call this kind of entropy 'causal entropy' to distinguish it from the usual information entropy. That is, as knowledge is organized, the number of possible changes that have a material causal effect increases.

New knowledge counter-intuitively expands the causal uncertainty. This is because new knowledge feeds back into itself. To illustrate this, consider again a newly released software service on the internet. With each additional feature that is added, the uncertainty of vulnerability to adversarial attacks increases. In the security world, this is called an increase in the 'cyber-attack surface.' The analogy with security is intuitive for anyone who has played the game of Risk: the countries that are most vulnerable are the ones that have the greatest options to attack. Flexibility creates opportunities, and its by-product is exposure to greater uncertainty.
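A back-of-the-envelope way to see why the attack surface outpaces the feature count (my own illustration, not a formal security model): every new feature has to be considered in combination with everything already there, so the space of interactions grows combinatorially.

```python
# Count how the space of feature interactions grows as features are added.
from math import comb

for n_features in (5, 10, 20, 40):
    pairs = comb(n_features, 2)        # pairwise interactions to reason about
    subsets = 2 ** n_features - 1      # every non-empty combination of features
    print(f"{n_features:3d} features -> {pairs:4d} pairwise interactions, "
          f"{subsets:>18,} feature combinations")
```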

What becomes clear is that any approach that employs adaptive knowledge discovery must also employ adaptive error-correcting mechanisms to compensate for the increase in uncertainty that is a consequence of the increase in knowledge. This is why a Level Two system is superior to a Level One (Stimulus-Response) system.

This inability to enumerate unknown possibilities has been glossed over by many logical thinkers. We cannot lean uncritically on 'the law of the excluded middle.' That is, we cannot constructively prove that something exists through a proof by contradiction.

“In standard mathematics, one can prove the existence of a mathematical object without “finding” that object explicitly, by assuming its non-existence and then deriving a contradiction from that assumption.”
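A classic illustration of such a nonconstructive existence proof (my own addition, not from the quoted source) is the claim that some irrational number raised to an irrational power is rational:

```latex
\textbf{Claim.} There exist irrational numbers $a, b$ such that $a^b$ is rational.

\textbf{Proof (nonconstructive).} Let $x = \sqrt{2}^{\sqrt{2}}$ and apply the
law of the excluded middle: either $x$ is rational or it is not.
\begin{itemize}
  \item If $x$ is rational, take $a = b = \sqrt{2}$.
  \item If $x$ is irrational, take $a = x$ and $b = \sqrt{2}$; then
        $a^b = \left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}}
             = \sqrt{2}^{\,\sqrt{2}\cdot\sqrt{2}} = \sqrt{2}^{\,2} = 2.$
\end{itemize}
The proof establishes that a witness exists without telling us which case
holds, i.e., without ``finding'' the object explicitly.
```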

There is an area of mathematics (see: constructive mathematics) that explores this notion in a more in-depth and formal manner. Incidentally, I was serendipitously made aware of this connection to constructive mathematics through a tweet by Joscha Bach:

I had not heard of constructive mathematics previously!

This tweet leads me to my own slightly modified interpretation:

Gödel: Classical math has limits in simulating classical math

Turing: Computation can simulate other computation

Church: Computation has limits in predicting other computation

Smith: Computation is unlimited in defining classical math

Here I make the connection between the informal notion of computation and the formal notion of constructive mathematics.

Albert Einstein in Induction and Deduction in Physics (1919) wrote:

‘A theory can thus be recognized as erroneous if there is a logical error in its deductions, or as inadequate if a fact is not in agreement with its consequences. But the truth of a theory can never be proven. For one never knows that even in the future no experience will be encountered which contradicts its consequences, and still, other systems of thought are always conceivable which are capable of joining together the same given facts.’

Just because something is shown to be true most of the time doesn't mean it will actually always be true. That is the limitation of learning via induction. Just so you understand this, I leave you with another tweet from John Carlos Baez, where he surprisingly observes:

The meta-pattern that you find in the above equations is that each subsequent equation accumulates the information of the previous equation. It's analogous to a self-iterative function, but different in that it doesn't accumulate state; rather, it accumulates symbolic expressions. When you find this meta-pattern of self-iteration, or alternatively self-reinforcement, the causal entropy must increase with new knowledge.
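The equations from the tweet aren't reproduced here, but the distinction between accumulating state and accumulating symbolic expressions can be sketched in a few lines (my own toy illustration):

```python
# Iterating the same rule two ways: numerically (accumulates a single value)
# and symbolically (accumulates the expression itself). The symbolic form
# keeps growing, so the space of reachable expressions expands at every step.
state = 1
expr = "x"
for _ in range(4):
    state = 2 * state + 1            # state stays one number
    expr = f"2*({expr}) + 1"         # the expression keeps nesting

print(state)   # 31
print(expr)    # 2*(2*(2*(2*(x) + 1) + 1) + 1) + 1
```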

Further Reading

Five Stages of Accepting Constructive Mathematics

Explore Deep Learning: Artificial Intuition: The Improbable Deep Learning Revolution


Exploit Deep Learning: The Deep Learning AI Playbook
