Artificial Intuition — A Breakthrough Cognitive Paradigm
In a previous post, I introduced the Meta Meta-Model of Deep Learning, but I did not go into its details. A word of warning for the reader: the concepts in this section are in flux and undergoing a lot of change. Therefore, this article is just a reflection of my current understanding of the language of the Deep Learning Meta Meta-Model. That’s definitely a mouthful, so to make life simpler for everyone, I simply call this the Deep Learning Canonical Patterns. These patterns are documented in the Deep Learning Design Patterns Wiki.
In this post, I will further explore the characteristics of Artificial Intuition, with the goal of describing a set of patterns that can aid us in formulating novel architectures for Deep Learning. In a previous post, “Deep Learning and Artificial Intuition”, I introduced the idea that there are two distinct cognitive mechanisms, one based on logical inference and another based on intuition. At least six decades have been spent exploring cognitive mechanisms based on logical inference without making much progress towards AGI. Deep Learning, a breakthrough discovered in 2012, revealed a promising alternative research approach based on a different cognitive paradigm.
In the field of Psychology, Kahneman and Tversky researched the interplay of these two kinds of cognitive function, work that Kahneman later presented in his book “Thinking, Fast and Slow”. The book has been highly praised:
New York Times columnist David Brooks recently declared that Kahneman and Tversky’s work “will be remembered hundreds of years from now,” and that it is “a crucial pivot point in the way we see ourselves.” They are, Brooks said, “like the Lewis and Clark of the mind.”
Kahneman’s book explores human cognitive biases and identifies the interplay of the two cognitive processes as a root cause of these biases. In this post, however, I will explore System 1 (i.e. intuition), more specifically artificial intuition and the mechanisms that give rise to it.
Deep Learning, of course, has a long history. It grew out of the Connectionist approach and derives much of its philosophy from ideas found in the Complexity sciences (see: “Tribes of AI”). In a nutshell, the idea is that emergent complex behavior can arise from simple mechanisms. Chaos and complexity are the two driving forces that exist in complex systems. I wrote earlier about how these two connect in a post “Chaos, Complexity and Deep Learning”.
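One classic, concrete illustration of emergent complex behavior arising from simple mechanisms (my choice of example, not something taken from the posts mentioned above) is an elementary cellular automaton. Each cell looks only at itself and its two neighbours and applies an 8-entry lookup table, yet rule 30, started from a single live cell, produces a pattern that looks irregular and hard to predict. The sketch below assumes a finite row whose cells beyond the edges are fixed at 0; the function names are illustrative.

```python
def step_eca(cells, rule):
    """One synchronous update of an elementary cellular automaton.
    Each cell's next state depends only on itself and its two neighbours;
    cells beyond the edges are treated as 0 (a fixed-boundary assumption)."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        centre = cells[i]
        right = cells[i + 1] if i < n - 1 else 0
        # The 3-cell neighbourhood forms a 3-bit index into the rule number.
        index = (left << 2) | (centre << 1) | right
        nxt.append((rule >> index) & 1)
    return nxt


def run(rule, width=31, steps=15):
    """Evolve from a single live cell in the middle; return every generation."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step_eca(cells, rule)
        history.append(cells)
    return history


history = run(30)
for row in history:
    print("".join("#" if c else "." for c in row))
```

Printing the history draws the familiar rule 30 triangle. Swapping in other rule numbers (0 to 255) shows how much of the resulting behavior is decided by the tiny lookup table alone.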
Our goal then is to either explain or better understand how emergent features arise through chaos and complexity. Here are some key features and some questions that require good answers:
Self-Organization: How does a system self-organize so that behavior required for survival is encouraged and destructive behavior is discouraged? How does complex organizational structure arise from simple structures?
Robustness: How does a system organize itself to become more tolerant to failure? How does a system gain the adaptability required to survive in unexpected environments?
Diversity: Adaptability and survivability require diversity, even though a diverse solution may be less optimal than a homogeneous one. Mixture of experts and ensemble methods point to the value of diversity in improving predictive accuracy.
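The value of diversity in ensembles can be made precise for the simplest case. For an equally weighted ensemble under squared error, a known identity (Krogh and Vedelsby's “ambiguity decomposition”) says the ensemble's error equals the members' average error minus their average disagreement with the ensemble mean. The minimal sketch below checks this numerically; the target value, the noisy hypothetical members and the function name are illustrative assumptions.

```python
import random


def ambiguity_decomposition(predictions, target):
    """Return (ensemble_error, avg_member_error, avg_ambiguity) for an
    equally weighted ensemble, using squared error on one target value."""
    n = len(predictions)
    ensemble = sum(predictions) / n  # the ensemble's prediction is the mean
    ensemble_error = (ensemble - target) ** 2
    avg_member_error = sum((p - target) ** 2 for p in predictions) / n
    # "Ambiguity": how far members spread around the ensemble mean (diversity).
    avg_ambiguity = sum((p - ensemble) ** 2 for p in predictions) / n
    return ensemble_error, avg_member_error, avg_ambiguity


random.seed(0)
target = 1.0
# Hypothetical members: each is the target plus its own independent noise.
members = [target + random.gauss(0.0, 0.5) for _ in range(10)]

ens_err, member_err, diversity = ambiguity_decomposition(members, target)
print(f"ensemble error    {ens_err:.4f}")
print(f"avg member error  {member_err:.4f}")
print(f"avg ambiguity     {diversity:.4f}")
# Identity: ensemble error = average member error - average ambiguity.
print(f"member - ambiguity {member_err - diversity:.4f}")
```

Because the ambiguity term can never be negative, the averaged ensemble is never worse than its average member, and the more its members disagree (while each stays reasonably accurate), the larger the gain.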
Abstraction: How does a system learn the abstractions required to perform accurate predictions in a hostile complex environment? How does generalization arise from the learning of abstractions?
Adaptation: What mechanisms of adaptation are necessary to compensate for incorrect predictions? How can a system forget learned behavior that may be detrimental to its survival?
Bounded Prediction: How can the computational resources needed for prediction be bounded so that predictions are made quickly enough to matter for survival? How can a system learn to optimize its predictions to fit within fixed bounds?
Coordination: How can a system learn to coordinate its actions with other participating actors? An environment not only includes inanimate objects, but also other systems that have learning capabilities. How can a system not only learn its environment but also learn how to interact with other learning systems?
These features of complex adaptive systems all relate to a previous discussion on “3 Essential Deep Learning Abilities”, namely Expressibility, Trainability and Generalizability. One clear trap for practitioners is that we can inadvertently bring in detrimental methods that originate from our mathematical or engineering training. That is, we carry over ideas such as the need for optimality, the requirement for sparse solutions, the need for interpretable and understandable solutions, and the need for completeness and repeatable, guaranteed behavior. These are of course desirable; however, we should not optimize for them as a starting point. Doing so leads to premature optimization, an idea that we are all familiar with in computer science. Rather, we should first embrace complexity and chaos and work out solutions that holistically treat them as a given. Research in Deep Learning is a major paradigm shift and thus requires a different kind of thinking.
The first big conceptual leap that we have to make is to understand that learning systems evolve in non-equilibrium settings. I discuss this briefly in a post “Non-Equilibrium Information Dynamics.” Stated differently, researchers should be very cautious about employing statistical or bulk thermodynamic metrics in their analysis of these systems. It is my belief that one of the most glaringly inappropriate tools in the study of AI is the use of Bayesian methods. I can understand their utility in the domain of logical inference; however, I doubt their effectiveness in the domain of intuitive systems.
The second conceptual leap is to understand that our notion of what “Generalization” means is grossly inadequate. The use of the term in Machine Learning is extremely liberal. Furthermore, the Machine Learning view of generalization as ‘curve fitting’, that is, interpolating between adjacent points on the fitted curve, breaks down given the recent discovery that Deep Learning networks are capable of rote memorization. How rote memorization can lead to generalization is a fuzzy idea at best. In fact, Kahneman’s research points out that human cognitive biases exist because of flawed reasoning in our intuitive System 1 inference. In other words, System 1 generalizes, but poorly and with flaws. To conclude, rote memorization leads to a kind of generalization that is inherently flawed!
A third conceptual leap is to accept that Deep Learning systems may be computational systems just like Von Neumann computers. The primary difference is that there is a discovered mechanism (SGD) for these systems to learn from data, as opposed to computers, which require programmers. Neural Networks are usually treated as continuous dynamical systems. Deep Learning systems have one common requirement, in that the computational layers must be differentiable. Computers, by contrast, do not have differentiable subcomponents. Cellular Automata (used in Evolutionary paradigms) also do not have differentiable components. Cellular Automata and Genetic Algorithms aren’t as successful in learning from data as DL. Yet, if all DL does is rote memorization, then DL systems aren’t very different from Von Neumann computers. At the core, DL consists mostly of threshold units that are not all that distinct from the NAND/NOR gates that we find in logic circuitry. Why then is differentiability such an important requirement for trainability? Are continuous dynamical systems a real requirement, or are we overlooking a more general principle? Why does SGD lead to learning? As we track the latest research in DL, we are beginning to discover that DL looks more and more like Von Neumann computers (see: “Conditional Logic”) and less like the simple dynamical systems we find in Physics. I think we can draw some inspiration from the complexity of ‘random boolean networks’ that describe biological processes.
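The tension between threshold units and differentiability can be seen in a few lines. In the minimal sketch below, a single threshold unit with hand-picked weights (an illustrative assumption) computes NAND exactly like a logic gate. A finite-difference estimate of the output's sensitivity to the bias is zero at every input, so gradient-based training such as SGD receives no signal from it; replacing the step with a sigmoid, keeping the same weights, gives a nonzero slope everywhere. This only illustrates why smooth activations are used; it does not answer the questions above.

```python
import math


def step(z):
    # Hard threshold, like a logic gate: flat everywhere except at z == 0.
    return 1 if z > 0 else 0


def sigmoid(z):
    # Smooth, differentiable stand-in for the threshold.
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked (hypothetical) weights and bias that make one unit compute NAND.
W = (-2.0, -2.0)
B = 3.0
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def unit(x1, x2, activation=step, bias=B):
    return activation(W[0] * x1 + W[1] * x2 + bias)


def bias_slope(x1, x2, activation, eps=1e-4):
    # Central finite difference of the unit's output with respect to its bias.
    up = unit(x1, x2, activation, B + eps)
    down = unit(x1, x2, activation, B - eps)
    return (up - down) / (2 * eps)


nand_table = {x: unit(*x) for x in INPUTS}
step_slopes = [bias_slope(*x, activation=step) for x in INPUTS]
sigmoid_slopes = [bias_slope(*x, activation=sigmoid) for x in INPUTS]

print("NAND truth table:", nand_table)
print("step slopes:     ", step_slopes)
print("sigmoid slopes:  ", [round(s, 4) for s in sigmoid_slopes])
```

With these weights the sigmoid unit's outputs (about 0.95, 0.73, 0.73 and 0.27) still round to the NAND truth table, so in this tiny case making the unit differentiable costs nothing in what it can express while giving training a slope to follow.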
There remain plenty of open questions about the true nature of artificial intuition systems. Mankind has stumbled on a kind of artificial intuition in the form of Deep Learning, but is it possible to discover other kinds of architectures that exhibit similar capabilities? As of this writing, we have not found alternatives. However, one should realize that we know of at least two kinds of architectures that lead to intuition: Deep Learning and biological brains. Although these two systems are functionally similar, their computational mechanisms are as different as night and day (see: “Misconceptions of Deep Learning”).
P.S. In related news, MIT is mining the intuition of its students to arrive at better algorithms for planning.