An interview with Monica Anderson
Artificial General Intelligence (AGI) is an emerging field that aims to build “thinking machines”; that is, general-purpose systems with intelligence comparable to that of the human mind. What is currently labeled ‘artificial intelligence’ is largely narrow automated knowledge work, lacking the flexibility and adaptability seen in animal intelligence. The pursuit of AGI begins at a foundational level, asking fundamental questions about models of cognition, knowledge acquisition, making choices through reason, and thinking about and conceiving of the world in adaptive and intuitive ways.
An interview with Monica Anderson, an AGI researcher from California.
Monica L. Anderson is CEO of Syntience Inc., a Silicon Valley research corporation developing a novel natural language understanding algorithm. In her 20-year career in traditional Artificial Intelligence (1980–2000) she developed several programming languages, created three expert systems for Cisco, and worked on Natural Language Processing at Kanisa Inc. and on Search Quality at Google. She holds a Master of Science in Computer Science with a minor in Electrical Engineering from Linköping University, Sweden (1985).
Watch this “Dual Process Theory” video as a primer to her work.
Monica, what is currently marketed as “Artificial Intelligence” is mostly narrow, or “artificial narrow intelligence”, which I among others have associated with specific knowledge work. Do you agree with this categorization and if so, what are the limitations of existing “AI” approaches that prevent them from becoming the basis for AGI?
The field of AI was turned on its head in 2012 when Deep Learning, a Neural Network technology invented by Geoff Hinton and others, started winning competitions in several important sub-domains of AI such as image recognition, speech understanding, signal processing, Go, and Atari video games. Thousands of researchers left traditional AI and jumped on the bandwagon, and an influx of younger talent has resulted in a deluge of research papers in a volume that no human can keep up with. Neural Networks had been explored before and had failed, mostly because our computers were much too small. So we now have contenders for a “New” and an “Old” AI. It is not the first time this has been claimed, but this time there’s something behind those claims.
The main capability we can observe in the new systems is that they can “Perform Autonomous Reduction”. This is my term for the ability to look at low level data, such as sensory input from a camera, a microphone, or characters in a book, and discover the high level meaning behind the pixels, sounds, and characters.
This is a capability we’ve only previously seen in much weaker forms in systems like Genetic Algorithms. And this ability to perform Reduction is exactly what was missing in the “old” kind of AI. For many good reasons, as it turns out.
You’ve equated current AI approaches to the “reasoning” and “slow” part of the Dual Process Theory (James, Kahneman, et al.), and pointed out that brains spend 0.001% on reasoning and the rest on “understanding”. You’ve also said that AI’s Reductionist approach solves only “toy problems” and that what is needed are “understanding machines”.
What’s a canonical example of a problem solved by the “understanding” of an AGI? Are there any examples today of AGI in the real world (not in the lab)?
The state of the art in image Understanding can be illustrated like this: show a computer (running the right kind of Deep Learning networks) a picture, and it will tell us “There’s a woman in a white dress holding a tennis racket and two people wearing green behind her”. That demonstrates a lot of “Understanding” of what’s in the image.
Google has released a new, Neural Network based translation service that can even translate Chinese text in an image on your cell phone screen into English. In real time. Without going to the cloud for computational support — it’s all done in the phone.
Android phones have used this technology for their voice understanding since late 2012. Apple is using it now. These out-compete all prior approaches by a wide margin.
Signal processing can extract speech on the phone from a noisy environment much better than previous technologies; the differences in quality dwarf decades of minor, incremental improvements using Reductionist methods.
The use of the acronym “AGI” requires a clarification. The G stands for “General” and it is taken to mean the system can solve any problem in any problem domain. Well, language understanding is (to the first degree) a matter of training on a corpus in the target language. But there are no limitations on the target language. These systems can learn any language, to the extent they can learn any one of them. This means they have “general” language competence, but not necessarily “general” intelligence beyond language. It is a major step on the way to independence from human-crafted Models of the World, but it’s not solving many problems outside the domain of all human languages. Yet.
You refer often to “model free methods” and their role in artificial understanding approaches. An artificial neural network is “model free”, but its training data (inputs) are still focused on a specific narrow objective, and this data is anything but “scant”. You’ve argued that AGI systems need to “jump to conclusions on scant evidence(1)”. What is a model free method if not an ANN?
I have identified a dozen or so “primitive” Model Free Methods (MFM) that can be combined into more complex MFMs. And the Neural Networks we use are built out of those components. We can imagine how other kinds of systems built out of the same primitives could exhibit similar fractional traits of intelligence. But if your goal is to construct “Systems capable of Autonomous Reduction” then it turns out Neural Networks are close to optimal for the task. Which is why Neural Networks and Brains resemble each other at many levels — because they are solving the same problem of Reduction.
Examples of these primitive methods are trial-and-error (aka Generate-and-test), Enumeration, Remembering failures, Remembering successes, table lookup in a table the program fills out by itself while running (not done by a programmer), mindless copying, adaptation, evolution, narrative, consultation, delegation, and markets. Deep Learning Neural Networks contain tables (arrays) that contain information about past successes and failures.
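Several of the primitives listed above can be combined in a few lines of code. The following is a minimal sketch, not Anderson's own implementation: it pairs trial-and-error (generate-and-test) with remembering failures and successes in tables the program fills in while it runs. The function name, its parameters, and the toy problem are all hypothetical and chosen only for illustration.

```python
import random

def generate_and_test(is_solution, candidates, rng, max_trials=1000):
    """Trial-and-error search that fills in its own lookup tables while running."""
    failures = set()   # "remembering failures": candidates we will not retry
    successes = []     # "remembering successes": candidates that passed the test
    pool = list(candidates)
    for _ in range(max_trials):
        untried = [c for c in pool if c not in failures and c not in successes]
        if not untried:
            break                      # nothing left to try
        guess = rng.choice(untried)    # generate: pick a candidate blindly
        if is_solution(guess):         # test: no model of the problem is used
            successes.append(guess)
            break
        failures.add(guess)
    return successes, failures

# Hypothetical toy problem: find a number in 0..19 whose square is 49.
rng = random.Random(0)
successes, failures = generate_and_test(lambda n: n * n == 49, range(20), rng)
```

The search knows nothing about squares or arithmetic; it only guesses, checks, and records what happened. On a finite candidate pool it cannot loop forever, because every failed guess is remembered and excluded from later trials.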
Neural Networks do jump to conclusions on scant evidence. And we can use “the six pillars of Reductionism” to prove that ANNs are Holistic (Model Free). Reductionist systems generally provide:
- Optimality (the best answer)
- Completeness (all the answers)
- Repeatability (same answers every time)
- Parsimony (don’t waste resources)
- Transparency of Process (understand how the answer was arrived at)
- Scrutability (we can understand the answer itself).
Reductionist systems, such as computer programs and old style AI, could provide guarantees for all of them. Holistic systems, including ANNs, cannot provide guarantees for any of these.
Neither can brains. Which should be a clue to AI implementers everywhere.
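The difference in guarantees can be shown with a loose toy contrast. This is a sketch under stated assumptions and not a definition of either approach: exhaustive enumeration stands in for a Reductionist system, and "best of a few random samples" stands in for a conclusion-jumping Holistic one. The function names and the objective are hypothetical.

```python
import random

def exhaustive_best(score, candidates):
    """Enumerate every candidate and return the highest-scoring one."""
    return max(candidates, key=score)

def sampled_best(score, candidates, rng, samples=5):
    """Look at a few random candidates and return the best one seen."""
    seen = rng.sample(list(candidates), samples)
    return max(seen, key=score)

def score(x):
    # Hypothetical toy objective with a single peak at x == 70.
    return -(x - 70) ** 2

candidates = range(100)
exact = exhaustive_best(score, candidates)
# Three runs with different seeds; each inspects only 5 of 100 candidates.
guesses = [sampled_best(score, candidates, random.Random(seed))
           for seed in range(3)]
```

The exhaustive version is optimal and repeatable on this toy problem, but it pays for that by inspecting every candidate. The sampled version does 5% of the work and promises nothing: its answer can be no better than the exhaustive one and may change with the seed.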
Supervised learning is a little bit Reductionist; I’ve come to the conclusion I’m not going to fight this battle. But a lot of researchers are leaving the supervised methods behind and are moving to unsupervised and self-supervised methods, since they know that this is the only way to get to AGI, to gain freedom from spoon-fed information from programmers and experts.
Minsky and others have said that AGI will require “multiple different types of systems”(2) rather than a “unified theory” approach to intelligence, much like the human brain has numerous different systems.
Do you agree with this view? If so how would you categorize the essential different systems needed for AGI?
Not really. I believe most of the brain uses what some call “A Single Cortical Algorithm” to learn and to understand. This algorithm is the target of my research, and I have a good idea of the general structure of such an algorithm. The strategy to look for “separable components” is part of the Reductionist toolkit. It generally works well almost everywhere. But the very success of such methods makes a Holistic single cortical algorithm much harder to accept for people like Minsky, who was raised, educated, and acted in a Reductionist framework.
You like to say that AGI need only be “cattle level” to be valuable. Why is the cow your starting point for AGI, rather than a simpler creature?
Some people have suggested starting with “ant intelligence”. But just like a single neuron, a single ant is not intelligent; it is a Reductionist system following a fixed program with just a few bits of state. We cannot measure the intelligence of an ant, so systems containing ant simulations are about as difficult to create as systems simulating neurons.
Hofstadter says an anthill is intelligent. But we don’t have good IQ tests for anthills either.
How much of our brain do we need for understanding language? Nobody knows. It is certainly not located in some single lobe, as Reductionists are fond of claiming. But we can estimate it’s not 100% of the human brain, and so I settled on the size of a cow’s brain as a placeholder for “a brain of adequate size we can actually measure the performance of”.
It is important to note that both ants and neurons are unintelligent. Intelligence has to emerge from unintelligent components; otherwise you are cheating.
I also use the fact that we raise cattle for food as a limit for personhood, again, somewhat leaning on Hofstadter and his Huneker scale in “I Am a Strange Loop”(3). I say “cattle level intelligence deserves cattle level privileges”. We slaughter cows for food; I say it is therefore OK to turn off your AI. It is an app running on your laptop and you are allowed to shut it down. This may well change over time, but it is the only sane viewpoint at the moment, in my opinion. I kill my AI programs dozens of times per day when I see they are not measuring up to my quality criteria.
Intelligences are not scientific. Minds are best-effort conclusion-jumping correlation-discovering Holistic machines. AGIs must be implemented the same way.
Part 2 here.
(2) 2014 interview w/Kurzweil https://www.youtube.com/watch?v=RZ3ahBm3dCk