Your Personal AI (PAI): Pt 4 — Deep Agents (Deep Learning and Natural Intelligence)
The Brave New World of Personalized Smart Agents & their Data
A Multi-Part Series
This is an excerpt from my book, The Foresight Guide, free online at ForesightGuide.com. The Guide intros the field of professional foresight, and offers a Big Picture view of our accelerating 21st century future.
Why will our smart agents and PAIs soon become as indispensable as the web and our smartphones are today? Why will most of us be joking, and some of us seriously thinking, that our PAIs are “our better selves” in 2040, and for some of us, even 2030? To understand this key aspect of our global future, our next two posts will take a deep look at deep learning, a new paradigm of not only machine learning, but of future computer development.
This will be a long post, as it is about the technology behind the greatest story of our collective future, the advent of machines that think and feel like us, so I make no apologies for its length. Plenty of people will write the short versions. But there are many doubts and misconceptions on these topics, so the length will hopefully clear up a few of both.
There are also some rewards at the end, to make up for this post’s length. The first reward is the “PAI superlongevity” (a new “superpower”, the ability to live as long as we consider ourselves useful) and “Mind Meld” (aka “Merging With our PAIs”) prediction. It describes how humanity will increasingly use these highly personalized and naturally intelligent technologies, deep learning and personal AIs, to solve the “death problem”, in a few ways we’ve never seen before. The second reward is much more prosaic: some good investment tips and a few calls to action at the end.
Like any long-term exponential process, the growth of PAIs will start out looking slow, then become lightning fast, and at some point, we’ll see PAIs as simply a natural extension of us. For a nice intro to the power of exponentials, see Adrian Paenza’s Ted Ed lesson, How folding paper can get you to the moon (2012). Fold an ordinary piece of paper twenty-three times, and you get to the top of Big Ben. That’s surprising. Fold it twenty-two more times, you get to the Moon. That’s difficult to imagine. Fold it fifty-five more times (100 total folds, or informational doublings), and you’re now eight billion light years away from Earth. That’s almost impossible to imagine. But that’s how exponentials run. So knowing where they operate most on Earth (infotech and nanotech), and when they’ll end, has become critically strategically important.
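The doubling arithmetic behind the paper-folding example is easy to check. Here is a minimal sketch, assuming a very thin sheet of 0.001 cm; that thickness is my assumption for illustration, and every result scales linearly with whatever thickness you pick.

```python
# Assumed sheet thickness: 0.001 cm, expressed in metres. This value is an
# illustrative assumption, not a figure taken from the text.
PAPER_THICKNESS_M = 1e-5

def folded_thickness_m(folds: int, sheet_m: float = PAPER_THICKNESS_M) -> float:
    """Each fold doubles the stack, so thickness is sheet * 2**folds."""
    return sheet_m * 2 ** folds

if __name__ == "__main__":
    for folds in (10, 23, 45):
        print(f"{folds:>3} folds -> {folded_thickness_m(folds):,.2f} m")
```

Under that assumption, 23 folds gives a stack of roughly 84 m, the same scale as Big Ben's tower (about 96 m), and 45 folds gives roughly 352,000 km, the same scale as the average Earth-Moon distance (about 384,000 km). The last 22 folds multiply the stack by more than four million, which is the point of the example.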
It was nice to see Klaus Schwab, Chairman of the World Economic Forum, promote acceleration-awareness in his Fourth Industrial Revolution theme at Davos 2016, and in his new book, The Fourth Industrial Revolution (2016). But this message about exponentials is likely to continue to be ignored by most of the folks who should get it, for the time being. Most politicians, policymakers, and institutional and corporate leaders are still stuck in old ways of thinking. They are definitely ignoring deep learning, and lack any understanding of its major implications for their strategy, partnering, R&D, operations, marketing, business development, and corporate foresight work. But the longer they wait, the worse their competitive position will be.
For about five years now to insiders, and three years for everyone, it has been clear that deep learning, which uses reinforcement-based hierarchical neural networks and other variations of brain-inspired computing, will increasingly take over the field of machine learning, and perhaps in its next feat, the entire field of high-end computer design.
With these bottom-up approaches, code (and with computer design, the circuits) is grown, trained, and tested. It is not built by humans. While we have a basic logical and mathematical understanding of the inputs, we can’t understand or describe the algorithms or connectivity that emerge. As with our own brains, its complexity exceeds our minds.
The people who make these deep learners will increasingly not be doing what Sam Arbesman, author of the great new book Overcomplicated: Technology at the Limits of Comprehension (2016) calls “physics thinking”, where math, logic, rationality, and engineering dominate. Instead, they’ll be doing “biological thinking”. Beyond their basic architecture, which is very likely built to exploit some still-poorly-understood version of Bayesian (probabilistic) learning, the vast majority of the evolved internal complexity and connectivity of these deep learners won’t be describable by human-understood science. Here’s the key point to understand: Most of our top-down models in physics, math, logic, and other tools of our limited rationality will continue to fail to explain their higher features — they are too complex for such conceptual reduction.
What we’ll have instead is what we are now building in biology — a set of low-level (molecular, cellular, physiologic) and “bottom-up” learning and control algorithms, another set of grossly useful generalizations and analogies about emergent high-level forms and functions, and a lot of practical experience and intuition with what kind of data, training methods, and selection environments have been best, so far, at creating and improving the performance and intelligence in biological systems.
Recall NVIDIA’s work on self-driving cars, mentioned in our first post. Central to their approach is a toaster-sized supercomputer sitting in the car, the Drive PX, running a neural network that does computer vision, and which talks to a second neural net in the cloud, DriveNet, both of which “see” the world in a way roughly similar to the way human brains see (more on that later in this post). These learning networks are grown and trained like babies, not coded by humans. This brain-like approach in software and eventually in hardware will migrate to our industrial and domestic robots, and be at the heart of all our most complex systems.
The Next Chapter of Machine Intelligence
The next chapter in machine intelligence will involve what is called biologically-inspired computing. See Floreano & Mattiussi’s Bio-Inspired AI (2008) for a good older overview. When we borrow deeply from biology to guide our hardware and software design, we take advantage of the only known methods of making increasingly self-improving technology: the methods that led to our own emergence. Like biology itself, bio-inspired methods are mostly bottom-up and self-directed, rather than top-down, human-directed, or engineered. A mostly bottom-up, and slightly top-down, approach is how our own genes work in living systems, as the field of evo-devo biology demonstrates.
In my opinion, IARPA’s 2016 $100M MICORNS project (Machine Intelligence from CORtical NetworkS), seeking to reverse-engineer the structure and function of mammalian cerebral cortex to improve the performance of our deep learning software, is the best public money the US has spent on science and tech funding in the last 10 years. MICORNS (stupidly acronymed as MICrONS, just re-acronym it as MICORNS and you’ll remember it) employs powerful new tools in automated connectome imaging in dead and chemically-preserved brains, and in realtime observation, using two-photon confocal microscopy, of learning and control processes in living brains. Here are two excellent 2016 articles (Scientific American, SingularityHub) that explain the profound and largely unrecognized benefits of MICORNS for the world.
The money the US is spending on these reverse-engineering technologies through the 2013 BRAIN Initiative ($150M in 2016) is on par with what Europe committed to the 2013 Human Brain Project (HBP, $1.3B over 10 years). But while we smartly took a bottom-up approach, working on maps, tools, and data, the Europeans awarded all their money to a longshot top-down approach guided by one overly-ambitious person, Henry Markram, trying to simulate cortex in supercomputers, based on what (little) we presently know. By 2014, over 800 neuroscientists had signed letters saying the HBP project was doomed, and they wouldn’t cooperate with it. By 2015, the Eurocrats realized they needed to copy the US approach.
As we’ll see in this series, the science and technologies around deep learning are presently the greatest lever for improving the human condition, as all our science, engineering, and society (think of intelligent agents) will improve greatly as well. This post makes a case that a very bottom-up and user-involved approach to agent and PAI development, with key roles for open source, open data, modularity, and mass user testing and training, will allow us to make our software and computers more like our biology, and in the process, achieve our best individual and collective futures.
In a more foresighted world, the US would be funding ten times as much research on the neuroscience, computer science, and engineering of smarter computers. There are only three teams, at five institutions, being supported by MICORNS at present, and $100M is a pittance. Fortunately, China may jump in to this reverse-engineering work soon as well. China already outspends the US on deep learning research and nanotechnology research, even though they have only a third of our discretionary budget. It’s sad that the US has ceded funding leadership in these critical S&T domains to another nation. Eventually we’ll realize that is a major foresight mistake. See the 2016 OSTP report, Preparing for the Future of Artificial Intelligence, for details.
“Artificial” intelligence (AI) is a good description of where computer intelligence sits today. This intelligence is human-constructed, simplistic, and brittle. It feels as natural as a building, which can’t adapt beyond its design, and begins decaying as soon as humans stop repairing it. A single bit out of place in a configuration file in many current software systems can cause total system failure. We program most of today’s computers top-down, using rational, logical, engineered approaches. They aren’t yet autopoietic, or capable of self-replication and adaptation.
But they will be. They’ll be not only robust to error, but antifragile. That means they will have not just security, but immune systems, which learn from catastrophe and error. Catastrophes and errors actually make antifragile systems stronger, just as dirt and infections strengthen our biological immune systems. Rather than being “built”, it’s better to say they’ll be “seeded”, grown, and trained. Folks like Dipankar Dasgupta have been researching artificial immune systems (AIS) for twenty years, with little recognition by mainstream computer science. Here’s his latest book. I am convinced that the better we understand neuroimmunology, the better we’ll realize that the combination of bio-inspired computers and technological immune systems is the only reliable and proven path to real security, in both biology and technology.
We will address NI safety in our post on Safe Agents. If naturally intelligent (NI) machines prove to be rapidly self-correcting and antifragile after bad things happen with them, just as living systems naturally are, and unlike almost all of today’s AI machines, it seems clear that we as a society will continue to build and use them to solve our pressing human problems.
Self-improving, antifragile intelligence is so different from today’s artificial intelligence it deserves a new name. So let’s call it “natural” intelligence (NI), and recognize that it must be deeply biologically-inspired. Again, bio-inspired machines aren’t coded and designed, but rather are grown, gardened, and tested by us, against big data and the world. They have the equivalent of both brains and immune systems, and an ever-growing ability to self-explore, self-repair, and self-improve.
The symbolic, rule-based, top-down, engineered, and human-comprehensible approaches to AI, which have delivered modest progress for fifty years, are just a small part of the human brain. You can be sure they’ll also be a small part of the machine brains to come. Our systems, software and computer designers will keep sliding toward naturally intelligent machines because working with them, once they reach a threshold level of intelligence and self-improvement ability, will be far more efficient and effective than continuing to design top-down, using the old paradigms.
We can also call bio-inspired computer hardware and software design “natural computing”, to distinguish it from the engineered, discrete, serial, rule-based, “nonbiological computing” that we still use in the vast majority of our IT systems. We saw the earliest signs of natural computing in the first crude neural network, Frank Rosenblatt’s perceptron, in 1957. But multi-layer versions of the perceptron didn’t have a good training algorithm, so this kind of computing made little progress for thirty years. A good training algorithm, backpropagation, was popularized by Geoff Hinton and others in 1986, and neural nets began to make progress after that. But natural computing had to wait another twenty years, for fast processors with good hardware parallelism and access to data. All told, it took fifty years for neural networks to become an overnight success.
Natural computing’s successes began in earnest around 2005, as we will see. By 2009, nonbiological approaches to machine learning began losing out to biological approaches. Natural computing includes minimally biologically-similar hardware, like NVIDIA’s Pascal (optimized for running neural net software, but not yet deeply biological), and more strongly biological hardware like IBM’s SyNAPSE and other neuromorphic chips, and a wide array of biologically-similar machine learning software and algorithms, like recurrent and convolutional neural nets, reinforcement learning, hierarchies, modularity, swarm intelligence, evolutionary developmental methods, and much more.
Bio-inspired computing methods include biomimicry (biomimetics), the imitation of models, systems, and elements of nature to solve human problems, described well in Janine Benyus’s Biomimicry (2002). But they also take us beyond biology, which the word biomimicry doesn’t convey. Naturally intelligent computers will do things biology can’t, at speeds biological brains will never reach. They will learn to replicate, and generate their own adaptive complexity and intelligence, far faster and more stably than we ever could.
Our naturally intelligent PAIs will help us with many things, as this series seeks to address. But of everything our PAIs can and will help us with, thinking about how they will advance evidence-based thinking and collaborative scientific and technological research, and where that will take us, is perhaps the most exciting of all our opportunities ahead. Demis Hassabis, CEO of Google DeepMind, makes that point in this lovely 14 min video at Falling Walls 2015, which is well worth a watch.
Even though deep learning systems are nowhere near as complex yet as biological brains, they will keep learning and operating at least seven million times faster than biological brains, which are limited by electrochemical rather than electrical communication speeds. So it won’t be that much longer before they “learn their way up” to our level of complexity. In fact, this NI future seems so useful and powerful, I predict future science will show it is a developmental outcome that emerges on all technological planets, an “attractor” that humanity cannot avoid.
Many people currently talking about machine intelligence are still missing the increasingly bio-inspired, bottom up, and evolutionary developmental (evo devo) nature of the new generation of machines. They still think in terms of the top-down, rationalist, engineered way that most machine intelligence has emerged to date. But that top-down approach depends on our slow and limited biological human minds to grow it, and has far less potential than the bottom-up, self-replicating methods now emerging.
Top-down, rational design schemes for creating machine ethics and engineering “safe AI” in our PAIs and robots will always be very limited in usefulness, in a world of increasingly bottom-up NI systems. Even in today’s rationally engineered computing environments, all our leading computer science algorithms and data structures are actually not fully rational. They are rationality-guided but computationally incomplete guesses at how to represent the world in a useful way. Logic, rationality, probability theory, and other top-down tools let us make better guesses, but they are still just guesses.
Most fundamentally, all the most complex things in the world, including life and minds, are both evolutionary and developmental. That means that they are almost entirely bottom-up, experimental systems (evolutionary) with just a few empirically-found rules for top-down, systemic guidance (development). Evo-devo biology is precisely how the most complex organisms on our planet self-organized their own amazing complexity. Evo devo methods are how tomorrow’s smart machines and agents will emerge, as these methods alone allow computers to increasingly guide their own self-improvement.
In his beautifully-written book on machine learning, The Master Algorithm (2015), recommended earlier as background reading, computer scientist Pedro Domingos identifies “Five Tribes of Machine Learning”. Each tribe has been successful, to some degree, in building learning computers to date. Domingos’s Five Tribes, and in parentheses, the current favorite algorithms used by each, are:
1. Bayesianism (probabilistic inference)
2. Evolutionism (genetic programming)
3. Connectionism (backpropagation)
4. Analogizers (support vector machines)
5. Symbolists (inverse deduction)
Deep learning, which we’ll discuss at length in this post, is a kind of Connectionism, the Third Tribe on this list.
When we ask ourselves to write the story of life’s Intelligence Emergence Stack, the evolutionary developmental hierarchy in which intelligence emerged in living systems on Earth, there are good arguments that biology followed the order laid out above. This is not Domingos’s order in his book, as he does not (yet) view the universe from an evo devo perspective. But I for one am hopeful that one day, he will.
Let’s briefly back up these claims:
- Bayesian Intelligence. Molecular precursors to our first cells must have used chemistry to do probabilistic inference, in replicating chemical networks, to model and react to their immediate surroundings, and to support their survival, in molecular evo devo. One good book that takes this perspective is John Campbell’s Darwin Does Physics (2015). Campbell is a scholar in our Evo Devo Universe research community.
- Evolutionary Intelligence. Eventually life emerged, with its cells and genes, which are both evolutionary and developmental. While life arose from and still uses Bayesian processes, it encodes 3D form, function, and constraint at a much higher level of informational and computational abstraction than those processes typically do.
- Connectionist Intelligence. Eventually, a special subset of dominant multicellular life built neural networks (brains). All biological neural networks use a still-only-partly understood set of Bayesian inference algorithms, most neuroscientists think. But their evolutionary and connectionist architectures and abilities make them considerably more complex than what we understand as standard Bayesianism (we might call them “SuperBayesian”).
- Analogical Intelligence. Eventually, the most intelligent and dominant of these animals with brains began thinking in analogies, a process that all higher animals, including crows, can do.
- Symbolic Intelligence. Finally, humans began their runaway partnership with technology, and evolved and developed symbolic language, and later, formal symbolic reasoning in the Enlightenment (1600–1800).
As might be expected on reflection, artificial intelligence research has emerged in roughly the reverse of this order. In the 1960s, we began working on machine intelligence using top-down, rule-based and discrete symbolic reasoning, the epitome of Arbesman’s precise yet oversimplistic “physics thinking.” That was where the easiest work could be done at first, and “Artificial” was a great word to describe this entire process. Symbolic strategies made lots of early progress, and were greatly overhyped by some, but anyone with a biology background had little faith that they alone would create truly smart machines.
As symbolic progress slowed, we moved to support vector machines (analogizers) in the 1990s, a promising step deeper into the nature of intelligence. We also began experimenting with genetic programming and neural networks in the 1980s and 1990s, but each was still too early then to make much progress. In the early 1990s, we began making progress with Bayesian networks. Since 2009, as we’ll see below, connectionism, via deep learning, has become the latest important advance.
The dramatic recent success of deep learners and the return of connectionism mark a big transition, and I think we need new language for that transition. From now on, whenever we talk about the future of thinking machines, I think we should favor the phrase “Natural Intelligence” over Artificial Intelligence, and begin phasing out that latter phrase, as it is increasingly irrelevant and incorrect.
That change of language can help signify, to those ready to hear it, just how momentous this shift to deep learning actually is. We’re finally working earnestly across all the layers of the stack. Our best strategy to build smarter machines, from here forward, is to try to recapitulate all the key intelligence innovations that nature has made to bring us to this point. What’s more, we will increasingly let our machines lead us in that journey, as they get ever more effective at their own natural learning.
Our machine learning community has a lot of work still to do in creating natural intelligence. Our current understanding of evolutionary developmental (evo devo) computing is quite primitive. Just like evolutionary biologists who continue to ignore evo-devo biology, all the processes of convergent evolution, and the way development controls evolutionary processes, today’s leading conferences on evolutionary computing, like GECCO, still don’t pay much attention to development. Evo devo computing, for its part, must be tied to the development, variation, and maintenance of connectionist networks in machines, just as genes guide a living brain’s neural networks. Finally, all of these tribes must be tied into Bayesianism. We need to understand why Bayesian methods led inevitably to the kinds of intelligences that life uses. Computational neuroscientists have built early Bayesian models of brain functions, and biologists use Bayesian networks to discover gene associations, but it will be a while before we understand evo devo systems in Bayesian terms. All this will be needed to create deeply naturally intelligent machines, and the technological singularity, in my opinion.
At present, a tiny but rapidly growing number of computer scientists now train and guide, rather than program and engineer, the new deep learning systems that are driving cars, and acting as the cloud-based “brains” behind our current smartphone agents. Most computer science will be done this way in the years ahead. Large numbers of computer scientists and users will be experimenting with and training, far more than designing or programming, tomorrow’s leading PAIs. For the future of NI, bet on evo devo, which is 95% bottom up, not rational design, or other top-down approaches. And bet on evo devo machines and the environment doing the “programming,” not human brains.
The growth of life and mind has always been a lot of evolutionary trial and error balanced by a small amount of slightly improved developmental processes, in each replication cycle. So too it seems likely to be with tomorrow’s computers. For more on that perspective, see my book precis, Evo Devo Universe (2008) and our interdisciplinary research community EvoDevoUniverse.com.
When we view the world from the wrong frameworks, life has a way of showing us our mistakes. I am hopeful that deep learning’s continued juggernaut in the machine learning space will make the many currently top-down, rationalist philosophers of AI understand the unique advantages of applying the evo devo paradigm to the future of technology. We shall see.
So we now have a rough roadmap for how the much-vaunted “technological singularity” will arrive, later this century. In fact, it is no longer a “singularity,” a point at which our models and foresight break down, but rather a rapidly approaching and natural transition that many of us now expect. So let’s call it a predictable phase transition in natural intelligence (NI), not a singularity, and bring it into the realm of hypothesis and science.
Why Neural Networks are So Naturally Intelligent
Let’s take a look now at neural networks, both in brains and machines, to see why they are so important to the future of postbiological intelligence.
To better understand our own natural intelligence, consider just three great advantages of neural networks (connectomes), which are at the heart of today’s deep learning machines:
First, neural networks fail gracefully when damaged by the environment, because useful information is never stored in one single place. Concepts, models, ideas, and predictions are always stored “a little bit everywhere”, represented in the number, locations, and strengths of synaptic weights. Such systems undergo what is called “graceful degradation” when damaged. As links are damaged, the network’s performance slowly decreases, and it rarely dies all at once. In today’s artificially intelligent computers, changing one single bit in a config file can crash the whole program. Not so with natural intelligence. If a neural connection is destroyed by trauma, disease, or biochemical error, we may partially forget some aspect of the information we wanted to keep, but we can often repair and reestablish the memory by concentrating on some other aspect of the thing in question and “routing around the damage”. This is what you do when you forget a person’s name but think about some other aspect of the person, until their name suddenly comes back. It’s also what you do when you walk back to the place in which you were thinking about what you wanted to do next in order to remember it, thus returning to the original net of mental associations in which you formed the idea. All human thinking and memory works in this incredible associative way.
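Graceful degradation can be seen in even a toy associative memory. Here is a minimal sketch of a Hopfield-style network, a classic textbook model I'm swapping in for illustration, not a description of how brains or today's deep learners are built. The network size, the three stored patterns (mutually orthogonal Walsh patterns, chosen so the arithmetic stays clean), and all function names are assumptions of mine.

```python
import random

N = 64  # number of model neurons (assumed size, chosen for illustration)

def walsh_pattern(k: int, n: int = N) -> list[int]:
    """Return a +1/-1 pattern; distinct k below n give mutually orthogonal patterns."""
    return [1 - 2 * (bin(i & k).count("1") % 2) for i in range(n)]

def hebbian_weights(patterns: list[list[int]]) -> list[list[float]]:
    """Spread every memory across all pairwise connections (no self-connections)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def cut_connections(w, n_cut: int, rng: random.Random):
    """Return a copy of the weights with n_cut randomly chosen connections zeroed."""
    n = len(w)
    damaged = [row[:] for row in w]
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    for i, j in rng.sample(cells, n_cut):
        damaged[i][j] = 0.0
    return damaged

def recall(w, cue: list[int], sweeps: int = 5) -> list[int]:
    """Update neurons one at a time; a neuron with zero net input keeps its state."""
    state = cue[:]
    for _ in range(sweeps):
        for i in range(len(state)):
            h = sum(w[i][j] * state[j] for j in range(len(state)))
            if h != 0:
                state[i] = 1 if h > 0 else -1
    return state

def overlap(a, b) -> float:
    """Similarity score: 1.0 for identical patterns, 0.0 for orthogonal ones."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

def noisy_copy(pattern, n_flips: int, rng: random.Random):
    """Flip n_flips randomly chosen elements to mimic a partial memory."""
    cue = pattern[:]
    for i in rng.sample(range(len(cue)), n_flips):
        cue[i] = -cue[i]
    return cue

if __name__ == "__main__":
    rng = random.Random(0)
    memories = [walsh_pattern(k) for k in (5, 19, 42)]
    weights = hebbian_weights(memories)
    cue = noisy_copy(memories[0], 8, rng)
    total = N * (N - 1)
    for fraction in (0.0, 0.25, 0.5, 0.75, 0.9, 1.0):
        damaged = cut_connections(weights, int(fraction * total), rng)
        score = overlap(recall(damaged, cue), memories[0])
        print(f"{fraction:4.0%} of connections cut -> overlap {score:+.2f}")
```

Because each memory lives in thousands of small weights, cutting any handful of connections leaves recall intact in this model, and the script lets you watch how recall quality changes as a growing share of connections is destroyed. Contrast that with a lookup table, where losing the one cell that holds a value loses the value.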
Second, neural networks can access vast amounts of stored information in each processing step, because all information in the brain is just a few “degrees of separation” (switching circuits) away from all the other information. Our brains have neural switching speeds of roughly a thousand times a second. Electronic transistors can switch on and off billions of times a second, making them roughly seven orders of magnitude faster (10,000,000X) at this task than biological brains. But because we store information associatively, in the number and strength of connections between neurons, we can search our memory almost instantaneously to see if we already know a concept, a name, or a face. It may take just a hundred neural processing steps to scan our entire memory for a concept, as each step has access to so much information, due to the massive parallelism of our connectome. That means, within seconds, we can say with confidence whether we know something, have a partial memory of it, or it feels fully new, at least according to our current search, of our entire brain! Conventional serial computers cannot do this. Even though they are millions of times faster, they are not parallel, or naturally intelligent. Each search step accesses so little information that trying to search a similarly large database takes forever. They can’t make realtime, dynamic estimates of what they know and don’t know. But deep learning systems, especially hardware-based ones, can do this. They remember like us.
Third, neural networks are always simultaneously comparing a vast number of parameters of anything of interest, as they both remember and think via synaptic connections, and are doing something the machine learning folks and statisticians call dimensionality reduction. Connectomes offer the most powerful informational and computational architecture that we know to continually explore, and efficiently reduce the dimensionality of, a “hyperparameter space” of large numbers of potentially interacting parameters.
Our associative brains are the ultimate “relational databases”, relating everything to everything else. The central problem of intelligence is always the appropriate mapping, fanning out (evolution), and pruning (development) of a mind, to best navigate the combinatorial explosions of possible representations of reality (model parameters). See Alice Zheng’s (@RainyData) “hyperparameter tuning” post for more on this “metalearning task” (something that must be done prior to actual learning).
When they are properly connected, neural networks can quickly sift and pay attention to just that small combination of parameters that seem most adaptive to the problem at hand. Associational architectures quickly “fan out” (an evolutionary process) into a vast number of possible associations, and then just as quickly “fan in”, or prune (a developmental process) to just the information that they think is still worth attending to, and this process is how we make predictions.
This ability to continually fan out and fan back in, while simultaneously comparing a vast number of competing information sources to form an intuition, a model, a prediction, or a plan, is an evo devo process that allows us to elegantly manage a torrent of incoming information, and simultaneously compare thousands of potentially relevant parameters in the world. Again, conventional computers can’t do this. But deep learning systems are learning how, which means they will increasingly not just remember, but also think — like us.
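For readers who haven't met dimensionality reduction before, here is a minimal sketch using principal component analysis (PCA), a standard statistical technique. I'm using it only as a familiar stand-in for the idea, not as a claim about how connectomes or deep networks actually do it. The synthetic data, the hidden direction, and the function names are all assumptions made for illustration.

```python
import math
import random

def fit_line(points):
    """Return (mean, unit direction of greatest variance), found by power iteration."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in points]
    # Covariance matrix: how each pair of measured parameters varies together.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(dim)]
           for a in range(dim)]
    v = [1.0] * dim  # arbitrary starting guess
    for _ in range(100):
        v = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return mean, v

def reduce_to_1d(points, mean, direction):
    """Replace each multi-parameter point with one coordinate along the direction."""
    return [sum((p[d] - mean[d]) * direction[d] for d in range(len(mean)))
            for p in points]

def make_points(n=200, seed=0):
    """Synthetic 3-parameter data that really varies along one hidden direction."""
    rng = random.Random(seed)
    hidden = (0.6, 0.8, 0.0)
    points = []
    for _ in range(n):
        t = rng.uniform(-10, 10)
        points.append([t * h + rng.gauss(0, 0.1) for h in hidden])
    return points

if __name__ == "__main__":
    pts = make_points()
    mean, direction = fit_line(pts)
    print("recovered direction:", [round(x, 2) for x in direction])
    print("first reduced values:", [round(x, 2) for x in reduce_to_1d(pts, mean, direction)[:3]])
```

The data has three measured parameters per point, but nearly all of the variation lies along a single direction, so one number per point captures almost everything. Finding the few combinations of parameters that matter, and ignoring the rest, is the task described above, done here on a scale of three parameters instead of thousands.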
Neural networks aren’t perfect. Whether biological or technological, they can and do eventually become overtrained. Their weights can become inflexible, like an old human mind that has been trained exclusively on just one type of data and can no longer see other points of view. But we can get out of that trap by rejuvenating them, opening new connection space, and retraining on new data. We are a long way from figuring out how to do that with human biology, but we are already learning how to do that renewal with many of our deep learning machines.
As another major current limitation, today’s artificial neural networks also are not “compositional”, meaning they don’t yet know how to combine different pieces of information sequentially, in different ways, to do chains of thinking, following sequential rules. So the symbolic processing that today’s computers can do very well, and humans can do to a limited degree, needs to emerge in the deep learning networks of the future, to move them fully into natural intelligence. But we’ll get there, by better understanding our biology, and porting over more of the kinds of specialty processing it uses into our machines.
As Kerri Smith reports in How to map the circuits that define us, Nature, 9 Aug 2017, when you incorporate their size, shape, firing speed, receptor types, and what genes they express, some neuroscientists expect that mouse (and human) cortex has as many as 10,000 neuronal types. Then there are the networks themselves, typically small world clusters of various types, loosely linked or sequentially chained together to do valuable things. So there is tremendous fascinating complexity there. Fortunately our ability to map and upload it all is also growing at superexponential rates. Check out David Cyranoski’s piece, China launches (a new) brain-imaging factory, 16 Aug 2017 for one of several exciting recent examples.
So as neuroscience keeps advancing, we’ll keep using all the brain’s neural network structure and algorithms that we can copy, algorithms we will likely never fully understand, and create experimental versions of them in our hardware and software. We’ll train those neural networks with data and our feedback, not program them. Those systems will in turn themselves run vast numbers of new experiments, in their reconfigurable hardware, in their software, and in the way they interact in the world. Many of those experiments, of course, will be initiated by our PAIs and agents, and run on us, and the world. They’ll learn just like a baby learns, with progress and failures too, but constantly getting better by trial and error.
In a famous recent example, Google DeepMind’s deep learning network learned by itself how to play 49 video games from the Atari 2600, with no human training, in Feb 2015. It was immediately better than the best human players on 23 of these games, and in a few games, like Breakout, it uncovered optimal play strategies that humans didn’t realize were available.
As mediated reality grows (our last post) deep learning-backed software agents will be able to learn even faster from many virtual realities than from physical reality, once enough data and accuracy are in the simulation. They’ll continually take their most useful virtual learning back into the physical world. Learning is particularly rapid in virtual space because more iterations can be tried faster, as long as computational power and simulation detail are sufficient, with no risk to physical life, and with much less need for physical resources.
Biological neural networks do this virtual world simulation constantly already. It’s called dreaming, and imagination. So do deep learners now. See the dramatic visual examples of “inceptionism” by Mordvintsev et al. at Google for how today’s deep networks can “dream” or “imagine” the world around them. I’ve got a few of these artworks on my wall now, to remind me that our most bio-inspired computers are just now learning to dream, in limited ways. It’s truly a brave new world!
Again, remember that evolution and development in electronic systems, whether hardware or software, can happen far faster than in human brains. Evolutionary pattern recognition (thinking, imagination, dreaming) runs at roughly 100 mph (the speed of neural communication) in human brains. That’s fast within a small human brain, and this speed keeps us alive in the world, but the same processes run at the speed of light inside dynamically reconfigurable hardware-based neural networks, in neuromorphic chips. That’s roughly seven million times faster than human brains. So you can see where all this is going.
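For readers who want to check the speed comparison above, here is a minimal back-of-envelope sketch. The 100 mph figure is the post's own round number; the `signal_fraction_of_c` parameter is a hypothetical knob added for illustration, since real signals in chips and wires travel at some fraction of light speed.

```python
# Back-of-envelope check of the "millions of times faster" comparison.
SPEED_OF_LIGHT_M_PER_S = 299_792_458   # exact, by SI definition
METERS_PER_MILE = 1609.344             # exact, international mile
NEURAL_SPEED_MPH = 100.0               # the post's round figure


def light_speed_mph():
    """Speed of light in a vacuum, converted to miles per hour."""
    return SPEED_OF_LIGHT_M_PER_S * 3600 / METERS_PER_MILE


def speedup_over_neurons(signal_fraction_of_c=1.0):
    """Ratio of electronic signal speed to the 100 mph neural figure.

    signal_fraction_of_c is an illustrative assumption, not a measured value.
    """
    return light_speed_mph() * signal_fraction_of_c / NEURAL_SPEED_MPH


print(f"Light speed: {light_speed_mph():,.0f} mph")
print(f"Ratio at full light speed: {speedup_over_neurons():,.0f}x")
print(f"Ratio at half light speed: {speedup_over_neurons(0.5):,.0f}x")
```

At full light speed the ratio comes out near 6.7 million, so seven million is a rounded figure, and any realistic slowdown in the hardware still leaves a gap of millions.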
Deep Learning: 2005 to the Present
Let’s do a quick recent history now of the most recent star of natural computing, deep learning, to see it in broader context. Again, deep learning is a type of bio-inspired computing that uses neural networks of different varieties (hierarchical, recurrent, convolutional, goal-directed, reinforcement-driven, etc.). It is the hottest new area of machine intelligence, and like any rapidly improving area, it is easily overhyped, especially for what it can deliver in the next five years. But beyond that, all bets are off with what these systems can deliver. They’re on the path to natural intelligence.
An interesting and unconventional place to begin our deep learning story is in 2005. In that year, Moore’s law in MOS integrated circuits ran into the first of a series of endings that will increasingly move us out of its fifty-year-long “magic shrinking transistor” paradigm. All exponential growth in any substrate can only run for so long, then it must jump to a new substrate. 2005 brought the end of something called Dennard scaling, which meant that chips got too hot (leaked too much current) if you shrank them any further, so around that year chip companies began producing multicore CPUs. The chip industry didn’t want to go multicore, as no one knew how to connect multicore chips in useful ways (parallel computing). But the end of Dennard scaling forced them to start making a bunch of first-gen, weakly parallel CPUs. As miniaturization limits grow, Intel’s former Chief Architect, Bob Colwell, predicted in 2013 that Moore’s law will be totally “dead within a decade.” If you care about natural intelligence, please pray for that prediction to be true! Only then will deep learning truly dominate, in both hardware and software domains, as we’ll see.
As Moore’s law was hitting its first ending in 2005, companies like NVIDIA, which had been making graphics processing units or GPUs to run video games since the mid-1990s, began realizing they were uniquely placed to take a leadership position in the future of machine learning. At first, their chips used simple parallel processing in hardware and software, primarily for graphics. But as the video game industry exploded, GPUs rapidly improved their performance, with performance doubling times that were much faster than for CPUs (often doubling their performance per price every 12–16 months, instead of 18–24 months). By the late 2000s, GPUs, not CPUs on motherboards, had become the best places to run the computationally intensive algorithms being used by the machine learning community. These simply parallel GPUs, in the graphics cards on our desktop computers, running our ever larger screens and our video games, can be thought of as Earth’s first mass-produced weakly bio-inspired hardware brains.
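To get a feel for how much a shorter doubling time matters, here is a small sketch that compounds the doubling periods quoted above over a decade. The ten-year horizon is an assumption chosen for illustration, and the function name is hypothetical.

```python
def growth_factor(years, doubling_months):
    """Total improvement after `years` if performance doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


YEARS = 10  # illustrative horizon, not a figure from the post

for label, months in [("GPU, fast end", 12), ("GPU, slow end", 16),
                      ("CPU, fast end", 18), ("CPU, slow end", 24)]:
    print(f"{label:14s} doubling every {months} months: "
          f"{growth_factor(YEARS, months):8.1f}x in {YEARS} years")
```

Doubling every 12 months gives about a thousandfold gain in ten years, while doubling every 24 months gives only 32-fold. A modest difference in doubling time compounds into a very large difference in capability.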
Thus 2005 can be argued as the time when the chip industry began to move from “miniaturization exponentiation” into “parallelization exponentiation”, doubling the number of processors and circuits that can work together simultaneously in useful ways. Parallelization exponentiation is much harder, because we humans don’t know how to best connect up parallel systems. When we were in the middle of the Moore’s law era of continually shrinking circuits, attempts to build massively parallel machines, like Danny Hillis’s impressive Connection Machine in the 1980s, unfortunately just couldn’t work. Their hardware became obsolete almost immediately after they were built. But just as importantly, we had no idea how to program those deeply parallel machines, and no incentive to do so, as we got so much more performance return by continuing to shrink standard, nonbiological, and serial Von Neumann computer architectures.
Fortunately, biology has had billions of years to make massively parallel self-improving systems, and after 2005, computer hardware and software began to get parallel enough for us to start using bio-inspired methods. On my website in 2002, I predicted we’d need an end of Moore’s law and a rise of massive parallelism, neural nets, and bio-inspired computing to get real machine intelligence. So I’ve been gratified to see these emerge over the last decade.
Scholars who publish on exponential technology growth, in journals like Technological Forecasting & Social Change, tell us that individual exponentials always end. But if we live in a universe where nanotech and infotech are special, as I argued in Post 3 (The Agent Environment), then whenever any productive technology exponential ends, it creates technical and market opportunities for new exponentials to emerge, out of nanotech or infotech strategies that couldn’t work before. So as exponential miniaturization of digital circuits began to end in 2005, we created the first real opportunities for exponential parallelization of those circuits, and thus deep learning, to emerge. That new exponential is now the one to watch. The bottom line, for those of us who do foresight work, is to be very careful to identify the appropriate exponentials relevant to our problem. They may not be the ones that most people are thinking about.
Ironically then, the beginning of the ending of Moore’s law is one of the best things that has happened to machine intelligence. As chips are stopping their magic shrinking game, it is becoming economically possible, for the first time ever, for chip companies to massively parallelize them, bringing more brainlike machines, what we can call Natural Intelligence, to the world. Artificial intelligence is top-down, human engineered machine learning. We’re moving out of that paradigm right now. Natural intelligence is bottom-up, self-guided, and deeply biologically inspired.
Natural intelligence will be the future of our most advanced CPUs and GPUs. They’ll become increasingly neuromorphic (brain-architecture inspired), like the experimental SyNAPSE chips by IBM and others, and those architectures will be controlled by technological versions of genes, hardware description languages that can evolve, and that each specify the kinds of neural network architectures that develop in each replication cycle. Again, human beings won’t program these naturally intelligent machines, as we aren’t smart enough, but I’m convinced they’ll be tomorrow’s best self-learning systems.
Let’s jump ahead now to 2009, another big year in the deep learning story. Neural networks can’t work well unless they have a lot of data to crunch, as well as machine learning professionals who believe crunching all that data will yield powerful results. In that year, Halevy, Norvig and Pereira of Google published a seminal opinion paper, The Unreasonable Effectiveness of Data, which described big progress being made in statistical, associational approaches to speech recognition, language translation, and language understanding. This widely-discussed paper was an important signal, both to machine learners and to the technically literate community, of just how important both statistical approaches and web scale data were becoming, and would increasingly be, to the future of machine intelligence.
Also in 2009, a type of deep learning system called a Long Short-Term Memory network, developed by my friend Juergen Schmidhuber and his team at IDSIA in Switzerland, became the first deep learning system (recurrent neural network) to win an international machine learning competition, against other traditional, much less bio-inspired approaches. The team’s networks won first for handwriting recognition (ICDAR 2009), then later for traffic sign recognition (IJCNN 2011), then for a variety of image recognition tests (ISBI and ICPR 2012). Their 2011 win was the first to achieve what Schmidhuber calls “superhuman performance” in complex visual recognition, beating humans at recognizing traffic signs in the wild.
In 2010, Kaggle, the leading predictive modelling competition platform, emerged, creating a new place for data scientists to openly compete to produce the best predictive software. They’ve grown to half a million registered “Kagglers” since. Many of the world’s deep learning practitioners engage in contests and share code on Kaggle today.
In 2011 and 2012, academic teams using neural networks again won character recognition, traffic sign recognition, and medical imaging tests against other machine learning approaches. The ILSVRC 2012 ImageNet competition was perhaps the turning point event for deep learning. Neural networks were so successful at discriminating images on ImageNet (a common image data set used by the machine learning community) in that competition that most machine learners then turned away from hand-built “feature engineering” toward automatic feature learning using deep learning. Google, Facebook, Microsoft, and other majors immediately noticed this change and began acquiring deep learning research teams and startups around the world.
By 2011, NVIDIA was also doing increasingly complex parallel hardware and software design, using their GPUs as accelerators for large financial and supercomputing clients. After 2012, inspired by deep learning’s advances, NVIDIA began to plan a major pivot of their company toward artificial intelligence, to try to sustain their manufacturing leadership position in this rapidly emerging field.
The success of deep learners entered the public consciousness in June 2012, with John Markoff’s New York Times article, How Many Computers to Identify a Cat? 16,000. This article described Andrew Ng and Jeff Dean’s team at Google, which used 16,000 processors, in a network of one billion connections, that learned to identify cats, and other objects, from 10 million images drawn from YouTube videos, using an unsupervised (autonomous) approach.
This Google Brain network is a nine-layer system, only three of which are particularly complex (structures called sparse autoencoders). It could only recognize cat faces head on, while humans can recognize them in any pose. But the cat was out of the bag, so to speak :) Not just industry insiders, but techies everywhere began following the deep learning story, which has been accelerating ever since.
After 2012, deep learning began working well in a variety of applications, such as auto-captioning of images, in language translation, in computer vision, and in several other fields. For an excellent window into this prolific period, see Jeremy Howard, “The Wonderful and Terrifying Implications of Computers That Can Learn,” TEDxBrussels 2014.
See also Steve Omohundro’s (@steveom) great TEDx Talk, What’s Happening With Artificial Intelligence? (2016). His second slide highlights a few of the multi-billion dollar investments we’ve seen in AI over the last three years.
Let’s look at a few highlights from this most recent period. In 2014, Andrew Ng, formerly at Google, joined Baidu to build a speech recognition system entirely via deep learning. This was very ambitious, as all previous speech recognition systems had involved significant amounts of human-directed training and feature engineering. Also in that year, several companies made some huge investments in deep learning, as summarized in the slide above.
In 2015, Baidu announced their deep learning network was the first to reach superhuman performance in the recognition of short clips of speech spoken over the phone (“Baidu’s Deep-Learning System Rivals People at Speech Recognition,” Tech Review, 2015). Coincident with this, Baidu has launched a smart agent, Duer, to help smartphone users do various tasks. China is now investing heavily in Baidu and other Chinese deep learning firms, and in deep learning education, in an effort to match the West on this critical technology. If our politicians stay ignorant of its strategic value, we may eventually end up getting the lead snatched away from us, the way Britain invented the chemical industry, then lost it to the more pragmatic Germans over four decades prior to WW I.
Also in 2015, we saw NVIDIA’s self-driving car, a much more rapidly emerging and more bottom-up system than the mapping-based approach to self-driving cars that Google has been developing for ten years, since Sebastian Thrun’s team won the DARPA 2005 self-driving car competition. Perhaps the most amazing thing about the NVIDIA car was that it learned to reach near-human level performance over just six months in 2015. With the right hardware and software, the right problem, and good training data, these systems can rapidly gain human level proficiency (picture below).
The layers in these deep learning systems aren’t nearly as complex as the human brain yet. The human visual system, for example, is still much more elaborate. A task like face recognition in our brain begins with neural nets in the retina of your eye, then goes to thalamic relay nets called the LGN, then goes to six layers of visual cortex at the back of your brain in V1, then to the six layers of V2, then V3 and V4, then to the fusiform face area (another six layer region of cortex, specialized to process faces) and then to individual cells, including a number of so-called grandmother cells, single cells (or sometimes, small networks) that are tuned to recognize individual faces, like the face of your nanny, and no one else. We have a ways to go before our deep learning systems are as complex as this. But we will get there, exponentially.
Facebook’s Yann LeCun (@ylecun) is a deep learning leader who is presently building the best face recognition solution available on the planet. It may have superhuman performance narrowly already, and it will achieve it broadly soon. The FBI launched a $1B face recognition project in 2012, but knowing how federal institutions contract such work, I predict it will be junk, and one of the deep learning IT leaders listed above will get there first.
If the FBI really wanted to get their solution on time and on budget, they could have done a few large parallel contracts with a variety of IT leaders, not defense contractors, and a majority of smaller incremental competitions on Kaggle, with tens of millions in education and startup bounties available for any small team or sole practitioner who deployed anything semi-competent during the competition.
That would spend a lot less for much faster, better and deeper technical and social returns. But taking a mostly bottom-up, evo devo strategy would have required their recognizing that face recognition is a tool all of us will have. They won’t be able to corner it, or even get there first. A society of Little Brothers (mass sousveillance of each other, via our PAIs) is not only inevitable, it is far safer and more antifragile than the Big Brother (surveillance) society that some of our security leaders falsely envision.
Deep learning apps dominated NVIDIA’s GTC 2016. See this May 6th NVIDIA piece on how their engineers taught a car to drive using their Drive PX hardware and other software and lots of training data. NVIDIA shipped a board last year, the GTX Titan X, that folks can use to train neural networks on their home PCs, and they’ve got a new GPU (Pascal) and board (Tesla P100) that will be 10X faster at running deep networks, shipping next month.
In March 2016, AlphaGo, a program built by Google DeepMind’s computer scientists and neuroscientists, beat Lee Sedol, one of the world’s top-ranked Go players, four games out of five. Go has a vastly larger space of possible positions than chess. See this lovely video for more on how a relatively small team of fifteen employees at DeepMind accomplished this feat, using a blend of deep learning and reinforcement learning, and clever training and goal-development architecting for this amazing hardware and software “brain.” [Note: It also turned out, per Google CEO Sundar Pichai (@sundar_pichai) at Google I/O on May 18th, that Google built a custom ASIC chip for their deep learners, what they call a Tensor Processing Unit, which they say is ten times more efficient per watt than commercial GPUs and FPGAs. It’s great to see Google in the chip-making business for machine learning! I hope that continues.]
So we’re truly off to the races now with deep learning, and we’ll see a new generation of programmers using these increasingly biologically-inspired approaches to machine learning in the coming decade, for a vast range of uses. See Eric Siegel’s Predictive Analytics (2013) for some of the areas machine learning is already disrupting. We will see deep learning increasingly prevalent in automation and robotics of all kinds in coming years.
These successes are vindications for folks like Geoff Hinton, one of the fathers, in 1986, of connectionist computing’s most useful algorithm to date, backpropagation. At that time, computers weren’t fast or parallel enough, nor data sets big enough, for neural networks to deliver many human-surpassing results. Now they are, and Hinton leads a large deep learning team at Google.
They are also a vindication for technologist Jeff Hawkins, who published an influential book, On Intelligence, in 2004, arguing that a special kind of neural network, an HTM network, modeled after the human cortex, would be key to the future of machine intelligence. Hawkins and his colleague, Dileep George, now running Vicarious, made some progress with their HTM-variant networks, and they opened their platform to community use. But without the resources of a Google, Microsoft, IBM, or an NVIDIA, they couldn’t quite jump start this field at the time. They also must be smiling today.
There are now many entry-level resources for learning more about deep learning. There are tons of YouTube videos on deep learning, many on recent achievements, including cat recognizers, speech recognizers, autocaptioning, video game playing, board game playing, and self-driving cars. NVIDIA has good deep learning tutorials, including Deep Learning in a Nutshell (2015). Michael Nielsen has a great free online textbook, Neural Networks and Deep Learning (2016). See also presentations like John Kaufhold’s A Short History of and Intro to Deep Learning (89 slides). Browse the DeepLearning.net wiki for conferences and resources. See Quora’s tags for Deep Learning, Convolutional Neural Networks, etc. Join Reddit’s Machine Learning and Deep Learning communities. For places to work or invest, see Venture Scanner’s list of nearly 1,000 AI companies. Some analysts have estimated that about a fifth of these are presently employing or developing deep learning competencies in their solutions. That percentage will obviously rise, among the future leaders.
Deep Learning Captures Real Neurobiology
Are deep learning neural networks really biologically inspired, or are they just a “toy model”, slightly useful but not complex enough to capture the way the brain actually works? A new paper by Yamins and DiCarlo, Using goal-driven deep learning models to understand sensory cortex, Nature Neuroscience 19:356–365, Mar 2016, makes a big step forward in putting this question to rest.
Their paper demonstrates that even today’s simple deep learners duplicate many powerful features of how neurons in human visual sensory cortex process information and predict visual images. It also gives research guidance to computer scientists and neuroscientists over the next five years. The paper is behind a paywall, but here is an excerpt of the front page.
See also DiCarlo et al.’s 2014 paper, which directly compares the representational performance for visual object recognition of DNNs (deep neural networks) to the primate brain, finding them both efficient at constructing representational spaces in which objects of the same category are close, and objects of different categories are far apart, even with large variations in the object example, position, scale, and background. This isn’t our father’s A.I., it’s natural intelligence, or N.I.
Papers like these show us that deep learners already strongly mimic how we mammals make sense of and remember the world. DNNs are likely still missing some of our basic algorithms, however. We don’t really know, because long-term memory encoding has not yet been fully cracked by neuroscientists, though we are fast closing in on the prize.
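The idea of a representational space where same-category objects sit close together and different categories sit far apart can be made concrete with a simple distance ratio. The sketch below is not the metric used in the paper. It is a minimal illustration with made-up two-dimensional feature vectors, and the name `separation_ratio` is hypothetical.

```python
from itertools import combinations
from math import dist


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def separation_ratio(embeddings):
    """Mean between-category distance divided by mean within-category distance.

    embeddings maps a category name to a list of feature vectors.
    A ratio well above 1 means categories form tight, well-separated clusters.
    """
    within = [dist(a, b)
              for vectors in embeddings.values()
              for a, b in combinations(vectors, 2)]
    between = [dist(a, b)
               for (_, first), (_, second) in combinations(embeddings.items(), 2)
               for a in first
               for b in second]
    return mean(between) / mean(within)


# Toy feature vectors, invented for illustration only.
toy_embeddings = {
    "cat": [(0.9, 0.1), (1.0, 0.2), (0.8, 0.0)],
    "car": [(0.1, 0.9), (0.0, 1.0), (0.2, 0.8)],
}

print(f"Separation ratio: {separation_ratio(toy_embeddings):.2f}")
```

In a real comparison, the vectors would come from a network layer or from recorded neural activity, and a good representation would keep this kind of ratio high even as object pose, scale, and background change.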
One of the things we do know about human memory is that its most important and basic component by far is the shape and variety of synapses of the 10,000 dendritic spines (on average) that lead into every individual cortical neuron in our brains. This gross basic connectivity and synaptic weighting is crudely captured in today’s deep learners. A good book on spines, which explores how they form neural circuits and memories, is Rafael Yuste’s Dendritic Spines (2010).
In Nobel prize-worthy work published in 2013, Steve Ramirez and Xu Liu implanted a fake memory of a traumatic event, a foot shock, into a living mouse’s brain, by altering the shape of dendritic spines in its brain with an optically sensitive transgenic protein (ChR2) and laser light. They did this in an area called the hippocampus, which stores the most recent two days of our memory, and which writes some of those short term memories to long term memory (in cortex) when we sleep. This and similar experiments have confirmed decades-old theories that our memories are stored in the architecture and connectivity of the thousands of dendritic spines that connect every one of our pyramidal neurons to each other in our brains.
These very special neurons are 80% of our 25 billion cortical neurons, and they hold all of our higher memory and personality. Curiously, the pyramidal neurons in our prefrontal cortex, where we conduct all our highest thinking and planning, have totally maxed out the number of connections they can make to other neurons. Prefrontal cortex pyramidal neurons have on average 23 times more dendritic spines than the same neurons in our primary visual cortex. There is simply no more room around these particularly helpful neurons to make more physical connections to neighboring neurons. But there will be such room in your PAI’s neural network, you can be sure.
Many of the dynamic features of neural architecture still elude us. Most molecular features can likely be ignored in a first model, as they exist to keep biological cells alive, not to allow them to think or remember. But some dynamic features are central to learning and memory. They involve things like Attractor Networks (Scholarpedia article) and Neurotransmitter Field Theory (Greer & Tuceryan 2010), and it will take a while to figure them out. But with 30,000 bright neuroscientists attending the annual Society for Neuroscience meeting, and hundreds of specialty neuroscience conferences, we’re getting closer to learning the full rules of neural learning and memory every year. If you want to study these topics further, or explore a career in this field, here’s a great free online textbook, Computational Cognitive Neuroscience (2014).
Consider this insight about our brains that recent neuroscience work has suggested. Bourne and Harris (2007) tell us that in human brains, roughly 65% of our spines are ‘thin’, 25% are ‘mushroom’ spines, and the remaining 10% are stubby, branched, or other ‘immature’ forms. See the picture at left for the different shapes. They propose that thin spines are what we do our thinking with, interpreting our sensory data and relating it to our memories and motor outputs, and mushroom spines are where we store our stable long-term memories. If this educated guess proves true, it will turn out that about one quarter of the connections in human cortex are dedicated to memory storage, and the rest dedicated to thinking, about our outside world, and our own memories. That would make us each 75% thinking, and 25% memory machines. Pretty neat, huh?
Our PAIs Manage Our Biases, and Make Us Perpetual Learners
To get back to where we are today, we know there will be many valleys and swamps to cross before most of us view ourselves as part-agent, part-biology. We’ll continue to experience social prejudice and conflict from biased, inflexible, extremist biological brains in human society, for decades to come. So besides improving our PAIs, we’ve got to keep empowering human beings, growing their empathy, and moving them from ideology to evidence-based thinking. But now that deep learning is on the scene, I believe we’ll make increasingly more progress decreasing human prejudice and bias by improving our PAIs, even more than our brains. Both strategies are important, but the first is far more exponential, for deep universal reasons.
As we come to see our agents as a natural part of us, we’ll re-understand ourselves as lifelong learners, as perpetual children, as experimenters, and as investigators. Our accelerating personal learning abilities via our agents will make us much less inflexible, dogmatic and judgmental of others. When it isn’t so hard to change our views, via our agents’ views, and when our agents know and have mapped our cognitive and social biases, and are helping us to manage them, every position will become more lightly held, able to be improved by the latest theories and data. At least in our PAI’s mind.
Solving the Death Problem, via Subtle Uploading
This last section belongs in a later post, Health PAIs, in this series, but I wanted to write it now and get it out of the way. Please feel free to read or skip it as interested.
When we presently die, many of us realize that most of our unique self, including, for most of us, the majority of our unique experiences, ideas, values, goals, and personality, dies with us. But once we have personal AIs, I’d like you to realize that is no longer the case. Interacting with our PAIs will be a way for our natural machine intelligence to increasingly upload us into a new substrate, whether we want to be uploaded or not. I like to call PAI emergence a process of “subtle uploading”. It is already sneaking up on people, in tons of little ways. But even though PAI uploading is subtle and incremental, it is also powerfully exponential. Subtle uploading is a planet-scale process that our philosophers, academics, and pundits will increasingly appreciate and debate in coming years. Yet most of us, I expect, will just joke about it and get on with our lives.
Imagine what it will feel like when your 2030’s PAI, knowing your past statements and watching your facial expressions, can complete your sentences for you when you are having a senior moment. Growing up, or even as an adult, you may have played the “mirror game”, completing the sentences of someone you know well, because it’s fun to guess them and sometimes you want to help them say faster what you think they are likely to say. A computational linguist once told me that if you have two years of past data on what a person says, you can correctly guess the word on the tip of their tongue, by past context, 80% of the time. I don’t know if those numbers are accurate, but I deeply trust the general concept.
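The simplest version of "guessing the word on the tip of your tongue from past context" is a bigram model, which just counts which word most often followed the previous one in a person's past speech. The sketch below is a toy with three invented utterances. It is nowhere near what a real PAI would use, and the function names are hypothetical, but it shows the basic mechanism.

```python
from collections import Counter, defaultdict


def train_bigrams(utterances):
    """Count, for every word, which words have followed it in past speech."""
    followers = defaultdict(Counter)
    for text in utterances:
        words = text.lower().split()
        for previous, following in zip(words, words[1:]):
            followers[previous][following] += 1
    return followers


def guess_next(followers, previous_word):
    """Return the most frequent follower of previous_word, or None if unseen."""
    counts = followers.get(previous_word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented sample of one person's past statements.
history = [
    "let us grab coffee before the meeting",
    "I need coffee before the call",
    "coffee before anything else",
]

model = train_bigrams(history)
print(guess_next(model, "coffee"))   # prints "before"
print(guess_next(model, "zebra"))    # prints None, word never seen
```

With two years of someone's speech instead of three sentences, and a model that looks at much more than one previous word, guesses like this get far better, which is the general concept the paragraph above relies on.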
I expect our most advanced PAIs will feel like mental twins within just a generation of their commercial use. My original name for them in my 2003 article was “digital twins”, and they will have high-quality records of the large majority of our past experiences, ideas, values, and goals. Those unique aspects of self will no longer die with us, if we don’t want them to. They will be able to live on in our PAIs for our family and friends to interact with in whatever way they like.
There are of course lots of mental health implications for this. There will be empowering and disempowering ways to interact with the PAIs of past loved ones, as you can imagine. Those who die, or if they didn’t think it through, those who live on, will have to decide whether to keep them running, and if so, whether to let Google or whoever continue to make them smarter on the back end. Many of us would feel more empowered and empathic to have our parents’ PAIs continue to survive, and be improved, after they passed away. Continually improving ancestor PAIs may “interview” survivors to get their recollections of the deceased, and improve the usefulness and value of the ancestor simulation.
We’ll also have to decide how much we want to alter our ancestor PAI’s traits going forward. For example, if your biological mother always favored your sibling over you in certain contexts, or was an alcoholic, you would likely want the ability to reduce or eliminate some of those traits in the version of her PAI that you interact with after her death. In many Western cultures, we’d find those freedoms to be valuable. But in some African and Asian cultures, such modification would today be considered disrespectful to our ancestors. So we’re going to see a lot of interesting discussions ahead.
In general, once ancestor PAIs arrive in our world, I imagine that a growing fraction of us will feel a lot less traumatized by the high level of informational destruction that presently accompanies biological death.
Of all our current social problems, arguably the greatest tragedy afflicting humanity, happening 55 million times a year at present, a number that will rise annually until roughly 2050, is the inevitable death of each of us, due to the “disposable” (not a very nice word, but an accurate one) nature of human biology.
But PAIs seem likely to exponentially reduce, and then finally eliminate, the death problem. We certainly won’t see human-guided medicine or molecular biology solve this problem any time this century. See my article, Limits to Biology, 2001, if you still think biological immortality is possible, or could ever be achieved by biological humans, using science. It’s far too hard a problem for us (vs our coming AIs) to solve in any reasonable time frame.
Unfortunately, we biologicals begin falling apart, from the inside out, in scores of convergent ways as soon as we are born, due to imperfect error correction at the molecular level. Humans and today’s AIs aren’t anywhere near smart enough to solve that problem. By contrast, building self-improving evo devo machine intelligences is very much within our capability, and I could easily see them eventually growing powerful enough to understand and improve us at a molecular level.
But by the time they can do that, it also seems equally clear that we’ll consider our PAIs, and their incredible postbiological environment, as the place “we” want to go.
Certainly there will be some of us who wish to remain “native”. But if we can create postbiological consciousness, and fully simulate biology within our machines, then for most of us, staying in our “wetware” may be far too slow, frail, and limited by comparison.
As a presently mortal species, we biologicals don’t like to think too much about the scale or impact of death. Ernest Becker’s Pulitzer prize-winning book, The Denial of Death, 1974/1997, discusses many ways modern cultures lie to themselves about death, telling ancient fables of the afterlife, fables that delay and dissuade us from solving the problem, and also cause us to avoid the challenge of fully living, and improving our own imperfect science and morality, right here and now.
It is my hope that in coming years, as more and more of us see the unique advantages of PAIs for solving the death problem, we’ll see a growing momentum behind their development, and more efforts to remove the many short-term roadblocks standing in their way.
Once a substantial fraction of us realize that these solutions actually work, we can expect that most of our cultures will finally stop pretending that personal mental death is a good thing (in the vast majority of cases it isn’t, if that person had a say in things), and we’ll upgrade our religious faiths to be consistent with a new world of superlongevity and perpetual growth and renewal, for all who might personally desire that future.
I expect the first solution we will now discuss, PAI superlongevity, will have the biggest impact, and the second, mind melds, will be the “nail in the coffin” for biological intelligence, driving the vast majority of us into postbiological form, though it will likely take a lot longer to emerge. Let me know if you agree.
Ancestor PAIs and Mind Melds:
How Billions of Folks Alive Today May Be Postbiological, Eventually
Per pound, biological brains are the most complex things on Earth today. But they are no longer exponential. Several studies have argued that they are nearly as optimized as they can get, given what they are made of — three pounds of electromagnetic meat. Due to their imperfect molecular error correction, all differentiated neural tissue must progressively age and die, even as we each grow wiser with experience. Given their glacial slowness, their lack of exponentiality, and their inability to maintain their existence indefinitely, it seems clear that if we can, we’ll outgrow them, moving our minds into something just as natural, but better — postbiological life.
This Great Transition, involving the “uploading” of biological minds and bodies into much faster and hardier machine substrates, is the most near term way I can see that humanity will solve the problem of death.
It is clarifying for us to admit that all of humanity is already being uploaded right now into the digital world, bit by bit. We’ve been doing this with all our easier data since the birth of digital computing. As deep learning advances, we’ll move on to our harder data and algorithms.
Don’t believe me? Read this amazing review, by John Lisman, The Challenge of Understanding the Brain (PDF) 2015, of how much we already understand about the brain as a computational system. As Lisman says, the first half of the 21st century is likely to be forever remembered as the period during which the brain, and all its algorithms, finally came to be understood (and by implication, replicated in machines).
That’s just how the development of intelligence apparently works in universes like ours. Everywhere complexity exists, the transition of leading intelligence from physics to chemistry to biology to biominds to cyberminds may be the only exponential path available. Thus our next great transition, when biology begets postbiological life, may be a universal developmental process that arises on every Earth-like planet in our universe.
Physicists call definable shifts to a different set of environmental dynamics phase transitions. Think of the transitions from gas to liquid to solid in our high school chemistry. A popular term for phase transitions is “singularities.” Many natural phase transitions, such as the creation of black holes from a dying star, involve exponential processes prior to the transition. When mathematician and sci-fi author Vernor Vinge wrote The Coming Technological Singularity in 1993, about the exponential 21st century emergence of greater-than-human machine intelligence, he was thinking of a phase transition. A whole new kind of world will exist after that point, one with a different set of global dynamics, whether we want that outcome or not.
I like to call the exponential Great Transition to postbiological life that we are presently engaged in a “slippery singularity”, as I don’t expect machine life and intelligence to emerge via any single definable process or event. Even deep learning is just one piece, albeit an important one, of this process. Instead, we can expect many gentle changes, most just big enough to be noticeable, each pushing us toward this future, and a steady build up of those changes until they flow all around us, like a river carrying us downstream.
Some folks will be scared of and reactive to these changes, but on average, most of the changes will be so incremental and beneficial that most of us won’t be interested in resisting them. The changes that are clearly disruptive, like growing rich-poor divides and technological unemployment, also have obvious social solutions ahead, like a Universal Basic Income (UBI), or what I like to colloquially call “taxing the machines.”
A tax on the use of advanced technology (in practice, simply a tax on income production) is arguably the fairest way to redistribute some of all the exponential wealth we’ll continue to see. After all, it isn’t the rich or the corporations that are the primary sources of this wealth. They are just enablers. It is the machines, and the science around them, that are most responsible. Perhaps the pioneer of electromagnetism, Michael Faraday, had something like this concept in mind when he apocryphally responded to a 19th century minister’s question “But what is it [electricity] good for?” with the retort “One day, sir, you may tax it.” Amen!
Bill Gates independently came to this insight as well, as reported in this Economist article, A Tax on Robots? 25 Feb 2017. In spirit, he’s thinking of exactly the right problem. In practice however, a direct technology tax would not be a good idea, for at least two reasons. First, an advanced technology tax would be a bureaucratic nightmare to implement. It’s hard to get a stable definition of what advanced technology even is, as it continues to morph and improve all around us. Second, the largest current source of growing inequity isn’t the growth of AI, or of automation displacing workers. That will come later.
Today, it is the rise of “winner take all” dynamics, what network scholars call “preferential attachment”, that is most responsible for wealth and income concentration in our new highly networked era. For more on that, see the economists Frank and Cook’s The Winner Take All Society: Why the Few at the Top Get So Much More than the Rest of Us, 1996. They recognized early on that the digital mass media dynamics of modern society are the prime current cause of growing inequality. The richest individuals and firms are greatly helped, by our new networked, data rich society, to get richer.
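For readers who want to see how preferential attachment concentrates wealth, here is a minimal sketch in Python. It is a toy model of my own construction, not anything taken from Frank and Cook: the function names (`simulate`, `top_share`), the 100 agents, and the 10,000 steps are all illustrative assumptions. Each step hands out one new unit of wealth. In the preferential version, an agent's chance of receiving it is proportional to what that agent already holds. In the baseline version, every agent has an equal chance.

```python
import random


def simulate(n_agents=100, n_steps=10_000, preferential=True, seed=0):
    """Hand out n_steps units of wealth, one at a time (toy model).

    preferential=True: the chance of receiving the next unit is
    proportional to current wealth ("the rich get richer").
    preferential=False: every agent is equally likely to receive it.
    """
    rng = random.Random(seed)
    wealth = [1] * n_agents            # everyone starts with one unit
    agents = list(range(n_agents))
    for _ in range(n_steps):
        if preferential:
            winner = rng.choices(agents, weights=wealth, k=1)[0]
        else:
            winner = rng.randrange(n_agents)
        wealth[winner] += 1
    return wealth


def top_share(wealth, top_k=10):
    """Fraction of total wealth held by the top_k richest agents."""
    richest = sorted(wealth, reverse=True)[:top_k]
    return sum(richest) / sum(wealth)


if __name__ == "__main__":
    pref = simulate(preferential=True)
    unif = simulate(preferential=False)
    print(f"Top 10 share, preferential attachment: {top_share(pref):.2f}")
    print(f"Top 10 share, equal chances:           {top_share(unif):.2f}")
```

In this toy model, the ten richest agents typically end up with a much larger share of the total under preferential attachment than under equal chances, even though everyone starts out identical. Real economies are far messier, so treat this only as an intuition pump for the "winner take all" dynamic described above.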
As I see it, accelerating technology will continue to create social, political, and economic inequities, and reduce individual opportunity in a variety of ways, some obvious and many less so. Smart societies, seeking to maintain productive levels of inequity, and avoid corrupting ones, can create more progressive tax structures for larger firms and individual estates (perhaps the best proxy for an advanced technology tax), build much better antitrust and transparency systems, levy higher taxes on less accountable private firms, aggressively promote public stock ownership in public firms, deeply subsidize STEM education and entrepreneurship, and pursue many other effective policies.
I think how quickly and well democratic societies move toward these (I would argue) evidence-based policies will be determined, in part, by how soon we each get political PAIs. The smarter we all get, either digitally or biologically, the more easily we can all choose evidence-based strategies to improve our well being.
Let’s look now at three paths for how the problem of death seems most likely to be solved in coming decades: PAI superlongevity (personal AIs we leave behind for our loved ones, with useful lifespans much longer than our biological lives), mind melds, and brain preservation. All may end up being socially important, to different degrees and at different times.
Let’s discuss PAI superlongevity first. Hopefully this series has convinced you that the more we use our PAIs, the more we’ll consider them to be a part of us. At some point we’ll even consider our PAIs as “our better selves”, in some ways, as they cleverly and consistently nudge and coach us toward the directions, priorities, and values that we choose in moments of conscious resolve.
As biological beings, we will remain continually subject to the many urges of our unconscious processes, over which our conscious minds are only weakly and intermittently in control. See Leonard Mlodinow’s excellent Subliminal: How Your Unconscious Mind Rules Your Behavior, 2013, if you don’t believe me on this point. But our PAIs will have no such constraints, and we’ll increasingly rely on them to become better people, in every sense of the word.
Thus within just a decade or two from now, I expect billions of folks will consider their PAIs, which were increasingly useful to them throughout their lives, as a “good enough” version of themselves to leave behind for their children, friends, and the world. We already leave behind our Facebook pages and in some families, our Gmail accounts for our children, and with PAIs managing all that data it will become far more intimate and useful.
As it offers the simplest value proposition and requires the least behavior change, I think leaving behind an ancestor PAI will be the dominant form of cultural “immortality” (technically, superlongevity) we can expect.
Many folks, not willing to offend some members of their religious communities, may deny that their PAIs are actually weak versions of themselves. But such denials mean little to the reality of intelligence development. While at first only a minority will be interested in leaving ancestor PAIs, that minority might grow to a majority in a stunningly short timeframe (consider how fast gay marriage was accepted, after contentious debate, for a similar example). In the end, most folks may be deeply comforted by the recognition that their best stories, ideas, and records of their lives can live on, for as long as they might be useful, available to loved ones and for some, to the world, after they have died.
Of course, the AI and data for those PAIs will keep getting exponentially better as well, if you enable that feature (and some won’t), so the PAI of any deceased person can keep improving, hoovering up more data about that person, and even conversing with loved ones about them after their death. Done sensitively, I’m sure that for some families this post-death improvement of the PAI will become a popular choice.
As they interact with these ancestor PAIs, some survivors will modify them as well. For an obvious example, if your late mother wasn’t as nurturing as you liked, and you inherited a copy of her PAI, you will certainly be able to change those personality features, via some simple interface or conversation, toward a vision you consider a more “ideal mom.” You’ll also be able to turn that over to experts, who can use the latest evidence-based psychology procedures to make that “better mom.”
Some of our more traditional cultures may be horrified at first by this treatment of our ancestors, but others will surely embrace it. As with other digital technologies like social networks and video games, which can be evidence based and empowering, or perpetuate disempowering fantasies and filter bubbles, ancestor PAIs can be used for good or ill. But for those who use them, having PAIs of late friends and family who cared about you still present in your life will be a powerfully humanizing advance. Their existence and popularity will change our attitude toward, and anxiety regarding, biological death.
When your PAI is smart enough to complete your sentences when you are having a senior moment, and when you think it has most of your best stories and ideas captured within it over decades of use, leaving it behind to represent you to others will make your own biological death a much less subjectively violent affair. You’ll see your PAI as a first generation “upload” of you. An upload of only 20% of you, perhaps. But with good design, it will be the 20% that represents 80% of what you cared about most, in life.
Let’s look now at a second major path to postbiology, mind melds, a path that hundreds of millions of us might take prior to the close of this century, if scientific developments continue at the pace they have in recent decades.
Understanding our mind meld future begins by realizing that every one of us is already what Marvin Minsky called a “Society of Mind.” This means we have many independent mindsets inside our own brain, each a distinct yet overlapping set of neural networks that stores its own redundant data (as well as accessing common data). This diversity allows us to see the world from many simultaneous viewpoints, and to argue with ourselves over every important decision we must make.
Our minds, it turns out, are very much like beehives, with each mindset being like an individual bee, doing a constant waggle dance with its thoughts and internal dialog, trying to convince the whole hive to do something, for our best interest. We maintain this redundancy and diversity of viewpoints in our brains because it makes us more adaptive, but sometimes it malfunctions, as we see when our minds split into multiple personalities during trauma. Most of the time, though, the quality and quantity of information shared between mindsets is so high that it is most useful for us to think of ourselves as one person, even though we are, in actuality, also a society of mindsets.
You can probably see where this is going. At some point later this century, let’s guess circa 2080 or so, your PAI will ask you, or your children if you are no longer around, if you would like a direct “mind link”, or BCI (brain computer interface), to its own neural networks, via the use of removable nanobots (transducers) in your brain that allow you to wirelessly and continuously connect your two natural intelligences. This would require that sufficiently powerful forms of General AI (GAI) have emerged, something I’ve argued might happen in the 2060s, present trends extended. It is only the GAIs, in my view, that could develop such deeply powerful nanotech.
I expect all the efforts of today’s brain machine interface companies, like Kernel, will be very underwhelming and incremental, as biological humans will make very little progress on molecular nanotechnology in coming decades. Like biological immortality, it’s just too hard a problem for us. But not for GAIs, in my view. I can easily imagine that GAIs, with human supervision and much more advanced quantum computers (which we’ll need to do all that molecular simulation), might invent molecular nanotechnology within a generation (just a guess) after they are on the scene. That puts this scenario in 2080 or later, in my view.
Consider that this 2080s (or 2100?) mind link, and the nanotech behind it, will allow you not only to talk to and argue with the mindsets within your own biological brain, but also to use the same high-bandwidth neural language to talk to and argue with the mindsets in your PAI.
I think you can see where this scenario is headed. Once your mind link is sharing sufficiently high quality, high-quantity, and high-speed neural information, it will be more useful for you not to think of yourselves as two minds, but one. You’ve now “melded” with your PAI.
Direct mind links without the use of neural technology were first popularized by parapsychology researchers, as telepathy (often claimed, never found). Real BCI research began in the early 1970s at UCLA, and it happens in hundreds of labs today. Mind links using nanotech have been done well in sci-fi since the 1960s. An on-screen depiction of a mind meld with a computer happened for the first time ever (to my knowledge) in Star Trek: The Original Series, when Spock did a Vulcan mind meld with a computer called Nomad, whence I get the name for this scenario. See Nexus (2012), by futurist Ramez Naam for a neat recent sci-fi story featuring mind meld nanotech.
Eventually, the nanobots your PAI offers you may let you do things like neural synchronization, which is my current favorite hypothesis, among all the competing ones currently on offer, for how consciousness arises. Neural synchronization, facilitated by specialized long range connective tracts like the claustrum, seems likely to me to be at the center of the self-awareness we find in all complex animals, like mammals. That synchronization, combined with humanity’s particularly prosocial (planning, self-modeling, other-modeling) brains, creates the special kind of self-consciousness that gives us an identity narrative to go with this synchronization, whenever it occurs.
See the lovely 12 min video, What is consciousness?, The Economist (2015) for a current take on research in this area. Neural synchronization and integration tell us why we don’t have consciousness in sleep or under anesthesia, and why our consciousness rises and falls so much in intensity throughout the day.
As we learn more about how consciousness works in biological brains, we’ll incorporate those methods of neural network synchronization and integration into our artificial neural networks. At some point, our leading GAIs will start to develop self awareness, and humans and our PAIs will develop one shared consciousness. We’ll slowly realize we’ve grown into a new, larger “us”. That us will still have raging arguments and disagreements, as we do when we argue in our own minds (such diversity of mind is always helpful for any finite intelligence), but now we’ll recognize that we can easily move our point of view between our biological and our electronic self. We’ll see them as a single integrated identity, not two.
If you have an interest in graduate biology and want more on neural synchronization and the mechanisms that do the synchronizing, see Buzsaki’s Rhythms of the Brain (2011) and read up on ephaptic coupling. As with memory encoding, neuroscientists don’t yet have all the details, but consciousness is no longer a mystery we expect we’ll never understand. It is a fully physical process, a puzzle that we’ve already partially solved. Tomorrow’s computational neuroscientists will surely crack its remaining details, and we’ll duplicate it fully in our neural network-based machines.
We’ll also solve making neural nets that not only think and remember, but feel. Every one of our different feelings (anger, happiness, sadness, envy, etc.) is, at its core, a different string of positive and negative sentiments that we have associated with past actions, like notes in a song we hear whenever we think about various subjects or possible future actions. We build these “feeling songs” using neural networks specialized for sentiment (the amygdala, limbic system, and parts of our prefrontal cortex), the latter being what doctors cut through in their surgical lobotomies during the 1930s-1960s, creating much more passive and emotionless people. These sentiment networks give us “gut feelings” about what to do next. We need those feelings when rationality fails us, as it often does. Patients with lesions in these emotional networks can’t feel, but can still access sentiment memories in their prefrontal cortex. As neuroscientist Antonio Damasio describes in Descartes’ Error (2005), some of these patients can rationally argue forever the merits and drawbacks of various actions, but they can’t make decisions, and are as unmotivated as a lobotomy patient.
This is a clue that if you want to stay motivated in life, let yourself consciously feel both the highs and lows of your day, and observe closely how those feelings relate to your thoughts, and vice versa. You may need to think more at times, as with unconscious bias or anger. You may also need to feel more at times, as with procrastination due to unconscious fears. When you consciously feel and acknowledge your emotion, whatever it is, and think about how your thoughts triggered it and whether it was useful, you can get on with making changes to both your thoughts and feelings that will give you real progress. Both are sets of neural networks, trainable by your mind. For students, The Oxford Handbook of Affective Computing (2014) is a nice survey of work to bring sentiment to computers.
This work makes a good case that naturally intelligent machines will require sentiment networks (gut feelings) as they get smarter. As no finite physical mind can ever be “Godlike” in its intelligence, and no being ever has perfect information, rationality and logic will regularly fail our PAIs, just as they fail us. In that environment, gut feelings and moral sentiments will always inform and motivate us to make better decisions.
In the early stages of a mind meld, we might notice many differences in the way our biological and electronic minds think and feel. At first, we’ll surely still feel most at home in our biology. But our PAI, our electronic self, will be learning, thinking, and feeling, millions of times faster than our biology does. It is that differential rate of learning, long documented by scholars of accelerating change, that leads me to expect our PAIs will eventually also feel like our biological selves.
Consider also that one of the first things your new “hybrid” you will want to do is scan and back up all your biological brain’s memories into your electronic brain, as your biology will continue to age and die.
As you perform this memory backup, you might be amazed to realize that you can recall your life’s memories, and think with them, millions of times faster in your electronic mind than your biological mind. Increasingly, as your electronic mind improves, you may even feel your center of consciousness moving out of your biology and into your PAI.
Again, your electronic mind may feel in several ways more primitive than your biological mind in the early years of this technology. You’ll know you haven’t captured everything yet from your biology, and you’ll continue to upgrade your PAI every year. But as your PAI encodes exponentially more bio-inspired algorithms, and invents new ones biology doesn’t have, your personal AI seems likely to become the place where “you” increasingly live your mental life.
Consider now that when our biological body dies after years of such a mind meld, it may not feel like death, from our perspective. It might feel instead like metamorphosis, or a natural change of form. The transition could be like going from childhood into puberty, or from a caterpillar into a butterfly.
That’s the slippery singularity. Using the mind meld, a modern version of the Moravec Transfer, you slide right into your postbiological form, and you do it in the most natural way imaginable, with no interruption of your personal consciousness or feelings; in fact, with an enlargement of them. As a postbiological being, you will feel like a perpetual child, constantly able to grow and learn. You will also have a potentially perpetual lifespan, assuming you want to stick around and keep improving, as many folks surely would.
One of my favorite works of foresight from the 20th century foretold a lot of this. Grant Fjermedal’s The Tomorrow Makers: The Brave New World of Living-Brain Machines (1986), written roughly twenty-five years before the deep learning revolution circa 2010, offers an incredibly prescient look at the future of AI as it increasingly simulates the thinking and feeling processes used by biological brains and bodies. Fjermedal interviews three of the four then-50ish founding fathers of AI: Allen Newell, Marvin Minsky, and John McCarthy (Herbert Simon is the fourth). He also interviews a young Hans Moravec, Danny Hillis, Rodney Brooks, AI skeptics including Hubert Dreyfus, and many students and hackers then passionately pursuing the dream of building software and robotic simulations of human beings, and “downloading” the contents of our human minds into robotically-embodied computers, what we now more commonly call mind uploading. He explores the personal, social, economic, political, military, and spiritual implications of this fantastical idea, touring CMU, MIT, Stanford, Berkeley, DC, New York, Minneapolis, and Japan, and he comes to believe this Great Transition of Mind from biology to technology is inevitable, eventually. The important question, as he recognized then, is how we will choose to do it. Will we empower individuals, or continue to reduce their freedoms, intelligence, and wealth relative to corporations, the rich, and the state? Will we do this work recklessly, driven only by commercial, ownership, and power interests, while minimizing values of safety, privacy, and user control, or will we prioritize those values at every step, even as this slows down the process and increases its expense? The choice is ours.
Brain Preservation — A Not-So-Obvious Option to Live Again
Let’s look now at the last path to postbiology we should discuss: brain preservation. Since the 1960s, the practice of cryonics has made it possible to freeze your brain at death, but very few folks have done this (just a few hundred so far) in all that time. There are many reasons for the reluctance. One big reason is that we have no solid scientific evidence yet that it might work, only guesses. Another is that it has been very expensive. Few would take a large sum of money away from their children ($80,000 is a typical cost at present), either via insurance or in a lump sum, for such an uncertain future return. There’s even a derogatory term I’ve heard applied to those brave (and wealthy) folks who do, the “cryoselfish.”
Fortunately, advances in brain science are now closing in on the mechanisms of memory. Within a decade or two, I think we’ll see simple memories recovered from “uploaded” animal brains. We also have new and much less expensive methods like plastination, which doesn’t require refrigeration, being used to preserve and upload simple animal brains (including worm, zebrafish and fly brains) into computers today. We just don’t know how to read the memories encoded in those uploads yet. But the neural code will be cracked, and when that happens, it will be obvious to everyone that a plastinated or cryonically preserved brain stores useful retrievable information. Eventually, we’ll realize that the person’s entire personality is preserved, and that they can come back later, as an upload.
Brain preservation today splits into three camps of folks who are interested in it. See my Medium article, The Transporter Test and the Three Camps of Brain Preservation, 2016, for more. In coming decades, I expect all three brain preservation camps will grow, but it is the Uploaders (folks who are fine with the idea of coming back inside a computer), rather than the Reanimators (folks who only want to come back as biology) that will grow the most. As the cost of brain preservation drops, and its validation grows, more and more folks will know someone who has done it, and will themselves become interested in doing it.
Later in this century, validated brain preservation may be available around the world, the lowest cost versions may be under $10,000, neuroscientists will largely agree that memories and even personalities are preserved, computer scientists will largely agree that future computers will be able to cheaply scan and upload those memories, and the procedure may even be covered by health insurance and available in all major cities in special facilities, hospices, and hospitals in our most progressive (and wealthy) nations.
As a co-founder of the Brain Preservation Foundation in 2010, I’d love to see all of this happen by mid-century. I also think that in any society where, say, 100,000 people have used this technology, we will see that society’s values move toward something we can call a Preservation Value Set, advancing science, progress, future, sustainability, truth-and-justice, preservation, diversity, and community-oriented values in those societies. So I consider brain preservation a very worthy goal.
All this said, I think the brain preservation path to solving the problem of death will always be a very small minority choice, relative to PAI superlongevity and (eventually) mind melds.
Why so? Let me offer three reasons.
One, brain preservation requires the largest leap of faith about the future, and the greatest behavior change. The other two solutions are far simpler to understand and offer more obvious benefits, in the here and now.
Two, brain preservation is by far the least exponential of these three processes. Brain preservation technologies will grow slowly relative to the other two solutions.
Three, we humans always underestimate the long-run power of exponential processes. Few of us realize how satisfying it will be to leave behind a personal AI for our loved ones in 2040, how much less grief we’ll have about our own death as a result, and just how smart those PAIs will be. For most people, I think PAI superlongevity will be immortality enough.
As futurists (and all of us are futurists on occasion), I believe it is our moral responsibility to tell good stories about our Great Transition often and well, to keep checking them against the emerging science, and to build our biology-inspired machines and technologies as well, as quickly, and as accessibly as we can. The choices we make with them today determine how many humans will benefit from them tomorrow. Our postbiological destination may be inevitable, but the quality of the path we walk toward it is entirely in our hands.
We’re on the edge of an amazing world, and there’s never been a better time to be intelligent, future-oriented optimists. Thank you for reading.
Calls to Action
● Consider putting some of your speculative investment savings into a company using or improving deep learning. My top pick at present is NVIDIA (NVDA). They are trading at 45 with a P/E of 39. They have gained 225% over the last 18 months, and they may drop a bit soon due to profit-taking, as they’ve just recently run up a 125% increase. Nevertheless I predict they will gain at least 80% in value yet again over the next 12–36 months. The impact of deep learning is still greatly undervalued in the business world, and NVIDIA is a solid company in the right place and time to be a Levi Strauss & Co. to the coming Gold Rush.
● Consider funding an individual’s deep learning, computer science or neuroscience training or research on GoFundMe or a similar site, or investing for equity in a deep learning startup on an equity crowdfunding site like StartEngine or Crowdfunder (available to any of us) or in one of the (presently) 122 deep learning startups on AngelList (for accredited investors only, unless you join a syndicate).
CC 4.0. Anyone may share or adapt, but please with link and attribution.