“[We must] replace the conventional metaphor, a sequence of steps, with the notion of a community of interacting entities.” (Lynn Andrea Stein, in Challenging the Computer Metaphor)
It’s an established fact that biological cognition is fundamentally based on swarm intelligence. Michael Levin drives this fact home with his research on bioelectrical computation. Levin’s work demonstrates that even simple biological organisms, such as single-celled organisms, exhibit complex behaviors. Furthermore, multicellular organisms have a complex coordination mechanism that gives rise to controlled morphogenesis and self-repair. You can find Michael Levin’s revealing talk about this primitive biological capability here:
In his new book “From Bacteria to Bach and Back”, Daniel Dennett describes neurons as having capabilities similar to those found in simpler multicellular creatures. Neurons are effectively “mini-robots” that are individually competent, and it is from their collective interactions that complex organism behavior arises. In other words, each neuron behaves autonomously and reacts to its own local perception within the ecosystem that is the brain.
This reminds me of the competing parallel architectures of the past known as Single Instruction Multiple Data (SIMD) and Multiple Instruction Multiple Data (MIMD). In the earlier days of supercomputer development, there were two camps: the proponents of SIMD and those who argued for MIMD. One of the more famous SIMD systems was the Connection Machine, for which Richard Feynman served as a consultant. Here is Feynman with a Connection Machine T-shirt:
Cray's machines, with their massive vector processing units, were also SIMD systems. Over time, manufacturers developed a hybrid approach to create massively parallel supercomputers. This hybrid approach involved multiple independent CPUs that each had access to vector processing units. Deep Learning has also been able to leverage SIMD architectures in the form of Graphics Processing Units (GPUs) and, more recently, Tensor Core units that perform SIMD-based matrix operations. The mathematical formulations of Deep Learning are ideally suited to SIMD-based parallel computation.
This, of course, doesn’t reflect how biological computation works. Each neuron (or each cell, in the case of bioelectric computation) has its own behavior and does not get its program instructions from a top-level controller. One obvious difference between Deep Learning hardware and biology is the massive gap in energy efficiency. This motivates research into designs for extremely low-power multi-core computing. The KiloCore project is working on such a device: 1,000 cores that together execute 115 billion instructions per second on only 0.7 watts of power. This can be powered by a single AA battery, and it is 100 times more power-efficient than a laptop processor.
Gyrfalcon is a startup that is pursuing a low-power multi-core solution for embedded Deep Learning. Their device is capable of 16.8 TOPS at 700 mW. The technology has 28,000 cores, each with its own memory. This architecture reduces the power that a conventional architecture spends on data movement. It does, however, appear to employ SIMD. So the power difference may have less to do with MIMD vs SIMD and more to do with efficient local computation.
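To put the two power figures quoted above on a common footing, here is a small back-of-the-envelope sketch. It uses only the numbers from the text (115 billion instructions per second at 0.7 W, and 16.8 TOPS at 700 mW); the function and variable names are mine. Note that an "instruction" on a general-purpose core and an "operation" on an inference accelerator are not the same unit, so the two results should not be compared directly.

```python
# Figures as quoted in the text; nothing here is independently measured.
KILOCORE_INSTR_PER_SEC = 115e9   # 115 billion instructions per second
KILOCORE_WATTS = 0.7
GYRFALCON_OPS_PER_SEC = 16.8e12  # 16.8 TOPS
GYRFALCON_WATTS = 0.7            # 700 mW


def per_watt(rate_per_sec: float, watts: float) -> float:
    """Throughput delivered per watt of power."""
    return rate_per_sec / watts


def picojoules_each(rate_per_sec: float, watts: float) -> float:
    """Average energy spent per instruction or operation, in picojoules."""
    return watts / rate_per_sec * 1e12


kilocore_per_watt = per_watt(KILOCORE_INSTR_PER_SEC, KILOCORE_WATTS)
gyrfalcon_per_watt = per_watt(GYRFALCON_OPS_PER_SEC, GYRFALCON_WATTS)
kilocore_pj = picojoules_each(KILOCORE_INSTR_PER_SEC, KILOCORE_WATTS)
gyrfalcon_pj = picojoules_each(GYRFALCON_OPS_PER_SEC, GYRFALCON_WATTS)

print(f"KiloCore:  {kilocore_per_watt / 1e9:.1f} G instructions/s per watt, "
      f"{kilocore_pj:.2f} pJ per instruction")
print(f"Gyrfalcon: {gyrfalcon_per_watt / 1e12:.1f} T operations/s per watt, "
      f"{gyrfalcon_pj:.4f} pJ per operation")
```

Under these figures KiloCore works out to roughly 164 billion instructions per second per watt (about 6 pJ per instruction), and the Gyrfalcon device to 24 TOPS per watt.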
I think we need to move toward a different kind of computer. Fortunately I have one here.
GraphCore’s IPU is reported to have 14,000 independent threads. The claim here is that this new kind of ‘Graph engine’ will lead to the next generation of machine intelligence.
The question I would like to pose is whether the present-day paradigm of SIMD computation is hindering progress by restricting research to the wrong kind of architecture. Biology uses MIMD. Is there a sufficiently large, still unexplored difference between these two architectures that may be holding Deep Learning back?
The universal mechanism of Deep Learning is gradient descent, and if you think about it, this is a synchronization step that is the same across all layers of the architecture. One could argue that the data received by each core to perform its update is different and therefore the actual computation at each core is fundamentally also different. However, each core is not autonomous and is in lockstep synchronization. Is autonomous computation then fundamentally different from synchronized computation?
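One way to make the lockstep-versus-autonomous distinction concrete is a toy comparison. The sketch below is my own illustration, not a description of how any Deep Learning framework or chip schedules its updates. It minimizes a small convex loss over a chain of five hypothetical "units" in two ways: in lockstep, where every unit computes its gradient from the same snapshot and all update together, and autonomously, where one randomly chosen unit at a time updates itself from whatever state it currently sees. The targets, coupling strength, and learning rate are arbitrary assumptions.

```python
import random

TARGETS = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical per-unit targets
COUPLING = 1.0                        # hypothetical neighbor coupling strength
LEARNING_RATE = 0.2


def local_gradient(weights, i):
    """Partial derivative of the toy loss with respect to weights[i].

    Loss: 0.5 * sum((w_i - t_i)^2) + 0.5 * COUPLING * sum((w_i - w_{i+1})^2)
    Unit i only needs its own value and its immediate neighbors.
    """
    grad = weights[i] - TARGETS[i]
    if i > 0:
        grad += COUPLING * (weights[i] - weights[i - 1])
    if i < len(weights) - 1:
        grad += COUPLING * (weights[i] - weights[i + 1])
    return grad


def lockstep_descent(steps):
    """All gradients come from one shared snapshot, then every unit updates at once."""
    weights = [0.0] * len(TARGETS)
    for _ in range(steps):
        # Acts as a barrier: no unit moves until every gradient is computed.
        grads = [local_gradient(weights, i) for i in range(len(weights))]
        weights = [w - LEARNING_RATE * g for w, g in zip(weights, grads)]
    return weights


def autonomous_descent(updates, seed=0):
    """One randomly chosen unit at a time updates itself; there is no global step."""
    rng = random.Random(seed)
    weights = [0.0] * len(TARGETS)
    for _ in range(updates):
        i = rng.randrange(len(weights))
        weights[i] -= LEARNING_RATE * local_gradient(weights, i)
    return weights


sync_weights = lockstep_descent(steps=400)
async_weights = autonomous_descent(updates=4000)
max_gap = max(abs(a - b) for a, b in zip(sync_weights, async_weights))
print("lockstep:  ", [round(w, 4) for w in sync_weights])
print("autonomous:", [round(w, 4) for w in async_weights])
print("largest difference:", max_gap)
```

On this convex toy problem both schedules settle on the same minimum, so the difference is only in how they get there. That says nothing about non-convex networks or about units that can change their own rules, which is where the question above becomes interesting.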
Let’s explore further research in a truly autonomous computation known as swarm intelligence. How does swarm intelligence differ from Deep Learning? Here are the principles of swarm computing:
1. Awareness: Each member must be aware of its surroundings and abilities.
2. Autonomy: Each member must operate as an autonomous master (not as a slave); this is essential to coordinate the allocation of labor.
3. Solidarity: Each member must cooperate in solidarity; when a task is completed, each member should autonomously look for a new task (leveraging its current position).
4. Expandability: The system must permit expansion where members are dynamically aggregated.
5. Resiliency: The system must be self-healing; when members are removed, the remaining members should undertake the unfinished tasks.
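The five principles above can be sketched in a few lines of code. What follows is a deliberately tiny illustration of my own, not an established swarm algorithm: tasks sit at positions on a line, and each member decides for itself which open task to take next. The `Member` and `Swarm` classes, the nearest-task rule, and the scenario at the bottom are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    position: float
    current_task: Optional[float] = None  # location of the task in progress

    def act(self, swarm: "Swarm") -> None:
        """One autonomous step: finish the task in hand, or pick a new one."""
        if self.current_task is not None:
            # Solidarity: complete the work, then be free to look for more.
            swarm.done.append(self.current_task)
            self.position = self.current_task
            self.current_task = None
        elif swarm.open_tasks:
            # Awareness and autonomy: choose the nearest open task unaided.
            nearest = min(swarm.open_tasks, key=lambda t: abs(t - self.position))
            swarm.open_tasks.remove(nearest)
            self.current_task = nearest


class Swarm:
    def __init__(self, task_positions: List[float]) -> None:
        self.open_tasks = list(task_positions)
        self.done: List[float] = []
        self.members: List[Member] = []

    def join(self, member: Member) -> None:
        """Expandability: members can be added while work is under way."""
        self.members.append(member)

    def leave(self, name: str) -> None:
        """Resiliency: a departing member's unfinished task returns to the pool."""
        for member in self.members:
            if member.name == name:
                if member.current_task is not None:
                    self.open_tasks.append(member.current_task)
                self.members.remove(member)
                return

    def tick(self) -> None:
        """No scheduler assigns work; each member simply acts for itself."""
        for member in self.members:
            member.act(self)


TASKS = [float(p) for p in range(1, 9)]  # eight tasks at positions 1..8
swarm = Swarm(TASKS)
swarm.join(Member("a", position=0.0))
swarm.join(Member("b", position=10.0))
swarm.tick()                  # "a" claims task 1, "b" claims task 8
swarm.leave("b")              # "b" disappears mid-task
swarm.join(Member("c", position=9.0))
for _ in range(100):          # generous cap; exits as soon as all tasks are done
    if len(swarm.done) == len(TASKS):
        break
    swarm.tick()
print(sorted(swarm.done))
```

Even though member "b" is removed while holding a task, every task still gets completed, because the work it abandoned goes back into the pool and another member picks it up on its own initiative.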
What is evident is that this kind of computation is highly adaptive and self-repairing. It has features that appear to be missing in present-day Deep Learning architectures. A Deep Learning neuron is responsible only for aggregating its inputs, computing a forward function and updating its internal weights. DL neurons are fundamentally single-pattern recognition robots. In contrast, a swarm “neuron” would be responsible for all of the principles listed above. In short, swarm members not only recognize patterns but also appear to be designed to exhibit complex cooperative behavior. Are the mechanisms of cooperative population evolution dramatically different from DL gradient descent?
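To show how narrow the job of a Deep Learning neuron is, here is a minimal single-neuron sketch. It is an assumption-laden toy rather than anyone's production code: one sigmoid unit, trained with plain stochastic gradient descent on the logical OR function. The `Neuron` class, the learning rate, and the number of passes are my own choices for illustration.

```python
import math


class Neuron:
    """A single artificial neuron: aggregate, apply a forward function, update."""

    def __init__(self, n_inputs: int) -> None:
        self.weights = [0.0] * n_inputs
        self.bias = 0.0

    def forward(self, inputs):
        # Aggregate the inputs as a weighted sum, then squash with a sigmoid.
        total = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        return 1.0 / (1.0 + math.exp(-total))

    def update(self, inputs, target, learning_rate=0.5):
        # For a sigmoid with cross-entropy loss, the gradient with respect to
        # the weighted sum is simply (output - target).
        error = self.forward(inputs) - target
        self.weights = [w - learning_rate * error * x
                        for w, x in zip(self.weights, inputs)]
        self.bias -= learning_rate * error


# Truth table for logical OR.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

neuron = Neuron(n_inputs=2)
for _ in range(2000):             # repeated passes over the four examples
    for inputs, target in DATA:
        neuron.update(inputs, target)

predictions = [round(neuron.forward(inputs)) for inputs, _ in DATA]
print(predictions)
```

Everything the neuron does is in those two methods. It has no notion of its surroundings beyond its inputs, it never chooses what to work on, and nothing takes over if it is removed, which is the contrast with the swarm principles listed earlier.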
Michael Levin has an interesting slide that originates from the father of Cybernetics (Norbert Wiener):
Are we thus building cognitive architectures from the wrong stepping stones? Can we build adaptive autonomous systems from components that are originally not autonomous? Biology appears to be revealing to us that autonomy is a fundamental building block and perhaps the adaptability of complex life’s cognition is an emergent property of semi-autonomous components. Furthermore, how do autonomous components lose their own autonomy and evolve towards synergistic co-dependence? There are interesting questions about emergent behavior that are conspicuously absent in current Deep Learning discourse.
Evolutionary biologist Tecumseh Fitch writes in “Nano-Intentionality — A Defense of Intrinsic Intentionality”:
Although we can make machines that “represent” and “choose” among alternatives and even “learn”, by my argument they are not aware. The scare quotes are necessary, because truly mental learning and awareness are composite reflections of a more fundamentally and intrinsically nano-intentional capacity existing in the neural components of an organism’s brain.
Thus there is a non-zero possibility that Deep Learning is flawed because the capabilities of its neurons lack this nano-intentionality. An alternative perspective is that Deep Learning works because it captures nano-intentionality!