Will we ever compute like a brain?

PM
13 min read · Jul 27, 2021


The majority of significant breakthroughs in computer science and Artificial Intelligence have been the result of an explosive increase in computation power. Up to this point, growth has been exponential, with computational power roughly doubling every 2 years.

At the same time, the complexity of deep learning tasks is increasing even more rapidly. There is therefore a widening gap between the computing power needed for modern AI applications and the computational resources we have access to, and it is quickly becoming a major issue.

Figure 1: Growth rates of computational power for deep learning tasks and hardware performance. Source: The Computational Limits of Deep Learning paper

Many times in the past, humanity has turned to Mother Nature to find solutions to complex engineering problems. The human brain demonstrates that there is likely a way of computing that uses fewer transistors, less energy, and less time while being equally effective. Instead of just ramping up our existing hardware, attempts are now being made to mimic the brain and its basic principles.

Neuromorphic hardware thus generally refers to hardware that is organized similarly to the human brain, or that uses some of its basic principles for computation. Brain-inspired features include highly parallel computing, analog & in-memory computing, and new neural network architectures (e.g. spiking neural networks).

In this article, we try to comprehensively detail the limitations of classical computing, and how neuromorphic hardware may be able to overcome them.

We also give an overview of the current landscape of academic and commercial startups that are active in this area:

Full startup map is available below

Deep learning hardware gap

Historically, the growth in computing power per dollar was roughly equivalent to the growth in computing power per chip, so the cost of running deep learning models remained more or less stable over time.

However, the development of more sophisticated deep learning models has required more and more computational power, both for training and for subsequent large-scale applications. This forced the entire industry to switch to multi-core GPUs in 2009, initially yielding a 5 to 15x increase in computational power, which rose to up to 35x by 2012.

Further moves to GPU-based systems and ASICs yielded roughly a 10x per-year increase from 2012 to 2019, but this was still not enough for the mass adoption of deep learning algorithms. Much of this increase also came from models being run for longer periods of time, on a greater number of machines.

We live in an era of insufficient hardware performance, and this disparity between the complexity of deep learning tasks and available computational resources is still widening. Let's explore the causes of this issue, so we may better understand how to overcome it.

Deep learning computations will rocket in the next decade

AI applications are requiring more computing power each year. Specifically, according to OpenAI, the amount of computational resources required to train state-of-the-art neural networks doubles every 3.4 months.

Just as an example, AlexNet was trained using 2 GPUs for 5–6 days in 2012, ResNeXt-101 was trained with 8 GPUs for over 10 days in 2017, and in 2019, NoisyStudent was trained with 1,000 TPUs for 6 days.
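To see how quickly this outpaces hardware, here is a rough back-of-the-envelope sketch in Python that uses only the doubling periods quoted above (3.4 months for training compute, roughly 2 years for hardware); the numbers are illustrative, not a forecast:

```python
# Rough comparison of compute demand growth (doubling every 3.4 months, per
# OpenAI) with hardware performance growth (doubling roughly every 24 months).
# Purely illustrative arithmetic.

def annual_growth_factor(doubling_period_months: float) -> float:
    """Growth factor accumulated over 12 months for a given doubling period."""
    return 2 ** (12 / doubling_period_months)

demand_per_year = annual_growth_factor(3.4)    # ~11.6x per year
hardware_per_year = annual_growth_factor(24)   # ~1.41x per year

print(f"Training compute demand grows ~{demand_per_year:.1f}x per year")
print(f"Hardware performance grows ~{hardware_per_year:.2f}x per year")
print(f"The gap widens by ~{demand_per_year / hardware_per_year:.1f}x each year")
```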

Figure 2. Neural networks by computational resources used per year. Source: openai.com

Computation power growth will face limitations

Available computing power is still growing at an impressive rate. However, progress in computation power may slow or even stop entirely in the upcoming years due to 3 main problems: energy consumption, the Moore’s law limit, and the Von Neumann bottleneck. Let’s now discuss these in greater detail.

Problem 1: Energy consumption

Fundamental physics tells us that there is a minimum amount of energy required to process 1 bit of information, and modern computers currently use roughly 1,000 times that minimum. The bound comes from the Shannon-von Neumann-Landauer limit, which links information processing with an increase in entropy.
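For reference, the Landauer bound works out to roughly 3e-21 joules per bit at room temperature. A minimal sketch of that arithmetic, where the factor of 1,000 is simply the estimate quoted above:

```python
import math

# Landauer limit: the minimum energy needed to erase one bit is k * T * ln(2).
BOLTZMANN_K = 1.380649e-23   # Boltzmann constant, J/K
T_ROOM = 300.0               # approximate room temperature, K

landauer_joules = BOLTZMANN_K * T_ROOM * math.log(2)
print(f"Landauer limit at ~300 K: {landauer_joules:.2e} J per bit")   # ~2.9e-21 J

# The factor of ~1,000 quoted above is an estimate of how far today's
# hardware sits from this theoretical floor.
print(f"~1,000x the limit: {1_000 * landauer_joules:.2e} J per bit operation")
```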

Deep Learning applications will be one of the driving factors behind rising energy consumption. A single training run of GPT-3 is estimated to have cost its creators $12M and to draw on the order of 20 megawatts of power. The Evolved Transformer machine translation system consumed more than 2 million GPU hours and also cost millions of dollars to run.

To further underscore the scale of computation involved, it is worth noting that, by some forecasts, data centers could account for 13% of the world's electricity consumption by 2030.

At the same time, the number of low-power applications will also rise substantially over the next several years. Edge Deep Learning applications will likely play an increasingly significant role: autonomous cars, wearables, smart homes, IoT, and similar edge segments are all expected to grow, with the edge computing market as a whole predicted to grow at a CAGR of 29% through 2026.

Problem 2: Moore’s law

The number of transistors on a chip has doubled every 24 months for the last 40 years, but today the dynamics of Moore's law are beginning to slow down. We are close to the efficiency plateau, and some experts argue that we are now living in a post-Moore world.

Figure 3. Processor performance over time. Source: cmu.edu

Modern silicon-based transistors are so small that quantum effects, like quantum tunneling, already heavily affect their design. This is problematic because scientists have not yet devised a way of overcoming such a fundamental effect at these scales. There therefore appears to be little scope for improvement from further shrinking transistors.

Problem 3: Von Neumann bottleneck

Almost all computers today are built using von Neumann architecture: this means that the memory unit of a computer is separated from the processing unit.

Deep Learning applications need to transfer huge amounts of data to and from memory. However, the speed of the bus (the data highway) between a computer's memory and its computing cores is limited, and it is not fast enough for modern requirements.
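One rough way to see the bottleneck is to compare how many bytes a layer must pull across that bus with how many arithmetic operations it performs on them. The sketch below does this for a single matrix-vector multiply; the layer size and hardware figures are hypothetical round numbers chosen only for illustration:

```python
# Back-of-the-envelope roofline check for a dense layer computing y = W @ x.
# The layer size and hardware numbers are hypothetical, chosen for illustration.

N = 4096                      # square weight matrix, N x N
BYTES_PER_WEIGHT = 4          # float32

flops = 2 * N * N                         # one multiply and one add per weight
bytes_moved = N * N * BYTES_PER_WEIGHT    # every weight must cross the memory bus
intensity = flops / bytes_moved           # FLOPs per byte; 0.5 for this operation

PEAK_FLOPS = 100e12           # 100 TFLOP/s, a hypothetical accelerator
PEAK_BANDWIDTH = 1e12         # 1 TB/s of memory bandwidth, also hypothetical

compute_time_us = flops / PEAK_FLOPS * 1e6
memory_time_us = bytes_moved / PEAK_BANDWIDTH * 1e6

print(f"Arithmetic intensity: {intensity:.2f} FLOPs/byte")
print(f"Time if limited by compute: {compute_time_us:.2f} us")
print(f"Time if limited by memory:  {memory_time_us:.2f} us")
# Memory time dominates by a wide margin: the bus, not the ALUs, sets the pace.
```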

Figure 4. Diagram of the von Neumann bottleneck. Source: semiwiki.com

The current approach to neural network training may be regarded as a crude, brute-force method: big neural networks are affordable only to large corporations because of their enormous computational cost.

To solve these problems, researchers have begun to study the brain, which is a far more efficient system. While acting, figuratively, as a universal Artificial Intelligence (and achieving better results in many tasks), it consumes a fantastically small amount of power: about 20 watts, compared with the roughly 20 megawatts needed for GPT-3.

These issues point to a looming shortage of computational resources in the near future. With that in mind, let's now discuss what solutions may be derived from studying the human brain.

Why is the brain more efficient than our best computers?

The brain is a network of neurons connected via axons and dendrites. Information is transmitted from one neuron to another by chemical diffusion across the synapse, a small gap between neurons. Synaptic conductivity is programmed chemically and serves as a natural analog of the weights in artificial neural networks. This structure performs the computations for the various tasks the brain must carry out.

Figure 5. Diagram of neuron transmission at the synapse. Source: khanacademy.org

The appeal of the brain as a computer is that it is a lightweight and elegant system. The following table outlines its key advantages.

Table 1. Comparison of the human brain and current computers as systems of computing.

These advantages translate into faster, more energy-efficient computing and on-the-go learning compared with traditional machines. This is why the industry is looking forward to the development of technology that mimics the human brain.

We will next discuss recent innovations in software and hardware that have notable similarities with the brain's inner structure, including the use of spiking neural networks, highly parallel systems, and in-memory & analog computing:

1. Spiking neural networks

Spiking neural networks represent a new generation of neural networks, entirely distinct from the architectures commonly used in production today, which might broadly be classified as deep learning networks. Let's now briefly note the difference between classic Deep Learning (e.g. Convolutional Neural Networks) and Spiking Neural Networks.

Deep Neural Networks
Inspired by 20th-century brain research in zoology. Modern, state-of-the-art systems that solve complex problems.

Spiking Neural Networks
Inspired by the latest (21st-century) research on the human brain. Suitable for applications where fast responses, low latency, and high energy efficiency are needed. They might therefore potentially solve the same problems modern deep learning does.

How Spiking Neural Networks work

The main feature of SNNs is how information is encoded. Instead of numbers, they use series of short electrical impulses known as spikes. Spikes are generated by neurons and sent as inputs to subsequent neurons. These neuron connections (synapses) have weights, just like in a usual neural network; synaptic weight refers to the strength of the connection between two neurons. The more spikes arriving over strongly weighted synapses a neuron receives as input, the closer its charge gets to the firing threshold. If the threshold is exceeded, the neuron generates a spike of its own and discharges.
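As a toy illustration of this charge-and-fire behavior, here is a minimal leaky integrate-and-fire neuron, one of the simplest neuron models used in SNNs. All constants and the random input spike train are made up for the example:

```python
import numpy as np

# Minimal leaky integrate-and-fire (LIF) neuron. All constants are arbitrary
# values chosen for illustration, not taken from any particular chip or paper.
THRESHOLD = 1.0   # membrane potential at which the neuron fires
LEAK = 0.9        # fraction of accumulated charge kept at each time step

def simulate_lif(input_spikes, weights):
    """Return the time steps at which the neuron emits an output spike.

    input_spikes: (timesteps, n_inputs) array of 0/1 spikes
    weights:      (n_inputs,) synaptic weights
    """
    potential = 0.0
    output_spike_times = []
    for t, spikes in enumerate(input_spikes):
        potential = LEAK * potential + float(spikes @ weights)  # leak + integrate
        if potential >= THRESHOLD:       # threshold crossed: fire and discharge
            output_spike_times.append(t)
            potential = 0.0
    return output_spike_times

rng = np.random.default_rng(0)
input_spikes = (rng.random((50, 3)) < 0.2).astype(float)  # sparse random spikes
weights = np.array([0.4, 0.3, 0.5])
print("output spikes at time steps:", simulate_lif(input_spikes, weights))
```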

Figure 6. Model of Spiking Neural Networks. Source: eenewsautomotive.com

Since the first SNNs were designed, they have been applied to tasks such as mobile AI applications, adaptive robotics, and computer vision.

The main challenges to practical adoption of SNNs are the complexity of software development, the lack of mature training algorithms, and the scarcity of specialized SNN hardware.

2. Highly parallel systems

The brain is a highly parallel system: it consists of many areas (groups of neurons), each processing its own tasks. We are close to the limit of computational power per core, so to increase computational power, engineers now have little choice but to put as many cores on one chip as possible.

Most of the operations in AI computations may be represented as matrix multiplications. Thanks to the mathematics of matrices, this operation can be split into many independent threads: one computational core may work on the first row, another core on the second row, and so on. This is key to parallelization and thus to the overall acceleration of AI tasks.
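The snippet below is a toy sketch of that row-wise split, distributing a matrix-vector product across worker processes. It is purely illustrative; real accelerators parallelize this in hardware at a much finer grain, and plain NumPy would be faster at this size anyway:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

# Toy illustration: compute y = A @ x by handing each worker a block of rows.
# Every row's dot product is independent, which is what makes the split possible.

def row_block_dot(args):
    block, x = args
    return block @ x                      # each worker multiplies its own rows

def parallel_matvec(A, x, workers=4):
    blocks = np.array_split(A, workers, axis=0)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(row_block_dot, [(block, x) for block in blocks])
    return np.concatenate(list(partials))

if __name__ == "__main__":
    A = np.random.rand(1024, 1024)
    x = np.random.rand(1024)
    assert np.allclose(parallel_matvec(A, x), A @ x)
    print("row-parallel result matches the serial product")
```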

Sean Lie, the chief hardware architect at Cerebras, holds the world's largest silicon AI chip. Source: bbc.com

Highly parallel systems like ASICs (application-specific integrated circuits) are extremely well suited to this kind of operation. That is why, over the last 4 years, the whole AI industry has been steadily switching to them, with massive adoption of ASICs projected through 2025 for both server & edge applications.

Figure 7. The preferred architectures for compute are shifting in data centers and the edge. Source: mckinsey.com

3. Analog computing

Analog computing refers to a paradigm of computing that uses real-world physical processes to emulate mathematical functions and thereby perform calculations.

It is usually implemented via crossbar arrays, which are arrays of controllable resistances. Each element may be a transistor-resistor pair, a memristor, a flash cell, or another device. The resistances represent the matrix values (in most cases the weights of the neural network). The inputs are fed into the system as voltages, and by Ohm's law the current through each element is proportional to the input voltage and to the element's conductance (the inverse of its resistance). Summing these currents along each output line adds up the products, so multiplications and additions are performed without any explicit calculation at all.

Figure 8. Model of how analog computing may facilitate calculations. Source: https://www.nature.com/articles/s41928-020-0435-7

This method enables vector-matrix multiplication to be performed fully in analog. Supplemented with digital-to-analog and analog-to-digital converters, such a system can exchange data with digital logic while the multiplications themselves happen in analog, with minimal energy and time costs.
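Here is a minimal numerical sketch of that idea, simulated in Python with made-up conductance values and an arbitrary 2% device-noise level (not a model of any specific hardware): the crossbar's output currents are exactly a vector-matrix product, and device non-idealities show up directly as error in the result.

```python
import numpy as np

# Idealized crossbar: each cell stores a conductance G[i, j] (a weight), input
# voltages V[j] are applied to the input lines, Ohm's law gives a current
# G[i, j] * V[j] through each cell, and Kirchhoff's current law sums those
# currents along each output line i. The summed currents are the result.
rng = np.random.default_rng(1)
G = rng.uniform(0.1, 1.0, size=(4, 8))   # conductances, arbitrary units
V = rng.uniform(0.0, 0.5, size=8)        # input voltages, arbitrary units

I_ideal = G @ V                          # the "free" vector-matrix multiply

# Non-idealities (noise, drift, device-to-device variation) perturb the
# conductances; 2% relative noise here is a made-up figure for illustration.
NOISE_LEVEL = 0.02
G_noisy = G * (1 + NOISE_LEVEL * rng.standard_normal(G.shape))
I_noisy = G_noisy @ V

print("ideal output currents:", np.round(I_ideal, 3))
print("with device noise:    ", np.round(I_noisy, 3))
print("relative error:       ", np.round(np.abs(I_noisy - I_ideal) / I_ideal, 4))
```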

Analog crossbar architecture also helps eliminate the von Neumann bottleneck: memory is needed only for storing the required resistance levels (the weights), so computation and storage happen in the same place. The human brain likewise uses the same structures, its neurons and synapses, both to compute and to store memories.

Analog chips might show impressive energy efficiency in terms of operations per watt (up to 100x vs classic GPUs & up to 25x vs ASICs, according to our research).

Figure 9. Roadmap for 1,000x improvement in AI compute performance efficiency over the next decade. Source: ibm.com

The main challenges to practical adoption of such circuits are the various non-ideal characteristics of the memory devices, e.g. device noise, drift in the conductive state, temperature sensitivity, variation in current-voltage characteristics between devices, and defective memory cells.

Finally, here is a brief overview of the current neuromorphic & Artificial Intelligence chipset landscape.

Market overview

The global AI chipset market size is $7.6B and is projected to grow at a 40% CAGR over the next 5 years.

Startups in the segment have already raised more than $9B. Major computing players like Google, IBM, Intel and Qualcomm are all heavily investing in research and commercial projects.

We can currently see a substantial and ongoing increase in the demand for AI computations. McKinsey analysts, for example, forecast hardware for AI applications to grow 5 times faster than hardware for non-AI applications:

Figure 10. Forecasted growth rates for the AI and non-AI semiconductor markets. Source: mckinsey.com

Yole Développement reviews several promising markets for brain-inspired computing and sensing, and predicts major applications for neuromorphic technology in the automotive, industrial, mobile, medical, and consumer sectors, among others.

Multiple researchers now predict that neuromorphic architectures will become the dominant type of computing for Deep Learning tasks during the next decade.

Figure 11. Three waves of computing. Source: Neuromorphic computing report

Neuromorphic research projects

The projects listed below attempt to mimic the human brain by emulating neurons & synapses at the hardware and/or software level:

  • SpiNNaker v 1.0 (2011, University of Manchester) — Pioneering research project (hardware and software framework) for SNNs and neuromorphic computing. The first on-chip programmable digital chip, with 8 neurosynaptic cores, 250K neurons & 80M synapses.
  • BrainScaleS (2011, Heidelberg University) — Wafer-scale system and precursor project of the Human Brain Project (HBP). An on-chip programmable mixed-signal chip with 18 neurosynaptic cores, 4.5K neurons & 4.2M synapses.
  • TrueNorth (2014, IBM) — Neuromorphic circuit emulating 1M neurons & 250M synapses. A digital chip with 4,096 neurosynaptic cores that is not on-chip programmable, aimed mainly at inference applications.
  • Loihi (2018, Intel) — Neuromorphic circuit emulating 130K neurons & 130M synapses. An on-chip programmable digital chip with 128 neurosynaptic cores for training & inference.
  • NeuroGrid (2014, Stanford University) — Neuromorphic circuit emulating 1M neurons & 6B synapses. A subthreshold mixed-signal system with 16 neurosynaptic cores that is not on-chip programmable.
  • SpiNNaker v 2.0 (2018, TU Dresden, University of Manchester) — New generation of the SpiNNaker project, funded by the European Human Brain Project. Combines conventional deep learning and event-based AI.
  • Tianjic (2019, Tsinghua University, University of California) — Neuromorphic circuit emulating 40K neurons & 10M synapses across 156 cores. Can run both classical and spiking neural networks.

Startups

In our research, we identified several notable exits and unicorns in this segment, as well as 80+ neuromorphic and AI-focused chipset startups.

Notable exits

  • NUVIA (acquired by Qualcomm in 2021) — a US semiconductor company focusing on designing system-on-chips (SoC) and CPU cores for edge and 5G applications. The company was acquired by Qualcomm for $4B. Backed by BlackRock, Dell, Atlantic Bridge Capital, and Fidelity Management.
  • Cambricon Technologies (IPO in 2020) — a Chinese developer of intelligent chips for cloud servers, terminals, and robots. Raised $370M during their IPO on the Shanghai Stock Exchange in July 2020. Achieved a Post-IPO valuation of $9B. Backed by Chinese Academy Of Sciences, Alibaba, and Lenovo.
  • Habana (acquired by Intel in 2019) — a US semiconductor company focusing on developing disruptive solutions for data center and cloud efficiency. The company was acquired by Intel for $2B. Backed by Intel Capital, Bessemer Venture Partners, and Samsung Catalyst Fund.
  • Nervana (acquired by Intel in 2016) — a US-based developer of deep learning frameworks and AI chips. The company was acquired by Intel for $350M. Backed by CME Ventures, DCVC, and Lux Capital Management.

Notable VC deals

  • SambaNova Systems — a US AI hardware designer that raised $678M of Series D venture funding in a deal led by SoftBank in April 2021. The pre-money valuation was $4.4B. Backed by BlackRock, Intel Capital & GV.
  • Graphcore — a UK AI developer of the Intelligence Processing Unit (IPU), a microprocessor designed for AI and machine learning applications. They raised a $222M Series E round in December 2020. The pre-money valuation was $2.5B. Backed by Sequoia Capital, Samsung, Microsoft & Bosch.
  • Horizon Robotics — a Chinese chips designer, focusing on artificial intelligence computing for smart mobility. Raised corporate funding from Weihao Chuangxin in April 2021 at a $3B valuation. Backed by Sequoia Capital, Intel Capital & SK Hynix.
  • Cerebras — a US AI hardware designer, focusing on ultra-large chips. Raised $88M through a combination of debt and Series D venture funding in November 2019, putting the company’s pre-money valuation at $1.61B. Backed by Sequoia Capital, Foundation Capital & Benchmark.

Figure 12. Top startups by capitalization.

Startup map

Funding statistics

Table 2. Funding across device classes
Table 3. Funding across regions

Summary

  1. Despite the exponential growth of computational power, there is still a widening gap between the computing power needed for modern Deep Learning applications and the computational resources available.
  2. The amount of computational resources required to train state-of-the-art neural networks doubles every 3.4 months. A single training run is estimated to have cost GPT-3's creators $12M and to draw roughly 20 megawatts of power.
  3. Computation power growth will face limitations due to 3 main problems: high energy consumption, the Moore’s law limit, and the Von Neumann bottleneck.
  4. To overcome these computational limitations, researchers are trying to mimic the human brain by building neuromorphic hardware & software architectures, incl. spiking neural networks, highly parallel systems, and in-memory & analog computing.
  5. The global AI chipset market size is $7.6B and is projected to grow at a 40% CAGR over the next 5 years. AI-focused hardware will grow 5 times faster than non-AI hardware.
  6. Major corporations & VC funds are investing heavily in this segment. Startups have already raised more than $9B, and 10 unicorns have been born.

I would like to thank the entire Phystech Ventures team and Philip Khristolyubov personally for his assistance in preparing this analytical report.

Special thanks to Dmitri Strukov, Ali Erdengiz & Yulia Sandamirskaya.
