LightOn’s Summer Series #1 — Faith No Moore: Silicon Will Not Scale Indefinitely
Welcome to LightOn’s Summer Series! Throughout the month of August, we take you on a tour of our unique technologies and what motivated their development. At LightOn we are developing novel optics-based computing hardware leveraging natural physical processes to perform high-dimensional computations at unprecedented speed and power efficiency. Our hardware accelerators are uniquely suited to tackle the most demanding machine learning applications. In this series, we will venture to the edge of existing hardware capabilities — and beyond! — and we will showcase how light can be used to unlock new possibilities in machine learning. For our first installment, we will focus on the challenges raised by the ongoing machine learning revolution. In particular, we will examine the limitations in existing silicon-based hardware.
Software, data, hardware: the triumvirate of machine learning
Machines that learn and software that adapts to our needs are growing ubiquitous, bringing transformative changes to countless industries and services. Once constrained to toy problems, machine learning (ML) applications have now spread to the real world, from powering smart assistants [6, 7], to improving medical care [8], or even helping researchers uncover new insights in materials science [9]. In the underlying technology stack, this ongoing revolution is driven by three concurrent factors:
- Smarter algorithms: refinements of existing statistical methods and brand-new frameworks have empowered machines with an unprecedented ability to draw complex insights from a wealth of data. Beyond providing us with the tools to distill our complex world, novel architectures like the Transformer or Generative Adversarial Networks [1] (Figure 1) enable machines to create new content.
- Abundant data: ever-growing sources of high-quality data have become available to feed these algorithms. Privacy issues notwithstanding, this is a boon for a field in which additional high-quality data directly translates into better performance.
- Cheap compute: the staggering increase in the amount of compute available has transformed our devices into relentless data-crunchers. Thanks to dedicated chips, even mid-range smartphones now have ML-processing capabilities, allowing such algorithms to be applied in an ever-expanding variety of contexts.
These three factors are deeply intertwined. Sometimes they compensate for one another: smarter solutions may be able to learn complex representations from fewer samples on a tight compute budget. More often than not, however, they amplify each other: larger datasets and more advanced algorithms typically call for more expensive computations. To fulfill its promises, the Artificial Intelligence/Machine Learning revolution needs all three.
So long, and thanks for all the transistors!
However, one of these workhorses is currently in trouble. Deep learning’s most fashionable successes have required an exponential increase in compute: the resources needed to train State-Of-The-Art (SOTA) algorithms are estimated to double every 3.5 months [11]. For Moore’s Law to remain relevant, chip foundries and manufacturers have to keep packing more and more transistors into processor dies. As they struggle with rising power densities (the end of Dennard scaling) and with the ever more expensive lithography required at finer nodes, manufacturers instead turn to enhanced parallelism, by multiplying cores.
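A quick back-of-the-envelope sketch in Python of what a 3.5-month doubling time implies (the five-year horizon is an arbitrary choice for illustration):

```python
# Growth implied by compute demand doubling every 3.5 months.
doubling_months = 3.5

yearly_factor = 2 ** (12 / doubling_months)
print(f"growth per year: {yearly_factor:.1f}x")  # ~10.8x

five_year_factor = 2 ** (5 * 12 / doubling_months)
print(f"growth over five years: {five_year_factor:,.0f}x")
```

In other words, an order of magnitude more compute every year — a pace no silicon roadmap can match on its own.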
For deep learning applications, this is a boon: the most fundamental operation behind neural networks is matrix multiplication, which can take advantage of the distributed computing afforded by these multitudes of cores. As a result, the trend toward adding cores has accelerated over time, starting with the extensive use of GPUs, and has also sparked a new generation of hardware accelerators. Carefully crafted chips incorporate the core principles of modern machine learning at the transistor level, enabling speedups of more than 100x compared to a general-purpose processor. Not only has this allowed the industry to compensate for diminishing returns on new generations of processors, it has also shown that task-specific hardware is more relevant than ever. Dozens of startups and large companies alike are now designing chips tailored to artificial intelligence, often optimizing for specific algorithms and applications.
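To see why many cores help, note that each row of a matrix product can be computed independently of all the others. A minimal pure-Python sketch, with threads standing in for hardware cores (real accelerators parallelize at a much finer grain):

```python
from concurrent.futures import ThreadPoolExecutor

# Each row of C = A @ B is an independent unit of work: this is what
# makes matrix multiplication so amenable to many-core hardware.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3x2 matrix
B = [[7.0, 8.0], [9.0, 10.0]]             # 2x2 matrix

def row_product(row):
    # Dot product of one row of A with every column of B.
    return [sum(a * b for a, b in zip(row, col)) for col in zip(*B)]

# Dispatch one row per worker; no worker needs the others' results.
with ThreadPoolExecutor() as pool:
    C = list(pool.map(row_product, A))

print(C)  # [[25.0, 28.0], [57.0, 64.0], [89.0, 100.0]]
```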
Yet, these approaches are insufficient for two main reasons:
- On-chip limits: custom-made chips are still bound by the limitations of silicon-based electronics. On the one hand, quantum effects and thermal losses forbid ever-finer engravings, and the approaching Landauer limit sets a floor on the energy dissipated per bit operation, strictly constraining the further shrinking of electronic components. On the other hand, as additional cores bring more thermal strain, the laws of thermodynamics become hard constraints: cooling densely packed components that dissipate large amounts of heat is a non-trivial exercise.
- Communication bandwidth: moreover, moving billions of ones and zeroes between disk, memory, and dedicated chips is proving to be a challenging bottleneck. In practice, this means CPUs remain the preferred device for some memory-hungry applications [21].
Accordingly, exponential scaling of electronics is not a free lunch anymore [12, 13], as shown in Figure 2.
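For a sense of scale on the Landauer limit mentioned above: the minimum energy to erase one bit of information is k_B·T·ln 2, which a few lines of Python can evaluate (room temperature assumed):

```python
import math

# Landauer's principle: erasing one bit of information dissipates at
# least k_B * T * ln(2) joules of energy.
k_B = 1.380649e-23  # Boltzmann constant, J/K (exact in the 2019 SI)
T = 300.0           # room temperature, K

e_min = k_B * T * math.log(2)
print(f"Landauer bound at {T:.0f} K: {e_min:.2e} J per bit")  # ~2.87e-21 J
```

Today’s transistors still operate orders of magnitude above this bound, but the gap is what each new process node eats into.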
A game of tug-of-war: bite-sized models vs no-limit architecture search
Towards lean machine learning
The machine learning community has not remained indifferent to these concerning hardware trends. A number of countermeasures have been devised to permit the exponential scaling of compute to continue — if only for a while.
Perhaps the most successful countermeasure taken by manufacturers has been the switch to lower-precision arithmetic. High-Performance Computing (HPC) has traditionally required large bit depths: the standard has long been 32-bit floating-point numbers, with 64 bits for double precision, and even 128 bits for some large HPC simulations requiring quadruple precision. However, machine learning applications essentially perform statistics on noisy data, and half precision (16 bits) or even less can do just as well [14]. Indeed, a significant part of the recent progress on hardware benchmarks for machine learning tasks is due to this switch: the eye-catching performance figures reported by chip makers are often for half-precision computations.
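As a toy illustration of why statistics on noisy data tolerate low precision, the sketch below compares a large dot product against a low-precision counterpart. It uses fixed-point rounding as a pure-Python stand-in for a reduced format like fp16; the 8-bit choice is illustrative only:

```python
import random

# Quantize inputs to 8 fractional bits, then check how little a large
# reduction (a dot product) suffers compared to full precision.
random.seed(0)
x = [random.uniform(0.1, 1.0) for _ in range(10_000)]
w = [random.uniform(0.1, 1.0) for _ in range(10_000)]

def quantize(v, bits=8):
    # Round v to the nearest multiple of 2**-bits (fixed-point grid).
    scale = 2 ** bits
    return round(v * scale) / scale

full = sum(a * b for a, b in zip(x, w))
low = sum(quantize(a) * quantize(b) for a, b in zip(x, w))

rel_err = abs(full - low) / full
print(f"relative error at 8 fractional bits: {rel_err:.5%}")
```

Individual rounding errors largely cancel out across the sum, which is why aggregate statistics survive aggressive precision cuts.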
This movement toward lower precision is most likely only the beginning: at the cutting edge of research, practitioners are looking into neural networks that use only integers [15], or even quantized coefficients, down to binary or ternary numbers [16].
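A minimal sketch of the binary-weight idea, assuming the common scheme of a sign function plus a single scaling factor (the mean absolute weight); details vary across the actual methods cited:

```python
import random

# Binarize real-valued weights to {-1, +1}, with a per-tensor scale
# alpha chosen to preserve the overall magnitude of the weights.
random.seed(1)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

alpha = sum(abs(w) for w in weights) / len(weights)  # scaling factor
binarized = [1.0 if w >= 0 else -1.0 for w in weights]
approx = [alpha * b for b in binarized]  # binary approximation of weights

print(sorted(set(binarized)))  # [-1.0, 1.0]
```

Each weight now fits in a single bit (plus one shared scale), and multiplications reduce to sign flips — a dramatic saving in both memory and energy.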
What’s more, rather than reducing the precision of numbers, neural network pruning techniques remove parameters altogether, and have also grown increasingly popular. These techniques extend the long-standing inclination of algorithms to exploit sparsity. Indeed, while over-parametrized architectures may be required to find a good set of weights at training time, this requirement can be lifted at inference [17]. Swathes of neurons and/or connections can be removed outright, enabling large speed-ups and model compression at little or no performance cost. These approaches are key to enabling ML computing on the edge, across a larger variety of devices.
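Magnitude pruning, the simplest instance of these techniques, can be sketched in a few lines (pure Python; the 90% sparsity level is an arbitrary choice for illustration):

```python
import random

# Magnitude pruning: zero out the 90% of weights with the smallest
# absolute value, keeping only the strongest connections.
random.seed(2)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

sparsity = 0.9
# Threshold below which weights are considered negligible.
threshold = sorted(abs(w) for w in weights)[int(sparsity * len(weights))]
pruned = [w if abs(w) >= threshold else 0.0 for w in weights]

kept = sum(1 for w in pruned if w != 0.0) / len(pruned)
print(f"fraction of weights kept: {kept:.1%}")  # prints 10.0%
```

In practice, pruning is interleaved with fine-tuning so the surviving weights can compensate for those removed.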
More compute is all you need … if you can afford it
Yet, at the same time, other machine learning trends are making compute needs skyrocket. In a situation akin to the Jevons paradox, any gain in efficiency is promptly absorbed by these new trends.
One of these trends is Neural Architecture Search (NAS). NAS has been booming in recent years, achieving SOTA performance in computer vision and natural language processing. The practice is also controversial: searches consume thousands of GPU/TPU-hours, yet the performance gains obtained are often marginal. Worse, individual models, such as XLNet [18], are ballooning in complexity, with one-time training costs in the $250,000 range. Even well-funded universities thus struggle to find the resources to match private labs.
In response to this trend, transfer learning has grown more common, allowing practitioners to leverage pre-trained SOTA models for their specific tasks. However, the stakes go beyond raw computing power and money within the community. Global environmental concerns are also coming to bear, as compute-hungry approaches have non-negligible carbon footprints: data centers’ energy consumption has already surpassed that of air traffic [20]. As a result, there is a growing sense within the community that practitioners should more clearly report the compute requirements of their work, and perhaps even include an estimate of its environmental impact [19].
The new paradigms
For Machine Learning and Artificial Intelligence to keep growing, new computing paradigms are required: ones that can both scale to vast amounts of high-dimensional data and do so on a tight energy budget. In our next installment, we will explore the general background of one of these paradigms: Optical/Photonics Computing. And throughout this series of posts, we will show how, at LightOn, this paradigm is not just a promise: it is already here, and at scale.
Our upcoming installments for this Summer include:
- 1 — Faith No Moore: Silicon Will Not Scale Indefinitely (this post)
- 2 — Optical Computing: a New Hope
- 3 — Random Projections and the Blessing of Dimensionality
- 4 — Random Projections at the Speed of Light: Full Ahead Mr. Sulu, Maximum Warp
Stay updated on our advancements by subscribing to our newsletter. Liked what you read and eager for more? You can check out our website, as well as our publications. Seeing is believing: you can request an invitation to LightOn Cloud, and take one of our Optical Processing Units for a spin. Want to be part of the photonics revolution? We are hiring!
[1]: Ian Goodfellow et al. Generative Adversarial Networks. NeurIPS, 2014.
[2]: Alec Radford et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2016.
[3]: Ming-Yu Liu et al. Coupled Generative Adversarial Networks. NeurIPS, 2016.
[4]: Tero Karras et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation. ICLR, 2018.
[5]: Andrew Brock et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis. ICLR, 2019.
[6]: Yuxuan Wang et al. Tacotron: Towards End-to-End Speech Synthesis. Interspeech, 2017.
[7]: Eric Battenberg et al. Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis. Preprint, 2019.
[8]: Nenad Tomašev et al. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature 572:116–119, 2019.
[9]: Keith T. Butler et al. Machine learning for molecular and materials science. Nature 559:547–555, 2018.
[10]: Max Jaderberg et al. Spatial Transformer Networks. NeurIPS, 2015.
[11]: Dario Amodei et al. AI and Compute. OpenAI Blog, 2018.
[12]: Chuck Moore. Data Processing in Exascale-Class Computer Systems. The Salishan Conference on High Speed Computing, 2011.
[13]: Venkatramani Balaji. Machine Learning and the Post-Dennard Era of Climate Simulation. 42nd ORAP Forum, AI for HPC and HPC for AI, 2018.
[14]: Suyog Gupta et al. Deep Learning with Limited Numerical Precision. ICML, 2015.
[15]: Dipankar Das et al. Mixed Precision Training of Convolutional Neural Networks using Integer Operations. ICLR, 2018.
[16]: Matthieu Courbariaux et al. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. NeurIPS, 2016.
[17]: Song Han et al. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. ICLR, 2016.
[18]: Zhilin Yang et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Preprint, 2019.
[19]: Emma Strubell et al. Energy and Policy Considerations for Deep Learning in NLP. ACL, 2019.
[20]: Nicola Jones. How to stop data centres from gobbling up the world’s electricity. Nature, 2018.
[21]: Yu Emma Wang et al. Benchmarking TPU, GPU, and CPU Platforms for Deep Learning. Preprint, 2019.
Julien Launay, Machine Learning R&D engineer at LightOn AI Research.