Why the AI Boom Will Need Better Compute Infrastructure

John M Mern
Prime Movers Lab
Dec 1, 2023

The global demand for computation has risen steadily over the past several decades and will continue to do so into the foreseeable future, due in part to rapid growth in AI usage. This has been enabled in large part by the continuation of Moore’s law, the trend of processing power doubling roughly every two years. Though it’s uncertain how long Moore’s law will hold, recent advances in transistor design and fabrication make it feasible for it to continue beyond the next decade. During this time, many innovations will be needed to continue advancing the performance and scalability of compute systems. New types of memory and communication will be critical to alleviating performance-limiting bottlenecks that are already present and growing, in part due to the expanding data demands of machine learning. Broader deployment of advanced AI models will require more energy-efficient compute systems, particularly for battery-powered edge devices. Meanwhile, research continues to advance quantum, photonic, and other novel computational models toward the point where they could capture large segments of compute demand in the near future.

We believe that humanity has a near-limitless appetite for data, and more precisely, for intelligent ways to use data. As we have canvassed the compute ecosystem, we have become particularly interested in companies that we think will benefit from this tailwind:

New methods of processing data: novel computing, quantum, photonic, neuromorphic chip companies; new chip designs for “X”

Continuing Moore’s Law: 3D transistors / novel chip design

Storing, moving, hosting data: data center networking & communications, power efficiency

Prime Movers Lab is interested in connecting with innovators working to address the challenges arising from the ever-growing demand for computation. Please feel free to reach out to us at john@primemoverslab.com and brad@primemoverslab.com if you’re interested in talking about the future of AI and compute hardware.

Key Trends and Factors

The compute ecosystem starts with companies that provide the materials and machines used to fabricate integrated circuit chips and ends in data centers that house and process information and services for consumers and enterprises. Between these two ends are many companies that work together to design, produce, and package computer systems. In addition to large chip makers like Nvidia and Intel, this space is occupied by many other companies providing tools and subsystems to relay power, store data, connect systems, manage networks, etc. The current ecosystem is illustrated in Figure 1. Beyond what exists today, labs and companies across the world are working on new technologies like quantum, photonic, and neuromorphic computing that may disrupt and expand the existing landscape.

Figure 1: The Semiconductor Ecosystem. Image Credit: Altman Solon

There are a multitude of interdependent factors influencing the evolution of the silicon economy. Here we identify a few key trends shaping the most pressing challenges that the next generation of compute innovation will need to solve.

Moore’s law is continuing, but at an ever-increasing cost. Industry stakeholders have been forecasting the end of Moore’s law since the beginning of the century. As transistor areas shrink, maintaining transistor control and efficiency grows increasingly difficult due to the relatively larger impact of leakage and quantum effects. Despite these growing challenges, chipmakers have continued to keep pace with Moore. To make up for the shrinking 2D area, chipmakers began building channels vertically, with most commercial chips below 22nm relying on 3D FinFET transistors. More complex 3D transistors like GAAFETs and MBCFETs overcome the physical limitations of FinFETs by allowing more effective channel control, just as FinFETs did with planar MOSFETs a decade earlier. These more complex devices, in turn, introduce opportunities for improved design and fabrication technologies. As improving density in 2D becomes more difficult, chipmakers are also stacking more features vertically, and new interconnects and other integration technologies are emerging to support this design direction. Together, these new design directions have the potential to continue Moore’s exponential growth for the next decade and beyond.
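For a sense of what keeping pace with Moore implies, the minimal sketch below simply compounds the classic doubling-every-two-years rule. The starting point (~80 billion transistors, roughly a 2023 flagship GPU die) and the doubling period are illustrative assumptions, not a roadmap.

```python
# Illustrative sketch of the compounding implied by Moore's law
# (transistor counts doubling roughly every two years). The starting
# point and doubling period are assumptions, not a roadmap.

def project_transistors(start_count: float, years: float, doubling_period: float = 2.0) -> float:
    """Project a transistor count forward under a fixed doubling period."""
    return start_count * 2 ** (years / doubling_period)

start = 80e9  # ~transistors on a 2023 flagship GPU die (illustrative)
for horizon in (2, 6, 10):
    projected = project_transistors(start, horizon)
    print(f"+{horizon:>2} years: ~{projected / 1e9:,.0f}B transistors per die")
```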

Data center scaling is becoming more challenging. Last year, more than 15 exaflops’ worth of compute capacity was installed in data centers around the world. For comparison, some estimates put the total processing capacity installed worldwide at the end of 2022 at roughly 25 exaflops. This 60% increase in installed processing does not, unfortunately, translate into a 60% increase in model-training capability. Training models over distributed systems leads to sub-linear scaling in training speed, meaning a 200x increase in deployed GPUs may only lead to a 50x increase in training speed (MLPerf results at ML Commons as of July 2020). This is caused by a variety of factors, most notably the need to move more data to and from more devices more frequently. As ML models grow in size, data centers will struggle to install enough capacity to meet demand without further innovation.
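The sketch below is a toy model of this effect, not the MLPerf methodology: each training step splits a fixed compute workload across N GPUs but pays an assumed, fixed synchronization cost per step, so the speedup saturates well short of linear. The constants are illustrative only.

```python
# Toy model of sub-linear scaling in distributed training. Each step
# splits a fixed compute workload across N GPUs but pays an assumed
# fixed synchronization cost (gradient all-reduce, data movement).
# The constants are illustrative, not MLPerf measurements.

def step_time(n_gpus: int, compute: float = 1.0, sync_cost: float = 0.02) -> float:
    """Per-step wall time: ideally divided compute plus per-step sync overhead."""
    return compute / n_gpus + (sync_cost if n_gpus > 1 else 0.0)

def speedup(n_gpus: int) -> float:
    return step_time(1) / step_time(n_gpus)

for n in (1, 8, 64, 512, 1600):
    print(f"{n:>5} GPUs -> {speedup(n):5.1f}x faster (ideal: {n}x)")
```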

Beyond the scaling-rate challenges specific to ML, many logistical challenges pose immediate issues for general data center growth. The most immediate among these is power. While computationally powerful, Nvidia’s data center GPUs (e.g., the H100) are power-hungry and hot. As more GPUs are packed into server racks, the total power demanded by data centers is growing to the point where local grids may start struggling to meet demand. Globally, data center vacancy rates are dropping significantly, with major US markets like Virginia at an all-time low of 1.8%. This has driven increased interest in innovations that improve power efficiency and cooling for existing systems, and that interest is expanding to new, more power-efficient compute systems as well. Put simply, many existing data centers cannot draw more power from the grid; if they want to expand capacity, they need to run chips more efficiently.
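A back-of-the-envelope calculation shows how quickly the numbers add up. The 700 W figure is the published TDP of an H100 SXM module; the server overhead, rack density, and PUE values below are assumptions chosen only to illustrate the scale.

```python
# Back-of-the-envelope power draw for a GPU data center build-out.
# 700 W is the published TDP of an H100 SXM module; the server
# overhead, rack density, and PUE values are assumptions for scale.

GPU_TDP_W = 700            # H100 SXM thermal design power
GPUS_PER_SERVER = 8
SERVER_OVERHEAD_W = 2500   # CPUs, NICs, fans, etc. (assumed)
SERVERS_PER_RACK = 4       # limited by power and cooling, not space (assumed)
PUE = 1.4                  # cooling + distribution losses (assumed)

server_w = GPUS_PER_SERVER * GPU_TDP_W + SERVER_OVERHEAD_W
rack_w = SERVERS_PER_RACK * server_w
facility_w_per_rack = rack_w * PUE

print(f"Per server (IT load):   {server_w / 1e3:5.1f} kW")
print(f"Per rack (IT load):     {rack_w / 1e3:5.1f} kW")
print(f"Per rack (w/ cooling):  {facility_w_per_rack / 1e3:5.1f} kW")
print(f"1,000 such racks:       {facility_w_per_rack * 1000 / 1e6:5.1f} MW")
```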

Areas we’re tracking: Data center communications and networking, cybersecurity, and chips with improved power efficiency.

Processors are outpacing the rest of the computer. Rapid improvements in CPU and GPU processing throughput have made getting data to and from the processors a bottleneck. The two main elements behind this bottleneck are memory and communication. High-bandwidth memory (HBM) based on DRAM has emerged to serve the immediate need by packing more connection pins per die, providing more channels for information to move in and out of memory. Deploying HBM at scale poses serious supply-chain and cost challenges, limiting it to only top-tier systems. As a result, many companies have begun exploring lower-cost and lower-power alternatives to HBM designs. Research into DRAM alternatives has also continued. Non-volatile MRAM, RRAM, and PCM hold the potential to replace DRAM and NAND-based flash and to enable new computing paradigms.
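The appeal of HBM’s wide interface follows from basic bandwidth arithmetic, sketched below. The per-pin data rates are representative published figures; exact speeds vary by part and generation.

```python
# Peak memory bandwidth is roughly (bus width in bits) x (per-pin data
# rate) / 8. The per-pin rates below are representative published
# figures; exact speeds vary by part and generation.

def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak interface bandwidth in GB/s."""
    return bus_width_bits * gbps_per_pin / 8

configs = {
    "DDR5 DIMM (64-bit @ 4.8 Gb/s/pin)":     (64, 4.8),
    "HBM2E stack (1024-bit @ 3.6 Gb/s/pin)": (1024, 3.6),
    "HBM3 stack (1024-bit @ 6.4 Gb/s/pin)":  (1024, 6.4),
}

for name, (width, rate) in configs.items():
    print(f"{name:<40} ~{peak_bandwidth_gbs(width, rate):6.0f} GB/s")
```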

While reading data in and out of memory is a significant source of latency, another major contributor is the actual movement of data between devices. Networking systems often fail to match the throughput of the devices they connect. For example, four NVMe drives can together produce on the order of 110 Gbit/s of read traffic, more than even a high-bandwidth 100 gigabit Ethernet link can carry. Similar throughput limitations and latency sources exist at every level of compute interconnect. Innovations have been introduced to address these bottlenecks, ranging from improved chiplet interconnect architectures (UCIe) to silicon-photonic switches. Innovation and investment in this space are likely to continue as processors continue to accelerate.
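As a rough illustration of that mismatch, the sketch below compares the aggregate read throughput of four NVMe drives to a single 100 GbE link, assuming ~3.5 GB/s of sequential read per drive (typical of a PCIe 3.0 x4 SSD); real throughput depends on the drive and workload.

```python
# Sketch: aggregate NVMe read throughput vs. a single 100 GbE link.
# 3.5 GB/s per drive is an assumed peak sequential read figure typical
# of a PCIe 3.0 x4 NVMe SSD; real throughput depends on the workload.

DRIVE_GBYTES_PER_S = 3.5
N_DRIVES = 4
LINK_GBITS_PER_S = 100   # raw line rate of 100 gigabit Ethernet

storage_gbits = N_DRIVES * DRIVE_GBYTES_PER_S * 8
print(f"4x NVMe aggregate reads: {storage_gbits:.0f} Gbit/s")
print(f"100 GbE line rate:       {LINK_GBITS_PER_S} Gbit/s")
if storage_gbits > LINK_GBITS_PER_S:
    print("The drives alone can saturate the network link.")
```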

Areas we’re tracking: next-gen chip architecture and memory technology, networking and communications, silicon photonics

AI is driving increased hardware specialization. While Moore’s law has successfully held, maintaining its pace has come at a high and ever-growing cost. Continued innovation has required major chip makers to increase R&D and capital expenditures significantly. State-of-the-art fabs now cost around $16 billion USD, and that cost is growing roughly 13% annually. An alternative to improving compute effectiveness by simply increasing transistor density is to specialize processor designs for specific compute tasks. An example of this approach can be found in Nvidia’s new H100, which includes a “Transformer Engine”, a processing unit specialized for the computations required by the neural network architectures commonly used in large language models (LLMs). With the Transformer Engine, the H100 can train LLMs up to nine times faster than its predecessor, the A100, despite having only roughly 48% more transistors.
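A crude decomposition of those headline numbers shows how much of the gain comes from something other than raw transistor count. The transistor counts below are the public die specifications (54 billion for the A100, 80 billion for the H100); the attribution itself is a simplification for illustration, not a rigorous analysis.

```python
# Crude split of the H100's LLM-training speedup into "more transistors"
# vs. "everything else" (Transformer Engine, FP8, architecture). The
# transistor counts and the 9x figure are the headline numbers quoted
# above; the attribution is a simplification for illustration.

a100_transistors = 54e9
h100_transistors = 80e9
llm_training_speedup = 9.0   # H100 vs. A100, vendor figure

density_gain = h100_transistors / a100_transistors         # ~1.48x
specialization_gain = llm_training_speedup / density_gain  # residual

print(f"Gain from transistor count alone:  {density_gain:.2f}x")
print(f"Residual gain from specialization: {specialization_gain:.2f}x")
```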

As neural networks are more widely deployed, we expect to see the emergence of more specialized chips like the “Transformer Engine”. As enterprises more widely deploy LLMs, generative models, and other deep AI systems, data centers will struggle to meet this demand with general-purpose compute alone as they come up against power, cost, and supply-availability issues. Cheaper, more power-efficient chips that accelerate popular neural network architectures such as RNNs or CNNs will be an attractive way to meet the growing demand. Deploying trained models to more efficient hardware will also free up faster systems for more data-intensive training.

Areas we’re tracking: AI accelerators, process-in-memory systems

Increased AI at the edge creates new design demands. Advanced AI is increasingly being integrated into industrial and consumer edge devices. Today, much of the compute workload for these applications is handled by shipping data to and from cloud processors. To limit the reliance on battery-intensive and bandwidth-limited RF communication, future systems will likely do more processing on the edge device itself. This will require designing chips that can process large AI workloads with significantly smaller size and weight, and lower power consumption (SWaP). The chips that solve this need will look significantly different from those used in data centers and servers today due to differences in performance requirements, production volumes, and supply chain access. ASICs and [e]FPGAs have the greatest potential to serve the embedded systems market.
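To see why on-device processing is attractive, the sketch below compares the energy to radio a single camera frame to the cloud against the energy to run a small neural network locally. Every number is an assumed, order-of-magnitude figure; real radios, codecs, and accelerators vary widely.

```python
# Rough energy comparison: radio a camera frame to the cloud vs. run a
# small neural network on-device. Every number is an assumed,
# order-of-magnitude figure; real systems vary widely.

IMAGE_BYTES = 200_000        # compressed camera frame (assumed)
RADIO_J_PER_BIT = 1e-7       # ~100 nJ/bit, Wi-Fi-class radio (assumed)
ACCEL_OPS_PER_J = 1e12       # ~1 TOPS/W edge accelerator (assumed)
MODEL_OPS = 1e9              # ~1 GOP per inference, small CNN (assumed)

offload_j = IMAGE_BYTES * 8 * RADIO_J_PER_BIT
local_j = MODEL_OPS / ACCEL_OPS_PER_J

print(f"Send frame to cloud: {offload_j * 1e3:6.1f} mJ")
print(f"Infer on device:     {local_j * 1e3:6.1f} mJ")
print(f"Ratio:               {offload_j / local_j:6.0f}x")
```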

Low-power compute paradigms, such as neuromorphic computing, are likely to play a key role in this emerging space. Scaling the deployment of these new architectures will require changes across the silicon ecosystem. Software and development tools will be needed to port code written for conventional computing architectures to novel, low-power systems. Upstream of the chips, innovation will be required to make production of lower-volume ASICs more economical by lowering the cost and time required to set up a fabrication line.

Areas we’re tracking: neuromorphic and edge/IOT chips, [e]FPGAs

Change is being pushed upstream. The need to shrink minimum transistor feature sizes has driven innovation in fabrication processes and technologies. As the trend of simply shrinking transistors comes to a physics-limited end, feature structures and layouts are instead becoming more complex. This presents opportunities for new IP cores to play major roles in future chip advancements. Commercial adoption of more complex 3D transistors like GAAFETs and MBCFETs introduces opportunities for improved design and fabrication technologies. As chip makers look to stack more features on top of each other, technologies allowing denser interconnects, better power management, and improved thermal regulation are becoming increasingly important.

As chip makers continue to chase Moore’s law, development costs are growing exponentially as process nodes shrink. The cost to develop a chip on a 5nm node is somewhere between $280M and $540M, with a time from RTL design to tapeout of up to 18 months. More transistors per chip means there is more to verify with each new design; as a result, verification can take up to 70% of the total design and development time. We expect novel design and verification techniques that can accelerate this cycle to attract significant attention and investment.

Areas we’re tracking: improved chip design tools / EDA software, AI based verification, new materials (e.g. SiGe) that can be manufactured on existing silicon manufacturing tools

Novel paradigms continue to hold promise. Despite the large global investment in traditional silicon-transistor computing, several nascent technologies offering potential step-function performance improvements could quickly disrupt and dominate large segments of the space. One clear example is quantum computing. Quantum is particularly well suited to AI applications, as parallelism is fundamental to quantum processing, allowing it to scale to massive data sets much more efficiently than conventional bit-based computation. Companies are working to scale quantum computing to commercially viable levels. Atom Computing (a Prime Movers Lab portfolio company) recently announced it completed a system with 1,180 qubits, making it the first gate-based quantum computer with over 1,000 qubits. Photonic computing is another potentially disruptive processing technology; it uses light instead of electrons for computation, allowing it to compute much more quickly and (theoretically) with much lower power than semiconductor-based systems. Researchers have already begun to design photonic systems specifically for neural network computation, suggesting that photonic accelerators may play a major role in future AI compute infrastructure.
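One way to build intuition for why qubit counts matter: representing an n-qubit state on a classical machine takes 2^n complex amplitudes, as the sketch below shows. This illustrates state-space growth only; it says nothing about which workloads a particular machine can actually run.

```python
# Why qubit counts matter: simulating an n-qubit state classically
# requires 2**n complex amplitudes. This only illustrates state-space
# growth; it says nothing about which problems a given machine can solve.

COMPLEX_BYTES = 16   # one double-precision complex amplitude

def classical_state_bytes(n_qubits: int) -> int:
    return (2 ** n_qubits) * COMPLEX_BYTES

for n in (10, 30, 50, 100):
    print(f"{n:>4} qubits -> 2^{n} amplitudes, ~{classical_state_bytes(n):.3e} bytes classically")
```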

There is also research into potentially disruptive technologies for the production of conventional computers. In particular, alternative materials to silicon have the potential to improve performance, lower cost, improve power efficiency, and more. Some materials, such as germanium, could offer immediate improvements within the existing transistor design ecosystem. More exotic materials such as carbon nanotubes and graphene would require more significant changes to the well-established silicon design and manufacturing processes. These materials are, however, beginning to see limited early commercialization.

Areas we’re tracking: quantum and photonic hardware, software infrastructure layers for these technologies, novel silicon substitute materials

Summary/Conclusions

The rapid growth in compute demand, driven in part by the AI boom, has created the need for innovations across the silicon ecosystem. In the near term, innovations will be needed to help data centers overcome power, cooling, and other challenges to continue growing to meet the demand. Technologies to address the networking and memory throughput bottlenecks will become increasingly critical. As AI models are more widely deployed, more specialized chips are likely to emerge to accelerate model computations without prohibitive power and cost increases. This will be especially true for edge devices where drastic SWaP limitations create the need for entirely new computing paradigms. All of these new requirements will push innovation upstream in the silicon ecosystem, precipitating the need for new design and fabrication technologies. While conventional processing performance will continue to improve, many future computing tasks are likely to be serviced by novel technologies like quantum and photonic computing.

Prime Movers Lab invests in breakthrough scientific startups founded by Prime Movers, the inventors who transform billions of lives. We invest in seed-stage companies reinventing energy, transportation, infrastructure, manufacturing, human augmentation, and agriculture.
