Finding the Next Moore's Law in Future Compute Systems

Steel Perlot
7 min read · Nov 3, 2023


Matt Robbins (Steel Perlot, Semiconductor Lead)

The technical progress made over the last 50 years can largely be attributed to Moore's law, an empirical observation first stated by Intel's Gordon Moore in 1965 that the number of transistors on a chip roughly doubles every two years while the total cost stays the same. While most in technology fields are familiar with this relatively simple scaling law, many are less familiar with the underlying physical principles that enable it. This article dives into the basic physical principles that have powered Moore's law and outlines a framework for finding platforms whose physical scaling principles resemble Moore's law and could ignite the "next Moore's law."

Physical Principles Powering Moore’s Law

At its core Moore's law is very simple: if we shrink the dimensions of a transistor, we can fit more of them on a chip, and the relative cost per transistor goes down because of the parallel nature of chip manufacturing. The more transistors on a chip, the more operations it can perform and the more complex the functions it can implement. This central tenet of Moore's law has been driven by amazing advances in engineering and manufacturing over the last 50 years; however, equally important but less widely appreciated scaling principles of transistor physics are the real hero lying under the cases of our smartphones and laptops. To understand these principles, it is first important to understand the standard metrics by which chips and their underlying transistors are graded. The most important of these are power, performance, and area, often shortened to PPA and summarized below:

  • Power: power consumed by a chip, circuit, or active transistor (lower is better)
  • Performance: speed/operating frequency or throughput of a chip, circuit, or active transistor (higher is better)
  • Area: area occupied by a chip, circuit, or transistor (smaller is better)

PPA is a metric that can be optimized at multiple levels of the chip design process but is ultimately limited by the transistors that make up the chip. To optimize for PPA, transistors should be small, fast, and power efficient, with an emphasis on the latter. In fact, Moore's law would not be possible without power-efficiency scaling, as increasing transistor density without lowering power would be unsustainable. As an example, the world's first commercial microprocessor, the Intel 4004, contained around 2,300 transistors, consumed around 0.5 watts of power, and operated at 100 kHz. A modern processor such as the newly released A17 Pro that powers the iPhone 15 Pro contains approximately 19 billion transistors. If power efficiency had never improved as transistors shrank, the A17 Pro would consume roughly 4 megawatts of power, the equivalent of roughly 3,000 homes (if it didn't first instantly melt under the massive amount of heat produced). Of course, the chip does not blow up in our pockets and only consumes around 3 watts, all while operating at a roughly 20,000X higher frequency (2 GHz). Why is this? Let's dive into the underlying device physics.
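As a rough sanity check, the arithmetic above can be reproduced in a few lines of Python. The transistor counts, power figures, and frequencies are the approximate values quoted in this article (not official specifications), and the household figure assumes an average draw of about 1.3 kW:

```python
# Back-of-envelope check of the scaling argument above.
# Figures are the approximate values quoted in this article, not official specs.
intel_4004_transistors = 2_300
intel_4004_power_w = 0.5           # watts
intel_4004_freq_hz = 100e3         # 100 kHz

a17_transistors = 19e9
a17_power_w = 3.0                  # watts
a17_freq_hz = 2e9                  # 2 GHz

avg_home_draw_w = 1.3e3            # assumed average household draw (~1.3 kW)

# If power per transistor had never improved, an A17-sized transistor count
# would draw:
naive_power_w = intel_4004_power_w * (a17_transistors / intel_4004_transistors)
print(f"Naive power: {naive_power_w / 1e6:.1f} MW")                      # ~4.1 MW
print(f"Equivalent households: {naive_power_w / avg_home_draw_w:,.0f}")  # ~3,200

# In reality the chip draws ~3 W while clocking ~20,000x faster.
print(f"Frequency ratio: {a17_freq_hz / intel_4004_freq_hz:,.0f}x")
print(f"Per-transistor efficiency gain: {naive_power_w / a17_power_w:,.0f}x")
```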

A transistor, the fundamental building block of a microprocessor, is a 3-terminal device in which a capacitive gate with capacitance C_gate is used to modulate the current flow between the source and drain electrodes. The supply voltage of a transistor, often called V_dd, is set such that the transistor can be switched from the fully-OFF state, where resistance is high, to the ON state, where resistance is low (R_on). We know from the PPA metric that as we scale transistor size we want lower power and higher speed.

Let’s walk through the operation of the device and some critical relationships that show us power can be decreased and frequency increased with scaling.

(Also summarized in Figure 1)

  1. A transistor is turned on by setting the gate and drain voltages to V_dd.
  2. A few relationships describe the speed at which a transistor can be switched and its average power consumption.
  3. f_limit = 1/(R_on · C_gate) (1): the maximum speed at which a transistor can be switched on and off is set by this RC delay.
  4. P = f · C_gate · V_dd² (2): the average switching power of the device.
  5. From these equations it is clear that reducing C_gate and V_dd reduces power consumption and can increase speed, provided R_on can be kept constant.
  6. Now let's look at what happens as we scale the width W and length L of a transistor by 1/S, and thus the transistor area by 1/S².
  7. C_gate = W · L · ε/t_ox, where ε is the dielectric constant of the gate oxide and t_ox is its thickness. This means that as transistor area is scaled by 1/S², C_gate is also scaled by 1/S².
  8. R_on = V_dd/I_on ∝ L/W, so if W and L both decrease by 1/S there is no effect on R_on.
  9. Circling back to equations (1) and (2), it is clear that power decreases by the same factor as area while R_on stays constant! This physical feature of transistors alone enables Moore's law scaling and allows transistor density to increase indefinitely at constant chip power, as long as operating frequency is kept constant (see the sketch below).
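The relationships in steps 7 through 9 can be verified with a minimal numerical sketch. The parameter values below are arbitrary normalized placeholders (not real process parameters), chosen only to expose the scaling factors:

```python
# Minimal sketch of constant-voltage scaling (steps 7-9 above).
# All values are normalized placeholders, not real process parameters.
eps = 1.0                                 # gate-oxide dielectric constant
S = 2.0                                   # linear scale factor: W -> W/S, L -> L/S

def c_gate(W, L, t_ox):
    return eps * W * L / t_ox             # gate capacitance ~ area / oxide thickness

def r_on(W, L):
    return L / W                          # on-resistance ~ L/W at fixed V_dd

def switching_power(f, C, V_dd):
    return f * C * V_dd ** 2              # equation (2)

W0, L0, t_ox, V_dd, f = 1.0, 1.0, 1.0, 1.0, 1.0

P_before = switching_power(f, c_gate(W0, L0, t_ox), V_dd)
P_after = switching_power(f, c_gate(W0 / S, L0 / S, t_ox), V_dd)

print(f"C_gate shrinks by {c_gate(W0, L0, t_ox) / c_gate(W0 / S, L0 / S, t_ox):.0f}x")  # S^2 = 4x
print(f"R_on unchanged: {r_on(W0, L0):.2f} -> {r_on(W0 / S, L0 / S):.2f}")
print(f"Power per transistor drops by {P_before / P_after:.0f}x")                        # S^2 = 4x
# S^2 = 4x more transistors now fit in the same area at the same total power,
# provided the operating frequency f is held constant.
```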

But it gets even better. If we look deeper at the expression for R_on derived from the voltage-current relationship of a transistor, we see that R_on ∝ (L · t_ox)/(W · V_dd). This relationship says we can decrease V_dd by 1/S and maintain a constant R_on as long as t_ox is also scaled by 1/S. By doing so we maintain the 1/S² decrease in power per transistor with scaling while also obtaining a frequency increase by a factor of S.
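Putting the pieces together, the classic Dennard scaling factors fall out of the relations above. Here is a sketch of the bookkeeping, assuming W, L, t_ox, and V_dd are all scaled by 1/S and the device is run at its new maximum frequency:

```latex
\begin{aligned}
C_{gate} &= \frac{\varepsilon W L}{t_{ox}}
  \;\longrightarrow\; \frac{\varepsilon (W/S)(L/S)}{t_{ox}/S} = \frac{C_{gate}}{S}\\[4pt]
R_{on} &\propto \frac{L\, t_{ox}}{W\, V_{dd}}
  \;\longrightarrow\; \frac{(L/S)(t_{ox}/S)}{(W/S)(V_{dd}/S)} = R_{on} \quad (\text{unchanged})\\[4pt]
f_{limit} &= \frac{1}{R_{on} C_{gate}} \;\longrightarrow\; S \, f_{limit}\\[4pt]
P &= f\, C_{gate} V_{dd}^{2} \;\longrightarrow\;
  (S f)\,\frac{C_{gate}}{S}\,\frac{V_{dd}^{2}}{S^{2}} = \frac{P}{S^{2}}\\[4pt]
\frac{P}{\text{area}} &\;\longrightarrow\; \frac{P/S^{2}}{(W L)/S^{2}} = \frac{P}{W L} \quad (\text{constant power density})
\end{aligned}
```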

Fig. 1: Graphical summary of Dennard scaling, displaying how power and frequency both scale favorably when transistor dimensions are reduced.

So there it is: not only could we make more transistors every year at the same cost, but they would also run faster and more efficiently.

This scaling theory became known as Dennard scaling, and by taking advantage of these principles and dedicating massive resources to reducing transistor gate length, gate width, and gate oxide thickness, the world unlocked decades of massive computation growth (see Figure 2) that has become the foundation of our modern world and created trillions in economic value.

Fig. 2: Plot displaying how key microprocessor metrics have scaled over time as a result of Moore's law and Dennard scaling.

Unfortunately, things this good cannot go on forever, and Dennard scaling broke down around 2006, as shown by the flattening of the frequency curve in Figure 2. Eventually V_dd could no longer be scaled due to thermodynamic limitations on the minimum voltage required to turn a transistor on. As a result, in order to keep increasing transistor density at constant power, frequency scaling was halted and new device architectures such as FinFETs were introduced, which allow for incremental power improvements. As transistor density has continued to increase at constant frequency, compute capacity has instead scaled via increased core counts, allowing greater parallelization.

Currently Moore's law is in a "bend, but don't break" phase, where it is becoming more and more costly to achieve diminishing increases in transistor density. And while it appears that the likes of TSMC and Intel will continue finding ways to push limits and keep Moore's law limping along, the applications of today, such as AI, are demanding more than today's version of Moore's law can give, leaving the likes of Nvidia to pursue other means of increasing compute-system performance, such as multi-chip modules. All of this raises the question: is there another "Moore's law" waiting to be discovered? Are there similar scaling principles in other technical platforms that have yet to be fully exploited? Where will the next 1000X advancement in computing technology come from? The next section proposes some general principles, inspired by Moore's law, that can be used to evaluate a platform's scaling potential.

General Scaling Principles

As discussed previously, the big reason for the success of Moore's law is the breaking of tradeoffs between performance, power, and cost. These breaks are generalized below and can be applied when evaluating the scalability of emerging technology systems:

Systems that can increase in performance over time at a fixed cost.

To enable applications to be deployed at scale, cost is a top consideration. Marc Andreessen's famous "software is eating the world" thesis was entirely based on the fact that compute had become cheap enough that new software applications could be profitably deployed. As an example, cheap compute allows AI applications such as ChatGPT to be deployed for millions of users. However, scaling model size is limited by cost, and future AI systems will need even cheaper compute to unlock their full potential.

Systems where performance can be increased at fixed power.

Physical systems generally exhibit tradeoffs between power and performance (speed). Moore's law broke this tradeoff, allowing decades of scaling. To be sustainable, future scalable systems must possess a means of scaling performance at constant power.

Systems that scale linearly in complexity, power, or cost, but superlinearly in performance.

This qualitative idea is fundamentally important to chip technology scaling. Moore's law provides an excellent example: each generation, new processes are developed to shrink a minimum dimension by, say, 2x, but the resulting benefit is that factor squared, or 4x, in transistor density. In the Dennard scaling era the multiplier was even higher, as frequency could additionally scale by 2x. Because of this, the large continued investment in advancing each chip manufacturing node produces a large ROI when successful.
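As a toy illustration of this superlinearity (illustrative arithmetic only, not a model of actual process economics):

```python
# Toy illustration of linear effort, superlinear payoff in process scaling.
linear_shrink = 2.0                    # one generation halves the minimum dimension

density_gain = linear_shrink ** 2      # transistors per unit area grow with the square
print(f"Density gain per generation: {density_gain:.0f}x")              # 4x

# In the Dennard era, operating frequency could also scale by the same
# linear factor, compounding the payoff of a single process generation.
dennard_gain = density_gain * linear_shrink
print(f"Density x frequency gain (Dennard era): {dennard_gain:.0f}x")   # 8x
```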

Insatiable demand for performance.

Compute performance has scaled by >1,000,000,000X over the last 50 years, and we still have an insatiable demand for more in order to satisfy new technology applications such as AI. Physical scaling principles don't matter if scaling a technology yields diminishing returns or if there is no demand for another 1000x of scaling.

While there are certainly many technologies being developed in the semiconductor and computing space that don't exhibit these scaling laws but will still solve important problems and generate value, the next Moore's law(s) will ultimately be the technologies that change the world. As we at Steel Perlot build and invest in future chip technologies, we are inspired by Moore's law and the general principles behind it to find systems that can continue to scale over decades rather than years and impact the world in ways we can only begin to imagine.
