The Future of Computing and AI Looks to 3D Architecture
Computing can be traced back at least as far as the Antikythera mechanism of ancient Greece. But as mechanical devices were supplanted by electronic valves, and then by semiconductors, the power of computing grew and grew.
We often equate computer power with the number of gates (the individual switches that are the basis for logic and memory) on a chip. To increase the number of gates on a chip, the physical size of each element of a gate must be made smaller. This type of scaling, however, has become increasingly difficult. Chips are more expensive to make, and in some cases, the gains are smaller.
So if making things smaller becomes impossible, is that the end of the road? No.
One way to continue improving integrated circuits is to redesign the very structure of the computer chip itself. Processors can only process if they are fed with information. But getting that information to the right place has been a longstanding problem in computing.
“[What] is the biggest bottleneck in computation today? It’s the memory bottleneck where you have to go and grab data. Traditionally, the memory chip is separate from the processor chip,” explains Subhasish Mitra, a professor in the departments of computer science and electrical engineering at Stanford University.
Consider a single gate. It is either part of the memory, where information is stored, or part of the processor, where information is processed. For information to be processed, it must first be retrieved from memory and transported to the correct logic circuits.
All of this costs time and, just as important, energy.
Waiting for Delivery
This delay is usually referred to as the memory bottleneck. Your computer might consist of several processors distributed across a couple of chips. Each processor has a small amount of nearby memory, called a cache. These caches are quite fast, but they are small.
As soon as information is required from outside the cache, a call goes out to main memory. But that is on a separate chip, and it is considerably slower than your processor.
So the wait begins…and continues…
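To put rough numbers on that wait, one common textbook estimate is the average memory access time: the cache's own latency, plus the fraction of requests that miss the cache multiplied by the cost of a trip to main memory. The sketch below is illustrative only. The function name and the latencies (1 ns for the cache, 100 ns for main memory) are assumptions chosen for round numbers, not measurements from any particular chip.

```python
def average_access_time_ns(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: every access pays the cache latency,
    and the fraction that misses also pays the trip to main memory."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Assumed, illustrative latencies (not taken from the article).
CACHE_NS = 1.0
MAIN_MEMORY_NS = 100.0

for miss_rate in (0.0, 0.125, 0.5, 1.0):
    t = average_access_time_ns(CACHE_NS, miss_rate, MAIN_MEMORY_NS)
    print(f"miss rate {miss_rate:5.1%}: about {t:6.1f} ns per access")
```

Even with made-up figures, the pattern is clear: once a meaningful share of requests miss the cache, the slow trip to main memory dominates the average.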
Integrated circuit designers are not stupid. To overcome the memory bottleneck, they use part of the logic on the chip to predict what data will be required. On the programming side, information that is used together is stored in close proximity so it can be called up more efficiently. But the problem remains.
“Caches are really efficient at certain things,” says Max Shulaker, a graduate student studying with Mitra in Stanford’s electrical engineering department. But “when you look at these exciting [new] applications, they are dealing with data that’s very unstructured. You don’t really have locality in time or space, which means that using these caches constructively is very, very challenging,” explains Shulaker.
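Shulaker's point about locality can be illustrated with a toy cache model. The sketch below is a simplification under stated assumptions: a small least-recently-used cache simulated in software, with a hypothetical function name and arbitrary sizes. Real hardware caches are organized differently, but the contrast between orderly and scattered access carries over.

```python
import random
from collections import OrderedDict

def hit_rate(addresses, cache_lines=64, line_size=8):
    """Fraction of accesses served by a tiny LRU cache.

    Each cache line holds `line_size` neighboring addresses, and the
    cache keeps at most `cache_lines` lines, evicting the least
    recently used one when it is full.
    """
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True             # fetch the line from "main memory"
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)

n = 100_000
rng = random.Random(0)                            # fixed seed for repeatability
sequential = list(range(n))                       # structured: strong locality
scattered = [rng.randrange(n) for _ in range(n)]  # unstructured: no locality

print(f"sequential access hit rate: {hit_rate(sequential):.3f}")
print(f"scattered access hit rate:  {hit_rate(scattered):.3f}")
```

In this model the sequential walk is served from the cache on seven of every eight accesses, because each fetched line brings seven neighbors along with it. The scattered pattern finds what it needs in the cache only a small fraction of the time, so nearly every request goes out to main memory.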
To be fair, it is not as if researchers have only just turned their attention to speeding up memory access. It's just that solving the problem has become more important. The solution, although obvious on the surface, has been waiting for several technological developments in how memory is made and how integrated circuits are designed to reach maturity.
Shulaker describes the problem: “It’s not just about stacking a whole bunch of memory on top of logic, you also need to actually connect them. You need to be able to pass lots of data back and forth between the logic and memory,” and “that requires extremely fine interconnects between logic and memory.”
A Vertical Filigree of Wire, Memory, and Processing
Ultimately, the solution is twofold: Bring the memory closer to the logic, and increase the speed and parallelism of data transfer. This sort of solution has to be on-chip, and the most likely approach is to move to a 3D layout. The basic idea is reasonably simple: Alternate layers of logic with layers of memory.
If this idea is so simple, then why is the future not the present?
Three-dimensional structures have been part of integrated circuit production for a long time. The density of gates is too high to allow all the wiring between gates to be put on the same layer. A current integrated circuit therefore consists of a single layer of logic, with the wires that connect the gates stacked in several layers above it. In that sense, 3D is already here.
But some of the steps in standard integrated circuit processing require high temperatures. Laying down the chip logic first protects the metal wires that connect the gates from the high temperatures. However, a second layer of logic or memory would mean that a second high-temperature step is required, which would damage the wiring. The point is that a 3D monolithic design is trickier than it seems.
Going 3D beyond wiring has usually involved gluing separate bits of silicon on top of each other. This works pretty well, as long as you don’t look too closely.
The problem is that two separate chips need to be very carefully aligned and must not drift as the chips are joined. Since the alignment is not that great, the wires connecting the two chips have to be quite widely separated and fairly large. Effectively, the two chips (one memory, one logic) are in very close proximity, but the number of connections is too small to offer enough bandwidth to stop the logic chip from starving of data.
However, the direction Shulaker and Mitra are investigating reduces that alignment problem by orders of magnitude. When logic and memory are stacked on the same chip, alignment is governed by the accuracy of repeated photolithographic steps, which chipmakers use to fabricate chips. Photolithography uses light to project an image of the circuit pattern onto the silicon wafer. Chemicals on top of the silicon respond to the light, allowing later processes to create patterns of logic and wiring.
Engineers are brilliant at this: Current technology requires that each layer be positioned to within just a couple of nanometers of previous layers. That is hundreds of times more accurate than can be achieved by stacking different chips.
By making use of the inherent accuracy of photolithography, memory and logic are not just close but also densely connected. Effectively, the connection between memory and logic in this configuration is more like the connections between different logic gates in current processors.
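A back-of-the-envelope calculation shows why finer alignment matters so much for bandwidth. If vertical connections sit on a square grid, the number that fit in a given area grows with the inverse square of the spacing between them. The sketch below assumes that simple grid model; the function name and both pitch values are hypothetical figures picked for illustration, not numbers reported by the researchers.

```python
def connections_per_mm2(pitch_nm):
    """Vertical connections that fit in one square millimeter when they
    sit on a square grid with the given center-to-center pitch."""
    per_side = 1_000_000 / pitch_nm   # 1 mm = 1,000,000 nm
    return per_side ** 2

# Assumed, illustrative pitches (not taken from the article).
STACKED_CHIPS_PITCH_NM = 10_000   # 10 micrometers between chip-to-chip wires
MONOLITHIC_PITCH_NM = 100         # 100 nm between layer-to-layer wires

coarse = connections_per_mm2(STACKED_CHIPS_PITCH_NM)
fine = connections_per_mm2(MONOLITHIC_PITCH_NM)
print(f"stacked chips: {coarse:,.0f} connections per mm^2")
print(f"monolithic:    {fine:,.0f} connections per mm^2")
print(f"ratio:         {fine / coarse:,.0f}x")
```

Because density scales with the square of the pitch, a spacing that is 100 times finer allows 10,000 times as many connections in the same area under these assumptions. If each connection carries data at a similar rate, the available bandwidth between memory and logic grows by roughly the same factor.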
With this breakthrough, computing power can continue to grow without the pain of shrinking the size of logic gates. In fact, as long as the different layers can be lined up accurately—something called “overlay accuracy”—then processors consisting of multiple interleaved layers of logic and memory may be in our future.
But 3D stacking—and the resulting high-speed, high-volume memory access—has much bigger implications than just quicker memory access.
Consider the difference between the brain and artificial neural networks, which try to reproduce some brain-like features. The basic building block of the brain is the neuron, which is not a logic gate. Another difference is that in the brain, as far as we can tell, memory and processing are not strictly separated.
Yet neural networks today rely on off-chip memories, while highly abstracted models of neurons are implemented using many logic gates. We won’t even get into the problems of scale and interconnectedness of neurons in the brain compared to neural networks.
The point is that by having memory on-chip and readily accessible, models of neurons can include elements of memory. But beyond that, a 3D architecture also allows for different processing units (or artificial neurons) to be connected more efficiently and densely.
This may be a technology that truly enables the leap from today’s processors that are fast but dumb to a future where processors are fast and smart. It is incredible to think that this one problem — rapid access to data — may unlock such abstract concepts as hyperdimensional computation (a concept that, as far as I can tell, tries to implement brain-like computing in silicon, but there is limited layperson-level information available about it).
“For a hyperdimensional computation model, it’s an even bigger problem, because you have to have computation immersed in memory. This kind of 3D architecture is even more suitable for that. For any kind of neural-inspired, man-inspired computation model, this notion of very fine and dense connectivity between computation and memory is key,” says Mitra.
Sucking Out the Heat
That said, the future has a way to go. Every gate uses some energy, and keeping processors cool will only become more challenging when multiple layers of logic are buried in the center of the chip.
In traditional chip architecture, the logic layer is the bottom layer and sits upon a layer of crystalline silicon. That crystalline silicon has pretty good heat conducting properties, simplifying heat extraction. In contrast, a 3D chip puts a layer of glass (either silicon oxide or hafnium oxide) between each layer. These layers are not just electrical insulators; they are also very good thermal insulators. As a result, the heat generated by each gate is very slow to move to the crystalline substrate.
In the first step, memory on top of logic, Mitra doesn’t believe that getting rid of the heat will be a problem. But for a more advanced design, where multiple layers of logic are working hard on computation, heat removal becomes a serious issue. “Those computing engines on the upper layers will be running at a lot of speed, and they will be doing a lot of number crunching. That will create a lot of heat,” says Mitra.
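The difference between the two materials can be estimated with the one-dimensional heat conduction relation, in which the temperature drop across a slab equals the heat flux times the thickness divided by the thermal conductivity. The sketch below is a rough illustration, not a thermal model of a real chip: the conductivities are approximate textbook values for bulk crystalline silicon and silicon dioxide, the layer thickness and heat flux are assumed round numbers, and the function name is hypothetical. It also ignores metal wiring and sideways heat spreading, both of which help in practice.

```python
def temperature_drop_k(heat_flux_w_m2, thickness_m, conductivity_w_mk):
    """Temperature drop (in kelvin) across a uniform slab for
    one-dimensional steady-state conduction: dT = q * t / k."""
    return heat_flux_w_m2 * thickness_m / conductivity_w_mk

# Approximate textbook conductivities, in W/(m*K).
K_SILICON = 150.0
K_SILICON_DIOXIDE = 1.4

# Assumed, illustrative operating point (not taken from the article).
HEAT_FLUX = 1.0e6     # 100 W per square centimeter, expressed in W/m^2
THICKNESS = 1.0e-6    # a one-micrometer layer

dt_si = temperature_drop_k(HEAT_FLUX, THICKNESS, K_SILICON)
dt_oxide = temperature_drop_k(HEAT_FLUX, THICKNESS, K_SILICON_DIOXIDE)
print(f"silicon layer:         {dt_si:.4f} K")
print(f"silicon dioxide layer: {dt_oxide:.4f} K")
print(f"oxide / silicon ratio: {dt_oxide / dt_si:.0f}x")
```

For the same thickness and heat flow, the oxide layer holds back roughly a hundred times more temperature difference than silicon does with these values, which is why every insulating layer added to the stack makes the upper layers harder to cool.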
So, new materials are required not just for reasons of speed or fabrication, but also to keep chips running cool.
That said, the process developed by Mitra, Shulaker, and a veritable army of researchers looks very promising. And who knows, if the memory bottleneck is truly gone, and power consumption is kept under control, then, just maybe, artificial intelligence will live up to its promise.