The State of Microprocessor Cooling Systems
A Technical Literature Review
This literature review was originally written in December 2017 by Matthew Cheung, for the graduate-level course ME 290R (Topics in Manufacturing — Nanoscale Manipulation of Materials) at the University of California, Berkeley.
Abstract
Historically, microprocessor die size has stayed relatively constant. While performance has risen with growing transistor counts, total microprocessor package heat production has stayed relatively stable. Thus, to further increase computational density beyond the gains from Moore’s law, microprocessor cooler manufacturers have been pushing to make coolers smaller, even attempting to integrate cooling systems into the microprocessor dies themselves. This review presents a brief history of microprocessor cooling and then assesses the effectiveness of current technologies.
Index Terms — Electronics cooling, Jet impingement cooling, Liquid cooling, Microfluidic cooling, Microprocessors.
I. Introduction
Moore’s law states that the number of transistors per unit area on an integrated circuit (IC) doubles approximately every 18 months. Because this growth is exponential, however, it cannot continue forever; as transistors become smaller and smaller, problems arise as we approach atomic limits. Meanwhile, to keep up with the demand for ever greater computational density, heat becomes a larger and larger concern.
Most computer components have some form of cooling. Heat is produced as a result of inefficiencies in an IC, with electrical resistance a large contributor. Though computer components have made large strides in performance and efficiency, cooling is still necessary.
II. How Heat is Produced
During a computer’s operation, heat can cause issues to emerge. For example, if a processor is under a high workload and the cooling system cannot dissipate heat fast enough, the processor may “underclock” itself, lowering its frequency in an attempt to bring the package back to a safe operating temperature. If the temperature continues to rise anyway, modern processors will shut the system down entirely to protect the hardware from dangerous operating temperatures.
To understand how the majority of the heat is generated in integrated circuits, we look at transistors. The static power consumed, P_S, due to leakage current and supply voltage, is given by the equation:

P_S = V_CC × I_CC (1)

where V_CC is the supply voltage and I_CC is the current into the device. The power consumed while switching from one logic state to another is the transient (dynamic) power, P_T, and is given by the equation:

P_T = C_pd × V_CC² × f_I × N_SW (2)

where C_pd is the dynamic power-dissipation capacitance, V_CC is the supply voltage, f_I is the input signal frequency, and N_SW is the number of bits switching. The power consumed charging external load capacitance, P_L, is given by the equation:

P_L = [Σ_(n=1 to N) (C_L_n × f_O_n)] × V_CC² (3)

where N is the number of outputs, C_L_n is the load capacitance of output n, f_O_n is the output frequency of output n, and V_CC is the supply voltage. Finally, the total power consumption, P_Total, is the sum of (1), (2), and (3):

P_Total = P_S + P_T + P_L (4)

where P_S is the static power consumed, P_T is the transient power consumed, and P_L is the power consumed driving external load capacitance, all of which are given above [1].
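As a quick worked illustration of equations (1) through (4), the short sketch below plugs in made-up component values; none of the numbers come from a real datasheet.

```python
# Illustrative calculation of CMOS power consumption using equations (1)-(4).
# All component values below are invented examples, not data from any real device.

V_CC = 1.2          # supply voltage, V
I_CC = 0.05         # quiescent (leakage) supply current, A
C_pd = 20e-12       # dynamic power-dissipation capacitance, F
f_I = 2.0e9         # input signal frequency, Hz
N_SW = 8            # number of bits switching
loads = [(15e-12, 1.0e9)] * 8   # (C_L_n, f_O_n) pairs, one per output

P_S = V_CC * I_CC                                      # static power, eq. (1)
P_T = C_pd * V_CC**2 * f_I * N_SW                      # transient power, eq. (2)
P_L = sum(C_L * f_O for C_L, f_O in loads) * V_CC**2   # load power, eq. (3)
P_total = P_S + P_T + P_L                              # total power, eq. (4)

print(f"P_S = {P_S:.3f} W, P_T = {P_T:.3f} W, P_L = {P_L:.3f} W, total = {P_total:.3f} W")
```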
Though some of the total power consumption performs useful electrical work, the rest is converted to heat. The maximum heat a chip is designed to produce under a sustained heavy workload is typically published by the chip manufacturer as the thermal design power (TDP). Hardware designers need to know the TDP of the chips they will be working with, because it determines how much heat their cooling systems must dissipate.
III. Traditional Methods of Cooling
A. Passive and Forced Heat Sink Cooling
Early microprocessors like Intel’s first central processing unit (CPU), the Intel 4004 (released in 1971), were passively cooled. The body of the CPU was more than sufficient to dissipate its heat to the surrounding environment. This approach remained sufficient until the Intel 80486 (i486) series of CPUs (released in 1989). Some of the higher-performance 486-series CPUs shipped with passively cooled heat sinks, while the slower or “ultra-low power” CPUs did not. One generation later, all Intel Pentium P5-series processors (released in 1993) shipped with active, fan-cooled heat sinks. Virtually no CPU sold in or after 1993 could cool itself, as the Intel 4004 did, with no heat sink; all required some form of dedicated cooling.
Early heat sinks were simple finned devices. The fins were typically flat plates or straight pins extending from the base, made of aluminum (or copper if extra cooling performance was needed). Occasionally, if the heat sink was meant to be passive, the designer would specify a matte black anodized finish to take advantage of increased black-body radiation. Electronics companies have seen six to eight percent increases in cooling performance from matte black anodization across many different passive heat sink designs for their computation equipment [3].
When passive heat sinks were not sufficient, airflow was increased through the use of fans. Ducting is also used to seal airflow channels, increasing the effective airflow through the heat sinks. Ducting is particularly prevalent in servers and data centers, where operators are keenly concerned with using their energy efficiently.
If a heat sink is too large, heat cannot reach its far ends: it escapes the heat sink before getting there. In other words, there is a limit to the amount of heat a simple heat sink can expel. Heat pipes get around this by quickly and effectively spreading heat from the CPU throughout the rest of the heat sink by way of evaporative cooling. As the encapsulated coolant heats up near the CPU, it evaporates into a gas and travels to the other end of the heat pipe. There the gas cools off, condenses back into a liquid, and returns to the CPU side of the heat pipe. Throughout this process, heat is extracted from the heat pipe and transferred to the heat sink fins for removal from the system. The effective thermal conductivity of a heat pipe can reach as high as 100,000 W/(m·K); for comparison, the thermal conductivities of aluminum and copper are 204 and 386 W/(m·K), respectively. Heat pipes essentially allow heat sinks to dissipate more heat [3], [4].
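To illustrate the scale of that difference, the sketch below applies one-dimensional Fourier conduction, Q = k × A × ΔT / L, to an assumed 6 mm diameter, 150 mm long pipe with a 20 K end-to-end temperature difference; the geometry and ΔT are illustrative assumptions, not values from [3] or [4].

```python
# Rough comparison of one-dimensional conduction through a solid rod versus a
# heat pipe of the same size, using Fourier's law Q = k * A * dT / L.
# Geometry and temperature difference are illustrative assumptions.
import math

A = math.pi * 0.003**2   # cross-sectional area of a 6 mm diameter pipe, m^2
L = 0.15                 # length from CPU block to fin stack, m
dT = 20.0                # temperature difference along the pipe, K

for name, k in [("aluminum", 204), ("copper", 386), ("heat pipe (effective)", 100_000)]:
    Q = k * A * dT / L   # heat conducted along the pipe, W
    print(f"{name:>22}: {Q:7.1f} W")
```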
B. Liquid Cooling
Finally, the last major form of IC cooling is liquid cooling. A heat exchanger is placed on top of the IC to transfer heat from the microprocessor to a fluid, usually water because of its unusually high specific heat capacity and low cost (though other coolants can be used for higher performance). The coolant is transported from the IC to a radiator, which extracts the heat from the system, and then back to the IC to repeat the cycle. Short-term cooling is vastly improved because the coolant in the loop has a high thermal mass. Long-term cooling improves (compared to air cooling) if the radiator extracts heat more efficiently than the air-cooled heat sink it replaces.
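A back-of-the-envelope sizing sketch, using assumed values for the TDP and the allowed coolant temperature rise, shows how little water flow such a loop actually needs to carry a typical CPU heat load.

```python
# Sizing sketch for a liquid cooling loop: the volumetric flow of water needed
# to carry a given TDP for an allowed coolant temperature rise, Q = m_dot*c_p*dT.
# The TDP and temperature rise are illustrative assumptions.

TDP = 150.0        # heat to remove, W
c_p = 4186.0       # specific heat of water, J/(kg*K)
rho = 997.0        # density of water, kg/m^3
dT = 10.0          # allowed coolant temperature rise across the block, K

m_dot = TDP / (c_p * dT)            # required mass flow, kg/s
V_dot = m_dot / rho * 1000 * 60     # volumetric flow, L/min

print(f"mass flow: {m_dot*1000:.1f} g/s, volumetric flow: {V_dot:.2f} L/min")
```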
IV. Emerging Cooling Technologies
A. Microfluidic Cooling
Microfluidic cooling is essentially a more efficient form of liquid cooling. In conventional liquid cooling, a heat exchanger on top of the IC transfers heat from the IC to the liquid. Microfluidic cooling removes this interface and routes the liquid directly through the IC, allowing the coolant to pull heat from the microprocessor sooner. Microfluidic cooling also benefits from the much larger surface area in direct contact with the IC.
1) DARPA’s ICECool Program
Microfluidic structures were first explored in the early 1990s. The first patent application involving microfluidic structures was filed on 8 May 1991 and published on 14 Nov 1991 by B. Ekström, G. Jacobson, O. Öhman, and H. Sjödin [5]. However, microfluidics received much more intense attention starting in 2008, when the U.S. Department of Defense’s Defense Advanced Research Projects Agency (DARPA) announced its interest in microfluidic cooling of ICs with its Thermal Management Technologies program. The program attracted large industry partners, one of which was GE Global Research.
Between 2011 and 2012, GE Global Research worked to demonstrate single-phase cooling of gallium nitride (GaN) on silicon carbide (SiC) power amplifiers, with microchannels etched into the SiC substrate within 50 μm of the GaN.
Seeing the positive results of GE Global Research and other teams’ efforts, DARPA initiated their Intra/Interchip Enhanced Cooling (ICECool) program to specifically explore two different architectures for microfluidic cooling. DARPA’s goal was to see heat dissipation of at least 100 W/cm² for high performance ICs.
One benefit of the ICECool Interchip concept over the Intrachip concept is that multiple ICs can be stacked on top of each other in a three-dimensional array, allowing vastly larger computational densities. Fig. 4 illustrates this benefit. Currently, 1U rack-mounted, air-cooled servers can have as many as eight separate CPUs on the motherboard. In the near future, it would not be unreasonable for 1U liquid-cooled servers to have between eight and ten CPU sockets.
International Business Machines Corporation (IBM) and the Georgia Institute of Technology (Georgia Tech), as a team, were awarded the ICECool contract in late 2012. Using DARPA’s ICECool Interchip concept, the IBM/Georgia Tech team was able to demonstrate a 90% reduction in cooling energy and a 14% reduction in computational energy, compared to traditional refrigerated air-cooling methods for data center applications.
Put another way, data centers with ICECool cooling systems could either reduce electricity costs for the same amount of computational power, or keep the electricity expenditure the same and increase computational power. Though the initial investment in an ICECool system would be pricey, a data center could extend the usable life cycles of its ICs, potentially lowering the overall cost to the operator.
One way the IBM/Georgia Tech team decreased cooling energy by 90% was through the use of two-phase rather than single-phase liquid cooling. Using the same principle as heat pipes, two-phase liquid cooling gains extra cooling capacity from the phase change from liquid to gas. An added benefit is that the gas travels much more quickly than the liquid. Most three-dimensional IC designs, though not all, should see superior cooling performance with two-phase rather than single-phase liquid cooling [7]. A finding by [8], however, indicates that ICs with intense hot spots may actually see degraded performance with two-phase liquid cooling, as hot spots create more vapor than other areas, discouraging flow toward the hot spots.
A separate Georgia Tech team was able to show that microfluidic cooling for a three-dimensional IC stack was much better at removing heat from the ICs than traditional air cooling. They demonstrated that microfluidic cooling was capable of cooling multilayer ICs with at least 200 W/cm² in power density [9].
In one of their studies, the Georgia Tech team attempted to replicate a 790 W/cm² system found in [10] but were unable to; in that particular study, they achieved only 100 W/cm² [11].
There has been some work on making microfluidic liquid cooling more effective. To transfer heat effectively from an IC to the fluid, fins protruding from the underside of the IC are often employed. IBM Research in Zurich, Switzerland ran simulations of pin fin geometry and spacing and found that, depending on IC conditions (near or far from hot spots, near corners, or other considerations), heat flux could be increased by using different architectures. For example, their proposed four-port fluid delivery architecture had the best cooling performance in the corners of three-dimensional IC stacks, and redirecting coolant toward intense hot spots improved cooling. Fig. 5 shows such a guiding structure. The Zurich IBM team was able to achieve up to 250 W/cm² by altering the pin fin density and using their four-port fluid delivery [12].
2) Outlook of Microfluidic Cooling
In March 2013, Prof. K. Goodson of the NanoHeat Lab at Stanford gave an industry briefing on then-current cooling technologies. Most of the developments have come through DARPA’s Thermal Management Technologies programs such as ICECool and Heat Removal by Thermo-Integrated Circuits (HERETIC).
Goodson noted that in 2005, the level of cooling that was commercially available was about 100 W/cm². An example of a widely available product in 2005 that featured this level of cooling was the Apple Power Mac G5.
In the near future (approximately the end of 2020), Goodson expects microfluidic cooling to break the 300 W/cm² mark [13].
B. Jet Impingement Cooling
Jet impingement cooling systems cool a surface with an impinging jet of fluid (usually air, water, or a proprietary coolant). This method of cooling is desirable for its high heat transfer and relatively simple hardware. Because of this, electric vehicle companies actively try to implement jet impingement cooling for their vehicles’ electronic hardware, space permitting; the fluid in that case is most often ambient air from outside the vehicle. Even air at the hottest recorded temperature of 56.7°C (measured at Greenland Ranch, Death Valley, CA, USA on July 10, 1913) is cool enough to cool electronic equipment [3]. For reference, Intel CPUs often have a junction temperature, the maximum temperature allowed for a processor die, of 100°C [14].
The impinging fluid hits the hot surface at (usually) a 90° angle, which has several advantages over the parallel flow found in standard liquid cooling systems. First, the perpendicular flow ensures that more fluid molecules directly contact the hot surface, giving the fluid more opportunities to transfer heat effectively. Second, jet impingement produces thinner boundary layers, increasing the thermal gradient. Finally, the impingement produces more turbulent flow. This combination of more fluid molecules contacting the hot surface, an increased thermal gradient, and more turbulence produces heat transfer coefficients up to three times higher than those of parallel-flow cooling systems at the same flow rate per unit area. Put another way, to achieve the same heat transfer, a jet impingement cooling system can run the fluid at much slower speeds, saving energy spent on running the cooling system and/or lowering noise signatures.
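The sketch below illustrates what an "up to three times higher" heat transfer coefficient buys at a fixed surface-to-coolant temperature difference, using q = h × ΔT; the baseline h and the 40 K ΔT are assumed, illustrative values, not figures from [15].

```python
# Illustrative effect of a threefold increase in heat transfer coefficient:
# heat flux q = h * dT for the same surface-to-fluid temperature difference.
# The baseline h and dT are assumed values.

h_parallel = 5_000.0      # assumed parallel-flow heat transfer coefficient, W/(m^2*K)
h_jet = 3.0 * h_parallel  # jet impingement at the same flow rate per unit area
dT = 40.0                 # die surface minus coolant temperature, K

for name, h in [("parallel flow", h_parallel), ("jet impingement", h_jet)]:
    q = h * dT / 1e4      # heat flux, converted from W/m^2 to W/cm^2
    print(f"{name:>16}: {q:.0f} W/cm^2")
```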
There are two major methods of jet impingement cooling: single-phase and double-phase. Single-phase jet impingement cooling is what has been described thus far. In double-phase cooling, the fluid transitions from liquid to gas once it contacts the hot surface, exploiting the fluid’s (usually water’s) large enthalpy of vaporization [15].
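The sketch below compares, per gram of water, the heat absorbed by boiling against the heat absorbed by an assumed 10 K sensible temperature rise; the property values are standard, and the 10 K rise is an illustrative assumption.

```python
# Why a phase change helps: heat absorbed per gram of water by vaporization
# versus by a 10 K single-phase (sensible) temperature rise.

h_fg = 2257.0     # enthalpy of vaporization of water at 100 C, J/g
c_p = 4.186       # specific heat of liquid water, J/(g*K)
dT = 10.0         # assumed sensible temperature rise, K

q_sensible = c_p * dT      # J per gram, single-phase
q_latent = h_fg            # J per gram, double-phase
print(f"sensible: {q_sensible:.0f} J/g, latent: {q_latent:.0f} J/g, "
      f"ratio: {q_latent / q_sensible:.0f}x")
```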
The most cooling occurs near the center of the jet. Thus, if a chip designer anticipates a hot spot in a particular location, the designer can point the jet directly at it. If more cooling is needed, potentially at different spots, multiple jets can be used. Taken to the extreme, it is also possible to use many microjets, as the NanoHeat Lab at Stanford demonstrated. Wang et al. showed that a 5 mm square microjet array of four 76 μm diameter, 300 μm long holes was able to achieve 90 W/cm² with a flow rate of 8 mL/min. With higher flow rates, 200 W/cm² (65 mL/min) and 790 W/cm² (516 mL/min) were possible [16].
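As a sanity check on these numbers, the sketch below estimates the coolant temperature rise implied by each reported (heat flux, flow rate) pair, assuming the full 5 mm × 5 mm (0.25 cm²) area is heated and the water warms sensibly only; the single-phase assumption and the heated-area interpretation are assumptions here, not details taken from [16].

```python
# Implied coolant temperature rise for each (heat flux, flow rate) pair reported
# for the microjet array, assuming purely sensible heating of water over an
# assumed 0.25 cm^2 heated area.

c_p = 4.186      # specific heat of water, J/(g*K)
rho = 0.997      # density of water, g/mL
area = 0.25      # assumed heated area, cm^2

for flux, flow in [(90, 8), (200, 65), (790, 516)]:   # W/cm^2, mL/min
    Q = flux * area                  # total heat removed, W
    m_dot = flow * rho / 60.0        # mass flow, g/s
    dT = Q / (m_dot * c_p)           # coolant temperature rise, K
    print(f"{flux:3d} W/cm^2 at {flow:3d} mL/min -> dT ~ {dT:4.1f} K")
```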
Judged by demonstrated heat flux, double-phase jet impingement cooling with microjet arrays is the most effective way to cool ICs.
V. Conclusion
DARPA and its industry partners anticipate that high-performance ICs will have power densities of around 100 W/cm² in the near future, a trend backed up by Intel’s historical TDP data for recent (2000 to 2018) desktop and server CPUs in [14]. However, it is not unreasonable to expect that ICs could eventually surpass this level, so there will always be a push for better and better cooling.
At the moment, there seems to be a 250 W/cm² limit in microfluidic cooling. However, the 790 W/cm² reported in [10] tells us that there is still work to be done with microfluidic cooling. A conservative estimate of 300 W/cm² is expected, by academia, to be reliably achievable by the end of 2020.
Double-phase jet impingement cooling with microjets is by far the most promising approach, with a demonstrated 790 W/cm² of cooling capacity. With this surplus, data center operators could overclock their ICs for vastly increased performance without purchasing any new equipment: a “cooling factor” of 7.9 times the roughly 100 W/cm² baseline gives operators a massive thermal margin for increasing clock frequencies, and a 50% increase in a chip’s clock frequency would not be unreasonable at this level of cooling. Thus, we should expect more work on double-phase jet impingement cooling with microjets, and potentially more widespread adoption by industry.
Special thanks to Prof. Hayden Taylor who taught ME 290R in Fall 2017 and Junpyo (Patrick) Kwon for providing feedback on early drafts.
Matthew Cheung is a Product Design Engineer on the iPhone PD team at Apple. Previous roles include Falcon 9 Structures at SpaceX, Input Devices PD at Apple, Autopilot PD at Tesla, and Accessories PD at Boosted. He studied Mechanical Engineering at the University of California, Berkeley.
The only notable changes between this article and its December 2017 version are minor word choice updates and the addition of the first image at the start of the article.
References
[1] A. Sarwar, “CMOS Power Consumption and Cpd Calculation”, Texas Instruments, Dallas, Texas, United States, 1997.
[2] M. Lin, “Asetek Low Cost Liquid Cooling (LCLC) System — Page 3 | HotHardware”, Hothardware.com, 2008. [Online]. Available: https://hothardware.com/reviews/asetek-low-cost-liquid-cooling-lclc-system?page=3. [Accessed: 16- Nov- 2017].
[3] J. Cheung, “Heat Sink Performance”, San Francisco, California, United States, 2017.
[4] H. Akachi, “Structure of a heat pipe”, US4921041, 1990.
[5] B. Ekström, G. Jacobson, O. Öhman and H. Sjödin, “Microfluidic structure and process for its manufacture”, WO1991016966A1, 1991.
[6] A. Bar-Cohen, J.J. Maurer and J.G. Felbinger, “DARPA Intra/Interchip Enhanced Cooling (ICECool) Program”, in CS ManTech Conference, New Orleans, Louisiana, United States, 2013, pp. 171–174.
[7] T.J. Chainer, M.D. Schultz, P.R. Parida and M.A. Gaynes, “Improving Data Center Energy Efficiency With Advanced Thermal Management”, IEEE Transactions on Components, Packaging and Manufacturing Technology, vol. 7, no. 8, pp. 1228–1239, 2017.
[8] Y.J. Kim, Y.K. Joshi, A.G. Fedorov, Y. Lee and S. Lim, “Thermal Characterization of Interlayer Microfluidic Cooling of Three-Dimensional Integrated Circuits With Nonuniform Heat Flux”, Journal of Heat Transfer, vol. 132, no. 4, p. 041009, 2010.
[9] C.R. King, J. Zaveri, M.S. Bakir and J.D. Meindl, “Electrical and fluidic C4 interconnections for inter-layer liquid cooling of 3D ICs”, 2010 Proceedings 60th Electronic Components and Technology Conference (ECTC), 2010.
[10] D. Tuckerman and R. Pease, “High-performance heat sinking for VLSI”, IEEE Electron Device Letters, vol. 2, no. 5, pp. 126–129, 1981.
[11] Y. Zhang, A. Dembla, Y. Joshi and M.S. Bakir, “3D stacked microfluidic cooling for high-performance 3D ICs”, in 62nd Electronic Components and Technology Conference, San Diego, California, United States, 2012.
[12] T. Brunschwiler, S. Paredes, U. Drechsler, B. Michel, B. Wunderle and H. Reichl, “Extended tensor description to design non-uniform heat-removal in interlayer cooled chip stacks”, in Thirteenth InterSociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems, San Diego, California, United States, 2012.
[13] K. Goodson, “Are you Leveraging Department of Defense Funding? Executive Briefing”, Stanford University, 2013.
[14] “Intel Product Specification”, Intel Automated Relational Knowledge Base (Product Specs), 2017. [Online]. Available: https://ark.intel.com/Search/FeatureFilter?productType=processors&MaxTDPMin=0.025&MaxTDPMax=300. [Accessed: 01- Oct- 2017].
[15] A. Azizi and M. Moghimi, “Impingement Jet Cooling on High Temperature Plate”, Iran University of Science and Technology, 2016.
[16] E. Wang, L. Zhang, L. Jiang, J. Koo, J. Maveety, E. Sanchez, K. Goodson and T. Kenny, “Micromachined Jets for Liquid Impingement Cooling of VLSI Chips”, Journal of Microelectromechanical Systems, vol. 13, no. 5, pp. 833–842, 2004.