
Exascale computing and its enabling technologies


By Daniel Ernst

Originally published on Oct. 15, 2020, on Hewlett Packard Enterprise’s Enterprise.nxt, publishing insights about the future of technology.

It’s not just about performance; exascale computing will change the way research can be done.

As supercomputing moves from petascale to exascale performance, it brings a new generation of capabilities to fields such as scientific exploration, artificial intelligence, and data analysis. The first U.S.-built exascale computers are under contract for Lawrence Livermore, Oak Ridge, and Argonne National Laboratories. Two of these systems will come directly from HPE Cray, while the third is being built in partnership with Intel.

The seemingly sudden shift to next-generation supercomputing isn't just about winning a speeds-and-feeds race. The technologies being deployed in these systems are focused on delivering on the promise of exascale computing, not just on topping a benchmark.


Where do I plug this in?

New technologies are at work that allow performance to scale faster than power demand. In the case of Japan's Fugaku supercomputer, one of the innovations is its use of an extension of the Arm architecture known as the Scalable Vector Extension (SVE), designed in part through multinational exascale research efforts. SVE builds on vector technologies, many pioneered by Cray architectures, that enable CPUs to execute computationally heavy workloads more efficiently while maintaining programmer productivity.

More efficient execution reduces the power demand of the processors. And because SVE is part of the ubiquitous Arm architecture, it will eventually spread to a much larger diversity of platforms, particularly where efficient AI and media processing are critical applications.
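As a concrete illustration, here is a minimal sketch of the kind of loop SVE is designed to accelerate. The C code is ordinary and portable; the build line in the comments is an illustrative assumption about one possible toolchain, not something specified in the article.

```c
/* daxpy.c - a streaming vector kernel of the kind SVE accelerates.
 *
 * The loop is plain C; a vectorizing compiler can turn it into
 * vector-length-agnostic SVE code, so the same binary runs on CPUs with
 * different vector widths. (Illustrative build line, assuming GCC on an
 * SVE-capable Arm target: gcc -O3 -march=armv8.2-a+sve daxpy.c)
 */
#include <stddef.h>
#include <stdio.h>

static void daxpy(size_t n, double a, const double *x, double *y)
{
    /* y[i] = a * x[i] + y[i]: iterations are independent, so the compiler
     * is free to process several elements per instruction. */
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    double x[4] = {1, 2, 3, 4}, y[4] = {10, 20, 30, 40};
    daxpy(4, 2.0, x, y);
    printf("%.1f %.1f %.1f %.1f\n", y[0], y[1], y[2], y[3]); /* 12.0 24.0 36.0 48.0 */
    return 0;
}
```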


One of the side effects of the industry's work to increase processor efficiency is an increasing amount of silicon in every socket. This is most visible in the chiplet-based products coming to market, such as AMD's EPYC CPUs and the upcoming Intel Ponte Vecchio GPU slated for the U.S. Department of Energy's Aurora system.

Packing more silicon into a package keeps more compute and data close together, greatly reducing the power required to move data around. One consequence, however, is higher power per socket and greater power density across the system. CPU power requirements have climbed from 100 to 130 to 170 watts and now exceed 280 watts per socket. This trend will continue in order to sustain performance growth for the foreseeable future, which in turn puts significant pressure on server power delivery and cooling.

Modern supercomputing infrastructure has been developed to deliver better power and cooling to get the most out of CPUs, memories, and accelerators. While a moderately loaded standard server rack typically draws 30 to 40 kilowatts, liquid-cooled high-performance computing (HPC) infrastructure can support five to ten times that density. These solutions not only shrink your system's data center footprint but also let you run CPUs and accelerators at performance levels that would not be feasible in standard racks, where power limitations often force you to leave performance on the table.
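As a rough illustration of what that density difference means in practice, the back-of-the-envelope sketch below combines the rack figures above with an assumed per-node power draw (the roughly 1 kW dense node is a made-up round number for illustration only).

```c
/* rack_density.c - back-of-the-envelope rack packing comparison.
 * Rack power budgets come from the article (30-40 kW standard racks,
 * roughly 5-10x that for liquid-cooled HPC racks); the per-node figure
 * is an assumed round number for illustration only.
 */
#include <stdio.h>

int main(void)
{
    const double node_kw        = 1.0;          /* assumed: dense 2-socket node under full load */
    const double air_rack_kw    = 40.0;         /* upper end of the standard-rack range         */
    const double liquid_rack_kw = 40.0 * 10.0;  /* 10x density cited for liquid-cooled racks    */

    printf("air-cooled rack:    ~%.0f nodes\n", air_rack_kw / node_kw);     /* ~40  */
    printf("liquid-cooled rack: ~%.0f nodes\n", liquid_rack_kw / node_kw);  /* ~400 */
    return 0;
}
```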

While liquid cooling has long been a staple of HPC deployments, it's now spreading to platforms ranging from desktop gaming computers to mainstream data centers, with deployments that range from rack-mounted radiator doors to fully immersed systems in which the entire rack is submerged in non-conductive fluid. The high-density infrastructure used for exascale supercomputers is a bellwether for broader adoption of liquid-cooling technologies.

Memory technology


For many exascale applications, memory bandwidth, not raw compute, is the limiting resource. OpenFOAM, a leading fluid dynamics framework used by many engineering and science applications, particularly in the automotive industry, is a prime example. Analysis of OpenFOAM shows that on modern platforms its performance improves almost exclusively in proportion to memory bandwidth, with very little gain from additional floating-point operations per second.
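A simple way to see why such codes track memory bandwidth is the roofline model: attainable performance is the lesser of the machine's peak compute rate and the product of the kernel's arithmetic intensity and memory bandwidth. The sketch below uses assumed, representative round numbers; they are not measurements of any specific system or of OpenFOAM.

```c
/* roofline.c - minimal roofline estimate: why low-arithmetic-intensity
 * codes are limited by memory bandwidth rather than by peak FLOP/s.
 * All numbers are illustrative assumptions, not measurements.
 */
#include <stdio.h>

int main(void)
{
    const double peak_gflops   = 3000.0; /* assumed peak compute per socket, GFLOP/s        */
    const double bandwidth_gbs = 200.0;  /* assumed sustained memory bandwidth, GB/s        */
    const double intensity     = 0.25;   /* assumed FLOPs per byte for a CFD-style kernel   */

    double bw_bound   = intensity * bandwidth_gbs;               /* ceiling set by memory   */
    double attainable = bw_bound < peak_gflops ? bw_bound : peak_gflops;

    printf("bandwidth-bound ceiling: %.0f GFLOP/s\n", bw_bound);
    printf("attainable: %.0f GFLOP/s (%.1f%% of peak)\n",
           attainable, 100.0 * attainable / peak_gflops);
    return 0;
}
```

With these assumptions the kernel can use only a few percent of the socket's peak compute, which is why adding FLOP/s barely helps while adding bandwidth does.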

Typical server memories, such as DDR4, cannot meet exascale bandwidth goals. Common DDR4 memory modules are neither cost-effective nor, more importantly, sufficiently energy efficient for exascale deployment. A memory system designed to deliver the necessary performance for exascale applications using DDR4 would consume more than 50 MW in DIMM power, with the cost of the memory alone exceeding $1 billion.
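The sketch below shows how an estimate of that kind can be assembled. Every input is an assumed round number chosen for illustration, not the actual figures behind the article's statement; the assumptions are simply picked so the arithmetic lands in the same ballpark.

```c
/* ddr4_estimate.c - how a DIMM-count / power / cost estimate is built.
 * All inputs are assumed round numbers for illustration only.
 */
#include <stdio.h>

int main(void)
{
    const double target_bw_tbs = 100000.0; /* assumed: ~0.1 byte/s per FLOP/s at 1 exaFLOP/s, TB/s */
    const double dimm_bw_gbs   = 25.0;     /* assumed usable bandwidth per DDR4 DIMM, GB/s         */
    const double dimm_watts    = 12.5;     /* assumed power per high-capacity DIMM under load, W   */
    const double dimm_cost_usd = 250.0;    /* assumed price per DIMM, USD                          */

    double dimms = target_bw_tbs * 1000.0 / dimm_bw_gbs;

    printf("DIMMs needed: %.0f\n", dimms);                            /* 4,000,000      */
    printf("DIMM power:   %.1f MW\n", dimms * dimm_watts / 1e6);      /* 50.0 MW        */
    printf("DIMM cost:    $%.2f billion\n", dimms * dimm_cost_usd / 1e9); /* $1.00 billion */
    return 0;
}
```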

The industry realized that to reach these system-level goals in a power- and cost-effective way, a more bandwidth-focused memory technology was needed, which was a driving factor in the development of High Bandwidth Memory (HBM).

HBM technology provides an order of magnitude more bandwidth than DDR4 SDRAM and has now been adopted across the industry in solutions that target HPC and AI applications. HBM differs from typical memories in that it is integrated directly within the processor or accelerator socket instead of being purchased and populated separately. This tight integration is what enables HBM’s bandwidth to be achieved in a much more power-efficient way. Further, the bandwidth focus of the technology means much less of it is required to meet system-level bandwidth goals, greatly reducing the overall cost. Memory solutions using HBM are provided widely in GPUs and AI accelerators (such as Google’s TPU) today and will become more widespread over the next few years as other platforms tackle these same challenges.

Anyone have a map?

Turning a massive collection of compute elements into a single fast system puts enormous demands on the interconnect. This is especially challenging because modern HPC and AI workloads tend to have hybrid components that require multiple traffic types on one network, from high-speed message passing to high-throughput parallel storage and data ingest. As a result, the lines between highly performant HPC fabrics and the data center standard of Ethernet are beginning to blur as more diverse programming paradigms push the need for standard network support. Further, failure to isolate these different workflow elements results in applications being swamped by communication stragglers held up in traffic.
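To illustrate the straggler effect, the toy model below times a bulk-synchronous step as the slowest of its per-node communication times: one node stuck behind congested traffic drags the whole step down. It is a deliberately simplified model with assumed numbers, not a simulation of any real fabric.

```c
/* stragglers.c - toy model of why one congested flow hurts everyone.
 * In a bulk-synchronous step, every node waits for the slowest exchange,
 * so the step time is the maximum, not the average, of per-node times.
 * Numbers are illustrative assumptions.
 */
#include <stdio.h>

#define NODES 1000

int main(void)
{
    double step_ms[NODES];

    /* Assume most exchanges complete in ~1 ms, but one node's traffic
     * is stuck behind congested bulk I/O and takes 20 ms. */
    for (int i = 0; i < NODES; i++)
        step_ms[i] = 1.0;
    step_ms[42] = 20.0;

    double max = 0.0, sum = 0.0;
    for (int i = 0; i < NODES; i++) {
        if (step_ms[i] > max) max = step_ms[i];
        sum += step_ms[i];
    }

    printf("average per-node time: %.2f ms\n", sum / NODES); /* ~1.02 ms */
    printf("actual step time:      %.2f ms\n", max);         /* 20.00 ms */
    return 0;
}
```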

Solutions are emerging in the industry to handle network congestion. While data center networks have had basic congestion notification mechanisms (such as ECN) for a while, these technologies didn’t react to congestion at the speed required to handle high-performance traffic. New congestion management technologies that dynamically mitigate or prevent network traffic interference at HPC-relevant speeds are being brought to market now, with HPE’s Slingshot as one example. These fabrics will enable reliable quality of service on high-performance traffic at the system level, which will keep communications flowing in heavily used systems.

HPC interconnect solutions that address demands for higher speed and lower latency are an important component of making a collection of compute elements into a fast system. However, as scalable multinode workloads, particularly AI, become more common in as-a-Service deployments in the data center, exascale-style interconnects will be needed to maintain good performance guarantees.

Coding for performance

Thankfully, most technology vendors provide some level of software to help use their hardware effectively: compilers to optimize code, tuned versions of commonly used libraries, and tools to analyze performance and debug problems. However, care must be taken to avoid locking your developers into a vendor-specific solution, which could become a major obstacle to moving to new and better technologies in the future.
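One common way to reduce that lock-in risk is to express performance-critical kernels against a vendor-neutral standard rather than a single vendor's API. The sketch below uses OpenMP directives as one such standard; OpenMP is not mentioned in the article and is chosen here only as a familiar, widely supported example. With a plain OpenMP compiler the loop runs on host threads, and a compiler with target-offload support can map the same code to an attached accelerator.

```c
/* portable_axpy.c - the same kind of kernel as before, expressed against
 * a vendor-neutral standard (OpenMP) instead of a vendor-specific API.
 * Without an offload-capable compiler, the target region falls back to
 * host execution. (Illustrative build line, assuming GCC:
 * gcc -O3 -fopenmp portable_axpy.c)
 */
#include <stdio.h>

#define N 1000000

static double x[N], y[N];

int main(void)
{
    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* The directive describes the parallelism and data movement; the
     * compiler and runtime decide how to map it onto the hardware. */
    #pragma omp target teams distribute parallel for map(to: x[0:N]) map(tofrom: y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = 3.0 * x[i] + y[i];

    printf("y[0] = %.1f\n", y[0]); /* 5.0 */
    return 0;
}
```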

The industry has made significant advances in high-performance languages and frameworks, which can make porting between platforms much easier. This is particularly true in the AI space, where domain-specific frameworks are widely supported across nearly every hardware vendor. Compatibility with these open source projects is now seen as the price of entry for hardware vendors, and it also makes it easier for them to support high-performance workloads without having to invent programming paradigms from scratch.

This trend is also true of network-heavy data analytics use cases, with open source solutions such as Arkouda being developed to give users the tools necessary to focus on their technical outcomes instead of the logistics of full-system programming.

Perhaps more important, while prior HPC-specific language efforts focused on making their languages friendly to newcomers, these new efforts are bringing HPC performance capabilities to the languages most users already work with, such as Python. This, more than any hardware advancement, is enabling the democratization of HPC technologies all the way down to edge and mobile devices.

And here we are


This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.
