You’d be forgiven for thinking that the only work I ever do with the Informatics Lab is travelling. And while there is currently a fair amount of evidence to support such thinking (visiting three different EU cities and countries in the last three months), this is not true! I just happen to be in a high-density travel period of work life right now.
This particular piece of travelling took me to the Swiss city of Zürich for the PASC19 conference — Platform for Advanced Scientific Computing 2019 — hosted at ETH Zürich, Zürich’s technical university. My reason for attending PASC19 was to present, advertise and evangelise the Pangeo scalable data processing platform, for which the Met Office is one of the key partners.
At this point it’s probably worth a quick diversion to introduce the related but different terms of HPC and supercomputing, as both terms will be used in the following article. Confusion is increased by the fact that people often use the terms HPC and supercomputer interchangeably.
- HPC: either High Performance Computer, or more generally, the concept of High Performance Computing. This can cover a fairly broad range of hardware, from an unusually powerful personal computer, to a cluster of servers, all the way up to supercomputers. A supercomputer, then, is a subset of HPC.
- Supercomputer: a specialised example of HPC hardware specifically designed to deal with very large scale, massively parallel data processing. Supercomputers are typically made up of multiple racks of reasonably standard (if high-end) computer hardware, such as CPUs and memory, but with the differentiating feature being an extremely fast network infrastructure between racks and CPUs.
The PASC Conference
The PASC conference brings together HPC administrators and practitioners from a very broad range of backgrounds and scientific disciplines. These themes run from astrophysics through weather and climate research, to fluid physics (itself covering such diverse topics as flow over turbine blades to high-energy plasmas), to biochemistry. The uniting theme of all attendees, though, is that their science is run on HPC; or, in the case of the administrators, that science is run on their HPCs.
PASC is very application-heavy. While other HPC conferences have a lot about the latest hardware and the latest vendor provision, PASC focuses on the science research that is being done on existing HPCs. These are, to refer back to the conference’s name, the practical applications of scientific computing that are of interest. Of course there are talks about hardware as well, but even these tend to have a focus on how a given hardware can enable new science, speed up existing science, or be tuned so that science can be done faster.
The theme of this year’s conference was set as “Exascale and Beyond.” This picks up on the fact that the first exascale-capable HPCs are now being built and coming online. Let’s define some terms:
- exascale: HPCs that are capable of ExaFLOPS performance, where…
- the prefix exa is the name given to 10¹⁸ — a billion billion — of whatever follows (in the same way that “mega” is 10⁶ or a million, “giga” is 10⁹ or a billion, and so on),
- FLOPS stands for FLoating-point Operations Per Second; the number of calculations between decimal numbers that the computer can carry out each second (a single such calculation is a FLOP), and
- an exabyte is a billion billion bytes’ worth of data.
Thus an ExaFLOPS-capable HPC can perform at least one billion billion calculations between decimal numbers every second, which is a lot of calculations!
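To put those prefixes in perspective, here's a back-of-the-envelope sketch in Python. The workload size is a made-up figure, purely for illustration:

```python
# Decimal prefixes as powers of ten
MEGA = 10**6
GIGA = 10**9
PETA = 10**15
EXA = 10**18

# Hypothetical workload: a simulation needing 5e20 floating-point operations
workload_flops = 5 * 10**20

# Time to complete the workload at petascale vs exascale rates
petascale_seconds = workload_flops / PETA  # ~5.8 days
exascale_seconds = workload_flops / EXA    # just over 8 minutes

print(f"Petascale: {petascale_seconds:,.0f} s")
print(f"Exascale:  {exascale_seconds:,.0f} s")
```

The same made-up workload drops from days to minutes — which is exactly why exascale machines make qualitatively new science feasible rather than just making old science faster.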
PASC has always been hosted in Switzerland, although the specific location and host institution have varied throughout its history. Now into its seventh year, PASC19 was hosted at ETH Zürich, with previous instances of the conference having taken place in other big Swiss cities, and next year’s conference already announced for the end of June / start of July 2020, in Geneva. PASC19 also marked a turning point in the conference’s history: it was the first at which more than half of the attendees came from outside Switzerland. What started out as a Swiss HPC conference bringing together users of Swiss HPC systems (such as those at CSCS, which is still very much involved in PASC) has become an internationally-recognised conference.
In Zürich, I shall…
Here are the specific things that I went to Zürich to do.
Ahead of PASC19 I visited a friend of the Informatics Lab who is fortuitously based at ETH Zürich. We spent a productive afternoon discussing Pangeo, dask and Iris, and thinking about how you encourage uptake of new tools and technologies when there is little incentive for people to switch, and little time or support for them to do so. A little local knowledge is always a good thing, so at the end of a hard afternoon of thinking we all headed out to an excellent local restaurant for good local food and beer.
At PASC19 there was a minisymposium themed on Pangeo, into which my talk slotted. It presented Pangeo as the basis for an alternative to the traditional HPC model for a data API; a bold move at a supercomputing conference!
Here’s the abstract for my talk, entitled Turn Key Ephemeral Data Platforms: the Next Generation Data API, and the slides that accompanied it:
Tens of terabytes of Met Office weather forecast data is currently being served every day via the AWS cloud. This low level data contains complex probabilistic information which spans multiple scales. However, its volume and velocity make it increasingly hard for consumers to get bespoke analyses (i.e. new value) from the data. Currently, this data is being made available through traditional RESTful APIs, where users can request subsets of these datasets before being sent them across the internet. Whilst this model serves well most of our traditional consumers, it can be prohibitive to the new use cases which are necessary to justify the large up-front investment. Pangeo provides an opportunity to allow consumers to efficiently make use of these complex datasets. However, this presents a challenge: how would a public sector body justify funding a data processing platform which is designed to allow private enterprise to make money from bespoke data products, effectively subsidising their business? This talk describes work to allow Pangeo to be easily cloned and deployed by non-experts, transferring the cost away from the tax-payer to the business, effectively creating a new type of data API.
There were a number of themes evident throughout PASC19 in the sessions that I attended. Perhaps unsurprisingly these all had at least some connection to this year’s conference theme. Let’s explore these themes in some more detail.
Supercomputers remain popular
In the face of the rise of new technologies the main theme I saw at PASC19 was that the traditional supercomputer continues to be popular. On the face of it, this theme should not be a surprise, but, as introduced above, the supercomputer is just one among many options for HPC. I think this reflects that the supercomputer still remains the best overall platform for high-performance computing, especially with the challenges posed by exascale, and with the fact that a supercomputer can be reasonably easily improved with accelerators such as GPUs.
As discussed above, an exascale HPC can reach ExaFLOPS performance — that is, perform at least one billion billion floating-point calculations every second. This brings with it a number of secondary considerations that need to be thought about now, before they actually cause problems: the amount of data being produced, where to store that data, and how to use it in downstream systems. This is a problem that affects CERN, the Met Office and ECMWF, and organisations working in many other fields.
Appropriately given the conference theme, this fact prompted a number of talks at PASC19, where various options and solutions were proposed for various different exascale considerations. Some of these talks in a sense just presented a newer, bigger HPC, which is a great way of tackling the exascale problem head-on, so long as you have budget and space to enable a sufficiently large HPC. Perhaps the more interesting talks were the ones where alternative solutions to dealing with the exascale problem were presented and explored, and it’s these alternatives that we’ll explore next.
Doing something else
Exascale brings with it many challenges, which won’t necessarily be eased just by making more compute available. A lot of the talks I attended at PASC19 presented alternatives to simply adding compute to help with these challenges.
Reduced precision stood out as a non-hardware approach to tackling the exascale problem. It does this, very cleverly, by greatly reducing the amount of data to be processed! To understand this we need to take a brief detour into how numbers are represented by computers.
The numbers used in computer simulations are typically represented as signed 64-bit floats (usually called “doubles”; short for double-precision values). This means each number can be positive or negative (“signed”), with a variable-position decimal point (“float”), that takes up 8 bytes in memory (that is, 64 bits, given there are 8 bits to a byte).
Using 8 bytes in memory for a single number seems like a lot when we can’t make use of all the decimal places in the number in the final result we’re trying to calculate. To take a hypothetical example from the weather, the forecast air temperature for 2pm on Tuesday barely needs to be expressed as a decimal at all: 18.2℃ is, for all intents and purposes, the same as forecasting the temperature just as 18℃, so forecasting the temperature as exactly 18.21399864652℃ is just a little too precise for the situation.
The argument for reduced precision, then, says “we’re not using all those extra decimal places, so why include them in calculations at all?” and puts the argument into practice by using numbers with half, or even a quarter, of the precision of a double. These are called single precision and half precision values respectively. Let’s take a look at what happens to the well-known floating point value π when we express it as a double, single, and half-precision float.
- double (64-bit): π = 3.141592653589793
- single (32-bit): π = 3.1415927
- half (16-bit): π = 3.14 (the stored value is 3.140625)
As can be seen, the numerical accuracy of π drops off dramatically with each reduction in precision, but each drop in precision also halves the number of bytes to process.
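This comparison can be reproduced directly, assuming NumPy is available (NumPy supports all three precisions and prints each value in its shortest round-trip form):

```python
import numpy as np

# pi stored at each precision; each type holds the nearest representable value
pi64 = np.float64(np.pi)  # double precision: 8 bytes
pi32 = np.float32(np.pi)  # single precision: 4 bytes
pi16 = np.float16(np.pi)  # half precision:   2 bytes

print(pi64, "-", pi64.nbytes, "bytes")
print(pi32, "-", pi32.nbytes, "bytes")
print(pi16, "-", pi16.nbytes, "bytes")

# The exact value half precision actually stores for pi:
print(float(pi16))  # 3.140625
```

Halving the precision twice over cuts the memory for each number from 8 bytes to 2 — a 4× reduction in data volume before any other optimisation is applied.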
The reduction in numerical accuracy with single and half-precision values is the main risk of not using doubles in numerical calculations. Although the numerical accuracy is not needed in the end result, removing it entirely can lead to wild inaccuracies in the final result being calculated. To go back to the weather example from earlier, the eventual result calculated might end up being inaccurate by as much as 3–5℃.
A lot of talks about reduced precision numbers, then, focussed on exploring how great the forecast inaccuracies become when using reduced precision numbers in numerical simulations, and what can be done to reduce the inaccuracies while still taking advantage of the lower memory usage with reduced precision numbers. Specifically, careful use of reduced precision numbers can lead to models being quicker to run without significant reduction in accuracy.
Exotic hardware accelerators
This was the other major alternative to simply building an HPC with more compute. Augmenting traditional CPU-based compute was a very popular theme at PASC19, with all sorts of different options presented. Let’s take a look at the contenders…
Arm: Arm is best known for providing processor architectures used in your smartphone. Recently, however, Arm-based processors have started making inroads into high performance computing. Key examples of this are the Marvell (Cavium) Thunder X2 processor and the Fujitsu A64FX processor, which will be used in the Post-K supercomputer.
Testing of HPC with Arm-based processors showed they had comparable performance to more mainstream Intel Xeon processors, showing slightly better performance in some cases and slightly worse performance in others. Arm-based processors were also easy to start using, and are noticeably cheaper than similar offerings from Intel. The main difficulty with using Arm-based processors comes from the software stack. As Arm-based processors have a different instruction set architecture to Intel processors, code must be re-compiled to run on them, and a different set of libraries and compiler toolchains is needed, which are not necessarily as optimised as the Intel offerings.
GPU: Graphics Processing Unit, better known as a graphics card. For many years the only thing graphics cards were really used for was playing Crysis at ever-improved frame rates. It turned out, though, that the maths needed to display high-quality video graphics is essentially the same as the sort of maths needed in a number of high-performance computing applications. As GPUs have been developed over many years to do this sort of maths better and better, they are now reasonably well-known as hardware accelerators for HPC applications.
The big advantage of GPUs is the performance you get, which can be significantly faster than running the same calculations on a CPU. The main downsides are that they use a lot of power compared to CPUs, and can be hard to use, as extracting the full performance they are capable of providing requires niche programming experience.
In recognition of how difficult it is to extract the full available performance from graphics cards, the graphics card manufacturer NVIDIA have started working on making core libraries of the scientific Python ecosystem — in particular dask and numba — natively support graphics cards. This means you can use these libraries on graphics cards and get all the benefits of doing so without having to do extra work in your own code to add graphics card support.
FPGA: Field Programmable Gate Arrays; a sort of programmable hardware. Whereas a CPU has a microarchitecture that’s fixed at fabrication time, FPGAs can be architected (in a sense programmed) to match the requirements of the application they are going to be used for. This means they can be made highly performant and efficient for processing a particular task.
The main advantage of FPGAs, then, is the performance and efficiency you can achieve using them for a particular processing task. A well-programmed FPGA can achieve GPU-like performance but with much lower power usage. The main downside of FPGAs is that code development for them is long and complex; more so than for GPUs. FPGAs also have some inherent hardware limitations, particularly around clock speed and the amount of on-board memory (BRAM), that make them a poor fit for some (particularly memory-intensive) tasks.
Cloud vs on-prem
The broad feeling within PASC19 seemed to be that all supercomputing equated to on-premises hardware (or, at least, hardware owned by the institution or a partner institution). That said, cloud is now a viable alternative for many supercomputing tasks, especially those that are reasonably ad-hoc, a fact that was borne out in the Pangeo talks presented.
As such, it was good to see that it wasn’t just the Informatics Lab presenting cloud technologies at PASC19. There were also a number of other interesting talks on how cloud is advancing as a viable alternative to HPC, at least in a number of specific use-cases.
As noted much earlier on in my definition of HPC, the thing that really sets the supercomputer apart from other forms of HPC is the fast network interconnects between the workers running on it. This has traditionally also been what has prevented cloud from being a viable challenger, as the interconnects between cloud nodes just have not been good enough.
This seems to be changing, though. Penguin Computing now provide a cloud-based HPC-on-demand service. At the same time, both AWS and Azure are working on providing cloud instance types that come preconfigured with fast network interconnects, reducing the difference between HPC infrastructure and cloud infrastructure. There was even evidence presented that, for some workloads, such cloud instance types could rival HPC runtimes in like-for-like tests.
As an aside that links to my recent travel, it was again noticeable that cloud uptake is much higher in the US than in Europe. This is the same pattern that we have observed in the Pangeo community.
The end of the supercomputer era?
The supercomputer remains the de facto HPC platform, but for how much longer? In the past, a new supercomputer of approximately the same physical size and number of processor cores as the outgoing one would, like for like, provide roughly an order of magnitude more processing capacity.
This is a pattern that we are seeing draw to a close. Improvements in processor technology have slowed dramatically, so a new supercomputer of approximately the same size as the outgoing one no longer necessarily offers an order of magnitude increase in processing capacity. Instead, to see that improvement you need either a bigger supercomputer or one that includes some form of accelerator, which is fine unless you are limited by physical or fiscal resources.
This is all basically a statement of the end of Moore’s Law.
In the face of this, it’s the belief of the Informatics Lab that we need to look for new technologies to provide the de facto HPC platform of the future. This belief was one of the key points that I made in my talk at PASC19. It’s a disruptive viewpoint, and one that some attendees found threatening enough that they refused to engage with the arguments in support of it.
Of course, the end of the supercomputer era is not here yet. But even the content and major themes of PASC19 show that a supercomputer is no longer enough in a number of HPC applications. The end of the supercomputer era should not be seen as a threat but an exciting opportunity to shape the next era of HPC, in whatever forms it should come!
A note about Zürich
To finish in what is becoming traditional style for me, let me share my impressions (and a couple of photos) of Zürich.
The thing that everyone said to me about Zürich is that it is an expensive city. This was borne out in fact as well as in promise! Nothing sums this up better than the fact that Zürich’s Bahnhofstrasse is one of Europe’s most expensive shopping streets. In every other way, though, Zürich is a lovely city containing amazing architecture, good food and very nice people. And one of the best places to view Zürich’s architecture is from the terrace at the back of ETH Zürich.
My favourite part of Zürich was probably the view down the lake from the bridge at its head. On a clear day it is possible to see the Alps from this vantage point. The lake is well worth making the most of, as the water, which drains down from the mountains, is beautifully clear.
If you get the chance to go to Zürich, don’t turn it down!