Accelerating Iris with NVIDIA GPUs

Peter Killick
Met Office Informatics Lab
Jul 19, 2019 · 7 min read
The NVIDIA Tesla V100 GPU, the star of this blog post.

Recently in the Informatics Lab we were sad to say goodbye to Jacob Tomlinson, who has moved to an exciting new job working for NVIDIA. One of his roles at NVIDIA will be enabling communities to utilise GPUs for accelerated data processing, as a part of RAPIDS. Given Jacob’s recent employment, the Met Office is an obvious target community, especially considering the sheer amount of data processing it does.

With Jacob’s new boss Matthew Rocklin visiting from the US to support his induction into NVIDIA, we thought we’d pool resources to work on something suited to our combined expertise. We decided to apply the work NVIDIA has been doing to speed up Python code to a Python package heavily used at the Met Office for analysing weather and climate data.

Existing libraries

The Python programming language has been widely adopted within the science community for its ease of use, simple syntax and extensive library of packages that extend core Python functionality. The Met Office is no exception, with the majority of its computational analysis and visualisation of weather and climate data being written in Python.

Python packages add a broad range of functionality to Python, from packages that provide scientific and mathematical processing operations to packages providing advanced out-of-core processing and just-in-time compilers. Let’s explore four of these packages in a bit more detail.

NumPy

NumPy is a Python package that adds support for arrays (n-dimensional grids of data) to Python. It is so widely used that it has become the de facto package for working with arrays in Python. You can use NumPy to generate arrays and perform mathematical and statistical operations on them and between them. NumPy is also fast, being largely implemented in C, but it has limits: it cannot handle very large datasets that do not fit in memory.
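As a tiny, self-contained example of the sort of thing NumPy makes easy:

```python
import numpy as np

# Build a small 2D array and compute some common statistics on it.
data = np.arange(12, dtype=np.float64).reshape(3, 4)

print(data.mean())       # mean over every element
print(data.sum(axis=0))  # column-wise sums
print(data.max(axis=1))  # row-wise maxima
```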

One of the biggest things NumPy has provided is an array API for Python, which is in such common use that it is effectively the standard array API in Python. This means that other libraries can recreate this API for their own bespoke, NumPy-like array objects. Two notable examples of this are dask’s Array functionality and CuPy.

Dask Array

Dask array adds out-of-core, parallelised data processing on top of NumPy arrays. The majority of the NumPy array API is reproduced for dask arrays, meaning that with dask you can create arrays too large for memory and parallelise processing of them, all while using the same API originally provided by NumPy. It is also easy to switch between a NumPy array and a dask array.
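A minimal sketch of that switch: wrap an existing NumPy array in chunks, build a lazy calculation, and only evaluate it when the result is needed.

```python
import numpy as np
import dask.array as da

np_array = np.random.random((4000, 4000))

# Split the NumPy array into 1000x1000 chunks; operations on the dask
# array build a task graph rather than computing immediately.
dask_array = da.from_array(np_array, chunks=(1000, 1000))

lazy_mean = dask_array.mean()  # no work done yet
result = lazy_mean.compute()   # evaluates the graph, chunks in parallel
```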

CuPy

CuPy, part-developed by NVIDIA, effectively replaces the C code in NumPy with CUDA code. Doing so produces arrays that can be processed extremely rapidly on GPUs. Again the majority of the NumPy array API is reproduced for CuPy arrays, and it is easy to switch between a NumPy array and a CuPy array. This means you could run an intensive NumPy array calculation on an NVIDIA GPU and see a significant speedup in processing time compared to running the same calculation with a pure NumPy array.
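A minimal sketch of a round trip through CuPy (it assumes a machine with an NVIDIA GPU and a working CUDA installation): copy a NumPy array to the GPU, run the calculation there, then copy the result back to the host.

```python
import numpy as np
import cupy as cp  # requires an NVIDIA GPU and CUDA

np_array = np.random.random((4000, 4000))

gpu_array = cp.asarray(np_array)     # host -> GPU copy
gpu_result = gpu_array.sum(axis=0)   # computed on the GPU
cpu_result = cp.asnumpy(gpu_result)  # GPU -> host copy, back to NumPy
```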

Iris

Iris provides a data object called a cube that utilises the NumPy array and adds functionality specialised to handling weather and climate datasets. The cube contains both data (a NumPy or dask array) and metadata that describes the meaning of the values in the array, such as what the values in the array describe, and where in space and time the data are located. The Iris user guide contains more details on the cube object.
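As a quick illustration, here is a hedged sketch of loading a cube and inspecting both its metadata and the array that backs it; the filename is just a placeholder for any file Iris can read.

```python
import iris

# Load a single cube from a file; 'air_temperature.nc' is illustrative.
cube = iris.load_cube('air_temperature.nc')

print(cube)                    # summary of name, units, shape and coordinates
print(type(cube.core_data()))  # the underlying array: dask or NumPy
print(cube.coords())           # the coordinates describing the data
```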

Enhancements to the fore

A recent enhancement to NumPy called “array overrides” (the __array_function__ protocol, available from v1.16) enables NumPy to hand off processing of a NumPy-like array to the library best suited to processing the array. For example, you could pass a CuPy array to a NumPy array operation, and NumPy would automatically call the CuPy equivalent of that operation to perform the requested processing. There is great benefit to be had here: you get the most effective (and so also fastest) way to process any given array without requiring any changes to your own code.
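Concretely, with a NumPy recent enough to have array overrides enabled and a CuPy that implements them, handing a CuPy array to a plain NumPy function is all it takes to route the work to the GPU. A minimal sketch:

```python
import numpy as np
import cupy as cp

gpu_array = cp.arange(1_000_000, dtype=cp.float64)

# np.sum recognises, via the array override (__array_function__) protocol,
# that it has been given a CuPy array and hands off to cupy.sum, so the
# calculation runs on the GPU and the result is itself a CuPy array.
result = np.sum(gpu_array)
print(type(result))  # cupy.ndarray, not numpy.ndarray
```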

Application to Iris

As we’ve seen, at the core of the Iris cube is an array, either a dask array or a NumPy array. This allows Iris to take advantage of the benefits of dask while still getting at the real numbers when they’re needed (such as for plotting). There is cleverness in Iris to make this as seamless as possible for the user: Iris detects whether the array is a dask or a NumPy array and responds accordingly.

This means that Iris already contains functionality similar to what has recently been added to NumPy, albeit far more limited in scope. By making use of array overrides in Iris we would see the following benefits:

  • No loss of existing functionality, but instead a gain, as this behaviour would be available for all array operations in Iris rather than just the limited number of cases where it has already been implemented.
  • Iris would be able to support any NumPy-like array type, without any changes to Iris code.
  • We could remove existing Iris code designed to handle different array types, making Iris simpler and easier to maintain.

The Experiment

We thought we would try giving an Iris cube a CuPy array as its data attribute (we called this cube a cucube) and seeing how much Iris functionality we could use with it. In particular we were interested in trying to use the Iris statistic operators (common statistics such as mean, max, or std, captured in an Iris API for easy application to Iris cubes) on the cucube. There were a number of reasons for this choice:

  • performing a statistical operation on an Iris cube is a very common operation,
  • it’s a mathematically intensive operation that could easily benefit from faster processing, and
  • it’s one of the areas of Iris that already handles dask and NumPy arrays differently.

We reasoned that we could bypass the different array handling mechanism used by the Iris statistic operators and use the new NumPy functionality in its place. This way, if the cucube’s CuPy array were passed to, say, a mean operation, NumPy should automatically hand off to CuPy to perform the requested calculation on the array.

Setup

The equipment we used to run the experiment is worth a brief mention. With no NVIDIA GPUs physically available at the Lab we turned to AWS (Amazon Web Services) to provide a GPU-accelerated machine. We spun up a p3.8xlarge EC2 instance containing no fewer than four NVIDIA V100 GPUs to run our experiment on, giving us plenty of GPU processing capacity. For our data processing environment on top of this EC2 instance we chose the Littlest JupyterHub, a lightweight JupyterHub distribution that was easy to set up and use.

Jacob has written elsewhere about our experiences in preparing this experimental setup. If you’re interested in what we did, the difficulties we encountered and how we resolved them, his post is well worth a read.

Data used and calculation performed

We ran our experiment on a 4D Iris cube of global air temperature data, containing 18 ensemble members and 33 pressure levels on a 960x1280 horizontal grid. With float32 data, this added up to roughly 2.9GB of data to process. To this 4D cube we applied a sum operation over all ensemble members and pressure levels, resulting in a single 2D cube describing total air temperature over all ensemble members and pressure levels.

While representing a completely unphysical quantity, this operation was useful for testing the performance improvement found using CuPy over NumPy in Iris. The sum statistic operator was also one of the easiest Iris statistic operators to convert, making it an ideal operator to use in this experiment.
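In Iris terms the calculation is a single collapse over the ensemble and pressure dimensions. A rough sketch of what that looks like is below; the file path is a placeholder and the coordinate names ('realization' and 'pressure') are typical for this kind of data but may not match the actual cube we used.

```python
import iris
import iris.analysis

cube = iris.load_cube('air_temperature_4d.nc')  # placeholder path

# Collapse (sum) over the ensemble-member and pressure-level coordinates,
# leaving a 2D cube on the 960x1280 horizontal grid. The coordinate names
# here are assumptions; the real cube may name them differently.
summed = cube.collapsed(['realization', 'pressure'], iris.analysis.SUM)
print(summed.summary(shorten=True))
```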

What we found

Our first attempt took this large Iris cube, converted the cube’s data array to a CuPy array and tried to put that CuPy array back into the Iris cube. Here we encountered our first difficulty: because Iris checks for, and tries to seamlessly handle, either a dask or a NumPy array, it was not able to accept a CuPy array. We worked around this with a quick modification to Iris to skip this array type handling step.

With Iris modified we were able to make an Iris cube containing a CuPy data array (that is, the cucube). This was an excellent initial result as all the rest of the experiment relied on getting this done. The next step was to try using a statistic operator on our cucube. This operation also initially failed as Iris again expected either a dask or a NumPy array. Again a quick modification to Iris got us over this hurdle so that we could use Iris to calculate statistics on our cucube.
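A heavily simplified sketch of those two steps is below. It assumes the small Iris patches described above are already in place (unpatched Iris of the time rejects the CuPy array at both points), and the file path and coordinate names are again placeholders.

```python
import cupy as cp
import iris
import iris.analysis

cube = iris.load_cube('air_temperature_4d.nc')  # placeholder path

# Step 1: build the cucube by copying the cube with a CuPy data payload.
# Unpatched Iris raises here, since it only expects NumPy or dask arrays.
cucube = cube.copy(data=cp.asarray(cube.data))

# Step 2: run a statistic. With the statistic operators patched to defer to
# NumPy's array overrides, the sum is handed off to CuPy and runs on the GPU.
result = cucube.collapsed(['realization', 'pressure'], iris.analysis.SUM)
```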

Results

We ran the same sum operation described above on both the original cube and the cucube we created from the original cube. The processing times for performing the sum operations using NumPy and CuPy respectively are shown below:

Brief excerpt from the experiment notebook showing the timing results.

We can see that using CuPy to perform the sum operation resulted in a speedup of nearly 75x over NumPy. This is on a reasonably small cube too; we might hope to see even greater performance improvements with larger cubes.
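For anyone wanting to run a similar comparison, the timing can be as simple as wrapping the two collapses in a wall-clock timer. The sketch below reuses the cube and cucube from the previous sketch; the 75x figure above comes from our notebook, not from this snippet, and careful GPU benchmarking would also synchronise the device before stopping the clock.

```python
import time
import iris.analysis

def time_sum_collapse(c):
    """Wall-clock time for one sum-collapse over ensemble and pressure."""
    start = time.perf_counter()
    c.collapsed(['realization', 'pressure'], iris.analysis.SUM)
    return time.perf_counter() - start

cpu_seconds = time_sum_collapse(cube)    # NumPy-backed cube
gpu_seconds = time_sum_collapse(cucube)  # CuPy-backed cucube
print(f"speedup: {cpu_seconds / gpu_seconds:.1f}x")
```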

The full demo notebook used to get this result, which includes a demonstration of how to make a cucube, can be viewed in this GitHub gist.

In Conclusion

Even this somewhat brief and limited experiment has shown that we can successfully use CuPy to speed up data processing in Iris. It was also pleasingly simple to modify Iris to be able to handle CuPy arrays and so make use of the new functionality added to NumPy.

To take full advantage of the findings of this experiment we should modify Iris to use this new NumPy functionality by default. As discussed above, this would make Iris simpler and give it more consistent behaviour throughout the codebase. It would also mean that the whole of Iris could benefit from the speedups available through using CuPy for data processing.

