Lab on the Road

The Informatics Lab at EGU19

Peter Killick
Met Office Informatics Lab
May 8, 2019


The European Geosciences Union General Assembly 2019, or EGU19 for short, is one of the world's largest geosciences conferences, and one of the largest academic conferences in Europe. It meets every April in Vienna and brings together academics and practitioners from across the geosciences: people studying everything from geology and soil science to space science, informatics and data science.

The range of topics is matched by the number and diversity of attendees: over 15,000 people from around the world visited EGU19 over the course of the conference. Such diversity brings both opportunities and challenges. There is a lot to learn from (and many people to teach!), but the content most relevant to your own work and studies can easily get lost amongst everything else. Planning ahead is essential.

Informatics Insights

The Informatics Lab's contribution to EGU19 was three posters: one about technology, one about Lab culture, and one about applying Machine Learning to atmospheric science. Together these themes capture some key areas of Informatics Lab thinking. Let's look at each poster in a little more detail.

The Technology One

This poster demonstrated how a number of technologies can be combined to solve a real-world weather data problem. We took the scalable data processing platform Pangeo and the open-source Python libraries Iris and Intake, and added custom code to produce a system that can load very large volumes of weather data as a single datacube (which we prosaically named a SuperMegaHyperCube).

The problem behind this poster is that weather and climate datasets keep getting bigger, which makes them harder to work with. There is often too much data to process, at least on short timescales; the data often has more dimensions than the human brain can easily comprehend; and it can be hard to find the dataset you need amongst all the others that have also been produced.
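To give a feel for what loading many files "as a single datacube" means, here is a toy sketch using only NumPy. It is not the Lab's SuperMegaHyperCube code, which builds on Pangeo, Iris and Intake; the dimension names and sizes and the `load_field` helper are hypothetical stand-ins for reading one 2D field from one file.

```python
import numpy as np

# Hypothetical dimensions for a small forecast archive (illustrative only).
N_REF_TIMES = 4      # forecast reference times (model runs)
N_LEAD_TIMES = 6     # forecast lead times per run
N_Y, N_X = 50, 60    # horizontal grid points


def load_field(ref_idx, lead_idx):
    """Hypothetical stand-in for reading one 2D field from one file."""
    rng = np.random.default_rng(seed=ref_idx * 100 + lead_idx)
    return rng.random((N_Y, N_X), dtype=np.float32)


def build_cube():
    """Stack per-file 2D fields into one 4D array: (ref_time, lead_time, y, x)."""
    runs = []
    for r in range(N_REF_TIMES):
        # Stack each run's lead times along a new leading axis...
        leads = [load_field(r, lead) for lead in range(N_LEAD_TIMES)]
        runs.append(np.stack(leads, axis=0))
    # ...then stack the runs along another new leading axis.
    return np.stack(runs, axis=0)


cube = build_cube()
print(cube.shape)          # (4, 6, 50, 60)
print(cube.nbytes / 1e6)   # in-memory size in megabytes
```

Once everything sits in one structure, questions such as "every run at lead time 3" become a single slice (`cube[:, 3]`) instead of a search through files. At real scale, though, the same shape arithmetic quickly exceeds available memory, which is why Pangeo-style tools keep such cubes lazy and chunked (for example with Dask) rather than materialising them eagerly as this sketch does.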

The Culture One

A new theme introduced at EGU19 was that of RSEs (Research Software Engineers). RSEs are researchers whose primary research output is software, produced either as a standalone task or as part of a wider research group.

The Informatics Lab is not quite a team of RSEs: instead, we work as a multidisciplinary team, applying science, technology and design holistically to inform thinking about the future direction of the UK Met Office.

Nevertheless, the Informatics Lab's primary output is functional prototypes that capture the thinking behind each project we work on, and these prototypes almost always involve writing code. That, together with a Lab culture that is quite different from that of many scientific research groups, gives us an interesting story to tell, and it is this story that the poster shares.

The Machine Learning One

One member of the Informatics Lab's multidisciplinary team is exploring the application of Machine Learning to weather forecasting, focusing on a particularly gnarly area: predicting the location of shower clouds at very high resolution.

Predicting exactly where showers will form is difficult because convective processes, which drive the formation of shower clouds, happen on scales far smaller than the grid cells of even the highest-resolution weather models run by the UK Met Office. This means a rain shower can occur in one part of a grid cell but not in another part of the same cell! This presents a problem for the forecast: do you say it will or will not be raining within this grid cell?
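As a toy illustration of this sub-grid problem, the sketch below takes a hypothetical fine-resolution rain mask and averages it onto a coarser model grid, so each coarse cell holds the fraction of its area that is raining rather than a single yes/no answer. The grid sizes, the `coarsen_fraction` helper and the showers themselves are made up for illustration.

```python
import numpy as np


def coarsen_fraction(fine_mask, factor):
    """Average a 2D boolean rain mask over factor x factor blocks.

    Returns the fraction of each coarse cell's area that is raining.
    """
    ny, nx = fine_mask.shape
    if ny % factor or nx % factor:
        raise ValueError("grid size must be divisible by factor")
    # Split each axis into (coarse index, position within coarse cell).
    blocks = fine_mask.reshape(ny // factor, factor, nx // factor, factor)
    # Average over the within-cell axes to get the raining area fraction.
    return blocks.mean(axis=(1, 3))


# Hypothetical 8x8 fine grid with two small showers (True = raining).
fine = np.zeros((8, 8), dtype=bool)
fine[1:3, 1:3] = True   # 2x2 shower inside the top-left coarse cell
fine[5, 6] = True       # single-pixel shower in the bottom-right coarse cell

coarse = coarsen_fraction(fine, factor=4)  # 2x2 coarse grid
print(coarse)
```

Here the top-left coarse cell is 25% raining (4 of 16 fine points) and the bottom-right one 6.25% (1 of 16). A forecast that must say "rain" or "no rain" for each coarse cell has to pick a threshold on that fraction, which is exactly the dilemma described above.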

One obvious way to address this is to make each grid cell in the weather model smaller. This is not a perfect solution, though: smaller grid cells mean more grid cells, so the model takes longer to run and generates bigger output files containing more data, which in turn take longer to process.
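A rough worked example shows how quickly this cost grows. The sketch assumes a fixed domain, a fixed number of vertical levels, and a timestep that must shrink in proportion to grid spacing (a CFL-type stability constraint). These are simplifying assumptions for illustration, not a description of any particular Met Office model; semi-implicit schemes, for instance, relax the timestep constraint.

```python
def refinement_cost(refine_factor):
    """Relative cost of dividing horizontal grid spacing by refine_factor.

    Simplifying assumptions: fixed domain and vertical levels, and a
    timestep proportional to grid spacing (CFL-type constraint).
    """
    columns = refine_factor ** 2   # more cells in both x and y
    timesteps = refine_factor      # shorter timestep means more steps
    return {
        "grid_columns": columns,
        "compute": columns * timesteps,
        "output_per_field": columns,  # each output field has more points
    }


for r in (2, 3, 4):
    print(r, refinement_cost(r))
```

Under these assumptions, halving the grid spacing gives four times as many grid columns, roughly eight times the compute, and output fields four times larger.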

An alternative, and the one presented in this poster, is to use machine learning to try to infer convective-scale processes from existing model data, without adding more grid cells to the model. This avoids the costs of a finer grid, which is good. What is not yet certain is whether machine learning can be applied effectively here to produce valuable results, which is why this is such an important area to explore.

Elsewhere at the Conference

This is the space for my feedback on EGU19: what was good and interesting at the conference, and the challenges it presented. During the conference I also noticed some recurring themes in the content being presented, so I'll spend a bit of time exploring those too.

Here we go then:

  • Machine learning is everywhere. A number of sessions I attended included at least one presentation using machine learning techniques, and at least one session was dedicated entirely to machine learning. This appears to continue an upward trend from previous years, suggesting that this relatively new technique is becoming a mainstream research tool within the geosciences.
  • There was a theme of moving data processing to HPC and cloud infrastructure, and an increasing focus on hybrid infrastructure combining HPC hardware with cloud architecture. I wonder whether this trend will continue, given that geophysical datasets keep getting bigger while still needing to be processed and visualised.
  • A new term I learned at EGU19 was RSE: Research Software Engineer (see above). RSEs are researchers who write code that is itself scientific exploration, or that enables other scientific exploration. This definition is not far from the work of the Informatics Lab.
  • The Informatics Lab and the ESSI (Earth and Space Science Informatics) programme group share a common term in their names. ESSI is also the programme group where Informatics Lab presentations fit most naturally, and where content relevant to the Lab is most likely to be found at EGU. At EGU19, however, a lot of ESSI content had been merged into sessions run by other programme groups. I feel this is positive, because it shows that Informatics concepts are increasingly being applied and researched across all areas of the geosciences.
  • Pangeo is not well known in the European geosciences community, unlike in the USA. Nevertheless, I saw a lot of interest in Pangeo, as well as groups working on projects similar to Pangeo. This suggests a recognition that Pangeo-like systems are becoming increasingly necessary in the geosciences.
  • A common theme at EGU19 was that several presentations in a given session would describe very similar end products. There is a positive side to this, as it suggests the product being developed is important to contemporary geosciences, but it also highlights a tendency within science to silo work: reinventing rather than collaborating, and producing several near-duplicate products instead of a single common one.

Holiday Snaps

Looking back on Schloss Schönbrunn

And finally, around a packed schedule at a busy EGU19, I also found some time for a bit of sightseeing. Having visited Vienna a few times before, I feel I've ticked off many of its big tourist attractions: I've been inside Stephansdom, the imposing cathedral with its patterned tiled roof at the very centre of Vienna; nodded to the golden statue of Johann Strauss in Stadtpark; and explored the grounds of Schloss Belvedere.

The one major tourist attraction I hadn't managed to visit was Schloss Schönbrunn, so with some time for sightseeing around this year's EGU, Schönbrunn was the obvious place to go. And it was worth the visit: the palace itself is huge and partly open as a museum, and there are extensive grounds, with monuments of their own, to explore. There was even an Easter market in the grounds as an added attraction.

In Summary

EGU19 was all about lots of content, lots of sessions and lots of interesting ideas (and a few repeated ones). The Informatics Lab's three posters were well attended, and there was just a little bit of sightseeing for me. All that remains is to thank the organisers of EGU19 for putting together an excellent, useful and informative conference.


Cloud Platform Architect, open-source software engineer and technology researcher in the UK Met Office Informatics Lab. I tend to blog on these themes.