Machine learning for climate change and health

Talia Caplan
Published in Wellcome Data · 5 min read · Jan 11, 2022

Climate change is a health crisis. No part of the world is immune from its harmful and deadly effects — but that doesn’t mean the impacts will be equally felt around the globe.

People in low- and middle-income countries are likely to be most affected, due to a combination of factors that make low-resource settings more vulnerable to the impacts of global heating. Yet at the same time, researchers and policymakers in those countries have less data to inform potential solutions, because the best-curated data tends to be collected in high-income countries.

To help address this issue, Wellcome has contributed £1M to Lacuna Fund, whose mission is to fill global data gaps so that machine learning can help solve urgent problems in low- and middle-income contexts. Lacuna Fund will run a competitive process for proposals to create new datasets, or improve existing ones, with a focus on two specific issues:

  1. Understanding climate harms on health and livelihoods
  2. Improving energy systems and infrastructure for climate change mitigation and adaptation

The issue with climate data

During the past 20 years, there has been a 53.7% increase in heat-related mortality among people over 65 globally, reaching a total of 296,000 deaths in 2018. Whilst it is getting easier to spot these global trends, understanding and predicting local health impacts is much trickier. Let me explain why that is.

Currently, our best way of looking at what the temperature is doing today (and predicting what it will look like in the future) is through global models. These models are trained using data from weather stations around the world and give us a good idea of average trends globally.

Sometimes, though, scientists try to use them to look at what is happening in a particular country (called downscaling). Downscaling these models is intuitively a very attractive solution to a big problem: individual weather stations can be hundreds of miles apart, which means it is hard to understand what is going on in the spaces between them without complex modelling, such as that achieved by these global models.

Unfortunately, this can go very wrong, as demonstrated below. On the top left, we see the temperature recorded at weather stations across Zambia. The top right shows the results from a global model. On the bottom left are the downscaled results of the global model. Compare the picture you see there to its neighbour on the bottom right, which was created after the addition of data collected by local weather stations across Zambia. As you can see, the downscaled results overestimate temperature over almost the entire country, compared to the more accurate map on the right.

Minimum temperature maps of Zambia over a 10-day period. Source: Dinku et al. (2016), Figure 6, THE ENACTS APPROACH: Transforming climate services in Africa one country at a time. A World Policy Paper. World Policy Journal, https://www.researchgate.net/publication/310459256_THE_ENACTS_APPROACH_Transforming_climate_services_in_Africa_one_country_at_a_time_A_World_Policy_Paper

This is not a unique problem. We see it a lot in all types of science and health research, where inadequate representation in a dataset leads to skewed (and inaccurate) results for a sub-group.

Now, using malaria as an example, let’s look at how getting it wrong has direct implications for health.

Health impacts of climate change

Malaria is a climate-sensitive infectious disease, meaning that environmental factors can influence the number of cases in a given area. For example, an increase in rainfall in an area where malaria is present will most likely cause an increase in cases of malaria. This is because the extra water that pools in puddles or gardening pots can become a breeding ground for the kind of mosquito that spreads the disease.

However, when a recent study used common global models for rainfall (created in a similar way to the models for temperature) to predict the impact on malaria in Southern Ecuador, the researchers got some very unexpected results. One of the global rainfall models they downscaled suggested that an increase in rainfall would lead to a drop in cases of malaria. Decades of local public health experience tell us that this is incorrect, but we do not always have such robust knowledge with which to challenge models and suggest that they are biased, which is likely what has happened here. It is easy to see how research could result in unhelpful, if not downright counterproductive, advice to policymakers if we do not find ways of addressing these biases.

What makes this not only a health problem, but also an equity problem, is that the communities who have contributed least to climate change, those in low- and middle-income countries, stand to be the most negatively impacted by it. On top of that, these are the same communities in which large gaps in both climate and health data exist.

Filling the gap for machine learning

In essence, we need to produce more local data. As demonstrated in our example from Zambia, we can get much more accurate results if we combine local data with the global models. One way to do that is to unlock currently inaccessible sources of climate or health data. Many communities have information relevant to the study of climate change and health that needs to be compiled, digitised and/or made accessible in a larger dataset.

For example, during the first UK lockdown in 2020, over 16,000 volunteers helped digitise hand-written rainfall measurements taken before 1960 across the UK. Until then, the 65,000 sheets of paper, holding 5.28 million observations, had been unavailable to climate scientists. The data produced has opened up access to information on historical periods of extreme rainfall. Understanding these events of the past will help us to predict and plan for similar events in the future, for example when farmers make planning decisions for worst-case drought scenarios.

A picture of an example of the handwritten rainfall recordings that were digitised. The paper has a large grid on it, with months running along the left-most column and years running along the top. Written in each grid box in pencil is a number recorded as the level of rainfall in that month and year.
Image of a hand-written rainfall measurement taken in the UK pre-1960. Source: @ed_hawkins. (2021, May, 13). https://twitter.com/ed_hawkins/status/1392758650201120771

Another option would be to collect new local data. But building new weather stations is expensive, and often unsustainable in low-resource settings. Thus, we need to get creative! Keep an eye out for another blog soon on a new project we’ve funded that uses drones equipped with thermal cameras to collect new local data in a low-resource setting.
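For readers who like to see an idea in code, here is a minimal sketch of what "combining local data with the global models" can look like. It is a simplified illustration only, not the method used to produce the Zambia maps: it measures how far the model is from each station reading, then spreads those errors across the map using inverse-distance weighting. The function name `idw_bias_correct`, the `power` parameter and all the numbers are hypothetical, chosen purely for illustration.

```python
import math


def idw_bias_correct(model_grid, stations, power=2.0):
    """Adjust a coarse model field using local station observations.

    model_grid: dict mapping (lat, lon) -> model temperature in deg C
    stations:   list of (lat, lon, observed_temperature) tuples
    Returns a new dict with the same keys as model_grid.
    Distances are plain Euclidean distances in degrees, a rough
    approximation that is only reasonable over small areas.
    """
    # Step 1: model error at each station (observation minus the
    # model value at the nearest grid cell).
    residuals = []
    for lat, lon, observed in stations:
        nearest = min(model_grid, key=lambda cell: math.dist(cell, (lat, lon)))
        residuals.append((lat, lon, observed - model_grid[nearest]))

    # Step 2: spread those errors over the grid, giving nearby
    # stations more weight than distant ones.
    corrected = {}
    for cell, value in model_grid.items():
        weighted_sum = total_weight = 0.0
        for lat, lon, residual in residuals:
            distance = math.dist(cell, (lat, lon))
            if distance == 0.0:
                # A station sits exactly on this cell: trust it fully.
                weighted_sum, total_weight = residual, 1.0
                break
            weight = 1.0 / distance ** power
            weighted_sum += weight * residual
            total_weight += weight
        correction = weighted_sum / total_weight if total_weight else 0.0
        corrected[cell] = value + correction
    return corrected


# Hypothetical example: the model runs 2 degrees too warm at both stations.
model_grid = {(0, 0): 20.0, (0, 1): 21.0, (1, 0): 22.0, (1, 1): 23.0}
stations = [(0, 0, 18.0), (1, 1, 21.0)]
corrected = idw_bias_correct(model_grid, stations)
```

In this toy case both stations report temperatures 2 degrees below the model, so every grid cell is lowered by 2 degrees. With no stations at all, the model is returned unchanged, which is exactly the situation data gaps leave us in.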

Lacuna Fund call

In the meantime, we wanted to share how excited we are to be supporting Lacuna Fund, alongside The Rockefeller Foundation, the German Federal Ministry for Economic Cooperation and Development (BMZ) and Google.org. Through Lacuna Fund, we have the opportunity to support researchers from the most affected (but least culpable) communities, so that they have the tools to address the urgent challenges impacting their countries today and in the future. If you’re interested in learning more about Lacuna Fund or want to stay up to date on what’s happening with this call, then visit their website.
