What does Machine Learning have to do with weather?

Toby Coleman
Published in bytehub-ai, Aug 28, 2020 (4 min read)
Forecast global temperatures from GFS

In this post we’re going to run through how to generate an ML-enhanced weather forecast. But first, what do we mean by this, and why would you want to combine machine learning with weather forecasting anyway?

Usually when we think of weather forecasts, we mean predictions published by large government-backed weather agencies such as NOAA, ECMWF and the UK Met Office. These organisations develop and run huge computer models of the earth’s atmosphere and oceans, and output massive datasets containing forecasts of temperature, pressure, wind speed and many other variables across the entire globe.

These datasets are a fantastic and often overlooked resource: as we’ve pointed out, weather variables can enhance the predictive power of models in many forecasting applications.

But we also often hear complaints, for example that the model doesn’t output data that is local enough, or that it doesn’t include a specific forecast variable like fog or visibility. This is where machine learning can come to the rescue: using historical forecast data we can train a system to provide a bespoke, ML-enhanced weather forecast, which can be much more accurate and useful than simply using a weather model on its own.

For this walk-through, we’re going to generate an improved forecast for the temperature at a weather station in Germany, using the 24-hour-ahead forecast provided by NOAA’s Global Forecast System (GFS) model. If you’d like to try it yourself then take a look at our code, or run it in a Google Colab notebook.

We start by comparing the raw model forecasts to the actual data from the weather station and plotting the errors. GFS provides forecasts both at ground/surface level, and at a height of 2m. The 2m forecast is clearly better, which makes sense because the weather station is located on a roof.
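To make the comparison concrete, here is a minimal sketch of how the two GFS variables could be scored against the station data. The post does not say which error metric was used, so mean absolute error (MAE) is an assumption here, and the temperatures are made-up values for illustration rather than data from the demo.

```python
import numpy as np

def mean_absolute_error(forecast, observed):
    """Average absolute difference between forecast and observation."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(forecast - observed)))

# Made-up temperatures in degrees Celsius, for illustration only.
observed = np.array([14.0, 16.5, 19.0, 18.0, 15.5])
gfs_surface = np.array([12.0, 18.5, 22.0, 16.0, 13.0])  # hypothetical surface forecast
gfs_2m = np.array([13.5, 16.0, 20.0, 17.0, 15.0])       # hypothetical 2m forecast

errors = {
    "surface": mean_absolute_error(gfs_surface, observed),
    "2m": mean_absolute_error(gfs_2m, observed),
}
for name, mae in errors.items():
    print(f"GFS {name} forecast MAE: {mae:.2f} °C")
```

With real data, the same function would be applied to the full history of forecasts and station readings, aligned on timestamp.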

But by plotting the temperature data over a few days we can also see some problems: for example that GFS predicts the temperature should drop much faster during the afternoons than is usually the case. This might be because heat tends to build up inside nearby buildings and has a moderating effect on the recorded temperature.

Actual measured temperature (blue) and GFS forecast variables (red, green)
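One way to check for a time-of-day pattern like this is to average the forecast error separately for each hour. The sketch below does that with a hypothetical helper, `mean_error_by_hour`, on a handful of invented readings; it is not taken from the demo code.

```python
import numpy as np

def mean_error_by_hour(hours, forecast, observed):
    """Mean of (forecast - observed) for each hour of the day present in the data."""
    hours = np.asarray(hours)
    error = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    return {int(h): float(error[hours == h].mean()) for h in np.unique(hours)}

# Invented readings (°C) over two days at 06:00, 12:00 and 18:00.
hours = [6, 12, 18, 6, 12, 18]
forecast = [10.0, 20.0, 14.0, 11.0, 21.0, 13.0]
observed = [10.5, 20.0, 16.0, 11.5, 21.0, 16.0]

bias_by_hour = mean_error_by_hour(hours, forecast, observed)
for h, bias in bias_by_hour.items():
    print(f"{h:02d}:00 mean error: {bias:+.1f} °C")
```

A large negative value in the late afternoon would correspond to the forecast cooling faster than the station actually does.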

Machine learning excels at this sort of problem. We can use it to learn predictable temperature patterns that are specific to this location, and then apply this knowledge to improve the raw weather forecast from GFS.

In this example, we just used a simple linear regression, and managed to reduce the forecasting error by almost 14% compared to simply using the GFS 2m forecast on its own.

The ML-enhanced forecast has lower error than the raw GFS forecast
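The sketch below shows the general shape of such a correction: fit a linear regression on historical forecasts, then score it on held-out data against the raw forecast. Everything in it is an assumption made for illustration. The data is synthetic, and the choice of features (the 2m forecast plus sine and cosine of the hour of day) is ours, not necessarily what the demo uses, so the size of the improvement it prints says nothing about the real result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic hourly "observed" station temperature (°C) over 60 days.
t = np.arange(60 * 24)
angle = 2 * np.pi * (t % 24) / 24
observed = 15 + 8 * np.sin(angle - 2 * np.pi * 9 / 24) + 5 * np.sin(2 * np.pi * t / 240)

# Synthetic "GFS 2m" forecast: the observation plus a constant offset,
# an hour-of-day bias and random noise.
gfs_2m = observed + 1.0 + 2.0 * np.cos(angle) + rng.normal(0.0, 0.5, size=t.size)

def design_matrix(forecast, angle):
    """Features: raw forecast, hour-of-day sine and cosine, and an intercept."""
    return np.column_stack(
        [forecast, np.sin(angle), np.cos(angle), np.ones_like(forecast)]
    )

# Train on the first 45 days, evaluate on the last 15.
split = 45 * 24
X_train = design_matrix(gfs_2m[:split], angle[:split])
X_test = design_matrix(gfs_2m[split:], angle[split:])

# Ordinary least squares fit of observed temperature on the features.
coef, *_ = np.linalg.lstsq(X_train, observed[:split], rcond=None)
corrected = X_test @ coef

mae_raw = float(np.mean(np.abs(gfs_2m[split:] - observed[split:])))
mae_corrected = float(np.mean(np.abs(corrected - observed[split:])))
print(f"Raw forecast MAE:       {mae_raw:.2f} °C")
print(f"Corrected forecast MAE: {mae_corrected:.2f} °C")
```

Splitting by time rather than at random keeps the evaluation honest, because the model is only ever scored on days that come after its training data.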

We’ve really just scratched the surface of what is possible here. Linear regression is a very simple ML technique. More sophisticated models built using tools like XGBoost, TensorFlow or PyTorch can improve the forecasts still further. And we also don’t need to restrict ourselves to forecasting the weather: a machine-learning model can combine weather inputs with other information and directly address a business problem, for example by forecasting daily passengers on a transport system, customer orders on a website, or demand for energy in a specific region.

Code for this demo and Google Colab notebook.

At ByteHub AI we build tools to make it easier to integrate weather data with your ML applications and analytics. Get in touch if you’d like to know more about how we prepared the historical forecast data for this demo, or would like to know more about ML-enhanced forecasting.

Toby Coleman
bytehub-ai

Data Scientist and ML Engineer. Interested in time-series modelling and forecasting problems. Current project: https://github.com/bytehub-ai/bytehub/