Creating weather forecast notebook with GFS model data and Planet OS API

Rossby waves illustrated by 500 hPa — 1000 hPa geopotential height

Have you ever wondered how weather portals create weather forecasts for any part of the world?

Here’s a demonstration of how to get from raw data to animated weather maps and graphs with just a few simple commands. Starting from the raw data means you are not limited to creating weather forecasts — you can also run your own dedicated analyses or models on top of the data.

Note that this blog post is an overview of a demo I created as a Jupyter (IPython) notebook.

What will you learn?

  • How to fetch the data with different API versions;
  • How to easily and interactively visualise it with ipywidgets.

And you’ll find some technical discussion about the aggregation periods and numerical accuracy of the data.

First, let’s investigate the Planet OS point API.

The easiest way to create the API request is to use the interface at http://data.planetos.com/datasets/noaa_gfs_pgrb2_global_forecast_recompute_0.25degree which allows you to choose location on the map and then provides you with an example query, like

http://api.planetos.com/v1/datasets/noaa_gfs_pgrb2_global_forecast_recompute_0.25degree/point?origin=dataset-details&lat=49.5&apikey=<YOUR API KEY HERE>&lon=-50.5

The data is returned in JSON format (a CSV option is available as well), but we’ll convert it to a Python pandas DataFrame to make it easier to work with.
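As a minimal sketch of this step (the sample entries below are hypothetical and only mimic the nested JSON structure; the actual field names may differ, and a live request would use the `requests` library with your own API key):

```python
import pandas as pd

# A live request would look roughly like this:
#   import requests
#   url = ("http://api.planetos.com/v1/datasets/"
#          "noaa_gfs_pgrb2_global_forecast_recompute_0.25degree/point")
#   entries = requests.get(url, params={"lat": 49.5, "lon": -50.5,
#                                       "apikey": "<YOUR API KEY HERE>"}
#                          ).json()["entries"]

# Hypothetical sample mimicking the structure of a point-API response;
# these field names are assumptions for illustration, not the exact schema.
entries = [
    {"context": "reftime_time_lat_lon",
     "axes": {"time": "2016-09-05T06:00:00", "latitude": 49.5, "longitude": -50.5},
     "data": {"Temperature_surface": 284.1}},
    {"context": "reftime_time_lat_lon",
     "axes": {"time": "2016-09-05T09:00:00", "latitude": 49.5, "longitude": -50.5},
     "data": {"Temperature_surface": 285.3}},
]

# json_normalize flattens the nested axes/data dicts into one row per entry,
# giving dotted column names such as "data.Temperature_surface".
df = pd.json_normalize(entries)
print(df[["axes.time", "data.Temperature_surface"]])
```

The flattened DataFrame makes it easy to inspect which variables and axes each context provides.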

This way, you’ll get a good overview of metadata and available variables. You will then notice that there are a lot of variables, which can be a bit confusing at first.

Here are three things to keep in mind:

  1. Context — a group of variables that share exactly the same dimensions
  2. Reftime — the start time of the forecast
  3. Time — the time for which the forecast was made

Also, please note that by default only the first vertical level of each variable is returned, so to get a variable at all levels, the z=all API parameter must be set in the request. Another very important API parameter is count, which limits the number of variables returned for each context.
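As a sketch of how these parameters fit into the query string (the coordinate and count values here are placeholders; only the parameter names z and count come from the text):

```python
from urllib.parse import urlencode

# Parameters for a point-API request that asks for every vertical level.
params = {
    "lat": 49.5,
    "lon": -50.5,
    "apikey": "<YOUR API KEY HERE>",
    "z": "all",     # return every vertical level instead of just the first
    "count": 1000,  # limit how much data comes back per context
}

query = urlencode(params)
print(query)
```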

In order to avoid browsing through variables at sigma or cloud-top levels (which are really only useful in special cases), I’ve used dropdown menus created with ipywidgets. This way, the user first selects the vertical level type and then a variable that belongs to that type.

Now, after selecting a variable in the dropdown, simply executing the next block in the notebook will create a plot of that variable.
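The two-step selection logic can be sketched like this (the level-type grouping and variable names below are hypothetical; in the notebook each step would be wrapped in an `ipywidgets.Dropdown`):

```python
# Hypothetical grouping of variables by vertical level type. In a notebook,
# you would wrap the selection in ipywidgets, e.g.:
#   import ipywidgets as widgets
#   level_dd = widgets.Dropdown(options=sorted(variables_by_level))
#   var_dd = widgets.Dropdown(options=variables_for(level_dd.value))
variables_by_level = {
    "height_above_ground": ["Temperature_height_above_ground",
                            "u-component_of_wind_height_above_ground"],
    "isobaric": ["Geopotential_height_isobaric"],
    "sigma": ["Relative_humidity_sigma"],
}

def variables_for(level_type):
    """Return the variables belonging to the chosen vertical level type."""
    return variables_by_level.get(level_type, [])

print(variables_for("isobaric"))
```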

Or a graph with more lines if the variable has several vertical levels.

Now, it’s useful to go through the different variables, just to understand what data they contain. For example, it’s usually not obvious to people that wind speed data is also available at heights of 80 m and 100 m.

The next important feature I’d like to introduce is the raster API.

It’s very similar to the point API, with the exception that it returns data for a 2D matrix rather than a single point. However, as the data delivery format is JSON, the size of a single request is limited.

As shown in the notebook, it’s still easy to plot directly, and with ipywidgets you can simply create a slider and look at the data for all timesteps. If you’re using the current code example, you might need to set the minimum and maximum values of the colour scale manually.
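A minimal sketch of that pattern, using a randomly generated stand-in for the raster data (in the notebook the array would be filled from the raster-API response, and the plotting/slider lines in the comments assume matplotlib and ipywidgets):

```python
import numpy as np

# Hypothetical stack of raster timesteps with shape (time, lat, lon);
# stands in for data parsed from a raster-API response.
rng = np.random.default_rng(0)
data = rng.random((8, 10, 20)) * 30.0

# Fix the colour scale once, across all timesteps, so frames stay comparable
# while scrolling -- this is the "set min and max manually" step.
vmin, vmax = float(data.min()), float(data.max())

def show_frame(t):
    """With matplotlib you would call plt.imshow(data[t], vmin=vmin, vmax=vmax);
    wiring it up as interact(show_frame, t=(0, data.shape[0] - 1)) gives the
    ipywidgets slider. Here we just return the selected frame."""
    return data[t]

print(show_frame(0).shape)
```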

Finally, we demonstrate a new feature in the GFS data introduced by Planet OS.

To improve the user experience, we change the aggregation periods of the accumulated and averaged variables. The problem is that the original data is given at varying aggregation periods, from 1 to 6 hours, which is fine for professional users and archiving, but difficult to understand for newcomers who just want a time series at a 1-hour interval.

What does this look like? Plotting the downward shortwave radiation flux at the surface, using the original data from the NOMADS OPeNDAP server and from Planet OS, gives the following result

It’s clear that the numbers don’t match, and in fact they shouldn’t, as the accumulation periods are different. But to show that these values are in principle equal, we compute the integrated radiation from both sources
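The idea behind that check can be sketched as follows (all flux values and period lengths below are illustrative numbers, not the actual data): multiplying each average flux by the length of its aggregation period and summing gives the integrated energy, which should be nearly the same for both sources.

```python
# Illustrative only: average downward shortwave flux (W m-2) over each
# aggregation period, once at fixed 1-hour steps and once at the original
# longer periods. The numbers are made up for this sketch.
po_hours = [1, 1, 1, 1, 1, 1]
po_flux = [0.0, 50.0, 120.0, 180.0, 120.0, 50.0]   # 1-hour periods

nomads_hours = [3, 3]
nomads_flux = [56.7, 116.7]                        # 3-hour periods

def integrate(flux, hours):
    """Integrate average flux over its periods: W m-2 * s -> J m-2."""
    return sum(f * h * 3600 for f, h in zip(flux, hours))

e_po = integrate(po_flux, po_hours)
e_nomads = integrate(nomads_flux, nomads_hours)
print(e_po, e_nomads)  # the two integrals come out close, as expected
```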

By counting the blue (Planet OS) and orange (NOMADS) boxes, you can see that the areas are roughly equal, with only small differences. Where do these differences come from? Taking a closer look at the integrals, we see that all values are multiples of 10 — there’s nothing in between. It turns out that this is the maximum numerical accuracy for this particular variable, meaning that small values may contain large relative errors. It also puts some constraints on our ability to change the aggregation interval. You’ll find more details about this issue in the notebook.

Thanks for reading! I hope that this short introduction will encourage many of you to start using weather forecast data and that the notebook will work as a starting point for creating analytical dashboards for environmental data.

Go to the Jupyter Notebook on GitHub


I’m a Data Integration Engineer at Planet OS. We’ve developed a big data platform for weather and environmental data with an open data portal called Datahub that enables easy access to high-quality weather data. Recently, we’ve been building data integration and intelligence solutions for the renewable energy industry. Before joining Planet OS, I was responsible for developing and maintaining the numerical weather prediction model and processing meteorological satellite and weather radar data at the Estonian Weather Service.