Python has a rich ecosystem of tools, libraries, and frameworks for data crunching, so I’d like to share a few examples of how to plug the Datahub API into a Python-based workflow as a data source. I’ll start with a simple one: turning Datahub API JSON output into NumPy arrays using pandas.
Parsing JSON into NumPy arrays is boring on its own 😀, so let’s add some complexity: use API “pagination” to query a larger chunk of data along the time dimension and produce a simple animation from it.
I’m using Slumber, a thin HTTP request wrapper, to make API queries look Python-native.
Here is the API query that fetches dataset metadata, written with Slumber:
APIKEY = 'my API key'
API = slumber.API(
metadata = API.datasets('noaa_ww3_at').get(apikey=APIKEY)
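To show what “Python-native” means here, the following is a minimal sketch of how attribute chaining can map onto REST URLs. It is not Slumber’s implementation; the `Resource` class, its `url` method, and the base URL are all hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode


class Resource:
    """Tiny URL builder that mimics the attribute-chaining style of REST wrappers."""

    def __init__(self, base_url):
        self._url = base_url.rstrip("/")

    def __getattr__(self, name):
        # Only called for attributes that do not exist on the instance.
        if name.startswith("_"):
            raise AttributeError(name)
        # api.datasets -> <base>/datasets
        return Resource(f"{self._url}/{name}")

    def __call__(self, item_id):
        # api.datasets("noaa_ww3_at") -> <base>/datasets/noaa_ww3_at
        return Resource(f"{self._url}/{item_id}")

    def url(self, **params):
        # Keyword arguments become the query string.
        query = urlencode(params)
        return f"{self._url}?{query}" if query else self._url


# Hypothetical base URL, used only for illustration.
api = Resource("https://api.example.com/v1")
metadata_url = api.datasets("noaa_ww3_at").url(apikey="my API key")
print(metadata_url)
```

A real wrapper would go one step further and send the HTTP request, then decode the JSON response, instead of just returning the URL.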
Now I’m going to build an API paginator that fetches a sequence of data chunks using Slumber and a while / yield loop. For this example, I’ll use the area-based query of the Datahub API. Feel free to check the documentation to better understand how that endpoint works. In general, it returns a set of objects, each with a timestamp, a two-dimensional array of values, and arrays of longitudes and latitudes. The values correspond to the variable specified in the API request.
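The while / yield pattern can be sketched roughly as follows. This is a generic illustration, not the code from the script: the `offset` and `count` parameter names are assumptions, and `fake_fetch` stands in for the real API call so the example runs without network access.

```python
def paginate(fetch_page, page_size, max_pages=None):
    """Yield pages from fetch_page(offset, count) until the data runs out."""
    offset = 0
    pages = 0
    while max_pages is None or pages < max_pages:
        entries = fetch_page(offset=offset, count=page_size)
        if not entries:
            break  # empty page: nothing left to fetch
        yield entries
        pages += 1
        if len(entries) < page_size:
            break  # short page: this was the last one
        offset += page_size


# Stand-in for a real API call: 7 fake entries served in slices.
FAKE_ENTRIES = [{"time": f"t{i}"} for i in range(7)]


def fake_fetch(offset, count):
    return FAKE_ENTRIES[offset:offset + count]


pages = list(paginate(fake_fetch, page_size=3))
page_sizes = [len(page) for page in pages]
print(page_sizes)
```

Because `paginate` is a generator, each page is requested only when the consumer asks for it, which keeps memory use flat for long time ranges.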
Now comes the part that turns the JSON into a pandas DataFrame. Notice that latitudes are used as the index and longitudes as the columns to represent the spatial aspect of the data.
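As a rough sketch of that conversion, assume each entry looks like the dictionary below. The key names (`time`, `latitudes`, `longitudes`, `values`) and the sample numbers are made up for illustration and may differ from the actual API response.

```python
import pandas as pd

# Hypothetical entry layout; the real API response may use different keys.
entry = {
    "time": "2016-04-01T00:00:00",
    "latitudes": [40.0, 40.5],
    "longitudes": [-74.0, -73.5, -73.0],
    "values": [[5.1, 5.4, None], [4.8, 5.0, 5.2]],
}


def entry_to_frame(entry):
    """Build a DataFrame with latitudes as rows and longitudes as columns."""
    frame = pd.DataFrame(
        entry["values"],
        index=entry["latitudes"],
        columns=entry["longitudes"],
    )
    # Missing values (JSON null) become NaN once the data is cast to float.
    return frame.astype(float)


frame = entry_to_frame(entry)
# Keyed by timestamp, matching the dictionary described in the text.
full_data_dict = {entry["time"]: frame}
print(frame)
```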
Then I need to represent the temporal aspect of the data, and I chose a pandas Panel for that. Since we already have a dictionary that maps each timestamp to a DataFrame, we can turn it into a Panel:
full_data_panel = pd.Panel(full_data_dict)
Why is a Panel better than a dictionary of DataFrames? Simply put, it integrates better with native NumPy and pandas calculations across the full data array. For example, we can now run expressions like this one:
data_min = np.nanmin(full_data_panel.values)
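Note that `pd.Panel` was deprecated and later removed from pandas, so the lines above only run on older releases. One possible substitute, shown here as a sketch with made-up sample values, is to stack the DataFrames into a three-dimensional NumPy array ordered by timestamp; the same whole-array expressions then work unchanged.

```python
import numpy as np
import pandas as pd

# Made-up sample data: two timestamps on a 2 x 2 latitude/longitude grid.
lats = [40.0, 40.5]
lons = [-74.0, -73.5]
full_data_dict = {
    "2016-04-01T00:00:00": pd.DataFrame(
        [[1.0, 2.0], [3.0, np.nan]], index=lats, columns=lons
    ),
    "2016-04-01T03:00:00": pd.DataFrame(
        [[0.5, 2.5], [3.5, 4.0]], index=lats, columns=lons
    ),
}

# Axis 0 is time, axis 1 is latitude, axis 2 is longitude.
timestamps = sorted(full_data_dict)
cube = np.stack([full_data_dict[t].to_numpy() for t in timestamps])

# NaN-aware reductions over the whole time series.
data_min = np.nanmin(cube)
data_max = np.nanmax(cube)
print(cube.shape, data_min, data_max)
```

If labeled axes matter, a library such as xarray is another option, though it is not used in the original script.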
The next step is a loop that renders the Panel as an animation.
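A minimal sketch of such a loop is shown below, assuming matplotlib and a three-dimensional array with time as the first axis. The `render_frames` helper and the random sample data are hypothetical. The main design point is that every frame shares one color scale, taken from the global minimum and maximum, so colors stay comparable over time.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np


def render_frames(cube, labels):
    """Render each time slice as PNG bytes using one shared color scale."""
    vmin, vmax = np.nanmin(cube), np.nanmax(cube)
    frames = []
    for grid, label in zip(cube, labels):
        fig, ax = plt.subplots()
        image = ax.imshow(grid, vmin=vmin, vmax=vmax, origin="lower")
        ax.set_title(label)
        fig.colorbar(image, ax=ax)
        buffer = io.BytesIO()
        fig.savefig(buffer, format="png")
        plt.close(fig)  # free the figure before drawing the next one
        frames.append(buffer.getvalue())
    return frames


# Random sample data: 3 time steps on a 4 x 5 grid.
rng = np.random.default_rng(0)
cube = rng.random((3, 4, 5))
frames = render_frames(cube, ["t0", "t1", "t2"])
print(len(frames))
```

The resulting PNG frames can then be combined into a GIF or video with whichever tool you prefer.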
And here is the resulting animation that shows the Wind Speed near NY shores. The data comes from the NOAA WaveWatch III Regional US East Coast Wave Model (dataset id
To reproduce this example, check the full Python script with all these parts put together: rasterise_area.py
It also works as a regular command-line tool if you provide the proper input. The animation above was generated by running it with these arguments:
python3 rasterise_area.py \
noaa_ww3_at Wind_speed_surface \
To make this script work you will also need an API key, which is available for free after a simple signup at Planet OS Datahub.
Please leave your questions and suggestions for the next set of examples.