Archived Training Dataset Downloads now Available on Radiant MLHub

Kevin Booth
Radiant Earth Insights
3 min read · Feb 8, 2021

A little over a year ago, we launched the first iteration of Radiant MLHub as a STAC-compliant API that lets you browse our training data collections and list and download individual assets from the items within those collections. Today, we’re announcing the ability to download an archived version of a training dataset with a single click. In this post, we’ll describe the process for downloading datasets, explain the structure of the archived datasets, and provide some tips for effectively traversing the downloaded datasets.

Building Footprint Labels, Tropical Storm Imagery and Wind Speed Labels, and Crop Type Labels are all available on Radiant MLHub.

Downloading Datasets

We are now offering three different methods of downloading our datasets. The easiest method, downloading on our registry, can be accessed by navigating to a dataset page and clicking on the “Download” link for each collection you would like to download.

The second-easiest method is downloading datasets via our new archive API endpoint. Make a GET request to the endpoint below, replacing “{COLLECTION_ID}” with the ID of the collection you’d like to download and “{API_KEY}” with your API key. The API returns an HTTP 302 response whose “Location” header contains the location of the archive file.

/mlhub/v1/archive/{COLLECTION_ID}?key={API_KEY}
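
As a rough sketch of the request described above, the Python snippet below builds the archive URL and reads the “Location” header from the 302 response instead of following the redirect. It uses only the standard library. The base URL is a placeholder (this post does not give the API host), and the function names are hypothetical, chosen only for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request

# Placeholder: substitute the real Radiant MLHub API root (not given in this post).
BASE_URL = "https://example.invalid"


def archive_url(collection_id, api_key, base_url=BASE_URL):
    """Build the archive endpoint URL for one collection."""
    path = "/mlhub/v1/archive/" + urllib.parse.quote(collection_id, safe="")
    query = urllib.parse.urlencode({"key": api_key})
    return base_url.rstrip("/") + path + "?" + query


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following the 302 so its headers can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def archive_location(collection_id, api_key, base_url=BASE_URL):
    """Return the value of the Location header from the 302 response."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        response = opener.open(archive_url(collection_id, api_key, base_url))
    except urllib.error.HTTPError as err:
        # With redirects disabled, urllib reports the 302 as an HTTPError.
        if err.code == 302:
            return err.headers["Location"]
        raise
    raise RuntimeError(f"expected a 302 redirect, got HTTP {response.status}")
```

Most HTTP clients follow redirects by default, so in practice a plain GET to the endpoint will usually land on the archive file itself. Reading the header explicitly is mainly useful if you want to hand the final location to a separate download tool.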

The final and most advanced method of downloading datasets would be to navigate our STAC API directly and download each individual STAC item. We do not recommend this if you intend to download the entire dataset.

Archive Structure

Once extracted, each archive has a similar structure. Files that are common to all items (e.g., documentation) are contained within the “_common” folder. Each STAC item then has its own folder, which contains all of the assets for that item and a “stac.json” file with the STAC metadata for that item.

An example of the “ref_landcovernet_v1_labels” and “ref_landcovernet_v1_source” collections downloaded and extracted into the same parent “archives” folder is shown below.

archives/
  - ref_landcovernet_v1_labels
    - _common
        documentation.pdf
    - ref_landcovernet_v1_labels_28QDE_00
        labels.tif
        source_dates.csv
        stac.json
    - ref_landcovernet_v1_labels_28QDE_01
        labels.tif
        source_dates.csv
        stac.json
    ...
  - ref_landcovernet_v1_source
    - ref_landcovernet_v1_source_37PCM_22_20180102
        B01.tif
        B02.tif
        B03.tif
        B04.tif
        B05.tif
        B06.tif
        B07.tif
        B08.tif
        B8A.tif
        B09.tif
        B11.tif
        B12.tif
        CLD.tif
        SCL.tif
    - ref_landcovernet_v1_source_37PCM_22_20180107
        B01.tif
        B02.tif
        B03.tif
        B04.tif
        B05.tif
        B06.tif
        B07.tif
        B08.tif
        B8A.tif
        B09.tif
        B11.tif
        B12.tif
        CLD.tif
        SCL.tif
    ...
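
To illustrate how this layout can be traversed, here is a minimal Python sketch that walks one extracted collection folder and loads the “stac.json” file from each item folder. The function name `iter_items` is hypothetical, and the sketch assumes only what the structure above shows: one folder per item plus an optional “_common” folder.

```python
import json
from pathlib import Path


def iter_items(collection_dir):
    """Yield (item_folder, stac_metadata) for each item folder in a collection.

    The "_common" folder is skipped, as is any folder without a "stac.json".
    """
    for folder in sorted(Path(collection_dir).iterdir()):
        if not folder.is_dir() or folder.name == "_common":
            continue
        stac_file = folder / "stac.json"
        if not stac_file.is_file():
            continue
        with stac_file.open(encoding="utf-8") as f:
            yield folder, json.load(f)
```

For example, `for folder, item in iter_items("archives/ref_landcovernet_v1_labels"): print(folder.name)` would print the name of each label item folder.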

Tips

Some datasets contain multiple label types or multiple source imagery types. For example, our Chesapeake Land Cover dataset has land cover labels, NLCD labels, building footprint labels, Landsat source imagery, and NAIP source imagery. If you are only interested in the NLCD labels and NAIP source imagery, then you only need to download the NLCD and NAIP collections.

The item STAC metadata contains relative links to all of the assets which are related to that item within the “assets” key. For label items, the STAC metadata will also contain relative links to all source imagery STAC files which are related to that label item. These relative links all assume that both collections are extracted into the same parent folder, as shown in the file structure example above.
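
A small Python sketch of how these relative links might be resolved on disk follows. It assumes each asset entry has an “href” field, as in standard STAC items, and that links to source imagery use the “source” relation type from the STAC label extension. Neither detail is spelled out in this post, so check them against your downloaded “stac.json” files. The function name `resolve_item_paths` is hypothetical.

```python
import json
from pathlib import Path


def resolve_item_paths(stac_file):
    """Return (asset_paths, source_stac_paths) resolved relative to stac.json."""
    stac_file = Path(stac_file)
    item = json.loads(stac_file.read_text(encoding="utf-8"))
    base = stac_file.parent  # relative hrefs are taken relative to the item folder

    assets = {
        key: (base / asset["href"]).resolve()
        for key, asset in item.get("assets", {}).items()
    }
    # Assumption: links to source imagery items use rel == "source".
    sources = [
        (base / link["href"]).resolve()
        for link in item.get("links", [])
        if link.get("rel") == "source"
    ]
    return assets, sources
```

If the source collection was not extracted next to the labels collection, the returned source paths will point at files that do not exist, so checking `path.exists()` on each one is a quick way to confirm the layout.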

Conclusion

The addition of these new methods of downloading datasets should eliminate the frustration of dataset downloads that took multiple days to complete and, in some cases, failed partway through. We are constantly looking for ways to make datasets more accessible and easier to integrate with your processes. If you have any suggestions on improvements we can make to Radiant MLHub, feel free to send us an email at hello@radiant.earth.

Written by Kevin Booth

Engineering Manager @ourradiantearth