Accessing and Downloading Training Data on the Radiant MLHub API

Radiant Earth Insights, Jun 29, 2020

By Kevin Booth, Engineering Manager, Radiant Earth Foundation

This article is an introductory guide to help you navigate the Radiant MLHub API and download training data. Basic knowledge of JSON and of working with RESTful APIs in Python is recommended.

Fundamentals of the Radiant MLHub API

The Radiant MLHub API is a STAC-compliant API (STAC stands for SpatioTemporal Asset Catalog). It serves metadata about label items and source imagery, along with links to download them.

STAC is a specification for organizing metadata about imagery and labels, making it easy to search for items that match spatial, temporal, or other criteria. At the root level of the STAC API is a list of collections of items. In the Radiant MLHub API, each collection contains items for either the source imagery or the labels of a dataset. Each item describes a piece of source imagery or a set of labels and links to the assets you can download for it. Properties found in these item descriptions include spatial extent, temporal extent, band descriptions in the case of optical imagery, label types and label properties in the case of labels, and other information such as DOIs and example citations.

Authentication

To access the API, you need an API key, which you can obtain from the Radiant MLHub Dashboard. Every request to the API must include the API key as the key query parameter, as shown below:

https://api.radiant.earth/mlhub/v1/collections?key=YOUR_API_KEY_HERE
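Because every request needs the key, it can help to attach it in one place. The snippet below is a minimal sketch using only the Python standard library. The helper name with_api_key and the constant API_ROOT are illustrative and not part of the API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the API root used in the example URLs of this article.
API_ROOT = "https://api.radiant.earth/mlhub/v1"


def with_api_key(url, api_key):
    """Return `url` with key=<api_key> set in its query string.

    Any existing `key` parameter is replaced; other parameters are kept.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "key"]
    query.append(("key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_api_key(f"{API_ROOT}/collections", "YOUR_API_KEY_HERE"))
# https://api.radiant.earth/mlhub/v1/collections?key=YOUR_API_KEY_HERE
```

The same helper can be applied to links taken from API responses before requesting them.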

Download Links

Links to external resources (documentation, geojson files, imagery files, etc.) are replaced with a link to the /download endpoint of the API. Below is an example of such a link.

https://api.radiant.earth/mlhub/v1/download/fd965acfbd8583cc5231ba6f98bbc7412a7e569e74a2b03745c6c962aa0c9fc6

These links are unique to you and expire after 6 hours. If you need to download the resource again after the link expires, repeat the API request and a new link will be generated. A download link returns an HTTP redirect response pointing to the resource, and you can read the Location header to retrieve the actual resource URI. The returned URI does not always use the HTTP scheme. For example, if the resource is hosted in an S3 bucket, the URI will look like this:

s3://somebucket/foo/bar/object.txt

When downloading resources, you should check which scheme is present in the URI and download it using the appropriate method. We have an example of this in our BigEarthNet Tutorial Notebook.
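As a rough illustration of the scheme check, the sketch below classifies a resolved URI so the caller can pick a download method. It is not the code from the tutorial notebook. The function name describe_resource is hypothetical, and the actual download step (an HTTP client for http/https, an S3 client for s3) is left to you.

```python
from urllib.parse import urlsplit


def describe_resource(uri):
    """Classify a resolved resource URI by its scheme.

    Returns ("http", uri) for http/https URIs and
    ("s3", (bucket, key)) for s3:// URIs.
    """
    parts = urlsplit(uri)
    if parts.scheme in ("http", "https"):
        return "http", uri
    if parts.scheme == "s3":
        # s3://bucket/path/to/object -> bucket name and object key
        return "s3", (parts.netloc, parts.path.lstrip("/"))
    raise ValueError(f"Unsupported URI scheme: {parts.scheme!r}")


print(describe_resource("s3://somebucket/foo/bar/object.txt"))
# ('s3', ('somebucket', 'foo/bar/object.txt'))
```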

Figure: Building labels from the SpaceNet 5 dataset

Paging

The /collections and /collections/COLLECTION_ID/items endpoints do not always return all of their items at once. Instead, results are returned in pages. To move to the next page of results, look in the “links” property for the link whose “rel” type is “next.” For more information about paging, see the STAC API specification.
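The paging rule can be sketched as a small generator that keeps following the “next” link until none is left. The names next_page_url and iter_pages are illustrative, and fetch_json is a hypothetical callable you supply that takes a URL, performs the authenticated request, and returns the parsed JSON body.

```python
def next_page_url(page):
    """Return the href of the link with rel == "next", or None if absent."""
    for link in page.get("links", []):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def iter_pages(first_url, fetch_json):
    """Yield each page of results, following "next" links.

    `fetch_json` is supplied by the caller (assumption: it adds the API key
    and returns the response body as a dict).
    """
    url = first_url
    seen = set()  # guards against a "next" link that loops back
    while url and url not in seen:
        seen.add(url)
        page = fetch_json(url)
        yield page
        url = next_page_url(page)
```

Passing the fetch function in keeps the paging logic independent of any particular HTTP library.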

Datasets on Radiant MLHub

Datasets on Radiant MLHub are typically split into two kinds of STAC collections: one contains the STAC items for the source imagery, and the other contains the STAC items for the labels. For example, the Chesapeake Bay dataset is split into six collections: building footprint labels, Chesapeake Conservancy land cover labels, NLCD land cover labels, NAIP imagery, Landsat 8 leaf-on, and Landsat 8 leaf-off composite. We’ve done this for two reasons. First, it removes the need to check whether an item in a collection is a source imagery item or a label item. You can always assume that items in a label collection are label items, and items in a source imagery collection are source imagery items. Second, multiple label collections can use the same source imagery. If the source imagery were stored in the same collection as the labels, there would be duplicate source imagery items. In practice, you should never need to search through source imagery collections directly, because each label item contains links to the source imagery items relevant to that label.

Label Items

A label item is a JSON object with properties describing the type of label, the possible label values, the spatial and temporal extents, and links to the label assets to download. An example from the BigEarthNet dataset is shown below:

{
  "labels": [
    "Sea and ocean"
  ],
  "seasonal_snow": false,
  "cloud_and_shadow": false,
  "datetime": "2018-05-29T11:54:01Z",
  "label:description": "Land Cover Type Classification",
  "label:type": "vector",
  "label:classes": [
    {
      "name": "labels",
      "classes": [
        "Discontinuous urban fabric",
        "Non-irrigated arable land",
        "Pastures",
        "Coniferous forest",
        "..."
      ]
    },
    {
      "name": "seasonal_snow",
      "classes": [
        "True",
        "False"
      ]
    },
    {
      "name": "cloud_and_shadow",
      "classes": [
        "True",
        "False"
      ]
    }
  ]
}
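To make the structure of “label:classes” concrete, here is a minimal sketch that turns it into a lookup from label property name to its allowed values. It assumes the object above has already been parsed into a Python dict. The name label_class_map is illustrative, and sample_props is shortened sample data rather than a full item.

```python
def label_class_map(props):
    """Map each label property name to the list of values it may take."""
    return {entry["name"]: list(entry["classes"])
            for entry in props.get("label:classes", [])}


# Shortened sample data modeled on the excerpt above (not a complete item).
sample_props = {
    "label:type": "vector",
    "label:classes": [
        {"name": "labels", "classes": ["Pastures", "Coniferous forest"]},
        {"name": "seasonal_snow", "classes": ["True", "False"]},
        {"name": "cloud_and_shadow", "classes": ["True", "False"]},
    ],
}

for name, classes in label_class_map(sample_props).items():
    print(name, classes)
```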

Assets: Labels and Documentation

The assets section contains links to documentation files and to the label data itself: a GeoJSON file in the case of vector labels, or a TIFF file in the case of raster labels. The assets property is structured as a dictionary, and the entry that contains the link to the labels always has the key “labels.” Below is an excerpt from the same BigEarthNet label item.

{
  "labels": {
    "href": "s3://radiant-mlhub/bigearthnet/labels/S2A_MSIL2A_20180529T115401_84_58.geojson",
    "title": "Land Type Classes"
  }
}
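Since the label link always sits under the “labels” key, separating it from the documentation assets is a dictionary lookup. The sketch below assumes the assets object has been parsed into a dict. The function name split_assets is hypothetical.

```python
def split_assets(assets):
    """Return (label_href, other_hrefs) from a label item's assets dict.

    `label_href` is the href stored under the "labels" key (None if missing);
    `other_hrefs` maps every remaining asset key to its href.
    """
    label = assets.get("labels")
    label_href = label["href"] if label else None
    other_hrefs = {key: asset["href"]
                   for key, asset in assets.items() if key != "labels"}
    return label_href, other_hrefs


assets = {
    "labels": {
        "href": "s3://radiant-mlhub/bigearthnet/labels/S2A_MSIL2A_20180529T115401_84_58.geojson",
        "title": "Land Type Classes",
    }
}
label_href, other_hrefs = split_assets(assets)
print(label_href)
```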

Links: Source Imagery

The links section contains links to the root of the catalog, the parent collection, and the source imagery items. You can determine which type each link is by reading the “rel” property. Source imagery items will have the “rel” property of “source.”

[
  {
    "rel": "source",
    "href": "https://api.radiant.earth/mlhub/v1/collections/bigearthnet_v1_source/items/bigearthnet_v1_labels_S2A_MSIL2A_20180529T115401_84_58",
    "type": "application/json"
  },
  {
    "rel": "parent",
    "href": "https://api.radiant.earth/mlhub/v1/collections/bigearthnet_v1_labels",
    "type": "application/json"
  },
  {
    "rel": "root",
    "href": "https://api.radiant.earth/mlhub/v1/",
    "type": "application/json"
  }
]
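Picking out the source imagery links comes down to filtering the list on “rel”. A minimal sketch follows, with links_by_rel as an illustrative helper name and the link list above parsed into Python.

```python
def links_by_rel(links, rel):
    """Return the hrefs of all links whose "rel" equals `rel`."""
    return [link["href"] for link in links if link.get("rel") == rel]


links = [
    {
        "rel": "source",
        "href": "https://api.radiant.earth/mlhub/v1/collections/bigearthnet_v1_source/items/bigearthnet_v1_labels_S2A_MSIL2A_20180529T115401_84_58",
        "type": "application/json",
    },
    {
        "rel": "parent",
        "href": "https://api.radiant.earth/mlhub/v1/collections/bigearthnet_v1_labels",
        "type": "application/json",
    },
    {
        "rel": "root",
        "href": "https://api.radiant.earth/mlhub/v1/",
        "type": "application/json",
    },
]

source_hrefs = links_by_rel(links, "source")
print(len(source_hrefs))
# 1
```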

Source Imagery Items

Source imagery items contain all the information required to determine where and when the imagery was taken, as well as links to download either the individual bands of the imagery or the multi-band files. Metadata about the sensor and the platform used to take the image is also included in the source imagery item.

Assets

Assets (i.e., GeoTIFF files) are linked from the assets property of the source imagery item. Depending on the dataset, there will be one multi-band asset, multiple single-band assets, or a combination of the two.

Additional Properties

Other aspects of the source imagery item may be useful to know, such as when and where the imagery was taken. The ‘when’ is answered by the datetime property within the properties object. The ‘where’ is answered by the geometry and bbox properties of the item. The example below shows a source imagery item that contains three single-band files.

{
  "id": "bigearthnet_v1_source_S2A_MSIL2A_20180529T115401_77_67",
  "collection": "bigearthnet_v1_source",
  "type": "Feature",
  "properties": {
    "datetime": "2018-05-29T11:54:01Z",
    "eo:bands": [
      {
        "common_name": "Blue",
        "description": "Blue",
        "name": "B02"
      },
      {
        "common_name": "Green",
        "description": "Green",
        "name": "B03"
      },
      {
        "common_name": "Red",
        "description": "Red",
        "name": "B04"
      }
    ],
    "eo:constellation": "Sentinel-2",
    "eo:gsd": 30,
    "eo:instrument": "MSI",
    "eo:platform": "Sentinel-2"
  },
  "assets": {
    "B02": {
      "eo:bands": [],
      "href": "https://api.radiant.earth/mlhub/v1/download/B02DOWNLOAD",
      "title": "S2A_MSIL2A_20180529T115401_77_67_B02",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized"
    },
    "B03": {
      "eo:bands": [],
      "href": "https://api.radiant.earth/mlhub/v1/download/B03DOWNLOAD",
      "title": "S2A_MSIL2A_20180529T115401_77_67_B03",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized"
    },
    "B04": {
      "eo:bands": [],
      "href": "https://api.radiant.earth/mlhub/v1/download/B04DOWNLOAD",
      "title": "S2A_MSIL2A_20180529T115401_77_67_B04",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized"
    }
  },
  "bbox": [
    -7.548429099733649,
    3.7845317195412,
    36.08348843294896,
    55.214656159591776
  ],
  "geometry": {
    "coordinates": [
      [
        [36.08348843294896, 3.7845317195412],
        [36.08348843294896, 55.214656159591776],
        [-7.548429099733649, 55.214656159591776],
        [-7.548429099733649, 3.7845317195412],
        [36.08348843294896, 3.7845317195412]
      ]
    ],
    "type": "Polygon"
  },
  "stac_extensions": [
    "eo"
  ]
}
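As a hedged illustration of reading the ‘when’, the ‘where’, and the asset links from such an item, the sketch below works on a trimmed copy of the example. The function name summarize_source_item is hypothetical. The bbox is read in the GeoJSON order of west, south, east, north.

```python
from datetime import datetime, timezone


def summarize_source_item(item):
    """Extract acquisition time, bounding box, and asset hrefs from an item."""
    # The trailing "Z" marks UTC; parse it explicitly and attach the timezone.
    taken = datetime.strptime(
        item["properties"]["datetime"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    west, south, east, north = item["bbox"]
    hrefs = {name: asset["href"] for name, asset in item["assets"].items()}
    return {"datetime": taken, "bbox": (west, south, east, north), "assets": hrefs}


# Trimmed copy of the example item above (only the fields used here).
item = {
    "properties": {"datetime": "2018-05-29T11:54:01Z"},
    "assets": {
        "B02": {"href": "https://api.radiant.earth/mlhub/v1/download/B02DOWNLOAD"},
        "B03": {"href": "https://api.radiant.earth/mlhub/v1/download/B03DOWNLOAD"},
        "B04": {"href": "https://api.radiant.earth/mlhub/v1/download/B04DOWNLOAD"},
    },
    "bbox": [-7.548429099733649, 3.7845317195412,
             36.08348843294896, 55.214656159591776],
}

summary = summarize_source_item(item)
print(summary["datetime"].isoformat())
print(sorted(summary["assets"]))
```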

Recommended Workflow

The workflow we recommend for using training data hosted on Radiant MLHub is as follows. First, make an API request to the /items endpoint of the label collection you wish to download. This endpoint returns a limited number of label items per request, so you’ll need to page through the results. We have an example of paging in our BigEarthNet Tutorial Notebook. Next, for each label item, download the label file listed in the assets property. Then find all the links to source imagery within the links property. Source imagery links have a “rel” type of “source.” For each one, make an API request to the source imagery link and download the assets contained in the response. Source imagery is sometimes shared between multiple label items, so keep track of which source imagery you have already downloaded. Once you have both the labels and the source imagery downloaded, you can load the data and begin training your model.
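The workflow above can be sketched as a single function that gathers what needs downloading. This is one possible arrangement, not the tutorial notebook's code. It assumes that each items page lists its label items under a “features” key, as STAC item listings generally do. It also assumes that fetch_json is a callable you provide that performs the authenticated request and returns parsed JSON. The name plan_downloads is hypothetical, and the download step itself is omitted.

```python
def plan_downloads(items_url, fetch_json):
    """Walk a label collection and collect label and source asset links.

    Returns (label_hrefs, source_assets), where `source_assets` maps each
    source item URL to a dict of {asset name: href}.
    """
    label_hrefs = []
    source_assets = {}  # doubles as the record of sources already fetched
    url = items_url
    while url:
        page = fetch_json(url)
        for item in page.get("features", []):  # assumed key for the item list
            label = item.get("assets", {}).get("labels")
            if label:
                label_hrefs.append(label["href"])
            for link in item.get("links", []):
                if link.get("rel") != "source":
                    continue
                source_url = link["href"]
                if source_url in source_assets:
                    continue  # shared imagery: fetch it only once
                source_item = fetch_json(source_url)
                source_assets[source_url] = {
                    name: asset["href"]
                    for name, asset in source_item.get("assets", {}).items()
                }
        # Follow the "next" link, if any, to reach the following page.
        url = next((link["href"] for link in page.get("links", [])
                    if link.get("rel") == "next"), None)
    return label_hrefs, source_assets
```

Keying source_assets by the source item URL is what prevents shared imagery from being requested more than once.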
