Radiant MLHub Python Client — Beta Release
Using the Python client to discover and download training datasets without managing API requests.
We are excited to announce the first beta release of the radiant_mlhub library, a Python client for working with the Radiant MLHub API! With this release, users can work with Radiant MLHub datasets through an intuitive Python interface without having to worry about constructing API requests and managing authentication.
The library is still in the early stages of development, but we encourage you to try it out and give us feedback on how well it addresses your use cases. This article will walk you through installing and configuring the library, navigating datasets and their collections, and downloading training datasets. For complete documentation of the Python library, please see the official docs here. A basic knowledge of Python programming is recommended.
Radiant MLHub API
Radiant MLHub is an open library for geospatial training data to advance machine learning applications on Earth observations. It serves as a resource for a community of practice, giving data scientists benchmarks to train and validate their models and improve their performance. You can read more about the API and available datasets in this article:
The radiant_mlhub client is intended to be a user-friendly interface to this API for Python users. Some features of the library include:
- Simple authentication handling
- Automatic retrying of failed requests
- Working with datasets and collections as Python objects (using PySTAC)
Installation
The radiant_mlhub library requires Python 3.6 or above. To install the library using pip:
$ pip install radiant-mlhub
Authentication
To access the API, you must be authenticated with an API key, which you can acquire on the Radiant MLHub Dashboard. Once you have created an account, go to the “API Keys” tab in the top-left of the dashboard to create an API key.
The radiant_mlhub library allows you to store this API key locally and automatically adds it to all requests. The simplest way to configure the library to use your key is with a "profile". You can create a profile using the mlhub configure CLI command:
$ mlhub configure
API Key: <COPY_YOUR_API_KEY_HERE>
This command will save the API key to a .mlhub/profiles file in your home directory (typically something like /Users/username on macOS and C:\Users\UserName on Windows). If you do not have write access to your home directory, or prefer not to save the API key to disk, you can instead set an MLHUB_API_KEY environment variable to the value of your API key.
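The saved profiles file follows a simple INI-style layout (a named section per profile with an api_key entry, similar to AWS credentials files); the exact schema shown here is an assumption for illustration, so check the official docs for the authoritative format. A minimal sketch of reading a key out of such a file with the standard library:

```python
import configparser

# An in-memory stand-in for ~/.mlhub/profiles; the [default] section
# and api_key field are assumptions about the file's layout.
SAMPLE_PROFILES = """
[default]
api_key = not-a-real-key
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_PROFILES)

# Pull the API key for the default profile
api_key = parser['default']['api_key']
print(api_key)
```
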
Working with Datasets
Datasets in Radiant MLHub are groups of labels along with their associated source imagery. In most cases, these datasets are themselves composed of separate collections, where each collection contains either labels or source imagery.
We can use the radiant_mlhub.Dataset class to list datasets and print the ID and title of each one:
>>> from radiant_mlhub import Dataset
... print(f'{dataset.title} ({dataset.id})')
BigEarthNet (bigearthnet_v1)
Chesapeake Land Cover (microsoft_chesapeake)
CV4A Kenya Crop Type Competition (ref_african_crops_kenya_02)
Dalberg Data Insights Crop Type Uganda (ref_african_crops_uganda_01)
Great African Food Company Crop Type Tanzania (ref_african_crops_tanzania_01)
LandCoverNet (landcovernet_v1)
Open Cities AI Challenge (open_cities_ai_challenge)
PlantVillage Crop Type Kenya (ref_african_crops_kenya_01)
Semantic Segmentation of Crop Type in Ghana (su_african_crops_ghana)
Semantic Segmentation of Crop Type in South Sudan (su_african_crops_south_sudan)
Spacenet 1 (spacenet1)
Spacenet 2 (spacenet2)
Spacenet 3 (spacenet3)
Spacenet 4 (spacenet4)
Spacenet 5 (spacenet5)
Spacenet 6 (spacenet6)
Spacenet 7 (spacenet7)
Tropical Cyclone Wind Estimation Competition (nasa_tropical_storm_competition)
Western USA Live Fuel Moisture (su_sar_moisture_content_main)
We can see that there are 19 distinct datasets available via the API. Once we know the ID of a dataset, we can fetch it from the API directly using the Dataset.fetch method. For our demo, let's use the dataset from the Tropical Cyclone Wind Estimation Competition (nasa_tropical_storm_competition):
>>> dataset = Dataset.fetch('nasa_tropical_storm_competition')
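Since the listing above is just ordinary Python data, you can also narrow it down client-side before fetching; a minimal sketch using plain list comprehensions over a few of the IDs printed earlier:

```python
# A handful of the dataset IDs printed by Dataset.list() above
dataset_ids = [
    'bigearthnet_v1',
    'ref_african_crops_kenya_01',
    'ref_african_crops_kenya_02',
    'nasa_tropical_storm_competition',
]

# Find every dataset whose ID mentions Kenya
kenya_ids = [d for d in dataset_ids if 'kenya' in d]
print(kenya_ids)
```

Each matching ID could then be passed to Dataset.fetch as shown above.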
Source Imagery and Label Collections
We can use the Dataset.collections property to inspect the source imagery and label collections associated with our dataset:
>>> dataset.collections
[<Collection id=nasa_tropical_storm_competition_train_labels>, <Collection id=nasa_tropical_storm_competition_train_source>, <Collection id=nasa_tropical_storm_competition_test_source>]
We can filter this down to just the source imagery or label collections for the dataset using the Dataset.collections.source_imagery and Dataset.collections.labels attributes, respectively:
>>> dataset.collections.source_imagery
[<Collection id=nasa_tropical_storm_competition_train_source>,
<Collection id=nasa_tropical_storm_competition_test_source>]
>>> dataset.collections.labels
[<Collection id=nasa_tropical_storm_competition_train_labels>]
Each of these collections is a PySTAC Collection instance, so we can use the methods and properties from that class to get more information about the collection.
>>> labels_collection = dataset.collections.labels[0]
>>> labels_collection.id
'nasa_tropical_storm_competition_train_labels'
>>> labels_collection.title
'NASA Tropical Storm Wind Speed Competition Training Labels'
>>> labels_collection.extent.spatial.to_dict()
{'bbox': [[-180.0, -90.0, 180.0, 90.0]]}
>>> labels_collection.extent.temporal.to_dict()
{'interval': [['2000-01-01T00:00:00Z', '2019-12-31T00:00:00Z']]}
>>> labels_collection.license
'CC-BY-4.0'
The data from this collection has a global extent, covers the years 2000–2019, and is licensed under the Creative Commons Attribution 4.0 International license.
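The bbox and interval values above follow STAC conventions (bbox order is [west, south, east, north]), so it is straightforward to test whether a point and date of interest fall inside a collection's extents. A small self-contained sketch using the values printed above:

```python
from datetime import datetime, timezone

# Extents as reported by the labels collection above
bbox = [-180.0, -90.0, 180.0, 90.0]
interval = (datetime(2000, 1, 1, tzinfo=timezone.utc),
            datetime(2019, 12, 31, tzinfo=timezone.utc))

def in_extent(lon, lat, when):
    """Return True if (lon, lat) lies in bbox and `when` is within the interval."""
    west, south, east, north = bbox
    in_bbox = west <= lon <= east and south <= lat <= north
    in_time = interval[0] <= when <= interval[1]
    return in_bbox and in_time

print(in_extent(36.8, -1.3, datetime(2010, 6, 1, tzinfo=timezone.utc)))
```
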
Download Collection Archives
To make it easier to download all of the assets for a given collection, Radiant MLHub provides compressed archives for each collection available through the API. You can read more about the content and structure of these archives in this post:
The Python client provides a convenient Collection.download method to download these archives. We'll give this method a path to a directory into which the archive will be downloaded (the file name of the archive will be determined automatically from the remote file):
>>> labels_collection.download('~/Downloads')
PosixPath('/Users/username/Downloads/nasa_tropical_storm_competition_train_labels.tar.gz')
We can also download the archives for all collections in a dataset using the Dataset.download method:
>>> dataset.download('~/Downloads')
[PosixPath('/Users/jduckworth/Downloads/nasa_tropical_storm_competition_train_labels.tar.gz'),
PosixPath('/Users/jduckworth/Downloads/nasa_tropical_storm_competition_train_source.tar.gz'),
PosixPath('/Users/jduckworth/Downloads/nasa_tropical_storm_competition_test_source.tar.gz')]
In our case, this will download the archives for all 3 of the collections associated with the nasa_tropical_storm_competition dataset.
If a file already exists, the library will check whether it is complete by comparing its size to the size of the archive on the server. If the download is complete, it will skip that file; otherwise, it will resume downloading from the point where the last download stopped. This means that if a download fails due to a broken connection, you can simply call the download method again to pick up where you left off. The client will also automatically retry requests that fail due to a connection error, up to 10 times, before raising an exception.
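The skip/resume behavior described above boils down to a size comparison; as a rough illustration (not the client's actual internals), the decision could be sketched as a pure function over the local file size and the size reported by the server:

```python
def download_action(local_size, remote_size):
    """Decide what to do with an archive download.

    local_size:  size in bytes of the file on disk, or None if absent.
    remote_size: size in bytes of the archive on the server
                 (e.g. from a Content-Length header).
    """
    if local_size is None:
        return 'download'   # no partial file on disk yet
    if local_size >= remote_size:
        return 'skip'       # archive already complete
    return 'resume'         # pick up where the last download stopped

print(download_action(None, 1000))
print(download_action(1000, 1000))
print(download_action(400, 1000))
```
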
Now that we have all of the archives downloaded, we can use the tar utility to extract them and take a look at the data:
$ cd ~/Downloads
$ tar -xzf nasa_tropical_storm_competition_train_source.tar.gz
$ ls nasa_tropical_storm_competition_train_source | head -n 5
collection.json
nasa_tropical_storm_competition_train_source_abs_000
nasa_tropical_storm_competition_train_source_abs_001
nasa_tropical_storm_competition_train_source_abs_002
nasa_tropical_storm_competition_train_source_abs_003
$ ls nasa_tropical_storm_competition_train_source/nasa_tropical_storm_competition_train_source_abs_000
features.json image.jpg stac.json
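If you would rather stay in Python, the standard library's tarfile module can do the same extraction as the tar command above. This sketch round-trips a tiny stand-in archive in a temporary directory so it runs as-is; real archives would come from the .download() calls shown earlier:

```python
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    tmp = Path(tmpdir)

    # Build a tiny stand-in archive containing a collection.json
    (tmp / 'collection.json').write_text('{}')
    archive = tmp / 'demo.tar.gz'
    with tarfile.open(str(archive), 'w:gz') as tar:
        tar.add(str(tmp / 'collection.json'), arcname='demo/collection.json')

    # Extract it, as `tar -xzf` would
    out = tmp / 'extracted'
    with tarfile.open(str(archive), 'r:gz') as tar:
        tar.extractall(str(out))

    extracted = sorted(p.name for p in (out / 'demo').iterdir())
    print(extracted)
```
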
The collection.json and stac.json files are STAC Collection and Item objects, respectively, and the features.json file contains training features associated with the image in image.jpg (storm ID, relative time, etc.).
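With that layout in hand, iterating over the extracted items is a matter of walking the directory tree and pairing each features.json with its image. A hedged sketch: the directory structure follows the listing above, but the field name inside features.json here (storm_id) is purely illustrative; a small demo tree is built so the example runs as-is:

```python
import json
import tempfile
from pathlib import Path

def load_items(collection_dir):
    """Yield (item_id, features, image_path) for each item sub-directory."""
    for item_dir in sorted(p for p in Path(collection_dir).iterdir() if p.is_dir()):
        features_file = item_dir / 'features.json'
        if features_file.exists():
            features = json.loads(features_file.read_text())
            yield item_dir.name, features, item_dir / 'image.jpg'

with tempfile.TemporaryDirectory() as tmp:
    # Build one demo item directory mimicking the listing above
    item = Path(tmp) / 'nasa_tropical_storm_competition_train_source_abs_000'
    item.mkdir()
    (item / 'features.json').write_text('{"storm_id": "abs"}')  # hypothetical field
    (item / 'image.jpg').write_bytes(b'')

    items = list(load_items(tmp))
    print(items[0][0])
```
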
Conclusion
We’ve seen how we can use the radiant_mlhub Python client to manage API authentication, discover and fetch datasets, and download assets for those datasets. For more examples of using Radiant MLHub to download and visualize training data, check out the Radiant MLHub Tutorials repository.
Development of the radiant_mlhub library is ongoing, and we plan to issue a second beta release near the end of April 2021. We hope to include the following improvements and features in that release:
- More detailed collection descriptions
- Collection-level summaries of image formats, number of images, size of archives, and other useful information
- Publish as a conda-forge package
- More powerful spatiotemporal search capabilities
Get Involved!
Though the library is still in a beta stage, we encourage users to try it out and send us feedback; we want to know how we can make the Python client and the API more useful for on-the-ground practitioners of machine learning on satellite imagery.
You can report bugs or feature requests for the radiant_mlhub Python client using GitHub issues. For questions and troubleshooting regarding the Radiant MLHub API, you can email us at support@radiant.earth. To stay connected to the Radiant MLHub community, join the Radiant MLHub Slack channel.
Thanks, and we look forward to hearing from you!