Radiant MLHub Python Client — Beta Release
Using the Python client to discover and download training datasets without managing API requests.
We are excited to announce the first beta release of the radiant_mlhub library, a Python client for working with the Radiant MLHub API! With this release, users can work with Radiant MLHub datasets through an intuitive Python interface without having to worry about constructing API requests and managing authentication.
The library is still in the early stages of development, but we encourage you to try it out and give us feedback on how well it addresses your use cases. This article will walk you through installing and configuring the library, navigating datasets and their collections, and downloading training datasets. For complete documentation of the Python library, please see the official docs here. A basic knowledge of Python programming is recommended.
Radiant MLHub API
Radiant MLHub is an open library for geospatial training data to advance machine learning applications on Earth observations. It serves as a resource for a community of practice, giving data scientists benchmarks to train and validate their models and improve their performance. You can read more about the API and available datasets in this article:
The radiant_mlhub client is intended to be a user-friendly interface to this API for Python users. Some features of the library include:
- Simple authentication handling
- Automatic retrying of failed requests
- Working with datasets and collections as Python objects (using PySTAC)
Installation
The radiant_mlhub library requires Python 3.6 or above. To install the library using pip:
$ pip install radiant-mlhub
Authentication
To access the API, you must be authenticated with an API key, which you can acquire on the Radiant MLHub Dashboard. Once you have created an account, go to the “API Keys” tab in the top-left of the dashboard to create an API key.
The radiant_mlhub library allows you to store this API key locally and automatically adds it to all requests. The simplest way to configure the library to use your key is with a "profile". You can create a profile using the mlhub configure CLI command:
$ mlhub configure
API Key: <COPY_YOUR_API_KEY_HERE>
This command will save the API key to a .mlhub/profiles file in your home directory (typically something like /Users/username on macOS and C:\Users\UserName on Windows). If you do not have write access to your home directory, or prefer not to save the API key to disk, you can instead set an MLHUB_API_KEY environment variable to the value of your API key.
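The saved profiles file follows a simple INI-style layout (a named section per profile with an api_key entry, similar to AWS credentials files); the exact schema shown here is an assumption for illustration, so check the official docs for the authoritative format. A minimal sketch of reading a key out of such a file with the standard library:

```python
import configparser

# An in-memory stand-in for ~/.mlhub/profiles; the [default] section
# and api_key field are assumptions about the file's layout.
SAMPLE_PROFILES = """
[default]
api_key = not-a-real-key
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_PROFILES)

# Pull the API key for the default profile
api_key = parser['default']['api_key']
print(api_key)
```
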
Working with Datasets
Datasets in Radiant MLHub are groups of labels along with their associated source imagery. In most cases, these datasets are themselves composed of separate collections, where each collection contains either labels or source imagery.
We can use the radiant_mlhub.Dataset class to list datasets and print the ID and title of each one:
>>> from radiant_mlhub import Dataset
... print(f'{dataset.title} ({dataset.id})')
BigEarthNet (bigearthnet_v1)
Chesapeake Land Cover (microsoft_chesapeake)
CV4A Kenya Crop Type Competition (ref_african_crops_kenya_02)
Dalberg Data Insights Crop Type Uganda (ref_african_crops_uganda_01)
Great African Food Company Crop Type Tanzania (ref_african_crops_tanzania_01)
LandCoverNet (landcovernet_v1)
Open Cities AI Challenge (open_cities_ai_challenge)
PlantVillage Crop Type Kenya (ref_african_crops_kenya_01)
Semantic Segmentation of Crop Type in Ghana (su_african_crops_ghana)
Semantic Segmentation of Crop Type in South Sudan (su_african_crops_south_sudan)
Spacenet 1 (spacenet1)
Spacenet 2 (spacenet2)
Spacenet 3 (spacenet3)
Spacenet 4 (spacenet4)
Spacenet 5 (spacenet5)
Spacenet 6 (spacenet6)
Spacenet 7 (spacenet7)
Tropical Cyclone Wind Estimation Competition (nasa_tropical_storm_competition)
Western USA Live Fuel Moisture (su_sar_moisture_content_main)
We can see that there are 19 distinct datasets available via the API. Once we know the ID of a dataset, we can fetch it from the API directly using the Dataset.fetch method. For our demo, let's use the dataset from the Tropical Cyclone Wind Estimation Competition (nasa_tropical_storm_competition):
>>> dataset = Dataset.fetch('nasa_tropical_storm_competition')
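Since the listing above is just ordinary Python data, you can also narrow it down client-side before fetching; a minimal sketch using plain list comprehensions over a few of the IDs printed earlier:

```python
# A handful of the dataset IDs printed by Dataset.list() above
dataset_ids = [
    'bigearthnet_v1',
    'ref_african_crops_kenya_01',
    'ref_african_crops_kenya_02',
    'nasa_tropical_storm_competition',
]

# Find every dataset whose ID mentions Kenya
kenya_ids = [d for d in dataset_ids if 'kenya' in d]
print(kenya_ids)
```

Each matching ID could then be passed to Dataset.fetch as shown above.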
Source Imagery and Label Collections
We can use the Dataset.collections property to inspect the source imagery and label collections associated with our dataset:
>>> dataset.collections
[<Collection id=nasa_tropical_storm_competition_train_labels>, <Collection id=nasa_tropical_storm_competition_train_source>, <Collection id=nasa_tropical_storm_competition_test_source>]
We can filter this down to just the source imagery or label collections for the dataset using the Dataset.collections.source_imagery and Dataset.collections.labels attributes, respectively:
>>> dataset.collections.source_imagery
[<Collection id=nasa_tropical_storm_competition_train_source>,
<Collection id=nasa_tropical_storm_competition_test_source>]
>>> dataset.collections.labels
[<Collection id=nasa_tropical_storm_competition_train_labels>]
Each of these collections is a PySTAC Collection instance, so we can use the methods and properties from that class to get more information about the collection.
>>> labels_collection = dataset.collections.labels[0]
>>> labels_collection.id
'nasa_tropical_storm_competition_train_labels'
>>> labels_collection.title
'NASA Tropical Storm Wind Speed Competition Training Labels'
>>> labels_collection.extent.spatial.to_dict()
{'bbox': [[-180.0, -90.0, 180.0, 90.0]]}
>>> labels_collection.extent.temporal.to_dict()
{'interval': [['2000-01-01T00:00:00Z', '2019-12-31T00:00:00Z']]}
>>> labels_collection.license
'CC-BY-4.0'
The data from this collection has a global extent, covers the years 2000–2019, and is licensed under the Creative Commons Attribution 4.0 International license.
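The bbox and interval values above follow STAC conventions (bbox order is [west, south, east, north]), so it is straightforward to test whether a point and date of interest fall inside a collection's extents. A small self-contained sketch using the values printed above:

```python
from datetime import datetime, timezone

# Extents as reported by the labels collection above
bbox = [-180.0, -90.0, 180.0, 90.0]
interval = (datetime(2000, 1, 1, tzinfo=timezone.utc),
            datetime(2019, 12, 31, tzinfo=timezone.utc))

def in_extent(lon, lat, when):
    """Return True if (lon, lat) lies in bbox and `when` is within the interval."""
    west, south, east, north = bbox
    in_bbox = west <= lon <= east and south <= lat <= north
    in_time = interval[0] <= when <= interval[1]
    return in_bbox and in_time

print(in_extent(36.8, -1.3, datetime(2010, 6, 1, tzinfo=timezone.utc)))
```
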
Download Collection Archives
To make it easier to download all of the assets for a given collection, Radiant MLHub provides compressed archives for each collection available through the API. You can read more about the content and structure of these archives in this post:
The Python client provides a convenient Collection.download method to download these archives. We'll give this method a path to a directory into which the archive will be downloaded (the file name of the archive will be determined automatically from the remote file):
>>> labels_collection.download('~/Downloads')
PosixPath('/Users/username/Downloads/nasa_tropical_storm_competition_train_labels.tar.gz')
We can also download the archives for all collections in a dataset using the Dataset.download method:
>>> dataset.download('~/Downloads')
[PosixPath('/Users/jduckworth/Downloads/nasa_tropical_storm_competition_train_labels.tar.gz'),
PosixPath('/Users/jduckworth/Downloads/nasa_tropical_storm_competition_train_source.tar.gz'),
PosixPath('/Users/jduckworth/Downloads/nasa_tropical_storm_competition_test_source.tar.gz')]
In our case, this will download the archives for all 3 of the collections associated with the nasa_tropical_storm_competition dataset.
If a file already exists, the library will check whether it is complete by comparing its size to the size of the archive on the server. If the download is complete, it will skip that file; otherwise, it will resume downloading from the point where the last download stopped. This means that if a download fails due to a broken connection, you can simply call the download method again to pick up where you left off. The client will also automatically retry requests that fail due to a connection error, up to 10 times, before raising an exception.
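The skip/resume behavior described above boils down to a size comparison; as a rough illustration (not the client's actual internals), the decision could be sketched as a pure function over the local file size and the size reported by the server:

```python
def download_action(local_size, remote_size):
    """Decide what to do with an archive download.

    local_size:  size in bytes of the file on disk, or None if absent.
    remote_size: size in bytes of the archive on the server
                 (e.g. from a Content-Length header).
    """
    if local_size is None:
        return 'download'   # no partial file on disk yet
    if local_size >= remote_size:
        return 'skip'       # archive already complete
    return 'resume'         # pick up where the last download stopped

print(download_action(None, 1000))
print(download_action(1000, 1000))
print(download_action(400, 1000))
```
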
Now that we have all of the archives downloaded, we can use the tar utility to extract them and take a look at the data:
$ cd ~/Downloads
$ tar -xzf nasa_tropical_storm_competition_train_source.tar.gz
$ ls nasa_tropical_storm_competition_train_source | head -n 5
collection.json
nasa_tropical_storm_competition_train_source_abs_000
nasa_tropical_storm_competition_train_source_abs_001
nasa_tropical_storm_competition_train_source_abs_002
nasa_tropical_storm_competition_train_source_abs_003
$ ls nasa_tropical_storm_competition_train_source/nasa_tropical_storm_competition_train_source_abs_000
features.json image.jpg stac.json
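If you would rather stay in Python, the standard library's tarfile module can do the same extraction as the tar command above. This sketch round-trips a tiny stand-in archive in a temporary directory so it runs as-is; real archives would come from the .download() calls shown earlier:

```python
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    tmp = Path(tmpdir)

    # Build a tiny stand-in archive containing a collection.json
    (tmp / 'collection.json').write_text('{}')
    archive = tmp / 'demo.tar.gz'
    with tarfile.open(str(archive), 'w:gz') as tar:
        tar.add(str(tmp / 'collection.json'), arcname='demo/collection.json')

    # Extract it, as `tar -xzf` would
    out = tmp / 'extracted'
    with tarfile.open(str(archive), 'r:gz') as tar:
        tar.extractall(str(out))

    extracted = sorted(p.name for p in (out / 'demo').iterdir())
    print(extracted)
```
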
The collection.json and stac.json files are STAC Collection and Item objects, respectively, and the features.json file contains training features associated with the image in image.jpg (storm ID, relative time, etc.).
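With that layout in hand, iterating over the extracted items is a matter of walking the directory tree and pairing each features.json with its image. A hedged sketch: the directory structure follows the listing above, but the field name inside features.json here (storm_id) is purely illustrative; a small demo tree is built so the example runs as-is:

```python
import json
import tempfile
from pathlib import Path

def load_items(collection_dir):
    """Yield (item_id, features, image_path) for each item sub-directory."""
    for item_dir in sorted(p for p in Path(collection_dir).iterdir() if p.is_dir()):
        features_file = item_dir / 'features.json'
        if features_file.exists():
            features = json.loads(features_file.read_text())
            yield item_dir.name, features, item_dir / 'image.jpg'

with tempfile.TemporaryDirectory() as tmp:
    # Build one demo item directory mimicking the listing above
    item = Path(tmp) / 'nasa_tropical_storm_competition_train_source_abs_000'
    item.mkdir()
    (item / 'features.json').write_text('{"storm_id": "abs"}')  # hypothetical field
    (item / 'image.jpg').write_bytes(b'')

    items = list(load_items(tmp))
    print(items[0][0])
```
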
Conclusion
We’ve seen how we can use the radiant_mlhub Python client to manage API authentication, discover and fetch datasets, and download assets for those datasets. For more examples of using Radiant MLHub to download and visualize training data, check out the Radiant MLHub Tutorials repository.
Development of the radiant_mlhub library is ongoing, and we plan to issue a second beta release near the end of April 2021. We hope to include the following improvements and features in that release:
- More detailed collection descriptions
- Collection-level summaries of image formats, number of images, size of archives, and other useful information
- Publish as a conda-forge package
- More powerful spatiotemporal search capabilities
Get Involved!
Though the library is still in a beta stage, we encourage users to try it out and send us feedback; we want to know how we can make the Python client and the API more useful for on-the-ground practitioners of machine learning on satellite imagery.
You can report bugs or feature requests for the radiant_mlhub Python client using GitHub issues. For questions and troubleshooting regarding the Radiant MLHub API, you can email us at support@radiant.earth. To stay connected to the Radiant MLHub community, join the Radiant MLHub Slack channel.
Thanks, and we look forward to hearing from you!