RarePlanes — Dataset, Paper, and Code Release

Jake Shermeyer
The DownLinQ
3 min read · Jun 9, 2020

Today, in collaboration with AI.Reverie, we are proud to announce the release of the RarePlanes dataset, research paper, and codebase. RarePlanes is a unique open-source machine learning dataset that incorporates both real and synthetically generated satellite imagery. The dataset focuses on how much synthetic data can help computer vision algorithms automatically detect aircraft and their attributes in satellite imagery. Although other datasets that combine synthetic and real data exist, RarePlanes is the largest openly available very-high-resolution dataset built to test the value of synthetic data from an overhead perspective.

Example of the real and synthetic datasets present in RarePlanes. Can you spot the difference? The top two rows feature the real Maxar WorldView-3 satellite imagery and the bottom two rows show the AI.Reverie synthetic data. The dataset features variable weather conditions, biomes, and ground surface types.

The real portion of the dataset consists of 253 Maxar WorldView-3 satellite scenes spanning 112 locations and 2,142 km², with ~14,700 hand-annotated aircraft. The accompanying synthetic dataset is generated via the novel AI.Reverie simulation platform and features 50,000 synthetic satellite images with over 600,000 aircraft annotations. Both the real and synthetically generated aircraft are labeled with 10 fine-grained attributes: aircraft length, wingspan, FAA wingspan class, wing shape, wing position, propulsion, number of engines, number of vertical stabilizers, presence of canards, and aircraft role. The paper also presents a range of experiments that evaluate the real and synthetic datasets and compare the performance of models trained on each. In doing so, we show the value of synthetic data for the task of detecting and classifying aircraft from an overhead perspective.

RarePlanes Locations. A map of all of the real (blue dots) and synthetic (red dots) locations contained in the dataset.

Download Information

The dataset is made available via the AWS Open Data Program, permissively licensed (CC BY-SA 4.0), and can now be downloaded for free. All you need is an AWS account and the AWS CLI installed and configured. Once you’ve done that, simply run the command(s) below to download the datasets to your working directory!

Real (~107 GB):
aws s3 cp --recursive s3://rareplanes-public/real/tarballs/ .
Synthetic (~211 GB):
aws s3 cp --recursive s3://rareplanes-public/synthetic/ .
Model Weights (~4 GB):
aws s3 cp --recursive s3://rareplanes-public/weights/ .
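
Given the size of these downloads, it can be worth previewing a transfer before starting it. The sketch below is a hypothetical Python wrapper (the function names and the `PREFIXES` table are ours, not part of the release) that builds the same `aws s3 cp` commands and, by default, appends the AWS CLI's `--dryrun` flag so that nothing is written to disk. It assumes the AWS CLI is installed and configured as described above.

```python
import subprocess

# S3 prefixes copied from the commands above; the dictionary keys are
# illustrative labels chosen for this sketch.
PREFIXES = {
    "real": "s3://rareplanes-public/real/tarballs/",
    "synthetic": "s3://rareplanes-public/synthetic/",
    "weights": "s3://rareplanes-public/weights/",
}


def build_command(subset, dest=".", dry_run=True):
    """Return the `aws s3 cp` argument list for one subset of the dataset."""
    if subset not in PREFIXES:
        raise ValueError(
            f"unknown subset {subset!r}; expected one of {sorted(PREFIXES)}"
        )
    cmd = ["aws", "s3", "cp", "--recursive", PREFIXES[subset], dest]
    if dry_run:
        # --dryrun lists the operations that would run without copying data.
        cmd.append("--dryrun")
    return cmd


def download(subset, dest=".", dry_run=True):
    """Run the command; raises CalledProcessError if the AWS CLI fails."""
    subprocess.run(build_command(subset, dest, dry_run), check=True)


if __name__ == "__main__":
    # Show the preview command for the smallest subset; nothing is executed.
    print(" ".join(build_command("weights")))
```

Calling `download("real", dry_run=False)` would then perform the actual copy into the working directory, just as the first command above does.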

The Paper

The paper details the dataset and baseline experiments we conducted and can be read here:

https://arxiv.org/abs/2006.02963

Example of aircraft detection results. (a) ground truth, (b) model trained on the real dataset, (c) model trained on the synthetic dataset, (d) model fine-tuned on a subset of the real dataset.

The Codebase

We also provide pre-processing code for working with the dataset, creating labels, and building up to 110 custom classes from combinations of the attributes:

https://github.com/aireveries/RarePlanes

RarePlanes Attributes and Label Scheme. The 5 features, 10 attributes, and 33 sub-attributes annotated for each aircraft.
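
To illustrate what building custom classes from combinations of attributes can look like, here is a minimal Python sketch. It is not the code from the repository: the attribute keys (`role`, `num_engines`), the sample values, and the helper functions are all hypothetical, and the released annotations may use different names and encodings.

```python
from collections import Counter

# Hypothetical annotations: keys and values are illustrative only and
# may not match the field names used in the released label files.
annotations = [
    {"id": 1, "role": "civil_transport", "num_engines": 2},
    {"id": 2, "role": "civil_transport", "num_engines": 4},
    {"id": 3, "role": "military_fighter", "num_engines": 1},
    {"id": 4, "role": "civil_transport", "num_engines": 2},
]


def make_class_label(annotation, attributes):
    """Join the chosen attribute values into a single custom class name."""
    return "/".join(f"{name}={annotation[name]}" for name in attributes)


def build_class_index(annotations, attributes):
    """Map every observed attribute combination to an integer class id."""
    labels = sorted({make_class_label(a, attributes) for a in annotations})
    return {label: idx for idx, label in enumerate(labels)}


chosen = ("role", "num_engines")
class_index = build_class_index(annotations, chosen)
counts = Counter(make_class_label(a, chosen) for a in annotations)

# Print each class id, its label, and how many aircraft fall into it.
for label, idx in class_index.items():
    print(idx, label, counts[label])
```

In this sketch only combinations that actually occur in the annotations receive a class id, so the number of classes depends on which attributes are combined and on the data itself.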

The User Guide

Finally, we provide a user guide, as well as a full listing of all the content featured in this blog post, on the CosmiQ Works website:

https://CosmiQWorks.org/RarePlanes

What’s Next?

Although this post represents the end of the runway for the initial RarePlanes research study, we plan to have more great RarePlanes content coming up. Watch the DownLinQ and the skies, and you will see some more planes in the future.


Jake Shermeyer
The DownLinQ

Data Scientist at Capella Space. Formerly CosmiQ Works.