RarePlanes — Dataset, Paper, and Code Release

Jake Shermeyer
Jun 9, 2020 · 3 min read

Today, in collaboration with AI.Reverie, we are proud to announce the release of the RarePlanes dataset, research paper, and codebase. RarePlanes is a unique open-source machine learning dataset that incorporates both real and synthetically generated satellite imagery. The dataset is designed to test how much synthetic data helps computer vision algorithms automatically detect aircraft and their attributes in satellite imagery. Although other synthetic/real combination datasets exist, RarePlanes is the largest openly available very-high-resolution dataset built to test the value of synthetic data from an overhead perspective.

Example of the real and synthetic datasets present in RarePlanes. Can you spot the difference? The top two rows feature the real Maxar WorldView-3 satellite imagery and the bottom two rows show the AI.Reverie synthetic data. The dataset features variable weather conditions, biomes, and ground surface types.

The real portion of the dataset consists of 253 Maxar WorldView-3 satellite scenes spanning 112 locations and 2,142 km², with ~14,700 hand-annotated aircraft. The accompanying synthetic dataset is generated via the novel AI.Reverie simulation platform and features 50,000 synthetic satellite images with ~600,000 aircraft annotations. Both the real and synthetically generated aircraft feature 10 fine-grained attributes: aircraft length, wingspan, FAA wingspan class, wing shape, wing position, propulsion, number of engines, number of vertical stabilizers, presence of canards, and aircraft role. The paper also presents a series of experiments that evaluate the real and synthetic datasets and compare their performance. These experiments show the value of synthetic data for the task of detecting and classifying aircraft from an overhead perspective.

RarePlanes Locations. A map of all of the real (blue dots) and synthetic (red dots) locations contained in the dataset.

Download Information

The dataset is made available via the AWS Open Data Program, permissively licensed (CC BY-SA 4.0), and can now be downloaded for free. All you need is an AWS account and the AWS CLI installed and configured. Once you’ve done that, simply run the command(s) below to download the datasets to your working directory!

Real (~107 GB):
aws s3 cp --recursive s3://rareplanes-public/real/tarballs/ .
Synthetic (~211 GB):
aws s3 cp --recursive s3://rareplanes-public/synthetic/ .
Model Weights (~4 GB):
aws s3 cp --recursive s3://rareplanes-public/weights/ .
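
If you would rather script the downloads, the commands above can be wrapped in a few lines of Python. This is only a minimal sketch: the S3 prefixes and approximate sizes are copied from this post, while the function names (`required_gb`, `free_gb`, `build_command`, `download`) are made up for illustration and are not part of the RarePlanes codebase. It assumes the AWS CLI is already installed and configured.

```python
import shlex
import shutil
import subprocess

# S3 prefixes and approximate sizes (GB) as quoted in this post.
SOURCES = {
    "real": ("s3://rareplanes-public/real/tarballs/", 107),
    "synthetic": ("s3://rareplanes-public/synthetic/", 211),
    "weights": ("s3://rareplanes-public/weights/", 4),
}


def required_gb(names):
    """Approximate download size in GB for the chosen portions."""
    return sum(SOURCES[name][1] for name in names)


def free_gb(dest="."):
    """Free disk space in GB at an existing destination directory."""
    return shutil.disk_usage(dest).free / 1e9


def build_command(name, dest=".", dry_run=False):
    """Return the `aws s3 cp` argument list for one portion of the dataset."""
    prefix, _size_gb = SOURCES[name]
    cmd = ["aws", "s3", "cp", "--recursive", prefix, dest]
    if dry_run:
        # --dryrun makes the AWS CLI list what it would copy without copying.
        cmd.append("--dryrun")
    return cmd


def download(name, dest=".", dry_run=True):
    """Run the copy; defaults to a dry run so nothing is fetched by accident."""
    cmd = build_command(name, dest, dry_run)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `required_gb(["real", "weights"])` gives the rough space needed for the real imagery plus the model weights, which you can compare against `free_gb(".")` before calling `download("real", dry_run=False)`.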

The Paper

The paper details the dataset and baseline experiments we conducted and can be read here:


Example of aircraft detection results. (a) Ground truth, (b) model trained on the real dataset, (c) model trained on the synthetic dataset, (d) model fine-tuned on a real subset.

The Codebase

We also provide pre-processing code to work with the dataset and create labels, including up to 110 custom classes built from combinations of the attributes:


RarePlanes Attributes and Label Scheme. The 5 features, 10 attributes, and 33 sub-attributes annotated for each aircraft.
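
To give a feel for what "custom classes from combinations of attributes" means, here is a small illustrative sketch. It is not the released pre-processing code: the field names (`role`, `num_engines`, `propulsion`) and their values are hypothetical stand-ins for the annotated attributes, and the helper names are invented for this example. The idea is simply that each distinct combination of the chosen attributes becomes its own class.

```python
def custom_class(annotation, keys):
    """Combine the chosen attribute values into one hashable class key."""
    return tuple(annotation[key] for key in keys)


def build_class_index(annotations, keys):
    """Assign a stable integer id to every attribute combination present."""
    combos = sorted({custom_class(a, keys) for a in annotations})
    return {combo: class_id for class_id, combo in enumerate(combos)}


# Hypothetical annotations; the real files use their own schema.
annotations = [
    {"role": "civil_transport", "num_engines": 2, "propulsion": "jet"},
    {"role": "civil_transport", "num_engines": 4, "propulsion": "jet"},
    {"role": "military_fighter", "num_engines": 2, "propulsion": "jet"},
    {"role": "civil_transport", "num_engines": 2, "propulsion": "jet"},
]

# Classes defined by aircraft role and engine count together.
index = build_class_index(annotations, ["role", "num_engines"])
labels = [index[custom_class(a, ["role", "num_engines"])] for a in annotations]
```

Choosing more attributes yields more, finer classes; choosing a single attribute collapses the scheme back to that attribute's own values.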

The User Guide

Finally, we provide a user guide, as well as a full listing of all of the content featured in this blog post, on the CosmiQ Works website:


What’s Next?

Although this post represents the end of the runway on the initial RarePlanes research study, we plan to have more great RarePlanes content coming up. Watch the DownLinQ and the skies and you will see some more planes in the future.

The DownLinQ

Welcome to the archived blog of CosmiQ Works, an IQT Lab

As of March 2021, CosmiQ Works has been folded into IQT Labs. An archive will remain here to showcase historical work from CosmiQ Works that took place July 2016 — March 2021.

Written by Jake Shermeyer

Data Scientist at Capella Space. Formerly CosmiQ Works.
