You Only Look Once — Multi-Faceted Object Detection w/ RarePlanes
In a previous post, we announced the release of the RarePlanes dataset and the results of the baseline experiments. Today, we further demonstrate the dataset's versatility and its distinct utility. We trained an object detection model to identify not only aircraft but also their features, such as the number of engines and wing shape, and built a tutorial so you can do this yourself!
In this tutorial series, we outline the entire machine learning pipeline for training a YOLOv5 model on the RarePlanes dataset, from start to finish. If you wish to follow along with this blog, the GitHub repository with spin-up instructions can be found here. The AWS AMI can also be accessed here by selecting AMI and then searching for “CosmiQ_YOLO_Planes”.
A quick refresher: the RarePlanes dataset was created by CosmiQ Works and AI.Reverie by combining remote sensing data, primarily of airfields, with synthetically generated data. These images were then classified by five features, ten attributes, and thirty-three sub-attributes. Each plane was annotated by drawing a diamond from nose to wing-tip to tail and all the way around, which preserves the width and length ratios. Different aircraft features were then labeled for each annotation. More information can be found in our post here. Below is a tree of the aircraft classification taxonomy used in the dataset.
The Model (YOLOv5):
Before we dive in, a little context. We ran the following pipeline with both a semantic segmentation approach and an object detection approach. Ultimately, we settled on object detection using YOLOv5 because, somewhat unsurprisingly (hindsight is 20/20), the segmentation approach struggled to separate similar nearby objects.
You Only Look Once version 5 (YOLOv5), like its predecessors, is an object detection network. It splits the input image into a grid and, in a single forward pass, outputs a matrix of bounding boxes, confidences, and class probabilities for each grid square. These outputs are then filtered, with overlapping and low-confidence detections removed from the final predictions. Because YOLO uses this grid approach instead of the bulkier region proposal network used in R-CNN-style networks, predictions are significantly faster, allowing YOLOv5 to work in real time. We chose to work with Ultralytics’ implementation of YOLOv5 as it is incredibly straightforward, making the creation of a pipeline around the model significantly less involved than similar approaches.
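To make the filtering step more concrete, here is a minimal sketch of how per-cell outputs could be turned into scored detections. It is a simplification under stated assumptions: the function name `score_cells`, the input layout, and the 0.25 threshold are illustrative, and the real YOLOv5 head also uses anchors and multiple scales, which are omitted here.

```python
def score_cells(cell_preds, conf_thresh=0.25):
    """Keep grid-cell predictions whose combined score clears a threshold.

    cell_preds: list of (box, objectness, class_probs) tuples, where box is
    (x1, y1, x2, y2). This layout is a hypothetical simplification.
    """
    kept = []
    for box, objectness, class_probs in cell_preds:
        # Pick the most likely class for this cell.
        class_id = max(range(len(class_probs)), key=class_probs.__getitem__)
        # Combined score: objectness times the class probability.
        score = objectness * class_probs[class_id]
        if score >= conf_thresh:
            kept.append((box, class_id, score))
    return kept


preds = [
    ((0, 0, 10, 10), 0.9, [0.1, 0.8]),   # confident detection of class 1
    ((5, 5, 20, 20), 0.3, [0.5, 0.5]),   # weak detection, filtered out
]
detections = score_cells(preds)
```

Overlapping boxes that survive this step would still need to be removed with non-max suppression.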
The RarePlanes dataset includes tiled images localized around plane instances, which can be found in the `PS-RGB_tiled` directory. We recommend training on these images initially, as they increase training speed. If you’re doing this locally, you’ll need to download the data from S3 here. Once you download the images, they must be organized in the following structure (we’ve taken care of this in the AMI):
With the RarePlanes dataset, you have many options for the features you want to detect. For example, you can detect the locations of aircraft, a single attribute of aircraft, or a unique combination of attributes. If we want to detect unique combinations of attributes, the first step in the pre-processing pipeline is creating the custom classes. Any combination of ‘role’, ‘num_engines’, ‘propulsion’, ‘canards’, ‘num_tail_fins’, ‘wing_position’, ‘wing_type’, and ‘faa_wingspan_class’ can be used to create a custom class. In this tutorial, we chose to combine num_engines and propulsion in our custom class, as we hope to boost model inference of each by forcing the model to attempt to identify both of the related attributes.
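As an illustration of the idea, a custom class can be built by joining the chosen attribute values into a single label. The sketch below assumes each annotation is available as a plain dictionary keyed by attribute name. The helper names and the sample attribute values are hypothetical and may not match the strings used in the dataset or in the tutorial code.

```python
def make_custom_class(annotation, attributes=("num_engines", "propulsion")):
    """Join the selected attribute values into one class label."""
    return "_".join(str(annotation[a]) for a in attributes)


def build_class_list(annotations, attributes=("num_engines", "propulsion")):
    """Return the sorted list of unique custom classes in the annotations."""
    return sorted({make_custom_class(a, attributes) for a in annotations})


# Hypothetical annotations; real attribute values may differ.
sample = [
    {"num_engines": 2, "propulsion": "jet"},
    {"num_engines": 4, "propulsion": "jet"},
    {"num_engines": 2, "propulsion": "propeller"},
    {"num_engines": 2, "propulsion": "jet"},
]
classes = build_class_list(sample)
# The index of a label in this list can serve as its YOLO class id.
class_ids = {name: i for i, name in enumerate(classes)}
```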
The list of these custom classes must then be added to the YOLO-specific data .yaml file, which includes the file paths to the training and test images, the number of classes, and the class list. An example can be found here.
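For readers without access to the linked example, the following sketch renders a data file of the kind described above. The keys `train`, `val`, `nc`, and `names` follow the layout used by Ultralytics' YOLOv5 data files. The directory paths, class names, and the helper `render_data_yaml` are placeholders, not the values from the tutorial repository.

```python
def render_data_yaml(train_dir, val_dir, class_names):
    """Build the text of a YOLOv5-style data .yaml file."""
    names = ", ".join(f"'{n}'" for n in class_names)
    return (
        f"train: {train_dir}\n"        # directory of training images
        f"val: {val_dir}\n"            # directory of validation/test images
        f"nc: {len(class_names)}\n"    # number of classes
        f"names: [{names}]\n"          # class list, ordered by class id
    )


# Placeholder paths and classes, for illustration only.
yaml_text = render_data_yaml(
    "/path/to/train/images",
    "/path/to/test/images",
    ["2_jet", "2_propeller", "4_jet"],
)
print(yaml_text)
```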
The final step is creating YOLO labels from the georeferenced tiled images. These labels are space-delimited text files that contain the class of each object along with its location and size. One label file is created per image, and its bounding boxes are used for both training and model evaluation.
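As a rough sketch of this conversion, each line of a YOLO label file holds a class id followed by the box center, width, and height, all normalized by the image dimensions. The example below assumes the diamond annotation has already been converted from georeferenced coordinates to pixel coordinates within the tile. The function name and the sample values are illustrative.

```python
def polygon_to_yolo_line(class_id, polygon, img_w, img_h):
    """Convert a pixel-space polygon to one YOLO label line.

    polygon: list of (x, y) pixel coordinates, e.g. the four diamond points.
    """
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    # Axis-aligned bounding box, clipped to the tile extent.
    xmin, xmax = max(0, min(xs)), min(img_w, max(xs))
    ymin, ymax = max(0, min(ys)), min(img_h, max(ys))
    # Normalized center and size, as the YOLO format expects.
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# Hypothetical diamond (nose, wing-tip, tail, wing-tip) on a 500x500 tile.
diamond = [(100, 50), (150, 100), (100, 150), (50, 100)]
line = polygon_to_yolo_line(3, diamond, 500, 500)
```

Writing one such line per object into a `.txt` file named after the image gives a label file in the expected format.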
The infrastructure for training and running an initial inference pipeline is prebuilt into this YOLOv5 implementation. It essentially involves telling the script where to find the imagery and training labels we created above.
Using the command below, training took about 4–5 hours using 2 NVIDIA Titan XP GPUs.
The inference and scoring scripts are also prebuilt in this YOLOv5 implementation and can be used as an initial gauge of performance. Simply by pointing the function towards the trained weights we can run inference on all 2700+ images in less than two minutes.
As you can see, simple one-line bash commands are all it takes to run these scripts. These results are not the most accurate, though, as they include duplicate and partial predictions. The rest of the pipeline runs an additional round of non-max suppression to remove duplicates, stitches the predictions back together, and scores them on the pre-tiled images. Now, let’s see how we did.
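Before turning to the results, the stitching and de-duplication step described above could be sketched as follows. This is not the tutorial's implementation: it assumes each tile's pixel offset within the full image is known, uses a greedy per-class non-max suppression, and picks an IoU threshold of 0.5 for illustration.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def stitch_and_dedupe(tile_dets, iou_thresh=0.5):
    """Shift tile detections into full-image coordinates, then run NMS.

    tile_dets: list of (x_off, y_off, detections), where detections is a
    list of (box, class_id, score) in tile pixel coordinates.
    """
    shifted = []
    for x_off, y_off, dets in tile_dets:
        for (x1, y1, x2, y2), cls, score in dets:
            box = (x1 + x_off, y1 + y_off, x2 + x_off, y2 + y_off)
            shifted.append((box, cls, score))
    # Greedy NMS: visit boxes from highest to lowest score and drop any
    # box that overlaps an already kept box of the same class too much.
    shifted.sort(key=lambda d: d[2], reverse=True)
    kept = []
    for box, cls, score in shifted:
        if all(cls != k[1] or iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, cls, score))
    return kept


# Two overlapping tiles that both see the same plane near their shared edge.
tiles = [
    (0, 0, [((400, 400, 480, 480), 0, 0.9)]),
    (256, 0, [((146, 402, 226, 482), 0, 0.8), ((10, 10, 50, 50), 1, 0.7)]),
]
merged = stitch_and_dedupe(tiles)
```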
Using the F1 metric (the SpaceNet metric) with an Intersection over Union (IoU) threshold of 0.5, the results were incredibly robust for common planes in the dataset, with F1 scores in the 90s. Notably, the model was able to discern the location and number of engines without the training dataset having specific annotations for engines. The number of engines was associated with each plane instance, but the engines themselves were not annotated.
For less common planes, however, results were markedly less impressive, likely due to the lack of instances for the model to fit appropriately. In this previous post, we discuss the use of synthetic data to augment these rare classes (or rare planes!) in order to boost class-specific performance.
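For reference, an F1 score at an IoU threshold of 0.5 can be computed roughly as in the sketch below. A prediction counts as a true positive when it overlaps a not-yet-matched ground truth box with IoU of at least 0.5. The greedy, score-ordered matching is one common convention and an assumption here. The official SpaceNet scoring code may differ in its details.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(preds, truths, iou_thresh=0.5):
    """F1 for one class. preds: list of (box, score); truths: list of boxes."""
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        # Find the best still-unmatched ground truth box for this prediction.
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground truth boxes that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((0, 0, 10, 10), 0.9), ((100, 100, 110, 110), 0.8)]
score = f1_at_iou(preds, truths)  # one hit, one false alarm, one miss
```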
Robust machine learning relies heavily on high-quality datasets. While performance has notably improved since the introduction of AlexNet and Convolutional Neural Networks, the prediction mechanisms lack true demonstrated intelligence. Ultimately, the model relies upon “seeing” enough scenarios similar to the test scenario to make an accurate prediction (which can be hundreds or thousands of scenarios). Diverse, methodical, well-tagged datasets therefore create effective models, with the caveat that you don’t necessarily need tons of data. Diverse, high-quality data can often produce similarly performant models with significantly less data, with as little as 3% of the data yielding a model roughly two-thirds as performant. Read more about our deep dive into the features of satellite data for machine learning to create a Satellite Utility Manifold here.
As a data scientist, though, one’s role is not to feed a model as much data as possible, but to generate the most accurate predictions to solve some problem. In this case, we did so by creating meaningful custom classes, but in other cases this can mean excluding less relevant characteristics from consideration. Creating custom classes can improve performance as it forces the model to consider specific attributes of the planes. For example, using the combination of propulsion type and the number of engines, we saw an improvement in the classification of both attributes. Creating bias is an intrinsic part of this process.
In conclusion, pipelines like this one can be applied across domains, from the obvious national security applications to health-related ones such as automating the detection of specific cell types in histological scans with similar if not superior accuracy to manual counting. RarePlanes, too, could enable significant advancement in the computer vision world through testing the value of synthetic data, improving detection techniques, or evaluating zero-shot or few-shot learning. In the coming weeks and months, we hope to bring similar application-centric studies and tutorials as we aim to move the field forward.