You Only Look Twice (Part II) — Vehicle and Infrastructure Detection in Satellite Imagery
Rapid detection of objects of vastly different scales over large areas is of great interest in the arena of satellite imagery analytics. In the previous post (6) we implemented a fully convolutional neural network classifier (You Only Look Twice: YOLT) to rapidly localize boats and airplanes in satellite imagery. In this post we detail efforts to extend the YOLT classifier to multiple scales, both at the vehicle level and at infrastructure scales.
1. Combined classifier
Recall that our YOLT training data consists of bounding box delineations of airplanes, boats, and airports.
Our previous post (6) demonstrated the ability to localize boats and airplanes via training a 3-class YOLT model. Expanding the model to four classes and including airports is relatively unsuccessful, however, as we show below.
2. Scale Confusion Mitigation
There are multiple ways one could address the false positive issue noted in Figure 2. Recall from 6 that for this exploratory work our training set consists of only a few dozen airports and a couple hundred airplanes, far smaller than usual for deep learning models. Increasing this training set size could greatly improve our model, particularly if the background is highly varied. Another option would be to use post-processing to remove any detections at the incorrect scale (e.g.: an airport with a size of 50 meters). Another option is to simply build dual classifiers, one for each relevant scale. We explore this final option below.
3. Infrastructure Classifier
We train a classifier to recognize airports and airstrips using the training data described in 6 of 37 Planet images at ~3m ground sample distance (GSD). These images are augmented by rotations and rescaling in the hue-saturation-value (HSV) space.
Over the entire corpus of airport test images, we achieve an F1 score of 0.87, and each image takes between 4–10 seconds to analyze depending on size.
4. Dual Classifiers — Infrastructure + Vehicle
We are now in a position to combine the vehicle-scale classifier trained in 6 with the infrastructure classifier of Section 3 above. For large validation images, we run the classifier at three different scales: 120m, 200m, and 2500m. The first scale is designed for small boats, while the second scale captures commercial ships and aircraft, and the largest scale is optimized for large infrastructure such as airports. We break the validation image into appropriately sized bins and run each image chip on the appropriate classifier. The myriad results from the many image chips and multiple classifiers are combined into one final image.
Overlapping detections are merged via non-maximal suppression, and all detections above a certain threshold are plotted. The relative abundances of false positives and false negatives is a function of the probability threshold. A higher threshold means that only highly probable detections are plotted, yielding fewer detections and therefore fewer false positives and more false negatives. A lower threshold yields more detections and therefore more false positives and fewer false negatives. We find a detection probability threshold of between 0.3 and 0.4 yields the highest F1 score for our validation images. Figure 5 below shows all detections above a threshold of 0.3.
In this post we applied the You Only Look Twice (YOLT) pipeline to localizing both vehicles and large infrastructure projects, such as airports. We noted poor results from a combined classifier, due to confusion between small and large features, such as highways and runways. We were, however, able to successfully train a YOLT classifier to localize airports.
Running the boat + airplane (vehicle) and infrastructure (airport) classifiers in parallel at the appropriate scale yields much better results. We yield an F1 score of greater than 0.8 for all categories. Our detection pipeline optimizes for accurate localization of clustered objects (not for speed), and even so it processes vehicles at a rate of 20–50 km² per minute, and 900–1500 km² per minute for airports. Results so far are encouraging, and it will be interesting to explore in future works how well the YOLT pipeline performs as the number of object categories is increased.
May 29, 2018 Addendum: See this post for paper and code details.