Super-Resolution and Object Detection: A Love Story- Part 4

Jake Shermeyer & Adam Van Etten

Detection of objects in the 30cm native resolution imagery.

In our previous three posts [1, 2, 3], we showcased the initial results of our study on super-resolution and object detection performance. In this post, we dive a bit deeper into class-specific results and draw our final conclusions to close out this blog series. Ultimately, we undertook a rigorous study of the utility of super-resolution techniques for the detection of objects in satellite imagery. We paired two super-resolution techniques (VDSR and RFSR) with advanced object detection methods and searched for objects in a satellite imagery dataset with over 250,000 labeled objects across a diverse set of environments.

Performance of YOLT on RFSR 4x 30cm imagery (top left), RFSR 4x 60cm imagery (top right), VDSR 2x 15cm imagery (bottom left), and VDSR 4x 120cm imagery (bottom right). Cars are green, buses/trucks are blue, small aircraft are red, and large aircraft are yellow. We use a low detection threshold of 0.1 (this threshold yields fewer false negatives but more false positives).

Our results indicate that, while super-resolution is not a direct replacement for natively higher-resolution imagery, applying SR techniques as a preprocessing step does improve object detection performance at most resolutions. For both models, the greatest benefit is achieved at the finest resolutions: super-resolving native 30 cm imagery to 15 cm yields a 16–20% improvement in mAP. Super-resolution provides smaller gains at coarser resolutions, with 2x enhancement yielding a 3–10% improvement and 4x enhancement a 7–9% improvement.
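To make the comparison concrete, the relative gain we report is a simple ratio of mAP scores. A minimal sketch, using hypothetical mAP values rather than the exact scores from our experiments:

```python
# Hypothetical illustration of the relative mAP gain from using
# super-resolution as a preprocessing step. The scores below are
# placeholders, not the exact values from our experiments.

def map_gain(map_native: float, map_sr: float) -> float:
    """Relative improvement (in percent) of detection on super-resolved
    imagery over detection on the native-resolution imagery."""
    return 100.0 * (map_sr - map_native) / map_native

# e.g., super-resolving native 30 cm imagery to 15 cm before detection
print(f"{map_gain(0.50, 0.58):+.1f}%")  # +16.0%
```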

These findings indicate that if the data input to SR algorithms is too coarse, the algorithms become less effective, unable to find enough unique discriminating features to adequately reconstruct higher-resolution images. It is apparent from this research that super-resolution for satellite imagery should be quantified in terms of GSD gained, not in terms of enhancement level. These techniques can effectively improve resolution by tens of centimeters, up to 1 or perhaps 2 meters. An enhancement of ~5 or more meters is unrealistic using SR, which is consequently not an ideal approach for improving the value of coarser satellite imagery.
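The distinction between enhancement level and GSD gained can be sketched in a few lines: an n-x enhancement always divides the GSD by n, so the absolute resolution the algorithm must "invent" grows with the coarseness of the input. The input GSDs below are illustrative:

```python
def output_gsd(native_gsd_cm: float, enhancement: int) -> float:
    """GSD of the super-resolved output: an n-x enhancement divides
    the ground sample distance by n."""
    return native_gsd_cm / enhancement

def gsd_gained(native_gsd_cm: float, enhancement: int) -> float:
    """Absolute resolution gained, in cm: the quantity we argue SR
    should be judged by, rather than the enhancement factor alone."""
    return native_gsd_cm - output_gsd(native_gsd_cm, enhancement)

# 2x on 30 cm imagery gains 15 cm; the same 2x on 480 cm imagery
# would require inventing 240 cm of resolution, which is unrealistic.
for native in (30, 120, 480):
    print(f"{native} cm -> gain of {gsd_gained(native, 2):.0f} cm at 2x")
```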

Given the relative ease of applying SR techniques, the general improvement observed in this study is noteworthy, and super-resolution could be a valuable preprocessing step for future object detection applications with satellite imagery, particularly when searching for objects with few distinguishing features (specifically boats, small aircraft, and buses/trucks). However, at the highest resolutions these techniques are less effective for small vehicles and large airplanes. This is likely due to the relationship between the window size (the amount of imagery an object detection algorithm can see at once), the types of objects, and the number of pixels each object occupies. Cars require less fine detail, while at the finest resolutions large planes begin to be clipped in half and lose the required neighboring context.
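The window-size effect can be illustrated by comparing a detection window's ground footprint to object sizes at each GSD. The 416-pixel window and the object dimensions below are assumptions for illustration, not values taken directly from our experiments:

```python
def pixels_across(object_size_m: float, gsd_cm: float) -> float:
    """Approximate pixel footprint of an object at a given GSD."""
    return object_size_m * 100.0 / gsd_cm

# Assumed sizes: ~4.5 m car, ~70 m large airliner; assumed 416-pixel
# detector window (a common YOLO-style input size).
window_px = 416
for gsd_cm in (15, 30, 60, 120):
    ground_m = window_px * gsd_cm / 100.0
    print(f"GSD {gsd_cm:>3} cm: window covers {ground_m:.0f} m, "
          f"airliner spans {pixels_across(70, gsd_cm):.0f} px")
```

At the assumed 15 cm GSD the window covers only ~62 m of ground, less than a large airliner, so the aircraft is clipped and loses its surrounding context; a car, by contrast, remains tens of pixels across with ample room to spare.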

Average precision performance plots for buses and trucks with YOLT. Note that the 15cm super-resolution output is an effective technique for discrimination of this class.
Average precision performance plots for large aircraft with YOLT. Note that the 15cm super-resolution output and the oversampled 30cm output cause a reduction in window size, reducing what an algorithm can see and consequently causing a drop-off in performance for this class. Performance peaks at 60–120cm GSD.
Average precision performance plots for small vehicles with YOLT. Note that the 15cm super-resolution output is less effective than simply oversampling the 30cm imagery to match a 15cm GSD grid.
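For reference, the "oversampling" baseline in these plots is simply naive upsampling onto a finer grid, with no learned model. A dependency-light sketch using nearest-neighbor resampling (the actual baseline may use a different interpolation method, such as bicubic):

```python
import numpy as np

def oversample(tile: np.ndarray, factor: int = 2) -> np.ndarray:
    """Naively resample a tile onto a finer grid (e.g., 30 cm -> 15 cm)
    by pixel repetition. No new information is created; the image
    merely matches the target GSD grid."""
    return tile.repeat(factor, axis=0).repeat(factor, axis=1)

patch = np.arange(4).reshape(2, 2)
print(oversample(patch).shape)  # (4, 4)
```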

Finally, this study showcases the value of data quality. Previous research has shown that, given similar 30cm oversampled imagery, YOLT can achieve an average precision of 0.91 for small vehicles. In this study, in a nearly identical testing scenario with 30cm oversampled imagery, the average precision for detecting small vehicles with YOLT declined to 0.81. This drop-off is likely indicative of the labeling issues common to the xView dataset, and shows that precise and exhaustive labeling is a requirement for the best and most consistent object detection performance.
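For readers reproducing these numbers, average precision is the area under a monotonically smoothed precision-recall curve. A sketch of one common formulation (not the exact scoring code used in our experiments):

```python
import numpy as np

def average_precision(recalls, precisions):
    """Area under the precision-recall curve after enforcing a
    non-increasing precision envelope (one common AP definition)."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # Smooth: precision at recall x becomes the max precision at recall >= x.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Integrate over the points where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

print(average_precision([0.5, 1.0], [1.0, 0.5]))  # 0.75
```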

Challenges of the xView dataset. Red = Car, Green = Truck, Orange = Bus, Purple = Boat. Note the multiple missed cars and the incorrectly sized labels (left) and the erroneous boat labels (right). These errors are unfortunately prevalent in the dataset and diminish its overall value.

In summary we find that:

  • Super-resolution is less effective as resolution degrades and most valuable at finer resolutions
  • Super-resolution is a relatively inexpensive enhancement that can improve object detection performance, particularly for certain classes with distinctive features
  • Less computationally expensive super-resolution techniques can be just as effective as high-performance computing techniques at certain resolutions
  • Data quality matters. Precise and exhaustive labeling is a requirement for the best results

Thanks for reading our series on object detection and super-resolution performance. Again, we encourage you to check out our other stories on the DownlinQ, our source code for both super-resolution and object detection, and our arXiv paper on this topic for a deeper dive.