Super-Resolution and Object Detection: A Love Story - Part 4
Jake Shermeyer & Adam Van Etten
In our previous three posts [1, 2, 3], we showcased the initial results of our study on super-resolution and object detection performance. In this post, we dive a bit deeper into class-specific results and draw our final conclusions to close out this blog series. Ultimately, we undertook a rigorous study of the utility provided by super-resolution techniques for the detection of objects in satellite imagery. We paired two super-resolution techniques (VDSR and RFSR) with advanced object detection methods and searched for objects in a satellite imagery dataset with over 250,000 labeled objects in a diverse set of environments.
Our results indicate that while super-resolution is not a direct replacement for actual imagery, applying SR techniques as a preprocessing step does improve object detection performance at most resolutions. For both models, the greatest benefit is achieved at the finest resolutions: super-resolving native 30 cm imagery to 15 cm yields a 16–20% improvement in mAP. Super-resolution provides smaller gains at coarser resolutions for 2x enhancement (3–10% improvement), though 4x enhancement yields a 7–9% improvement.
These findings indicate that if the data input to SR algorithms is too coarse, the algorithms are less effective and unable to find enough unique discriminating features to adequately reconstruct higher resolution images. It is apparent from this research that super-resolution for satellite imagery should be quantified in terms of GSD gained, not in terms of enhancement level. These techniques can effectively improve resolution by tens of centimeters up to 1 or perhaps 2 meters. Creating an enhancement of ~5 or more meters is unrealistic using SR, and consequently not an ideal approach for improving the value of coarser satellite imagery.
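The point above can be made concrete with a little arithmetic. The sketch below (our own illustrative helper, not part of any SR codebase) measures an SR result by the ground sample distance actually recovered, using the resolutions discussed in this series:

```python
# Sketch: why SR gains are better quantified in GSD recovered than in
# enhancement factor. The helper name and values are illustrative only.

def gsd_gained_cm(native_gsd_cm: float, enhancement: int) -> float:
    """Centimeters of ground sample distance recovered by super-resolving."""
    return native_gsd_cm - native_gsd_cm / enhancement

# 2x SR on 30 cm imagery must recover 15 cm of detail -- a realistic target.
print(gsd_gained_cm(30, 2))   # 15.0
# 2x SR on 4.8 m imagery must recover 240 cm -- far beyond what SR delivers.
print(gsd_gained_cm(480, 2))  # 240.0
```

The same 2x enhancement factor thus demands very different amounts of reconstructed detail depending on the native resolution, which is why coarse inputs see smaller gains.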
Given the relative ease of applying SR techniques, the general improvement observed in this study is noteworthy, and SR could be a valuable preprocessing step for future object detection applications with satellite imagery, particularly when searching for objects with few distinguishing features (specifically boats, small aircraft, and buses/trucks). However, at the finest resolutions these techniques are less effective for small vehicles and large airplanes. This is likely due to the relationship between the window size (the amount of imagery an object detection algorithm can see), the types of objects, and the number of pixels each object occupies. Cars require less fine detail, while large planes begin to be clipped in half and lose the required neighboring context at the finest resolutions.
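The window-size effect can be sketched with rough numbers. The object lengths below are our own ballpark assumptions, not xView statistics:

```python
# Sketch: approximate pixel footprint of objects at different GSDs, showing
# why a car needs little extra detail while a large aircraft can outgrow a
# detector's fixed window. Object lengths are rough assumptions.

def pixels_across(object_length_m: float, gsd_cm: float) -> float:
    """Number of pixels an object spans along its length at a given GSD."""
    return object_length_m * 100 / gsd_cm

for name, length_m in [("car", 4.5), ("airliner", 70.0)]:
    for gsd_cm in (15, 30, 120):
        print(f"{name} at {gsd_cm} cm GSD: "
              f"~{pixels_across(length_m, gsd_cm):.0f} px")
```

At 15 cm a car spans only about 30 pixels, comfortably inside a detection window, while a 70 m airliner spans roughly 467 pixels and risks being clipped, consistent with the class-specific drop-off noted above.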
Finally, this study showcases the value of data quality. Previous research has shown that for small vehicles, an average precision of 0.91 can be achieved with YOLT given similar 30 cm oversampled imagery. In this study, in a nearly identical testing scenario with 30 cm oversampled imagery, the average precision for detecting small vehicles with YOLT declined to 0.81. This drop-off is likely indicative of the labeling issues common to the xView dataset, and shows that precise and exhaustive labeling is a requirement for the best and most consistent object detection performance.
In summary we find that:
- Super-resolution is less effective as resolution degrades and is most valuable at finer resolutions
- Super-resolution is a relatively inexpensive enhancement that can improve object detection performance, particularly for certain classes with distinctive features
- Less computationally expensive super-resolution techniques can be just as effective as high-performance computing techniques at certain resolutions
- Data quality matters. Precise and exhaustive labeling is a requirement for the best results
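The overall workflow from this series, SR applied as a preprocessing step ahead of detection, can be sketched as below. `super_resolve` and `detect` are placeholders standing in for a trained SR model (e.g. VDSR or RFSR) and a detector (e.g. YOLT); they are not real APIs:

```python
# Sketch of the SR-then-detect pipeline described in this series.
# Both functions below are placeholders, not real model interfaces.
import numpy as np

def super_resolve(tile: np.ndarray, scale: int = 2) -> np.ndarray:
    # Placeholder: nearest-neighbor upsampling stands in for a learned model.
    return tile.repeat(scale, axis=0).repeat(scale, axis=1)

def detect(tile: np.ndarray) -> list:
    # Placeholder detector returning (class, score, bbox) tuples.
    return []

native_tile = np.zeros((416, 416, 3), dtype=np.uint8)  # e.g. a 30 cm GSD chip
sr_tile = super_resolve(native_tile, scale=2)          # ~15 cm effective GSD
detections = detect(sr_tile)
print(sr_tile.shape)  # (832, 832, 3)
```

Because the SR step is independent of the detector, it can be dropped in front of an existing detection pipeline with little engineering cost, which is what makes the observed mAP gains attractive.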
Thanks for reading our series on object detection and super-resolution performance. Again, we encourage you to check out our other stories on the DownlinQ, our source code for both super-resolution and object detection, and our arXiv paper on this topic for a deeper dive.