Super-Resolution and Object Detection: A Love Story - Part 4

Jake Shermeyer & Adam Van Etten

The DownLinQ
Jan 7, 2019

Detection of objects in the 30cm native resolution imagery.

In our previous three posts [1, 2, 3], we showcased the initial results of our study on super-resolution and object detection performance. In this post, we dive deeper into class-specific results and draw our final conclusions to close out this blog series. Ultimately, we undertook a rigorous study of the utility that super-resolution techniques provide for detecting objects in satellite imagery. We paired two super-resolution techniques (VDSR and RFSR) with advanced object detection methods and searched for objects in a satellite imagery dataset with over 250,000 labeled objects in a diverse set of environments.

Performance of YOLT RFSR 4x 30cm imagery (top left), YOLT RFSR 4x 60cm imagery (top right), YOLT VDSR 2x 15cm imagery (bottom left), and YOLT VDSR 4x 120cm imagery (bottom right). Cars are green, buses/trucks are blue, small aircraft are red, and large aircraft are yellow. We use a low detection threshold of 0.1 (this threshold yields fewer false negatives but more false positives).

Our results indicate that while super-resolution is not a direct replacement for native imagery, applying SR techniques as a preprocessing step does improve object detection performance at most resolutions. For both models, the greatest benefit is achieved at the highest resolutions: super-resolving native 30 cm imagery to 15 cm yields a 16–20% improvement in mAP. Super-resolution provides smaller gains at coarser resolutions for 2x enhancement (3–10% improvement), though 4x enhancement yields a 7–9% improvement.
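To illustrate where super-resolution slots into a detection pipeline, here is a minimal sketch. This is not the VDSR or RFSR model from the study; a nearest-neighbor pixel repeat stands in for the learned SR network purely to show the preprocessing step, and the image chip is synthetic.

```python
import numpy as np

def super_resolve(tile: np.ndarray, factor: int) -> np.ndarray:
    """Stand-in for a learned SR model (e.g. VDSR or RFSR).

    A real model would reconstruct high-frequency detail; here we
    simply repeat pixels (nearest-neighbor) to show where the SR
    step sits in the pipeline before the detector runs.
    """
    return np.kron(tile, np.ones((factor, factor, 1), dtype=tile.dtype))

# A synthetic 416x416 RGB image chip at 30 cm GSD.
chip_30cm = np.zeros((416, 416, 3), dtype=np.uint8)

# 2x enhancement: 30 cm -> 15 cm effective GSD, applied before detection.
chip_15cm = super_resolve(chip_30cm, factor=2)
print(chip_15cm.shape)  # (832, 832, 3)
```

In the study, the super-resolved chips (rather than the native-resolution chips) are what get fed to the YOLT detector.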

These findings indicate that if the data input to SR algorithms is too coarse, the algorithms are less effective and unable to find enough unique discriminating features to adequately reconstruct higher resolution images. It is apparent from this research that super-resolution for satellite imagery should be quantified in terms of GSD gained, not in terms of enhancement level. These techniques can effectively improve resolution by tens of centimeters up to 1 or perhaps 2 meters. Creating an enhancement of ~5 or more meters is unrealistic using SR, and consequently not an ideal approach for improving the value of coarser satellite imagery.

Given the relative ease of applying SR techniques, the general improvement observed in this study is noteworthy and could be a valuable preprocessing step for future object detection applications with satellite imagery, particularly when searching for certain objects with few distinguishing features (specifically boats, large aircraft, and buses/trucks).

Performance of YOLT models on buses and trucks as a function of sensor resolution. The lower axis indicates the sensor resolution, with average precision plotted on the y-axis.
Performance change over original resolution for buses & trucks (see figure above).
Performance of YOLT models on small aircraft as a function of sensor resolution. The lower axis indicates the sensor resolution, with average precision plotted on the y-axis.
Performance change over original resolution for small aircraft (see figure above).

Finally, this study showcases the value of data quality. Previous research has shown that for small vehicles, an average precision of 0.9 can be achieved with YOLT given similar 30cm imagery. In this study, in a nearly identical testing scenario with 30cm imagery, the average precision for detecting small vehicles with YOLT dropped to 0.63. This drop-off is likely indicative of the labeling issues common to the xView dataset and shows that precise and exhaustive labeling is a requirement for the best and most consistent object detection performance.

Challenges of the xView dataset. Red = car, Green = truck, Orange = bus, Purple = boat. Note the multiple missed cars and the incorrectly sized labels (left) and the erroneous boat labels (right). These errors are unfortunately prevalent in the dataset and diminish its overall value.

In summary we find that:

  • Super-resolution is less effective as resolution degrades and most valuable at finer resolutions
  • Super-resolution is a relatively inexpensive enhancement that can improve object detection performance, particularly for certain classes with distinctive features
  • Less computationally expensive super-resolution techniques can be just as effective as high-performance computing techniques at certain resolutions
  • Data quality matters. Precise and exhaustive labeling is a requirement for the best results

Thanks for reading our series on object detection and super-resolution performance. Again, we encourage you to check out our other stories on the DownLinQ, our source code on both super-resolution and object detection, and our arXiv paper on this topic for a deeper dive.


Data Scientist at Capella Space. Formerly CosmiQ Works.