Super-Resolution on Satellite Imagery using Deep Learning, Part 2

In Part 1, we trained the neural network CosmiQNet to enhance 3-band satellite imagery. In this post, we continue that investigation with the Multi-Spectral Imagery (MSI) available in SpaceNet.

Our thesis in Part 1 was that higher-quality, geographically relevant imagery can be used to train a machine learning algorithm to enhance satellite imagery. Here we redesign CosmiQNet to test two new hypotheses:

  • Using all the channels of MSI can improve visual enhancement,
  • Training can occur on imagery that is similar to the test imagery but, unlike in Part 1, does not necessarily overlap it.

The latter hypothesis was necessary since SpaceNet does not include multiple images over the same region at the time of this post.

The metric that we use to measure enhancement is the distribution of Peak Signal-to-Noise Ratio (PSNR). Since satellite imagery is large and enhancement is not uniform, we present the measure of enhancement as an overlay on the image instead of assigning a single number to the process.
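To make this concrete, here is a minimal numpy sketch of how such a per-tile PSNR-gain overlay could be computed. The function names, the tile size, and the per-tile layout are our own illustration, not the post's actual code:

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10(peak ** 2 / mse)

def psnr_gain_overlay(reference, enhanced, baseline, tile=64):
    """PSNR gain (enhanced minus baseline) computed per tile, giving a
    coarse map that can be overlaid on the image: positive entries
    correspond to red regions, negative entries to blue regions."""
    h, w = reference.shape[:2]
    rows, cols = h // tile, w // tile
    gain = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            sl = np.s_[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
            gain[i, j] = (psnr(reference[sl], enhanced[sl])
                          - psnr(reference[sl], baseline[sl]))
    return gain
```

Because PSNR is logarithmic in the mean squared error, a uniform tenfold MSE improvement over a baseline shows up as a constant 10 dB of gain in every tile.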

The enhancement that is performed is essentially an intermediate step in imagery analysis. A more appropriate metric is the impact of enhancement on automated analysis, which we plan to explore in Part 3.

By using MSI to improve super-resolution, we demonstrate a novel application of MSI. One common use of MSI data is to combine the bands to form an index to infer various environmental conditions (e.g., vegetation density, drought). It would be interesting, but beyond the scope of this post, to explore how a DNN like CosmiQNet might be useful to enhance the performance of NDVI-type analyses.
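For readers unfamiliar with such indices, NDVI is a standard example: a normalized difference of the near-infrared and red bands. A minimal sketch (the function name and epsilon guard are ours):

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index, (NIR - Red) / (NIR + Red).
    Values near +1 suggest dense vegetation; values near 0 or below
    suggest bare soil, water, or built-up areas."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    # eps avoids division by zero in dark pixels where NIR + Red ~ 0.
    return (nir - red) / (nir + red + eps)
```

An enhancement network would sharpen the NIR and red bands before this ratio is taken; whether that helps or hurts the downstream index is exactly the open question noted above.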

Multi-Spectral Imagery

Multi-spectral imagery (MSI) consists of data collected by sensors across multiple bands of the electromagnetic spectrum. The filtering of photons results in lower (worse) resolution for MSI compared to panchromatic imagery. This lower resolution makes MSI a natural candidate for super-resolution.

Image 1: Spectral bands from WorldView-2 and WorldView-3; see the WorldView-2 and WorldView-3 specifications for details.

While the bands are relatively disjoint, objects seen from satellites typically produce signals that overlap significantly across bands. The super-resolution process in our experiment enhances only the Coastal, Blue, and Green bands. We chose these three bands arbitrarily from the eight: Coastal, Blue, and Green are simply the first three channels of the GeoTIFFs in SpaceNet. In hindsight, this choice maximizes the spectral distance of the bands and possibly increases the difficulty of enhancement.

Two SpaceNet images over Rio that we analyze are displayed in Image 2 (training) and Image 3 (testing). A subset of the bands is displayed in these images. Unlike the experiment in Part 1, where enhancement of boats is compared to the enhancement of the entire image, we do not isolate a region of structure to compare.

Image 2: The training image of Rio from SpaceNet. This image is a mosaic from multiple WorldView 2 images and shows only three of the eight bands. Water regions are distinguishable from city regions.
Image 3: The test image from SpaceNet. While this is an image over Rio, it does not overlap geographically with the training image.


CosmiQNet

CosmiQNet is a deep neural network that we developed to perform several different tasks, from building detection to 3-band super-resolution. The DNN is architected as a sequence of perturbative layers and trained one layer at a time. The application determines the number of convolutional sublayers appropriate for each perturbative layer. The perturbative layers are an extension of the ResNet layer that gained fame by winning the 2015 ImageNet Large Scale Visual Recognition Challenge.

Our experiment is to demonstrate the impact that all 8 bands of MSI have on the super-resolution capability of CosmiQNet. For this experiment, we present two versions of CosmiQNet:

  • (3-band CosmiQNet) Each perturbative layer takes input and produces output in the Coastal, Blue, and Green spectral bands. The other 5 bands are ignored in this DNN.
  • (8-band CosmiQNet) Each perturbative layer takes input and produces output in all 8 bands. This network requires more weights than the 3-band version which results in slightly longer training times.
Image 4: CosmiQNet adapted to support MSI. In one version, only three bands (Coastal, Blue, and Green) are used to enhance those bands. In the other version, all eight bands are used to enhance the Coastal, Blue, and Green bands. The green circle represents the trainable bypass parameters of the blue perturbation layers.
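The perturbative update can be sketched schematically: each layer combines a trainable bypass of its input (the green circle in Image 4) with a learned convolutional perturbation. The toy single-channel convolution and parameter names below are our own illustration, not the actual TensorFlow implementation:

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive 'same' convolution of a single-channel image with an odd-sized
    kernel; a toy stand-in for a layer's convolutional sublayers."""
    k = kernel.shape[0] // 2
    padded = np.pad(x, k, mode="edge")
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + 2 * k + 1, j:j + 2 * k + 1] * kernel)
    return out

def perturbative_layer(x, kernel, alpha):
    """One perturbative layer in the spirit of a ResNet block:
    a trainable bypass weight (alpha) plus a learned perturbation."""
    return alpha * x + conv2d_same(x, kernel)
```

With the perturbation weights at zero and the bypass weight at one, the layer is the identity, which is what makes layer-at-a-time training of such blocks well behaved.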

CosmiQNet is constructed in TensorFlow and trained using an Nvidia DevBox with 4 (Maxwell) Titan X GPUs. Each layer was trained for roughly 12 hours; we did not fully investigate optimizing this training schedule. For both versions of CosmiQNet, the optimization function is the same: the mean squared error (MSE) between the original image and the enhanced image in the Coastal, Blue, and Green channels. It is computationally easier for gradient descent based training algorithms to optimize MSE than PSNR, even though PSNR is essentially a normalized version of MSE.
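The MSE-to-PSNR relationship mentioned above can be made explicit: PSNR is a negative logarithm of MSE, so minimizing one maximizes the other. A small sketch (function names are ours):

```python
import numpy as np

def mse(a, b):
    """Mean squared error, the quantity the network actually minimizes."""
    return np.mean((np.asarray(a, dtype=np.float64)
                    - np.asarray(b, dtype=np.float64)) ** 2)

def psnr_from_mse(mse_value, peak=255.0):
    """PSNR = 10*log10(peak^2 / MSE): strictly decreasing in MSE, so
    gradient descent on MSE is implicitly ascent on PSNR."""
    return 10.0 * np.log10(peak ** 2 / mse_value)
```

This monotonic relationship is why training on the smooth, cheap MSE objective still improves the PSNR metric that we report.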


Results

Similar to the plots in Part 1, we quantify performance by the distribution of PSNR gain (relative to linear interpolation) overlaid on the image. Red areas indicate a PSNR gain over linear interpolation, while blue areas indicate a PSNR loss.

While it may be desirable to have the entire region red (meaning better than linear interpolation), the linear interpolation process results in a very high PSNR in regions with little structure, like water, grassy areas, and forests, and in a significantly lower PSNR in regions with structure. Hence, the impact of super-resolution is amplified when gain occurs in regions with structure. Similarly, the negative impact of poor super-resolution performance in unstructured regions is offset by the high PSNR from linear interpolation in those regions.
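This flat-versus-structured asymmetry is easy to reproduce. The sketch below fabricates a flat "water-like" patch and a textured "city-like" patch, degrades both, and restores them with a crude nearest-neighbour upsampling as a stand-in for the interpolation baseline; all names and parameters are our own illustration:

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    m = np.mean((ref - est) ** 2)
    return np.inf if m == 0 else 10.0 * np.log10(peak ** 2 / m)

def downsample2(x):
    """2x2 block averaging: a toy stand-in for the resolution loss."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample2_nearest(x):
    """Nearest-neighbour 2x upsampling: a crude stand-in for the
    linear-interpolation baseline."""
    return np.kron(x, np.ones((2, 2)))

rng = np.random.default_rng(0)
flat = np.full((64, 64), 120.0)                # little structure, e.g. water
textured = rng.uniform(0, 255, size=(64, 64))  # lots of structure, e.g. city

psnr_flat = psnr(flat, upsample2_nearest(downsample2(flat)))
psnr_textured = psnr(textured, upsample2_nearest(downsample2(textured)))
```

The flat patch survives the round trip essentially unchanged, so the baseline scores far higher there, which is why the interesting red regions in Images 5 and 6 concentrate in the structured city areas.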

Often, the areas with structure are more likely to contain an object of interest to the analyst and thus have increased value. We do not partition the MSI into regions with more structure and regions with less structure. Comparing the plots in Images 5 and 6 with the SpaceNet imagery in Images 2 and 3 clearly shows enhancement within the city areas.

The key results are presented in Image 5 (for training) and Image 6 (for testing):

  • The 8-band version of CosmiQNet outperformed the 3-band version. The increase in red regions in the city areas is visually apparent. Further research is required to determine whether this PSNR performance enhancement translates to performance enhancement in object detection and/or image classification tools.
  • The 8-band version’s outperformance is validated by the test SpaceNet image. This demonstrates a degree of transferability of CosmiQNet beyond overlapping regions to geographically similar regions.
Image 5: Distribution of PSNR gain by CosmiQNet compared to linear interpolation on the training SpaceNet image. The left image corresponds to the 3-band version of CosmiQNet and the right image corresponds to the 8-band version of CosmiQNet. Red corresponds to improved PSNR while blue corresponds to worsened PSNR. The PSNR and its gain are measured using the Coastal, Blue, and Green bands of the SpaceNet image. The large blue region corresponds to water area where the linear interpolation performs roughly 20 dB better than it does in the non-water regions.
Image 6: Distribution of PSNR gain by CosmiQNet compared to linear interpolation on the test SpaceNet image. The left image corresponds to the 3-band version of CosmiQNet and the right image corresponds to the 8-band version of CosmiQNet. Red corresponds to improved PSNR while blue corresponds to worsened PSNR. The PSNR and its gain are measured using the Coastal, Blue, and Green bands of the SpaceNet image. The test image does not overlap with the training image but rather is a different part of the city of Rio.

What’s next?

We have demonstrated the potential value of MSI in image enhancement using deep learning. Future research is required to determine how much better the enhancement process can become and to quantify the enhancement’s impact on automated analysis. We organize future research directions into the following categories:

  • Algorithm development. We plan to integrate algorithmic advances into future versions of CosmiQNet. Adversarial networks appear promising both for image enhancement and for improving the metric used to quantify it.
  • Enhancement thresholding. At some resolution scale, PSNR decreases even in the structured areas. Quantifying these limits and their relevance to applications can provide perspective on the utility of image enhancement.
  • Impact on object detection. In previous posts, we demonstrated the relationship between resolution and the performance of object detection algorithms. A natural extension of this work is to quantify the effect of image enhancement on similar object detection problems.
  • Synthesis applications. Labeled data is a scarce resource. We would like to investigate how CosmiQNet can be used to synthetically increase the size and diversity of a labeled data set.
  • Pan-sharpening. CosmiQNet can be retrained and adapted to improve pan-sharpening of satellite imagery using MSI.

We plan to present more details of our research in future posts in this series.

Like what you read? Give Patrick Hagerty a round of applause.
