Artificial “Multispectralization” of Color Satellite Imagery via GANs
In this blog post, we discuss a task analogous to colorization: “multispectralization,” i.e., predicting 8-band visible and near-infrared (VNIR) multispectral images from 3-band RGB color images. The intuition behind this project is that contextual information in the image may allow deep learning algorithms to deduce the object type of a given pixel, and thereby infer additional multispectral values for that pixel, since many objects have documented, distinct spectral signatures.
Although our multispectralization GAN achieves a high average peak signal-to-noise ratio (PSNR) score, the failures of our multispectralization GAN also prove intriguing. That is, certain objects in our satellite imagery confuse the GAN and hence attain very low local PSNR scores. These objects have “interesting” spectral signatures from the viewpoint of the GAN.
We conclude this post with ideas on how to further pursue the interaction between multispectral satellite imagery and deep learning.
The dataset used in this project is the second SpaceNet dataset, which provides satellite imagery for four cities (Las Vegas, Paris, Shanghai, and Khartoum); however, we restrict our attention here to Las Vegas and Khartoum. The imagery comprises 30 cm ground sample distance (GSD) grayscale imagery, 30 cm 3-band RGB color imagery, and 30 cm 8-band VNIR multispectral imagery.
Generative Adversarial Networks and Multispectralization
In a previous blog post on colorization via generative adversarial networks, we described the basic definitions and ideas of GANs as well as the specific architecture we chose to address the problem of artificial colorization. Since we use the same architecture for multispectralization (except changing the number of input and output bands from 1 & 3 to 3 & 8), we refer the reader to the colorization blog post for a technical discussion of GANs in this context.
For the remainder of this post, the multispectralization GAN will be treated as a black box, whose input is a 3-band RGB image and whose output is an 8-band multispectral image, and we focus on the output and the applications of our multispectralization GAN.
The Artificial NDVI Index and the Artificial NDWI Index
The normalized difference vegetation index (NDVI) is a well-established mechanism for identifying and visualizing vegetation in multispectral remote sensing images. It relies on the principle that the chlorophyll in living plant material strongly absorbs visible light and strongly reflects near-infrared light. The NDVI is calculated from multispectral data according to the following formula: [ NDVI = (Xnir − Xred)/(Xnir + Xred) ], where Xnir refers to the near-infrared band and Xred refers to the red band.
We can use the GAN to generate the NDVI from 3-band imagery. Given a 3-band RGB color satellite image x, one can artificially generate an 8-band multispectral image G(x) using the multispectralization GAN, and then compute the NDVI of G(x). We will call this the artificial NDVI.
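A minimal sketch of this computation, assuming NumPy arrays and WorldView-2’s 8-band ordering (red is band 5 and NIR1 is band 7; other band layouts would need different indices, and `G` is simply a stand-in for the trained generator, not shown here):

```python
import numpy as np

# Assumed WorldView-2 band order (1-indexed):
# 1 coastal, 2 blue, 3 green, 4 yellow, 5 red, 6 red-edge, 7 NIR1, 8 NIR2
RED, NIR = 4, 6  # 0-indexed positions of the red and NIR1 bands

def ndvi(ms):
    """NDVI from an (H, W, 8) multispectral array."""
    nir = ms[..., NIR].astype(np.float64)
    red = ms[..., RED].astype(np.float64)
    # Small epsilon guards against division by zero in dark pixels.
    return (nir - red) / (nir + red + 1e-9)

# Real NDVI:        ndvi(y)       for a true 8-band image y
# Artificial NDVI:  ndvi(G(u))    for the GAN output on the RGB image u
```

The same function serves for both the real and the artificial NDVI; only the source of the 8-band array changes.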
The following visualization is produced by taking an 8-band multispectral image y over Las Vegas, together with its corresponding 3-band color image u, and comparing the NDVI of y with the NDVI of G(u).
Figure: Left is the real NDVI and right is the artificial NDVI
The real and artificial NDVIs produce strikingly similar visual results. Both indicate high (green) values for vegetation as can be seen in the golf course and various house lawns.
Similar to NDVI, the normalized difference water index (NDWI) is a standard index for monitoring changes in the water content of leaves or of a body of water. The NDWI is computed from multispectral data as follows: [ NDWI = (Xgreen − Xnir)/(Xgreen + Xnir) ], where Xgreen refers to the green band and Xnir refers to the NIR band. This formulation of NDWI produces an image in which positive values typically correspond to open water, while negative values typically correspond to non-water features.
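Under the same assumptions as before (NumPy arrays and WorldView-2 band ordering, where green is band 3 and NIR1 is band 7), the NDWI sketch differs from the NDVI one only in which bands it uses:

```python
import numpy as np

# Assumed WorldView-2 ordering: green is band 3, NIR1 is band 7 (0-indexed: 2 and 6).
GREEN, NIR1 = 2, 6

def ndwi(ms):
    """NDWI from an (H, W, 8) multispectral array; positive values suggest open water."""
    green = ms[..., GREEN].astype(np.float64)
    nir = ms[..., NIR1].astype(np.float64)
    # Epsilon guards against division by zero in dark pixels.
    return (green - nir) / (green + nir + 1e-9)
```

As with NDVI, applying this to a true 8-band image gives the real NDWI, and applying it to the GAN output gives the artificial NDWI.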
As with NDVI, we can compute NDWI via 3-band imagery and the GAN. The following visualization compares the real NDWI and artificial NDWI of an image over Las Vegas.
Figure: Left is the real NDWI and right is the artificial NDWI
Both the real NDWI and the artificial NDWI show high (blue) values for backyard pools. There are also some high NDWI values for shadows in both images, but this is a well-documented property of the NDWI because dark water bodies and shadows have similar spectral properties.
The PSNR Heat Map as a Measure of Image Reconstruction Quality
We now explore the limitations and failures of the multispectralization GAN. That is, given an 8-band multispectral image y, and its underlying 3-band RGB image u, when do y and G(u) significantly differ? And can one speculate as to why they differ?
Peak signal-to-noise ratio (PSNR), which is used in a variety of applications including image compression, can also serve as a quality measure between an original image and its reconstruction. For example, CosmiQ has previously used the PSNR metric to assess the performance of a super-resolution algorithm. (Dong et al. also used the PSNR metric in the paper “Image Super-Resolution Using Deep Convolutional Networks”.) PSNR is usually expressed on the logarithmic decibel scale, and the higher the PSNR, the better the quality of the reconstructed image. The precise definition and formulas for PSNR can be found here.
The PSNR heat map is simply an image that maps the PSNR computed locally around each pixel in an artificially generated 8-band image. In our example, the PSNR corresponding to a pixel in an artificial 8-band image is the average PSNR of the 8 × 8 square of pixels surrounding that pixel.
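A sketch of how such a heat map might be computed, assuming NumPy arrays and 11-bit WorldView imagery (so the peak value is 2047); the window placement and edge handling here are illustrative choices, not necessarily the exact ones we used:

```python
import numpy as np

def psnr(reference, estimate, peak=2047.0):
    """PSNR in decibels; peak=2047 assumes 11-bit WorldView imagery."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return np.inf  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def psnr_heat_map(real, fake, window=8):
    """Per-pixel local PSNR between a real and an artificial (H, W, 8) image."""
    h, w = real.shape[:2]
    half = window // 2
    heat = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Window around the pixel, clipped at the image borders.
            i0, i1 = max(0, i - half), min(h, i + half)
            j0, j1 = max(0, j - half), min(w, j + half)
            heat[i, j] = psnr(real[i0:i1, j0:j1], fake[i0:i1, j0:j1])
    return heat
```

The double loop is written for clarity rather than speed; a vectorized or strided implementation would be preferable for full-size satellite tiles.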
The following two visualizations are PSNR heat maps over Las Vegas and Khartoum.
There are a few observations to make about these two PSNR heat maps.
- Overall, the GAN recovers a majority of the multispectral information in both images. Many areas in Las Vegas and Khartoum achieve PSNR scores in the 40s, which exceeds our initial expectations based upon prior work, including that of Dong et al. mentioned above. That said, our initial findings should be considered preliminary.
- No single PSNR number can characterize the quality of an entire, large image. Some areas and objects in an image are reconstructed with higher accuracy than others.
- Given limited training data, cities with more homogeneous landscapes, like Las Vegas, achieve higher PSNR scores than cities with heterogeneous landscapes, like Khartoum.
Objects with “Interesting” Spectral Signatures from the Viewpoint of GANs
The PSNR heat maps introduced in the previous section demonstrate that the multispectralization GAN reconstructs multispectral image information more accurately in some areas than in others. In this section, we take a closer look at examples of this phenomenon. Furthermore, we interpret the areas and objects where the PSNR score is low as “interesting” from the viewpoint of the GAN.
We now visualize five areas/objects together with their PSNR heat maps.
In general, areas or objects with unexpected spectral properties (from the viewpoint of the GAN) will light up as red and orange.
For example, given a pixel belonging to the green tennis courts in image scene D, the GAN may mistake this pixel for grass and hence predict multispectral values that correspond to grass in the training data. The PSNR heat map compares these values to the original image and yields a low PSNR score, displayed as bright orange over the tennis courts.
For another example, given a pixel belonging to the shallow, muddy water in image scene A, which shows Khartoum’s White Nile river, the GAN may mistake this pixel for deeper water and predict multispectral values that correspond to deeper water in the training data.
Future Research Plans
We intend to continue to explore the potential value of multispectral information for deep learning object detection algorithms, as well as the extent to which GANs might be used as a pre-processing step to enhance their performance. We are considering developing a data set that will include labeled objects with known, unique VNIR spectral signatures. In the recent research paper entitled Detection by Classification of Buildings in Multispectral Satellite Imagery by Ishii et al. (link), the authors demonstrated significant improvement of their deep learning object detection algorithm when using seven of the eleven Landsat 8 bands compared to when using only the three RGB bands. Based on their results, we believe imagery focused on solar farms would be a reasonable starting point for creating a new data set consisting of low-resolution images from NASA’s Landsat 8 and very high resolution images from DigitalGlobe’s Worldview 2 or Worldview 3.
Given such a labeled data set, we could explore and quantify the performance of several popular deep learning object detection algorithms (e.g., YOLT and MNC) against both 3-band and 8-band data for low and high resolution imagery. We could then apply our “multispectralization GAN” as a pre-processing step to determine whether any of the performance degradation in going from 8-band to 3-band could be recovered, as we demonstrated in our prior post with the MNC algorithm applied to grayscale imagery (link). Such work could further illustrate the potential utility of the GAN as a pre-processing step for object detection algorithms in particular scenarios, for example, in bandwidth-limited applications.
Attribution: Special thanks to Adam, Dave, Jake, Lisa, and Ryan.