Earth Observation Even on a Cloudy Day


Authors: Guneet Mutreja, Rohit Singh, Mayank Khanduja

This blog will provide an overview of Synthetic-Aperture Radar (SAR) to RGB image translation using the recently implemented CycleGAN model in the ArcGIS API for Python.

Motivation

Consider a scenario in which cloud cover prevents the use of optical imagery for earth observation. Synthetic-Aperture Radar (SAR), an active data collection method, provides an alternative means of image capture: its signals penetrate clouds and can still produce the desired ground imagery.

SAR vs Optical image on a cloudy day

While this capability is clearly useful, SAR is a complex technology that can be difficult to interpret for those unfamiliar with it. Fortunately, deep learning image translation models can convert SAR images into more easily understandable optical RGB images.

One such model is CycleGAN, which has recently been added to the arcgis.learn module of the ArcGIS API for Python. The rest of this blog walks through the steps of using the model and the challenges we faced.

It is important to note that CycleGAN expects unpaired data and has no explicit information about how SAR pixels map to RGB pixels, so it may simply map dark pixels in the source image to dark pixels in the target image, which is not always correct (especially over agricultural land). If results are mismatched because of such incorrect mappings, the Pix2Pix model, which expects paired data, can be used instead.

Background

Image translation is a technique in which a neural network learns to translate one image into another. We used Generative Adversarial Networks (GANs) for this task. GANs are a family of generative models that can produce realistic images, videos, and other data, and can therefore help us generate RGB images from SAR images captured on cloudy days. There are several such models, including BiGAN, Pix2Pix, CoGAN, and CycleGAN; we used CycleGAN.

Model architecture

The Cycle Generative Adversarial Network, or CycleGAN, is an approach to training deep convolutional neural networks for image-to-image translation tasks. The network learns a mapping between input and output images from an unpaired dataset, for example generating RGB imagery from SAR, multispectral imagery from RGB, or map routes from satellite imagery.

Data flow of CycleGAN in this research

The model architecture consists of two generator models: Generator-A, which generates images for the first domain (Domain-A, SAR imagery in our case), and Generator-B, which generates images for the second domain (Domain-B, RGB imagery). Both generators are U-Net-based encoder-decoders.

  • Domain-B -> Generator-A -> Domain-A
  • Domain-A -> Generator-B -> Domain-B

Each generator has a corresponding discriminator model (Discriminator-A and Discriminator-B). A discriminator is an image classifier that takes real images from its domain and generated images from the corresponding generator and predicts whether they are real or fake.

  • Domain-A -> Discriminator-A -> [Real/Fake]
  • Domain-B -> Generator-A -> Discriminator-A -> [Real/Fake]
  • Domain-B -> Discriminator-B -> [Real/Fake]
  • Domain-A -> Generator-B -> Discriminator-B -> [Real/Fake]

Based on the prediction made by the discriminator, the loss is calculated.
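For intuition, here is a minimal sketch of how a discriminator's objective can be computed. This is a conceptual illustration using PyTorch and an LSGAN-style least-squares formulation, not the actual arcgis.learn internals; the function and tensor names are assumed for this example.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(disc_A, real_A, fake_A):
    """Conceptual Discriminator-A objective: score real Domain-A images
    close to 1 and images produced by Generator-A close to 0."""
    real_pred = disc_A(real_A)
    fake_pred = disc_A(fake_A.detach())  # detach so gradients don't flow into the generator
    real_loss = F.mse_loss(real_pred, torch.ones_like(real_pred))
    fake_loss = F.mse_loss(fake_pred, torch.zeros_like(fake_pred))
    return 0.5 * (real_loss + fake_loss)
```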

How is the loss calculated during training?

The loss used to train the Generators consists of three parts:

  1. Adversarial Loss: Adversarial loss is applied to both generators. Each generator tries to produce images of its target domain, while its corresponding discriminator distinguishes between translated samples and real samples. The generator aims to minimize this loss against its discriminator, which in turn tries to maximize it.
  2. Cycle Consistency Loss: This loss captures the intuition that if we translate an image from one domain to the other and back again, we should arrive back where we started. It is computed as the L1 loss between the original image and the reconstructed image, which should look the same. It is calculated in two directions:
  • Forward Cycle Consistency: Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B
  • Backward Cycle Consistency: Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A

  3. Identity Loss: This loss encourages the generator to preserve the color composition between input and output. It is computed by feeding the generator an image of its own target domain and calculating the L1 loss between that input and the generated image:

  • Domain-A -> Generator-A -> Domain-A
  • Domain-B -> Generator-B -> Domain-B

All of these loss functions play a critical role in arriving at high-quality results, so both generator models are optimized using a weighted combination of them, as sketched below.
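To make this combination concrete, here is a minimal PyTorch-style sketch of the total objective for Generator-A. Again, this is conceptual rather than the arcgis.learn implementation; the weights lambda_cycle and lambda_identity follow the original CycleGAN paper, and the LSGAN-style adversarial term is an assumption.

```python
import torch
import torch.nn.functional as F

def generator_A_loss(gen_A, gen_B, disc_A, real_A, real_B,
                     lambda_cycle=10.0, lambda_identity=5.0):
    """Conceptual total loss for Generator-A (Domain-B -> Domain-A)."""
    fake_A = gen_A(real_B)

    # 1. Adversarial loss: fool Discriminator-A into scoring fakes as real.
    pred = disc_A(fake_A)
    adversarial = F.mse_loss(pred, torch.ones_like(pred))

    # 2. Cycle consistency loss: B -> A -> B should reproduce the input.
    cycled_B = gen_B(fake_A)
    cycle = F.l1_loss(cycled_B, real_B)

    # 3. Identity loss: an image already in Domain-A should pass through
    #    Generator-A nearly unchanged.
    identity = F.l1_loss(gen_A(real_A), real_A)

    return adversarial + lambda_cycle * cycle + lambda_identity * identity
```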

Data and preprocessing

The sample data consists of single-band (HH) simulated SAR imagery from Capella Space, which we received in TIFF format, and optical RGB imagery of Rotterdam in the Netherlands. We converted the single-band SAR imagery to an 8-bit unsigned, 3-band raster using the Extract Bands raster function in ArcGIS Pro, which allows us to export 3-band JPEG images for data preparation.
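We did the conversion interactively with raster functions in ArcGIS Pro, but the same preprocessing could be scripted. The sketch below is a rough approximation using arcpy geoprocessing tools rather than the Extract Bands raster function; the paths are hypothetical, and depending on the value range of the SAR data an additional stretch may be needed before converting to 8 bit.

```python
import arcpy

# Hypothetical paths -- substitute your own data.
sar_hh = r"C:\data\capella_sar_hh.tif"          # single-band (HH) SAR image
sar_3band = r"C:\data\capella_sar_3band.tif"    # 8-bit, 3-band output

# Stack the single HH band three times to create a 3-band raster, then
# copy it out as 8-bit unsigned so it can be exported as JPEG chips.
arcpy.management.CompositeBands([sar_hh, sar_hh, sar_hh],
                                r"C:\data\sar_stack.tif")
arcpy.management.CopyRaster(r"C:\data\sar_stack.tif", sar_3band,
                            pixel_type="8_BIT_UNSIGNED")
```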

Exporting training data

Deep learning models need training data to learn from, so we will use the Export Training Data for Deep Learning tool in ArcGIS Pro to export appropriate training samples from our data. ArcGIS Pro has recently added support for exporting data in the Export Tiles format, which we will use for this task. This newly added format allows you to export image chips of a defined size without requiring any labels. With this process, we exported 3087 chips, of which approximately 90% (2773 images) were used for training and the remaining 308 for validating the model.
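For reference, the export step can also be scripted with arcpy, as in the rough sketch below. Paths, tile size, and stride are illustrative values rather than the exact settings we used; the same call is repeated for the RGB raster into a second folder.

```python
import arcpy

# Export unlabeled image chips from the SAR raster; repeat with the RGB
# raster (and a different out_folder) to get chips for the second domain.
arcpy.ia.ExportTrainingDataForDeepLearning(
    in_raster=r"C:\data\capella_sar_3band.tif",
    out_folder=r"C:\data\chips\sar",
    in_class_data=None,                 # the Export Tiles format needs no labels
    image_chip_format="JPEG",
    tile_size_x=256, tile_size_y=256,
    stride_x=128, stride_y=128,
    metadata_format="Export_Tiles")
```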

Exported chips using Export Tiles format

Model Training

After exporting the training data, we used ArcGIS Notebooks and the arcgis.learn module in the Python API to train the model.

Prepare data

Before we can begin training the model, we first need to prepare the data. To do this, we used the prepare_data() function in the API with the dataset_type parameter set to "CycleGAN". This function builds a data object from the training data exported in the previous step, consisting of training and validation sets with the specified transformations, chip size, batch size, split percentage, and so on.
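The call we used looks roughly like the snippet below; the chip folder path is illustrative, and the batch size can be tuned to the available GPU memory.

```python
from arcgis.learn import prepare_data

# Folder containing the exported SAR and RGB chips (path is illustrative).
data = prepare_data(r"C:\data\chips",
                    dataset_type="CycleGAN",
                    batch_size=4)
```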

Data Preparation

Finding optimal learning rate and model fitting

After preparing the data, we found an optimal learning rate using lr_find() and fit the model using the fit() method in the API.

Based on multiple experiments, we found that choosing the learning rate at the steepest slope of the loss curve produced the best results, so we used 2e-04 when fitting the model below. Increasing the amount of training data and choosing the learning rate at the steepest slope also resolved the mode collapse problem we faced in our initial experiments.
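In the notebook, this stage boils down to a few lines like the ones below; the epoch count and learning rate mirror the values discussed above.

```python
from arcgis.learn import CycleGAN

# Initialize the model with the prepared data object.
model = CycleGAN(data)

# Plot the loss-vs-learning-rate curve and suggest a learning rate;
# we picked the value at the steepest slope (about 2e-04).
model.lr_find()

# Train for 25 epochs, then run fit() again for another 25 epochs
# if the validation loss is still decreasing.
model.fit(25, lr=2e-04)
```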

Model Fitting

From the statistics in the figure above, we can see that our validation loss continues to decrease with each epoch of training. The resulting validation loss of our initial 25 epochs still left room for improvement, so we trained the model for an additional 25 epochs.

Results visualization

Next, we validate the model by visualizing a few samples from the validation data set, simply by calling show_results() in the API.
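A single call renders input chips alongside their translated counterparts:

```python
# Display validation samples: SAR chips next to their translated RGB
# counterparts (and RGB chips next to their translated SAR counterparts).
model.show_results()
```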

Results achieved after training the model for 50 epochs

The screenshot above shows that the model has learned to convert SAR images to RGB as well as RGB images to SAR.

The trained model was then used for inference over a larger extent; the results, shown in the screenshot below, indicate that the model can realistically convert SAR to RGB imagery.

Optical imagery vs translated imagery using the model we trained

Inferencing on a larger extent

Once we were satisfied with the results of the trained model, we performed inference at a larger scale to convert SAR imagery to RGB, using the Classify Pixels Using Deep Learning tool in ArcGIS Pro.
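The tool can also be run from Python. The sketch below assumes the trained model was first saved from the notebook with model.save(), which writes a deep learning package (.dlpk) that the tool can consume; all paths and the package name are hypothetical.

```python
import arcpy

# Translate the full SAR raster to RGB using the saved CycleGAN model.
translated = arcpy.ia.ClassifyPixelsUsingDeepLearning(
    in_raster=r"C:\data\capella_sar_3band.tif",
    in_model_definition=r"C:\models\sar_to_rgb\sar_to_rgb.dlpk")
translated.save(r"C:\data\sar_translated_rgb.tif")
```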

Classify pixels using deep learning tool in ArcGIS Pro

The resulting imagery is presented in the figure below. You can see that the SAR imagery is much more interpretable by humans after being translated to optical imagery with the model.

SAR to RGB image translated using the process

Conclusion

In this post, we have seen a practical application of generative deep learning: converting Synthetic-Aperture Radar (SAR) imagery to optical RGB imagery. This is made possible by image-to-image translation models such as CycleGAN in the arcgis.learn module of the ArcGIS API for Python.

Earth observation is an important, yet challenging task, especially on cloudy days. SAR to RGB image translation using models like CycleGAN can be a great tool to overcome the limitations of optical imagery. The exercise shows how generative deep learning models can help us reap the benefits of SAR imagery even on cloudy days.

To get a glimpse of the complete workflow behind this study, check out the sample notebook for the ArcGIS API for Python [3].

Acknowledgment

We wish to acknowledge Capella Space for making the SAR imagery available for this study. Capella recently unveiled the world's highest-resolution commercial SAR imagery, which allows us to monitor our planet in all weather and lighting conditions.

References

  1. How CycleGAN works: https://developers.arcgis.com/python/guide/how-cyclegan-works/
  2. Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros, "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks", 2017; arXiv:1703.10593.
  3. SAR to RGB image translation using CycleGAN: https://developers.arcgis.com/python/sample-notebooks/sar-to-rgb-image-translation-using-cyclegan/
