Rapid revisit deforestation detection with X-band SAR

Tapio Friberg
ICEYE Analytics
Jun 10, 2022

This is the fourth blog of a five-part series on the AI4SAR project, sponsored by ESA Φ-lab. If you haven’t heard about AI4SAR yet, now is a good time to visit our project page.

In this article we’ll give you an overview of why we started working on deforestation detection at ICEYE and how we kept developing our deforestation model during the AI4SAR project.

Figure 1 : A stack of multilooked stripmap images from our 1 day ground track repeat test site. Each tick is a single day.
Figure 2 : Daily deforestation over the site in Figure 1.

Why do deforestation detection with SAR?

Deforestation detection with any type of SAR imagery has a fairly large disadvantage compared to optical multispectral imagery, such as the freely available global imagery of Sentinel-2 or the rasters published for research and noncommercial use by the Planet NICFI program: SAR has no NDVI or any other feature tied to the optical spectrum and the color green. Non-coherent deforestation analysis must be done based on the backscatter intensity and texture of the SAR image alone, which can be challenging.
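To make the missing-NDVI point concrete, here is a minimal sketch of the index an optical pipeline would use. The reflectance values are made up for illustration; the key observation is that the formula needs a near-infrared and a red band, neither of which a SAR sensor provides.

```python
import numpy as np

# Hypothetical 2x2 optical scene: near-infrared and red reflectance
# bands (values are made up for illustration).
nir = np.array([[0.60, 0.50],
                [0.20, 0.10]])
red = np.array([[0.10, 0.10],
                [0.15, 0.10]])

# NDVI contrasts the strong NIR reflectance of healthy vegetation
# against its red absorption. Dense forest pixels score high.
ndvi = (nir - red) / (nir + red)

# A SAR sensor measures microwave backscatter amplitude instead;
# there is no NIR/red pair to form this ratio, so forest mapping
# must rely on backscatter intensity and texture.
```

The top-left "forest" pixel scores around 0.71 while the bare bottom-right pixel scores 0.0, which is exactly the separability SAR has to recover by other means.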

However, there is a very good reason to develop deforestation detection with SAR imagery, namely time. As an example, consider the meteorological cloud coverage study by Collow, Miller, and Trabachino (2016). The authors used a set of three radar-based instruments to estimate the cloud fraction over their test site throughout the year 2014. The site is shown in Figure 3, and one figure of interest is reproduced in Figure 4:

Figure 3 : The study site for Collow, Allison B. Marquardt, Mark A. Miller, and Lynne C. Trabachino. “Cloudiness over the Amazon rainforest: Meteorology and thermodynamics.” Journal of Geophysical Research: Atmospheres 121.13 (2016): 7990–8005.
Figure 4: Monthly average cloud fraction in Manacapuru, Brazil, in 2014. ​Measured by CERES (red), MPL (black) and ARSCL (blue). Error bars represent the 25th and 75th percentiles of daily averaged cloud fractions for every month. Collow, Marquardt, Miller and Trabachino, 2016.

This leaves a valid niche for SAR-based deforestation detection: with a SAR imaging satellite, acquisitions can be guaranteed, and response time is determined only by acquisition frequency, not by weather conditions.

Why a constellation of small satellites?

Anybody who’s worked with SAR data knows how great it is to work with earth observation data that ignores clouds. A better question is: why use an X-band smallsat instead of, for example, data from the excellent ESA Sentinel-1 C-band SAR satellites?

This depends on the application. C-band penetrates the canopy more, providing more information about what lies beneath it, such as forest degradation. However, the backscatter is more diffuse, and the base resolution is lower.

The most pertinent point comes from applications where an active response is desired. With a large constellation of SAR satellites in various orbits and a neural network capable of handling multiple incidence angles, revisit and re-observation times can be pushed to the order of single days or even hours.

This provides an opportunity for a paradigm shift: the observer can move from a reactive role into an active one. Still-active sites can be identified and intervention can be staged.

Label scarcity

One of the bigger challenges in training multi-temporal models is gathering the training data. Nowadays we are lucky to have multiple global land-use/land-cover datasets, such as the Esri LULC or the ESA LULC rasters. However, these datasets typically have temporal resolutions on the order of years, while we need temporal frequencies on the order of days to annotate our image stacks properly.

We knew that we could not annotate enough data to build a robust model ourselves. Transfer learning would help, but there is no SAR ImageNet (yet…). We needed to take a different approach.

Forest-like features

For our baseline encoder we gathered a training set of 1266 single look complex (SLC) 10 s stripmap frames from tropical forests all around the world, each with a footprint of approximately 30 km x 70 km. We then trained a convolutional neural network with roughly 40M trainable parameters to turn SAR images into forest/non-forest land cover maps, using the Esri LULC as training labels.

After this we had an encoder capable of turning the complex-valued, high-dynamic-range, high-entropy SAR images into smooth, bounded posterior probability maps. In short, we taught a network to convert SAR images into forest-like features. Halfway there!
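The preprocessing the encoder sees can be sketched in a few lines. This is an illustration only: the random complex patch stands in for an SLC frame, and a plain sigmoid stands in for the trained network head that would produce the forest probability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single look complex (SLC) patch: complex-valued samples
# (a stand-in for a real stripmap frame).
slc = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# Backscatter power and its dB representation: the high-dynamic-range
# input that the encoder consumes.
power = np.abs(slc) ** 2
intensity_db = 10.0 * np.log10(power)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A trained encoder maps this to a bounded forest-probability map;
# here a bare sigmoid stands in for the (hypothetical) network.
forest_prob = sigmoid(intensity_db)  # values strictly in (0, 1)
```

The point of the exercise is the last line: whatever the network learns internally, its output is a smooth map bounded in (0, 1), which is far easier for a downstream temporal model to digest than raw complex SAR data.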

Multi-temporal predictions

The end product we want is not a forest/non-forest classifier; we still need to teach the model to detect deforestation sites. For this step we need high quality labels annotated at the correct timesteps. As a proof of concept, we annotated 1608 deforestation sites ourselves.

Spatio-temporal models via the ConvLSTM architecture

One recurrent problem in analyzing SAR imagery is how to handle its spatio-temporal nature. The information content of a single pixel without any context is practically zero due to the phenomenon of speckle. Some of the most valuable SAR analytics rely on large stacks of satellite imagery and temporal models capable of analyzing the information present, coherent or not.

It is also typical to multilook SAR images in the spatial or temporal dimension to reduce the effects of speckle, using context to improve the signal-to-noise ratio.
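The effect of multilooking is easy to demonstrate on simulated speckle. Under the common fully-developed-speckle assumption, single-look intensity over a homogeneous area is exponentially distributed, and averaging N looks shrinks the relative spread by roughly 1/sqrt(N):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated speckle: single-look SAR intensity over a homogeneous
# area follows an exponential distribution (multiplicative noise).
true_reflectivity = 1.0
stack = rng.exponential(true_reflectivity, size=(32, 64, 64))  # (time, H, W)

# Temporal multilook: average the 32 looks per pixel. The mean stays
# near the true reflectivity while the coefficient of variation drops
# by roughly 1/sqrt(32).
multilooked = stack.mean(axis=0)

cv_single = stack[0].std() / stack[0].mean()       # ~1.0 for one look
cv_multi = multilooked.std() / multilooked.mean()  # ~0.18 for 32 looks
```

The same trade-off applies spatially: averaging a neighborhood of pixels buys signal-to-noise ratio at the cost of resolution, which is why Figure 1 shows multilooked rather than raw frames.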

In our work we wished to experiment with a spatio-temporal model capable of using both spatial and temporal context to handle stacks of SAR imagery. Simply adding a temporal dimension to a 2D convolutional neural network might be reasonable for low-depth SAR stacks; however, we wished to use a model capable of analyzing stacks with tens or hundreds of images.

After a literature review we ended up implementing a ConvLSTM model. The LSTM (Hochreiter and Schmidhuber, 1997) is an archetypal temporal model capable of analyzing sequences of up to hundreds of timesteps. The ConvLSTM (Shi et al., 2015) extends the LSTM to the spatio-temporal domain by making the inputs enter through the convolution operator instead of the multiplicative operator. We were inspired by the work of Rußwurm and Körner (Multi-Temporal Land Cover Classification with Sequential Recurrent Encoders. ISPRS International Journal of Geo-Information. 2018; 7(4):129) in our attempt to use the architecture for change detection in stacks of SAR images. A figure adapted from their work can be seen in Figure 5:

Figure 5 : ConvLSTM. Adaptation of a schematic of a ConvLSTM model by (Rußwurm M and Körner M, 2018), schematic modified to better represent our approach.

They used a bidirectional RNN; we use a single-direction model. At each timestep an image patch is fed into the model. The model convolves the image patches while simultaneously propagating the resulting features forward in time. The inner states are also convolved together, making the model outputs spatially smoother than those of a corresponding pixelwise LSTM. Some example representations of ConvLSTM states can be seen in Figure 6:

Figure 6 : Inputs vs. gate, state and cell states by (Rußwurm M and Körner M, 2018). In their case the ConvLSTM had learned to ignore cloud coverage interrupting their time series.
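The ConvLSTM update itself is compact enough to sketch in numpy. This is a single-channel, bias-free toy version of the Shi et al. (2015) cell, not our production model; the kernel names and shapes are illustrative assumptions.

```python
import numpy as np

def conv2d(x, k):
    """'Same' 2D cross-correlation of an (H, W) map with a 3x3 kernel."""
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * xp[i:i + H, j:j + W]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, K):
    """One ConvLSTM timestep (single channel, 3x3 kernels, no biases).

    The gates follow the usual LSTM equations, but the input x and
    hidden state h enter through convolutions instead of matrix
    products, so every state retains its spatial layout.
    """
    i = sigmoid(conv2d(x, K['xi']) + conv2d(h, K['hi']))   # input gate
    f = sigmoid(conv2d(x, K['xf']) + conv2d(h, K['hf']))   # forget gate
    o = sigmoid(conv2d(x, K['xo']) + conv2d(h, K['ho']))   # output gate
    g = np.tanh(conv2d(x, K['xg']) + conv2d(h, K['hg']))   # candidate
    c_next = f * c + i * g          # cell state carries temporal memory
    h_next = o * np.tanh(c_next)    # hidden state, spatially smoothed
    return h_next, c_next

# Run a random 5-step stack of 8x8 patches through the cell.
rng = np.random.default_rng(0)
K = {name: 0.1 * rng.standard_normal((3, 3))
     for name in ['xi', 'hi', 'xf', 'hf', 'xo', 'ho', 'xg', 'hg']}
h = np.zeros((8, 8))
c = np.zeros((8, 8))
for t in range(5):
    h, c = convlstm_step(rng.standard_normal((8, 8)), h, c, K)
```

Because `h` is always the product of a sigmoid gate and a tanh of the cell state, the hidden map stays bounded regardless of stack depth, which is part of what lets the architecture scale to hundreds of timesteps.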

However, ramping up the capacity of a recurrent neural network efficiently can be challenging. Too many recurrent layers may lead to numerical issues, and the model has a relatively high memory footprint per input patch pixel. To alleviate these issues we took the forest encoder trained on land cover maps, froze its weights, and attached it to a ConvLSTM-based model. We then trained the ConvLSTM to detect deforestation areas in stacks of SAR images.

Results

After all these steps we have finally created a model capable of turning stacks of multi-incidence-angle SAR images into deforestation polygons.

Figures 7 and 8: A segmented deforestation site outlined on a pair of downsampled SAR strip images.
Figure 10 : An overview of our test site. Red polygons mark deforestation detected during the imaging interval.

Now we had a deforestation detection system capable of monitoring hundreds of thousands of square kilometers by using a constellation of satellites. The next step was natural: we deployed it at large scale, took the predictions, corrected them, and fed them back to the machine as fresh training labels, creating a feedback loop of ever-greater awareness of what is happening in the world’s forests.
