The pains of classifying flooded forests in satellite data
About a tricky detection use case — from weeks of data pre-processing to training 2 CNNs; and why the answer might be in infrared band data.
Too much water is destroying Latvian forests
Forests cover 52% of Latvia’s territory, making it one of the most forested countries in the European Union. This translates to 3.8 million hectares of forest cover. Forest area has grown since the beginning of the 20th century, from 27% to 52%, and the area of forest damaged by fires, bark beetles or floods has grown with it. Excess water is the third most common cause of forest damage in Latvia. It occurs after heavy rainfall or is even caused by beaver dams that alter the flow of water.
The goal of the AI for Earth 2 — Forest Health Challenge was to detect flooded forests in two monitored areas in Latvia — 21,157 hectares of the Saldus area and 2,276 hectares of the Kalsnava area.
Training a computer vision model to see through the foliage from space
The goal for our team of AI engineers was to build a machine learning model that can detect excess water in forests from satellite data and alert forestry services. But is it possible to see wet forest ground through leaves from space? We suspected the real catch of this Challenge would lie in the data rather than in the modelling. Did we find a dataset good enough to train detection models with sufficient accuracy?
A wealth of remote sensing data
The data available for this challenge were satellite images as well as field data provided by our Challenge partner ForestRadar and the State Forest Research Institute “Silava”. The satellite imagery was taken from Sentinel-1 and Sentinel-2 satellites. We also had orthophoto imagery along with lidar data from drones at our disposal. Moreover, we got our hands on the Digital Terrain Model (DTM) and Canopy Height Model (CHM) for the areas of interest.
Vector-based radar data that defined our areas of interest and the flooded areas for each year was used as ground truth. The images were in .tif format and were processed with QGIS and FME, and in later stages with Python for modelling.
Due to the limited time we had for the Challenge, we had to choose which datasets to focus on. We decided to use the Sentinel and orthophoto images along with the radar data. The Sentinel images have a maximum resolution of 10 m, while the orthophoto images have a 40 cm resolution, making them heavier but much more detailed. The latter also have no problems with clouds, unlike the Sentinel-2 images.
GIS-based pre-processing
The most time-consuming part of the Challenge was collecting appropriate imagery and preparing the data for modelling. Damaged area polygons, such as the ones shown in the image below, were provided along with the year in which they were flagged.
Time stamping the flooding events was crucial; without it, it wouldn’t be possible to find images from ‘before’ and ‘after’ a flood event. Another hurdle was that Sentinel-2 images needed to have little cloud coverage, which had to be checked manually. These constraints took away time available for image collection and ultimately reduced the amount of data for model training and validation.
Steps taken in data pre-processing of Sentinel-2 images:
- Split the Saldus area of interest (AOI) in half (Saldus 1, Saldus 2); the produced Sentinel images would otherwise have been too large. The first half of the Saldus AOI was 1439x1203 pixels and the second half was 1442x1207 pixels.
- Download of appropriate imagery from 2016 to 2021 for all AOIs. The images were downloaded through Sentinel-hub and saved in Natural Color, with the 3 corresponding RGB bands.
- Creation of masks of the damaged areas for each year (a minimal sketch of this step follows the list).
- Upload of each image (in .tif format) and its corresponding mask (in .npy format).
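As an illustration of the mask-creation step, the sketch below rasterises damaged-area polygons onto the grid of a Sentinel-2 tile and saves the result as a .npy mask. The file names, the vector format and the `year` attribute are assumptions made for the example, not the actual project layout.

```python
import numpy as np
import geopandas as gpd
import rasterio
from rasterio.features import rasterize

# Open the Sentinel-2 tile to reuse its grid (size, transform, CRS)
with rasterio.open("saldus_1_2019.tif") as src:            # hypothetical file name
    transform = src.transform
    shape = (src.height, src.width)
    crs = src.crs

# Load the damaged-area polygons and keep the ones flagged for the target year
damage = gpd.read_file("damaged_areas.gpkg").to_crs(crs)    # hypothetical file / attribute
damage_2019 = damage[damage["year"] == 2019]

# Burn the polygons into a binary raster: 1 = damaged/flooded, 0 = healthy
mask = rasterize(
    ((geom, 1) for geom in damage_2019.geometry),
    out_shape=shape,
    transform=transform,
    fill=0,
    dtype="uint8",
)

np.save("saldus_1_2019_mask.npy", mask)
```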
Steps taken in data pre-processing of orthophoto images:
- The area of interest was divided into 79 tiles. To reduce the size of the dataset, we only kept the tiles that contained any flood damage, as shown in the image below. This reduced the number of tiles to 51.
- The saved images had four bands: RGB and Near-Infrared. Since each tile was 6250 x 6250 pixels, we created smaller chips of the images.
- We created and uploaded the 224 x 224 px images of healthy and damaged forests to be used as model inputs (a minimal chipping sketch follows this list).
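A minimal sketch of the chipping step, assuming a single 4-band orthophoto tile on disk; the file name and output layout are illustrative.

```python
import numpy as np
import rasterio

CHIP = 224  # chip size in pixels

# Read one orthophoto tile (RGB + NIR), shape (4, H, W)
with rasterio.open("ortho_tile_17.tif") as src:  # hypothetical file name
    img = src.read()

_, h, w = img.shape

# Cut the tile into non-overlapping 224 x 224 px chips
chips = [
    img[:, r:r + CHIP, c:c + CHIP]
    for r in range(0, h - CHIP + 1, CHIP)
    for c in range(0, w - CHIP + 1, CHIP)
]

np.save("ortho_tile_17_chips.npy", np.stack(chips))  # (n_chips, 4, 224, 224)
```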
Flood inundation map for Saldus
This is where radar stepped in to serve as ground truth. Sentinel-1 SAR GRD data was used for the Saldus area of interest. We followed these steps in the process:
- Each Sentinel-1 image downloaded had 3 bands: VV (vertical/vertical), VH (vertical/horizontal) and angle.
- We applied a speckle filter (Refined Lee) to reduce noise.
- We created a difference image by dividing the ‘after’ image by the ‘before’ image and applied a threshold of 1.25 (a minimal sketch of the thresholding and masking steps follows the list).
- We masked out pixels covered by permanent water for more than 5 months of the year, using the Global Surface Water (GSW) dataset.
- We’ve created masks of an area with a slope greater than 5 degrees.
The resulting image of a calculated flooded area for Saldus AOI is shown below.
Using convolutional neural networks to solve the problem
We wanted to classify the pixels of our images into one of two classes — ‘Flood’ or ‘No Flood’. We decided to use 3 models to solve this problem.
- XGBoost algorithm served as baseline
- U-Net CNN
- Inception-v4 CNN
Both CNNs were pre-trained on the ImageNet dataset. The desired result of our models would be to correctly classify each pixel as flooded or not.
XGBoost
Extreme Gradient Boosting is a supervised machine learning algorithm used to find patterns in a dataset with labels and features. This type of algorithm is not meant for image classification; in contrast to a CNN, it treats each piece of data as independent. Nevertheless, we used it to classify each pixel individually and took it as a benchmark for the rest of our models. We trained the XGBoost model with the Sentinel-2 images and the corresponding masks to obtain a classification of each pixel as flooded or non-flooded.
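A minimal sketch of this baseline, assuming the Sentinel-2 images and masks are already loaded as NumPy arrays; the variable names and hyperparameters are illustrative, not the exact configuration we used.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

images = np.load("sentinel2_images.npy")  # (n_images, H, W, 3) RGB values
masks = np.load("sentinel2_masks.npy")    # (n_images, H, W), 0 = dry, 1 = flooded

# Each pixel becomes one training sample with its 3 band values as features
X = images.reshape(-1, 3)
y = masks.reshape(-1)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)

print("Pixel-wise accuracy:", model.score(X_val, y_val))
```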
U-Net
U-Net is a Convolutional Neural Network (CNN) that was originally developed for biomedical image segmentation. The network consists of a contracting path and an expansive path, which give it its U-shaped architecture. The U-Net solves pixel-wise classification problems, segmenting the image and classifying each part; the prediction is a mask for the image. We first trained the U-Net with the Sentinel-2 images and the corresponding masks, and later tried the same with the orthophoto images.
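A minimal set-up sketch for such a U-Net, here using the segmentation_models_pytorch library with an ImageNet-pre-trained encoder as one possible implementation; the specific library and backbone are assumptions, not a record of our exact configuration.

```python
import torch
import segmentation_models_pytorch as smp

# U-Net with a pre-trained encoder; the backbone choice is an assumption
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",  # encoder pre-trained on ImageNet
    in_channels=3,               # RGB input chips
    classes=1,                   # single 'flooded' output channel
)

loss_fn = smp.losses.DiceLoss(mode="binary")

# Dummy batch to show the expected shapes: image chips and binary masks
x = torch.rand(4, 3, 224, 224)
y = (torch.rand(4, 1, 224, 224) > 0.5).float()

logits = model(x)                # (4, 1, 224, 224) pixel-wise logits
print(loss_fn(logits, y))
```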
Inception-v4
An Inception network is a Convolutional Neural Network (CNN) that consists of repeating blocks of convolutional design configurations called Inception modules. It can extract features at varying scales by applying convolutional layers with different kernel sizes in parallel. This CNN classifies the entire image, assigning one label per image.
At first, we didn’t consider using this type of model, since we thought the problem was better suited to semantic segmentation. But since the performance of the U-Net on the orthophoto dataset was low, we wanted to try a different approach. We believe the low performance could be attributed to the resolution of the images, with each chip showing only a small portion of the surface.
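One way to instantiate an ImageNet-pre-trained Inception-v4 for the two-class chip classification is via the timm library; this choice of library is an assumption, and the 299 px input below is the model’s default size (the project’s 224 px orthophoto chips would simply be resized).

```python
import timm
import torch

# Inception-v4 with ImageNet weights, re-headed for 2 classes: healthy / damaged
model = timm.create_model("inception_v4", pretrained=True, num_classes=2)
model.eval()

# Dummy batch at the model's default 299 x 299 input size
chips = torch.rand(8, 3, 299, 299)
with torch.no_grad():
    logits = model(chips)        # shape (8, 2)

print(logits.argmax(dim=1))      # predicted class per chip
```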
To create images and masks suitable for the CNNs, we generated subset images. For each AOI and each year, a subset image and its corresponding mask were created around a damaged area, represented by a polygon as can be seen in the figure below.
The same procedure was applied to the orthophoto images. Subset images of 224x224 px size were created for the model input. For the Inception-v4, some of the orthophoto images were separated into “damaged” and “healthy”, since this model does not need masks.
Model results showed the need for exploring further paths
The evaluation metrics we used (computed as in the sketch after this list) were:
- Intersection Over Union (IOU)
- Confusion matrix
- Classification accuracy of the model
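A short sketch of how these metrics can be computed from a predicted mask and the ground-truth mask, assuming both are binary pixel arrays; the file names are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = np.load("mask_true.npy").ravel()   # ground-truth mask, 0/1 per pixel
y_pred = np.load("mask_pred.npy").ravel()   # model prediction, 0/1 per pixel

cm = confusion_matrix(y_true, y_pred)       # rows: true class, columns: predicted class
tn, fp, fn, tp = cm.ravel()

iou = tp / (tp + fp + fn)                   # Intersection Over Union for the 'flooded' class
accuracy = accuracy_score(y_true, y_pred)

print("Confusion matrix:\n", cm)
print("IOU:", iou, "Accuracy:", accuracy)
```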
The results derived from the models are presented in the tables below.
For the Sentinel-2 data, we set our benchmark with the XGBoost model, which reached around 66–67% classification accuracy. This model was not meant for image classification: it takes each pixel independently and classifies it as flooded or not flooded. The U-Net improved on this with a 71% IOU.
When we trained the U-Net with the orthophoto images, we obtained significantly poorer performance (55% IOU) than with the Sentinel-2 images. With the Inception-v4, performance improved to an accuracy of 74%. It’s important to mention that the dataset was balanced but relatively small; increasing the number of images would give a more precise result.
Improving performance with near-infrared band
Due to time constraints during the AI for Forest Health Challenge, we could only use satellite images with 3 bands (RGB). A promising follow-up would be to add a Normalised Difference Water Index (NDWI), a satellite-derived index computed from the Near-Infrared (NIR) and Short Wave Infrared (SWIR) channels. Near-infrared light is strongly reflected by vegetation, and this index is closely related to plant water content, which makes it a very good proxy for plant water stress. Another layer to add could be a Normalised Difference Vegetation Index (NDVI), which quantifies vegetation by measuring the difference between near-infrared light (which vegetation reflects) and red light (which vegetation absorbs).
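As an illustration of this proposed follow-up, the sketch below derives NDWI and NDVI layers from co-registered Sentinel-2 band arrays; the band file names are assumptions.

```python
import numpy as np

# Co-registered Sentinel-2 band arrays (reflectance); file names are hypothetical
red = np.load("b04_red.npy").astype(np.float32)    # B04 (red)
nir = np.load("b08_nir.npy").astype(np.float32)    # B08 (near-infrared)
swir = np.load("b11_swir.npy").astype(np.float32)  # B11 (short-wave infrared)

eps = 1e-6  # avoid division by zero

# NDWI (NIR/SWIR form): sensitive to vegetation water content
ndwi = (nir - swir) / (nir + swir + eps)

# NDVI: contrast between NIR (reflected) and red (absorbed) light
ndvi = (nir - red) / (nir + red + eps)

# These layers could be stacked with the RGB bands as extra model inputs
```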
Another possible approach could be to limit the training of the algorithms to forested areas with floods, i.e. to areas that do not have other types of flooded terrain.
Not seeing the forest for the trees
Literally. Identifying flooded areas under tree foliage proved to be very difficult. We’ve also learned that working with geospatial data poses many challenges — from the manual inspection of images all the way to the size of the whole dataset. In this challenge we did our best to acquire suitable satellite and orthophoto images in order to classify their pixels as flooded or not. The results show some promise, especially considering that more images could always be added to the model training phase, as well as more epochs to the CNNs.
We hope the Challenge will be followed up with some of the suggested next steps and experimentation. Monitoring the changes in Latvian forests is crucial for their health. Especially since human activities and climate change have stressed the ecosystem even more and made the preservation of forests and wildlife more difficult.
Enias Vodas & Julieta Millán
AI for Earth engineers
AI for Forest Health Team: Deepali Bidwai, Mohammad Alasawdah, Tim De Craecker, Julieta Millán, Enias Vodas