
Jumpstart Your Machine Learning Satellite Competition Submission

A comprehensive guide to getting started with the ETCI Flood Detection competition.

Floodbase · 12 min read · Apr 30, 2021

*** Note: if you find this tutorial helpful or the problem interesting, check out our current openings at https://cloudtostreet.info/careers. We are always looking for passionate experts to join our growing team! ***

Introduction

The ETCI Flood Detection competition is an exciting new machine learning contest organized by the NASA Interagency Implementation and Advanced Concepts team. The goal of the competition is to detect flood events from satellite imagery. This challenge is particularly meaningful to us at Cloud to Street, as our mission is to detect flooding events in near real time anywhere in the world in order to save lives and better direct assistance to those most affected. We would therefore like to share our machine learning and remote sensing expertise to help participants do the best they can in this contest, or simply to help them learn more about this fascinating domain of machine learning. If you are interested in similar datasets, we recently published the Sen1Floods11 dataset, which can be used to train water detection models on radar and optical satellite imagery. Check out our other ideas and efforts to use machine learning flood detection to build climate resilience, including research on incorporating crowdsourced data into flood segmentation models and on using GANs to synthesize SWIR imagery and segment floods in PlanetScope imagery.

You can download the dataset and find all of the code in this tutorial here: Google Colab

Table of Contents:

  1. Satellite Data
  2. Annotation Data
  3. Downloading and Formatting the Dataset
  4. Creating a Semantic Segmentation Model
  5. Training the Model
  6. Testing the Model
  7. Submitting our Results
  8. Extra: Exploring the Noise in the Dataset

Satellite Data

Now let’s dig into the dataset. The dataset consists of synthetic aperture radar (SAR) images taken over several regions experiencing a flooding event. For those unfamiliar with this type of satellite imagery, SAR images are acquired with an active microwave sensor carried on board a satellite. SAR systems transmit pulses of electromagnetic waves towards the Earth’s surface and record the corresponding echoes. This differs from optical satellite imagery, which captures reflected sunlight and can therefore only image the Earth’s surface in the daytime and in the absence of clouds.

Since SAR satellites carry an active sensor, they can image both day and night, and in almost all weather conditions. This is especially useful for mapping flooding events: the large clouds hovering over the scene are, in practice, transparent to the electromagnetic waves, making it possible to perform near real-time flood detection. SAR imagery does, however, look different from the optical images you might know from Google Earth. Because it records the reflected electromagnetic energy, it appears to the human eye as a grayscale image.

If we grab a random image from the dataset we will find a pair of images with the same date and tile number named “<region>_<datetime>*_x-*_y-*_<vv | vh>.png”. The VV and VH suffixes denote the polarization of the transmitted and received radar signals: VV stands for vertically polarized transmitted and vertically polarized received radar, while VH stands for vertically polarized transmitted and horizontally polarized received radar. Normally the values in SAR images vary greatly, but the team at NASA has already pre-processed and scaled the images to integer values between 0 and 255, making them easy for us to work with. The creators of this dataset have also tiled (i.e. cropped) the large original images into more manageable 256x256 pixel images. The images were collected from several regions around the world, including Nebraska (USA), Alabama (USA), Bangladesh, Red River North (USA), and Florence (Italy). Images of each region were captured over multiple days. Let’s take a look at some of these images:

VV and VH Sentinel 1 images from Bangladesh region
VV and VH Sentinel 1 images from Alabama region
VV and VH Sentinel 1 images from Nebraska region
VV and VH Sentinel 1 images from Florence region

It is often difficult for an untrained eye to detect water in these grayscale images, so it is common practice to combine them into a more informative colorspace. We stack the VV and VH images together and add a third band computed from a combination (ratio) of the first two.

In Python terms, a minimal sketch could look like the following (the helper name and the exact band combination, VV, VH, and one minus the VH/VV ratio, are our choices; other composites work too):
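import numpy as np

def s1_to_rgb(vv_image, vh_image):
    """Stack VV, VH, and a ratio-derived band into a false-color RGB image.
    Assumes vv_image and vh_image are 2-D arrays already scaled to [0, 1]."""
    # Ratio of cross- to co-polarized backscatter; small over open water.
    ratio_image = np.clip(np.nan_to_num(vh_image / (vv_image + 1e-6)), 0, 1)
    # Use 1 - ratio as the blue band so water-covered pixels appear bright blue.
    return np.stack((vv_image, vh_image, 1 - ratio_image), axis=2)  # (H, W, 3)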

The final three-channel RGB images are blue where there is water, making it easy even for those unfamiliar with this type of imagery to see water.

Generated RGB images from Bangladesh region
Generated RGB images from Nebraska region
Generated RGB images from Alabama region
Generated RGB images from Florence region

Annotation Data

Now that we have a good grasp of what the data looks like let’s analyze what the machine learning model will be asked to create. In the end, the model will take an input image similar to the previous section and produce an image of the same size with binary values of 0 and 1 corresponding to “not a flood region” or “flood region”. The creators of this challenge give us a flood mask for every image in the training set and a water body mask for all training and validation images. Let’s take a look at some of the RGB images, flood masks, and water body masks:

RGB, Flood mask, and Water Body mask for image in Alabama region
RGB, Flood mask, and Water Body mask for image in Nebraska region
RGB, Flood mask, and Water Body mask for image in Bangladesh region

We can see that the flood region labels correspond to blue/brown areas in our RGB images, while green areas correspond to land.

Downloading and Formatting the Dataset

Downloading the competition dataset

To access the dataset we first need to sign in to or create a CodaLab profile. Once you have an account you must apply to participate in the competition. After receiving confirmation (this should take less than a day), you can then download the files via a Google Drive link under “Participate → Get Data”. If you are using a remote system and want to download the files directly, you can use the gdown package. In order to follow along with the tutorial, we recommend that you create a folder to place both the training and validation datasets in.

Alternatively you can follow our Google Colab notebook to download the dataset, train a segmentation model, and create a submission file.

Creating dataframes

Now that we have the dataset downloaded we want to format the data so that getting an image and all of its additional information is quick and simple. A popular framework for organizing datasets is the DataFrame from the Pandas library. We can use this data structure for many purposes, including finding an image’s flood mask or finding all images for a particular region. For the purposes of this tutorial we will use the DataFrame class as a readily accessible spreadsheet, but feel free to learn more about the awesome capabilities of this data structure. We will construct separate DataFrames for the training and validation sets.

Let’s create the training DataFrame. Note, if this section is confusing you can simply copy and paste the code (or run it in the Colab) and move on to the next section. We start by finding all of the VV image paths, extracting the filenames, and finding the regions that they belong to.
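A minimal sketch, assuming the archive was extracted into a train folder whose sub-folders follow a <region>/tiles/<vv | vh | flood_label | water_body_label> layout (adjust the paths to your own setup):

from pathlib import Path

train_dir = Path("competition_data/train")  # wherever you extracted the training archive

# Every VV tile, its bare file name, and the region encoded at the start of the name.
train_vv_paths = sorted(str(p) for p in train_dir.glob("**/vv/*_vv.png"))
train_vv_names = [Path(p).name for p in train_vv_paths]
train_regions = [name.split("_")[0] for name in train_vv_names]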

Next, we will cycle through these paths to find the corresponding VH image, water body mask, flood mask, and region name. In each iteration of the for loop, we adjust the file path and use the VV image name to find the corresponding file, placing it in the appropriate list.
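Continuing the sketch above, the replacements below assume the sibling folder and suffix names from our assumed layout; tweak them if your extracted files are organized differently:

train_vh_paths, train_flood_paths, train_water_paths = [], [], []

for vv_path in train_vv_paths:
    # Swap the folder and the suffix to point at the matching VH tile and masks.
    train_vh_paths.append(vv_path.replace("/vv/", "/vh/").replace("_vv.png", "_vh.png"))
    train_flood_paths.append(vv_path.replace("/vv/", "/flood_label/").replace("_vv.png", ".png"))
    train_water_paths.append(vv_path.replace("/vv/", "/water_body_label/").replace("_vv.png", ".png"))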

Finally we create a dictionary which will be used to create the DataFrame. The keys will become the titles of our DataFrame’s columns and the values will become the columns themselves. You can get the size of the DataFrame with train_df.shape and see the first five rows of data with train_df.head().
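Putting it together (the column names here are our own; pick whatever labels you find readable):

import pandas as pd

train_df = pd.DataFrame({
    "vv_image_path": train_vv_paths,
    "vh_image_path": train_vh_paths,
    "flood_label_path": train_flood_paths,
    "water_body_label_path": train_water_paths,
    "region": train_regions,
})

print(train_df.shape)   # (number of tiles, 5)
print(train_df.head())  # first five rows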

The validation DataFrame is built in almost the same way, but since the validation set doesn’t contain flood masks we don’t create that column. Please note that in the last line we sort the DataFrame; this will be very important when submitting results.
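A corresponding sketch for the validation set, assuming valid_vv_paths, valid_vh_paths, valid_water_paths, and valid_regions were built the same way as above; here we sort by the VV path so the row order is deterministic for the later submission step:

valid_df = pd.DataFrame({
    "vv_image_path": valid_vv_paths,
    "vh_image_path": valid_vh_paths,
    "water_body_label_path": valid_water_paths,
    "region": valid_regions,
}).sort_values(by="vv_image_path").reset_index(drop=True)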

Creating training and development splits

The validation set doesn’t contain any labels so we can’t use it quantitatively to measure how well our trained model is performing. Therefore we create a development set by splitting the training set. Note, the exact definition of the term “development set” varies in the field of machine learning but you can think of the development set in this case as a “validation set” and the actual validation set as a “test set”. There are many ways of splitting a dataset into a training and development set but to match a real life scenario we are going to split the training set by regions. The training set contains three regions: Nebraska, Alabama (northal), and Bangladesh. We randomly select one region for the development set and leave the rest in the training set.

Since we recorded what region each image belongs to we can easily filter the DataFrame to select all images from a particular region and create new training and development DataFrames.
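For example, holding out one region (here Bangladesh, chosen arbitrarily and assuming the region token in the file names is lowercase) as the development set:

dev_region = "bangladesh"

dev_df = train_df[train_df["region"] == dev_region].reset_index(drop=True)
train_split_df = train_df[train_df["region"] != dev_region].reset_index(drop=True)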

Creating dataset class

At Cloud to Street we use PyTorch as our deep learning library so we will use it for this tutorial as well. All modern deep learning libraries (PyTorch, TensorFlow, Keras, MXNet, etc.) are similarly powerful but many researchers and engineers enjoy PyTorch for its ease of use and dynamic model interaction.

With that being said, we need to create a PyTorch Dataset class in order to pass the data to a model later. You can follow this official PyTorch tutorial to learn even more about this. To create a custom PyTorch Dataset object we create a child class based on PyTorch’s Dataset class. We need to overwrite the __init__, __len__, and __getitem__ methods.

The __init__ method is where we load the dataset, which in our case just means storing the DataFrame. The __len__ method simply returns the total number of VV and VH pairs we have in the dataset. The __getitem__ method receives an index into the dataset and loads the corresponding example for the model. In this method we load the images into memory, normalize them from 0–255 to 0–1, combine the VV and VH images into an RGB image, and finally apply transformation functions before returning the example.
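A minimal sketch of such a class, reusing the s1_to_rgb helper and the DataFrames from earlier (it assumes the masks are stored as 0/255 PNGs; split="valid" skips the flood mask since the validation set has none):

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class ETCIDataset(Dataset):
    def __init__(self, dataframe, split="train", transform=None):
        self.dataset = dataframe
        self.split = split
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        row = self.dataset.iloc[index]

        # Load the VV/VH tiles and scale them from 0-255 down to 0-1.
        vv = np.array(Image.open(row["vv_image_path"]).convert("L")) / 255.0
        vh = np.array(Image.open(row["vh_image_path"]).convert("L")) / 255.0
        rgb = s1_to_rgb(vv, vh).astype(np.float32)

        example = {}
        if self.split == "valid":
            if self.transform is not None:
                rgb = self.transform(image=rgb)["image"]
            example["image"] = torch.from_numpy(rgb).permute(2, 0, 1)
        else:
            # Flood masks assumed to be 0/255 PNGs; rescale to class ids 0/1.
            mask = np.array(Image.open(row["flood_label_path"]).convert("L")) / 255.0
            if self.transform is not None:
                augmented = self.transform(image=rgb, mask=mask)
                rgb, mask = augmented["image"], augmented["mask"]
            example["image"] = torch.from_numpy(rgb).permute(2, 0, 1)
            example["mask"] = torch.from_numpy(mask).long()
        return example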

For those unfamiliar with image transformations, we slightly perturb the data each time it is loaded so that the model sees slightly different versions of the images. This leads to less overfitting and better validation performance generally. Applying transformations to segmentation tasks is slightly more complicated compared to classification because we need to apply the same transformation to both the input image and the mask. We will use the Albumentations library for transformations.

We will add transformations for the training set but leave the development set without transformations. We do this so that we can consistently see how our model is progressing during training.
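As a sketch, a few simple flips and rotations from Albumentations are enough to get started (which augmentations to use is entirely up to you):

import albumentations as A

train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
])

train_dataset = ETCIDataset(train_split_df, split="train", transform=train_transform)
dev_dataset = ETCIDataset(dev_df, split="train", transform=None)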

PyTorch uses a special class called a DataLoader to efficiently load examples from our Dataset class using parallel processing. We can set parameters such as batch size, number of parallel processes (num_workers), and order shuffling.
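For example (the batch size and worker count here are arbitrary; tune them to your hardware):

from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, num_workers=4)
dev_loader = DataLoader(dev_dataset, batch_size=16, shuffle=False, num_workers=4)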

Creating a Semantic Segmentation Model

We just learned how to load images and make them ready to be fed into a deep learning model; now let’s build the model itself. There are many different semantic segmentation models, and you can even create your own; however, we will use the tried and true UNet architecture developed by Ronneberger et al. This network will take our input RGB image, process it, and then construct a segmentation mask of the same size as our input. To get started quickly, we will use the Segmentation Models PyTorch package to get our UNet model.

We have a couple of options to choose from for this model. The most important ones for getting started are the in_channels and classes arguments. Since we converted our images into three-channel (RGB) images, we set in_channels to 3. Our model will predict either flood or no-flood for each pixel, hence our number of classes is 2.
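A minimal sketch of building the model (the ResNet-34 encoder and ImageNet weights are our choices, not requirements):

import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet34",       # any encoder supported by the package works
    encoder_weights="imagenet",    # start from pretrained encoder weights
    in_channels=3,                 # our RGB composite has three channels
    classes=2,                     # flood / no flood
)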

For efficient model training you should find a GPU to train your model. We will assume that you either have one or are using Google Colab with a GPU instance selected. In PyTorch we need to move the model and data into GPU memory for fast computation. You can do this as follows:

device = 'cuda' 
model.to(device)

If you don’t have access to a GPU you can still play with this code, but we would recommend against training the model as it would take a very long time. To run this code without a GPU you only need to set device = 'cpu' instead:

device = 'cpu'

Training the Model

We have our data formatted and our model loaded, so let’s train it! To train a model we need to know how to update its weights and how to track its performance. We will use the built-in Adam optimizer and CrossEntropyLoss.
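For example (the learning rate is a starting point, not a tuned value):

import torch

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)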

The main parameter for our training loop is the number of times we show our model the entire training dataset; each full pass is called an epoch. In the training loop we load a batch of images and labels, move them into GPU memory, pass the images through our model, compute the loss between the model’s output and the ground-truth mask, update the model weights, and then compute the metrics. We essentially do the same thing in the validation or development loop but do not update the model weights. Note, the tqdm function gives us a progress bar during training.

We can track the progress of the model by following the loss and other metrics such as the mean intersection over union metric (mIoU). This metric gives us an idea of how well our model’s predictions are overlapping the ground truth labels. We compute the global average mIoU over all images in the sets.
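A sketch of the full loop is below. For brevity it averages a per-batch IoU for the flood class rather than the exact global mIoU used for scoring, so treat the numbers as a rough progress indicator:

from tqdm import tqdm

def flood_iou(predictions, masks, eps=1e-6):
    # IoU for the flood class (label 1) over a batch of predictions.
    intersection = ((predictions == 1) & (masks == 1)).sum().item()
    union = ((predictions == 1) | (masks == 1)).sum().item()
    return (intersection + eps) / (union + eps)

num_epochs = 5  # chosen arbitrarily for this walkthrough

for epoch in range(num_epochs):
    model.train()
    train_ious = []
    for batch in tqdm(train_loader):
        images = batch["image"].to(device)
        masks = batch["mask"].to(device)

        optimizer.zero_grad()
        outputs = model(images)                 # shape (batch, 2, H, W)
        loss = criterion(outputs, masks)
        loss.backward()
        optimizer.step()

        train_ious.append(flood_iou(outputs.argmax(dim=1), masks))

    model.eval()
    dev_ious = []
    with torch.no_grad():
        for batch in tqdm(dev_loader):
            images = batch["image"].to(device)
            masks = batch["mask"].to(device)
            outputs = model(images)
            dev_ious.append(flood_iou(outputs.argmax(dim=1), masks))

    print(f"epoch {epoch}: "
          f"train IoU {sum(train_ious) / len(train_ious):.3f}, "
          f"dev IoU {sum(dev_ious) / len(dev_ious):.3f}")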

Testing the Model

After training the model for however many epochs you decide on, you will want to evaluate it on the validation set that is judged in this competition. First we create our validation dataset object without transformations. Then we create a final validation loop very similar to the previous development loop.

What’s different in this loop is that we track both the model’s predictions and the inputs to the model. We want to keep track of these so that we can submit our final predictions for the competition, but also so we can see qualitatively (remember, we don’t have the flood masks) how well our model does on data it has not seen previously.

After the model has gone through all of the data in the validation set we convert our lists of predictions and inputs into numpy arrays:
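A sketch of this final loop (shuffle=False is important so the prediction order matches the sorted validation DataFrame):

valid_dataset = ETCIDataset(valid_df, split="valid", transform=None)
valid_loader = DataLoader(valid_dataset, batch_size=16, shuffle=False, num_workers=4)

final_predictions, final_inputs = [], []

model.eval()
with torch.no_grad():
    for batch in tqdm(valid_loader):
        images = batch["image"].to(device)
        predictions = model(images).argmax(dim=1).cpu().numpy().astype("uint8")
        final_predictions.append(predictions)
        final_inputs.append(batch["image"].numpy())

final_predictions = np.concatenate(final_predictions, axis=0)  # (N, H, W)
final_inputs = np.concatenate(final_inputs, axis=0)            # (N, 3, H, W)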

View our results

Let’s see how well the model is trained after 5 epochs by visualizing the input and output of the model.
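For example, with matplotlib (pick any index you like):

import matplotlib.pyplot as plt

index = 0
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(final_inputs[index].transpose(1, 2, 0))  # back to (H, W, 3)
axes[0].set_title("Input RGB composite")
axes[1].imshow(final_predictions[index], cmap="gray")
axes[1].set_title("Predicted flood mask")
plt.show()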

We can see that the model already appears to be doing pretty well after running for only 5 epochs. The predictions appear to be a bit conservative when it comes to flagging flooded pixels. We are sure that you can beat this result and we are excited to see you do it!

Submitting our Results

Once we are happy with our model’s final predictions, we can submit the results to CodaLab and have them scored. According to the instructions of the competition we need to save our binary result images as a zip file. Since there is no easy way to do this in Python (if there is please let us know!), we will save our predictions with np.save() and then zip the file from the command line. Note, we need the following arguments set correctly to ensure backward compatibility with Python 2.7 (which appears to be used to score the results).
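A sketch of the save step (the file name is our choice; allow_pickle=False keeps the file a plain array, and fix_imports=True is NumPy’s switch for Python 2 compatible pickling):

np.save("submission.npy", final_predictions, fix_imports=True, allow_pickle=False)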

If you are using a Linux-based system like Ubuntu, you should be able to zip the file directly from Python:
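One way to do it, assuming the zip utility is installed (it usually is on Ubuntu):

import subprocess

subprocess.run(["zip", "submission.zip", "submission.npy"], check=True)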

The final submission.zip file is ready to be uploaded and scored!

Extra: Exploring the Noise in the Dataset

Some of you may have noticed, as we did, irregular images when exploring the dataset. It appears that some tiles at the edges of the collected SAR scenes contain no information and have not been removed. These tiles contain only 0 or 255 values and no terrain features. Let’s filter these images out of the training set! We take our training DataFrame and find all of the VV image paths (we use the VV images, but the VH images appear to have the same values in the noisy cases). We then load each VV image and check its unique values; if it contains only 0 or 255 values we record its index.
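A sketch of the filtering step, continuing from the earlier snippets:

noisy_indices = []
for index, vv_path in enumerate(train_df["vv_image_path"]):
    values = np.unique(np.array(Image.open(vv_path).convert("L")))
    # Tiles whose only values are 0 and/or 255 carry no terrain information.
    if set(values.tolist()).issubset({0, 255}):
        noisy_indices.append(index)

clean_train_df = train_df.drop(index=noisy_indices).reset_index(drop=True)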

All images with recorded indices are removed from the training DataFrame, leaving only images with at least some terrain data (which turns out to be about 75% of the data!). A neat experiment is to see whether removing these images improves the model’s performance.


Floodbase

Floodbase is the leading platform for monitoring, mapping, and analyzing floods and flood risk around the world.