A simple cloud-detection walk-through using Convolutional Neural Networks (CNN/U-Net) and the Fast.ai library

Maurício Cordeiro
Published in Analytics Vidhya · 6 min read · Mar 18, 2020
Photo by Asael Peña on Unsplash

Update

For information about the course Introduction to Python for Scientists (available on YouTube) and other articles like this, please visit my website cordmaur.carrd.co.

Introduction

Neural nets can be both intimidating and fascinating at the same time. The more you learn and the deeper you go, the more you unleash their power, and a whole bunch of new ideas and applications come to mind. Environmental applications are among the subjects attracting more attention every day, and remote sensing is a powerful tool for researchers, students and policy makers to understand environmental processes.

But how can we bridge neural nets and remote sensing into something useful? A quick search on the internet will turn up many ideas. The purpose of my first Medium publication is to provide a simple introductory notebook on using Convolutional Neural Networks (CNNs) to segment clouds in satellite images.

The Dataset

It's not my objective to teach the concepts of CNNs; for that, you can rely on the excellent fast.ai course Practical Deep Learning for Coders. One important part of any neural network is the training phase, in which we provide examples of input data (X, or inputs) and corresponding answers (Y, or targets) so the computer can truly "learn" what we are trying to teach. In this project we will use the 38-Cloud Segmentation in Satellite Images dataset, available on Kaggle (Figure 1).

Figure 1 — Kaggle dataset ( https://www.kaggle.com/sorour/38cloud-cloud-segmentation-in-satellite-images)

The dataset is composed of satellite scenes cropped into 384x384 patches (a size suitable for deep learning purposes). In total, there are 8400 patches for training and 9201 patches for testing, separated into directories for the Red, Green, Blue and NIR (Near Infrared) bands, plus an additional directory for the reference masks (ground truth — *_gt). The structure is shown in Figure 2.

Figure 2: Dataset structure

Data Pre-processing

To get the data ready for our neural network, we first need to pre-process it to fit our model. As we will be using a pre-trained ResNet architecture, we can only feed 3 bands into the model, preferably RGB, since the model was pre-trained on RGB images. Thus, the first pre-processing step is to create RGB patches from the given images. We will use PIL (Python Imaging Library) to open the red, green and blue .tif images, normalize them, and save each patch as a single RGB .png file (make sure you keep the same structure as the original dataset).
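A minimal sketch of that conversion (not the original gist), assuming the 38-Cloud training folders train_red, train_green and train_blue sit under 38-Cloud_training and contain 16-bit TIF patches:

from pathlib import Path
import numpy as np
from PIL import Image

base = Path('38-Cloud_training')          # adjust to your local dataset path
out_dir = base / 'train_rgb'
out_dir.mkdir(exist_ok=True)

for red_path in sorted((base / 'train_red').glob('*.TIF')):
    patch_name = red_path.name.replace('red_', '')   # common suffix shared by the three bands
    bands = []
    for band in ('red', 'green', 'blue'):
        arr = np.array(Image.open(base / f'train_{band}' / f'{band}_{patch_name}'),
                       dtype=np.float32)
        # normalize each 16-bit band to the 0-255 range of a .png
        arr = 255 * arr / (arr.max() + 1e-6)
        bands.append(arr.astype(np.uint8))
    rgb = np.stack(bands, axis=-1)                    # shape (384, 384, 3)
    Image.fromarray(rgb).save(out_dir / patch_name.replace('.TIF', '.png'))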

We also need to convert the ground truth images to .png, map them to the values 0 (no cloud) and 1 (cloud), and store them in a folder called 'labels'.
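A sketch for the masks, again assuming the original train_gt folder; non-zero pixels in the ground-truth TIFs are treated as cloud and each mask is saved with the same name as its matching RGB patch:

from pathlib import Path
import numpy as np
from PIL import Image

base = Path('38-Cloud_training')
lbl_dir = base / 'labels'
lbl_dir.mkdir(exist_ok=True)

for gt_path in sorted((base / 'train_gt').glob('*.TIF')):
    mask = np.array(Image.open(gt_path))
    mask = (mask > 0).astype(np.uint8)                # 1 = cloud, 0 = no cloud
    patch_name = gt_path.name.replace('gt_', '').replace('.TIF', '.png')
    Image.fromarray(mask).save(lbl_dir / patch_name)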

After these steps, we should have 8400 RGB patches and the same number of ground-truth masks in the train_rgb and labels folders, respectively. To meet the fast.ai requirements, we should organize the data into data/images (rename the train_rgb folder to images) and data/labels, either manually or with a small script like the one below.
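If you prefer to script this step rather than doing it by hand, something like this works (paths as assumed above):

import shutil
from pathlib import Path

base = Path('38-Cloud_training')
data_dir = base / 'data'
data_dir.mkdir(exist_ok=True)

shutil.move(str(base / 'train_rgb'), str(data_dir / 'images'))   # RGB patches
shutil.move(str(base / 'labels'), str(data_dir / 'labels'))      # 0/1 masks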

Creating the Data Loader

As our deep learning framework we will use the Fast.ai library, a high-level API on top of PyTorch. To install it, following its documentation, simply type:

conda install -c pytorch -c fastai fastai

Before creating a neural net model, we need to prepare a data loader that will handle access to the file system, split the data into batches, and apply transformations (for data augmentation) if necessary.

First of all, we will import the library and point it to our data to check if everything is fine.
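A sketch of that check, using the fastai v1 API that was current when this article was written (the data path below is an assumption; point it to your own data folder):

from fastai.vision import *

path = Path('38-Cloud_training/data')     # assumed location of the data folder
path_img = path / 'images'
path_lbl = path / 'labels'

fnames = get_image_files(path_img)        # list of RGB patches
lbl_names = get_image_files(path_lbl)     # list of masks
print(len(fnames), len(lbl_names))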

The expected result is “8400 8400”.

After that, we will create a function to map each image to its respective mask and test it with the open_image and open_mask functions, which load them into tensors. The result is shown in Figure 3.
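A possible version of the mapping function and the sanity check (the helper name get_y_fn is illustrative), continuing from the previous snippet:

import matplotlib.pyplot as plt
from fastai.vision import *

# each image in data/images has a mask with the same file name in data/labels
get_y_fn = lambda x: path_lbl / x.name

img_f = fnames[0]
img = open_image(img_f)                   # float tensor of shape (3, 384, 384)
mask = open_mask(get_y_fn(img_f))         # integer tensor of shape (1, 384, 384)

fig, axs = plt.subplots(1, 2, figsize=(10, 5))
img.show(ax=axs[0], title='image')
mask.show(ax=axs[1], title='mask')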

Figure 3: Image and corresponding mask, already loaded as tensors.

With the mapping function in place, we can create a Fastai data bunch. For our purpose, instead of using the entire set of 8400 images for training, we will let the library split them into training (80%) and validation (20%) sets. We will also use some data augmentation: a technique that increases the number of training samples by applying random transformations such as rotation, flipping and warping.
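A sketch of the data bunch creation, again with the fastai v1 API; the batch size and class names below are assumptions you may need to adapt to your GPU:

size = 384
bs = 4                                    # assumed; lower it if you run out of GPU memory

src = (SegmentationItemList.from_folder(path_img)
       .split_by_rand_pct(0.2)            # 80% training / 20% validation
       .label_from_func(get_y_fn, classes=['no cloud', 'cloud']))

data = (src.transform(get_transforms(), size=size, tfm_y=True)   # random flips, rotations, warp
        .databunch(bs=bs)
        .normalize(imagenet_stats))       # ImageNet stats, since the encoder is pre-trained

data                                      # in a notebook, this prints the summary below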

And here is our data loader (or data bunch):

ImageDataBunch;

Train: LabelList (6720 items)
x: SegmentationItemList
Image (3, 384, 384),Image (3, 384, 384),Image (3, 384, 384),Image (3, 384, 384),Image (3, 384, 384)
y: SegmentationLabelList
ImageSegment (1, 384, 384),ImageSegment (1, 384, 384),ImageSegment (1, 384, 384),ImageSegment (1, 384, 384),ImageSegment (1, 384, 384)
Path: D:\DeepLearning\38cloud_images\38-Cloud_training\data\images;

Valid: LabelList (1680 items)
x: SegmentationItemList
Image (3, 384, 384),Image (3, 384, 384),Image (3, 384, 384),Image (3, 384, 384),Image (3, 384, 384)
y: SegmentationLabelList
ImageSegment (1, 384, 384),ImageSegment (1, 384, 384),ImageSegment (1, 384, 384),ImageSegment (1, 384, 384),ImageSegment (1, 384, 384)
Path: D:\DeepLearning\38cloud_images\38-Cloud_training\data\images;

Test: None

The Data Bunch keeps track of the samples and their respective labels and, in the case of image segmentation, also overlays both for quick visualization. We can test it with "data.show_batch(2)", for example (Figure 4):

Figure 4: Example of show_batch output.

The Model

Now that we have the Data Bunch ready, let's create the model. For image segmentation, the standard model is the U-Net. The U-Net architecture has a contracting path, which works like a normal CNN, downsampling the image into feature maps, followed by an expanding path that upsamples back to the original resolution. More information about the U-Net architecture can be found in the deeplearning.net tutorial.

We will use a U-Net with a pre-trained ResNet-34 encoder, which has 34 layers in the contracting path. To create it, we will define an accuracy function to measure the performance of the model, the weight decay (regularization to avoid overfitting), and the learning rate (the factor multiplied by the gradient to adjust the parameters during the back-propagation step).
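A sketch of how this might look with fastai v1; the metric implementation and the hyperparameter values are illustrative, not the article's exact settings:

def acc_cloud(input, target):
    # pixel-wise accuracy: fraction of pixels whose predicted class matches the mask
    target = target.squeeze(1)
    return (input.argmax(dim=1) == target).float().mean()

wd = 1e-2                                 # weight decay
lr = 1e-3                                 # learning rate (learn.lr_find() helps choose it)

learn = unet_learner(data, models.resnet34, metrics=acc_cloud, wd=wd)
learn.fit_one_cycle(3, slice(lr))         # 3 epochs with the one-cycle policy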

And here are the results. We got 96% accuracy using only the RGB bands and 80% of the training set, training for 3 epochs (Figure 5). That took about 45 minutes on my notebook's 6 GB NVIDIA GPU.

Figure 5: Accuracy for 3 epochs of training.

After training, we can save the model and inspect its accuracy visually on the validation set.
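For example (the checkpoint name is arbitrary):

learn.save('cloud-resnet34-stage1')            # stored under the data path, in a 'models' folder
learn.show_results(rows=3, figsize=(10, 10))   # predicted masks side by side with the ground truth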

Conclusion

I hope this walk-through gives you a brief introduction to working with satellite images using a deep learning approach. We can see from this simple experiment that we can achieve really good results with the U-Net, using only the Red, Green and Blue bands to detect clouds in remotely sensed images.

My next two stories explain how to create, from scratch, a dataset and a simple U-Net model in PyTorch to perform the segmentation using all 4 channels available in this dataset.

The full notebook with this code is available here. I hope you have enjoyed it, and feel free to post questions or comments. Also, don't forget to take a look at the new stories.


Ph.D. Geospatial Data Scientist and water specialist at Brazilian National Water and Sanitation Agency. To get in touch: https://www.linkedin.com/in/cordmaur/