How to create a custom Dataset / Loader in PyTorch, from Scratch, for multi-band Satellite Images Dataset from Kaggle

Maurício Cordeiro
Analytics Vidhya
Published in
5 min readApr 3, 2020

--

Cloud Segmentation

Update

For information about the course Introduction to Python for Scientists (available on YouTube) and other articles like this, please visit my website cordmaur.carrd.co.

Introduction

In my last Medium story (here) I proposed an approach using the high level API Fast.ai to detect cloud contours in satellite images. Detecting object contours (i.e. all the pixels belonging to the same object) is called semantic segmentation. The dataset that was used for the task is the 38-Cloud: Cloud Segmentation in Satellite Image, from Kaggle.

Although we could achieve a relatively good accuracy of 96% with a few lines of code, the model was not able to consider all the input channels, Red, Green, Blue and NIR(Near Infrared) provided in the dataset. The problem is that most of the semantic segmentation models found in deep learning frameworks like Keras, Fast.ai and even PyTorch are designed to, and come with pre-trained weights, to work with RGB images. Besides that, the vision module of these libraries are also stuck to RGB files. That is the reason we ignored the NIR channel in the previous story and used only RGB patches.

--

--

Maurício Cordeiro
Analytics Vidhya

Ph.D. Geospatial Data Scientist and water specialist at Brazilian National Water and Sanitation Agency. To get in touch: https://www.linkedin.com/in/cordmaur/