Image preprocessing for segmentation: Part 1

Amartya Hatua
Jul 5, 2022

Image segmentation is an important and challenging problem. When I recently tried it for the first time, I found numerous blogs and articles that describe training and testing methods in detail, with plenty of examples and source code. However, there are few references on data preprocessing for satellite image segmentation. In this article, I describe how I approached that problem and arrived at a solution.

For image segmentation, we need two types of images: the satellite image and its ground truth. Generally, both come in .tif format. For both training and testing, the images need to be divided into small patches, so that every small satellite image patch has a corresponding small ground truth patch. Before dividing the images into patches, we need to check whether their dimensions (width and height in pixels) are the same. In most cases they differ, so the first job is to resize the ground truth image so that its dimensions exactly match those of the satellite image.
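Why the dimensions must match can be illustrated with a minimal numpy sketch. It assumes both images are already loaded as arrays shaped (bands, height, width) and uses a hypothetical helper, paired_patches, that cuts both arrays on the same grid and drops incomplete edge tiles. It is only an illustration of the pairing, not the patching method from Part 2.

```python
import numpy as np


def paired_patches(sat, gt, size):
    """Cut a satellite array and its ground truth on the same grid (hypothetical helper).

    Both arrays are shaped (bands, height, width); incomplete edge tiles are dropped.
    """
    # The k-th satellite patch covers the same ground as the k-th ground truth
    # patch only if both rasters have identical height and width.
    if sat.shape[-2:] != gt.shape[-2:]:
        raise ValueError(
            f"size mismatch: satellite {sat.shape[-2:]} vs ground truth {gt.shape[-2:]}"
        )
    height, width = sat.shape[-2:]
    pairs = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            # Same spatial window for both arrays, all bands kept
            window = (..., slice(top, top + size), slice(left, left + size))
            pairs.append((sat[window], gt[window]))
    return pairs


sat = np.zeros((3, 512, 768), dtype=np.uint8)  # 3-band satellite image
gt = np.zeros((1, 512, 768), dtype=np.uint8)   # single-band label mask
print(len(paired_patches(sat, gt, 256)))  # prints 6 (2 rows x 3 columns of tiles)
```

If the ground truth had a different size, the helper raises an error instead of silently producing pairs that cover different areas on the ground.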

Satellite image and corresponding ground truth image

Resize Image

Ground truth images can be resized very easily with GDAL (the gdal module from the osgeo package). However, I ran into a problem reading my .tif file with GDAL, so I took a longer route and resized the images with the rasterio library. I am providing Python functions for both methods.

With the rasterio library, the process has four steps: (i) split the image into its bands; (ii) save each band as a temporary .tif file; (iii) resize each temporary file and save it again [1]; (iv) combine the resized temporary files into the final output file. An obvious criticism of this method is that it is far from optimized, and that is true. However, it offers an alternative when the osgeo library does not work. These steps also let you inspect the intermediate images (GT_B1.tif, GT_B2.tif, etc.) and understand the underlying process.

In Part 2 of this series, I discuss how to divide a large image into small patches.

Reference:

[1] https://rasterio.readthedocs.io/en/latest/topics/resampling.html
