Satellite Imagery Road Segmentation

Nithish
6 min read · Apr 17, 2022


Table of contents

  1. Introduction
  2. Business Problem
  3. Understanding the Data
  4. Machine Learning problem
  5. Data preprocessing
  6. Modeling
  7. Results
  8. Deployment
  9. References

1. Introduction

If you travel frequently, you very likely use maps to navigate from point A to point B. You may be surprised by how much maps affect our day-to-day lives. A road map or street map is a map that primarily displays roads and transport links. For maps to be reliable, they must stay up to date with the ever-changing and ever-expanding road network.

The combined length of the roads on our planet, paved and unpaved, is about 33 million km. Roads are paved and expanded every day, so satellite imaging is used extensively to map them.

2. Business Problem:

Since our road network is millions of kilometers long, mapping every road on the planet manually would take a substantial amount of manpower, and keeping up with the ever-expanding network by hand is virtually impossible. As roads are paved and expanded every day, automatically extracting roads from satellite images is crucial for keeping maps up to date. Satellites can provide high-resolution topographical maps.

However, roads are difficult to identify in this data because they look visually similar to rivers and railways. Road extraction methods such as segmentation with classical computer vision algorithms may not yield the best results, as they depend on features extracted from the image. Deep learning, a subset of machine learning, has shown significant gains in performance and accuracy over classical algorithms in computer vision. So, we will use deep neural networks to extract road information from aerial images.

3. Understanding the Data:

Source: https://www.kaggle.com/balraj98/massachusetts-roads-dataset

Image: aerial images of the state of Massachusetts. Each image is 1500×1500 pixels in size and covers an area of 2.25 square kilometers.

Mask: a mask image is created from the original image by assigning a different pixel value to the feature that is to be segmented from its surroundings (here, the road).

metadata.csv contains the paths for the original image and the mask image.

label_class_dict.csv contains the RGB values of the features.

4. ML Problem:

The task is semantic segmentation: a deep-learning model is trained on images and their masks to separate road pixels from the rest of the features in the image, using Dice loss.

Dice Loss

5. Data Preprocessing

Preprocessing: Plotting the images revealed that a handful of them were missing a portion of their data.

Image with a missing chunk and its mask

Incomplete images will degrade the performance of the model, so we remove images that are missing more than 10% of their data.

Function to find incomplete images
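The original function was shown as an image and is not recoverable, so the following is only a minimal sketch of the idea. It assumes the missing region appears as pure-white pixels (all channels equal to 255) and that images are loaded as NumPy arrays of shape (height, width, 3). The names `missing_fraction`, `is_incomplete`, and `fill_value` are hypothetical.

```python
import numpy as np


def missing_fraction(image, fill_value=255):
    """Return the fraction of pixels whose channels all equal fill_value.

    Assumption: missing data is rendered as a uniform fill colour
    (pure white by default). Genuinely white pixels are counted too.
    """
    blank = np.all(image == fill_value, axis=-1)
    return float(blank.mean())


def is_incomplete(image, threshold=0.10, fill_value=255):
    """Flag an image that is missing more than `threshold` of its pixels."""
    return missing_fraction(image, fill_value) > threshold
```

With the 10% threshold from the text, an image whose top fifth is blank would be dropped, while one with a thin blank strip would be kept.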

All the images are 1500×1500 pixels. Resizing them to smaller dimensions would not preserve all the information in the original image, so the images are instead cropped into smaller 512×512 tiles.

Full size and cropped images
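One way to crop a 1500×1500 image into 512×512 tiles is sketched below. Since 1500 is not a multiple of 512, the sketch assumes slightly overlapping tiles so that every pixel is covered; the article does not say how the edges were actually handled. The helper names `tile_starts` and `crop_tiles` are hypothetical.

```python
import numpy as np


def tile_starts(length, tile):
    """Evenly spaced start offsets so the tiles cover [0, length)."""
    if length <= tile:
        return [0]
    n_tiles = -(-length // tile)  # ceiling division
    step = (length - tile) / (n_tiles - 1)
    return [round(i * step) for i in range(n_tiles)]


def crop_tiles(image, tile=512):
    """Split an (H, W, C) array into tile x tile crops, row by row."""
    height, width = image.shape[:2]
    return [
        image[top:top + tile, left:left + tile]
        for top in tile_starts(height, tile)
        for left in tile_starts(width, tile)
    ]
```

For a 1500×1500 input this yields a 3×3 grid of nine crops, with neighbouring tiles overlapping by 18 pixels. The same function should be applied to the mask so that image and mask tiles stay aligned.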

Mask Extraction: The masks in our data are RGB images, but the segmentation network needs a class label for every pixel. Much as we treat standard categorical values, we create the target by one-hot encoding the class labels, essentially creating an output channel for each of the possible classes.
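The one-hot encoding step can be sketched as follows. The class colours are hard-coded here as an assumption (black for background, white for road); in practice they would be read from label_class_dict.csv. The function name `one_hot_encode` is hypothetical.

```python
import numpy as np


def one_hot_encode(mask_rgb, class_colors):
    """Convert an (H, W, 3) RGB mask into an (H, W, n_classes) one-hot array.

    Channel k is 1 where the pixel colour equals class_colors[k], else 0.
    """
    channels = [
        np.all(mask_rgb == np.array(color), axis=-1)
        for color in class_colors
    ]
    return np.stack(channels, axis=-1).astype(np.float32)


# Assumed colours: background = black, road = white.
CLASS_COLORS = [(0, 0, 0), (255, 255, 255)]
```

Each pixel then has exactly one active channel, provided every colour in the mask appears in the class list.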

6. Modelling

1. Network Architecture (U-net)

U-Net is a convolutional neural network originally developed for segmenting biomedical images. When visualized, the architecture of U-Net resembles the letter U, hence the name. U-Net consists of two major parts: the left part is called the contracting path, and the right part is the expansive path.

U-net Architecture

Contracting Path

Each block in the contracting path contains two 3×3 convolution layers followed by a 2×2 max-pooling layer. Each block doubles the number of filters, so as we go down, the depth of the feature maps doubles at each block while their spatial size shrinks because of the max-pooling layer. In essence, the contracting path acts as a downsampler.

Now at the bottom of the network, there are two convolution layers without a max-pooling layer before connecting to the expansive path of the network.

Expansive path

The expansive path consists of several expansion blocks. Each block contains a 2×2 upsampling layer that halves the number of channels and two 3×3 convolution layers.

Each block also includes a concatenation layer that joins the upsampled output with the correspondingly cropped feature map from the contracting path (for example, 56×56 is cropped from 64×64). This crop-and-concatenate step acts as a skip connection, carrying information from the contracting path into each expansion block.

At the end, a 1×1 convolution layer maps the feature maps to the number of classes required in the output.
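To see where numbers such as the 56×56 crop from 64×64 come from, the sketch below traces feature-map sizes and channel counts through a U-Net built from unpadded 3×3 convolutions, as in the original U-Net design. It assumes a 572×572 input, 64 filters in the first block, and four downsampling steps. It is not the custom network used in this project, and the function name `trace_unet` is hypothetical.

```python
def trace_unet(input_size=572, base_filters=64, depth=4, n_classes=2):
    """Trace (stage, spatial size, channels) through an unpadded U-Net.

    Returns the trace and the list of (skip size, cropped size) pairs.
    """
    trace, crops, skips = [], [], []
    size, channels = input_size, base_filters

    # Contracting path: two unpadded 3x3 convs trim 4 pixels in total,
    # then 2x2 max pooling halves the size and the next block doubles filters.
    for level in range(1, depth + 1):
        size -= 4
        trace.append((f"down{level}", size, channels))
        skips.append(size)
        size //= 2
        channels *= 2

    # Bottom of the U: two convs, no pooling.
    size -= 4
    trace.append(("bottleneck", size, channels))

    # Expansive path: upsample (size x2, channels /2), crop and concatenate
    # the matching skip feature map, then two unpadded 3x3 convs.
    for level in range(depth, 0, -1):
        size *= 2
        channels //= 2
        crops.append((skips.pop(), size))
        size -= 4
        trace.append((f"up{level}", size, channels))

    # Final 1x1 conv maps channels to the number of classes.
    trace.append(("output", size, n_classes))
    return trace, crops
```

Under these assumptions the first skip connection is cropped from 64 to 56, and a 572×572 input ends up as a 388×388 output map. Networks that use padded convolutions keep the input size and need no cropping.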

Note: A custom U-Net was used for this project.

2. Training

Custom U-Net model: A U-Net with 2 million parameters was used as the segmentation model.

custom U-net architecture

Performance Metric (IoU Score): IoU (Intersection over Union) measures the overlap between two regions as the area of their intersection divided by the area of their union. In a segmentation task, the IoU score ranges from 0 to 1 and specifies the amount of overlap between the predicted and ground-truth pixels.

An IoU of 0 means there is no overlap between the predicted and ground-truth regions.

An IoU of 1 means the union of the two regions is the same as their overlap, so the prediction and the ground truth overlap completely.

IoU
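A minimal NumPy sketch of the IoU score for binary masks is shown below. The function name `iou_score` and the small `eps` term, which avoids division by zero when both masks are empty, are assumptions for illustration and not the training code used in this project.

```python
import numpy as np


def iou_score(pred, target, eps=1e-7):
    """Intersection over Union between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((intersection + eps) / (union + eps))
```

For a model that outputs probabilities, the prediction would first be thresholded (for example at 0.5) to get a binary mask.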

Loss Function (Dice Coefficient): The Dice coefficient is very similar to the IoU, and the two are positively correlated. The Dice coefficient also ranges from 0 to 1. It is 2 × the area of overlap divided by the total number of pixels in both images. The Dice loss is typically defined as 1 minus the Dice coefficient, so a perfect overlap gives a loss of 0.

Dice Coefficient
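The Dice coefficient and the corresponding loss can be sketched in NumPy as follows. This is an illustration only: a real training loop would compute the same expression on framework tensors so that gradients can flow. The names `dice_coefficient` and `dice_loss` and the smoothing term `eps` are assumptions.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-7):
    """2 * overlap / (pixels in pred + pixels in target)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = (pred * target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def dice_loss(pred, target, eps=1e-7):
    """Dice loss, assumed here to be 1 - Dice coefficient."""
    return 1.0 - dice_coefficient(pred, target, eps)
```

Because the inputs are multiplied rather than compared, the same formula also works with soft predictions between 0 and 1. For binary masks, Dice and IoU are related by Dice = 2 · IoU / (1 + IoU).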

Model Training: The model is trained on the 512×512 crops using Dice loss, with 20% of the data held out for validation. The model converges after 5 epochs.
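The 80/20 split can be sketched with the standard library alone. The function name `train_val_split` and the fixed seed are assumptions; the article does not say how the split was actually drawn.

```python
import random


def train_val_split(items, val_fraction=0.2, seed=42):
    """Shuffle items reproducibly and hold out val_fraction for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

If crops from the same source image can overlap, splitting by source image instead of by crop avoids leaking near-duplicate pixels into the validation set.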

7. Results

The test set contains about 1500 images. Using a GPU, our model predicts their segmentation maps in almost no time, with an IoU score of 0.73 (73%).

8. Deployment

A Flask API is used to host the application on a local machine, and ngrok is used to tunnel localhost to the public internet.

deployment

9. References
