Satellite Imagery Road Segmentation

Apr 17, 2022

Introduction
Business Problem
Understanding the Data
Machine Learning problem
Data preprocessing
Modeling
Results
Deployment
References

1. Introduction

If you travel frequently, you very likely use maps to navigate from point A to point B. Maps affect our day-to-day lives more than you might expect. A road map, or street map, primarily displays roads and transport links. For maps to be reliable, they must stay up to date with the ever-changing and ever-expanding road network.

The combined length of the planet's roads, paved and unpaved, is about 33 million km. Roads are paved and extended every day, so satellite imaging is used extensively to map them.

2. Business Problem:

Since the road network is millions of kilometers long, mapping every road by hand would take a substantial amount of manpower, and keeping up with the ever-expanding network manually is virtually impossible. Because roads are paved and extended every day, automatically extracting roads from satellite images is crucial for keeping maps up to date. Satellites can provide high-resolution topographical maps.

However, roads are difficult to identify in these data because they look visually similar to rivers and railways. Road extraction methods based on classical computer vision segmentation algorithms may not yield the best results, as they depend on hand-crafted features extracted from the image. Deep learning, a subset of machine learning, has shown significant performance and accuracy gains in computer vision compared with classical algorithms. So, we will use deep neural networks to extract road information from aerial images.

3. Understanding the Data:

Source: https://www.kaggle.com/balraj98/massachusetts-roads-dataset

Image: aerial images of the state of Massachusetts. Each image is 1500×1500 pixels in size, covering an area of 2.25 square kilometers.

Mask: the mask image is created from the original image by assigning a different pixel value to the feature that is to be segmented from its surroundings.

metadata.csv contains the paths for the original image and the mask image.

label_class_dict.csv contains the RGB values of the features.

4. ML Problem:

The task is to train a deep-learning segmentation model on images and their masks, using Dice loss, so that it separates roads from the other features present in an image.

5. Data Preprocessing

Preprocessing: plotting images revealed that a handful of images were missing a portion of their data.

Incomplete images would degrade the model's performance, so we remove images that are missing more than 10% of their data.

Function to find incomplete images
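The function itself is not reproduced in the text. As a rough sketch of the idea, the snippet below assumes that missing regions appear as uniformly white pixels (255, 255, 255); the names `missing_fraction` and `is_incomplete` are hypothetical, and only the 10% threshold comes from the description above.

```python
import numpy as np

def missing_fraction(image, blank_value=255):
    """Return the fraction of pixels whose channels all equal blank_value.

    Assumption: missing regions are rendered as uniformly blank (white) pixels.
    """
    image = np.asarray(image)
    blank = np.all(image == blank_value, axis=-1)  # (H, W) boolean map
    return float(blank.mean())

def is_incomplete(image, threshold=0.10, blank_value=255):
    """Flag an image when more than `threshold` of its pixels are blank."""
    return missing_fraction(image, blank_value) > threshold

# Synthetic example: a 100x100 RGB image with its top 20 rows blanked out.
demo = np.zeros((100, 100, 3), dtype=np.uint8)
demo[:20] = 255
print(missing_fraction(demo))  # 0.2
print(is_incomplete(demo))     # True
```

Images flagged this way would be dropped together with their masks so that the image and mask lists stay aligned.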

All images are 1500*1500 pixels. Resizing them to smaller dimensions would not preserve all the information in the original image, so the images are instead cropped into smaller 512*512 patches.
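The text does not say how the crops were placed. Since 1500 is not a multiple of 512, one possible approach, shown here purely as an assumption, is to use evenly spaced and slightly overlapping patches so that no border pixels are lost. The helper names `tile_offsets` and `crop_patches` are hypothetical.

```python
import math
import numpy as np

def tile_offsets(length, patch):
    """Evenly spaced start offsets so patches of size `patch` cover `length`."""
    if length <= patch:
        return [0]
    n = math.ceil(length / patch)
    step = (length - patch) / (n - 1)
    return [round(i * step) for i in range(n)]

def crop_patches(image, patch=512):
    """Split an (H, W, ...) array into patch x patch crops covering every pixel."""
    h, w = image.shape[:2]
    return [
        image[y:y + patch, x:x + patch]
        for y in tile_offsets(h, patch)
        for x in tile_offsets(w, patch)
    ]

image = np.zeros((1500, 1500, 3), dtype=np.uint8)
patches = crop_patches(image)
print(tile_offsets(1500, 512))         # [0, 494, 988]
print(len(patches), patches[0].shape)  # 9 (512, 512, 3)
```

The same offsets would need to be applied to each mask so that every image patch keeps its matching label patch.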

Mask Extraction: The mask in our data is an RGB image, but the segmentation network expects a per-pixel class target. Similar to how we treat standard categorical values, we create the target by one-hot encoding the class labels, essentially creating an output channel for each possible class.
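A minimal sketch of this encoding step follows. The two class colors (black background, white road) are an assumption for illustration; in practice they would be read from label_class_dict.csv. The function name `one_hot_mask` is hypothetical.

```python
import numpy as np

# Assumed class colors (background = black, road = white); the real values
# should be read from label_class_dict.csv.
CLASS_RGB = [(0, 0, 0), (255, 255, 255)]

def one_hot_mask(mask_rgb, class_rgb=CLASS_RGB):
    """Convert an (H, W, 3) RGB mask into an (H, W, n_classes) one-hot array."""
    channels = [np.all(mask_rgb == np.array(rgb), axis=-1) for rgb in class_rgb]
    return np.stack(channels, axis=-1).astype(np.float32)

mask = np.zeros((4, 4, 3), dtype=np.uint8)
mask[1, :] = 255  # one horizontal "road" row
encoded = one_hot_mask(mask)
print(encoded.shape)          # (4, 4, 2)
print(encoded[..., 1].sum())  # 4.0
```

Each output channel is a binary map for one class, which is the shape a per-pixel classification loss expects.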

6. Modelling

1. Network Architecture (U-net)

U-Net is a convolutional neural network originally developed for segmenting biomedical images. When visualized, the architecture of U-Net resembles the letter U, hence the name. U-Net consists of two major parts: the left part is called the contracting path, and the right part is the expansive path.

Contracting Path

Each block in the contracting path contains two 3*3 convolution layers followed by a 2*2 max-pooling layer. Each block doubles the number of filters, so as we go down, the depth of the feature maps doubles at each block while their spatial size is halved by the max-pooling layer. Essentially, the contracting path acts as a downsampler.

Now at the bottom of the network, there are two convolution layers without a max-pooling layer before connecting to the expansive path of the network.
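To make the doubling and halving concrete, the sketch below traces feature-map sizes through the contracting path. It assumes padded convolutions (so only pooling changes the spatial size), 64 starting filters, and four blocks; these are illustrative values, not the configuration of the custom model used here.

```python
def contracting_shapes(size=512, filters=64, blocks=4):
    """Trace (spatial size, channels) after each contracting block and the bottleneck.

    Assumes 'same'-padded convolutions, so only 2x2 max-pooling shrinks the map.
    """
    shapes = []
    for _ in range(blocks):
        shapes.append((size, filters))  # output of the two 3x3 convolutions
        size //= 2                      # 2x2 max-pooling halves height and width
        filters *= 2                    # the next block doubles the filters
    shapes.append((size, filters))      # bottleneck: two convolutions, no pooling
    return shapes

for size, channels in contracting_shapes():
    print(f"{size}x{size}x{channels}")
# 512x512x64
# 256x256x128
# 128x128x256
# 64x64x512
# 32x32x1024
```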

Expansive path

The expansive path consists of several expansion blocks. Each block passes its input through two 3*3 convolution layers and a 2*2 upsampling layer, which halves the number of channels at each block.

Each block also includes a concatenation with the correspondingly cropped feature map from the contracting path (for example, a 64*64 map is cropped to 56*56). The crop-and-concatenate step acts as a skip connection, carrying information from the contracting path into each block.

At the end, a 1*1 convolution layer maps the feature maps to as many channels as there are classes required in the output.
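The crop-and-concatenate step can be sketched with plain arrays. The 64*64 and 56*56 sizes come from the text; the channel count of 512 and the name `center_crop` are assumptions for illustration.

```python
import numpy as np

def center_crop(features, target_h, target_w):
    """Crop the central target_h x target_w window from an (H, W, C) feature map."""
    h, w = features.shape[:2]
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return features[top:top + target_h, left:left + target_w]

encoder_map = np.random.rand(64, 64, 512)  # from the contracting path
decoder_map = np.random.rand(56, 56, 512)  # upsampled map in the expansive path

cropped = center_crop(encoder_map, *decoder_map.shape[:2])
merged = np.concatenate([cropped, decoder_map], axis=-1)  # skip connection
print(cropped.shape, merged.shape)  # (56, 56, 512) (56, 56, 1024)
```

Cropping is only needed when the convolutions are unpadded; with padded convolutions the two maps already share a spatial size and can be concatenated directly.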
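A 1*1 convolution applies the same linear map to the channel vector at every pixel, which can be written as a matrix product. In the sketch below, the 64 input channels, two output classes, random weights, and the softmax at the end are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 8, 64))  # final decoder feature map (H, W, C)
weights = rng.normal(size=(64, 2))      # 1x1 kernel: C inputs -> 2 classes
bias = np.zeros(2)

logits = features @ weights + bias      # same linear map applied at every pixel
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)  # per-pixel softmax

print(logits.shape)  # (8, 8, 2)
```

The spatial size is unchanged; only the channel dimension is reduced to one score per class.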

Note: A custom U-Net was trained for this project.

2. Training

Custom U-net model: A U-Net with 2 million parameters was used as the segmentation model.

custom U-net architecture

Performance Metric (IoU Score): IoU (Intersection over Union) measures the overlap between two regions. The IoU score ranges from 0 to 1 and specifies the amount of overlap between the predicted and ground-truth pixels in a segmentation task.

An IoU of 0 denotes that there is no overlap between the predicted and ground-truth regions.

An IoU of 1 means that the union of the regions is the same as their overlap, indicating that the prediction and ground truth overlap completely.
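For binary masks, the score can be computed directly from pixel counts. This is a minimal sketch; the function name `iou_score` and the small `eps` term (which avoids division by zero when both masks are empty) are choices made for this example.

```python
import numpy as np

def iou_score(pred, target, eps=1e-7):
    """IoU between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((intersection + eps) / (union + eps))

pred = np.array([[1, 1, 0, 0]])
target = np.array([[0, 1, 1, 0]])
print(round(iou_score(pred, target), 3))  # 0.333
print(round(iou_score(pred, pred), 3))    # 1.0
```

In the first call one pixel is shared and three pixels are covered in total, giving 1/3.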

Loss Function (Dice Coefficient): The Dice coefficient is very similar to IoU, and the two are positively correlated. The Dice coefficient also ranges from 0 to 1. It is 2 times the area of overlap divided by the total number of pixels in both images. A Dice loss is typically defined as 1 minus this coefficient, so that a perfect prediction gives a loss of 0.
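The definition translates almost directly into code. The sketch below uses NumPy for clarity rather than a deep learning framework, and the names `dice_coefficient` and `dice_loss` are hypothetical; it is not the training code used for this model.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2 * overlap / (sum of both masks), for values in [0, 1]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    overlap = (pred * target).sum()
    return float((2.0 * overlap + eps) / (pred.sum() + target.sum() + eps))

def dice_loss(pred, target):
    """Loss to minimise: 0 for a perfect match, approaching 1 for no overlap."""
    return 1.0 - dice_coefficient(pred, target)

pred = np.array([[1, 1, 0, 0]])
target = np.array([[0, 1, 1, 0]])
print(round(dice_coefficient(pred, target), 3))  # 0.5
print(round(dice_loss(pred, target), 3))         # 0.5
```

For these two masks the IoU is 1/3 and the Dice coefficient is 0.5, consistent with the relation Dice = 2 * IoU / (1 + IoU).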

Model Training: 512*512 images are used to train the model with Dice loss, and 20% of the data is set aside for validation. The model converges after 5 epochs.
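The 20% hold-out can be sketched as a shuffled split over image and mask pairs. The function name `train_val_split`, the fixed seed, and the placeholder file names are assumptions for this example.

```python
import numpy as np

def train_val_split(items, val_fraction=0.2, seed=42):
    """Shuffle items and hold out val_fraction of them for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(items))
    n_val = int(round(len(items) * val_fraction))
    val = [items[i] for i in order[:n_val]]
    train = [items[i] for i in order[n_val:]]
    return train, val

pairs = [(f"image_{i}.png", f"mask_{i}.png") for i in range(10)]  # placeholder names
train, val = train_val_split(pairs)
print(len(train), len(val))  # 8 2
```

When patches are cut from larger images, splitting at the level of source images rather than individual patches keeps crops of the same scene from appearing in both sets.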