Dstl Satellite Imagery Feature Detection

Can you train an eye in the sky?

Lokesh Kumar Gupta
TheCyPhy
13 min read · Dec 11, 2020


“If we want machines to think, we need to teach them to see.” (Fei-Fei Li)

Throughout this article, I will discuss an image segmentation problem, the Dstl Satellite Imagery Feature Detection challenge, and how it can be solved using deep learning.

The topics that will be discussed in this tutorial are:

  1. Business Problem.
  2. Source of data.
  3. Deep Learning Problem Formulation.
  4. Data preprocessing.
  5. Exploratory Data Analysis and observation.
  6. The existing approach to the problem.
  7. My first cut approach to solving the problem.
  8. Models explanation.
  9. Error analysis.
  10. Comparison of the models in tabular format.
  11. Deployment.
  12. Future work.
  13. References.

1. Business problem:

1.1 Overview:

The significant increase in the availability of satellite imagery has radically improved our understanding of the planet. It has enabled us to better achieve everything from mobilizing resources during disasters to monitoring the effects of global warming. Due to recent advances in computer vision and deep learning, along with important improvements in low-cost, high-performance GPUs, object recognition in aerial imagery now enjoys growing interest.

The ability to accurately identify different kinds of objects in aerial pictures, like buildings, roads, vegetation, and other classes, could greatly benefit several applications, such as creating and keeping maps up to date, environment monitoring, disaster relief, and improving urban planning. This domain also offers scientific challenges to computer vision.

These complex and large datasets are growing exponentially in number, so the Defence Science and Technology Laboratory (Dstl) is seeking novel solutions to reduce the burden on their image analysts. In this problem, we have to accurately classify features in overhead imagery. Automating feature labeling will not only help make smart decisions around defence and security more quickly, but will also bring innovation to computer vision methodologies applied to satellite imagery.

1.2 Objective :

In this problem, Dstl provides 1km x 1km satellite images in both 3-band and 16-band formats. The objective is to detect and classify the types of objects found in these regions.

In other words, this is semantic segmentation of different classes in satellite imagery.

1.3 Constraints:

  • Maximize the Average Jaccard Index.
  • Interpretability is not particularly important.
  • No strict low-latency constraints.

2. Source of Data:

Dstl posted the above problem on the Kaggle platform. The datasets can be downloaded from the link below.

Link: Dstl Satellite Imagery Feature Detection.

2.1 Data overview:

We have been provided with the following CSV files:

  1. train_wkt.csv — The WKT format of all the training labels. It consists of:
  • ImageId — ID of the image.
  • ClassType — Type of objects (1–10).
  • MultipolygonWKT — The labeled area, which is multipolygon geometry represented in WKT format.

2. three_band.zip — The complete dataset of 3-band satellite images.

3. sixteen_band.zip — The complete dataset of 16-band satellite images.

4. grid_sizes.csv — The sizes of grids for all the images. It consists of:

  • ImageId — ID of the image.
  • Xmax — maximum X coordinate for the image.
  • Ymin — minimum Y coordinate for the image.

5. sample_submission.csv — The sample submission file in the correct format. It consists of:

  • ImageId — ID of the image.
  • ClassType — Type of objects (1–10).
  • MultipolygonWKT — The labeled area, which is multipolygon geometry represented in WKT format.

6. train_geojson.zip — The geojson format of all the training labels.

  • For each image, we are given three versions: grayscale, 3-band, and 16-band. Details are presented in the table below:
  • Object Types: We have to classify objects into one of these classes.
  1. Buildings — large building, residential, non-residential, fuel storage facility, fortified building.
  2. Misc — Manmade structures.
  3. Road.
  4. Track — poor/dirt/cart track, footpath/trail.
  5. Trees — woodland, hedgerows, groups of trees, standalone trees.
  6. Crops — contour ploughing/cropland, grain (wheat) crops, row (potatoes, turnips) crops.
  7. Waterway.
  8. Standing water.
  9. Vehicle Large — large vehicle (e.g. lorry, truck, bus), logistics vehicle.
  10. Vehicle Small — small vehicle (car, van), motorbike.

3. Deep Learning Problem Formulation:

  • The above problem is an image segmentation problem, where the input is an image and the output is also an image, and it can be solved using deep learning techniques to detect and classify the types of objects found in a particular region.
  • Convolutional Neural Networks (CNNs) are the main supervised approach for this task. U-Net is specially designed to effectively solve image segmentation problems, and the most successful state-of-the-art deep learning method, the Fully Convolutional Network (FCN), can also be used for this task.

3.1 Performance Metric:

The performance metric chosen for this problem is the Average Jaccard Index. This is a vector-based metric where we use polygon geometries to evaluate how well the predictions align with the ground truth.

The Jaccard Index for two regions A and B, also known as the “intersection over union”, is defined as:

Jaccard(A, B) = |A ∩ B| / |A ∪ B| = TP / (TP + FP + FN)

where TP is the true positives area, FP is the false positives area, and FN is the false negatives area.
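
For intuition, here is a minimal sketch of the Jaccard Index computed on polygon geometries, assuming shapely (the library commonly used to parse the competition's WKT labels):

```python
# A minimal sketch of the Jaccard Index on polygon geometries, assuming
# shapely (commonly used to parse the competition's WKT labels).
from shapely.geometry import Polygon

def jaccard_index(a, b):
    """Intersection over union of two shapely geometries."""
    inter = a.intersection(b).area   # true positive area (TP)
    union = a.union(b).area          # TP + FP + FN
    return inter / union if union > 0 else 0.0

# Example: two unit squares overlapping by half.
a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
b = Polygon([(0.5, 0), (1.5, 0), (1.5, 1), (0.5, 1)])
print(jaccard_index(a, b))  # 0.5 / 1.5 ≈ 0.333
```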

4. Data preprocessing:

  • First, we combined all training images into a large X array and their masks into a large Y array, and saved them to disk.
  • The mask for each image is calculated as shown in the code sketch below.
  • We used only the training images and produced more images using augmentation. We took random crops of size 160×160 from both the images and the masks, and split the training images into train, validation, and test data.
  • We took nearly 8K train images and their masks (6K from augmentation and 2K without augmentation), 1.6K validation images and their masks (1.2K from augmentation and 0.4K without augmentation), and 1.6K test images and their masks (1.2K from augmentation and 0.4K without augmentation).
  • Below is a code sketch for mask generation and random crops.
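
Since the original snippets were embedded separately, here is a condensed sketch of both steps, assuming OpenCV and shapely; the helper names are illustrative, and the W·W/(W+1) scaling follows the coordinate guide commonly used for this dataset:

```python
# A condensed sketch of mask generation and random cropping, assuming
# OpenCV and shapely; the helper names are illustrative, and the
# W * W / (W + 1) scaling follows the competition's coordinate guide.
import cv2
import numpy as np
import shapely.wkt

def polygons_to_mask(wkt_str, img_size, xmax, ymin):
    """Rasterize a WKT multipolygon string into a binary mask."""
    h, w = img_size
    w_, h_ = w * (w / (w + 1)), h * (h / (h + 1))
    scale = np.array([w_ / xmax, h_ / ymin])
    mask = np.zeros((h, w), np.uint8)
    for poly in shapely.wkt.loads(wkt_str).geoms:
        ext = np.round(np.array(poly.exterior.coords) * scale).astype(np.int32)
        cv2.fillPoly(mask, [ext], 1)
        for interior in poly.interiors:  # carve out holes
            hole = np.round(np.array(interior.coords) * scale).astype(np.int32)
            cv2.fillPoly(mask, [hole], 0)
    return mask

def random_crop(img, mask, size=160):
    """Take the same random size x size crop from an image and its mask."""
    h, w = img.shape[:2]
    y = np.random.randint(0, h - size + 1)
    x = np.random.randint(0, w - size + 1)
    return img[y:y + size, x:x + size], mask[y:y + size, x:x + size]
```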

5. Exploratory Data Analysis and observations:

To solve the problem through machine learning, we first have to understand the data. This important step is known as Exploratory Data Analysis (EDA): it provides insights into the data and supports deeper exploration through different visualizations.

5.1 Reading the data
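
A sketch of the loading step, assuming pandas and tifffile; the file paths and the example image ID are illustrative:

```python
# A sketch of the loading step, assuming pandas and tifffile; file paths
# and the example image ID are illustrative.
import pandas as pd
import tifffile as tiff

# ImageId, ClassType, MultipolygonWKT
train_wkt = pd.read_csv('train_wkt.csv')
# The first column of grid_sizes.csv is unnamed in the Kaggle download.
grid_sizes = pd.read_csv('grid_sizes.csv',
                         names=['ImageId', 'Xmax', 'Ymin'], skiprows=1)

print(train_wkt['ImageId'].nunique())  # 25 unique training images

# A 3-band GeoTIFF is stored channels-first, so transpose to H x W x C.
img = tiff.imread('three_band/6120_2_2.tif').transpose(1, 2, 0)
print(img.shape, img.dtype)
```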

5.2 Basic information about data

Observations:

  1. There are 25 unique images in the training data.
  2. Trees are present in almost all the given images.
  3. Man-made structures and buildings are present in half of the images.
  4. The waterway is present only in one image.

5.3 Areas of the objects in training images

Observations:

  1. There are 25 unique images in the training data.
  2. Crops occupied the largest area in an image.
  3. After the crop class, the trees and tracks classes occupy the largest areas, respectively.
  4. Buildings are present in 11 of the 25 training images.
  5. The water and vehicle classes are present in very small quantities.

5.4 Plotting train images

5.5 Plotting images from WKT multipolygon objects
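
A sketch of drawing the labeled multipolygons of one image, assuming matplotlib and reusing train_wkt from the loading sketch above; the image ID is illustrative:

```python
# A sketch of drawing the labeled multipolygons of one image, assuming
# matplotlib and reusing train_wkt from the loading sketch above.
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon as MplPolygon
import shapely.wkt

fig, ax = plt.subplots(figsize=(8, 8))
for _, row in train_wkt[train_wkt.ImageId == '6120_2_2'].iterrows():
    for poly in shapely.wkt.loads(row.MultipolygonWKT).geoms:
        ax.add_patch(MplPolygon(list(poly.exterior.coords), alpha=0.5))
ax.autoscale()
ax.invert_yaxis()  # Ymin is negative, so flip the y-axis for display
plt.show()
```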

5.6 Display an image and its masks

Observations:

  1. The training labels are in WKT format (multipolygons) and define the different objects in the training images.
  2. The training set images come from different places and are also very different in context.
  3. Some classes, such as the water bodies and vehicle classes, are very rare.
  4. The objects in the images are not present in equal proportions.

6. Existing approaches to the problem

Solution 1:

https://www.kaggle.com/alijs1/squeezed-this-in-successful-kernel

The author of the above kernel applied a modified U-Net — an artificial neural network for image segmentation. They first preprocessed the data, i.e., made masks for the given images, and used only the 16-band M-channel images.

After preprocessing, they passed the images and masks into the modified U-Net model and trained it, then made predictions with the model. In post-processing, the predicted masks were converted to WKT format.

With this model implementation, the final result they got is:

Solution 2:

https://deepsense.ai/deep-learning-for-satellite-imagery-via-image-segmentation

The team behind the above link secured 4th place in this Kaggle competition. They applied a modified U-Net — an artificial neural network for image segmentation. They solved the problem in the following steps:

  1. Preprocessing

For each image in the data, three versions are given: grayscale, 3-band, and 16-band. They resized and aligned the 16-band channels to match the 3-band channels. The alignment was necessary to remove shifts between channels. Finally, all channels were concatenated into a single 20-channel input image, roughly as in the sketch below.
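
Assuming tifffile and OpenCV, the stacking might look like this; the image ID is illustrative and the shift-removing alignment step is omitted for brevity:

```python
# A rough sketch of the 20-channel stacking they describe, assuming
# tifffile and OpenCV; alignment between channels is omitted.
import cv2
import numpy as np
import tifffile as tiff

img_id = '6120_2_2'
rgb = tiff.imread(f'three_band/{img_id}.tif').transpose(1, 2, 0)    # 3 ch
m = tiff.imread(f'sixteen_band/{img_id}_M.tif').transpose(1, 2, 0)  # 8 ch
a = tiff.imread(f'sixteen_band/{img_id}_A.tif').transpose(1, 2, 0)  # 8 ch
p = tiff.imread(f'sixteen_band/{img_id}_P.tif')                     # 1 ch, 2-D

h, w = rgb.shape[:2]
m = cv2.resize(m, (w, h))              # resize each band set to the
a = cv2.resize(a, (w, h))              # 3-band resolution
p = cv2.resize(p, (w, h))[..., None]

stacked = np.concatenate([rgb, m, a, p], axis=-1)  # 3 + 8 + 8 + 1 = 20
print(stacked.shape)
```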

2. Modeling

Their fully convolutional model is inspired by the family of U-Net architectures, where low-level feature maps are combined with higher-level ones, which enables precise localization. Their final architecture is depicted below.

3. Post-processing

Ground truth labels are provided in WKT format, presenting objects as polygons (defined by their vertices). In the post-processing stage, they used morphological dilation/erosion and simply removed objects/holes smaller than a given threshold.

With this approach, the average IoU was 0.46.

7. My First Cut Approach to the above problem:

  1. The first step in getting to know the data is to apply some preprocessing techniques, such as reading the WKT format, projecting geometries to pixel coordinates, and opening GeoTIFF files.
  2. Then I performed Exploratory Data Analysis to understand the data and to get insights into the dataset. This helps in achieving a better score.
  3. The masks (labels) are generated for each of the object classes in the train images.
  4. After that, I used only the train images, produced more train images using augmentation, and split them into train, validation, and test data.

5. Model implementation:

  • This problem is an image segmentation problem where the input is an image and the output is also an image, and it can be solved using deep learning techniques to detect and classify the types of objects found in a particular region.
  • Convolutional Neural Networks (CNNs) are the main supervised approach for this task. U-Net is especially designed to effectively solve image segmentation problems, and its architecture will be modified to get a good score.
  • The ReLU activation function and the He normal weight initializer will be used. Batch normalization and dropout will be used to prevent overfitting.
  • Transfer learning can be used to initialize the weights of the U-Net model, and the most successful state-of-the-art deep learning method, the Fully Convolutional Network (FCN), can also be used for this task.

6. Post-processing:

Ground truth labels are provided in WKT format, presenting objects as polygons (defined by their vertices). So we first converted the predicted masks into polygons, and then the polygons into WKT, roughly as sketched below.
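
Here is a minimal sketch of that conversion, assuming OpenCV 4 and shapely; scaling the pixel coordinates back to the grid coordinate system (the inverse of the mask-generation transform) is omitted for brevity:

```python
# A minimal sketch of mask -> polygons -> WKT, assuming OpenCV 4 and
# shapely; the scale-back to grid coordinates is omitted.
import cv2
import numpy as np
import shapely.wkt
from shapely.geometry import MultiPolygon, Polygon

def mask_to_wkt(mask, min_area=1.0):
    """Convert a binary mask into a WKT multipolygon string."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL,   # holes ignored here
                                   cv2.CHAIN_APPROX_SIMPLE)
    polys = []
    for c in contours:
        if len(c) >= 3:                      # a polygon needs 3+ points
            poly = Polygon(c.reshape(-1, 2))
            if poly.is_valid and poly.area >= min_area:
                polys.append(poly)
    return shapely.wkt.dumps(MultiPolygon(polys))
```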

8. Models Explanation

Here we used two models:

  1. U-Net.
  2. Seg-Net.

U-Net:

  • The U-Net architecture is built upon the Fully Convolutional Network and modified so that it yields better segmentation in medical imaging. U-Net owes its name to its symmetric shape, which is different from other FCN variants. Compared to FCN-8, the two main differences are that (1) U-Net is symmetric, and (2) the skip connections between the downsampling path and the upsampling path apply a concatenation operator instead of a sum. These skip connections provide local information to the global information while upsampling.
  • Because of its symmetry, the network has a large number of feature maps in the upsampling path, which allows it to transfer information. By comparison, the basic FCN architecture has only as many feature maps in its upsampling path as there are classes.
  • We modified U-Net according to our problem; the code sketch below outlines the architecture.
  • Using the U-Net model, the Jaccard score we got on the test data is 0.3752.
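
Since the original gist is embedded separately, here is a condensed sketch of the modified U-Net in Keras; the depth, filter counts, and input shape (160×160 crops with 8 M-band channels) are illustrative:

```python
# A condensed sketch of the modified U-Net, assuming Keras (TensorFlow
# backend); depth, filter counts, and input shape are illustrative.
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                     UpSampling2D, concatenate,
                                     BatchNormalization, Dropout)
from tensorflow.keras.models import Model

def conv_block(x, filters):
    x = Conv2D(filters, 3, padding='same', activation='relu',
               kernel_initializer='he_normal')(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters, 3, padding='same', activation='relu',
               kernel_initializer='he_normal')(x)
    return BatchNormalization()(x)

def build_unet(input_shape=(160, 160, 8), n_classes=10):
    inputs = Input(input_shape)
    # Contracting path: each level halves the resolution.
    c1 = conv_block(inputs, 32); p1 = MaxPooling2D()(c1)
    c2 = conv_block(p1, 64);     p2 = MaxPooling2D()(c2)
    c3 = conv_block(p2, 128);    p3 = MaxPooling2D()(c3)
    c4 = conv_block(Dropout(0.3)(p3), 256)
    # Expanding path: upsample and concatenate with the skip connection.
    u3 = concatenate([UpSampling2D()(c4), c3]); c5 = conv_block(u3, 128)
    u2 = concatenate([UpSampling2D()(c5), c2]); c6 = conv_block(u2, 64)
    u1 = concatenate([UpSampling2D()(c6), c1]); c7 = conv_block(u1, 32)
    # One sigmoid output channel per class (classes can overlap).
    outputs = Conv2D(n_classes, 1, activation='sigmoid')(c7)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer='adam', loss='binary_crossentropy')
```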

Seg-Net:

SegNet is also applied to solve the image segmentation problem. It consists of a sequence of processing layers (encoders) followed by a corresponding set of decoders for pixel-wise classification. The image below summarizes the working of SegNet.

One key feature of SegNet is that it retains high-frequency details in the segmented image, because the pooling indices of the encoder network are passed to the corresponding layers of the decoder network. In short, the information transfer is direct instead of going through convolutions.

  • We modified Seg-Net according to our problem; the code sketch below outlines the architecture.
  • Using the Seg-Net model, the Jaccard score we got on the test data is 0.4463.
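
And a simplified sketch of the Seg-Net-style encoder–decoder in Keras. Note that the original SegNet passes max-pooling indices to the decoder via a custom unpooling layer; this sketch uses plain upsampling for brevity, so it is an approximation:

```python
# A simplified SegNet-style encoder-decoder sketch, assuming Keras. The
# original SegNet reuses max-pooling indices in its decoder via a custom
# unpooling layer; for brevity this sketch upsamples without indices.
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                     UpSampling2D, BatchNormalization,
                                     Activation)
from tensorflow.keras.models import Model

def encoder_block(x, filters):
    x = Conv2D(filters, 3, padding='same')(x)
    x = Activation('relu')(BatchNormalization()(x))
    return MaxPooling2D()(x)

def decoder_block(x, filters):
    x = UpSampling2D()(x)
    x = Conv2D(filters, 3, padding='same')(x)
    return Activation('relu')(BatchNormalization()(x))

def build_segnet(input_shape=(160, 160, 8), n_classes=10):
    inputs = Input(input_shape)
    x = encoder_block(inputs, 64)
    x = encoder_block(x, 128)
    x = encoder_block(x, 256)
    x = decoder_block(x, 256)
    x = decoder_block(x, 128)
    x = decoder_block(x, 64)
    outputs = Conv2D(n_classes, 1, activation='sigmoid')(x)
    return Model(inputs, outputs)

model = build_segnet()
model.compile(optimizer='adam', loss='binary_crossentropy')
```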

Performance characteristic graphs of the SegNet model through TensorBoard:

Model performance

Observations:

From the above plot, the train Jaccard score (orange curve) and the validation Jaccard score (blue curve) both increase smoothly as the number of epochs increases.

From the graph, the train Jaccard score obtained is 0.42 and the validation Jaccard score obtained is 0.35, and the model is a little bit underfit.

Model performance

Observations:

From the above plot, the train loss (orange curve) and the validation loss (blue curve) both decrease smoothly as the number of epochs increases.

From the graph, the train loss obtained is 0.1269 and the validation loss obtained is 0.0881.

9. Comparison of the models in tabular format and results

  • The comparison of the models is as follows:
  • U-Net: test Jaccard score 0.3752.
  • Seg-Net: test Jaccard score 0.4463.
  • From the above comparison, we can conclude that Seg-Net is the best model.

Results:

The output masks of an image from the Seg-Net model are:

10. Error analysis:

The main idea behind error analysis is to know what type of data leads to a bad Jaccard score and whether there are similar patterns in that data.

We did this based on the Jaccard score of each data point. We divided the Jaccard scores into very low, medium, and very high buckets, then took the data points with very low Jaccard scores and did EDA on them to find patterns.

  • Below is a code sketch for this analysis.
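
A sketch of the bucketing, assuming y_test and predictions are the mask and prediction arrays from the earlier split (the names are illustrative):

```python
# A sketch of the error-analysis bucketing; y_test and predictions are
# assumed to be the mask and prediction arrays from the earlier split.
import numpy as np

def jaccard_per_sample(y_true, y_pred, threshold=0.5, eps=1e-12):
    """Pixel-wise Jaccard score for each (mask, prediction) pair."""
    y_pred = (y_pred > threshold).astype(np.float32)
    axes = (1, 2, 3)                           # sum over H, W, channels
    inter = np.sum(y_true * y_pred, axis=axes)
    union = np.sum(y_true, axis=axes) + np.sum(y_pred, axis=axes) - inter
    return inter / (union + eps)

scores = jaccard_per_sample(y_test, predictions)
low = np.where(scores < np.percentile(scores, 33))[0]   # worst bucket
# EDA is then repeated on the low-score samples, e.g. which classes
# they contain and how their pixel intensities are distributed.
```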

Observations:

  1. Classes 2, 3, 4, 6, 7, 8, and 9 are totally absent.
  2. Only a few images have class 10.
  3. Class 4 is present in almost all images.
  • EDA on pixels:

Observations:

  1. The percentage of pixel values less than the threshold of 0.4 is about 57.07%.
  2. Having more values below the chosen threshold implies that the image brightness is not very good.

11. Deployment

We deployed the Seg-Net model; the code snippets for deployment are given below.

The code snippet for the main app.py is:
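
A minimal sketch of what app.py can look like, assuming Flask and a saved Keras model; the route names, template, model filename, and the expected input (a 160×160 channels-last crop) are all illustrative:

```python
# A minimal sketch of app.py, assuming Flask and a saved Keras model;
# routes, template, model filename, and input shape are illustrative.
import numpy as np
import tifffile as tiff
from flask import Flask, request, render_template
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model('segnet.h5', compile=False)  # trained Seg-Net weights

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    # Read the uploaded crop, normalize it, and predict its class masks.
    img = tiff.imread(request.files['image'].stream).astype(np.float32)
    x = (img - img.min()) / (img.max() - img.min() + 1e-8)
    masks = model.predict(x[None, ...])[0]      # (160, 160, n_classes)
    return {'class_pixel_counts': (masks > 0.5).sum(axis=(0, 1)).tolist()}

if __name__ == '__main__':
    app.run()
```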

The code snippet for the index.html page is:

Finally, here is a recording of the inference time.

12. Future Work

  1. In the future, if we train with more satellite images, we can expect better results and a better score than the above.
  2. As the vehicle classes are present in very small quantities, separate models can be trained for large and small vehicles to combat this.
  3. Multi-Scale Attention for Semantic Segmentation can be used to improve the results.

GitHub Repo

  • If you are interested in this case study or want to improve it further, the Jupyter Notebook with all the code is available at my following repo:
