Dstl Satellite Imagery Feature Detection
Table of Contents:
1. Business problem
2. Overview of Data
3. Performance Metrics
4. EDA
5. Existing Approach
6. Data Pre-Processing
7. Modelling
8. Error Analysis
9. Deployment
10. Future Work
11. References
1. Business Problem
Analyzing satellite/aerial images plays a major role in fields like disaster management, defence, monitoring the effects of global warming, urban planning, etc. Many of these tasks can be automated by combining this field with deep learning/computer vision.
Here, object recognition is a primary task in analyzing satellite images. Today this can be done far more accurately thanks to advances in both hardware (CPUs and GPUs) and deep learning techniques.
This blog describes work on such a problem statement, which comes from Kaggle: 1 km x 1 km satellite images are provided in various bands, and the goal is to detect and classify the types of objects found in the region.
2. Overview of Data
2.1. train_wkt.csv
the WKT format of all the training labels
• ImageId — ID of the image
• ClassType — the type of objects (1–10)
• MultipolygonWKT — the labeled area, a multipolygon geometry represented in WKT format
2.2. three_band.zip
the complete dataset of 3-band satellite images.
2.3. sixteen_band.zip
the complete dataset of 16-band satellite images.
2.4. grid_sizes.csv
the sizes of grids for all the images
• ImageId — ID of the image
• Xmax — maximum X coordinate for the image
• Ymin — minimum Y coordinate for the image
2.5. train_geojson.zip
the GeoJSON format of all the training labels (essentially the same information as in train_wkt.csv)
2.6. Class Labels
- Buildings — large building, residential, non-residential, fuel storage facility, fortified building
- Misc. Manmade structures
- Road
- Track — poor/dirt/cart track, footpath/trail
- Trees — woodland, hedgerows, groups of trees, standalone trees
- Crops — contour ploughing/cropland, grain (wheat) crops, row (potatoes, turnips) crops
- Waterway
- Standing water
- Vehicle Large — large vehicle (e.g. lorry, truck, bus), logistics vehicle
- Vehicle Small — small vehicle (car, van), motorbike
3. Performance Metrics
The Jaccard index (intersection over union) is used as the performance metric.
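For binary masks, the Jaccard index is the intersection over union of the predicted and ground-truth pixels. A minimal NumPy sketch (function and variable names are mine, not from the original code):

```python
import numpy as np

def jaccard(y_true, y_pred, eps=1e-12):
    """Jaccard index (IoU) between two binary masks."""
    y_true = y_true.astype(bool)
    y_pred = y_pred.astype(bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return intersection / (union + eps)

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(round(jaccard(a, b), 3))  # intersection = 2, union = 4 -> 0.5
```

A score of 1 means a perfect overlap; a class that is predicted nowhere it actually occurs scores 0.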
4. Exploratory Data Analysis
4.1. Frequency of Class Label
Observations:
- There are 25 unique images
- All images contain tree objects
- Waterways are present in only a few images
- Almost all images contain both trees and tracks
4.2. Multipolygon
A few comparisons between the labeled polygons and the original images are shown below.
4.3. ClassWise Multipolygon
The image with class-wise multipolygons is shown below.
4.4. Areas of object
Observations:
- Crops cover the largest area
- Waterways and vehicles have the lowest area coverage
5. Existing Approach
Overview of Dataset
All images are resized to the 3-band RGB image size and then concatenated along the channel axis. The resulting array has 20 channels at 3348 x 3392 pixels. As this array is too large to process at once, it is split into patches of size 112 x 112 x 20. Patching is applied to both images and masks, and on these patches various DNN architectures are trained: a multispectral U-Net, an inverted pyramid model, PSPNet, etc.
In the present approach, only the 8-channel M band is used for training, and the model used is U-Net.
6. Data Pre-Processing
Here the main goal is to prepare the dataset that will later be used to train the model.
Functions are created that scale pixel values to a given range and that extract masks from the MultipolygonWKT values provided in the DataFrame (train_wkt_v4.csv).
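As a sketch of what these two helpers might look like (names are illustrative; the W*(W/(W+1)) correction is the grid scaling described in the competition's data documentation):

```python
import numpy as np

def stretch_to_range(band, lo=2, hi=98):
    """Clip a raw band to its [lo, hi] percentiles and rescale into [0, 1]."""
    a, b = np.percentile(band, (lo, hi))
    out = (band.astype(np.float64) - a) / (b - a + 1e-12)
    return np.clip(out, 0.0, 1.0)

def wkt_to_pixel(x, y, xmax, ymin, width, height):
    """Map a normalised WKT coordinate to pixel coordinates using
    Xmax/Ymin from grid_sizes.csv and the image's pixel dimensions."""
    w = width * (width / (width + 1.0))
    h = height * (height / (height + 1.0))
    return x / xmax * w, y / ymin * h

band = np.array([[120.0, 480.0], [950.0, 2047.0]])
print(stretch_to_range(band))  # values squashed into [0, 1]
```

The scaled polygon vertices can then be rasterized (e.g. with OpenCV's `fillPoly`) to produce one binary mask channel per class.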
Using these functions, masks for the various images are extracted and stored in a folder as .tif files.
Patches of all the input images and masks are then created and stored as .npy files.
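The patching step can be sketched as follows (a minimal non-overlapping version; the original code may also use overlap or padding):

```python
import numpy as np

def make_patches(img, patch=112):
    """Split an (H, W, C) array into non-overlapping patch x patch tiles.
    Trailing rows/columns that do not fill a whole tile are dropped."""
    h, w = img.shape[:2]
    tiles = [img[i:i + patch, j:j + patch]
             for i in range(0, h - patch + 1, patch)
             for j in range(0, w - patch + 1, patch)]
    return np.stack(tiles)

img = np.zeros((224, 336, 20), dtype=np.float32)
print(make_patches(img).shape)  # (6, 112, 112, 20)
```

Applying this to every image and mask and concatenating the results gives the arrays saved with np.save as all_images.npy and all_masks.npy.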
The final all_images.npy and all_masks.npy are used for training the model.
7. Modelling
A U-Net model is trained on the above-generated dataset.
U-Net is a well-known model used mainly for segmentation tasks, originally in the medical domain. The architecture is called U-Net due to its symmetric U shape and its many skip connections.
The plots of epoch vs. loss and epoch vs. Jaccard coefficient are shown below.
One thing to note is that the last layer uses sigmoid activation and 'binary_crossentropy' is used as the loss.
So for each pixel and each channel (i.e. class label), we get a probability score.
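Since object classes can overlap in a scene, each of the 10 output channels is treated as an independent binary problem, which is why sigmoid plus binary cross-entropy is used rather than a softmax over classes. A small NumPy illustration of what the network's head computes (not the training code itself):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all pixels and channels."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

rng = np.random.default_rng(0)
logits = rng.normal(size=(112, 112, 10))  # raw network outputs for one patch
probs = sigmoid(logits)                   # one probability per pixel per class
print(probs.shape)  # (112, 112, 10)
```

Each channel can then be thresholded independently to produce the final per-class masks.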
The best threshold for each class label is then extracted by sweeping candidate thresholds over the predicted probabilities and keeping, per class, the one with the highest Jaccard score.
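A sketch of that per-class threshold sweep (the 0.1-step candidate grid is an assumption; the original may use a finer one):

```python
import numpy as np

def jaccard(y_true, y_pred, eps=1e-12):
    inter = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return inter / (union + eps)

def best_thresholds(y_true, y_prob, candidates=np.arange(0.1, 1.0, 0.1)):
    """For every class channel, keep the threshold with the highest Jaccard."""
    n_classes = y_true.shape[-1]
    best = np.zeros(n_classes)
    for c in range(n_classes):
        scores = [jaccard(y_true[..., c] > 0.5, y_prob[..., c] > t)
                  for t in candidates]
        best[c] = candidates[int(np.argmax(scores))]
    return best

rng = np.random.default_rng(1)
y_true = rng.random((64, 64, 3)) > 0.5
y_prob = y_true * 0.8 + 0.1   # confident, well-calibrated fake predictions
print(best_thresholds(y_true, y_prob))
```

The per-class thresholds found on a validation set are reused at prediction time to binarize the probability maps.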
The Jaccard score on the test dataset is 0.67.
Prediction of the above model
The input image is split into patches, and the model's prediction consists of patches of predicted masks. These predicted patches are stitched back together and compared with the original masks.
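The stitching step can be sketched as the inverse of the earlier patching (tile order is assumed row-major, matching how the patches were produced):

```python
import numpy as np

def stitch_patches(patches, height, width, patch=112):
    """Tile predicted patches back into a full (height, width, C) mask."""
    c = patches.shape[-1]
    out = np.zeros((height, width, c), dtype=patches.dtype)
    k = 0
    for i in range(0, height - patch + 1, patch):
        for j in range(0, width - patch + 1, patch):
            out[i:i + patch, j:j + patch] = patches[k]
            k += 1
    return out

# round trip: patch an array, then stitch it back
x = np.random.rand(224, 336, 3)
tiles = np.stack([x[i:i + 112, j:j + 112]
                  for i in range(0, 224, 112)
                  for j in range(0, 336, 112)])
print(np.allclose(stitch_patches(tiles, 224, 336), x))  # True
```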
8. Error Analysis
Image IDs with a low Jaccard score (i.e. below 0.2) are extracted and their EDA is shown below.
Image IDs with an average Jaccard score (i.e. between 0.2 and 0.6) are extracted and their EDA is shown below.
Observations:
- The low Jaccard scores are mainly due to very low area coverage by many objects (like small and large vehicles). This causes many misclassifications, which drags the Jaccard score of that particular class towards 0. This in turn hurts the overall Jaccard score of the complete image, since we average the Jaccard scores of all classes.
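For example, if nine classes score 0.8 but one sparse class (say, small vehicles) is driven to 0, the averaged image score drops sharply:

```python
scores = [0.8] * 9 + [0.0]               # one sparse class scores zero
print(round(sum(scores) / len(scores), 2))  # 0.72
```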
9. Deployment
The complete final pipeline is deployed using Streamlit.
YouTube link for the above pipeline execution:
10. Future work
- The dataset can be trained with a SegNet model to try to improve the Jaccard score
- A separate model can be trained specifically on small objects like vehicles
- All 20 bands can be used for training and prediction
11. References
- https://www.kaggle.com/code/anomsulardi/dstl-semantic-segmentation
- https://www.kaggle.com/code/visoft/export-pixel-wise-mask/script
- https://www.kaggle.com/code/drn01z3/end-to-end-baseline-with-u-net-keras/script
- https://www.kaggle.com/code/ksishawon/segnet-dstl
- https://www.appliedaicourse.com/course/11/Applied-Machine-learning-course
You can find my complete code here: GitHub Repo
You can connect with me on LinkedIn