Using Deep Learning Semantic Segmentation method for Ship detection on Satellite Optical Imagery
A Satellite Remote Sensing Use case
The Competition
A lot of work has been done over the last 10 years to automatically extract objects from satellite images, with significant results but little operational impact.
Now Airbus is turning to Kagglers to increase the accuracy and speed of automatic ship detection.
Background
Shipping traffic is growing fast. More ships increase the chances of infractions at sea, such as environmentally devastating ship accidents, piracy, illegal fishing, drug trafficking, and illegal cargo movement. This has compelled many organizations, from environmental protection agencies to insurance companies and national government authorities, to keep a closer watch over the open seas.
Data set
- The train_ship_segmentations.csv file provides the ground truth (ship masks) for each image. If an image contains no ships, its EncodedPixels column is blank.
- The sample_submission file lists the images in the test set.
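The masks in EncodedPixels are stored with run-length encoding. Below is a minimal sketch of a decoder, assuming the competition's convention: space-separated (start, length) pairs, 1-indexed pixel positions numbered top to bottom and then left to right, on 768x768 images. The name rle_decode is a hypothetical helper, not part of any library.

```python
import numpy as np


def rle_decode(mask_rle, shape=(768, 768)):
    """Decode one EncodedPixels string into a binary mask of the given shape.

    Assumes space-separated (start, length) pairs with 1-indexed starts,
    numbered top to bottom, then left to right (column-major order).
    """
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    if not isinstance(mask_rle, str):  # blank EncodedPixels (NaN): no ship
        return flat.reshape(shape, order='F')
    values = [int(v) for v in mask_rle.split()]
    starts = [s - 1 for s in values[0::2]]  # convert to 0-indexed positions
    lengths = values[1::2]
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    # order='F' fills columns first, matching the top-to-bottom numbering
    return flat.reshape(shape, order='F')


# A 4x4 toy mask: a run of 3 pixels in column 0 and a run of 2 in column 1
toy = rle_decode('1 3 6 2', shape=(4, 4))
print(toy)
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 0 0]
#  [0 0 0 0]]
```

Reshaping with order='F' avoids an explicit transpose while keeping the column-major numbering.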
The Data Set and Competition can be found on Kaggle or on the Airbus Sandbox platform.
What is Semantic Segmentation
Let’s first define some crucial concepts to make sure everyone is on the same page.
- Image Classification
Let’s start with the simplest, image classification.
Each image corresponds to one and only one class from a set of classes. Thus, we assign a specific class to each input image.
We can go further and localize the class object within the image. Here is another cat in case one was not enough.
- Object Detection
As mentioned above, the challenge in object detection is to surround each object in an image with a bounding box labeled with its class. The difference from image classification is that an image can contain several objects, of several classes, instead of exactly one. Usually, each detected object also gets a probability of belonging to its class.
- Semantic Segmentation
This is the technique we try to apply in the Airbus Ship Detection competition on Kaggle.
This is a pixel-level classification problem: we want to apply a mask to each object of each class (for example {Person, Bicycle, Background}). Later, I'll explain how to evaluate the precision of the masks predicted for ships in each image against the ground truth (the real number of ships and their location in each image).
- Instance Segmentation
Finally, instance segmentation goes one step further and classifies each instance of a class separately. For example, this method could detect 2 cats and 2 dogs in an image and give each of the 4 animals its own segmentation. Semantic segmentation, by contrast, does not distinguish between objects of the same class. In the following image (instance segmentation), for example, there are 3 people, technically 3 instances of the class “Person”, but the 3 are classified separately (each in a different color).
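The difference between the two kinds of segmentation is easy to see in code. A semantic mask only says which pixels are ships; separating it into individual ships requires labeling its connected regions, for example with label from skimage.morphology. A minimal sketch on a hypothetical 6x6 mask containing two toy ships:

```python
import numpy as np
from skimage.morphology import label

# Semantic mask: 1 = ship pixel, 0 = background; it does not say which ship is which
semantic_mask = np.zeros((6, 6), dtype=np.uint8)
semantic_mask[1:3, 1:3] = 1  # first toy ship
semantic_mask[4:6, 3:6] = 1  # second toy ship

# Instance view: each connected region of ship pixels gets its own integer id
instance_mask = label(semantic_mask)
print(instance_mask.max())  # number of ships found -> 2
```

A semantic prediction can be split into individual ships the same way, as long as the ships do not touch each other in the mask.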
Now that this is clear, we can dive into the deep blue ocean and try to detect all the ships. A small exercise for the naked eye on the next image.
Ship Semantic Segmentation
Let’s get down to the Machine Learning script
First, import all the necessary libraries and define the working directories.
import os
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from skimage.io import imread
import matplotlib.pyplot as plt
from skimage.segmentation import mark_boundaries
from skimage.util import montage
import gc; gc.enable() # memory is tight
from skimage.morphology import label
# montage_rgb tiles a batch of RGB images (N, H, W, 3) into one grid image, channel by channel
montage_rgb = lambda x: np.stack([montage(x[:, :, :, i]) for i in range(x.shape[3])], -1)
ship_dir = '../input/airbus-ship-detection'
train_image_dir = os.path.join(ship_dir, 'train_v2')
test_image_dir = os.path.join(ship_dir, 'test_v2')
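The montage_rgb helper tiles a batch of RGB images into a single grid image, one channel at a time, which is handy for eyeballing a batch. A quick check of the output shape on a hypothetical batch of random images:

```python
import numpy as np
from skimage.util import montage

# Same helper as in the script: montage each channel separately, then restack
montage_rgb = lambda x: np.stack([montage(x[:, :, :, i]) for i in range(x.shape[3])], -1)

# Hypothetical batch of 4 random 8x8 RGB images, shape (N, H, W, C)
batch = np.random.rand(4, 8, 8, 3)
grid = montage_rgb(batch)
print(grid.shape)  # 4 tiles arranged 2x2 -> (16, 16, 3)
```

montage arranges the tiles on a square grid of side ceil(sqrt(N)), so a batch of 5 images gives a 3x3 grid with the unused tiles filled in.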
Then, get some metadata.
masks = pd.read_csv(os.path.join(ship_dir, 'train_ship_segmentations_v2.csv'))
print(masks.shape[0], 'masks found')
# output:
# 231723 masks found
After that, I can split the data into the training set and validation set.
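from sklearn.model_selection import train_test_split
# One row per image: sum the per-row ship flag (1 if EncodedPixels is present, else 0)
masks['ships'] = masks['EncodedPixels'].map(lambda c_row: 1 if isinstance(c_row, str) else 0)
unique_img_ids = masks.groupby('ImageId').agg({'ships': 'sum'}).reset_index()
masks.drop(['ships'], axis=1, inplace=True)  # so the merges below join on ImageId only
# Stratify on the ship count so both splits have a similar distribution
train_ids, valid_ids = train_test_split(unique_img_ids,
                                        test_size=0.3,
                                        stratify=unique_img_ids['ships'])
train_df = pd.merge(masks, train_ids)
valid_df = pd.merge(masks, valid_ids)
print(train_df.shape[0], 'training masks')
print(valid_df.shape[0], 'validation masks')
# output:
# 161048 training masks
# 69034 validation masks
train_df['ships'].hist(bins=np.arange(16))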
Because we grouped the data set by ImageId and summed the binary ship flag (no ship = 0, ship = 1), the x-axis of this histogram shows the number of ships per image. Most images contain no ships, and the number of images with at least one ship decreases steadily as the ship count grows.
Therefore, we need to have a better-balanced training dataset.
Undersample the empty images to get better-balanced class groups
This code undersamples the images without ships even more strongly than the other groups, to obtain a roughly uniform distribution across the ship-count groups.
train_df['grouped_ship_count'] = train_df['ships'].map(lambda x: (x+1)//2).clip(0, 7)
# floor division // rounds down to the nearest whole number, so ship counts are
# paired into groups (1-2 -> 1, 3-4 -> 2, ...); clip() caps the groups at 7
#train_df['grouped_ship_count'].hist()

def sample_ships(in_df, base_rep_val=1500):
    if in_df['ships'].values[0] == 0:
        return in_df.sample(base_rep_val//3)  # even more strongly undersample no ships
    else:
        # sample with replacement only when the group has fewer rows than base_rep_val
        return in_df.sample(base_rep_val, replace=(in_df.shape[0] < base_rep_val))

balanced_train_df = train_df.groupby('grouped_ship_count').apply(sample_ships)
balanced_train_df['ships'].hist(bins=np.arange(16))  # bin edges 0..15, i.e. 15 bins
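To see what the grouping does, here is the same (x+1)//2 then clip(0, 7) mapping applied with NumPy to ship counts 0 to 15 (the range covered by the histograms above). Empty images stay alone in group 0, counts are paired from 1 upward, and 13 or more ships all land in group 7. The helper target_group_size is hypothetical; it only mirrors how many rows sample_ships draws per group with the default base_rep_val=1500.

```python
import numpy as np

# Ship counts per image covered by the histograms (0 to 15)
ship_counts = np.arange(16)

# Same binning as grouped_ship_count: pair up counts, cap at group 7
groups = np.clip((ship_counts + 1) // 2, 0, 7)

for count, group in zip(ship_counts, groups):
    print(f'{count:2d} ships -> group {group}')


def target_group_size(group, base_rep_val=1500):
    """Rows that sample_ships draws for a group (hypothetical mirror of it)."""
    return base_rep_val // 3 if group == 0 else base_rep_val


print([target_group_size(g) for g in range(8)])
```

Only group 0 contains ship-free images, because (x+1)//2 equals 0 only when x is 0, which is why sample_ships can test the first row of a group to decide how strongly to undersample it.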
Image processing, filtering, and masks
U-Net Model Design
The architecture looks like a ‘U’, which gives it its name. It consists of three sections: the contraction, the bottleneck, and the expansion section. The heart of the architecture lies in the expansion section: at each upsampling step, the feature map is concatenated with the feature map of the same resolution from the contraction path (a skip connection). These skip connections ensure that the features learned while contracting the image are reused to reconstruct it.
from keras import models, layers

# Build U-Net model
# UPSAMPLE_MODE, NET_SCALING, GAUSSIAN_NOISE and EDGE_CROP are hyperparameters, and
# t_x is a batch of training images; they come from earlier steps not shown here.
def upsample_conv(filters, kernel_size, strides, padding):
    # learned upsampling with a transposed convolution
    return layers.Conv2DTranspose(filters, kernel_size, strides=strides, padding=padding)

def upsample_simple(filters, kernel_size, strides, padding):
    # fixed upsampling by repetition; filters and kernel_size are unused
    return layers.UpSampling2D(strides)

if UPSAMPLE_MODE == 'DECONV':
    upsample = upsample_conv
else:
    upsample = upsample_simple

input_img = layers.Input(t_x.shape[1:], name='RGB_Input')
pp_in_layer = input_img
if NET_SCALING is not None:
    pp_in_layer = layers.AvgPool2D(NET_SCALING)(pp_in_layer)  # optional downscaling
pp_in_layer = layers.GaussianNoise(GAUSSIAN_NOISE)(pp_in_layer)
pp_in_layer = layers.BatchNormalization()(pp_in_layer)

# Contraction (going down): two 3x3 convolutions, then 2x2 max pooling
c1 = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(pp_in_layer)
c1 = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(c1)
p1 = layers.MaxPooling2D((2, 2))(c1)

c2 = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(p1)
c2 = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(c2)
p2 = layers.MaxPooling2D((2, 2))(c2)

c3 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(p2)
c3 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(c3)
p3 = layers.MaxPooling2D((2, 2))(c3)

c4 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(p3)
c4 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(c4)
p4 = layers.MaxPooling2D(pool_size=(2, 2))(c4)

# Bottleneck
c5 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(p4)
c5 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(c5)

# Expansion (going up): upsample, concatenate the skip connection, convolve
u6 = upsample(64, (2, 2), strides=(2, 2), padding='same')(c5)
u6 = layers.concatenate([u6, c4])
c6 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(u6)
c6 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(c6)

u7 = upsample(32, (2, 2), strides=(2, 2), padding='same')(c6)
u7 = layers.concatenate([u7, c3])
c7 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(u7)
c7 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(c7)

u8 = upsample(16, (2, 2), strides=(2, 2), padding='same')(c7)
u8 = layers.concatenate([u8, c2])
c8 = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(u8)
c8 = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(c8)

u9 = upsample(8, (2, 2), strides=(2, 2), padding='same')(c8)
u9 = layers.concatenate([u9, c1], axis=3)
c9 = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(u9)
c9 = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(c9)

# 1x1 convolution with sigmoid: per-pixel ship probability
d = layers.Conv2D(1, (1, 1), activation='sigmoid')(c9)
# Crop then zero-pad by EDGE_CROP: same size, border predictions set to 0
d = layers.Cropping2D((EDGE_CROP, EDGE_CROP))(d)
d = layers.ZeroPadding2D((EDGE_CROP, EDGE_CROP))(d)
if NET_SCALING is not None:
    d = layers.UpSampling2D(NET_SCALING)(d)  # back to the input resolution

seg_model = models.Model(inputs=[input_img], outputs=[d])
seg_model.summary()
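Each MaxPooling2D halves the spatial size (rounding down), each upsampling step doubles it, and concatenate only works when both feature maps have the same height and width. With four pooling levels, the side length reaching c1 (after the optional AvgPool2D) should therefore be divisible by 2**4 = 16. A small hypothetical helper in plain Python, with no Keras needed, that traces the encoder sizes of the model above and checks this:

```python
def trace_unet_sizes(side, depth=4):
    """Trace the spatial size at each encoder level of the U-Net above.

    Hypothetical helper: MaxPooling2D((2, 2)) floors side / 2, while each
    upsampling step doubles the size, so every level must have an even side
    for the skip-connection concatenation to line up.
    """
    sizes = [side]
    for _ in range(depth):
        sizes.append(sizes[-1] // 2)  # one 2x2 max-pooling step
    # Decoder: upsampling level i + 1 must give back exactly the size of level i
    compatible = all(2 * sizes[i + 1] == sizes[i] for i in range(depth))
    return sizes, compatible


print(trace_unet_sizes(256))  # ([256, 128, 64, 32, 16], True)
print(trace_unet_sizes(100))  # ([100, 50, 25, 12, 6], False)
```

With a side of 100, the third level has 25 pixels, which pools down to 12 and upsamples back to 24, so concatenating u7 with c3 would fail.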
Model Fitting
Result evaluation
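Evaluation compares each predicted ship mask with the ground-truth masks of the same image. The competition scores submissions with an F2 score averaged over several intersection over union (IoU) thresholds, from 0.5 to 0.95, so the basic building block is the IoU between two masks. A minimal sketch of that building block only (mask_iou is a hypothetical name, and treating two empty masks as a perfect match is a convention chosen here):

```python
import numpy as np


def mask_iou(pred_mask, true_mask):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:  # both masks empty: treated as a perfect match
        return 1.0
    intersection = np.logical_and(pred, true).sum()
    return float(intersection / union)


true_ship = np.zeros((4, 4), dtype=np.uint8)
true_ship[0:2, 0:2] = 1  # 4 ground-truth pixels
pred_ship = np.zeros((4, 4), dtype=np.uint8)
pred_ship[0:2, 0:3] = 1  # 6 predicted pixels, 4 of them correct
print(mask_iou(pred_ship, true_ship))  # 4 / 6 = 0.666...
```

Here the prediction covers the whole ship but spills over by two pixels, so its IoU of about 0.67 would count as a hit at the 0.5 and 0.65 thresholds but not at 0.7 or above.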
Because the runtime through to the final submission was very slow, I was limited in the time I could spend on this project, and I decided to move on even though the results could be improved. There are so many exciting and challenging Satellite Imagery projects to learn from that I could not resist moving on to another one.
Conclusion
This project is a great one to start with to become familiar with Satellite Imagery in machine learning. I hope this article was informative about this field and has piqued your curiosity enough to read more about such projects and to jump into Kaggle competitions (the best place to learn, with practical exercises on real business cases).
I will be writing more about Computer Vision and Machine Learning, especially in the Earth Observation sector, so follow me if you share this interest.