Case Study: TGS Salt Identification Challenge

Manoj Guthikonda
Jan 7 · 8 min read

Table Of Contents:

  1. Introduction
  2. Business Problem
  3. ML Problem Mapping & Metric
  4. Data Description
  5. Existing Approaches
  6. First Cut Solution
  7. Exploratory Data Analysis
  8. Prerequisites
  9. Model Explanation
  10. Results & Deployment
  11. Future Work
  12. Profile
  13. References

1. Introduction

The following is a case study of a Kaggle problem. This solution reaches the top 6% of the leaderboard using a free Colab GPU in about 5 hours, and the code is written in PyTorch. The Colab notebook with the code is linked in the References.

2. Business Problem

Build a model that, given a seismic image as input, predicts every pixel as either salt or no salt.

Need for Automation?

Several areas of Earth with large accumulations of oil and gas also have huge deposits of salt below the surface. Unfortunately, knowing precisely where large salt deposits are is very difficult. Professional seismic imaging still requires expert human interpretation of salt bodies, which leads to very subjective, highly variable renderings. More alarmingly, it leads to potentially dangerous situations for oil and gas company drillers.

3. ML Problem Mapping & Metric:

This can be posed as a semantic segmentation problem. There are no latency constraints in this case.

The performance metric used here is mean average precision over multiple IoU thresholds, where IoU stands for Intersection over Union. The metric sweeps over a range of IoU thresholds, at each point calculating an average precision value. The threshold values range from 0.5 to 0.95 with a step size of 0.05: (0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95). In other words, at a threshold of 0.5, a predicted object is considered a “hit” if its intersection over union with a ground truth object is greater than 0.5.

import numpy as np

# Example: a single predicted object with IoU 0.669 against its ground truth
iou = 0.669
thresholds = np.arange(0.5, 1.0, 0.05)  # 0.50, 0.55, ..., 0.95
# The prediction counts as a hit at every threshold below its IoU;
# the metric is the mean hit rate over the 10 thresholds
metric = np.mean(iou > thresholds)      # -> 0.4

Submission Format: Instead of submitting masks directly for evaluation, a run-length encoding (RLE) format was used.

import numpy as np

def rle_encoding(mask):
    # Pixels are flattened column-wise (via the transpose); take positions of salt pixels
    dots = np.where(mask.T.flatten() == 1)[0]
    run_lengths = []
    prev = -2  # sentinel so the first salt pixel always starts a new run
    for b in dots:
        if b > prev + 1:                    # gap found: start a new run
            run_lengths.extend((b + 1, 0))  # +1 because pixels are 1-indexed
        run_lengths[-1] += 1                # extend the current run
        prev = b
    return ' '.join(map(str, run_lengths))

mask = np.random.randint(2, size=(101, 101))
rle = rle_encoding(mask)

4. Data Description:

The dataset consists of images and masks as .png files; the link to the data is in the References.

5. Existing Approaches:

  • This kernel trains with Focal Loss first, followed by Lovász loss; BCE can replace Focal Loss.
  • This kernel uses deep supervision to accelerate training to just 60 epochs, which would normally take 200 epochs.
  • Here, snapshot ensembling and CoordConv have been used to get a really good score.

6. First Cut Solution:

At first an easy baseline (UNet with a ResNet34 encoder) was taken. Only the necessary augmentations (Normalize, ToTensor()) were applied here. The loss was binary cross-entropy, and the Adam optimizer was chosen with a learning rate of 3e-4.
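
Below is a minimal sketch of that first-cut training setup, assuming model is the UNet-ResNet34 baseline and train_loader yields image/mask tensor pairs; it is illustrative rather than the exact notebook code.

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()                         # binary cross-entropy on raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

for images, masks in train_loader:                         # one epoch of the baseline
    optimizer.zero_grad()
    loss = criterion(model(images), masks)
    loss.backward()
    optimizer.step()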

7. Exploratory Data Analysis

The train dataset consists of 4,000 images and the test dataset consists of 18,000 images. The competition has very little training data, so overfitting was a risk throughout; this motivates transfer learning and data augmentation.

All train and test images have size 3x101x101, and the train and test masks are 101x101. Some anomalies present in the dataset are black images with empty masks and vertical masks, as shown below. The yellow parts in the figures show salt.

Glimpse of the dataset
Image after augmentation

Test image preprocessing involves replicate-padding the image to 128x128 and normalizing it channel-wise with mean (0, 0, 0) and std (1, 1, 1). The size 128 is important because the solution uses a UNet, which is easiest to work with when the spatial dimensions are powers of 2.
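
A minimal sketch of this test-time transform, assuming the same albumentations library used for the training transform later in this post:

import cv2
from albumentations import Compose, PadIfNeeded, Normalize

transform_test = Compose([
    PadIfNeeded(128, 128, border_mode=cv2.BORDER_REPLICATE),  # replicate-pad 101x101 to 128x128
    Normalize(mean=(0, 0, 0), std=(1, 1, 1)),                 # channel-wise normalization
])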

Salt Coverage Computation: salt coverage seemed to be a very important variable to focus on, so the salt coverage of every mask in the train dataset was computed.

# mask is a binary 101x101 array, so its mean is the fraction of salt pixels
salt_coverage = np.mean(mask) * 10
# Round to the nearest integer for easy stratified validation
salt_coverage_class = np.rint(salt_coverage)
# Possible salt coverage classes: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

8. Prerequisites:

SCSE blocks add an attention mechanism to convolutional networks. Attention, in a broad sense, is nothing but focusing on some things and not all. This is achieved here by adding parameters that judge which spatial pixels and which channels should be focused on.
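
Below is a hedged PyTorch sketch of an SCSE block following the commonly used formulation (concurrent spatial and channel squeeze-and-excitation); the exact block used in the notebook may differ slightly.

import torch.nn as nn

class SCSEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: global average pool, squeeze to channels//reduction, excite back
        self.cse = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: a 1x1 convolution producing a per-pixel gate
        self.sse = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        # Recalibrate the input along channels and along spatial positions, then sum
        return x * self.cse(x) + x * self.sse(x)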

In convolutional networks, deep layers contain high-level features while early layers contain low-level features. Hypercolumns give the network more predictive power by giving it access to the low-level details along with the high-level ones. They achieve this by taking the output of every stage, upsampling it to the target size, concatenating everything, and passing the result to the last convolution.
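
A rough sketch of a hypercolumn head, assuming decoder_feats is a list of feature maps taken from each decoder stage; everything is upsampled to the final resolution and concatenated before the last convolution.

import torch
import torch.nn.functional as F

def hypercolumn(decoder_feats, size=(128, 128)):
    # Bilinearly upsample every stage's output to the target size and concatenate along channels
    upsampled = [F.interpolate(f, size=size, mode='bilinear', align_corners=False)
                 for f in decoder_feats]
    return torch.cat(upsampled, dim=1)  # fed to the final convolution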

9. Model Explanation:

Cross-validation: a stratified 10-fold split has been used, with stratification done on the salt coverage class of each image.
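
A minimal sketch of that split, assuming image_ids holds the training image identifiers and salt_coverage_class holds the rounded coverage (0 to 10) computed earlier; both names are illustrative.

from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
folds = list(skf.split(image_ids, salt_coverage_class))  # stratified on salt coverage class
train_idx, val_idx = folds[0]                            # fold 0 is used as the validation set below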

9.1 Baseline

The baseline was performing well but overfitting very quickly. UNet with ResNet34 involves using ResNet34 as the encoder, taking the output before every downsample in ResNet34 and connecting it to the decoder before the corresponding upsampling step. The decoder was kept fairly simple, with 64 channels throughout.
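
A rough sketch of how those encoder features can be collected before each downsampling stage, assuming torchvision's resnet34; the decoder and skip connections are omitted for brevity.

import torchvision

resnet = torchvision.models.resnet34(pretrained=True)

def encoder_features(x):
    x = resnet.relu(resnet.bn1(resnet.conv1(x)))  # 1/2 resolution
    e1 = resnet.layer1(resnet.maxpool(x))         # 1/4
    e2 = resnet.layer2(e1)                        # 1/8
    e3 = resnet.layer3(e2)                        # 1/16
    e4 = resnet.layer4(e3)                        # 1/32
    return [x, e1, e2, e3, e4]                    # skip connections for the UNet decoder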

Baseline model (blue/cyan = train)

So different augmentations were used for train and validation. This made the train loss worse than the validation loss at every point in training. After this the training was easy, as the heuristic was to train long enough that the train loss becomes better than the validation loss. This corrected the overfitting issue to some extent. Dropout might seem like a very natural solution to overfitting, but I avoided it because it significantly delayed the training process while providing no great outcome compared to these augmentations.

import cv2
from albumentations import (Compose, OneOf, HorizontalFlip, RandomCrop, Resize,
                            RandomBrightness, RandomContrast, RandomGamma,
                            PadIfNeeded, Normalize)

transform_train = Compose([
    HorizontalFlip(p=0.5),
    # Random crop-and-resize, applied 20% of the time
    Compose([RandomCrop(90, 90), Resize(101, 101)], p=0.2),
    # One of three light photometric augmentations, 20% of the time
    OneOf([RandomBrightness(0.1), RandomContrast(0.1), RandomGamma()], p=0.2),
    PadIfNeeded(128, 128, border_mode=cv2.BORDER_REPLICATE),
    Normalize(mean=(0, 0, 0), std=(1, 1, 1)),
])

Baseline_corrected (blue/cyan = train)

Then Lovász loss, which is a surrogate loss function for IoU, was optimized. Training on this loss was very slow, so to accelerate it I started from the pretrained weights of Baseline_corrected, which accelerated training by at least 80 epochs. Here Adam with a 1e-4 learning rate was used. An LR scheduler also helped the training process: it had a max_lr of 1e-3 and a min_lr of 1e-4, with 10 epochs per cycle in ‘triangular2’ mode.
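
A hedged sketch of that cyclic schedule using PyTorch's built-in CyclicLR; model and steps_per_epoch are assumed to be defined elsewhere.

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=1e-4, max_lr=1e-3,
    step_size_up=5 * steps_per_epoch,  # half a cycle, giving 10 epochs per full cycle
    mode='triangular2',                # the peak lr is halved after every cycle
    cycle_momentum=False,              # Adam has no momentum parameter to cycle
)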

Baseline_lovasz

9.2 SCSE_hypercol

The next model was the baseline with Spatial and Channel Squeeze and Excitation (SCSE) blocks and hypercolumns. SCSE blocks are placed only in the decoder, and bilinear upsampling is used in the hypercolumns. The reduction factor in the SCSE block was kept at the default of 16. The decoder here also used upsampling instead of ConvTranspose2d.

This network showed only a little improvement, so error analysis was performed. The analysis showed that a considerable amount of error comes from the network's inaccuracy in predicting whether salt is present at all: around 11 false positives and 24 false negatives were present among the predictions with 0.0 IoU.

Scse_hypercol

9.3 Binary_judge

The next model tried a simple addition to the same architecture: judge at the center block whether there is salt in the image at all. That is, take the output of the encoder and classify it as salt or no salt, and then only the images with non-empty masks are trained for semantic segmentation. This is a sort of hard attention, since the targets are multiplied by 1 or 0. Training was done for 200 epochs: 30 epochs with BCE loss and the rest with Lovász loss, using Adam at 3e-4 for the BCE phase, then Adam at 1e-4 for 90 epochs, then a cyclic learning rate (1e-3, 1e-4) for 80 epochs. The binary classifier on the center block was 92% accurate on the validation set, and the metric was 0.87 on the validation set.

The loss calculation for this model is to take the segmentation output and multiply it by 0 if the image has no salt and by 1 if salt is present; this gated output is then used to calculate the Lovász loss, while every image is used for the classification loss (BCE).
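
A minimal sketch of that combined loss, assuming the model returns (seg_logits, cls_logits) and that lovasz_hinge comes from the public lovasz_losses.py implementation (which expects [B, H, W] tensors); summing the two terms without weights is an assumption.

import torch.nn.functional as F

def combined_loss(seg_logits, cls_logits, masks):
    # 1 if the ground-truth mask contains any salt, else 0
    has_salt = (masks.sum(dim=(1, 2, 3)) > 0).float()
    # Hard attention: zero out the segmentation logits of empty-mask images
    gated = seg_logits * has_salt.view(-1, 1, 1, 1)
    seg_loss = lovasz_hinge(gated.squeeze(1), masks.squeeze(1))
    # The classification loss uses every image
    cls_loss = F.binary_cross_entropy_with_logits(cls_logits.view(-1), has_salt)
    return seg_loss + cls_loss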

This solution improved the overall training process a lot. Convergence was very fast, the false positives went from 11 to 6, and it also helped the training of non-empty images.

Binary_judge model (blue/cyan = train)

10. Results & Deployment:

Kaggle best score submission
Model comparison
Fig 1
Fig 2: binary_judge

In the figures above, the legend is in the last picture: orange = true positive, grey = false negative, red = false positive, blue = true negative. These images were made by overlaying the predictions on the true masks; this was done only for masks in the validation (fold 0) set with metric == 0.0.
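
A rough sketch of how such an overlay can be built, assuming pred and true are binary 101x101 masks; the RGB colors mirror the legend above.

import numpy as np

colors = {'tp': (255, 165, 0), 'fn': (128, 128, 128),  # orange, grey
          'fp': (255, 0, 0),   'tn': (0, 0, 255)}      # red, blue

overlay = np.zeros((101, 101, 3), dtype=np.uint8)
overlay[(pred == 1) & (true == 1)] = colors['tp']
overlay[(pred == 0) & (true == 1)] = colors['fn']
overlay[(pred == 1) & (true == 0)] = colors['fp']
overlay[(pred == 0) & (true == 0)] = colors['tn']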

Observations:

  • Figure 2 clearly shows fewer errors compared to Figure 1, which is expected since Figure 2 is from the best model.
  • A high number of errors comes with blue and grey pixels (true negatives and false negatives).
  • Many errors are also made at the corners and edges; the metric calculated on a (90, 90) crop of the targets was much better than the actual metric.
  • The models also seem to be confused between a complete mask and no mask.
  • Vertical masks, as expected, were causing problems.

In these tables, the iou column represents the metric groups and the second column is the count of images of the whole dataset in that metric group.

Observations:

  • Table 1 is from the best model; it clearly shows overall improvement compared to Table 2.
  • This is interesting because adding the binary judge to the model was meant to influence only the 0.0 IoU range (which decreased), but here the 1.0 IoU range improved (increased) as well.

Deployment:

The best model has been deployed on Google Cloud. The deployment process was kept fairly simple and was done using Flask.
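
A minimal sketch of such a Flask endpoint, assuming a predict_mask() helper that wraps the preprocessing and model forward pass described above; the route, port, and helper name are illustrative rather than the exact deployment code.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    image_bytes = request.files['image'].read()
    mask = predict_mask(image_bytes)             # hypothetical helper: image bytes -> 101x101 binary mask
    return jsonify({'rle': rle_encoding(mask)})  # reuse the RLE encoder defined above

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)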

Deployed model working

11. Future Work:

  1. Pseudo-labelling looks very promising in this case, owing to such a small train set.
  2. A K-fold ensemble will definitely improve the score by at least 0.01 IoU.
  3. Snapshot ensembling or Stochastic Weight Averaging will also definitely improve the score.
  4. Deep supervision can be added to this network for better convergence.
  5. CoordConv has been known to improve scores a bit.

12. Profile:

13. References:
