Steel Defect Detection: Image Segmentation using Keras

12 min read · Feb 12, 2020


Author: Karthik [LinkedIn]

- Detect and classify defects in steel

Keywords: Steel, Defect, Identification, Localization, Dice coefficient, segmentation models, Tensorflow, Run Length Encoding

1. Business Problem

1.1 Introduction:

Steel is one of the most important building materials of modern times. Steel buildings are resistant to natural and man-made wear which has made the material ubiquitous around the world. Identifying defects will help make production of steel more efficient. Severstal is leading the charge in efficient steel mining and production.

Credits: https://www.kaggle.com/c/severstal-steel-defect-detection/overview

1.2 Problem description:

Severstal is now looking to machine learning to improve automation, increase efficiency, and maintain high quality in their production.

The production process of flat sheet steel is especially delicate. From heating and rolling, to drying and cutting, several machines touch flat steel by the time it’s ready to ship. Today, Severstal uses images from high frequency cameras to power a defect detection algorithm.

This notebook will help engineers improve the algorithm by localizing and classifying surface defects on a steel sheet.

1.3 Source/Useful Links:

Data Source
Competition hosting company
For Classification: Xception
For Segmentation: Unet — EfficientNetB1
Training and predictions: Google Colab
Installing segmentation_models packages in Kaggle Kernel (useful for making an Inference kernel on Kaggle Platform)
Submitting *.csv on kaggle

1.4 Business objectives and constraints:

Maximize dice score
Multi-label probability estimates
Defect identification and localization should not take much time. Ideally, it should keep pace with the frequency of the cameras and finish within a few seconds. The inference kernel should take <= 1 hour of run time.
Save model weights to make inference possible anytime.

2. Deep Learning Problem

2.1 Data Description

Folder/

    sample_submission.csv    3 columns 

    train.csv                3 columns 

    test_images/             5506 .jpg images 

    train_images/            12568 .jpg images

Each image has a resolution of 256x1600. “train.csv” contains details of the images in which defects are present. Its columns are:

ImageId, Class, EncodedPixels

Test data ImageIds can be found in sample_submission.csv or can be directly accessed from Image file names. Corresponding images can be accessed from train and test folders with the help of ImageIds.

Number of Defect Classes: 4

2.2 Translating to Deep Learning Problem

2.2.1 Type of Deep Learning Problem

There are 4 different classes of steel surface defects and we need to locate the defect => Multi-label Image Segmentation

2.2.2 Performance Metric:

Dice coefficient:

This metric is used to gauge the similarity of two samples. The Dice coefficient can be used to compare the pixel-wise agreement between a predicted segmentation and its corresponding ground truth. The formula is given by:

$$\mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}$$

where X is the predicted set of pixels and Y is the ground truth. The Dice coefficient is defined to be 1 when both X and Y are empty. The leaderboard score is the mean of the Dice coefficients for each [ImageId, ClassId] pair in the test set.
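As a minimal sketch of the definition above, the Dice coefficient of two binary masks can be computed with NumPy as follows. The function name `dice_coefficient` and the toy masks are illustrative assumptions, not code from the original notebook.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask):
    """Dice coefficient of two binary masks; defined as 1.0 when both are empty."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    total = pred.sum() + true.sum()
    if total == 0:
        # Both X and Y are empty: the metric is defined to be 1.
        return 1.0
    intersection = np.logical_and(pred, true).sum()
    return 2.0 * intersection / total

# Toy masks: 2 predicted pixels, 2 true pixels, 1 pixel in common.
pred = np.array([[1, 1, 0], [0, 0, 0]])
true = np.array([[1, 0, 0], [0, 1, 0]])
score = dice_coefficient(pred, true)  # 2 * 1 / (2 + 2)
empty_score = dice_coefficient(np.zeros((2, 3)), np.zeros((2, 3)))
```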

2.2.3 Deep Learning Objectives

Objective:

Maximize Dice coefficient
Identify and locate the type of defect present in the image. Masks generated after predictions should be converted into EncodedPixels.

EncodedPixels:

In order to reduce the submission file size, our metric uses run-length encoding on the pixel values. Instead of submitting an exhaustive list of indices for your segmentation, you will submit pairs of values that contain a start position and a run length. E.g. ‘1 3’ implies starting at pixel 1 and running a total of 3 pixels (1,2,3).
The competition format requires a space delimited list of pairs. For example, ‘1 3 10 5’ implies pixels 1,2,3,10,11,12,13,14 are to be included in the mask. The metric checks that the pairs are sorted, positive, and the decoded pixel values are not duplicated. The pixels are numbered from top to bottom, then left to right: 1 is pixel (1,1), 2 is pixel (2,1), etc.
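The decoding rule described above can be sketched as follows. This is an illustrative helper (the name `rle_to_mask` is an assumption), and it uses a tiny 4x4 image instead of 256x1600 so that the result is easy to read.

```python
import numpy as np

def rle_to_mask(rle, height=256, width=1600):
    """Decode a 'start length start length ...' string into a binary mask.

    Pixels are 1-indexed and numbered top to bottom, then left to right,
    which corresponds to column-major (Fortran) order.
    """
    flat = np.zeros(height * width, dtype=np.uint8)
    if isinstance(rle, str) and rle.strip():
        values = np.array(rle.split(), dtype=int)
        starts = values[0::2] - 1   # convert 1-indexed starts to 0-indexed
        lengths = values[1::2]
        for start, length in zip(starts, lengths):
            flat[start:start + length] = 1
    return flat.reshape((height, width), order="F")

# '1 3 10 5' marks pixels 1,2,3 and 10..14 on a 4x4 toy image.
mask = rle_to_mask("1 3 10 5", height=4, width=4)
print(mask)
```

Pixels 1 to 3 fill the top of the first column, and pixels 10 to 14 run down the third column and wrap into the top of the fourth.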

Image Segmentation:

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as image objects). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images.

More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
In the adjacent image, the original is hard for computer vision models to analyze directly. With image segmentation we can partition the image into multiple segments, which makes it easier for a model to learn patterns from those segments. For example, every pixel belonging to a car is colored red.

3. Data Preparation:

The available data is not in X_train, Y_train format. We need to generate these by collecting the image names from the train_images folder and merging them with train.csv:

Data Extraction (archive.zip is downloaded from Kaggle to Google Drive)

Final result from DataFrame modifications
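One way to arrive at such a table is sketched below with pandas. The toy data, the `Defect_1` to `Defect_4` column names, and the `hasDefect` flag are assumptions for illustration; the column names `ImageId`, `Class`, and `EncodedPixels` follow the description of train.csv above. In the real pipeline the image names would come from listing the train_images folder and the defect rows from reading train.csv.

```python
import pandas as pd

# Toy stand-ins for the folder listing and for train.csv (defect rows only).
image_names = ["a.jpg", "b.jpg", "c.jpg"]
train_csv = pd.DataFrame({
    "ImageId": ["a.jpg", "a.jpg", "c.jpg"],
    "Class": [1, 3, 4],
    "EncodedPixels": ["1 3", "10 5", "2 2"],
})

# One row per image, one column per defect class holding its run-length encoding.
wide = train_csv.pivot(index="ImageId", columns="Class", values="EncodedPixels")
wide = wide.reindex(columns=[1, 2, 3, 4])
wide.columns = [f"Defect_{c}" for c in wide.columns]
defect_cols = list(wide.columns)

# Left-merge onto the full image list so that defect-free images are kept.
images = pd.DataFrame({"ImageId": image_names})
data = images.merge(wide.reset_index(), on="ImageId", how="left")

# Binary target for a defect / no-defect classifier.
data["hasDefect"] = data[defect_cols].notna().any(axis=1).astype(int)
print(data)
```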

Applying train_test_split before Exploratory Data Analysis keeps the held-out data out of the analysis and so avoids data leakage.
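A minimal sketch of such a split with scikit-learn is shown here. The toy image ids, the 80/20 ratio, and the use of a binary defect flag for stratification are assumptions; the original notebook may stratify on a different label.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 20 image ids, half with a defect and half without.
image_ids = np.array([f"img_{i}.jpg" for i in range(20)])
has_defect = np.array([0] * 10 + [1] * 10)

# stratify keeps the defect / no-defect ratio the same in both splits.
train_ids, test_ids, y_train, y_test = train_test_split(
    image_ids,
    has_defect,
    test_size=0.2,
    stratify=has_defect,
    random_state=42,
)
print(len(train_ids), len(test_ids), y_test.mean())
```

Exploratory analysis is then run on `train_ids` only, so statistics such as area thresholds are never derived from held-out images.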

4. Exploratory Data Analysis:

Utility Functions for conversions between Run Length Encodings and Image Masks
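A minimal sketch of the mask-to-encoding direction is given below. It follows the widely used Kaggle recipe of detecting value changes in the flattened mask; the name `mask_to_rle` is an assumption and this is not necessarily the notebook's exact utility.

```python
import numpy as np

def mask_to_rle(mask):
    """Encode a binary mask as 'start length ...' pairs (1-indexed, column-major)."""
    pixels = np.asarray(mask, dtype=np.uint8).flatten(order="F")
    # Pad with zeros so that runs touching either end are detected.
    padded = np.concatenate([[0], pixels, [0]])
    # 1-indexed positions where the value changes: run starts and run ends.
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    # Turn every run end into a run length.
    changes[1::2] -= changes[::2]
    return " ".join(str(x) for x in changes)

toy_mask = np.array([
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
])
encoded = mask_to_rle(toy_mask)
print(encoded)
```

An all-zero mask encodes to an empty string, which is how a "no defect" prediction is submitted.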

**Observation:** The surface of non-defective steel may show different features or profiles. Note that the defects to detect are limited to the 4 types in this dataset. The steel surface may contain other defects, but those should not be detected.

**Observation:** The regional profiles of the masks on defect-containing steel surfaces are distinguishable among classes. Defect type 1 has multiple small regions, and defect type 4 has multiple medium-sized regions. Defect type 3 also has multiple medium-sized regions, while defect types 2 and 3 share some regional characteristics.

4.1 ‘area’ as a new feature:

Used for thresholding masks after generating predictions

Areas below the 2nd percentile and above the 98th percentile are removed to set the area thresholds for predicted masks.

Summary:

Based on the range of area for each defect, we will threshold predictions to filter outliers. For example, some predicted masks have only 4 pixels with value 1. Such a mask will reduce the performance of the model on the final metric.
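The area thresholding can be sketched as below for a single defect class. The helper names and the synthetic training areas are assumptions; in practice one pair of bounds would be computed per defect class from the training masks.

```python
import numpy as np

def area_bounds(train_areas, lower_pct=2, upper_pct=98):
    """Area thresholds taken from the 2nd and 98th percentiles of training mask areas."""
    return np.percentile(train_areas, lower_pct), np.percentile(train_areas, upper_pct)

def filter_by_area(mask, low, high):
    """Keep a predicted mask only if its area lies within [low, high]; otherwise blank it."""
    area = int(mask.sum())
    return mask if low <= area <= high else np.zeros_like(mask)

# Synthetic mask areas (in pixels) standing in for one defect class.
train_areas = np.arange(100, 1100, 10)
low, high = area_bounds(train_areas)

tiny = np.zeros((256, 1600), dtype=np.uint8)
tiny[0, :4] = 1                      # only 4 positive pixels
plausible = np.zeros((256, 1600), dtype=np.uint8)
plausible[:10, :50] = 1              # 500 positive pixels

tiny_kept = filter_by_area(tiny, low, high)
plausible_kept = filter_by_area(plausible, low, high)
```

Here the 4-pixel mask is blanked because it falls below the lower bound, while the 500-pixel mask passes through unchanged.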

4.2 EDA conclusion:

The dataset is imbalanced, so we will use stratified sampling to split it into train and validation datasets.
This is a multi-label image segmentation problem. As around 50% of the images have no defects, it is equally important to identify images with no defects.
Predictions will be filtered based on area thresholds from the ‘test_thresolds’ dataframe and on class probability thresholds (which are to be determined after predictions from the neural networks).

Procedure:

We will have a binary classification model to filter images with defects from no defect images.
A 4-label classification model to predict the probabilities of an image belonging to each class.
4 segmentation models for four different classes to generate masks for each test image.
Convert masks to EncodedPixels and filter them as per classification probabilities.
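The final filtering step of this procedure can be sketched as follows. The function name, the threshold values of 0.5, and the toy probabilities are hypothetical; the actual thresholds are tuned after the models have produced predictions.

```python
import numpy as np

def filter_masks(p_defect, p_classes, masks, defect_thr=0.5, class_thr=0.5):
    """Keep the mask of a class only if both classifiers agree the defect is present.

    p_defect  : probability from the binary classifier that any defect exists
    p_classes : 4 probabilities from the multi-label classifier
    masks     : 4 binary masks, one from each segmentation model
    """
    kept = []
    for p_class, mask in zip(p_classes, masks):
        if p_defect >= defect_thr and p_class >= class_thr:
            kept.append(mask)
        else:
            kept.append(np.zeros_like(mask))   # treated as "no defect" for this class
    return kept

# Toy predictions for one image: every segmentation model fired on a 2x2 patch.
masks = [np.ones((2, 2), dtype=np.uint8) for _ in range(4)]
result = filter_masks(0.9, [0.8, 0.1, 0.7, 0.2], masks)
areas = [int(m.sum()) for m in result]
no_defect_areas = [int(m.sum()) for m in filter_masks(0.2, [0.8, 0.1, 0.7, 0.2], masks)]
```

Each surviving mask would then be converted to EncodedPixels, and each blanked mask to an empty string.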

We are generating a new solution to the business problem with available libraries: tensorflow, keras and segmentation_models.

4.3 Model Architecture:

Blue dots in the Architecture image indicate that an input is given at that level, while the black dot near “Apply thresholds” corresponds to the application of thresholds to the predicted masks. At the threshold application level, images are filtered based on the defect presence probability, the probability of belonging to each defect type, and the area of the defect.

Note: It is important to feed the right training data into each model, and the effect of the training data on the loss function guides this choice. The Binary Classifier is trained with all images. The Multi-Label Classifier is trained only with images that have defects. The defined architecture has 4 output neurons, equal to the number of classes.

The Multi-Label Classifier could also be trained on images with and without defects if 5 output neurons were chosen: 4 for the defect classes and a 5th for a “no defect” class. In that case the additional Binary Classifier becomes redundant. With 4 output neurons, a no-defect image (X) has an all-zero target (Y = [0,0,0,0]). With a loss whose terms are all weighted by the target values (categorical cross-entropy, for example), such a sample contributes ‘zero’ loss, and a zero loss produces no gradient and hence no weight updates.

Similarly, the segmentation models are trained on each defect separately, so we use 4 segmentation models, one per defect class. This is the scheme used in this approach; other schemes are possible, as long as the training data fed into each model suits the model defined. The loss function also plays a role in deciding what training data is used for a model.
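The remark about zero loss depends on the loss function, which a small NumPy calculation can illustrate. The two loss functions below are plain re-implementations written for this example (not the notebook's code), and the predicted probabilities are made up.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Sum of -y * log(p): every term is weighted by the target value."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.sum(y_true * -np.log(y_pred)))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y * log(p) + (1 - y) * log(1 - p)]: zeros in the target still count."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    terms = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    return float(-np.mean(terms))

y_no_defect = np.zeros(4)                    # target of a defect-free image
y_pred = np.array([0.3, 0.6, 0.1, 0.2])      # hypothetical network output

cce = categorical_crossentropy(y_no_defect, y_pred)
bce = binary_crossentropy(y_no_defect, y_pred)
print(cce, bce)
```

With the target-weighted loss the all-zero sample contributes nothing, so it cannot drive weight updates. With binary cross-entropy the same sample gives a positive loss, so the choice of training data and the choice of loss have to be made together.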

5. Data generators and Model Building

See the GitHub notebook for the data generator definition and custom metrics.

Thresholds are chosen for high precision, with a slight compromise on overall recall, to obtain a good competition metric.
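The precision and recall trade-off behind this choice can be sketched on toy data. The labels, probabilities, and the two candidate thresholds below are invented for illustration.

```python
import numpy as np

def precision_recall(y_true, y_prob, threshold):
    """Precision and recall of the positive class at a given probability threshold."""
    y_pred = y_prob >= threshold
    positives = y_true == 1
    tp = int(np.sum(y_pred & positives))
    fp = int(np.sum(y_pred & ~positives))
    fn = int(np.sum(~y_pred & positives))
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall

y_true = np.array([0, 0, 0, 1, 1, 1])
y_prob = np.array([0.10, 0.40, 0.60, 0.55, 0.70, 0.90])

low_thr = precision_recall(y_true, y_prob, 0.50)    # one false positive, no misses
high_thr = precision_recall(y_true, y_prob, 0.65)   # no false positives, one miss
print(low_thr, high_thr)
```

Raising the threshold removes the false positive at the cost of one missed defect, which is the direction of the trade-off described above.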

5.1 Binary Classifier:

Train and predict the probability of presence of defects in images

The binary cross-entropy loss shows large variations on the validation set. This implies that the model has a tough time generalizing to unseen data when predicting the presence of defects.

Best weights found at epoch 19:

Summary: The model performs well on the train, validation and test datasets. The values of the loss and metrics are similar across these datasets, which indicates that the model is not overfitting. The f1_score of 0.921 on the validation dataset is acceptable.

5.2 MultiLabel Classifier:

Predict probability of presence of each defect in an image

Model similar to Binary Classifier with 4 output neurons

Summary: The multi-label classification model generalizes well to unseen data (the evaluation values on the test and validation sets are close to those on the train set).

5.3 Image Segmentation:

Data preparation:

Model definition:

Legendary UNet with EfficientNetB1 backbone is used for Segmentation purposes.

5.3.1 Defect Label 1:

Dice coefficient vs epoch plot for training the segmentation model on defect 1

Note: Dice coefficient is also known as F1_score.

5.3.2 Defect Label 2:

Dice coefficient vs epoch plot for training the segmentation model on defect 2

5.3.3 Defect Label 3:

Dice coefficient vs epoch plot for training the segmentation model on defect 3

5.3.4 Defect Label 4:

Well, the training of the models was easy. Let’s see their prediction capability.

6. Inference:

Best models, from the training above, are saved to make inferences on images.

Defining dependencies for loading saved models is important while using custom metrics
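A minimal sketch of this point is shown below: a model compiled with a custom metric is saved and then reloaded by passing the metric through `custom_objects`. The toy one-layer model, the `dice_coef` implementation with its smoothing term, and the file name are assumptions for illustration, not the notebook's actual models.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

def dice_coef(y_true, y_pred, smooth=1.0):
    """Smoothed Dice coefficient used as a Keras metric (smoothing is an assumption)."""
    y_true = tf.cast(tf.reshape(y_true, [-1]), tf.float32)
    y_pred = tf.cast(tf.reshape(y_pred, [-1]), tf.float32)
    intersection = tf.reduce_sum(y_true * y_pred)
    denominator = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred)
    return (2.0 * intersection + smooth) / (denominator + smooth)

# Toy stand-in for a trained segmentation model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 1)),
    tf.keras.layers.Conv2D(1, kernel_size=1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=[dice_coef])

path = os.path.join(tempfile.mkdtemp(), "toy_segmenter.h5")
model.save(path)

# Without custom_objects, Keras cannot resolve the name "dice_coef" while loading.
restored = tf.keras.models.load_model(path, custom_objects={"dice_coef": dice_coef})

x = np.random.rand(2, 8, 8, 1).astype("float32")
same_output = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
```

If the reloaded model is used for inference only, passing `compile=False` to `load_model` is an alternative that avoids the need for the custom metric.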

For generating predictions using Classifier models

For generating predictions using Segmentation models

Area thresholds and Classification thresholds are applied to the predictions of the models.
steel_prediction() and steel_evaluation() are final functions defined for generating predictions and evaluations. [Code available in Github notebook]

Applying steel_evaluation():

Sample evaluation on a single image: This image has no defect, and the models have correctly detected that.

6.1 Train set:

Note: If we want to move one FN to TP, more than one TN become FPs due to high imbalance in the dataset. The contribution of reduction of FP is higher than the contribution of reduction of FN in the final competition metric (Mean Dice Coefficient).
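The effect described in this note can be checked with a small worked example. The ten toy [image, class] pairs below are invented: nine have no defect and one has a defect. A cautious model predicts nothing anywhere, while an eager model finds the defect but also raises two false alarms.

```python
import numpy as np

def dice(pred, true):
    """Dice coefficient of two boolean masks; 1.0 when both are empty."""
    total = pred.sum() + true.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(pred, true).sum() / total

empty = np.zeros((4, 4), dtype=bool)
defect = np.zeros((4, 4), dtype=bool)
defect[1:3, 1:3] = True          # a 4-pixel true defect
speck = np.zeros((4, 4), dtype=bool)
speck[0, 0] = True               # a 1-pixel false alarm

truths = [empty] * 9 + [defect]

cautious = [empty] * 10                          # 9 TN, 1 FN
eager = [empty] * 7 + [speck] * 2 + [defect]     # 7 TN, 2 FP, 1 TP

mean_cautious = np.mean([dice(p, t) for p, t in zip(cautious, truths)])
mean_eager = np.mean([dice(p, t) for p, t in zip(eager, truths)])
print(mean_cautious, mean_eager)
```

A single false-positive pixel on an empty pair drops that pair's Dice from 1 to 0, so turning one FN into a TP does not pay off here once it costs two FPs.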

Classification Report: The thresholds favor high precision for the multi-label classification task and high recall for the binary classification task. It is important to have few false positives overall.
Confusion Matrix: Observation: The model shows some confusion between classes. It is evident that false positives have been kept low.

6.2 Validation set:

6.3 Test set:

7. Summary:

Images and their masks (in the form of EncodedPixels) are provided to train a Deep Learning Model to Detect and Classify defects in steel. (Multi-label Classification). The competition is hosted by Severstal on Kaggle.
Exploratory Data Analysis revealed that the dataset is imbalanced. A new feature ‘area’ is created to clip predictions with segmentation areas within a determined range. Different classes are observed to overlap on smaller values of area feature. This makes class separation not possible based solely on ‘area’ feature. It was observed that most of the images either contain one defect or do not have a defect.
A six-model architecture is built to train and test on this dataset: one binary classifier, one multi-label classifier and four segmentation models.
Image data receives minimal preprocessing. Pixel value scaling and image augmentations for model training are handled by DataGenerators.
Minority class priority based stratified sampling is performed on the dataset to split train set into train and validation sets.
Pre-trained Deep Learning models are used: Xception architecture for Classification and legendary Unet architecture with efficientnetb1 backbone trained on ImageNet dataset for Segmentation.
TensorBoard is used for saving logs and visualizing model performance at each epoch. The models show satisfactory performance on the defined metrics. A certain degree of confusion remains in both the classification and segmentation models, as defect detection and localization are not perfect.

8. Kaggle Screenshot (Best submission):

Due to limited compute, I have stopped working further on this dataset.

9. Future Work:

Higher compute will allow a larger batch size for training all the models (increasing from 8 to 16 or 32).

A single strong model (easy to define with the PyTorch version of the segmentation_models library) could improve performance considerably. Chaining multiple models has a multiplier effect that reduces overall performance (<1 x <1 x … = <<1).
Different architectures can be experimented with, such as combining the Binary and Multi-label Classifiers into a single classifier model.
The quality of the training data fed into the neural networks largely determines performance. Techniques such as Test Time Augmentation can be tried, and defect region blackouts can be used to increase the number of training images (setting defect regions to black pixel intensities converts a defect image into a no-defect image). The resolution of the output from ImageDataGenerators can also be varied.
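Test Time Augmentation for segmentation can be sketched as averaging the prediction on an image with the prediction on its horizontally flipped copy. The function below is an illustrative assumption: `predict_fn` stands for any model's predict method, and the toy predictor exists only to make the example runnable.

```python
import numpy as np

def predict_with_flip_tta(predict_fn, images):
    """Average predictions over the original and the horizontally flipped input.

    images     : array of shape (N, H, W, C)
    predict_fn : callable returning per-pixel probabilities of shape (N, H, W, 1)
    """
    plain = predict_fn(images)
    # Flip along the width axis, predict, then flip the prediction back.
    flipped = predict_fn(images[:, :, ::-1, :])[:, :, ::-1, :]
    return (plain + flipped) / 2.0

# Toy predictor that is biased towards the right side of the image.
width = 5
ramp = np.linspace(0.0, 1.0, width).reshape(1, 1, width, 1)

def toy_predict(batch):
    return batch[..., :1] * ramp

images = np.ones((2, 3, width, 3), dtype=np.float32)
averaged = predict_with_flip_tta(toy_predict, images)
```

On a constant input, the toy predictor's left-to-right bias cancels out after averaging, which is the kind of orientation dependence that flip-based TTA is meant to smooth.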

10. References:

Data Source: https://www.kaggle.com/c/severstal-steel-defect-detection/data
For Classification: Xception: https://keras.io/applications/#xception
For Segmentation: Unet — EfficientNetB1: https://github.com/qubvel/segmentation_models
Training and predictions platform: Google Colab https://colab.research.google.com/
Course: https://www.appliedaicourse.com/course/11/Applied-Machine-learning-course
Utility functions: Run Length Encodings, metrics

Github: https://github.com/rook0falcon

LinkedIn: https://www.linkedin.com/in/karthik-kumar-billa/