Detection of Steel Defects: Image Segmentation using Keras and TensorFlow

Sep 8, 2020

Index

Problem Statement
Source of Data
Business Objectives & Constraints
Dataset Description
Performance Metric
Exploratory Data Analysis
First Cut Solution
Binary Classification Model
Segmentation Model
Summary
Conclusion
References

Steel is the world’s most important engineering and construction material. It is used in every aspect of our lives: in cars and construction products, refrigerators and washing machines, cargo ships and surgical scalpels. It can be recycled over and over again without losing its properties. This competition aims to make steel production more efficient by helping to identify defects.

Steel Defect Detection is a competition hosted on Kaggle by Severstal, one of the largest steel manufacturing companies. Please visit the Kaggle competition page listed in the Source of Data section for more details.

1. PROBLEM STATEMENT:

The company recently created the country’s largest industrial data lake, with petabytes of data that were previously discarded. Severstal is now looking to machine learning to improve automation, increase efficiency, and maintain high quality in their production.

The production process of flat sheet steel is especially delicate. From heating and rolling, to drying and cutting, several machines touch flat steel by the time it’s ready to ship. Today, Severstal uses images from high frequency cameras to power a defect detection algorithm.

In this competition, you’ll help engineers improve the algorithm by localizing and classifying surface defects on a steel sheet.

2. SOURCE OF DATA:

The dataset was obtained from the Kaggle competition page:

Severstal: Steel Defect Detection. Can you detect and classify defects in steel? (www.kaggle.com)

3. BUSINESS OBJECTIVES AND CONSTRAINTS:

The objectives and constraints for this problem are:

· Maximize the Dice score.

· Segment and classify the defects in the test set.

· Submissions to this competition must be made through Kernels.

· The inference kernel must finish within 1 hour of run time.

· Save the model weights so that inference can be run at any time.

4. DATA DESCRIPTION:

Files

· train_images/ — folder of training images

· test_images/ — folder of test images (you are segmenting and classifying these images)

· train.csv — training annotations which provide segments for defects (ClassId = [1, 2, 3, 4])

· sample_submission.csv — a sample submission file

Each image has a resolution of 256x1600. train.csv contains the details of the images in which defects are present. Its columns are ImageId, ClassId, and EncodedPixels.

Each image may have no defects, a defect of a single class, or defects of multiple classes. For each image you must segment defects of each class (ClassId = [1, 2, 3, 4]).

5. PERFORMANCE METRIC:

This competition is evaluated on the mean Dice coefficient, which measures the pixel-wise agreement between a predicted segmentation and its corresponding ground truth. The formula is:

Dice(X, Y) = 2 * |X ∩ Y| / (|X| + |Y|)

where X is the predicted set of pixels and Y is the ground truth. The Dice coefficient is defined to be 1 when both X and Y are empty. The leaderboard score is the mean of the Dice coefficients for each [ImageId, ClassId] pair in the test set.
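
To make the metric concrete, here is a minimal NumPy sketch of the Dice coefficient for a single pair of binary masks. The function name and the toy masks are my own illustration, not code from the competition.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        # Both masks are empty: the metric is defined as 1 in this case.
        return 1.0
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / total

# Toy example: 2 predicted pixels, 1 true pixel, 1 pixel in common.
pred = np.array([[1, 1, 0], [0, 0, 0]])
truth = np.array([[1, 0, 0], [0, 0, 0]])
print(dice_coefficient(pred, truth))  # 2 * 1 / (2 + 1)
```

The leaderboard score would then be the mean of this value over every image and class pair.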

6. Exploratory Data Analysis:

The available data is not in X_train and Y_train form. We need to build it by collecting the image names from the train_images folder and merging them with train.csv. After merging, each image name is paired with its defect annotations, if it has any.
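
One possible way to do this merge with pandas is sketched below. The tiny in-memory table and the column names rle_1 to rle_4 and has_defect are assumptions made for illustration; in practice the image names would come from listing the train_images folder and the annotations from reading train.csv.

```python
import pandas as pd

# Hypothetical stand-ins for the folder listing and for train.csv.
image_names = ["a.jpg", "b.jpg", "c.jpg"]
annotations = pd.DataFrame({
    "ImageId": ["a.jpg", "a.jpg", "c.jpg"],
    "ClassId": [1, 3, 4],
    "EncodedPixels": ["1 3", "10 2", "5 4"],
})

# One row per image, one column per defect class holding its RLE string.
wide = annotations.pivot(index="ImageId", columns="ClassId", values="EncodedPixels")

# Add the images that never appear in train.csv and any missing class column.
wide = wide.reindex(index=image_names, columns=[1, 2, 3, 4])
wide.columns = [f"rle_{class_id}" for class_id in wide.columns]

# An image with no RLE in any class column has no defect.
wide["has_defect"] = wide.notna().any(axis=1).astype(int)
train_df = wide.rename_axis("ImageId").reset_index()
print(train_df)
```

The has_defect column can serve as the label for the binary classifier, and the four RLE columns as the targets for segmentation.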

6.1 Defective and non-defective classes:

First we check the number of defective and non-defective images.

The defective and non-defective images are close to evenly distributed, so deciding whether an image has a defect is a well-balanced binary classification problem.

6.2 Steel Mask and Pixel count:

Here we look at how many defects of each type there are and how large they are, to see which defect types are bigger than the others.

Observations

Mask count is the number of defects of a class, and pixel count is the area (size) of those defects in the images.
We have far more samples of class 3 than of any other class, so the dataset is highly imbalanced. Almost 73% of all defects are of class 3.
Although class 4 defects make up 11.3% of all defects, they cover almost 17% of the total defect area. This means that class 4 defects are typically larger.
Class 1 and class 2 defects are very small. They represent 12.6% and 3.48% of all defects respectively, but only 2.39% and 0.51% of the total mask area. They are under-represented in sample count and especially in area.
Our network may have a hard time finding classes 1 and 2 because of their small size.
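
Both counts can be read directly from the EncodedPixels strings without decoding any mask, because the area of a mask is the sum of its run lengths. A small sketch, with made-up rows standing in for train.csv:

```python
from collections import Counter

def rle_pixel_count(rle):
    """Number of mask pixels in an RLE string of 'start length' pairs."""
    numbers = [int(n) for n in rle.split()]
    return sum(numbers[1::2])  # the lengths are every second number

# Hypothetical (ClassId, EncodedPixels) rows standing in for train.csv.
rows = [(3, "1 10 50 5"), (3, "7 20"), (1, "3 2"), (4, "100 40")]

mask_count = Counter()
pixel_count = Counter()
for class_id, rle in rows:
    mask_count[class_id] += 1
    pixel_count[class_id] += rle_pixel_count(rle)

total_masks = sum(mask_count.values())
total_pixels = sum(pixel_count.values())
for class_id in sorted(mask_count):
    print(
        f"class {class_id}: "
        f"{100 * mask_count[class_id] / total_masks:.1f}% of masks, "
        f"{100 * pixel_count[class_id] / total_pixels:.1f}% of mask area"
    )
```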

6.3 Defect Frequency per image:

Here we check how many types of defects there are in a single image.

Most of the time there is only one type of defect per image. Some images have two types of defects, and there are only two instances with three types of defect in a single image.

6.4 Frequent Pattern Mining:

Because each image can have more than one class of fault, an interesting question is how often different kinds of faults occur together. We will use the FP-growth (Frequent Pattern growth) algorithm to examine which types of faults occur together.

Observations

From the FP chart, we can see that the most frequent scenarios are images with a single fault of class 3, 1, or 4.
The combination of 3 and 4 is actually more frequent than class 2 appearing alone. This is even more interesting because classes 3 and 1 are the more frequent classes in the dataset.
We should use augmentation to increase the number of examples for class 2.
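
With only four classes, plain counting already answers both questions (how many defect types per image, and which classes appear together). The sketch below uses a Counter as a simple substitute for FP-growth, not an implementation of it, and the image_classes mapping is made up for illustration.

```python
from collections import Counter

# Hypothetical mapping from image name to the defect classes it contains.
image_classes = {
    "a.jpg": [3],
    "b.jpg": [3, 4],
    "c.jpg": [1],
    "d.jpg": [4, 3],
    "e.jpg": [],
}

# How many different defect types does each image have?
types_per_image = Counter(len(set(classes)) for classes in image_classes.values())

# Which class combinations occur, and how often? Sorting makes (4, 3) == (3, 4).
combo_counts = Counter(
    tuple(sorted(set(classes)))
    for classes in image_classes.values()
    if classes
)

print(dict(types_per_image))
for combo, count in combo_counts.most_common():
    print(combo, count)
```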

6.5 Visualization of each class defect:

Let us see some images of each class. This visualization is easy to produce by decoding the given EncodedPixels into masks and overlaying them on the training images.

Class 1 Defect

Class 1 defects seem to cover a small area and look almost the same as non-defective images.

Class 2 Defect

Class 2 defects are similar to class 1 defects, and it is somewhat difficult to tell the two apart.

Class 3 Defect

Class 3 images are more heavily damaged than those of classes 1 and 2.

Class 4 Defect

Class 4 images are the most damaged. They can be classified and segmented easily because their edges and defects look distinctly different from those of the other classes.

We conclude that classes 1 and 2 are similar to each other and less severe, while classes 3 and 4 are less similar to each other but more severe, and are therefore easier to classify.

EDA conclusion:

The dataset is imbalanced, so we will use stratified sampling to split it into train and validation sets.

This is a multi-label image segmentation problem. Because around 50% of the images have no defects, it is equally important to identify images with no defects.
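
As a rough illustration of stratified sampling, the sketch below splits sample indices so that every label keeps about the same share in the train and validation parts. It assumes one label per image (for example 0 for no defect, otherwise the rarest defect class in the image); that labeling rule, the function name and the 20% validation fraction are my assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2, seed=42):
    """Split sample indices so each label has a similar share in both parts."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Keep at least one sample of every label in the validation set.
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Toy labels: 50 images without defects, 40 of class 3, 10 of class 2.
labels = [0] * 50 + [3] * 40 + [2] * 10
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

A library routine such as scikit-learn's train_test_split with its stratify argument does the same job; the hand-written version just makes the idea visible.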

7. First Cut Solution:

In this problem we have three tasks: first, check whether an image has a defect; second, if it does, determine what type of defect it is; and third, determine the location of the defect in the image.

· A binary classification model filters images with defects from images without defects.

· One segmentation model for all defects predicts the type and location of each defect and generates the masks for each image.

· The masks are converted to EncodedPixels and filtered according to the classification probabilities.
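
The last step can be sketched as follows. The encoder uses the competition's run-length format (1-indexed pixel positions, numbered top to bottom and then left to right). The function names, the row layout and the rule of blanking every mask when the classifier probability is below the threshold are my assumptions about how the filtering could be done.

```python
import numpy as np

def mask_to_rle(mask):
    """Encode a binary mask as 'start length' pairs (1-indexed, column-major)."""
    pixels = np.asarray(mask, dtype=np.uint8).flatten(order="F")
    padded = np.concatenate([[0], pixels, [0]])
    # Positions where the value changes mark the start and end of each run.
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    changes[1::2] -= changes[::2]  # turn run ends into run lengths
    return " ".join(str(x) for x in changes)

def build_submission_rows(image_id, masks, defect_probability, threshold=0.4):
    """Return (ImageId, ClassId, EncodedPixels) rows for one image.

    masks has shape (height, width, 4): one binary channel per defect class.
    """
    rows = []
    for channel in range(masks.shape[-1]):
        if defect_probability < threshold:
            rle = ""  # classifier says "no defect": drop every predicted mask
        else:
            rle = mask_to_rle(masks[..., channel])
        rows.append((image_id, channel + 1, rle))
    return rows

# Toy 2x3 image with a class 3 defect.
masks = np.zeros((2, 3, 4), dtype=np.uint8)
masks[:, :, 2] = [[0, 1, 1], [1, 1, 0]]
print(build_submission_rows("x.jpg", masks, defect_probability=0.9))
```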

8. Binary Classification Model:

For binary classification, I used the Xception model with weights pretrained on ImageNet.

Input pipeline: for this model I used a tf.data pipeline, because TensorFlow’s native input functions are more efficient and take less time.

Metrics: We use the F1 score as the metric because it combines precision and recall into a single number. A high F1 score indicates a well-performing model even when the classes are imbalanced.
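
For reference, this is how the F1 score is computed from binary labels and predictions. It is a plain-Python sketch of the definition, not the Keras metric used during training.

```python
def f1_score(y_true, y_pred):
    """F1 score (harmonic mean of precision and recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(f1_score(y_true, y_pred))
```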

Model: Xception model

After training the model for a few epochs, we got good performance.

Our binary model reached an accuracy of 94% and a recall of 92% by the end of 6 epochs, which means it is doing well.

Now, let’s look at the accuracy, f1_score and loss for the train and validation data.

Plot of Train and Validation Accuracy, F1_Score and Loss

After loading the best weights for the binary classification model, we evaluate its accuracy and f1_score on the test data.

Next, we calculate the accuracy and f1_score at different thresholds.

If we do not choose a threshold, the default value of 0.5 is used. A threshold of 0.4 gives better results, so we will use 0.4.
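
A threshold search of this kind can be sketched in a few lines of NumPy. The labels and probabilities below are toy values chosen for illustration, not the model's real outputs.

```python
import numpy as np

def sweep_thresholds(y_true, probabilities, thresholds):
    """Return {threshold: (accuracy, f1)} for a binary classifier."""
    y_true = np.asarray(y_true)
    probabilities = np.asarray(probabilities)
    results = {}
    for threshold in thresholds:
        y_pred = (probabilities >= threshold).astype(int)
        tp = int(np.sum((y_pred == 1) & (y_true == 1)))
        fp = int(np.sum((y_pred == 1) & (y_true == 0)))
        fn = int(np.sum((y_pred == 0) & (y_true == 1)))
        accuracy = float(np.mean(y_pred == y_true))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        results[threshold] = (accuracy, f1)
    return results

# Toy labels and predicted defect probabilities.
y_true = [1, 1, 1, 0, 0, 0]
probabilities = [0.9, 0.45, 0.41, 0.35, 0.2, 0.1]
results = sweep_thresholds(y_true, probabilities, [0.3, 0.4, 0.5])

# Pick the threshold with the highest F1 score.
best = max(results, key=lambda t: results[t][1])
print(results, best)
```

The sweep should be run on validation data, so that the chosen threshold is not tuned on the images used for the final evaluation.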

Analysing the misclassified images

Now let’s analyse the misclassified images and check whether our model predicted wrongly or there is some problem with the images.

Take image 1 and image 3: they have very small defects, and our model gave them defect probabilities of 25% and 33%. Because these values are below the 0.4 threshold, the images ended up among the misclassified points. Now take image 2, image 4 and image 5: we can see some defects in them, but they are labeled as having no defects, perhaps because of a labeling error, and our model predicted them as defective with high probability.

9. Segmentation Model:

For the segmentation model, I used the Unet architecture with an EfficientNetB5 backbone and weights pretrained on ImageNet.

Input pipeline: this model needs a custom data generator. The standard Keras ImageDataGenerator cannot be used directly, because the generator has to turn the EncodedPixels (the y data) into masks for training. This can be done with a custom data generator that follows the pattern published by Stanford Edu.

RLE (Run-Length Encoding): The run-length encoded masks provided in the training data must be decoded into binary masks before they can be used as training targets.
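
A decoder for this format can be sketched as below. Positions in the encoding are 1-indexed and numbered top to bottom, then left to right, which is why the final reshape uses column-major order. The function names and the dictionary layout passed to build_mask_stack are my own choices for illustration.

```python
import numpy as np

def rle_to_mask(rle, height=256, width=1600):
    """Decode a 'start length start length ...' string into a binary mask."""
    mask = np.zeros(height * width, dtype=np.uint8)
    if isinstance(rle, str) and rle:  # missing values (NaN, "") give an empty mask
        numbers = np.array([int(n) for n in rle.split()])
        starts = numbers[0::2] - 1    # convert 1-indexed starts to 0-indexed
        lengths = numbers[1::2]
        for start, length in zip(starts, lengths):
            mask[start:start + length] = 1
    # Pixels are numbered down each column first, so reshape column-major.
    return mask.reshape((height, width), order="F")

def build_mask_stack(rles_by_class, height=256, width=1600):
    """Stack the masks of classes 1..4 into an array of shape (height, width, 4)."""
    return np.stack(
        [rle_to_mask(rles_by_class.get(c, ""), height, width) for c in (1, 2, 3, 4)],
        axis=-1,
    )

# Toy 2x3 example: the run starts at pixel 2 and covers 4 pixels.
print(rle_to_mask("2 4", height=2, width=3))
```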

Data Generator Pipeline:
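
The idea behind such a generator is sketched below with NumPy only. The class mirrors the __len__ / __getitem__ / on_epoch_end interface of keras.utils.Sequence, so a real version would subclass that. The class name, the load_image and load_mask callables and the default batch size are assumptions for illustration.

```python
import math
import numpy as np

class SegmentationBatches:
    """Serves (images, masks) batches, one batch per index."""

    def __init__(self, image_ids, load_image, load_mask,
                 batch_size=8, shuffle=True, seed=0):
        self.image_ids = list(image_ids)
        self.load_image = load_image  # image name -> (h, w, 3) uint8 array
        self.load_mask = load_mask    # image name -> (h, w, 4) binary array
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.order = np.arange(len(self.image_ids))
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.image_ids) / self.batch_size)

    def __getitem__(self, batch_index):
        start = batch_index * self.batch_size
        names = [self.image_ids[i] for i in self.order[start:start + self.batch_size]]
        # Scale pixel values to [0, 1]; masks are already 0/1.
        images = np.stack([self.load_image(n) for n in names]).astype(np.float32) / 255.0
        masks = np.stack([self.load_mask(n) for n in names]).astype(np.float32)
        return images, masks

    def on_epoch_end(self):
        # Reshuffle so every epoch sees the samples in a different order.
        if self.shuffle:
            self.rng.shuffle(self.order)

# Toy usage with fake loaders instead of reading files and decoding RLE.
batches = SegmentationBatches(
    ["a.jpg", "b.jpg", "c.jpg"],
    load_image=lambda name: np.full((4, 8, 3), 255, dtype=np.uint8),
    load_mask=lambda name: np.ones((4, 8, 4), dtype=np.uint8),
    batch_size=2,
    shuffle=False,
)
images, masks = batches[0]
print(len(batches), images.shape, masks.shape)
```

Decoding the masks inside the generator keeps memory use low, since only one batch of full-size masks exists at a time.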

Metrics: The Dice coefficient, the competition metric described in the Performance Metric section, is used to measure the pixel-wise agreement between a predicted segmentation and its ground truth.

Model: Unet architecture with an EfficientNetB5 backbone

After training the model for a few epochs, we got good performance.

The segmentation model reaches a validation Dice coefficient of 0.7153 and a training Dice coefficient of 0.7652 by the end of 20 epochs. The gap between the two is small, which suggests that the model is not overfitting despite the class imbalance.

Analysis of results:

Let us have a glance at some random mask prediction results.

Class 1 Defect prediction:

Class 2 Defect prediction:

Class 3 Defect prediction:

Class 4 Defect prediction:

Images without defect prediction:

The right-hand images show the original masks and the left-hand images show the masks predicted by our model, with their probabilities. These results show that our model performs well.

10. Summary:

· Images and their masks (in the form of EncodedPixels) are provided to train a deep learning model to detect and classify defects in steel. The competition is hosted by Severstal on Kaggle.

· Exploratory data analysis revealed that the dataset is imbalanced. Most of the images either contain one type of defect or have no defect.

· Two models are trained and tested on this dataset: one binary classifier and one segmentation model.

· The image data needs minimal pre-processing. Pixel value scaling for model training is done in the data generators.

· Stratified sampling that gives priority to the minority classes is used to split the training set into train and validation sets.

· Pre-trained deep learning models are used: the Xception architecture for classification, and the Unet architecture with an EfficientNetB5 backbone pretrained on ImageNet for segmentation.

· TensorBoard is used to save logs and visualise model performance at each epoch. The models perform satisfactorily on the defined metrics.

11. Conclusion:

· The Unet architecture with an EfficientNetB5 backbone pretrained on ImageNet gave fairly good segmentation results. More computing power would allow a larger batch size for training all the models (increasing from 8 to 16 or 32) and a higher image resolution.

· Other architectures can be tried, such as using separate binary and multi-label classifiers to detect the defects and classify them into classes 1, 2, 3 and 4.

· The quality of the training data fed into the neural networks determines their performance, so improving it should help. Techniques such as test-time augmentation can also be tried.

· The resolution of the output from the ImageDataGenerators can be varied.

· Data augmentation can be applied at both training time and test time.
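
As an illustration of test-time augmentation, the sketch below averages the prediction for a batch with the prediction for its horizontally flipped copy. The predict argument is a placeholder for any function that maps an image batch to per-pixel probabilities, such as a trained model's predict method. This is one common form of the technique and not something I have run on this competition.

```python
import numpy as np

def predict_with_flip_tta(predict, images):
    """Average predictions over the original batch and its horizontal flip.

    predict maps an array of shape (batch, height, width, channels) to
    per-pixel probabilities of shape (batch, height, width, classes).
    """
    plain = predict(images)
    flipped = predict(images[:, :, ::-1, :])
    # Flip the second prediction back so it lines up with the input again.
    return (plain + flipped[:, :, ::-1, :]) / 2.0

# Toy "model": the probability is just the first channel of the input.
def toy_predict(batch):
    return batch[..., :1] * 1.0

images = np.random.default_rng(0).random((2, 4, 6, 3))
averaged = predict_with_flip_tta(toy_predict, images)
print(averaged.shape)
```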

12. References:

LinkedIn: Ishan Garg (www.linkedin.com)

Thanks, everyone, for taking the time to read my blog. I am open to any suggestions for improvement.

Cheers and have a good day.