Classification of fiber production defects

Karoly Szalai
Published in unpack · 6 min read · Aug 24, 2021

As the final project of my deep learning bootcamp with unpackAI, I wanted to choose a “real-life” use case from my daily work, one that might well evolve into a real production application if things go right.

The author in the fiber factory

So I decided to cut my teeth on a “simulated version” of an application I’ve been working on lately, where the goal is to classify defects that occur during fiber production. The inspection object is a molten glass stream (basically a hot fluid that simply looks like a white stripe), and by capturing and analyzing images we need to catch defects related to the presence, size, or angle of the stream.

As I don’t have a fiber production furnace in my office, I had to simulate the process with a bright LED stick; for the cameras I used the same industrial models that are applied in the production environment.

For the classification I defined 4 classes, to keep things simple:

  • normal (no defect)
  • angle (the stream is angled to the left or right)
  • big (stream diameter too big)
  • trimmed (stream brightness is too low, or the stream is missing).

For the project I used the fastai deep learning library, and conducted the experiments on the Google Colab platform.

Dataset

For the dataset I captured images of the LED stick with 2 different models of industrial cameras, using different settings such as automatic light intensity turned on or off.

Example image — angled stream
Example image — normal stream, turned off automatic light intensity

In total I collected 80 images for each category, and manually uploaded them to my Google Drive folder.

So I had to set the path of the dataset to the Google Drive folder where I put the images:
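A minimal sketch of that setup (the “fiber_defects” folder name is just a placeholder for my actual folder):

    # Mount Google Drive in Colab and point at the image folder.
    from pathlib import Path
    from google.colab import drive

    drive.mount('/content/drive')
    path = Path('/content/drive/MyDrive/fiber_defects')  # placeholder folder name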

Dataloader

The dataloader is a crucial component for good model performance, as it tells the model what kind of data it gets, how to interpret the labels, how to split the training/validation sets, and what transformations to perform on the data.

Some decisions made for the dataloader:

  • split the training/validation sets randomly, in a 75%/25% ratio;
  • resize the images from the original 1280x960 pixels to 256x256 pixels → given that the images do not contain very fine features, lowering the resolution should not cause much trouble for model training.

First collect the dataloader parameters into a datablock (“stream”), then define the dataloader (“dls”) for our dataset:
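A sketch of how this looks with fastai’s DataBlock API, assuming the images sit in one subfolder per class so the labels can be read from the folder names:

    from fastai.vision.all import *

    # Collect the dataloader parameters into a datablock:
    stream = DataBlock(
        blocks=(ImageBlock, CategoryBlock),                # images in, class labels out
        get_items=get_image_files,                         # find all images under path
        splitter=RandomSplitter(valid_pct=0.25, seed=42),  # random 75%/25% split
        get_y=parent_label,                                # label = parent folder name
        item_tfms=Resize(256))                             # 1280x960 -> 256x256

    # Then define the dataloader for our dataset:
    dls = stream.dataloaders(path)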

Data augmentation

For this application, as we try to detect geometric defects like angle or size, it does not make sense to distort the original object geometry with complex image transformations. Besides, given that this is a relatively easy task for a deep learning model, sophisticated augmentation techniques are not necessary.

So I decided to simply apply a pad transformation to the images, to make sure that the object geometry is not distorted and that the images from the 2 different cameras (with different resolutions) end up in the same format.
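In fastai terms that means switching the resize method to padding; roughly:

    # Swap the plain resize for a padded one: the image is scaled to fit and the
    # border is filled with zeros, so the stream geometry stays undistorted.
    stream = stream.new(item_tfms=Resize(256, method=ResizeMethod.Pad, pad_mode=PadMode.Zeros))
    dls = stream.dataloaders(path)
    dls.show_batch(max_n=6)  # preview a few transformed samples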

Few examples of the transformed images

It’s time to learn!

Now that we have the datasets and the dataloader ready, nothing can stop us (except a Colab meltdown) from training our model!

For our learner I used the resnet architecture, a well-proven family of pre-trained networks for image classification tasks. The smallest resnet variant, resnet18, uses 18 layers; it should do the job for our white stripes. I used the error rate as the single metric, and went for 20 epochs in the first fine-tuning run:
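In code this is just a couple of lines (cnn_learner was fastai’s standard entry point for transfer learning at the time):

    # Build a transfer-learning model from a pre-trained resnet18 and fine-tune it.
    learn = cnn_learner(dls, resnet18, metrics=error_rate)
    learn.fine_tune(20)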

The results don’t look bad at all:

Both the training and validation losses keep falling nicely over all epochs, with no signs of overfitting on the horizon. It seems we did not make the neural network sweat too much; from epoch #17 it just can’t fail anymore.

Just as a formality, let’s print out the confusion matrix, or in our case we could call it the “over-confidence matrix”:
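Using fastai’s interpretation helper:

    # Inspect the trained model's predictions on the validation set.
    interp = ClassificationInterpretation.from_learner(learn)
    interp.plot_confusion_matrix()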

That’s actually looking pretty healthy, we could even save time by plotting only the diagonal. :)

If we check the 8 top losses, we can see that the model had the most headache with the “normal” class, with a lowest confidence score of 0.85:
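That plot comes from the same interpretation object:

    interp.plot_top_losses(8, nrows=2)  # the 8 validation images with the highest loss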

That’s all there is to it, or can we make it better?

Data cleaning

In our case we fed a nicely cleaned and prepared dataset into the model, but the devil never sleeps, so I double-checked some of the images to make sure there were no wrong images or labels in the dataset.

For this purpose I prepared a cleaner tool, but in the end did not find any suspicious images.
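fastai ships a small notebook widget for exactly this kind of review; a sketch of using it:

    from fastai.vision.widgets import ImageClassifierCleaner

    # Interactive widget showing the highest-loss images per class and split,
    # with dropdowns to relabel or delete each one.
    cleaner = ImageClassifierCleaner(learn)
    cleaner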

Size matters?

It became clear that resnet18 does a pretty decent job with our white stripes (we might even call it overkill for our task), even when fine-tuning with only 10–20 epochs.

And over 30 epochs we can hammer down the losses to the micro world, with no overfitting in sight. What is interesting to see is that the validation loss goes way below the training loss, most likely because regularization like dropout is only active during training and is switched off at validation time.

The last few epochs with resnet18 / 30 epochs

But I was still curious, if our light-weight resnet can do so well, what were his mighty big brothers able to achieve?

So first I ran a learning session with resnet34, 10 epochs:
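The only change is the architecture handed to the learner (learn34 is just an assumed name here; resnet50 later gets the identical treatment):

    # Same recipe as before, just a deeper backbone.
    learn34 = cnn_learner(dls, resnet34, metrics=error_rate)
    learn34.fine_tune(10)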

Fine-tuning results with resnet34

We can see that resnet34 could reach zero error after epoch #8, while resnet18 had to hustle until epoch #17 for the same result. Nothing to see here.

But then the surprise came after tossing resnet50 into the fine-tuning game:

Fine-tuning results with resnet50

Wow. Just wow. Resnet50 had a really bad day or a nasty hangover; whatever the case, it messed up big time. You had one job, resnet50.

Here I also used 10 epochs; the losses and error rate reach a minimum around epoch #7, but then start getting worse. It smells like overfitting. I’m not sure what the cause is, but I suspect resnet50 is simply too complex for this simple task and starts memorizing the training set instead of generalizing. (Or maybe it’s just a midlife crisis; what an 18 or a 34 can do, not every 50 can manage… :D)

Deploy the model

For this part I can only give a teaser for now, as it is not yet done. I plan to export the model, embed it into a lightweight Python program, and run it on an offline production machine.
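The export side is straightforward with fastai; the file and image names below are placeholders for illustration:

    # On the training side: serialize the learner (weights + transforms).
    learn.export('stream_classifier.pkl')  # placeholder file name

    # On the production machine: load it and classify incoming frames.
    from fastai.vision.all import load_learner

    learn_inf = load_learner('stream_classifier.pkl')
    pred, pred_idx, probs = learn_inf.predict('frame_0001.png')  # placeholder frame
    print(pred, probs[pred_idx])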

The goal would be to deploy the model on a PC with as few resources as possible; my Raspberry Pi has already started shaking with fear, but he can’t run away. :)

And if I can confirm that the model is able to run reliably on such low resources, I am also going to add a regression model for measuring the diameter of the stream.

To be continued…
