Improving Gas Cylinder Digits Detection with Synthetic Dataset

Natthasit Wongsirikul
5 min read · Oct 15, 2021


The problem statement: a gas cylinder repair factory wanted to automate its screening process to separate cylinders into two groups, those that enter the factory and those that do not. The criterion was the last manufacturing date. If the tank was manufactured at least 5 years ago, it can enter the repair factory; if not, it goes back into circulation. So, an automated method was needed to read these manufacturing dates.

This was your basic OCR task; however, things were not that simple. The main problem was that these digits were painted onto the cylinder's body, where they got worn down over time. I tried using Google's AI service to read these digits, but it didn't work.

In this post, I will share an example of how a generated synthetic dataset can help improve model performance.

Data Collection

A jig was built where the cylinder was laid on its side. A high-definition camera was installed above the cylinder to capture its body, specifically the area where the manufacturing dates were painted.

The cylinders were slowly rotated while the camera captured the images.

We captured about 967 images of readable digits. To enhance the digits' readability, we performed histogram equalization to brighten the images.
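The post doesn't include the preprocessing code, but the equalization step can be sketched with OpenCV as below; the file names and the choice of equalizing only the luminance channel are my assumptions, not necessarily what we ran in production.

```python
# A minimal sketch of the brightening step, assuming OpenCV is installed
# and images are read from local paths (paths are illustrative).
import cv2

def enhance_digits(image_path):
    """Brighten a cylinder image so the painted digits stand out more."""
    img = cv2.imread(image_path)
    # Equalize only the luminance channel so colors are not distorted.
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

enhanced = enhance_digits("cylinder_001.jpg")
cv2.imwrite("cylinder_001_eq.jpg", enhanced)
```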

Once the data was cleaned and preprocessed, we annotated the digits with bounding boxes and framed this problem as an object detection task.

Dataset Analysis

We realized that we had a serious data imbalance problem: the digits were not equally distributed. The majority of the digits were 0, 1, 2, 3, 5, and 6, while there were barely any 4, 7, 8, and 9 digits. Below is a frequency plot of the digits with an evenly distributed training, validation, and test set split.
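As a side note, the tally behind a plot like this takes only a few lines of Python. The sketch below assumes YOLO-style label files (class id as the first token of each row) in a labels/ folder, which is an assumption about the annotation layout rather than a description of our exact tooling.

```python
# Count how many bounding boxes exist per digit class across all labels.
from collections import Counter
from pathlib import Path

counts = Counter()
for label_file in Path("labels").glob("*.txt"):
    for line in label_file.read_text().splitlines():
        if line.strip():
            counts[int(line.split()[0])] += 1

for digit in range(10):
    print(f"digit {digit}: {counts.get(digit, 0)} boxes")
```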

This would result in a poorly performing model. Regardless, I went ahead and trained the model. The model I used at the time was YOLOv3.

Model Training

Below is the result table describing the object detection performance for each digit. As expected, the performance was not great for digits with little training data. Surprisingly, the model was not able to detect the digit 1 even though there were many examples of it in the training set.

Red is prediction and green is ground truth

Generate Synthetic Dataset

First, I collected digit fonts from Microsoft Word, totaling 1,630 font samples evenly distributed among all digits.
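The exact way these font samples were captured isn't shown here, but rendering digit glyphs from font files can be sketched with Pillow; the font path, canvas size, and text placement below are illustrative assumptions.

```python
# Render each digit 0-9 from a TrueType font onto a small grayscale canvas.
from PIL import Image, ImageDraw, ImageFont

def render_digit(digit, font_path, size=64):
    """Render a single white digit on a black canvas."""
    font = ImageFont.truetype(font_path, size)   # assumes the font file is findable
    canvas = Image.new("L", (size, size), color=0)
    draw = ImageDraw.Draw(canvas)
    draw.text((size // 4, 0), str(digit), fill=255, font=font)
    return canvas

for d in range(10):
    render_digit(d, "arial.ttf").save(f"font_arial_{d}.png")
```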

Then I collected background images of cylinders as well as metal surfaces via internet image scraping.

Backgrounds

After that, I created a script that extracted only the digit glyphs and overlaid them onto the metal surface backgrounds. I also applied image processing techniques such as thresholding to create masks for the digits and make them look worn out.

In addition, I applied different whiteness levels to the fonts as well as different font sizes. The fonts were randomly selected, and their locations on the background were also randomly assigned. Below are some of the generated synthetic outputs.

A total of 4,275 images were generated for training and 1,519 for validation, which is far more than the 969 images in the real dataset.
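The generation script itself isn't reproduced in this post, but the core overlay idea (threshold the glyph into a mask, randomize size and brightness, then paste it at a random spot on a metal background) can be sketched roughly as follows. File names and parameter ranges are made up, and the background is assumed to be larger than the digit.

```python
# Paste one thresholded digit glyph onto a metal-surface background.
import random
import cv2

def paste_digit(background, digit_img):
    bg = background.copy()
    # Random font size.
    scale = random.uniform(0.5, 1.5)
    digit = cv2.resize(digit_img, None, fx=scale, fy=scale)

    # Threshold to keep only the digit strokes as a binary mask.
    _, mask = cv2.threshold(digit, 127, 255, cv2.THRESH_BINARY)

    # Random "whiteness" so the paint looks faded to varying degrees.
    whiteness = random.randint(120, 255)

    # Random top-left corner that keeps the digit inside the frame.
    h, w = digit.shape[:2]
    H, W = bg.shape[:2]
    y = random.randint(0, H - h)
    x = random.randint(0, W - w)

    roi = bg[y:y + h, x:x + w]
    roi[mask > 0] = whiteness
    # Return the composite plus the box for the annotation file.
    return bg, (x, y, w, h)

background = cv2.imread("metal_surface.jpg", cv2.IMREAD_GRAYSCALE)
digit = cv2.imread("font_arial_7.png", cv2.IMREAD_GRAYSCALE)
synthetic, box = paste_digit(background, digit)
cv2.imwrite("synthetic_0001.jpg", synthetic)
```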

Experiments

First, I trained the model with only the synthetic dataset and tested it on the real-data test set. For the second model, I trained it on the synthetic dataset and then fine-tuned it on the real-data training set. Then I compared the two models.
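The full YOLOv3 training pipeline is too long to show here, so the sketch below only illustrates the two-stage schedule itself: pretrain on the synthetic set, save the weights, reload them, and fine-tune on the real set with a lower learning rate. The tiny PyTorch model and random tensors are stand-ins, not the actual detector or datasets.

```python
# Schematic two-stage training: synthetic pretraining, then real-data fine-tuning.
import torch
from torch import nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))

def run_epochs(data, targets, lr, epochs):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(data), targets)
        loss.backward()
        opt.step()

# Stage 1: train from scratch on the (larger) synthetic set.
synth_x, synth_y = torch.randn(32, 3, 64, 64), torch.randint(0, 10, (32,))
run_epochs(synth_x, synth_y, lr=1e-3, epochs=5)
torch.save(model.state_dict(), "synthetic_pretrained.pt")

# Stage 2: reload the synthetic-pretrained weights and fine-tune on the
# smaller real set with a lower learning rate.
model.load_state_dict(torch.load("synthetic_pretrained.pt"))
real_x, real_y = torch.randn(8, 3, 64, 64), torch.randint(0, 10, (8,))
run_epochs(real_x, real_y, lr=1e-4, epochs=5)
```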

Below is the result of the synthetic-dataset-only model.

Green is ground truth and red is prediction

Below is the result of the model trained on the synthetic dataset and fine-tuned on the real dataset.

Green is ground truth and red is prediction

For easy comparison, I compared the mAP of the 3 models and plotted histograms of TP, FP, and FN by digit for each model.

Top is real-data only, middle is synthetic-data only, and bottom is synthetic + real-data. The x-axis is digits and the y-axis is counts. Blue bar is True Positive, orange bar is False Positive, and gray bar is False Negative
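For reference, the per-digit TP/FP/FN counts behind histograms like these can be computed with a simple greedy IoU match, as sketched below. The 0.5 IoU threshold and the (digit, box) data layout are assumptions for the sake of the example.

```python
# Tally true positives, false positives, and false negatives per digit.
from collections import Counter

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def tally(preds, gts, iou_thr=0.5):
    """preds/gts: lists of (digit, box) for one image."""
    tp, fp, fn = Counter(), Counter(), Counter()
    matched = set()
    for digit, pbox in preds:
        hit = None
        for i, (gdigit, gbox) in enumerate(gts):
            if i not in matched and gdigit == digit and iou(pbox, gbox) >= iou_thr:
                hit = i
                break
        if hit is None:
            fp[digit] += 1
        else:
            matched.add(hit)
            tp[digit] += 1
    for i, (gdigit, _) in enumerate(gts):
        if i not in matched:
            fn[gdigit] += 1
    return tp, fp, fn

preds = [(7, [10, 10, 40, 60])]
gts = [(7, [12, 8, 42, 58]), (1, [80, 10, 100, 60])]
print(tally(preds, gts))  # one TP for digit 7, one FN for digit 1
```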

Afterward, I did a simple hyperparameter search for the confidence threshold that yielded the best F1-score (a sketch of the sweep is shown below), which pushed the best-performing model's mAP from 44.06% to 47.32%. This is an example of how a synthetic dataset can help improve model performance when the real dataset is small and imbalanced.
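A minimal sketch of that threshold sweep, assuming each detection has already been marked as a true or false positive against the ground truth, could look like this (the sample numbers are made up for illustration).

```python
# Sweep confidence thresholds and keep the one with the best F1-score.
def best_threshold(detections, num_gt):
    """detections: list of (confidence, is_true_positive)."""
    best = (0.0, 0.0)  # (f1, threshold)
    for thr in [i / 100 for i in range(5, 100, 5)]:
        kept = [tp for conf, tp in detections if conf >= thr]
        tp = sum(kept)
        fp = len(kept) - tp
        fn = num_gt - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        best = max(best, (f1, thr))
    return best

detections = [(0.9, True), (0.75, True), (0.6, False), (0.4, True), (0.2, False)]
print(best_threshold(detections, num_gt=4))  # (best F1, best threshold)
```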

I want to thank Danuwasin Sittiworachat for preparing and analyzing the dataset. I also want to thank the project manager Sutthipan Techasena for coordinating the collection of this dataset.
