Ch 1. Data Inspection and Pre-processing for Xray Images

Inspecting object classes, removing duplicates, minimizing white background, and exploring other Xray datasets

Lucrece (Jahyun) Shin
8 min read · Sep 11, 2021

In my previous post, I shared my process of getting started with the computer vision research project for my master's. An international airport gave me Xray baggage scan images to develop a model that can automatically detect harmful objects in baggage. Since only a small number of Xray images was provided, my supervising professor suggested that I use Domain Adaptation: first collect a large number of normal (non-Xray) images of dangerous objects from the internet, train a model using only those normal images, then adapt the model to perform well on Xray images. After getting the project overview, I had many questions in my head:

  • Just how small is the number of Xray images we are given?
  • Are there other publicly available Xray image datasets I can use as additional data?

There was only one place to start answering all these questions: looking at the given data.

One of the most important things I learned from my previous experience in ML was to inspect the data and perform appropriate pre-processing as the first step of a machine learning problem. Even if the given data seems to be in a clean, organized format, it is important to carefully inspect it and check that it is in the desired state before diving into modelling and training. Let me highlight an important point I learned:

Optimizing input data is an ITERATIVE process rather than a one-time thing. It’s important to keep going back to inspect and optimize data when debugging model performance.

I will keep coming back to this in my future posts about this project. In the meantime, here is a list of my initial data inspection and pre-processing steps for the Xray images:

  1. Inspecting the Object Classes
  2. Selecting a Subset of Object Classes to work with
  3. Removing Duplicates
  4. Minimizing White Background
  5. Finding Open Source Xray Baggage Scan Image Datasets

Let’s look at each step in detail.

1. Inspecting the Object Classes

The international airport associated with this project provided us with Xray baggage scan images from its own Xray security scanner. The dataset was given in a clean, organized folder that looked like this:

The Xray Dataset Folder as given

The given Xray dataset contained 9 classes of threat objects, counting by the number of folders. Each folder had 50 to 500 Xray baggage scan images containing the corresponding object class. I say images “containing” the object rather than images “of” the object, since we are dealing with Xray scans of entire passenger bags at the airport rather than Xray scans of each individual object. The following Xray image “contains” a knife rather than “is” a knife. Noticing this difference is crucial when performing domain adaptation.

An Xray baggage scan image containing a knife

We can also notice from the given folders that there are three different types of gun (handgun, airgun, other_firearms) grouped separately. When I looked through the images belonging to each class, however, I did not find a significant difference.

A sample image from “handgun” (left), “airgun” (middle), and “other_firearms” (right) classes

So I grouped the three classes into a single “gun” class and was left with 7 classes: gun, knife, battery, phone, hard disk, shuriken, and usb.
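
As a quick sanity check, the per-class image counts can be tallied directly from the folders. Here is a minimal sketch (the root folder name and image extensions are assumptions, not the actual dataset layout):

```python
from collections import Counter
from pathlib import Path

# Hypothetical dataset root; the actual folder name will differ.
DATA_DIR = Path("xray_dataset")

# Count images per class, assuming one sub-folder per object class.
counts = Counter()
for class_dir in sorted(DATA_DIR.iterdir()):
    if class_dir.is_dir():
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.suffix.lower() in {".png", ".jpg", ".jpeg"}
        )

for name, n in counts.most_common():
    print(f"{name:>15}: {n} images")
```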

2. Selecting a Subset of Object Classes to work with

I chose 2 out of the 7 classes to focus on: gun and knife. They were, in my opinion, the two most dangerous objects. I also wanted to simplify the problem by focusing on two objects initially; once I found a reasonable solution, I planned to extend the model to a larger number of object classes.

3. Removing Duplicates in Xray Images

I mentioned this briefly in my previous post, but I will repeat it here since it was the main motivation for this pre-processing step. Some previous students had worked on this project using the same Xray dataset. They used transfer learning without domain adaptation, meaning that they used the small number of Xray images as training images. In their report, I found a table showing an astonishing performance of their model:

Recall Table: The left column lists the classes of dangerous items and the right column shows the model’s recall for detecting each item.

When I saw this, I wondered whether there was anything left for me to improve on such high recalls. But when I took a closer look at the given Xray images, I found many duplicates: copies of the same image that were rotations of each other, like the following:

Three different images that are rotations of each other

So if the training/validation/test sets were split among these duplicates, there could have been data leakage between the three partitions. For example, if the test set contained images that were mere rotations of images from the training set, it is not surprising that the test recall was high, since the model had already seen rotations of some of the test images. This kind of data leakage makes the results look good while preventing the model from generalizing well to unseen data.

The original dataset folder provided by the airport contained 1,050 Xray images containing a gun and 500 containing a knife. But each image had between 5 and 16 rotated duplicates, bringing the number of unique images down to 117 for gun and only 31 for knife. These numbers indeed seemed insufficient to train a neural network. Even if I were to increase the amount of training data by augmenting the images via rotating, cropping, or resizing, it would be hard to prevent overfitting with so few unique images.
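
As an illustration, here is one way such rotated duplicates could be grouped: a tiny average hash made invariant to 90-degree rotations, keeping one image per hash group. This is a simplified sketch rather than my actual de-duplication code; the real duplicates included arbitrary rotation angles, and the folder path below is hypothetical.

```python
from pathlib import Path

import numpy as np
from PIL import Image

def average_hash(img: Image.Image, size: int = 8) -> int:
    """Tiny perceptual hash: downscale to grayscale, threshold at the mean."""
    small = np.asarray(img.convert("L").resize((size, size)), dtype=np.float32)
    bits = (small > small.mean()).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def rotation_invariant_hash(path: Path) -> int:
    """Identical for all four 90-degree rotations of the same image."""
    img = Image.open(path)
    return min(average_hash(img.rotate(angle, expand=True))
               for angle in (0, 90, 180, 270))

# Group images sharing a hash and keep one representative per group.
groups = {}
for path in Path("xray_dataset/gun").glob("*.png"):  # hypothetical path
    groups.setdefault(rotation_invariant_hash(path), []).append(path)

unique_images = [paths[0] for paths in groups.values()]
total = sum(len(paths) for paths in groups.values())
print(f"{len(unique_images)} unique out of {total} images")
```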

4. Minimizing White Background in Xray Images

Another undesired characteristic I noticed in the given gun and knife images was that most of them contained a large blank white background that filled nearly three quarters of the entire image space. Image data is relatively high-dimensional, since each pixel carries three RGB values (e.g. even a small 256 by 256 image contains (256²)*3 = 196,608 numerical values), so a large white background is a waste of information. I therefore cropped all such images to tightly contain only the useful object, like this:

Cropping of the white background

I did this using a simple algorithm that crops out sequences of white pixels. With pixel values encoded as decimals between 0 and 1, an RGB value of [0,0,0] is black and [1,1,1] is white, so it would seem easy to just crop out all-1 columns and all-1 rows. The issue was that the images contained small defects that appeared purely white to the human eye but were actually slightly darker than pure white. Take a look at the following:

Noise-inserted images with width and height of 10, 50, 100, and 1000 pixels

This series of images was generated by randomly inserting noise pixels with RGB values between 0.78 and 0.98 (remember, 0 means pure black and 1 means pure white). Although this noise is visible to us in a small image, it becomes harder to detect as the image gets bigger.
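
Images like these can be reproduced with a few lines of numpy; the sketch below simply places random near-white pixels (values between 0.78 and 0.98, as described above) on a pure-white canvas. The pixel count per image is an arbitrary choice for illustration.

```python
import numpy as np
from PIL import Image

def near_white_noise(size: int, n_noise: int = 20, seed: int = 0) -> Image.Image:
    """A pure-white image with a few randomly placed near-white noise pixels."""
    rng = np.random.default_rng(seed)
    img = np.ones((size, size, 3), dtype=np.float32)      # 1.0 = pure white
    ys = rng.integers(0, size, n_noise)
    xs = rng.integers(0, size, n_noise)
    img[ys, xs] = rng.uniform(0.78, 0.98, (n_noise, 3))   # near-white, not white
    return Image.fromarray((img * 255).astype(np.uint8))

for size in (10, 50, 100, 1000):
    near_white_noise(size).save(f"noise_{size}.png")
```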

Since the Xray images were around 1000 by 1000 pixels, it was easy for noise pixels to stay hidden. If a single pixel in a background column had a value of 0.95 instead of 1, the simple method of checking for all-1 columns and rows would fail. So instead, I first found the columns and rows that were NOT all-1, then checked whether any of them were more than 200 pixels apart from the previous not-all-1 column/row.

This makes sense because the important content of an Xray image is a SINGLE object (a bag), not multiple objects dispersed around the image with blank spaces between them. If two consecutive not-all-1 columns or rows are more than 200 pixels apart (which would mean two separate objects with more than 200 pixels of whitespace between them), one of them must be a trivial defect in the background (e.g. a pixel value of 0.95 instead of 1). I therefore took only the largest not-all-1 section of columns and rows as the area of the image containing the object of interest, since any region of trivial defects must be smaller than the object. This worked well as a cropping algorithm.
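
The core of this cropping logic can be sketched as follows (the 0.999 white threshold and the file path are illustrative choices of mine; the full code is in the notebook linked below):

```python
import numpy as np
from PIL import Image

def largest_section(indices: np.ndarray, max_gap: int = 200) -> tuple[int, int]:
    """Split the non-white row/column indices wherever two consecutive ones are
    more than `max_gap` pixels apart, and return the (start, end) of the widest
    section -- assumed to be the bag rather than a stray near-white defect."""
    splits = np.where(np.diff(indices) > max_gap)[0] + 1
    sections = np.split(indices, splits)
    best = max(sections, key=lambda s: s[-1] - s[0])
    return int(best[0]), int(best[-1])

def crop_white_background(path: str, white_thresh: float = 0.999) -> Image.Image:
    """Crop an image to its largest block of non-white rows and columns."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    not_white = (img < white_thresh).any(axis=2)   # True where a pixel isn't white
    rows = np.where(not_white.any(axis=1))[0]      # row indices with content
    cols = np.where(not_white.any(axis=0))[0]      # column indices with content
    top, bottom = largest_section(rows)
    left, right = largest_section(cols)
    cropped = img[top:bottom + 1, left:right + 1]
    return Image.fromarray((cropped * 255).astype(np.uint8))

crop_white_background("xray_dataset/gun/sample.png").save("sample_cropped.png")
```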

The code and instructions for this task can be found here as a Google Colaboratory notebook.

5. Finding Open Source Xray Baggage Scan Image Datasets

If the problem was not having enough Xray images containing gun and knife, couldn't I find some on the internet and use them as additional data? After searching for open-source Xray baggage scan datasets, I realized that there really weren't many. A survey paper on deep learning applications in Xray security imaging mentioned that, despite much interest, automated Xray scanning is under-researched due to the lack of Xray scanner datasets containing dangerous objects.

I managed to find two free Xray baggage scan datasets, SIXray and GDXray, which included images containing gun and knife. However, the Xray texture of the images differed from one scanner to another, as shown in the samples below. Although I was initially insensitive to this texture difference, grouping all three sources together as “Xray images”, I later realized that using them all as Xray data could confuse a deep learning model, which is highly sensitive to the distribution of the incoming data. Since the main stakeholder for this project was the airport that provided us with its scanner's images, I decided not to use the other two Xray datasets for now.

Sample Xray baggage scan images from my project (left), SIXray (middle), and GDXray (right)

So these were the main steps of my initial data inspection and pre-processing. Although I have presented them here as a neat list, figuring out what to do at the time wasn't always straightforward. There were no clear guidelines or deliverables; the data was simply handed to me with the requested end goal of automatic threat detection, and the rest was up to me. After performing the data inspection, I started to get a sense of what independent ML research is like: it requires curiosity, creativity, and diligence.

In the next post, I will share my data collection process for the normal camera images of gun and knife to be used for the domain adaptation task.

Thanks for reading! ♥️
