Classification of Chest X-rays Using PyTorch

Maryam Raji
Published in Analytics Vidhya
5 min read · Jun 29, 2020
Photo by Adam Nieścioruk on Unsplash

Pneumonia is an infection in one or both lungs. Bacteria, viruses, and fungi cause it. The infection causes inflammation in the air sacs in your lungs, which are called alveoli. The alveoli fill with fluid or pus, making it difficult to breathe.

COVID-19 (coronavirus disease 2019) is a highly infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

Chest X-ray imaging for COVID-19

The radiological features of COVID-19 on chest X-ray are those of atypical pneumonia or organizing pneumonia, so the differences between COVID-19 and other types of pneumonia are subtle. Both show lung consolidation and ground-glass opacities. In consolidation, the air that usually fills the small airways in your lungs is replaced with something else, such as pus, blood, or water, and it appears on a chest X-ray as an area of ‘white lung’. In ground-glass opacities, the lung appears to have a gray, glassy, matte veneer. In COVID-19, however, pleural effusion is rare (3 percent), and lung consolidation on chest radiographs tends to be bilateral, peripheral, and more common in the lower lobes.

A pleural effusion is a collection of fluid in the space between your chest wall and lungs. Like lung consolidation, it looks like white areas against the darker air-filled lungs on your chest X-ray. Since an effusion is fluid in a relatively open space, it will usually move under gravity when you change your position.

A lung consolidation may also be fluid, but it’s inside your lung, so it can’t move when you change positions. This is one way your doctor can tell the difference between the two.

Ultimately, according to the Centers for Disease Control and Prevention (CDC), even if a chest CT or X-ray suggests COVID-19, viral testing is the only specific method for diagnosis.

In the uppermost row of the figure, the areas of ‘white lung’ (consolidation) in COVID-19 are in the lower lobes of both lungs and are more peripheral (see the third image in the uppermost row for clarity).

However, in other types of pneumonia, the consolidation appears to be more widespread.

THE DATASET

The data is organized into two folders (train and test), each containing three subfolders (COVID19, PNEUMONIA, NORMAL). There are 6432 X-ray images in total, and the test set holds 20% of them.

Let us first import the relevant libraries.

The Kaggle platform was used for this work. Let us load the data and create datasets out of the data.

A PyTorch dataset is created by passing the train_dir folder to ImageFolder, with the preprocessing supplied through its transform parameter. Here, I resized the images for faster training, converted them to tensors, and normalized them using the mean and std of the dataset.

Let us view a few of these images.

In the showexample function, we had to reorder the dimensions of each image so that the channels come last, because that is the layout matplotlib expects.
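A minimal sketch of that reordering follows. The function names here are hypothetical and not the article's showexample; the optional mean and std arguments undo normalization so the image displays with sensible intensities.

```python
import torch


def to_channels_last(img_tensor, mean=None, std=None):
    """Convert a (C, H, W) tensor to (H, W, C), optionally undoing normalization."""
    img = img_tensor.detach().cpu()
    if mean is not None and std is not None:
        mean_t = torch.tensor(mean).view(-1, 1, 1)
        std_t = torch.tensor(std).view(-1, 1, 1)
        img = img * std_t + mean_t  # invert (x - mean) / std
    # matplotlib expects height x width x channels with values in [0, 1]
    return img.permute(1, 2, 0).clamp(0, 1)


def show_example(img_tensor, label_name, mean=None, std=None):
    """Display one image with its class name as the title."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    plt.imshow(to_channels_last(img_tensor, mean, std))
    plt.title(label_name)
    plt.axis("off")
    plt.show()


sample = torch.rand(3, 128, 128)  # stand-in for one dataset image
print(to_channels_last(sample).shape)
```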

I am going to split the dataset built from the train_dir folder into training and validation sets using the random_split function with a seeded generator, so that the split is reproducible.
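A seeded split could be sketched as follows. The validation fraction and the seed are assumptions, since the article does not state them, and a small random TensorDataset stands in for the image dataset.

```python
import torch
from torch.utils.data import TensorDataset, random_split


def split_train_val(dataset, val_fraction=0.1, seed=42):
    """Split a dataset into train and validation subsets, reproducibly."""
    val_size = int(len(dataset) * val_fraction)
    train_size = len(dataset) - val_size
    # A dedicated generator keeps the split identical from run to run
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [train_size, val_size], generator=generator)


toy = TensorDataset(torch.rand(100, 3, 8, 8), torch.randint(0, 3, (100,)))
train_ds, val_ds = split_train_val(toy)
print(len(train_ds), len(val_ds))
```

Calling split_train_val again with the same seed returns the same indices, which keeps validation scores comparable between experiments.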

I need to apply some extra transformations to the training set using RandomCrop, RandomHorizontalFlip, and ColorJitter. This augmentation gives my image classifier more varied examples to learn from.

Next, a function counts how many images of each class are present in the train variable.
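Such a counting helper could be as small as the sketch below. The function name is hypothetical and the label list is toy data, not the real class distribution.

```python
from collections import Counter


def count_classes(labels, class_names):
    """Return a {class name: number of samples} mapping for integer labels."""
    counts = Counter(int(label) for label in labels)
    return {name: counts.get(index, 0) for index, name in enumerate(class_names)}


toy_labels = [0, 1, 1, 2, 2, 2, 2]
print(count_classes(toy_labels, ["COVID19", "NORMAL", "PNEUMONIA"]))
# {'COVID19': 1, 'NORMAL': 2, 'PNEUMONIA': 4}
```

For a subset produced by random_split over an ImageFolder dataset, the labels can be read without loading any images as `[dataset.targets[i] for i in train_ds.indices]`.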

It appears that the dataset is imbalanced, so we are going to use WeightedRandomSampler to oversample the minority classes. Per-sample weights are calculated and passed to the sampler, and the training and validation data are then passed to their respective DataLoader objects. I set a batch_size of 28 to prevent session crashes in Kaggle kernels.
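The weighting scheme can be sketched as below, assuming inverse class frequency as the weight (the article does not spell out its formula). The helper name and the toy class counts are hypothetical.

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler


def make_balanced_sampler(labels, seed=0):
    """Build a sampler that draws each class with roughly equal probability."""
    labels = [int(label) for label in labels]
    counts = Counter(labels)
    # Each sample is weighted by 1 / (size of its class), so rare classes
    # are drawn more often per sample
    sample_weights = torch.tensor([1.0 / counts[label] for label in labels], dtype=torch.double)
    generator = torch.Generator().manual_seed(seed)
    return WeightedRandomSampler(
        sample_weights, num_samples=len(labels), replacement=True, generator=generator
    )


# Toy imbalanced data: 10, 30, and 60 samples for the three classes
labels = [0] * 10 + [1] * 30 + [2] * 60
toy = TensorDataset(torch.rand(100, 3, 8, 8), torch.tensor(labels))
sampler = make_balanced_sampler(labels)
# A sampler replaces shuffle=True; the two options are mutually exclusive
train_loader = DataLoader(toy, batch_size=28, sampler=sampler)

drawn = Counter(int(y) for _, batch_labels in train_loader for y in batch_labels)
print(drawn)
```

Sampling with replacement means some minority images are seen several times per epoch, which is why the augmentation step matters for them.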

Now that we have prepared our data loaders, we can define our model and some helper functions for training, validation, and predicting accuracy.

Let us transfer the selected model to GPU with the aid of some helper functions.

Here, the model is moved to the device, and the training and validation loaders are wrapped in DeviceDataLoader so that each batch is moved as it is requested.
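These device helpers are commonly written along the lines below. This is a sketch of that familiar pattern rather than the article's exact code; only the name DeviceDataLoader comes from the text, and the toy model and data are placeholders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def get_default_device():
    """Pick the GPU if one is available, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def to_device(data, device):
    """Move a tensor, a module, or a list/tuple of them to `device`."""
    if isinstance(data, (list, tuple)):
        return [to_device(item, device) for item in data]
    return data.to(device, non_blocking=True)


class DeviceDataLoader:
    """Wrap a DataLoader so that every batch is moved to `device` on the fly."""

    def __init__(self, loader, device):
        self.loader = loader
        self.device = device

    def __iter__(self):
        for batch in self.loader:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.loader)


device = get_default_device()
toy = TensorDataset(torch.rand(56, 3, 8, 8), torch.randint(0, 3, (56,)))
train_loader = DeviceDataLoader(DataLoader(toy, batch_size=28), device)
model = to_device(torch.nn.Linear(3 * 8 * 8, 3), device)

for images, labels in train_loader:
    print(images.device, labels.device)
    break
```

Moving batches lazily keeps only the current batch in GPU memory, which helps on memory-limited Kaggle sessions.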

Passing a batch of shape 28 x 3 x 128 x 128 through the model gives an output of shape 28 x 3. For each of the 28 input images, we get 3 outputs, one for each class.
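The shape check can be reproduced with any three-class network. The small CNN below is a hypothetical stand-in, since the article's own architecture is not shown.

```python
import torch
import torch.nn as nn


class SmallCNN(nn.Module):
    """Illustrative three-class CNN (assumed; not the author's architecture)."""

    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 16 x 64 x 64
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 32 x 32 x 32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 64 x 16 x 16
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # -> 64 x 1 x 1
            nn.Flatten(),             # -> 64
            nn.Linear(64, num_classes),
        )

    def forward(self, images):
        return self.classifier(self.features(images))


model = SmallCNN()
batch = torch.rand(28, 3, 128, 128)  # one batch of 28 RGB images
with torch.no_grad():
    logits = model(batch)
print(tuple(batch.shape), tuple(logits.shape))
```

The three values per image are unnormalized scores (logits); applying softmax turns them into class probabilities.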

The hyperparameters given to the fit function are passed on to the optimizer. The model is trained and validated for four epochs, and the losses and validation accuracy are recorded.
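A training loop of this kind might be sketched as follows. The signature, the use of clip_grad_value_ for gradient clipping, and every hyperparameter value are assumptions for illustration; a tiny linear model on random data stands in for the real classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def accuracy(outputs, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    preds = outputs.argmax(dim=1)
    return (preds == labels).float().mean().item()


@torch.no_grad()
def evaluate(model, val_loader):
    """Return the mean of the per-batch validation losses and accuracies."""
    model.eval()
    losses, accs = [], []
    for images, labels in val_loader:
        outputs = model(images)
        losses.append(F.cross_entropy(outputs, labels).item())
        accs.append(accuracy(outputs, labels))
    return {"val_loss": sum(losses) / len(losses), "val_acc": sum(accs) / len(accs)}


def fit(epochs, lr, model, train_loader, val_loader,
        weight_decay=0.0, grad_clip=None, opt_func=torch.optim.SGD):
    """Train for `epochs` epochs and record losses and validation accuracy."""
    optimizer = opt_func(model.parameters(), lr=lr, weight_decay=weight_decay)
    history = []
    for _ in range(epochs):
        model.train()
        train_losses = []
        for images, labels in train_loader:
            loss = F.cross_entropy(model(images), labels)
            optimizer.zero_grad()
            loss.backward()
            if grad_clip is not None:
                # Clamp each gradient element to [-grad_clip, grad_clip]
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            train_losses.append(loss.item())
        result = evaluate(model, val_loader)
        result["train_loss"] = sum(train_losses) / len(train_losses)
        history.append(result)
    return history


torch.manual_seed(0)
x, y = torch.rand(56, 3, 8, 8), torch.randint(0, 3, (56,))
loader = DataLoader(TensorDataset(x, y), batch_size=28)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3))
history = fit(4, 0.01, toy_model, loader, loader,
              weight_decay=1e-4, grad_clip=0.1, opt_func=torch.optim.Adam)
print(history[-1])
```

Weight decay penalizes large weights and gradient clipping limits the size of each update, both of which can narrow the gap between training and validation loss.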

The validation accuracy is 0.6762 for the convolutional neural network I made. I also trained ResNet-50 and ResNet-101, which reached 0.6491 and 0.6638 respectively.

ResNet-18, Wide ResNet-101, and AlexNet performed poorly, with an average accuracy of 20%.

Here the validation loss was considerably higher than the training loss, but with the use of weight decay and gradient clipping, the validation loss began to decline.

The ResNet variants were observed to have higher validation losses than the convolutional neural network.

Figure: Training and validation loss for the poorly performing Wide ResNet-101
Figure: Training and validation loss for the well-performing ResNet-50 (using the CNN val_acc as a benchmark)

Let us make some predictions, shall we? The test data is transformed and passed into a DataLoader. Then a helper function is called to make predictions. Note that the test accuracy is close to the validation accuracy.
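A single-image prediction helper could look like the sketch below. The function name is hypothetical, the class order assumes the alphabetical ordering that ImageFolder gives the three subfolders, and an untrained linear model stands in for the trained classifier.

```python
import torch
import torch.nn as nn

# Alphabetical order, as ImageFolder would assign for these subfolder names
CLASS_NAMES = ["COVID19", "NORMAL", "PNEUMONIA"]


@torch.no_grad()
def predict_image(image, model, class_names=CLASS_NAMES):
    """Return the predicted class name and its softmax probability."""
    model.eval()
    device = next(model.parameters()).device
    # Add a batch dimension: (C, H, W) -> (1, C, H, W)
    logits = model(image.unsqueeze(0).to(device))
    probs = torch.softmax(logits, dim=1).squeeze(0)
    index = int(probs.argmax())
    return class_names[index], float(probs[index])


# Untrained stand-in model, used only to show the call
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 128 * 128, 3))
name, confidence = predict_image(torch.rand(3, 128, 128), toy_model)
print(name, round(confidence, 3))
```

Comparing the returned name with the true label image by image makes it easy to spot confusions between particular classes.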

Here, we see the model predicting pneumonia when the actual class is COVID-19. This could be because the differences between the chest X-ray of a patient with pneumonia and that of a patient with COVID-19 are subtle.

In the near future, I would like to use an ensemble model to boost performance (I am currently working on one, but the Kaggle and Google Colab sessions keep crashing). I would also like to try my hand at image segmentation techniques using the DarkNet classifier and the YOLO object detection framework (available at https://github.com/muhammedtalo/COVID-19).

So this is it. I will post updates on this classification task soon.

References

Healthline. https://www.healthline.com/health/lung-consolidation#vs.-pleural-effusion

Sarkodie BD, Kwadwo Osei-Poku and Edmund Brakohiapa (2020). https://medical-case-reports.imedpub.com/diagnosing-covid19-from-chest-xray-in-resource-limited-environmentcase-report.pdf


I am a data scientist in training, a mobile web development enthusiast, and a writer who loves all forms of creativity. Oh, lest I forget, I am also a medical doctor.