Covid-19 radiology — data collection and preparation for Artificial Intelligence

Mar 28, 2020


Case example is taken from the Italian Society of Medical and Interventional Radiology (SIRM) with the author's segmentations.

COVID-19 poses a new and significant challenge for health care systems all over the world. CT has proven to be a vital tool for evaluating disease progression in COVID-19 patients. For radiologists and other health care professionals, the sheer number of patients can be daunting. Methods such as AI-based volume segmentation may help relieve health services by providing a faster, objective way of evaluating CT images. In this article, we suggest a quick and efficient way of building an image collection from openly available sources on the internet that can be used for training artificial intelligence models. Note: we are radiologists (medical doctors), not formally trained programmers.

A Google search for COVID-19 radiological images returns a large number of results. We chose the excellent collection of the Italian Society of Medical and Interventional Radiology (SIRM): about 60 cases with example chest X-rays (CXRs) and single CT slices. Downloading the images from these cases gave us 110 usable axial CT images of confirmed COVID-19 cases. Images and cases are available from https://www.sirm.org/category/senza-categoria/covid-19/

The downloaded images were in JPG format. During the conversion from the original DICOM to JPG, the images had been cropped and resized and had lost part of the upper intensity range. They still contained enough relevant information, but had to be mapped back to the original Hounsfield Unit (HU) scale. The images were resized, converted to grayscale and compiled into a single NIFTI file (512 x 512 x 110). They were then inversely intensity-normalized: we read the RGB values of the JPG images in areas of air (either outside the patient or in the trachea) and fat (subcutaneous fat in the chest wall or pericardial fat) and used these two reference points to put all images on a unified Hounsfield Unit scale (air was normalized to -1000 HU, fat to -100 HU).
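Written out, this reverse normalization is a two-point linear mapping. With $v$ the JPG intensity of a pixel and $v_{\text{air}}$, $v_{\text{fat}}$ the average intensities measured in air and fat (our notation, not part of the original tooling), the mapping can be sketched as:

$$
\mathrm{HU}(v) = \mathrm{HU}_{\text{air}} + \left(v - v_{\text{air}}\right)\frac{\mathrm{HU}_{\text{fat}} - \mathrm{HU}_{\text{air}}}{v_{\text{fat}} - v_{\text{air}}}
= -1000 + 900\,\frac{v - v_{\text{air}}}{v_{\text{fat}} - v_{\text{air}}}
$$

For example, with $v_{\text{air}} = 76$ (the value measured in the MedSeg example) and a hypothetical $v_{\text{fat}} = 120$, a pixel with $v = 98$ lies halfway between the two references and maps to $-1000 + 900 \cdot 22/44 = -550$ HU.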

Example of obtaining the RGB value of air using MedSeg. In this case, the RGB intensity is measured outside the patient with the dim blue circle (ROI, region of interest), yielding an average of 75.766 (rounded to 76). Pressing the ‘i’ key brings up this chart in MedSeg.
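MedSeg's ROI tool does this averaging for you. For readers working outside MedSeg, here is a minimal NumPy sketch of the same idea, assuming a circular ROI defined by a center and radius in pixels. `circular_roi_mean` is our own illustrative helper, not a MedSeg function, and the slice is synthetic.

```python
import numpy as np


def circular_roi_mean(img, center_row, center_col, radius):
    """Average intensity inside a circular ROI (hypothetical helper, not MedSeg code)."""
    rows, cols = np.ogrid[:img.shape[0], :img.shape[1]]
    inside = (rows - center_row) ** 2 + (cols - center_col) ** 2 <= radius ** 2
    return float(img[inside].mean())


# Synthetic 512 x 512 slice: background "air" at 76 and a brighter square standing in for tissue.
slice_2d = np.full((512, 512), 76.0)
slice_2d[200:300, 200:300] = 120.0

# ROI placed outside the "patient", then rounded as in the MedSeg example.
air_value = round(circular_roi_mean(slice_2d, center_row=40, center_col=40, radius=20))
print(air_value)  # 76
```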

For convenience, the RGB values were “marked” into each image with a mask label whose number equals the measured value, like this:

Here, the same image as above has been labeled with “Label 76 (Purple)” to mark the RGB intensity of air in this particular case. This mask label was then used to normalize the image to the correct HU range in an automated fashion.
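In code, marking amounts to painting a region of an otherwise empty mask slice with the measured value, so that the value can be read back later with `np.unique`. A minimal sketch: the regions and the fat value of 118 are hypothetical, and `mark_intensity` is an illustrative helper, not a MedSeg function.

```python
import numpy as np


def mark_intensity(mask_slice, region, value):
    """Return a copy of mask_slice with `value` painted wherever the boolean `region` is True."""
    marked = mask_slice.copy()
    marked[region] = value
    return marked


mask_slice = np.zeros((512, 512))

air_region = np.zeros((512, 512), dtype=bool)
air_region[20:60, 20:60] = True      # hypothetical patch outside the patient
fat_region = np.zeros((512, 512), dtype=bool)
fat_region[400:420, 250:270] = True  # hypothetical patch of subcutaneous fat

mask_slice = mark_intensity(mask_slice, air_region, 76)   # air value from the MedSeg example
mask_slice = mark_intensity(mask_slice, fat_region, 118)  # hypothetical fat value

# Background, air and fat labels; np.unique sorts them, so darker air comes before fat.
values = np.unique(mask_slice)
print(values.tolist())  # [0.0, 76.0, 118.0]
```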

The segmentation was done by a trained radiologist using MedSeg. Three labels were used: 1 = ground-glass opacification, 2 = consolidations and 3 = pleural effusions.

Example of the manual segmentation
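Once a mask with these labels exists, counting voxels per label is a first step toward the volume measurements mentioned in the introduction. A minimal sketch with a synthetic mask; the `LABELS` dictionary and `voxels_per_label` helper are illustrative names. Turning counts into millilitres would also need the voxel spacing, which the resized JPG source images do not carry.

```python
import numpy as np

# Label numbers as used in the manual segmentation.
LABELS = {1: "ground-glass opacification", 2: "consolidation", 3: "pleural effusion"}


def voxels_per_label(mask):
    """Count how many voxels carry each segmentation label (background 0 is ignored)."""
    values, counts = np.unique(mask.astype(int), return_counts=True)
    found = dict(zip(values.tolist(), counts.tolist()))
    return {name: found.get(label, 0) for label, name in LABELS.items()}


mask = np.zeros((512, 512, 2))
mask[100:110, 100:110, 0] = 1  # 100 voxels of ground-glass opacification
mask[300:305, 300:310, 1] = 2  # 50 voxels of consolidation

print(voxels_per_label(mask))
# {'ground-glass opacification': 100, 'consolidation': 50, 'pleural effusion': 0}
```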

All coding was done in Python. The resulting NIFTI files, both the normalized images and the masks, are openly available at medicalsegmenation.com/covid19. We also used this data to train a U-Net model, which is available interactively in MedSeg through TensorFlow.js. We do not describe this last part, as it is well documented by many others.

Complete code:

Step 1: Import relevant libraries

import numpy as np
import os
import nibabel as nib
import matplotlib.image as mpimg
from skimage.transform import resize

Step 2: Count the JPG images in your folder (skipping any other files that may have snuck in)

counter = 0
for i in os.listdir("JPG_directory/"):
    if i.endswith(".jpg"):
        counter+=1
print(counter)
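One caveat: `os.listdir` returns names in arbitrary order, and Steps 2 and 4 each call it separately. Building one sorted list and reusing it keeps the slice order reproducible across runs and machines, which matters because the mask in Step 7 is drawn against that order. A sketch; `list_jpgs` is a hypothetical helper, and accepting `.jpeg` and upper-case extensions is our assumption (the original code only matches `.jpg`).

```python
import os
import tempfile


def list_jpgs(directory):
    """Return the JPG file names in `directory`, sorted so the slice order is reproducible."""
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith((".jpg", ".jpeg"))
    )


# Demo on a throwaway folder that also contains a stray non-image file.
demo_dir = tempfile.mkdtemp()
for name in ["b.jpg", "a.JPG", "notes.txt", "c.jpeg"]:
    open(os.path.join(demo_dir, name), "w").close()

jpg_files = list_jpgs(demo_dir)
counter = len(jpg_files)
print(jpg_files, counter)  # ['a.JPG', 'b.jpg', 'c.jpeg'] 3
```

In Step 4 you would then loop over `jpg_files` instead of calling `os.listdir` a second time.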

Step 3: Create a new array that will become the NIFTI file. Its shape is 512 x 512 (pixels) by counter (the number of JPGs you want to include).

new_nifti = np.zeros((512, 512, counter))

Step 4: Add the jpg-images to your array by resizing them to 512 x 512, grayscaling and finally flipping them, so that the orientation is correct when saved as NIFTI.

counter = 0
for i in os.listdir("JPG_directory/"):
    if i.endswith(".jpg"):
        img = mpimg.imread("JPG_directory/"+i)
        img = img.astype("float64")
        
        resized_img = resize(img, (512,512,3), preserve_range=True)
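        # Keep a single channel as the grayscale image
        resized_img = resized_img[:,:,0]
        # Rotate and mirror so the orientation is correct in the NIFTI file
        fl_resized_img = np.fliplr(np.rot90(resized_img, k=3))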
    
        new_nifti[:,:,counter] = fl_resized_img
        
        counter+=1
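Two details of Step 4 can be checked in isolation. Keeping channel 0 works because the three channels of a grayscale image stored as JPG are (near-)identical, and a grayscale JPG may also be read as a 2-D array with no channel axis at all. The rotate-then-mirror combination is, in net effect, a transpose. A NumPy-only sketch on small synthetic arrays; the helper names are ours:

```python
import numpy as np


def to_grayscale(img):
    """Keep the first channel of an RGB(A) image; pass 2-D (already grayscale) images through."""
    return img[:, :, 0] if img.ndim == 3 else img


def orient_for_nifti(slice_2d):
    """Same reorientation as Step 4: rotate 90 degrees clockwise, then mirror left-right."""
    return np.fliplr(np.rot90(slice_2d, k=3))


# Small non-square test image, stacked into three identical "RGB" channels.
demo = np.arange(12, dtype=float).reshape(3, 4)
rgb = np.stack([demo, demo, demo], axis=-1)

oriented = orient_for_nifti(to_grayscale(rgb))
print(oriented.shape)                    # (4, 3)
print(np.array_equal(oriented, demo.T))  # True: the net effect is a transpose
```

So `resized_img.T` would produce the same array as the rotate/flip expression in Step 4.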

Step 5: Save your new NIFTI-file containing the resized JPG COVID-19 images

savefile = nib.Nifti1Image(new_nifti, None)
nib.save(savefile, "JPG_nifti.nii")

Step 6: Time to normalize the images. We define a normalization function that maps air to -1000 HU and fat to -100 HU:

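def normalize_function(img, air, fat):
    # Reference values on the Hounsfield scale
    air_HU = -1000
    fat_HU = -100
    
    # HU per JPG intensity step, from the two reference points
    # (fat is brighter than air, so fat - air is positive)
    delta_air_fat_HU = abs(air_HU - fat_HU)
    delta_fat_air_rgb = abs(fat - air)
    ratio = delta_air_fat_HU / delta_fat_air_rgb
    
    # Shift so air sits at 0, rescale, then move air to -1000 HU
    img = img - air
    img = img * ratio
    img = img + air_HU
    return img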
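To see what the function does with actual numbers, here is an equivalent vectorized sketch with the reference HU values exposed as parameters (the parameter names and the guard against equal inputs are our additions). It matches `normalize_function` whenever fat is brighter than air, as it is on CT. It uses the measured air value of 76 from the MedSeg example and a hypothetical fat value of 120:

```python
import numpy as np


def normalize_to_hu(img, air, fat, air_hu=-1000.0, fat_hu=-100.0):
    """Map JPG intensities linearly so that `air` lands on air_hu and `fat` on fat_hu."""
    if fat == air:
        raise ValueError("air and fat intensities must differ")
    slope = (fat_hu - air_hu) / (fat - air)  # HU per intensity step
    return (np.asarray(img, dtype=np.float64) - air) * slope + air_hu


pixels = np.array([76.0, 98.0, 120.0])
hu = normalize_to_hu(pixels, air=76, fat=120)
print(np.round(hu, 6))  # air, halfway point and fat: -1000, -550 and -100 HU
```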

Step 7: Obtain the intensity values denoting air and fat for each image. We used the ROI tool in MedSeg to measure an average value and added two mask labels to each image, one for air and one for fat (see the example above). Save this mask file and load it together with the NIFTI file containing the images.

mask = nib.load("MASK_FILE_WITH_FAT_AIR_LABELS.nii")
mask_np = np.array(mask.get_fdata())
rgb_image = nib.load("JPG_nifti.nii")
rgb_image_np = np.array(rgb_image.get_fdata())

Step 8: If you chose not to use all of the images you compiled, sort out the unused ones first. Only slices whose mask contains exactly three values (background 0, the air label and the fat label) are kept:

counter = 0
for i in range(mask_np.shape[2]):
    if len(np.unique(mask_np[:,:,i]))==3:
        counter+=1
print(counter)
new_normalized_nifti = np.zeros((512, 512, counter))
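A slice with only two unique mask values usually means one of the two reference labels was forgotten, and Step 8 drops it silently. A small sketch that reports such slices instead; `split_labeled_slices` is a hypothetical helper, and the synthetic mask stands in for `mask_np`:

```python
import numpy as np


def split_labeled_slices(mask_np):
    """Return (complete, incomplete) slice indices.

    complete:   mask holds 0, an air label and a fat label (three unique values)
    incomplete: mask holds 0 and only one label, so a reference is probably missing
    """
    complete, incomplete = [], []
    for i in range(mask_np.shape[2]):
        n_values = np.unique(mask_np[:, :, i]).size
        if n_values == 3:
            complete.append(i)
        elif n_values == 2:
            incomplete.append(i)
    return complete, incomplete


mask_np = np.zeros((64, 64, 3))
mask_np[0:5, 0:5, 0] = 76       # slice 0: air marked...
mask_np[30:35, 30:35, 0] = 118  # ...and fat marked
mask_np[0:5, 0:5, 2] = 80       # slice 2: only air marked; slice 1 is unused

complete, incomplete = split_labeled_slices(mask_np)
print(complete, incomplete)  # [0] [2]
```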

Step 9: Use the normalizing function to prepare the images into one NIFTI-file ready for segmentation

counter = 0
for i in range(mask_np.shape[2]):
    unique_values = np.unique(mask_np[:,:,i])
    # Labeled slices hold 0 (background), the air value and the fat value;
    # np.unique sorts them, and air is darker than fat
    if len(unique_values)==3:
        air = unique_values[1]
        fat = unique_values[2]
        rgb_slice = rgb_image_np[:,:,i]
        normalized_slice = normalize_function(rgb_slice, air, fat)
        new_normalized_nifti[:,:,counter] = normalized_slice
        counter+=1
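As a sanity check after Step 9, the pixels under the air and fat labels should now sit close to -1000 and -100 HU. On real JPGs the result will only be approximate, because each label value is a rounded ROI average and the marked pixels vary around it. A sketch on a synthetic slice; `reference_hu` is a hypothetical helper, and the inline linear map mirrors `normalize_function`:

```python
import numpy as np


def reference_hu(normalized_slice, mask_slice):
    """Mean HU under the air and fat labels of one mask slice.

    Assumes the mask holds exactly three values, 0 < air label < fat label, as in Step 9.
    """
    _, air_label, fat_label = np.unique(mask_slice)
    air_mean = float(normalized_slice[mask_slice == air_label].mean())
    fat_mean = float(normalized_slice[mask_slice == fat_label].mean())
    return air_mean, fat_mean


# Synthetic slice: air at intensity 76 everywhere, with a fat patch at 120.
rgb_slice = np.full((64, 64), 76.0)
rgb_slice[40:50, 40:50] = 120.0
mask_slice = np.zeros((64, 64))
mask_slice[0:10, 0:10] = 76     # air label = measured air value
mask_slice[40:50, 40:50] = 120  # fat label = measured fat value

# Same two-point linear map as normalize_function (air -> -1000 HU, fat -> -100 HU).
normalized_slice = (rgb_slice - 76) * (900 / (120 - 76)) - 1000

air_hu, fat_hu = reference_hu(normalized_slice, mask_slice)
print(round(air_hu), round(fat_hu))  # -1000 -100
```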

Step 10: Save your new NIFTI-file containing all the converted JPGs which are now normalized

savefile = nib.Nifti1Image(new_normalized_nifti, None)
nib.save(savefile, "COVID-19.nii")


Written by Håvard Bjørke Jenssen