AI in Medical Imaging for Beginners: III MRI Preprocessing

JC Climent Pardo
9 min readAug 18, 2024

--

Introduction

In the previous blog post we discussed the basic functioning of MRI types and their sequences, but how process these different images? What is the best preprocessing suited for my current images? Why do I even need to process them? No worries, I gotcha. Here we will cover the main reasons for preprocessing and some the different steps that you should do on your images, even with some code implementations in Python.

So why preprocessing?

Well, in order to be able to train AI algorithm on a standard dataset, you first need to curate this dataset. Often we find out-of-the-boy datasets where people already spent a lot of time and effort making them ready for you. But usually, and especially in the medical domain, this is not the case. The raw clinical data is a mess, and it needs a little bit of careful attention to have it in the right format before we develop any algorithm.

Additionally, as the field of medical imaging evolves, so too do the challenges associated with preprocessing, especially when working with multi-centric datasets. Today, studies often combine datasets across multiple institutions to improve generalization. While theoretically sound, this approach presents significant real-world challenges. Data collected using different equipment and methodologies can exhibit huge variability due to differences in scanner models, manufacturers, acquisition parameters, and imaging protocols. Such inconsistencies can lead to significant disparities in image quality, complicating the development of robust AI models that generalize well across different data sources.

And for these reasons, we need preprocessing!

Typical Preprocessing Steps

So, let’s start processing the images in order to make them AI readable. As you may have guessed, the employed preprocessing steps require a little bit of domain knowledge, depending on what type of study is going to be conducted. The traditional steps are listed below (ordered alphabetically) and you should definitely know them before applying them!

Exemplary MRI preprocessing pipeline to get them to formats where AI algorithms.
  • 3D to 2D Conversion: This step is typically used when there is limited computational memory to train a 3D model or when the dataset is small. By converting 3D volumetric MRI data into 2D slices, each slice can be processed independently, reducing the memory overhead and potentially simplifying the model training process. It is important though that the scans are co-registered (aligned to the same coordinate system) before conversion. For co-registration see more below.
import nibabel as nib
import os

def convert_3d_to_2d(nifti_path, output_dir):
img = nib.load(nifti_path)
data = img.get_fdata()

# Save each slice as a separate 2D NIfTI file
for i in range(data.shape[2]):
slice_2d = data[:, :, i]
slice_img = nib.Nifti1Image(slice_2d, img.affine)
output_path = os.path.join(output_dir, f'slice_{i:03d}.nii.gz')
nib.save(slice_img, output_path)
  • Bias Field Correction: Bias field correction is essential for addressing intensity inhomogeneities in MRI images, which can arise due to the non-uniform sensitivity of the radio frequency coils and other factors within the MRI machine. The N4 algorithm is a widely used method for correcting this bias. It works by modeling the bias field as a smooth, low-frequency multiplicative noise and then correcting the voxel intensities to provide a more uniform and accurate representation of the underlying tissue properties. This correction is particularly important for subsequent tasks like segmentation or intensity-based analysis. Used in sMRI.
import SimpleITK as sitk

def bias_field_correction(input_path, output_path):
img = sitk.ReadImage(input_path)
mask_img = sitk.OtsuThreshold(img, 0, 1, 200)
corrector = sitk.N4BiasFieldCorrectionImageFilter()
corrected_img = corrector.Execute(img, mask_img)
sitk.WriteImage(corrected_img, output_path)
Bias filed correction process: from left to right, biased image, segmentation mask and corrected images in column 3 and 4. Source: [2]
  • Binning: Binning is the process of reducing the number of intensity levels in an image, often to simplify the data or reduce noise. For instance, in an image with a 16-bit depth (65,536 possible intensity levels), binning might reduce the depth to 8 bits (256 levels). This quantization process can help in reducing the complexity of the data and in some cases, emphasize certain features by grouping similar intensity levels together. However, be careful to avoid losing critical information during this step!
import numpy as np
import nibabel as nib

def binning(input_path, output_path, num_bins):
img = nib.load(input_path)
data = img.get_fdata()
binned_data = np.digitize(data, np.linspace(data.min(), data.max(), num_bins))
binned_img = nib.Nifti1Image(binned_data, img.affine)
nib.save(binned_img, output_path)
  • Co-Registration: Co-registration aligns images from different modalities or time points to a common reference, ensuring that corresponding anatomical structures overlap correctly. This is particularly important in longitudinal studies, where changes over time are being monitored, or when integrating data from different imaging modalities. This step is essential for comparison and further analyses. Used in sMRI and diffusion-based ones. The exemplary code is SITK, but there are plenty of options!
def co_registration(fixed_image, moving_image):
"""Co-register two images."""
registration_method = sitk.ImageRegistrationMethod()
registration_method.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
registration_method.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=100)
registration_method.SetInterpolator(sitk.sitkLinear)
final_transform = registration_method.Execute(sitk.GetImageFromArray(fixed_image),
sitk.GetImageFromArray(moving_image))
return sitk.GetArrayFromImage(sitk.Resample(sitk.GetImageFromArray(moving_image),
sitk.GetImageFromArray(fixed_image),
final_transform, sitk.sitkLinear, 0.0,
moving_image.dtype))
Registration of native, raw images of patients to a standard template gathered by the average. Source: [3]
  • De-noising: De-noising aims to reduce random noise in the MRI images, which can obscure important details and reduce the accuracy of subsequent analyses. The methods vary from simple Gaussian smoothing to more sophisticated methods like non-local means filtering. The goal is to preserve the true signal while minimizing the noise, enhancing the clarity and reliability of the images. Used in sMRI.
from skimage.restoration import denoise_nl_means, estimate_sigma
import nibabel as nib

def denoise(input_path, output_path):
img = nib.load(input_path)
data = img.get_fdata()
sigma_est = np.mean(estimate_sigma(data, multichannel=False))
denoised_data = denoise_nl_means(data, h=1.15 * sigma_est, fast_mode=True, patch_size=5, patch_distance=3, multichannel=False)
denoised_img = nib.Nifti1Image(denoised_data, img.affine)
nib.save(denoised_img, output_path)
  • DICOM 2 NifTI: This step involves converting MRI data from the DICOM format (which is commonly used in clinical settings) to the NIfTI format, which is more commonly used in research.
import dicom2nifti

def dicom_to_nifti(dicom_dir, output_path):
dicom2nifti.convert_directory(dicom_dir, output_path, compression=True)

def dicom_to_nifti(dicom_directory, output_file):
"""Convert DICOM series to NIfTI format."""
reader = sitk.ImageSeriesReader()
dicom_names = reader.GetGDCMSeriesFileNames(dicom_directory)
reader.SetFileNames(dicom_names)
image = reader.Execute()
sitk.WriteImage(image, output_file)
  • Filtering: Filtering enhances specific features of the MRI images or removes certain types of artifacts. Filters can be applied to remove noise, enhance edges, or suppress background structures. Common filters include Gaussian, median, and Laplacian filters, each serving a specific purpose depending on the analysis goals. Used in fMRI.
from scipy.ndimage import gaussian_filter

def apply_filter(image, filter_type='gaussian', **kwargs):
"""Apply various filters to the image."""
if filter_type == 'gaussian':
return gaussian_filter(image, **kwargs)
elif filter_type == 'median':
return sitk.GetArrayFromImage(sitk.Median(sitk.GetImageFromArray(image), **kwargs))
elif filter_type == 'laplacian':
return sitk.GetArrayFromImage(sitk.LaplacianRecursiveGaussian(sitk.GetImageFromArray(image), **kwargs))
else:
raise ValueError("Unsupported filter type")
  • Motion Correction: Motion correction is critical in fMRI and DTI studies, where even slight movements can introduce artifacts that can mislead the analysis. This step realigns the images to compensate for any movement that occurred during the scan, ensuring that the data accurately reflects the underlying brain structures or functions. Algorithms typically involve realigning each image slice or volume to a reference image, minimizing the impact of motion. Used in fMRI and diffusion-based ones.
def motion_correction(image_4d):
"""Perform motion correction on a 4D image."""
corrected = np.zeros_like(image_4d)
reference = image_4d[:, :, :, 0]
for i in range(image_4d.shape[3]):
moving = image_4d[:, :, :, i]
registration_method = sitk.ImageRegistrationMethod()
registration_method.SetMetricAsCorrelation()
registration_method.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=100)
registration_method.SetInterpolator(sitk.sitkLinear)
final_transform = registration_method.Execute(sitk.GetImageFromArray(reference),
sitk.GetImageFromArray(moving))
corrected[:, :, :, i] = sitk.GetArrayFromImage(sitk.Resample(sitk.GetImageFromArray(moving),
sitk.GetImageFromArray(reference),
final_transform, sitk.sitkLinear, 0.0,
moving.dtype))
When a person moves in the scanner, the images might be corrupted. They need motion correction. Source: [4]
  • Normalization: Normalization deals with aligning the intensity values or spatial attributes of MRI images to a standard or common reference. It ensures consistency across images, which is crucial for comparative studies, multi-subject analyses, and statistical analyses. Typical normalization techniques include intensity and spatial normalization. Checkout this article for more normalization details!
import ants

def spatial_normalization(input_path, template_path, output_path):
img = ants.image_read(input_path)
template_img = ants.image_read(template_path)
normalization = ants.registration(fixed=template_img, moving=img, type_of_transform='SyN')
ants.image_write(normalization['warpedmovout'], output_path)


def normalize(image):
"""Normalize image intensities to [0, 1] range."""
return (image - image.min()) / (image.max() - image.min())
Normalization brings images into a standardized space or intensity range to enable consistent analysis across subjects or studies. Source: [5]
  • Quality Control: Quality control (QC) ensures that the MRI data is of sufficient quality for analysis, identifying and flagging any issues that could compromise the results. QC might involve some visual inspection trough visualizing tools like 3DSlicer, automated algorithms for detecting outliers or artifacts, and checks for issues like excessive motion or poor signal-to-noise ratio. Ensuring high-quality data at this stage is crucial for reliable and valid downstream analyses.
import nibabel as nib

def quality_control(input_path):
img = nib.load(input_path)
data = img.get_fdata()

# Basic quality checks
if np.isnan(data).any():
raise ValueError("Image contains NaNs.")
if data.max() == 0:
raise ValueError("Image seems to be empty.")
print("Basic QC passed.")

def quality_control(image, snr_threshold=10):
"""Basic quality control check based on SNR."""
signal = np.mean(image)
noise = np.std(image)
snr = signal / noise
return snr > snr_threshold
  • Resampling/ Resizing: Resampling adjusts the voxel size or resolution of the MRI images, often to match a common resolution across datasets or to facilitate comparison with other images. This step is necessary when integrating data from different sources, or when preparing data for specific analyses that require a uniform voxel size. Resampling can also help reduce data size, making it more manageable for processing and storage. A standard resizing is 1x1x1 voxels, also called as isotropic size.
import nibabel.processing as nibproc
from dipy.align.reslice import reslice

def resample(input_path, output_path, new_voxel_dim=(2, 2, 2)):
img = nib.load(input_path)
resampled_img = nibproc.resample_to_output(img, new_voxel_dim)
nib.save(resampled_img, output_path)

def resample(image, new_spacing=(1.0, 1.0, 1.0)):
"""Resample image to new voxel spacing."""
return reslice(image, new_spacing, mode='linear')
Voxel definition and how to adjust it in MRI images. Source: [6]
  • Skull Stripping: Skull stripping is only relevant for brain studies, removing non-brain tissues (e.g., skull, scalp) from MRI images, focusing the analysis on brain structures. This step is particularly important for neuro-imaging studies where the focus is on brain anatomy or function. Accurate skull stripping is essential to avoid including non-brain tissues in the analysis, which could distort the results. Used in sMRI.
import ants
from dipy.segment.mask import median_otsu

def skull_strip(input_path, output_path):
img = ants.image_read(input_path)
brain_img = ants.skull_strip(img)
ants.image_write(brain_img, output_path)

def skull_strip(image):
"""Perform skull stripping using median_otsu method."""
b0_mask, mask = median_otsu(image, median_radius=4, numpass=4)
return image * mask
Skull and scalp tissue removing to only have the brain structure. Source: [7]
  • Slice-Timing Realignment: This step corrects for the fact that different slices of the brain are captured at different times during the scan, particularly in fMRI data. It adjusts the timing of the slices to align them to a common temporal reference, improving the accuracy of the functional analysis. Used in fMRI.
def slice_timing_correction(image_4d, tr, slice_order='ascending'):
"""Perform slice timing correction."""
n_slices = image_4d.shape[2]
slice_times = np.arange(n_slices) * (tr / n_slices)
if slice_order == 'descending':
slice_times = slice_times[::-1]
elif slice_order == 'interleaved':
slice_times = np.concatenate((slice_times[::2], slice_times[1::2]))

corrected = np.zeros_like(image_4d)
for i in range(n_slices):
corrected[:, :, i, :] = np.interp(np.arange(image_4d.shape[3]) * tr - slice_times[i],
np.arange(image_4d.shape[3]) * tr,
image_4d[:, :, i, :], axis=-1)
return corrected
Slice timing correction between two slices in the same image. Source: [8]

Order, Problems & Enhancements

As you may have observed, there are a bunch of possible steps to apply. A standard processing would usually include the following:

  • Dicom2NifTI: to have the images in the right format
  • Co-registration: All images aligned.
  • Voxel Resampling: All images with the same dimensions
  • Bias field: All images with the same intensities.
  • Normalization, filtering or 3D to 2D conversion: these are rather optional

But even with basic 4 steps, we quickly realize the methodologies have a lot parameters to fine-tune them. So how do we choose the best set? Well, unfortunately this comes with practice and domain knowledge. I would strongly suggest to implement some method that visualizes the images after each individual step to know how your images are being transformed.

In order to make your life easier, I developed such an automatic tool! We will discuss it in the next chapter.

Conclusion

I hoe you liked learning about all of the possibilities that can be applied to MRI images. Although a little overwhelming at the beginning, you can easily get results with a few minimal steps. So don’t be shy of trying them out!

References

[1] https://dartbrains.org/content/Preprocessing.html

[2] https://www.sciencedirect.com/science/article/pii/S0730725X14000927

[3] http://jpeelle.net/mri/image_processing/registration.html

[4] https://news.mit.edu/2023/mit-researchers-combine-deep-learning-physics-fix-motion-corrupted-MRI-scans-0817

[5] https://medium.com/@susanne.schmid/image-normalization-in-medical-imaging-f586c8526bd1

[6] https://www.researchgate.net/figure/Voxel-and-slice-in-3D-MRI-data_fig8_280608869

[7] https://jerrylinew.github.io/cs188/

[8] https://mriquestions.com/data-pre-processing.html

--

--