Dissertation + Supercomputer + Brain Imaging

Taylor Hanayik · Published in The Image · 2 min read · Aug 17, 2018

I’m nearing the end of my days as a PhD student and that means I’ll be publishing my dissertation soon. I’d like to think that what I’m doing is unique. I’m sure every PhD says that about their dissertation.

I do think I have a unique set of brain imaging data to work with, and I am ultimately adding value to the field of stroke neuroimaging with my project. However, what I think is equally interesting is how I’m processing all the data I have.

The Data:

57 T1w brain images from neurologically healthy participants.

193 T1w brain images from stroke survivors.

All images were collected on 3T MRI systems at 1×1×1 mm resolution, so I’m working with data that is both high in quality and high in quantity.

The Processing Problem:

I’m comparing many different methods for normalizing brain images (making an individual’s brain look like a template brain). This is especially important in clinical neuroimaging such as stroke research. I will be comparing the deformations required to normalize each healthy brain against the deformations required for that same brain with each of the 193 different injuries (lesions) injected into the image. Essentially, every stroke image “donates” its injury to every healthy image. This pairwise lesion injection will result in a dataset of 57 × 193 = 11,001 unique images.
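To make that concrete, here’s a minimal sketch of what a single lesion injection could look like in Python with nibabel. It is not the exact procedure from my pipeline: the file names, the assumption that the lesion mask is already aligned to the healthy image’s grid, and the simple intensity trick used to mimic lesioned tissue are all placeholders.

```python
# Minimal sketch of one lesion "injection" (illustrative, not the exact method):
# copy a binary lesion mask from a stroke image into a healthy T1w.
# Assumes healthy.nii.gz and lesion_mask.nii.gz (hypothetical files) share the
# same space and voxel grid.
import nibabel as nib
import numpy as np

healthy = nib.load("healthy.nii.gz")        # 1x1x1 mm T1w from a healthy participant
mask = nib.load("lesion_mask.nii.gz")       # binary lesion mask from a stroke survivor

t1 = healthy.get_fdata()
lesion = mask.get_fdata() > 0

injected = t1.copy()
# One simple choice: darken the masked voxels so they resemble chronic lesion tissue.
injected[lesion] = t1[lesion] * 0.3

nib.save(nib.Nifti1Image(injected, healthy.affine, healthy.header),
         "healthy_with_lesion.nii.gz")
```

Looped over all 57 × 193 healthy/lesion pairs, a step like this produces the full injected dataset.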

11,001 MRI images amount to a lot of data (nearly 1 TB). The dataset is then copied a few times over for each method to facilitate parallel processing in a cluster environment. So in all (copies included) I will be processing about 100,000 brain images through the different normalization techniques I’m testing.

This requires more computing resources than I have needed for any previous project. Luckily, the University of South Carolina has pretty great supercomputing resources. I’m using the Hyperion cluster, which has 6,760 CPU cores in total. That’s a substantial increase compared to my MacBook Pro (2 cores), and even our best Linux machine in the lab (6 cores, 12 with hyperthreading). It means I can write my own programs that process many datasets in parallel (hundreds or thousands at a time), rather than in serial order as I would have to on a typical desktop computer.
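The basic fan-out pattern is simple. Below is a hedged sketch of one way a SLURM job array could drive one normalization run per image; the data directory, the process_image helper, and the submission command are illustrative assumptions (and a real cluster may cap array sizes), not my actual scripts.

```python
# Sketch: fan one script out over thousands of images with a SLURM job array.
# Submit with something like:  sbatch --array=0-11000 run_one.sh
# where run_one.sh (hypothetical) simply calls this Python script.
import os
from pathlib import Path


def process_image(path: Path) -> None:
    """Placeholder for one normalization run on a single image."""
    print(f"normalizing {path.name}")


# Hypothetical directory holding the lesion-injected images.
images = sorted(Path("/scratch/injected").glob("*.nii.gz"))

# SLURM sets SLURM_ARRAY_TASK_ID for each element of the array,
# so each task picks exactly one image to work on.
task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
process_image(images[task_id])
```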

To illustrate the point further: if I processed the data on my laptop it would take *roughly* 476 days (running continuously). I can’t wait that long, nor should I, for the answers to the questions I’m asking. The same data, processed on the cluster, will take about one week (running continuously).
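A quick back-of-envelope check of those numbers, treating the 476-day and 100,000-image figures above as given (the per-image cost here is an inference from them, not a measurement):

```python
# Rough sanity check of the laptop vs. cluster estimates in this post.
laptop_days = 476
laptop_cores = 2
total_images = 100_000

core_hours_per_image = laptop_days * 24 * laptop_cores / total_images
print(f"~{core_hours_per_image:.2f} core-hours per image")  # ~0.23, i.e. roughly 14 minutes

cluster_days = 7
print(f"cluster speedup ~= {laptop_days / cluster_days:.0f}x over the laptop")  # ~= 68x
```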

That’s pretty impressive.

I have just started all of my jobs running, so I will write an update soon with a progress report. *Hopefully my programs don’t crash*.

I will also put some “how to” guides in the How to… section soon. These guides will cover some of the code and techniques I’m using to process data with SLURM on the cluster.
