Bridging the gap between brain scans and big data

Published in Code Ocean · 5 min read · Mar 26, 2019

Dr. Oscar Esteban is a postdoctoral researcher in Psychology at Stanford University. He’s also the lead author of the article ‘fMRIPrep: a robust preprocessing pipeline for functional MRI’, recently published in Nature Methods. He’s keenly interested in increasing the transparency and reusability of the data generated by MRI scans; it’s no surprise, then, that the cloud-based computational reproducibility platform, Code Ocean, was involved in his research.

For those of us unfamiliar with how fMRI (functional magnetic resonance imaging) works, Dr. Esteban explains that it measures the level of oxygen in the blood as a proxy for activity within the brain. “Blood oxygen modulates its magnetic properties,” he says. “Therefore, MRI can be used to keep track of the oxygen consumption from the bloodstream. As a general principle, more oxygen is withdrawn at the most active areas of the brain.”

He likens it to a thermal camera: “if you see a peak in heat production moving around a particular location, it is highly likely a living being is sitting there in front of you.”

“We’re witnessing the blooming of neuroimaging big data”

Because fMRI is non-invasive (there’s no need for surgery, and MRI involves no ionising radiation) and provides such useful information, it’s becoming increasingly popular as a research tool.

Esteban has described the resulting surge in data as “the blooming of neuroimaging big data”.

It has, however, been accompanied by a sudden diversification of methods. Scientists have developed such a wide range of ways to process and analyse that data, customised for almost every study, that reproducing these experiments is becoming increasingly difficult.

Scientists across a range of disciplines have been concerned about the reproducibility of research for some time now; for example, researcher Tom Hardwicke writes of his attempts to replicate papers in Cognition. (An example of Hardwicke’s work is available on Code Ocean.)

“The reproducibility crisis underlies every field, and Code Ocean may help a lot with transparency.”

Scientists may fail to reproduce an experiment because they lack access to the original data, to the original methods (e.g. software), or to both. The Open Science movement, supported by scientists and institutions, therefore proposes transparency as a tool to ensure the reproducibility of findings.

Esteban believes that to address this problem, “it is fundamental to set the adequate infrastructure and relevant incentives for scientists to share their data (especially when the study has received public funds), and to share their methods (which are increasingly software-dependent every day) along with a maximally transparent reporting of the results. The ‘available upon reasonable request’ we usually find in papers in regards data and/or software must not be the default”.

His project, fMRIPrep, is intended to address this problem of reproducibility in fMRI data by enabling large-scale analyses of very different data sets with scientific transparency.

He describes it as “a software instrument that adapts to virtually any input dataset, minimizes the manual intervention, and produces consistent results across studies, thereby improving the reproducibility of results.”
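For readers curious what "minimal manual intervention" looks like in practice, here is a minimal sketch of how a participant-level fMRIPrep run might be assembled from Python. It assumes fMRIPrep is installed and that the input dataset is organised in the BIDS layout the tool expects. The helper name `build_fmriprep_command`, the paths, and the participant label are hypothetical and chosen only for illustration. The sketch builds and prints the command without running it.

```python
import shlex


def build_fmriprep_command(bids_dir, output_dir, participant_label=None):
    """Return the argument list for a participant-level fMRIPrep run."""
    # Positional arguments: input dataset, output folder, analysis level.
    cmd = ["fmriprep", bids_dir, output_dir, "participant"]
    if participant_label is not None:
        # Restrict the run to a single participant.
        cmd += ["--participant-label", participant_label]
    return cmd


# Hypothetical paths and label, for illustration only.
command = build_fmriprep_command("/data/study", "/data/study/derivatives", "01")
print(shlex.join(command))
```

The same three positional arguments apply whatever the dataset, which is roughly what it means for one instrument to adapt to many inputs.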

One key reason for doing this is to enable machine learning to play a greater role in analysing this neuroimaging big data. While a human expert might look at a pair of images produced by scanners from different vendors and see similarities, computers currently struggle.

fMRIPrep therefore aims to make the features extracted from images across a range of acquisition sites and protocols as homogeneous as possible, so that they can be aggregated in large-scale analyses and made more amenable to machine learning. (The more homogeneous the data, the easier it is for a computer to apply what it has learned from one data set to another.)
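To see why homogeneity matters for aggregation, consider a toy example. This is not fMRIPrep's method, which works on the images themselves. It is only a sketch of one simple idea, rescaling each site's values to zero mean and unit variance, applied to made-up numbers. The function name `standardise_per_site` and the site labels are hypothetical.

```python
from statistics import mean, pstdev


def standardise_per_site(values_by_site):
    """Rescale each site's values to zero mean and unit variance."""
    out = {}
    for site, values in values_by_site.items():
        mu = mean(values)
        sigma = pstdev(values)
        # Guard against a constant feature, which would otherwise divide by zero.
        out[site] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    return out


# Made-up feature values from two scanners with different scale and offset.
raw = {
    "site_a": [100.0, 110.0, 120.0],
    "site_b": [1000.0, 1100.0, 1200.0],
}
harmonised = standardise_per_site(raw)
```

Before rescaling, the two sites differ by an order of magnitude, so a model trained on one would misread the other. Afterwards both sites share the same scale, and their values can be pooled.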

The fMRI research workflow makes use of code from start to finish. As Esteban explains, “the MR sequences are just programs run by the scanner hardware; as well as the image reconstruction from the MR data recorded, the tasks that participants perform during the acquisition to trigger brain function involve software run in computers synchronously with scanning.”

The next stage involves the formatting of the images and their transfer for storage — through software again — before the original data is archived. Then, he adds, “the original data need cleaning and curation, which is also performed by software. Finally, the analysis also requires substantial programming skills.”

“Code Ocean could transform how we do science.”

With code playing such a central role throughout the fMRI workflow, the ability to run, review, and share that code is important in ensuring both transparency and reproducibility.

Esteban’s own work towards ensuring that neuroimaging data passes both those tests parallels Code Ocean’s efforts to do the same for code. “It’s a great idea,” he says; “the reproducibility crisis underlies every field, and Code Ocean may help a lot in disseminating scientific software, and in improving the transparency by making software part of the peer-review process.” He’s also particularly enthusiastic about its ability to support those beginning work in this area: “If I were starting out, Code Ocean’s capsules (which are software containers) would be very useful.”

The facility to review code is something he really appreciates too: “It’s very helpful — that’s something we need to improve on; we need to assist scientists on this. Currently, code cannot be reviewed unless the authors open-source it before publishing. Code Ocean allows blind peer-review of code, which may be an attractive option for researchers who will open-source their software after publication.”

Ultimately, he believes that such tools could transform how we do science as “more and more journals realise that they need services like this to facilitate scrutiny over scientific software.”

Dr. Esteban’s article ‘fMRIPrep: a robust preprocessing pipeline for functional MRI’ is published in Nature Methods.

Written by Code Ocean