Validity of countless functional magnetic resonance imaging (fMRI) studies in doubt [Miniseries: 2016’s top 100 journal articles]

This is part 6 of a miniseries reviewing selected papers from the top 100 most-discussed journal articles of 2016.

Functional magnetic resonance imaging (fMRI) has become a popular tool for understanding the human brain, with PubMed listing some 40,000 published papers. Despite this popularity, however, the statistical methods used with fMRI have rarely been validated using real data.

International neuroimaging data-sharing initiatives have now made it possible to evaluate statistical methods with real data, and a number of studies have started to do this. In one of these studies, a group of researchers analysed 1,484 resting-state fMRI datasets with one specific software package. They found false-positive rates of up to 70%, compared with the expected 5%. However, it was not clear whether this finding would carry over to group studies, or how statistically valid other fMRI software packages would be.

The same group of researchers sought to address these limitations in a new study[1] that evaluated group inference with the three most common fMRI software packages. The paper reporting the study is article #62 of the top 100 most-discussed journal articles of 2016.

In the new study, 2,880,000 random group analyses were performed to compute the false-positive rates of the three fMRI software packages. The analyses comprised 1,000 one-sided random analyses repeated for 192 parameter combinations, three thresholding approaches, and the five tools in the three software packages.
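The size of the study follows directly from its design, and the false-positive rate for each setting is simply the fraction of its 1,000 null analyses that report a significant result. The sketch below is a minimal illustration, not the authors' code: it assumes, in line with the study's design, that every random group analysis is run on resting-state data where no true effect is expected, so any significant cluster counts as a false positive. The helper name `fwe_estimate`, the normal-approximation confidence interval, and the count of 70 passed to it are hypothetical choices made for illustration only.

```python
# Sketch only: reproduces the study's design arithmetic and shows how a
# familywise false-positive rate could be estimated from null analyses.
# Counts come from the study description; the count of 70 used below is a
# hypothetical example, not a result from the paper.
import math

N_ANALYSES_PER_SETTING = 1_000  # one-sided random group analyses
N_PARAMETER_COMBINATIONS = 192
N_THRESHOLDING_APPROACHES = 3
N_TOOLS = 5  # five tools across the three software packages

total = (N_ANALYSES_PER_SETTING * N_PARAMETER_COMBINATIONS
         * N_THRESHOLDING_APPROACHES * N_TOOLS)


def fwe_estimate(n_with_any_cluster: int,
                 n_analyses: int = N_ANALYSES_PER_SETTING):
    """Return (rate, lower, upper): the estimated familywise error rate and an
    approximate 95% normal-approximation binomial confidence interval.

    Assumes every analysis is run on null (resting-state) data, so any
    analysis reporting a significant cluster is a false positive."""
    p = n_with_any_cluster / n_analyses
    half_width = 1.96 * math.sqrt(p * (1 - p) / n_analyses)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    print(f"Total random group analyses: {total:,}")  # 2,880,000
    # A valid method at the nominal 5% level should flag about 50 of 1,000.
    print("Expected under nominal 5%:", 0.05 * N_ANALYSES_PER_SETTING)
    rate, lower, upper = fwe_estimate(70)  # hypothetical count
    print(f"Estimated FWE: {rate:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

With 1,000 analyses per setting, a rate well above 5% is unlikely to be sampling noise; for rates very close to 0 or 1, an exact or Wilson interval would be a better choice than the normal approximation used here.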

The researchers found that the three software packages can produce “P values that are erroneous, being spuriously low and inflating statistical significance.” They state that:

This calls into question the validity of countless published fMRI studies based on parametric clusterwise inference. It is important to stress that we have focused on inferences corrected for multiple comparisons in each group analysis, yet some 40% of a sample of 241 recent fMRI papers did not report correcting for multiple comparisons, meaning that many group results in the fMRI literature suffer even worse false-positive rates than found here.
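Why skipping correction for multiple comparisons makes matters worse can be seen with a small simulation. The sketch below is a simplified illustration under loudly stated assumptions: it treats each analysis as 1,000 independent tests on pure noise (a hypothetical number), whereas real fMRI voxels are spatially correlated, and it uses Bonferroni correction as a simple stand-in. Bonferroni is not the parametric clusterwise inference the study evaluated; it only shows what "correcting for multiple comparisons" is meant to achieve.

```python
# Sketch only: familywise false-positive rate with and without correction.
# Assumptions: independent tests on null data (real fMRI voxels are spatially
# correlated), and Bonferroni as a stand-in for a multiple-comparison method.
import random

ALPHA = 0.05
N_TESTS = 1_000     # hypothetical number of independent tests per analysis
N_ANALYSES = 1_000  # repeated null analyses


def analytic_fwe(threshold: float, n_tests: int = N_TESTS) -> float:
    """Probability of at least one false positive among n independent null tests."""
    return 1.0 - (1.0 - threshold) ** n_tests


def simulated_fwe(threshold: float, seed: int = 0) -> float:
    """Fraction of null analyses with at least one p-value below the threshold.
    Under the null hypothesis, p-values are Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < threshold for _ in range(N_TESTS))
        for _ in range(N_ANALYSES)
    )
    return hits / N_ANALYSES


uncorrected = ALPHA            # each test at 5%, no correction
bonferroni = ALPHA / N_TESTS   # split the 5% budget across all tests

if __name__ == "__main__":
    for name, thr in [("uncorrected", uncorrected), ("Bonferroni", bonferroni)]:
        print(f"{name:>11}: analytic {analytic_fwe(thr):.3f}, "
              f"simulated {simulated_fwe(thr):.3f}")
```

Without correction, almost every null analysis produces at least one false positive; with correction, the rate stays near the nominal 5%. The study's point is that even corrected clusterwise methods can miss that target, so uncorrected results are likely to fare worse still.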

In response to their findings, the researchers advise that:

Due to lamentable archiving and data-sharing practices, it is unlikely that problematic analyses can be redone. Considering that it is now possible to evaluate common statistical methods using real fMRI data, the fMRI community should, in our opinion, focus on validation of existing methods.

They conclude their paper by highlighting the critical role that data sharing played in their work, and the need for study authors to share their statistical results and data:

Although our massive empirical study depended on shared data, it is disappointing that almost none of the published studies have shared their data, neither the original data nor even the 3D statistical maps. As no analysis method is perfect, and new problems and limitations will be certainly found in the future, we commend all authors to at least share their statistical results and ideally the full data.

This support for data sharing contrasts with the controversial views expressed by the editors of the New England Journal of Medicine (NEJM) in article #34 of the top 100 most-discussed journal articles of 2016.

Header image source: 01 Siemens MAGNETOM Trio by Image Editor is licensed under CC BY 2.0.


  1. Eklund, A., Nichols, T. E., & Knutsson, H. (2016). Cluster failure: why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences, 201602413.

Originally published at RealKM.