Researchers are shifting their focus: why Research Data Management matters

In the last two decades, a huge amount of effort and investment has gone into neuroimaging to accelerate and improve its techniques, methods, and tools. But all this effort falls short without well-defined, standard best practices and guidelines to amplify research power.

Marc Ramos
QMENTA Tech Blog
Sep 27, 2018

--

This article was inspired by the study “Data management and sharing in neuroimaging: Practices and perceptions of MRI researchers”, by @JohnBorghi and Ana E. Van Gulick.

Is RDM a new thing?

This question can have different answers. Every step of a research project involves Research Data Management (RDM), from data collection to the publication of results. RDM is designed to make the process efficient and to meet the expectations of universities, research funders, and legislation.

Since humanity stopped forgetting and started documenting, the first librarians collected, organized, and stored the information and data of their era on tall shelves and in deep rooms, and wherever there was research, RDM appeared naturally in one way or another. RDM-related practices have been in use for as long as research has required them. They include activities such as backing up data, securing sensitive data, organizing data in an understandable folder structure with consistent file naming, documenting the procedures followed to reach a result, and curating and sharing the results. Science and research have evolved and matured over the years to answer harder questions, and RDM has become correspondingly complex: the data collected is very heterogeneous across research fields, and new concerns have appeared around releasing the data behind published results.
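
As a toy illustration of two of these habits, consistent file naming and verifiable backups, here is a minimal Python sketch. The naming scheme and project name are purely hypothetical, not a community standard:

```python
import hashlib
from pathlib import Path


def build_filename(project: str, subject: str, session: str,
                   modality: str, ext: str = "nii.gz") -> str:
    """Compose a consistent, sortable file name for one acquisition."""
    return f"{project}_{subject}_{session}_{modality}.{ext}"


def checksum(path: Path) -> str:
    """SHA-256 digest of a file, handy for checking that a backup copy matches the original."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(build_filename("memstudy", "sub-01", "ses-01", "T1w"))
# -> memstudy_sub-01_ses-01_T1w.nii.gz
```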


RDM as it is understood today started with the Digital Curation Centre (DCC), born for a world where all the steps of the research process (data collection, data analysis, and data sharing) are performed digitally. But digital records are at risk of obsolescence if data curation and documentation for future usage are lacking. Data that is not curated and documented is fragile and volatile, and eventually useless and impractical.

The original function of the DCC is described as:

“…to provide a national focus for research into curation issues and expertise in the processes of digital archiving, preservation and management. Particular emphasis will be placed on the needs of users of the Centre’s outputs.”

How RDM accelerates research

Emerging open science and scholarly communication practices aim to give researchers much better and faster access to publications and data. They also aim to improve the way lab groups work, establishing good practices and standards so that the full research workflow is consistent, homogeneous, and re-usable within the research group and (why not) across universities.


Thanks to coordinated efforts and institutional support in writing research data policies and building services, institutions are already measuring improvements in research capabilities. It’s possible to do more with less if you take good care of the RDM of your project.

Measuring the Benefits

37%: Projected saving in staff time from moving the Oxford University Classics Dept database to a centralized virtual service

69%: Increase in citations for clinical trial publications associated with making their microarray datasets publicly available

500%: Growth in datasets downloaded from the Economic and Social Data Service, 2003–2008

One-day delay cut to 5 minutes: Estimated time saving for crystallography researchers accessing results from the Diamond synchrotron, by deploying a digital processing pipeline and metadata capture system

Way to go, research!

The neuroimaging scenario


Magnetic Resonance Imaging (MRI) is one of the most common techniques in neuroimaging. It is a very powerful tool for analyzing the composition and function of the brain. But when it comes to reproducing experiments and granting access to data, the community faces many challenges. One way of demonstrating reproducibility and reliability is a test-retest: if you scan the same person twice on the same scanner with the same protocol within a 7-day span, you should obtain nearly identical acquisitions. The same applies to data analysis: if two researchers perform an analysis using a similar tool on the same data, the results should come out the same (this is known as “computational reproducibility”).
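
A minimal sketch of what checking computational reproducibility can look like in practice; the example values and the tolerance are illustrative assumptions, not part of the study:

```python
import numpy as np


def computationally_reproducible(run_a: np.ndarray, run_b: np.ndarray,
                                 tol: float = 1e-6) -> bool:
    """Return True if two analysis runs over the same data agree within a tolerance.

    Bit-identical outputs are rare across machines and library versions,
    so a small numerical tolerance is a pragmatic criterion.
    """
    return bool(np.allclose(run_a, run_b, atol=tol))


# Hypothetical outputs: mean cortical thickness (mm) in three regions,
# computed independently by two researchers from the same scans.
run_1 = np.array([2.51, 2.48, 3.02])
run_2 = np.array([2.51, 2.48, 3.02])
print(computationally_reproducible(run_1, run_2))  # True
```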

Data is very complex and heterogeneous. Most neuroimaging software is developed to handle data stored in a specific format, and some tools have narrowed-down requirements in order to achieve great performance. Brain scans can have very different properties; the number of options and settings available in an MRI scan far exceeds what you would find in your favorite photo-sharing app. Developers and researchers have therefore written a large quantity of software to cover all these options and to extract the valuable information from a brain scan. Mastering all of it would take years and years, even if the documentation is neat and grandma-proof.

Data management procedures are still immature. Researchers and paper reviewers have raised awareness on this topic, and the focus is now on proper data management and stewardship to support data sharing. When you ask someone in the neuroimaging field how they manage their data, how they plan for data management, storage, and documentation, or whether they share data or code publicly, you will get assorted answers. This is shown in a study in which 144 people working at different institutions and centers in the field took a survey on RDM (research data management). Ratings of RDM maturity (from ad hoc to refined) were significantly low for the data sharing phase. The main limitations keeping researchers from implementing RDM in their process are time and a lack of best practices. The main motivations are preventing loss of data, ensuring access for collaborators, and enabling openness and reproducibility in science. Emerging practices, including publishing in open access journals, preregistering studies, and sharing data and code, were viewed largely positively, although very few respondents reported actively putting them into practice.

Ratings of research practice maturity: average ratings on a scale from 1 (ad hoc) to 5 (refined) across the three phases of an MRI research project (data collection, data analysis, and data sharing).

FAIR data for open science

One of many initiatives towards this goal is FAIR (Findable, Accessible, Interoperable and Re-usable). The principles were defined by many experts in the field: researchers, librarians, funders, and publishers. They state that data needs to be effectively documented, organized, and preserved before it can be evaluated, shared, and re-used. Something as obvious as that becomes a nightmare in a field where so many stakeholders have their own internal processes for handling data.


Efforts are being made to arrange MRI data in a reproducible way. New file structure standards (like BIDS) are being designed, and neuroimaging metadata and brain scan platforms are breaking through the neuroscience research jungle to boost sharing capabilities. Of course, there are many barriers and concerns that can put the process on hold: privacy, private interests, lack of experience, regional data regulations, and so on. More work outside laboratories and scientific facilities has to be done to clear the path towards the open data reality.
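
As a rough sketch of what a BIDS-style layout looks like, here is a toy Python snippet that creates an empty skeleton for one subject and session. The file names follow the public BIDS naming conventions, but the dataset and contents are made up, and this is an illustration rather than a validator; see the BIDS specification for the authoritative rules:

```python
from pathlib import Path


def make_bids_skeleton(root: str = "my_dataset") -> None:
    """Create an empty BIDS-style folder skeleton for one subject and session."""
    anat = Path(root) / "sub-01" / "ses-01" / "anat"
    func = Path(root) / "sub-01" / "ses-01" / "func"
    anat.mkdir(parents=True, exist_ok=True)
    func.mkdir(parents=True, exist_ok=True)
    # NIfTI images plus JSON sidecars that hold the acquisition metadata
    # (empty placeholder files here):
    (anat / "sub-01_ses-01_T1w.nii.gz").touch()
    (anat / "sub-01_ses-01_T1w.json").touch()
    (func / "sub-01_ses-01_task-rest_bold.nii.gz").touch()
    (func / "sub-01_ses-01_task-rest_bold.json").touch()
    (Path(root) / "dataset_description.json").touch()


make_bids_skeleton()
```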

Are we ready to open research in neuroimaging?

The big question does not have a positive answer at the moment. There are too many fears, concerns, difficulties, and competing interests around publishing and releasing the precious data to the world: fear of criticism; concerns about people taking unethical advantage of the findings; lack of training on how to share data in a way that can be understood and re-used by other researchers; selfish interest in not releasing something that could be worth a lot of money.

Against all odds, new standards for data structure and data analysis are coming up, awareness of reproducibility and re-usability is rising, and more and more scientists are embracing open science principles and data sharing. Open data initiatives with freely available data are spreading, mostly in the resting-state and task fMRI communities but also in the structural and diffusion fields. User-friendly software is gaining traction, and new technologies enable ways of making code more stable and reliable. All we need is the coordinated effort of individual researchers, institutions, journals, professional organizations, and granting agencies to accelerate this process towards open science in neuroimaging, which will enable the development of transparent, reliable, reproducible, and groundbreaking neuroimaging tools.
