What would the perfect mental health dataset look like?

Published in Wellcome Data, Oct 20, 2022

tl;dr: please respond to Wellcome’s survey about how longitudinal datasets can best serve mental health research by clicking here.

Representing my love of data — a matrix of 1s and 0s; most are white but some have been coloured red in the shape of a heart. — Photo by Alexander Sinn on Unsplash

I love cohort studies. I did my PhD using a British-based pregnancy cohort (ALSPAC) and have more recently conducted cross-cohort comparison studies using ALSPAC and another cohort, the Millennium Cohort Study (MCS). When I was a researcher interested in understanding the links between cannabis use and psychosis, longitudinal datasets were the bread and butter for exploring this relationship. They’re a hugely important way to explore patterns in health and how these are related to potential risk or protective factors, even if they’re not the best way to get at causality.

But. These studies are often designed to explore many and varied health behaviours, illnesses and other outcomes. While mental health outcomes are often included in these studies, the measures used are usually very brief. And that’s not surprising. All the researchers involved in designing a cohort are fighting for space: every measure added to investigate one outcome means another can’t be included. Participants only have so much time they’re willing or able to devote to answering surveys or attending a clinic, so difficult priority decisions have to be made.

Mental health is harder to measure than something like blood pressure, or a diagnosis of cancer or heart disease. There are no biomarkers (yet), so we rely on asking someone about their symptoms and the impact those symptoms have on them. Often this takes the form of a self-report questionnaire. These take up valuable participant time and space, and so they are often kept short and perfunctory.

What if we could do it differently?

In February this year I started working for Wellcome — a global charitable foundation, with a goal to help everyone benefit from science’s potential to improve health and save lives. Within Wellcome, there are three key challenge areas, one of which is mental health.

As part of my work I’ve been tasked with thinking about how longitudinal data (not just limited to cohort studies) can help us achieve our mental health strategy — namely to gain a better understanding of how the brain, body and environment interact in depression, anxiety and psychosis, so we can spot potential points for early intervention, find better ways of identifying groups for intervention, and find new and improved ways of intervening.

So I’ve been thinking about what a perfect mental health dataset would look like. If we weren’t constrained at all, what variables would we collect, and how often? What would our sample size look like, and who would be our underlying population? Would a perfect dataset for depression look the same as a perfect dataset for psychosis research?

And how do we go about doing it? There are a couple of options for helping to make a perfect (caveat, nothing is perfect) dataset — do we start from scratch (which would mean waiting for years before longitudinal data would be available), or do we enrich existing studies? As an aside, we are currently scoping existing longitudinal datasets as well, and more information about that project can be found here, including how to inform the team about existing datasets.

I’m really keen to hear from mental health researchers who work with longitudinal data — what are the variables you wish were in the datasets you use? Are there occasions where you wish concepts had been measured using different techniques, or at different frequencies? What would your perfect dataset look like?

I’d love to hear from you: you can fill out this survey, or you can email me your thoughts at s.gage@wellcome.ac.uk.

Written by Suzi Gage