Getting started with NHS mental health data

A practical guide for analysts

Sebastien Peytrignet

Published in

The Health Foundation Data Analytics

9 min read · Oct 14, 2022

Top tips

The Mental Health Services Data Set (MHSDS) is the most detailed data source on specialist, secondary mental health services that the NHS in England offers.
Aggregate statistics are released monthly as open data files. These outputs are detailed enough for many analyses. You can use our R data pipeline to compile the monthly releases into a single file that is easier to work with.
If you need person- or appointment-level data (for example to link MHSDS to other health care records), you will need to ask for access to a secure version of the dataset, which contains many data quirks. The West Yorkshire Health and Care Partnership has developed code that you can use to deal with many of those data issues.
Do not take the quality of the data in MHSDS for granted. MHSDS is a very useful resource, but many fields such as diagnoses are still poorly populated.
Be aware that this dataset does not cover all services. The Improving Access to Psychological Therapies (IAPT) Data Set is a more complete data source for talking therapies offered to adults with anxiety and/or depression.
Only about 1 in 8 people with a mental health condition receives treatment, so data from services undercount need. Population surveys, accessible through the UK Data Service, offer more useful insights into the true prevalence of mental health conditions.

Who is this guide for and what does it cover?

This guide is for analysts and data scientists. The Networked Data Lab at the Health Foundation works with analytical teams embedded in health and social care systems across the UK. During the past year, we have been researching children and young people’s mental health. We recently published the findings of this research in a briefing, where we described rapid increases in GP consultations and mental health prescribing, and highlighted stark socioeconomic inequalities.

Working on this topic meant that we needed to get to grips with data we had not worked with before, which came with a unique set of challenges.

In this short guide, we give an overview of the mental health data available to analysts and share what we have learned, particularly around working with the Mental Health Services Data Set (MHSDS) — this is the most detailed source of data on mental health services in England. We also set out some open-source solutions to overcome common challenges.

Note that in this guide we only focus on data on treatment in secondary mental health care — that is, treatment that mental health specialists (as opposed to generalists) give after a referral or a patient’s first contact with another service. The datasets we link to here can be freely used for non-commercial purposes.

How is NHS mental health data collected?

In England, secondary care providers are required to submit data on NHS-funded specialist mental health care to NHS Digital on a regular basis. The bulk of the data feeds into a database called the Mental Health Services Data Set (MHSDS), which covers all health care activity related to mental health needs and wellbeing support for children and adults. It includes services provided in hospitals, outpatient clinics and the community, as well as activity related to learning disabilities, autism and other neurodevelopmental conditions. However, data on talking therapies are collected separately in the Improving Access to Psychological Therapies (IAPT) Data Set (not discussed here).

What are the implications for data quality?

Providers keep clinical and operational records on patients and the care they receive for the purpose of direct patient care. This information is re-used for what is often termed ‘secondary uses’, including the commissioning of services, service design and service improvement.

For some providers, collecting and submitting data can be burdensome — particularly for smaller providers (such as charities), which may not have information technology (IT) systems in place to automate these processes. This can have an impact on the quality of the submitted data, for example resulting in under-recording and incompleteness of key variables, such as patient characteristics, diagnoses and outcomes. For instance, we found that it was not possible to analyse the severity of mental health conditions among young people because of poor data quality and missing data in MHSDS.
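Before relying on a field, it helps to measure how well populated it is. The snippet below is a minimal Python sketch of such a completeness check, not the method used in NHS Digital's data quality reports. The field names (`diagnosis`, `ethnicity`), the markers treated as missing and the records themselves are made up for illustration.

```python
# Markers treated as "not recorded". This is an assumption: adjust to your extract.
MISSING_MARKERS = {"", "NULL", "UNKNOWN"}


def is_missing(value):
    """Return True if a value is absent or one of the missing markers."""
    return value is None or str(value).strip().upper() in MISSING_MARKERS


def completeness(records, fields):
    """Return the share (0 to 1) of records with a usable value for each field."""
    records = list(records)
    if not records:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if not is_missing(r.get(field))) / len(records)
        for field in fields
    }


# Made-up records with hypothetical field names.
referrals = [
    {"referral_id": "R1", "diagnosis": "F32", "ethnicity": "A"},
    {"referral_id": "R2", "diagnosis": "", "ethnicity": "A"},
    {"referral_id": "R3", "diagnosis": None, "ethnicity": "UNKNOWN"},
    {"referral_id": "R4", "diagnosis": "F41", "ethnicity": "C"},
]

shares = completeness(referrals, ["diagnosis", "ethnicity"])
```

In this toy example, half of the referrals have a usable diagnosis. Running the same check by provider or by month can show whether gaps are concentrated in particular submissions.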

The barriers to data recording and reporting are not exclusively of a technical nature. Some of our partners’ work showed that providers are sometimes reluctant to record and report mental health diagnoses for young people, due to concerns about the associated stigma.

Once data have been submitted, NHS Digital has the difficult task of collating them from hundreds of providers, which often use different IT systems and collection methods. Submissions in the wrong format can be rejected. For these reasons, NHS Digital has been working with providers to ease data flows and improve data quality.

How can MHSDS data be accessed?

Once NHS Digital has collated data from different mental health providers into MHSDS, the data are disseminated in two main ways:

· Monthly aggregate statistics

Aggregate data are released publicly on a monthly basis and are available on NHS Digital’s website. These statistics include headline figures such as the number of people with a referral who are waiting to start treatment. Data quality reports accompany them so that data completeness can be assessed.

To create time-series data, analysts need to combine individual monthly files themselves. For data going back years, this can be very time-consuming. This is why we developed a custom R data pipeline that extracts the links to all monthly files and then downloads, appends and cleans each data file. Using this open R code results in a single file, which is easier to use and more amenable to analysing trends over time.
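The append step is the heart of this kind of pipeline. The snippet below is a minimal Python sketch of that idea, not our R pipeline itself. It uses in-memory CSV text in place of downloaded files, and the column names (`Measure`, `Value`) and figures are made up for illustration.

```python
import csv
import io


def combine_monthly_files(monthly_csv_texts):
    """Append monthly CSV extracts into one list of rows, tagged with their month.

    `monthly_csv_texts` maps a month label (e.g. "2022-01") to the CSV text
    of that month's release.
    """
    combined = []
    for month, text in sorted(monthly_csv_texts.items()):
        for row in csv.DictReader(io.StringIO(text)):
            # Harmonise header casing and strip stray whitespace.
            clean = {k.strip().lower(): (v or "").strip() for k, v in row.items()}
            clean["month"] = month
            combined.append(clean)
    return combined


# Made-up monthly extracts with inconsistent header casing.
monthly = {
    "2022-02": "Measure,Value\nopen_referrals,12\n",
    "2022-01": "MEASURE,VALUE\nopen_referrals,10\n",
}

time_series = combine_monthly_files(monthly)
```

Tagging each row with its month before appending means the combined file can be filtered or plotted over time without going back to the individual releases.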

Similarly, IAPT data on NHS talking therapies are also released on NHS Digital’s website on a monthly, quarterly and annual basis. These datasets can be downloaded or visualised on NHS Digital’s data dashboards for a quick glimpse at headline figures.

· Secure-access patient-level dissemination

Record-level versions of MHSDS allow the data to be linked to other health records, such as those on hospital care, A&E attendances and mortality, using the MHSDS patient identifier. While there are no national GP data extracts, MHSDS can be linked to local GP records or other datasets such as the Clinical Practice Research Datalink (CPRD) (see Figure 1).

These data extracts are pseudonymised to protect individuals’ identity, meaning that identifiable information — such as the patient’s name and address — is not included but records from the same patient can still be linked. Access to the data is granted through an application to NHS Digital’s Data Access Request Service (DARS). This process includes ethics approval for each project.
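To see why pseudonymised records can still be linked, consider a keyed hash: the same identifier always produces the same token, but the token does not reveal the identifier. The Python snippet below illustrates this general idea only. It is not a description of how NHS Digital pseudonymises MHSDS, and the key and identifiers are made up.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable token for an identifier using HMAC-SHA256."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be held by the data controller only.
key = b"example-key-held-by-the-data-controller"

token_a = pseudonymise("patient-001", key)
token_b = pseudonymise("patient-001", key)  # same person, another record
token_c = pseudonymise("patient-002", key)  # different person
```

Here `token_a` and `token_b` are identical, so the two records can be joined, while `token_c` differs.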

Figure 1: Linking MHSDS to other NHS datasets

Source: NHS Digital and Rowena Jacobs, University of York (used with permission)

Challenges of working with patient-level MHSDS

MHSDS is a large and complex dataset. It has more than 60 separate tables (see Figure 2) and needs a lot of cleaning and pre-processing.

The MHSDS information standards page is a good place to start. It contains descriptions for each table and a data dictionary describing each variable (for example, referrals, diagnoses and appointment history). Users of the dataset may notice that there are different versions, with version 5 being the most recent one at the time of writing. New versions are released when the collection methodology changes, errors need to be corrected or new variables are added. It is not unusual, especially if you are working with patient-level data going back several years, to have to work with multiple versions simultaneously.

Figure 2: MHSDS data model (version 5)

Source: NHS Digital (used with permission)

Another challenge in working with the dataset is that, in order to link MHSDS records to records of the same patient in other datasets, additional so-called ‘bridging files’ may be needed. There are different versions of the bridging file for different versions of MHSDS (see Figure 3).
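In practice, linking through a bridging file is a two-step lookup: from the MHSDS identifier to a common identifier, then from that identifier to the other dataset. The snippet below is a minimal Python sketch of this idea. The structure of the bridging file, the identifier names and the toy records are all assumptions, and real extracts will differ.

```python
# Hypothetical bridging file: (MHSDS version, MHSDS person ID) -> common person ID.
bridging = {
    ("v4", "MH-111"): "P-1",
    ("v5", "MH-900"): "P-1",  # same person, different ID in a newer version
    ("v5", "MH-222"): "P-2",
}

# Made-up MHSDS referrals drawn from two dataset versions.
mhsds_referrals = [
    {"version": "v4", "mhsds_id": "MH-111", "referral_id": "R1"},
    {"version": "v5", "mhsds_id": "MH-900", "referral_id": "R2"},
    {"version": "v5", "mhsds_id": "MH-222", "referral_id": "R3"},
    {"version": "v5", "mhsds_id": "MH-333", "referral_id": "R4"},  # no bridging entry
]

# Made-up records from another dataset, keyed by the common person ID.
hospital_admissions = {"P-1": ["2021-03-02"], "P-2": []}


def link(referrals, bridging, other):
    """Attach the common person ID and the other dataset's records to each referral."""
    linked, unmatched = [], []
    for ref in referrals:
        person = bridging.get((ref["version"], ref["mhsds_id"]))
        if person is None:
            unmatched.append(ref)  # keep these for a match-rate check
            continue
        linked.append({**ref, "person_id": person, "admissions": other.get(person, [])})
    return linked, unmatched


linked, unmatched = link(mhsds_referrals, bridging, hospital_admissions)
distinct_people = {row["person_id"] for row in linked}
```

Counting people by the common identifier rather than by the version-specific one avoids counting the same person twice. Keeping the unmatched rows makes it possible to report how many records could not be linked.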

Figure 3: Working with multiple versions of MHSDS and bridging tables

Source: Souheila Fox, West Yorkshire Health and Care Partnership (used with permission)

There are also several other quirks to be aware of:

· The same patient may have different identifiers in different dataset versions. Patient identifiers are listed in the bridging files, which as noted above are needed to link MHSDS to other datasets. Not resolving these before analysis can result in double counting of individuals.

· Duplication of records is common due to a quirk in the data submissions process: records can be re-submitted each month. For example, records on referrals can be re-submitted each month for as long as the referral is open. This needs to be accounted for, as it may otherwise appear as though a patient has several active referrals. The Structured Query Language (SQL) code described below deduplicates the dataset. The data processor may also add a specific flag to your data showing which duplicate records need to be removed.

· There may be ‘non-valid’ data points. Up until the end of the fiscal year 2019/20, some data rows from older versions of MHSDS were marked as ‘non-valid’ (for various reasons), using a submission flag in the data. It could be named ‘IC_USE_Submission_Flag’, ‘z_SubmissionFlag’ and so on, depending on how you received these data and how they had been pre-processed. Those observations should be filtered out using the flag.
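Two of the cleaning steps above, dropping flagged rows and collapsing re-submitted referrals, can be sketched in a few lines. The Python snippet below is a minimal illustration, not the SQL code developed in West Yorkshire. The field names, the flag value that marks a valid row (`"Y"`) and the records are assumptions, so check them against your own extract.

```python
def clean_referrals(rows, flag_field="submission_flag", valid_value="Y"):
    """Drop rows flagged as non-valid, then keep the latest submission per referral.

    Assumes each row has `person_id`, `referral_id` and a sortable
    `submitted_month` (e.g. "2019-11"); these names are hypothetical.
    """
    latest = {}
    for row in rows:
        if row[flag_field] != valid_value:
            continue  # filter out non-valid submissions
        key = (row["person_id"], row["referral_id"])
        if key not in latest or row["submitted_month"] > latest[key]["submitted_month"]:
            latest[key] = row  # keep the most recent valid submission
    return list(latest.values())


# Made-up records: referral R1 is re-submitted monthly while it stays open.
raw = [
    {"person_id": "P-1", "referral_id": "R1", "submitted_month": "2019-10", "submission_flag": "Y"},
    {"person_id": "P-1", "referral_id": "R1", "submitted_month": "2019-11", "submission_flag": "Y"},
    {"person_id": "P-1", "referral_id": "R1", "submitted_month": "2019-12", "submission_flag": "Y"},
    {"person_id": "P-2", "referral_id": "R2", "submitted_month": "2019-11", "submission_flag": "Y"},
    {"person_id": "P-2", "referral_id": "R2", "submitted_month": "2019-12", "submission_flag": "N"},
]

cleaned = clean_referrals(raw)
```

The five raw rows collapse to two referrals, one per person. Filtering on the flag before deduplicating matters, because otherwise a non-valid row could be kept as the latest submission.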

Our analysts in Leeds (in the West Yorkshire Health and Care Partnership) used SQL code to deal with the issues we set out above and we have uploaded it onto our GitHub. You may also find it useful if you are working with multiple versions of MHSDS.

The University of York runs an annual in-person workshop on MHSDS, which is another great way to get started with this dataset and get to grips with common pitfalls.

What are the limitations of administrative data and what other datasets are available?

In this guide, we have focused on administrative data from specialist, secondary mental health services provided in the NHS but this is not the only source of mental health data.

For those interested in inpatient and outpatient hospital care, Hospital Episode Statistics (HES) include data on hospital admissions and A&E attendances related to mental health problems. As with MHSDS, there are both open and secure-access versions of HES. HES is a useful resource for capturing activity related to crisis episodes, for example those relating to self-harm or substance abuse. In addition, the Fingertips tool, which Public Health England originally developed, contains useful estimates of the prevalence of different mental health conditions, including depression and anxiety.

Note that the above datasets only cover people who are in contact with services. They miss people who are not receiving help, who make up about 7 out of 8 people with a mental health problem. They also miss patients receiving privately funded care.

The best way of getting population-wide estimates on people’s mental health is to use large, nationally representative surveys. Most of these survey statistics are available through the UK Data Service. For year-on-year trends, we recommend using longitudinal surveys such as Understanding Society.

We hope that you have found this guide useful, and that it makes working with mental health data from the NHS easier. If you have any comments on the guide or would like to suggest additional open-source solutions, please get in touch with us (@SebastienPeytr2 or ndl@health.org.uk).

We would like to thank Souheila Fox, Ben Alcock and Alex Brownrigg at the West Yorkshire Health and Care Partnership for their work on the MHSDS cleaning pipeline.

Special thanks to Fiona Grimm for her contributions. We are part of the Data Analytics team at the Health Foundation.