On surgery survival research: a simple case for a data-driven NHS
In early 2013 a chance conversation, as often happens here at St George’s, led me to join forces with a team of medical statisticians and cardiovascular clinicians to do a bit of data analysis.
The goal of this analysis was to confirm an observation made by medics: that for a rare type of surgery called “Elective Open Suprarenal Aneurysm Repair”, there are meaningful regional variations in death rates and survival. The full paper is available, thanks to Open Access, on PLOS One.
I was recently reflecting on this research experience. We often think of data in medicine as something that will “find the next cure for cancer”. I think, however, that most of the gains in medicine come from the ability to use data to improve performance and outcomes. I found myself thinking that our research provides some good pointers towards a simple question: what could data-driven methods do to make the NHS better?
As I have written elsewhere regarding Open Data, a problem-driven approach is what is needed to make things move; in a medical context, the mere use of Data (which is often not Open) could be what we need to cope with a shortage of resources and growing challenges.
A rare type of surgical procedure
I will not overload this article with medical information. All you need to know about this procedure is that it addresses a specific type of aortic aneurysm, a bulge in a large blood vessel that can cause it to burst and is often fatal. It is elective open surgery: elective means that the patient chooses to have it (i.e. it is not performed as an emergency), and open means that it is done by (sorry to be graphic here) cutting the patient open rather than by endoscopic means.
It is a relatively rare procedure. Between 2000 and 2010, the time frame our research analysed, 793 Suprarenal Aneurysm Repairs were performed in England. What matters to clinical staff when talking about surgery is how it affects survival and quality of life, and whether any factors have an impact on them. Our research dealt specifically with death risk and survival rates.
Here comes the data
Our research was based on a massive dataset of in-patient hospital events owned by NHS Digital (then the Health and Social Care Information Centre): the Hospital Episode Statistics dataset, or “HES”. When I say massive I am not exaggerating: each patient who goes to hospital is recorded in the dataset together with personal data, diagnostic information, prognosis and outcome. There are genuinely millions of records. Our subset had literally hundreds of columns (the structure of HES being a rather horizontal, flat CSV file).
Access to HES is heavily regulated due to the high sensitivity of the data it contains. I had to go through an “approved researcher” procedure in order to gain access to it, even though I was working with a pre-approved team. However, any medic can apply to access subsets of it, provided they work under IG Toolkit compliance. It is a mine of medical knowledge.
Our research outcomes
The results we obtained were complex, and I will heavily summarise them by saying that:
- the location where the surgery is performed has an impact on the patient’s survival in the period immediately following the procedure; in other words, if you have an aneurysm repair in Sheffield, you are more likely to die than if you have it in Oxford. Specifically, death rates are higher in the North East, North West, Yorkshire and the Humber, and the West Midlands than in the East Midlands, London, South East, South Central and South West.
- in the 30-day window following the procedure, regional variation is the single most important predictor of survival, much stronger than social deprivation.
- similar results are evident after five years: location is a stronger predictor of survival than social deprivation or other factors.
- for patients who survived the first 30 days, location was no longer a significant predictor of survival.
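To make the first finding a bit more concrete, here is a minimal sketch of how crude 30-day death rates could be compared across regions. It is not the analysis from the paper: it makes no adjustment for factors such as social deprivation, and the records, region labels and field names are all made up for illustration (they are not HES columns or real results).

```python
from collections import defaultdict

# Entirely synthetic records: (region, days from surgery to death),
# with None meaning the patient was alive at the end of follow-up.
# Region labels and values are illustrative assumptions, not HES data.
TOY_RECORDS = [
    ("Region A", 12),
    ("Region A", None),
    ("Region A", 400),
    ("Region A", None),
    ("Region B", 5),
    ("Region B", 20),
    ("Region B", None),
    ("Region B", 900),
]


def thirty_day_mortality(records, window_days=30):
    """Return {region: share of patients who died within `window_days` of surgery}."""
    operated = defaultdict(int)
    died_early = defaultdict(int)
    for region, days_to_death in records:
        operated[region] += 1
        if days_to_death is not None and days_to_death <= window_days:
            died_early[region] += 1
    return {region: died_early[region] / operated[region] for region in operated}


rates = thirty_day_mortality(TOY_RECORDS)
for region, rate in sorted(rates.items()):
    print(f"{region}: {rate:.0%} died within 30 days")
```

With fewer than 800 procedures over ten years, crude rates like these would be very noisy for any single region, which is one reason a proper statistical model is needed before drawing conclusions.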
How could the NHS use such data
What fascinates me about these results is that they trigger loads of questions (some of which we tried to address in the paper):
Why are there regional variations? What causes them? Are there some hidden factors that make the regional variation emerge? Are the variations worrying or just natural? Would other variations (not regional ones) be more acceptable? Do patients who move have better or worse chances? What could we do to improve the situation and make the patients residing in different areas more equal in their chances of survival?
These are examples of questions that a data-driven health service should be asking itself daily. Unfortunately, most of the discussion around health data has been focussed on privacy and data protection, for example in the failed care.data project — very important topics, of course, but I do believe that there is a mine of knowledge to be tapped in data that is already collected and used ethically by researchers around the country.
There is potential in HES, for example, to start questioning situations like the one I have described, which involved a tiny number of procedures over a number of years. Similar examples abound in the literature. A forward-looking NHS should follow suit and use this academic knowledge or, better, start developing in-house skills to monitor, predict, gather knowledge and take action based on it.
Does it sound impossible? It is not. About 10 years ago I had my first job experience in an Italian company called Noemalife. They developed software for hospital labs, running massive Oracle databases to power the whole analytical system. Among other products, we had devised a system to monitor antibiotic resistance among in-patients in real time. Alerts would be sent when MRSA was detected.
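The core of such an alerting rule is simple. The sketch below is purely illustrative and is not how the Noemalife product worked: the `LabResult` structure, its field names and the sample results are all hypothetical, and a real laboratory system would rely on coded organisms and susceptibility tests rather than free-text matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabResult:
    """A hypothetical, heavily simplified microbiology result."""
    patient_id: str
    ward: str
    organism: str
    resistant_to: frozenset


def needs_mrsa_alert(result):
    """Flag Staphylococcus aureus isolates reported as methicillin-resistant."""
    drugs = {drug.lower() for drug in result.resistant_to}
    return result.organism.lower() == "staphylococcus aureus" and "methicillin" in drugs


def alerts_for(results):
    """Return one alert message per result that matches the MRSA rule."""
    return [
        f"MRSA alert: patient {r.patient_id} on ward {r.ward}"
        for r in results
        if needs_mrsa_alert(r)
    ]


# Made-up incoming results, for illustration only.
incoming = [
    LabResult("P001", "Ward 3", "Staphylococcus aureus", frozenset({"methicillin"})),
    LabResult("P002", "Ward 3", "Staphylococcus aureus", frozenset()),
    LabResult("P003", "Ward 7", "Escherichia coli", frozenset({"ampicillin"})),
]

for message in alerts_for(incoming):
    print(message)
```

The rule itself is the easy part; the value comes from running it continuously against data the hospital already collects.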
If this could happen 10 years ago, I am sure that there is massive scope for this now that we have better technology and better understanding.