Predicting missed hospital appointments using machine learning - what are the risks?

Fiona Grimm
The Health Foundation Data Analytics
10 min read · Aug 2, 2019
Image credit: The Health Foundation.

At a time of rising demand for health care but funding shortfalls, the use of artificial intelligence (AI) and machine learning (ML) promises to deliver much-needed efficiencies in the UK, while maintaining or even improving the quality of patient care. A salient issue these tools could target is missed appointments — if a patient fails to attend without notice, it potentially wastes NHS staff time and resources. So, using machine learning to identify patients at risk and to prevent no-shows seems like an easy win.

In our team at the Health Foundation, we are working on innovative ways of using data from across the health system, and beyond, to understand and improve the quality of health and care. As part of this, we also want to spend more time thinking about the social impact of data-driven tools. When it comes to missed appointments, what are the potential pitfalls in developing prediction models? And, if we want to maximise attendance, are we even asking the right questions?

Using the example of missed outpatient appointments in the NHS, we’re going to use this post to explore some of the issues around data bias, impact and evaluation using existing frameworks, such as the 20 Critical Questions on ML and AI research for Patient Benefit developed at the Alan Turing Institute.

Current state of outpatient appointments in the NHS

Outpatient appointments are used to provide monitoring or specialist care where a hospital admission isn’t necessary, most commonly for ophthalmology, trauma and orthopaedics, and physiotherapy. In 2017/18, close to 120 million outpatient appointments were scheduled in England. The number of appointments varied by age and sex, as shown below, and we know that some patient groups will have been represented more frequently than others. For example, patients who live in more deprived areas are more likely to have chronic conditions that require outpatient care.

NHS Hospital outpatient appointments in England in 2017/18, shown by patient age and sex. Data source: NHS Digital. The R code used to create both graphs can be found on GitHub.

During the same year, 6.7% of outpatient appointments in England — around one in 15 — were not attended, and some estimates put the cost to the NHS at around £1bn. From an operational point of view, missed appointments are a waste of resources in already overstretched services and contribute to longer waiting times for all patients. For an individual patient, it could also be a missed opportunity to receive a diagnosis or timely care and treatment.

NHS Hospital outpatient appointments in England over time, shown by attendance type. Data source: NHS Digital.

Have there been any successes in reducing missed appointments?

So far, there’s no perfect solution. Many NHS providers use text messages as low-cost appointment reminders. Although some evidence suggests that they do indeed lead to better attendance, their efficacy is limited. Text messaging has obvious pitfalls: not all patients have mobile phones or are comfortable with sensitive information being sent this way, and some parts of the UK still have poor mobile signal. And mobile phone numbers are not always logged on patients’ records — one UK hospital found it held mobile numbers for only 20% of its patients.

Another common strategy is to offset missed appointments by pre-emptively overbooking outpatient clinics. In practice this is not nearly as straightforward as it sounds, as the fraction of missed appointments varies over time. To work effectively, this approach requires precise estimates of how many appointments will be missed; overbooking blindly can actually lead to even longer waiting times.
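To illustrate why blind overbooking misfires, here is a minimal R sketch (all numbers are hypothetical, not drawn from NHS data). A clinic that adds slots based on a single average DNA (‘did not attend’) rate ends up with excess patients in weeks when attendance is better than average, and unused capacity when it is worse:

```r
# Hypothetical example: overbooking against a fixed average DNA rate
slots    <- 40                     # appointments available in one clinic session
dna_rate <- c(0.04, 0.067, 0.10)   # plausible weekly DNA rates around a 6.7% average

expected_no_shows <- slots * dna_rate
overbooked_slots  <- ceiling(slots * 0.067)  # extra slots added using the average rate only

data.frame(
  weekly_dna_rate   = dna_rate,
  expected_no_shows = expected_no_shows,
  overbooked_slots  = overbooked_slots,
  # patients left without a slot when fewer appointments are missed than planned for
  excess_patients   = pmax(0, overbooked_slots - expected_no_shows)
)
```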

Interestingly, while we’ve noticed efforts to implement more flexible appointment booking systems at hospitals, there has been less emphasis on changing practices to prevent missed appointments, such as improving patient transport services or making it easier for patients to rearrange appointments. Better ways to prevent and manage missed appointments in a more targeted and resource-effective manner are therefore still needed.

A fresh approach? Predicting no-shows

A growing number of machine learning models can now assess the individual risk of a patient missing an appointment. Several of these are US examples, including a large study attempting to predict patient no-shows across a range of clinical specialties, with varying degrees of success. Most recently in the UK, a team from University College London Hospital (UCLH) described how an algorithm can accurately predict whether a patient is at risk of missing a diagnostic MRI, to an enthusiastic reception on social media from UK health care leaders and policymakers.

Do these new approaches carry risks?

There are enormous opportunities to benefit both patients and the NHS with these new models. We already know that patients living in deprived areas are more likely to miss appointments and that other personal characteristics, such as age, sex and ethnicity, also play a role. Therefore, if done fairly and equitably by taking into account the wider characteristics of who is missing appointments, predicting and preventing non-attendances is an opportunity to reduce existing inequalities in health and wellbeing.

But we need to be aware that there is also a risk of causing unintentional harm. The real-world data used to train these new models is skewed by existing structural and social inequalities, such as disparities in health status or differences in access to health care, which we will discuss in more detail later. The same inequalities will exist in any system where the algorithm is used, so applying these models in practice could inadvertently perpetuate or exacerbate them.

There are at least two ways in which algorithmic bias could cause harm:

The obvious way in which these models could disadvantage certain patient groups is by unfairly allocating or withholding resources intended to prevent missed appointments, as this would lead to an uneven distribution of the benefits. In practice, an algorithm might systematically prioritise certain patient or demographic groups for an intervention, such as a phone reminder, while incorrectly flagging patients with real needs as ‘low risk’.

But even if the algorithm is technically working well, labelling certain patient groups as likely no-shows can perpetuate negative social and cultural stereotypes. As a result, some patients could be penalised, or even stigmatised, without the underlying issues behind the missed appointment ever being addressed. This is not an entirely hypothetical scenario, as we already know that patients living in rural or deprived areas and patients with mental health conditions are more likely to miss appointments. At the same time, the language used around missed appointments is often negative with an undertone of blame (‘skipping appointments’, ‘unreliable patients’).

How fair and representative is our data, and what might be the sources of bias?

Most data used to train prediction models for patient non-attendance is re-purposed data. Often it was originally collected (and processed, censored and curated) for completely different purposes, such as direct patient care, costing or performance management. Any dataset, including what it does and does not contain, its accuracy and completeness, has therefore inevitably been shaped by (often not so) subtle external factors:

  • Lack of incentive to provide full data. A good example of this is NHS national outpatient data. We generally know which treatment specialty patients were seen in, but only around 5% of appointments have a diagnosis recorded. As is usually the case, the reason why data might be missing cannot be inferred from the data alone. But it becomes obvious when we think about the main purpose of this data, which is commissioning, or in other words, ensuring hospitals get paid. Diagnoses are not required to calculate payment according to the National Tariff, so there is little incentive to record them in the first place.
  • Underserved populations. Unlike large proportions of missing data for a specific variable, non-random missingness — in terms of who is included in the dataset — can introduce subtler biases that are harder to detect. For example, to recognise that certain patient groups are underrepresented, possibly due to pre-existing differences in health care access, requires much more careful analysis. Nevertheless, these associations can be learned or amplified during model training and subsequently lead to biased decision making.
  • Data maintenance processes. The way in which data is curated, stored or archived could introduce variation in completeness or accuracy between patient groups, or the non-random inclusion or exclusion of patients. For example, data on patients who died might have been archived, so these patients would no longer appear in the dataset.
  • Minimised personal information. In some circumstances, sensitive personal or social information is not collected or made available for modelling purposes. This is usually done with good intentions, in an effort to prevent discrimination and protect patients’ privacy. However, we now know that not only does this fail to prevent biased decisions, it can backfire and make discrimination harder to detect. This is because in real-world data, certain personal characteristics tend to be correlated with each other. In this way, other variables present in the training data can contain information about patient characteristics that are not. To use the recent UCLH paper as an example, a booking for a prostate MRI scan will be highly correlated with being a male patient. So, even though no information about a patient’s gender was present, an algorithm might be able to ‘fill in’ this missing information using the scan type as a proxy, and so learn to discriminate based on information it has not actually seen (see the sketch after this list).
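To make this proxy effect concrete, here is a small simulated R sketch of our own; the effect sizes, the 30% prostate-MRI share and the assumption that male patients miss more appointments are all invented for illustration, and this is not the UCLH analysis. A logistic regression trained without the sex variable still recovers the sex-linked difference in attendance through the scan type:

```r
# Simulated illustration of proxy discrimination (hypothetical numbers)
set.seed(42)
n    <- 10000
male <- rbinom(n, 1, 0.5)

# Prostate MRIs are only ever booked for male patients
scan <- ifelse(male == 1 & runif(n) < 0.3, "prostate_mri", "other_scan")

# Assume, purely for illustration, that male patients miss more appointments
p_miss <- plogis(-2 + 0.8 * male)
missed <- rbinom(n, 1, p_miss)

# A model trained WITHOUT the sex variable still picks up the effect via
# scan type: the coefficient on 'prostate_mri' comes out positive
summary(glm(missed ~ scan, family = binomial))$coefficients
```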

Before creating a model, we should ask whether we have enough information to detect bias, and think about proactive approaches to finding potentially discriminatory decisions. Guidance on accepted best practices for clinical applications of predictive models is still emerging. But there are efforts to develop methodologies, such as discrimination-aware data mining, where discriminatory patterns are flagged rather than suppressed.
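One simple proactive check along these lines is to compare a model’s error rates across patient groups before deployment, so that uneven performance is surfaced rather than hidden. Below is a minimal R sketch of our own; the ‘urban’/‘rural’ grouping, the labels and the predictions are all simulated for illustration:

```r
# Minimal bias audit: error rates of a no-show classifier by patient group
audit_by_group <- function(actual, predicted, group) {
  do.call(rbind, lapply(split(seq_along(actual), group), function(idx) {
    a <- actual[idx]; p <- predicted[idx]
    data.frame(
      n                   = length(idx),
      false_positive_rate = sum(p == 1 & a == 0) / max(1, sum(a == 0)),
      false_negative_rate = sum(p == 0 & a == 1) / max(1, sum(a == 1))
    )
  }))
}

# Simulated example: rural patients miss more appointments, but the
# classifier flags urban patients more readily than rural ones
set.seed(1)
group     <- sample(c("urban", "rural"), 500, replace = TRUE)
actual    <- rbinom(500, 1, ifelse(group == "rural", 0.12, 0.06))
predicted <- rbinom(500, 1, ifelse(group == "urban", 0.15, 0.05))

audit_by_group(actual, predicted, group)
# rural patients show a higher false negative rate: real risk goes unflagged
```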

Of course, none of these issues applies exclusively to data on missed hospital appointments. In any situation where we want to ensure fair algorithmic decision making, we need a deep understanding of these data limitations.

Could patients benefit from more meaningful performance metrics?

It’s not only the data we feed in, but also how we computationally optimise and analytically evaluate model performance that matters. Ideally, the technical performance metric we choose should be a good proxy for the practical benefit we want to maximise. This leap from computation to practice is a crucial step to make sure predictions are valuable once the model is implemented.

Until recently, most models were mainly evaluated in terms of how accurately they predicted non-attendance, expressed as true positive rate (sensitivity) and true negative rate (specificity). However, cost-effectiveness, the potential for savings and increased operational efficiency are becoming more of a focus. This is a step in the right direction, as it is more reflective of the practical constraints of running clinics. For example, the UCLH study explicitly aimed to reduce the number of patients they needed to call in order to prevent one missed appointment.
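As a rough sketch of how such an operational metric connects to classifier performance (the figures below are hypothetical, not taken from the UCLH study): if only some flagged patients would actually miss their appointment, and a reminder call only prevents some of those misses, the number of patients to call per prevented no-show grows accordingly.

```r
# Hypothetical example: from classifier precision to 'number needed to call'
precision     <- 0.30  # share of flagged patients who would actually not attend
effectiveness <- 0.50  # share of would-be no-shows that a reminder call prevents

number_needed_to_call <- 1 / (precision * effectiveness)
number_needed_to_call  # ~6.7 calls per prevented missed appointment
```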

Healthcare costs are obviously important, but we also need to evaluate whether predictions improve outcomes for patients — a crucial test for any health care intervention. Care quality and health benefits are admittedly more complex to model or forecast, but they matter. We’d like to see people develop new ways to model the likely direct and indirect patient benefit and the impact on outcomes. We believe decisions on metrics should not be made exclusively by analysts but by diverse groups including patients and clinicians, to make sure they are relevant for the operational context and the people involved.

Impactibility and fair outcomes: will an intervention work, and work for everyone?

Knowing that a patient is likely to miss an appointment will not, by itself, make them attend. Therefore, even the most accurate model will only be truly valuable if its predictions allow us to take action and change the outcome in a way that is positive for the patient and the system; this is sometimes called ‘impactibility’. This will also depend on how the model is used in practice, which needs to be tested in collaboration with health care professionals and patients. Interestingly, a Cochrane review of eight randomised controlled trials (RCTs) assessing mobile phone reminders found that none had assessed health outcomes, perceptions of safety of the intervention, potential harms or adverse effects.

We also need to learn from past mistakes. In other parts of the health care system, risk prediction algorithms have been used for a while. A prominent example is the Predictive Risk Stratification Model (PRISM), a tool to help GPs identify patients who are at high risk of emergency admissions. A recent evaluation of its costs and effects found it did not produce any benefits to either patients or the NHS, highlighting that risk stratification is only impactful in combination with effective interventions.

We also need to understand the barriers that prevent attendance. For example, patients living in rural or deprived areas and patients with mental health conditions are more likely to miss appointments, so any intervention should seek to understand and address these impediments, not embed them or make them worse.

How will technology help us to address the underlying causes of missed appointments?

While data-driven approaches and prediction models are a step in the right direction, there is room for improvement in how we approach the limitations of the data, how we can measure patient benefit and how we evaluate impact. But we also need to recognise that predictive modelling is not enough to understand the patient experience, and who is not attending and why.

The reasons why people don’t attend their appointments are complex and sometimes outside their direct control. We already know about some contributing factors, such as mental health conditions, lack of transport and accessibility, administrative problems, long waiting times and frustration with outpatient services. Qualitative and mixed-methods research could help us understand the underlying drivers, the effectiveness and the wider social impact of any intervention, and help us design evaluation strategies and equity audits to address these concerns.

Ultimately, we want every patient to have the right kind of support to enable them to attend their health care appointments. But what if, by trying to predict missed appointments, we are only treating a symptom? We need to figure out how technology can help us make sure that every patient has access to care that works for them. This might look different for different patients, so we’ll have to work it out in collaboration with the clinical and patient communities these new tools will serve.

This blog was written jointly with Emma Vestesson (@Gummifot) and Sarah Deeny (@SarahDeeny). All three of us are part of the Data Analytics team at the Health Foundation.
