Sickest-first policy & predictive models for liver transplant candidates in the US

Hoang Thien Ly
Published in ResponsibleML
Nov 10, 2021

The liver (lifer in Old English) is the heaviest internal organ in the human body, and it quietly works 24 hours a day.

What does the liver do? It performs 500 tasks to keep the body healthy. Anything that is eaten or consumed, whether it’s food, alcohol, medicine or toxins, gets filtered by the liver, says Hellan Kwon, M.D., of the University of Michigan.

Oh, wait, wait! Am I reading a blog on machine learning? If you have that unusual feeling, my dear mate, be patient and fasten your seatbelt. In the next couple of minutes, we will set off on a journey through innovative ideas in liver transplantation, and see how you, armed with basic statistics, ML models and zero days of medical school training, can potentially help save more liver disease patients every year!

Liver Transplant problem & Sickest-first policy


In the US, patients with life-threatening liver disease who need a transplant must sign up for a waiting list. There is a single waiting list for the entire country, although patients may register at medical centers in any of the 11 UNOS regions. Note also that waiting time is unpredictable (some patients are transplanted after weeks of waiting, others only after years), and it varies by region and by severity of illness.

(Source: UNOS.org, United Network for Organ Sharing)

What is the sickest-first policy?

The sicker a patient is, the higher he/she is placed on the waiting list (up to the top), so he/she is transplanted earlier once a compatible liver is found, provided he/she is still eligible for surgery.

Some other criteria are also taken into consideration to lower the risk of failure:

  • Blood type,
  • Body size (e.g., a bigger body requires a bigger liver),
  • Geography (the closer the candidate is to the deceased donor, the better the liver is preserved),
  • ….
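To make the ranking idea concrete, here is a minimal Python sketch of sickest-first ordering. It is a toy, not the UNOS matching algorithm: it assumes a simplified ABO compatibility table, ranks only by MELD Score with waiting time as a hypothetical tie-breaker, and ignores body size, geography and the other criteria above. `Candidate`, `rank_candidates` and the sample patients are made up for illustration.

```python
from dataclasses import dataclass

# Simplified ABO compatibility (assumption): donor blood type -> recipient blood types.
ABO_COMPATIBLE = {
    "O": {"O", "A", "B", "AB"},
    "A": {"A", "AB"},
    "B": {"B", "AB"},
    "AB": {"AB"},
}


@dataclass
class Candidate:
    name: str
    blood_type: str
    meld: int          # illness severity: higher means sicker
    days_waiting: int  # hypothetical tie-breaker


def rank_candidates(candidates: list[Candidate], donor_blood_type: str) -> list[Candidate]:
    """Sickest-first ordering of blood-type-compatible candidates (toy version)."""
    eligible = [c for c in candidates if c.blood_type in ABO_COMPATIBLE[donor_blood_type]]
    # Highest MELD first; among equal scores, the longer-waiting candidate first.
    return sorted(eligible, key=lambda c: (-c.meld, -c.days_waiting))


waiting_list = [
    Candidate("P1", "A", 22, 120),
    Candidate("P2", "B", 35, 10),
    Candidate("P3", "AB", 22, 300),
    Candidate("P4", "O", 15, 400),
]
print([c.name for c in rank_candidates(waiting_list, "A")])  # ['P3', 'P1']
```

With a type-A donor liver only the A and AB candidates are eligible; both have a MELD Score of 22, so the one who has waited longer comes first.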

So much for the doctor's side of the story; now, what story does the data tell?

The statistics raise a sequence of questions for us:

  1. How should we assess illness severity to prioritize patients waiting on the list?
  2. How can we allocate livers more effectively to reduce the number of patients unexpectedly removed from the list?

MELD-Score and optimistic results

In 2002, UNOS adopted the MELD Score to assess patients' illness severity and rank them on the waiting list.

  • MELD stands for Model for End-Stage Liver Disease; the score estimates the chance that a patient with chronic liver disease survives the next 3 months.
  • The MELD Score ranges from 6 to 40.
  • The higher the score, the sicker the patient and the higher his/her position on the waiting list.
  • It has several variants: MELD, MELD-Na (Na for sodium, not for missing values :)), MELD exceptions (for liver cancer, …) and PELD (for pediatric patients).

Based on the MELD Score, we can predict a patient's 3-month mortality:

Predicting Mortality

For instance, a MELD Score of 9 corresponds to a 3-month mortality of only around 2%.

The MELD Score can also be used to predict a patient's 1-year survival rate.

Example: at a MELD Score of 10, 1-year survival is 90% on the waiting list but drops to 83% after transplant, so transplanting this patient brings no benefit. For a patient with a MELD Score of 30, however, transplant raises 1-year survival from 21% to an astonishing 71%!
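The "no benefit" argument boils down to a subtraction. The sketch below uses only the four survival figures quoted above and, as a simplifying assumption, defines the benefit as post-transplant 1-year survival minus waiting-list 1-year survival, in percentage points. `ONE_YEAR_SURVIVAL` and `transplant_benefit` are hypothetical names.

```python
# 1-year survival (%) quoted in the text: MELD Score -> (on waiting list, after transplant).
ONE_YEAR_SURVIVAL = {
    10: (90, 83),
    30: (21, 71),
}


def transplant_benefit(meld_score: int) -> int:
    """Survival gain from transplant in percentage points (negative means harm)."""
    waitlist, post_transplant = ONE_YEAR_SURVIVAL[meld_score]
    return post_transplant - waitlist


for score in sorted(ONE_YEAR_SURVIVAL):
    print(score, transplant_benefit(score))
# prints:
# 10 -7
# 30 50
```

A negative value means the patient is, on average, better off waiting; this is the reasoning behind prioritizing sicker patients.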

As the patient's condition changes, so does the MELD Score. Depending on how critical the disease is, doctors use the latest MELD Score to decide how often to order new lab tests: for instance, every week for a score of 25 or higher, but every three months for a score of 11–18 [More info at: Recalculation MELD Score].
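The lab-update rule is essentially a lookup by score band. Only the two bands mentioned in the text (25 or higher: weekly; 11–18: every three months) come from this post; the 19–24 (every 30 days) and 10-or-lower (yearly) bands are my assumption based on the OPTN recertification schedule and should be checked against the current policy. The function name is hypothetical.

```python
def recertification_interval_days(meld_score: int) -> int:
    """How often (in days) new lab values are needed for a given MELD Score (sketch)."""
    if meld_score >= 25:
        return 7    # every week (stated in the text)
    if meld_score >= 19:
        return 30   # assumed band for 19-24; verify against current OPTN policy
    if meld_score >= 11:
        return 90   # roughly every three months (stated in the text)
    return 365      # assumed band for 10 or lower; verify against current OPTN policy


print(recertification_interval_days(27), recertification_interval_days(15))  # 7 90
```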

We have said a lot about the MELD Score, but how is it actually calculated?

MELD Score formulation
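For reference, the original MELD as commonly published by UNOS is the first expression below, with bilirubin and creatinine in mg/dL. Lab values below 1.0 are set to 1.0, creatinine is capped at 4.0 (and set to 4.0 for patients on dialysis), and the result is rounded and bounded to the 6–40 range. The MELD-Na variant adds serum sodium (Na, in mmol/L, bounded to 125–137) and is applied when MELD exceeds 11. Treat this as a reconstruction and verify it against the current OPTN policy.

```latex
\begin{aligned}
\mathrm{MELD} &= 3.78\,\ln(\mathrm{bilirubin}) + 11.2\,\ln(\mathrm{INR}) + 9.57\,\ln(\mathrm{creatinine}) + 6.43 \\
\mathrm{MELD\text{-}Na} &= \mathrm{MELD} + 1.32\,(137 - \mathrm{Na}) - 0.033\,\mathrm{MELD}\,(137 - \mathrm{Na})
\end{aligned}
```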

The year 2002, when the MELD Score was first used to rank patients on the waiting list, marked a milestone in the history of medicine. The score delivered significant results right away:

The MELD-based allocation system was immediately successful, leading to the first-ever reduction in the number of waiting list candidates and a 15% reduction in mortality among those on the waiting list. — Freeman, R., Wiesner, R., Edwards, E., Harper, A., Merion, R., Wolfe, R.: Results of the first year of the new liver allocation plan. Liver Transplant, 10, 7–15 (2004)

A 15% reduction in mortality among candidates on the waiting list means that over a thousand lives were saved thanks to the MELD Score!

But hold on: does this score have drawbacks? What can we expect from a formula with only 4 or 5 parameters?

Bounding bilirubin, INR and creatinine below at 1.0 before the log transformation can be problematic: a large percentage of waiting-list candidates have creatinine levels below 1.0, and values below this threshold can reflect different levels of kidney function. (Sharma, P., Schaubel, D., Sima, C., Merion, R., Lok, A.: Re-weighting the model for end-stage liver disease score components. Gastroenterology, 135, 1574–1581 (2008))
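To see the threshold problem concretely, here is a minimal Python sketch of the original MELD formula. It assumes the commonly published UNOS coefficients and bounds (values below 1.0 floored to 1.0, creatinine capped at 4.0, dialysis treated as creatinine 4.0, result clamped to 6–40); it is an illustration, not a clinical calculator.

```python
import math


def meld(bilirubin: float, inr: float, creatinine: float, on_dialysis: bool = False) -> int:
    """Original MELD Score (sketch); bilirubin and creatinine in mg/dL."""
    # Values below 1.0 are floored at 1.0, so their logarithm is never negative.
    bilirubin = max(bilirubin, 1.0)
    inr = max(inr, 1.0)
    # Creatinine is capped at 4.0; patients on dialysis are assigned 4.0.
    creatinine = 4.0 if on_dialysis else min(max(creatinine, 1.0), 4.0)
    score = (3.78 * math.log(bilirubin) + 11.2 * math.log(inr)
             + 9.57 * math.log(creatinine) + 6.43)
    # Round to a whole number and clamp to the 6-40 range.
    return max(6, min(40, round(score)))


# Creatinine 0.5 vs 0.9 mg/dL: different kidney function, identical score.
print(meld(1.0, 1.0, 0.5), meld(1.0, 1.0, 0.9))  # 6 6
```

Because both creatinine values are floored to 1.0, the two candidates receive the same score even though their kidney function differs, which is exactly the issue Sharma et al. point out.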

The correlation between MELD and outcome is not equally strong for all patients. For some patients, MELD may not accurately reflect the severity of their condition.

For these reasons, it is worth asking whether ML models can do better.

Applying ML Techniques

We will analyze two works, based mainly on:

  • Optimal classification trees [1] (Bertsimas, D., Kung, J., Trichakis, N., Wang, Y., Hirose, R., Vagefi, P.: Development and validation of an optimized prediction of mortality for candidates awaiting liver transplantation. Am. J. Transplant. 19, 1109–1118 (2018))
  • Logistic regression and gradient-boosting ensembles of decision trees [2] (Byrd J., Balakrishnan S., Jiang X., Lipton Z.C. (2021) Predicting Mortality in Liver Transplant Candidates. In: Shaban-Nejad A., Michalowski M., Buckeridge D.L. (eds) Explainable AI in Healthcare and Medicine. Studies in Computational Intelligence, vol 914. Springer, Cham. https://doi.org/10.1007/978-3-030-53352-6_31)

Dataset: Organ Procurement and Transplantation Network (OPTN) Standard Transplant Analysis and Research (STAR) dataset.

Target: Probability of patients dying or becoming unsuitable for transplant within 3 months.

Metric: out-of-sample area under the ROC curve (AUC).
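For readers new to the metric: AUC can be read as the probability that a randomly chosen patient who died or was removed within 3 months receives a higher risk score than a randomly chosen patient who did not. The pure-Python sketch below computes exactly that by comparing every positive-negative pair (ties count as half). It is quadratic and only meant for toy data; on real data one would typically use a library routine such as scikit-learn's `roc_auc_score`. The scores and labels are made up.

```python
from itertools import product


def auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (equivalent to the Mann-Whitney U statistic).
    O(n_pos * n_neg): fine for a toy example, far too slow for 1.6M observations.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Toy risk scores; label 1 = died or removed as unsuitable within 3 months.
print(auc([0.9, 0.7, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0]))
```

Here 5 of the 6 positive-negative pairs are ranked correctly, so the AUC is 5/6.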

Optimal classification tree

Data division: 50–20–30 for train-validation-test.
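A 50–20–30 split like this one could be sketched as below. One assumption worth stating: because each patient contributes many observations, this sketch splits by patient ID so that no patient appears in more than one set; the text above does not specify how [1] actually performed the split. `split_patients` is a hypothetical helper.

```python
import random


def split_patients(patient_ids, fractions=(0.5, 0.2, 0.3), seed=0):
    """Shuffle unique patient IDs and cut them into train/validation/test sets.

    The test set receives whatever remains after train and validation,
    so fractions[2] is implied rather than used directly.
    """
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    train = set(ids[:n_train])
    val = set(ids[n_train:n_train + n_val])
    test = set(ids[n_train + n_val:])
    return train, val, test


train, val, test = split_patients(range(1000))
print(len(train), len(val), len(test))  # 500 200 300
```

Every observation of a patient then goes to the set that patient was assigned to.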

Observations: 1,618,966 observations. The dependent variable was set to 1 if the patient died or was removed from the list as unsuitable within 3 months of the observation date, and to 0 otherwise.
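The label definition translates into a few lines of code. This sketch assumes "3 months" means 90 days and uses hypothetical outcome codes (`"death"`, `"removed_unsuitable"`); the real STAR dataset has its own removal codes, which are not reproduced here. Any other outcome (for example, a transplant) or no outcome at all yields 0.

```python
from datetime import date, timedelta
from typing import Optional

HORIZON = timedelta(days=90)  # "3 months" approximated as 90 days (assumption)
POSITIVE_OUTCOMES = {"death", "removed_unsuitable"}  # hypothetical outcome codes


def label(observation_date: date, outcome: Optional[str], outcome_date: Optional[date]) -> int:
    """1 if the patient died or was removed as unsuitable within 3 months, else 0."""
    if outcome not in POSITIVE_OUTCOMES or outcome_date is None:
        return 0
    return int(observation_date <= outcome_date <= observation_date + HORIZON)


print(label(date(2010, 1, 1), "death", date(2010, 2, 15)))       # 1
print(label(date(2010, 1, 1), "transplant", date(2010, 2, 1)))   # 0
```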

End result: a classification tree that predicts the probability of a patient dying or becoming unsuitable for transplant within 3 months.

Example of a classification tree [1]. Full version at: http://www.opom.online/popom_tree.html

Brief simulation result: allocating livers based on this model, rather than on Match MELD, results in 417.96 (17.6%) fewer deaths each year. The model also shows fewer waiting-list deaths/removals in every disease-severity bracket compared with MELD allocation.

Logistic Regression and gradient-boosting ensembles with decision trees

Dataset: waiting list histories from June 30, 2004 to 2016

Data division: 50–25–25 for train-validation-test.

Number of features: 50 (31 known at registration, 19 updated over time).

Categorical features ➜ dummy variables; numerical features ➜ standardization (zero mean and unit variance on the training set).

Missing values:

  • Numerical time-series features: forward-filled with the last known value
  • Other numerical missing values: imputed with the training-set median
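The preprocessing steps above can be sketched with pandas as follows. This is my own reconstruction from the bullet points, not the code from [2]: the column names, the patient-ID grouping for forward-filling, and the assumption that rows are sorted by date within each patient are all hypothetical. All statistics (medians, means, standard deviations, dummy columns) are fitted on the training set and reused for the other split.

```python
import pandas as pd


def forward_fill(df, patient_col, time_series_cols):
    # Carry each patient's last known value forward (rows assumed sorted by date).
    out = df.copy()
    out[time_series_cols] = out.groupby(patient_col)[time_series_cols].ffill()
    return out


def preprocess(train, test, patient_col, time_series_cols, numeric_cols, categorical_cols):
    train = forward_fill(train, patient_col, time_series_cols)
    test = forward_fill(test, patient_col, time_series_cols)
    # Remaining numeric gaps: training-set medians, reused for the test split.
    medians = train[numeric_cols].median()
    train[numeric_cols] = train[numeric_cols].fillna(medians)
    test[numeric_cols] = test[numeric_cols].fillna(medians)
    # Standardize with the training-set mean and (population) standard deviation.
    mean, std = train[numeric_cols].mean(), train[numeric_cols].std(ddof=0)
    train[numeric_cols] = (train[numeric_cols] - mean) / std
    test[numeric_cols] = (test[numeric_cols] - mean) / std
    # Dummy-encode categoricals; align test columns with the training columns.
    train = pd.get_dummies(train.drop(columns=[patient_col]), columns=categorical_cols)
    test = pd.get_dummies(test.drop(columns=[patient_col]), columns=categorical_cols)
    test = test.reindex(columns=train.columns, fill_value=0)
    return train, test


train_df = pd.DataFrame({
    "patient": [1, 1, 2, 2],
    "bilirubin": [1.0, None, 3.0, 5.0],  # time-series feature
    "age": [50.0, 50.0, None, 70.0],
    "diagnosis": ["HCV", "HCV", "NASH", "NASH"],
})
test_df = pd.DataFrame({
    "patient": [3, 3],
    "bilirubin": [None, 2.0],
    "age": [None, 60.0],
    "diagnosis": ["HCV", "HCV"],
})
train_out, test_out = preprocess(train_df, test_df, "patient", ["bilirubin"],
                                 ["bilirubin", "age"], ["diagnosis"])
print(list(train_out.columns))
```

Reindexing the test frame matters: a category that never appears in the test split would otherwise produce a different column set from the one the model was trained on.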

After preprocessing, the data has 241 columns.

ROC AUC scores for ranking patient-days by same-day and 3-month mortality [2]

Brief conclusion:

  • Gradient-boosting ensembles outperform MELD and MELD-Na in AUC:

    * 0.935 (gradient boosting) vs 0.831 (MELD-Na) for same-day prediction

    * 0.834 (gradient boosting) vs 0.730 (MELD-Na) for 3-month prediction

  • Removing demographic features (race, gender,…) and subjective features does not have a large effect on model performance

Summary

Starting from a stage where donors’ livers were allocated arbitrarily, the MELD Score brought a huge improvement and saved the lives of thousands of patients. Can we, as ML enthusiasts, take the next big leap by responsibly investigating how ML models can be applied in the liver allocation system, and do even better by directly optimizing waitlist mortality?

In future work, many debatable questions still await us: Is sickest-first the right policy for allocating livers? Can ML models be an alternative to the MELD Score? How can we assess the fairness of the resulting models with respect to demographic features such as gender or race? How can we mitigate the effect of features that doctors could manipulate in our ML models? And so on.

Finally, there seems to be huge room for applying XAI methods to double-check the models we build against the medical knowledge provided by physicians, a task well worth considering.

If you are interested in other posts about explainable, fair and responsible ML, follow #ResponsibleML on Medium.
