Bi-LSTM+Attention for Modeling EHR Data
An essential guide to diagnosis prediction in healthcare via an attention-based Bi-LSTM network
Predicting future health events or diseases from Electronic Health Records (EHRs) is a key research use case in the healthcare domain. EHR data consists of diagnosis codes, pharmacy codes, and procedure codes. Modeling EHR data, and interpreting the resulting models, is a tedious task due to the high dimensionality of the data.
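To make the data concrete, a patient's record can be viewed as an ordered sequence of visits, each visit a multi-hot vector over the vocabulary of medical codes. The codes and helper below are hypothetical, just a minimal sketch of this encoding:

```python
import numpy as np

# Hypothetical vocabulary of medical codes (diagnosis / pharmacy / procedure)
vocab = ["E11.9", "I10", "RX:metformin", "PROC:80053"]
code_to_idx = {c: i for i, c in enumerate(vocab)}

# One patient's record: an ordered list of visits, each a set of codes
visits = [
    ["E11.9", "RX:metformin"],       # visit 1
    ["E11.9", "I10", "PROC:80053"],  # visit 2
]

def encode_visits(visits, code_to_idx):
    """Encode each visit as a multi-hot vector -> (num_visits, vocab_size)."""
    X = np.zeros((len(visits), len(code_to_idx)), dtype=np.float32)
    for t, visit in enumerate(visits):
        for code in visit:
            X[t, code_to_idx[code]] = 1.0
    return X

X = encode_visits(visits, code_to_idx)
print(X.shape)  # (2, 4): 2 visits over a 4-code vocabulary
```

With a realistic code vocabulary (thousands of ICD/NDC/CPT codes), each visit vector becomes very high-dimensional and sparse, which is exactly the dimensionality problem noted above.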
In this article, we will discuss a popular research paper, Dipole, published in 2017, which uses a Bi-LSTM+attention network.
Contents:
1) Limitations of Linear/Tree-based Models
2) Why RNN models?
3) Essential guide to LSTM & Bi-LSTM network
4) Essential guide to Attention
5) Implementation
1) Limitations of Linear/Tree-based Models:
The previous implementation was a Random Forest model with a fixed set of hyperparameters, trained on aggregated member-level claims, pharmacy, and demographics features.
- In the case of disease prediction, the output depends on the sequence of events over time. This temporal ordering is lost in the RF model, so the idea is to try time-series-based event prediction. Candidates include statistical time-series models like ARIMA and Holt-Winters, neural-network-based models like RNNs/LSTMs, or even transformer-based architectures.
- However, long-term dependencies between events, and the information carried by the (irregular) time intervals between them, are difficult to capture in an RF model or even in classical time-series models.
- Further, the Random Forest was not able to capture the non-linear associations and complex relationships between time-ordered events, and the same holds for classical TS models. We can introduce non-linearity by adding interaction terms (e.g., quadratic or multiplicative features) or by using kernels (as in SVMs); however, that requires knowing the actual non-linear dependencies in advance, which is very difficult with real-world data.
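The first limitation above can be demonstrated in a few lines: two patients with the same codes in opposite order (e.g., diagnosis before treatment vs. treatment before diagnosis) produce identical aggregated features, so a Random Forest cannot distinguish them, while a sequence model sees different inputs. A minimal sketch with toy two-code sequences:

```python
import numpy as np

# Two patients with the SAME codes but in reverse order:
# patient A: code0 then code1; patient B: code1 then code0.
seq_a = np.array([[1, 0], [0, 1]], dtype=np.float32)
seq_b = np.array([[0, 1], [1, 0]], dtype=np.float32)

# Aggregated (bag-of-codes) features, as fed to a Random Forest:
agg_a, agg_b = seq_a.sum(axis=0), seq_b.sum(axis=0)
print(np.array_equal(agg_a, agg_b))  # True: the RF sees identical inputs

# A sequence model consumes the full ordered matrix, so the inputs differ:
print(np.array_equal(seq_a, seq_b))  # False: order is preserved
```

This is why any model trained on aggregated counts is blind to the order of clinical events, regardless of how expressive the classifier itself is.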
As such, we move ahead by first exploring neural-network-based time-series models like RNNs/LSTMs, and later transformer architectures. The above…