Coronary Artery Disease Prediction

kardiolabsAI
Dec 24, 2021


Heart disease is the leading cause of death. Each year, around 659,000 people in the US[1] and 77,000 people in Canada die from heart disease. Heart disease costs the United States about $363 billion annually[2] and Canada 22 billion annually.

1 Background

The American College of Cardiology and American Heart Association (ACC/AHA) 10-year cardiovascular risk calculator has been challenged for its accuracy by several analyses (Lancet 2013; 382:1762 and JAMA Intern Med 2014; 174:1964). Researchers using data from the MESA (Multi-Ethnic Study of Atherosclerosis) study found that Framingham-based risk scoring systems and the ACC/AHA risk equation substantially overestimated actual 5-year risk in adults without diabetes, both overall and across sociodemographic subgroups[3]. Since the calculator is used to select patients for statin therapy, the implications of this inaccuracy are substantial.

Machine learning therefore has significant potential to improve the prediction of cardiovascular disease and to support better medical decisions.

2 Data and methods

2.1 Data source

The data used in this study is the Cleveland Heart Disease dataset from the UCI Machine Learning Repository.

2.2 Methods

We applied various machine learning methods to uncover the relationship between certain attributes and heart disease. The machine learning algorithms we used include:

  • Naive Bayes
  • KNN
  • Decision Tree
  • SVM
  • XGB
  • VotingClassifier
  • Logistic regression
  • Random Forest
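
Of the methods listed, the VotingClassifier differs from the rest in that it combines the predictions of other models instead of learning directly from the data. As a minimal sketch of the idea, assuming hard (majority) voting, which this study does not specify, the function below picks the most common label among the per-model predictions. The name `majority_vote` and the example predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label list by majority vote.

    predictions_per_model: list of equal-length lists, one per model.
    Ties are resolved in favor of the label seen first among the models.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest count
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical predictions from three models for four patients
# (1 = disease, 0 = no disease)
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]
ensemble = majority_vote([model_a, model_b, model_c])
```

Each patient receives the label that at least two of the three hypothetical models agree on.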

The process of our study is shown in the following flow chart:

3 Results and feature importance

3.1 Machine learning models accuracy

A few models show decent accuracy, as shown below. With tuned hyperparameters, logistic regression and random forest reach 88% and 84% accuracy, respectively, on the test dataset.
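
Accuracy here is the share of test-set patients whose predicted condition matches the true label. A minimal sketch of that calculation follows; the labels and predictions are hypothetical and are not the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical test labels and predictions (1 = disease, 0 = no disease)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = accuracy(y_true, y_pred)  # 6 of 8 predictions are correct
```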

3.2 Feature importance

Feature importance refers to techniques that assign a score to input features based on how useful they are at predicting a target variable.
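
One model-agnostic way to obtain such a score is permutation importance: shuffle one feature's values and measure how much accuracy drops. The sketch below illustrates that technique only; the study's random forest scores may have been computed differently (for example, with impurity-based importances). The toy model, data, and function names are hypothetical.

```python
import random

def model_accuracy(model, rows, labels):
    """Share of rows for which model(row) equals the label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)

def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = model_accuracy(model, rows, labels)
    n_features = len(rows[0])
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values
            shuffled = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(rows, column)]
            drops.append(baseline - model_accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy stand-in model: uses feature 0 only and ignores feature 1
def toy_model(row):
    return 1 if row[0] > 0.5 else 0

rows = [[0.9, 3], [0.8, 1], [0.7, 4], [0.6, 2],
        [0.4, 3], [0.3, 1], [0.2, 4], [0.1, 2]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = permutation_importance(toy_model, rows, labels)
```

Because the toy model never reads feature 1, shuffling that column leaves accuracy unchanged and its score is zero, while shuffling feature 0 lowers accuracy and gives it a positive score.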

According to the random forest (RF) model, the five most important of the 30 variables are:

  • Cp0: typical angina, chest pain related to decreased blood supply to the heart
  • Oldpeak: ST depression induced by exercise relative to rest; it reflects the stress on the heart during exercise, and an unhealthy heart is stressed more
  • Exang1: exercise induced angina (true)
  • thalach: maximum heart rate achieved
  • exang0: exercise induced angina (false)

The least useful variables include:

  • Thal_0: thallium stress test result
  • Ca_4: ca empty value
  • Fbs_0: fasting blood sugar > 120 mg/dl (false); a level above 126 mg/dL signals diabetes
  • Fbs_1: fasting blood sugar > 120 mg/dl (true); a level above 126 mg/dL signals diabetes
  • Restecg_2: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
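
Names such as Fbs_0 and Fbs_1 suggest that each categorical attribute was expanded into one indicator variable per value (one-hot encoding), which would explain how the attributes listed in the appendix become 30 variables. This is an inference from the variable names, not something the study states. A minimal sketch, with a hypothetical choice of categorical columns and made-up records:

```python
def one_hot_encode(records, categorical):
    """Expand categorical columns into 0/1 indicator columns.

    records: list of dicts mapping column name to value.
    categorical: column names to expand; other columns are copied unchanged.
    Indicator columns are named "<column>_<value>", e.g. "fbs_1".
    """
    # Collect the observed values per categorical column, in sorted order
    levels = {c: sorted({r[c] for r in records}) for c in categorical}
    encoded = []
    for r in records:
        row = {}
        for col, value in r.items():
            if col in levels:
                for level in levels[col]:
                    row[f"{col}_{level}"] = 1 if value == level else 0
            else:
                row[col] = value
        encoded.append(row)
    return encoded

# Made-up records using attribute names from the appendix
records = [
    {"age": 63, "fbs": 1, "exang": 0},
    {"age": 41, "fbs": 0, "exang": 1},
]
encoded = one_hot_encode(records, categorical=["fbs", "exang"])
```

Under this encoding, the two indicators of a binary attribute carry the same information, which would be consistent with Exang1 and exang0 both ranking high and Fbs_0 and Fbs_1 both ranking low.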

4 Conclusion

Kardiolabs is developing artificial intelligence based solutions for automated reporting of CT coronary angiograms for patients suffering from coronary artery disease. For this study, experienced cardiologists on our team advised on the machine learning methods. As a next step, more features and records will be introduced to further improve the prediction.

Appendix:

  1. age: age in years
  2. sex: sex (1 = male; 0 = female)
  3. cp: chest pain type
    — Value 0: typical angina
    — Value 1: atypical angina
    — Value 2: non-anginal pain
    — Value 3: asymptomatic
  4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
  5. chol: serum cholesterol in mg/dl
  6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
  7. restecg: resting electrocardiographic results
    — Value 0: normal
    — Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    — Value 2: showing probable or definite left ventricular hypertrophy by Estes’ criteria
  8. thalach: maximum heart rate achieved
  9. exang: exercise induced angina (1 = yes; 0 = no)
  10. oldpeak: ST depression induced by exercise relative to rest
  11. slope: the slope of the peak exercise ST segment
    — Value 0: upsloping
    — Value 1: flat
    — Value 2: downsloping
  12. ca: number of major vessels (0–3) colored by fluoroscopy; 4 = empty value (NaN)
  13. thal: 0 = normal; 1 = fixed defect; 2 = reversible defect
    and the label
  14. condition: 0 = no disease, 1 = disease

Blog by Mia

References:

[1]: Centers for Disease Control and Prevention. Underlying Cause of Death, 1999–2018. CDC WONDER Online Database. Atlanta, GA: Centers for Disease Control and Prevention; 2018. Accessed March 12, 2020.

[2]: Virani SS, Alonso A, Aparicio HJ, Benjamin EJ, Bittencourt MS, Callaway CW, et al. Heart disease and stroke statistics, 2021 update: a report from the American Heart Association. Circulation. 2021;143:e254–e743.

[3]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5097466/
