Use of Machine Learning in Predicting Cardiovascular Disease
By: Ronok Ghosal
Westlake High School, Austin, TX
Basic Understandings:
Entropy is a mathematical concept that plays a crucial role in many areas of computer science, including information theory, coding theory, and machine learning. It appears most often in applications that revolve around making predictions. In information theory, entropy measures the amount of uncertainty or randomness in a message. Entropy (commonly denoted H) is defined as the expected value of the self-information, which is the negative logarithm of the probability of a message.
The higher the entropy, the more uncertainty there is in the message. In coding theory, entropy is used to measure the efficiency of a code: a code is efficient if it can represent a message with a small number of bits. In machine learning, the lower the entropy of a collection of samples, the purer the collection, with an entropy of 0 indicating a pure, completely categorized group.
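As a concrete illustration of the definition above, the sketch below computes the Shannon entropy of a collection of class labels. The function name and the toy labels are my own for illustration, not part of the original project.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

# A completely categorized group has entropy 0; an even 50/50 split has entropy 1 bit.
print(entropy(["sick", "sick", "sick", "sick"]))        # 0.0
print(entropy(["sick", "sick", "healthy", "healthy"]))  # 1.0
```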
Entropy can also be used to make predictions. In weather forecasting, for example, it can help quantify the probability of different weather events occurring; in financial forecasting, it can help quantify the likelihood of different stock price movements. In short, entropy is used to measure uncertainty, the efficiency of codes, and the impurity of sets of samples, and it supports prediction in many contexts.
In my project, I used entropy to design an ID3 decision-tree model that predicts whether a patient suffers from a cardiovascular illness, given their symptoms. To understand how entropy was used in the project, it is essential to understand how ID3 works. The ID3 algorithm is built around the tree data structure. At each step, ID3 chooses the feature that produces the largest reduction in entropy; in other words, it chooses the feature that splits the data so that the resulting subsets are as pure (as close to containing only elements of a single class) as possible.
The reduction in entropy achieved by splitting a set into subsets is known as the information gain: IG(S, A) = H(S) - H(S|A), where S is the set of all instances in the dataset, A is an attribute of the instances, and H(S|A) is the conditional entropy of S given the attribute A. Equivalently,
IG = H(parent set) - (weighted average of H(child sets)),
where each child set's entropy is weighted by its share of the parent's samples.
ID3 splits on the attribute with maximum gain because purer subsets lead to more confident decisions at the leaf nodes of the tree, and therefore to more accurate predictions from the algorithm as a whole.
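To make the gain computation concrete, here is a minimal sketch of the weighted-average form above, reusing the entropy helper from the earlier snippet. The names and toy split are illustrative, not taken from the project code.

```python
def information_gain(parent_labels, child_label_groups):
    """IG = H(parent) minus the size-weighted average entropy of the child subsets."""
    total = len(parent_labels)
    weighted_child_entropy = sum(
        (len(child) / total) * entropy(child) for child in child_label_groups
    )
    return entropy(parent_labels) - weighted_child_entropy

# Splitting a perfectly mixed parent into two pure children removes all
# uncertainty, so the gain equals the parent's entropy (1 bit here).
parent = ["sick", "sick", "healthy", "healthy"]
print(information_gain(parent, [["sick", "sick"], ["healthy", "healthy"]]))  # 1.0
```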
The Project:
Training
The first step of the project was to train the ID3 model. From a publicly released dataset, I obtained health metrics for 70,000 patients, including characteristics such as height, weight, age, gender, cholesterol level, glucose level, systolic blood pressure, and diastolic blood pressure. The dataset was labeled, meaning that alongside the symptoms it recorded whether or not each subject had a cardiovascular disease. By repeatedly choosing the splits with the greatest entropy reduction, the ID3 algorithm built a decision tree that predicts, from a patient's symptoms, whether or not the patient has the disease.
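The following is a simplified sketch of how such a tree can be grown: at every node the attribute with the highest information gain is chosen, the data is partitioned by that attribute's values, and the process recurses until a subset is pure. It reuses the entropy and gain helpers above and assumes discrete attribute values stored in dictionaries; the real project also had to handle continuous measurements such as blood pressure, which this sketch omits.

```python
def best_attribute(rows, labels, attributes):
    """Return the attribute whose split yields the highest information gain."""
    def gain(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return information_gain(labels, list(groups.values()))
    return max(attributes, key=gain)

def build_id3(rows, labels, attributes):
    """Recursively grow an ID3 decision tree as nested dicts; leaves are labels."""
    if len(set(labels)) == 1:              # pure subset: nothing left to split
        return labels[0]
    if not attributes:                     # no attributes left: majority label
        return max(set(labels), key=labels.count)
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    partitions = {}
    for row, label in zip(rows, labels):
        rows_for_value, labels_for_value = partitions.setdefault(row[attr], ([], []))
        rows_for_value.append(row)
        labels_for_value.append(label)
    return {attr: {value: build_id3(sub_rows, sub_labels, remaining)
                   for value, (sub_rows, sub_labels) in partitions.items()}}
```

Each internal node is a dictionary keyed by the chosen attribute, which makes the tree easy to inspect and to walk when classifying a new patient.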
Testing
To test the accuracy of my ID3 decision tree, I used another labeled dataset of 100,000 patients with the same variables used in training (height, weight, age, gender, cholesterol level, glucose level, systolic blood pressure, diastolic blood pressure, and so on). The tree returned roughly 95,700 correct predictions of whether or not a patient suffered from cardiovascular disease, given their symptoms. From this dataset I moved on to larger ones, with nearly 300,000 labeled subjects, and again the ID3 algorithm produced a decision tree with an accuracy rate of nearly 96%.
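A straightforward way to score the tree on a labeled test set is to walk the tree for each patient and compare the prediction with the recorded label. The helpers below are hypothetical: they assume the nested-dict tree format from the earlier sketch and a default label of 0 for attribute values never seen during training.

```python
def predict(tree, row, default=0):
    """Walk the nested-dict tree until a leaf label is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))                    # attribute tested at this node
        tree = tree[attr].get(row[attr], default)  # follow the branch for this patient
    return tree

def accuracy(tree, rows, labels):
    """Fraction of test patients whose predicted label matches the recorded one."""
    correct = sum(predict(tree, row) == label for row, label in zip(rows, labels))
    return correct / len(labels)

# e.g. 95,700 correct predictions out of 100,000 test patients -> accuracy of 0.957
```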
Key Takeaways:
This was my first ML project, and ID3 was the first ML algorithm I learned. I was surprised that such a structured, organized, yet simple-to-understand process returned such accurate output. Through this project I also learned the power of basic data structures: trees, used for the ID3 predictions; matrices, used for reading and splitting the datasets; and linked lists, used to implement the trees.