A Personalized AI Approach to Alleviating Hypertension with Causal Modeling

Using Causal Modeling and Reverse-Counterfactual Reasoning to Generate Personalized Lifestyle Recommendations for People with Hypertension


For many people, it starts with simple chest pain.

Nothing to worry about.

They think it’s the type of pain that just comes and goes, like a ghost. A couple of weeks later, after a standard checkup, they discover they have hypertension, or high blood pressure. They’re a little worried now, but not too much. They say to themselves, “Oh, my uncle/father/brother had hypertension, and he’s totally fine.”

Many people can’t make the entire switch over.

People who have hypertension know that their lifestyle is a big factor in their condition, but they simply don’t know how to change it. Keto diets and daily workouts seem like too much, too far away: they’re broad strokes that aren’t tailored to them.

So these people with hypertension don’t worry about it, continuing to take medication but never trying to eradicate the root cause of their condition.

And some of them die a few years later from heart failure or stroke.

Hypertension can lead to a stroke, heart failure, aneurysm, or even dementia.

If people with hypertension knew the minimal actions they could personally take to bring their blood pressure down, they could change their lifestyle before it was too late. Most people don’t want to drastically change their diet or habits for a problem they can’t fully comprehend. What they want to know is the smallest set of actionable changes that will resolve their situation.

A goal without an actionable plan is just a dream, and that is the reality for many of the nearly 108 million US adults (almost half of all adults!) who have hypertension.

Many of them don’t know the actionable, discrete steps they could take to personally reduce their blood pressure. Because of this, they never actually work toward a healthy life without hypertension: that dream just seems too far away.

With causal modeling, we can turn this dream into a reality.

I created a causal-modeling AI algorithm that outputs the minimum, actionable lifestyle changes a person can individually make in order to recover from hypertension.

Before I describe the specifics of the model, I should make clear that this AI model does not claim to cure hypertension. Hypertension results from genetic, epigenetic, and lifestyle causal factors, and this algorithm only accounts for the lifestyle factors. Many of the genetic and epigenetic causal factors have yet to be identified, but in the future…? Well, that’s a topic for another article.

The Approach

The approach creates a causal model from medical profile data in order to build and understand causal relationships between certain lifestyle factors (like physical activity, alcohol intake, etc.) and hypertension.

Then, given this causal model, data from an individual’s medical profile is fed in, and a reverse-counterfactual reasoning approach generates the minimum numerical changes that need to be made to that profile so that the person is no longer at risk for hypertension.

These numerical changes can then be translated to personal, actionable items for the patient. This approach aims to provide a personalized health plan for people trying to change their lifestyle and recover from hypertension.

A Reflection on Causality

A causal model is well suited to generating lifestyle recommendations because a recommendation is an intervention: telling someone to exercise more only helps if exercise actually lowers their blood pressure, not merely if the two happen to be correlated. Hypertension and many other preventable chronic diseases arise from causes we can act on. The world itself runs on causality: how much you exercise, along with many other factors, affects the health of your heart. If we can concretely understand the complex causal relations behind hypertension, we can target them directly.

However, causal modeling is definitely not the easiest way to generate personal health plans. The difficulty comes from two sources: data, and the complexity of causal relationships.

First, causal modeling needs a lot of data. Without it, any causal relationships found will most likely be neither accurate nor meaningful. And getting this data can be tricky, especially for complex conditions like hypertension. For example, there could be a causal relationship between stress and hypertension. But what’s an efficient way to measure stress?

Second, causal models can get very, very complex. Every time we try to establish a causal relationship, we have to account for the other variables that affect both the cause and the effect in order to avoid bias. This problem is called confounding, and it is central to causality. For example, when trying to establish the causal relationship between early interventional treatment and the health of premature twins, I had to account for 26 confounders. Confounding is hard to solve because there is a practically unlimited number of potential confounders, and screening them requires human judgment.

The (visually unpleasant) graph of the causal model between treatment, infant health, and the 26 confounders.

Think about hypertension. Does the amount you sit every day affect the health of your heart? It could, so we should probably account for it. On the other hand, we probably shouldn’t account for the number of trees in your city, because there is most likely no causal relationship between the two. Researchers have to rely on their own existing knowledge to account for all possible confounders in their causal model, but this gets really tricky because (a) there are so many variables to account for and (b) their existing knowledge could be very incomplete.

Preprocessing and Sourcing the Data

I used the NHANES 2013–2014 dataset to provide the medical profiles that the causal discovery algorithms would learn from. I selected variables already known to have some causal relationship with hypertension, such as nutrition, physical activity, and alcohol use.

Stratification: Confounders are variables that affect both the intervention and the outcome, and in this case, variables like race, gender, and age are confounders. For example, your age can affect both how much you exercise and the health of your heart. Therefore, we have to account for these confounders, which I did by stratifying the dataset to make sure each category of each confounder is adequately represented.

If we don’t account for confounders, we may end up with a misleading causal relationship like the one above, which shows that the more you exercise, the greater your cholesterol. Once we account for age, we can see the true causal relationship.
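The caption’s point can be reproduced with a toy sketch. The numbers below are made up for illustration and are not NHANES data: within every age group, each extra weekly hour of exercise lowers cholesterol by 2 units, but older groups both exercise more and start from a higher baseline. Pooling everyone produces a positive slope, while fitting the slope within each age stratum recovers the negative effect. The age bands and units are assumptions for the example.

```python
import numpy as np

# Toy, made-up data: three age groups. Within each group cholesterol falls
# by 2 units per extra weekly exercise hour, but older groups exercise more
# AND start from a higher baseline, so age confounds the relationship.
groups = {
    # age band: (weekly exercise hours, baseline cholesterol)
    "20-39": (np.array([1.0, 2.0, 3.0, 4.0]), 180.0),
    "40-59": (np.array([4.0, 5.0, 6.0, 7.0]), 200.0),
    "60+": (np.array([7.0, 8.0, 9.0, 10.0]), 220.0),
}

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

xs, ys, within = [], [], {}
for age, (exercise, base) in groups.items():
    cholesterol = base - 2.0 * exercise
    within[age] = slope(exercise, cholesterol)  # slope inside one stratum
    xs.append(exercise)
    ys.append(cholesterol)

pooled = slope(np.concatenate(xs), np.concatenate(ys))  # ignores age
print(f"pooled slope: {pooled:+.2f}")  # positive: misleading
for age, s in within.items():
    print(f"{age}: {s:+.2f}")  # -2.00 in every stratum
```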

I stratified the dataset by age, race, gender, and obesity, making sure each category of each confounder was sufficiently represented in the data. In other words, there had to be enough medical profiles in every category for the algorithms to learn from. I also made sure there were equal numbers of people with and without hypertension.
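The class-balancing step can be sketched in plain Python. This is one possible reading of the procedure, not necessarily the exact method used here: it assumes each person is a dict with hypothetical `age_band`, `sex`, and `hypertension` fields (not actual NHANES variable names), and within each stratum it randomly downsamples the larger class to the size of the smaller one.

```python
import random
from collections import defaultdict

def balance_within_strata(records, stratum_key, label_key, seed=0):
    """Downsample so every stratum holds equal numbers of each label.

    records:     list of dicts, one per person (hypothetical schema)
    stratum_key: function mapping a record to its stratum, e.g. (age band, sex)
    label_key:   name of the binary outcome field, e.g. "hypertension"
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    buckets = defaultdict(lambda: defaultdict(list))
    for r in records:
        buckets[stratum_key(r)][r[label_key]].append(r)

    balanced = []
    for by_label in buckets.values():
        if len(by_label) < 2:
            continue  # only one class present: this stratum is dropped
        n = min(len(group) for group in by_label.values())
        for group in by_label.values():
            balanced.extend(rng.sample(group, n))  # keep n of each class
    return balanced

# Made-up records: 7 young women (2 with hypertension), 4 older men (3 with).
people = (
    [{"age_band": "20-39", "sex": "F", "hypertension": 0} for _ in range(5)]
    + [{"age_band": "20-39", "sex": "F", "hypertension": 1} for _ in range(2)]
    + [{"age_band": "60+", "sex": "M", "hypertension": 1} for _ in range(3)]
    + [{"age_band": "60+", "sex": "M", "hypertension": 0} for _ in range(1)]
)
balanced = balance_within_strata(
    people, lambda r: (r["age_band"], r["sex"]), "hypertension"
)
print(len(balanced))  # 6: 2+2 from the first stratum, 1+1 from the second
```

Downsampling throws data away; with only a few thousand profiles, giving each record a weight instead of dropping it is a common alternative.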

Categorical Variables: Many of the variables in the dataset were categorical. To analyze a dataset with both categorical and continuous variables, the categorical variables need to be “dummy-coded”, that is, converted into 0/1 indicator columns. If a variable such as race has 5 categories, it is represented by 4 columns, each indicating a different race. Only 4 columns are needed because the 5th category is represented by all 4 columns being 0. A 5th column would be redundant, since it could always be computed from the other four, and that perfect collinearity would make the coefficients of a linear model with an intercept impossible to pin down uniquely.

The third row is an example of the 5th category (“other”, in this case) because all 4 columns have 0 as their entry.
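A minimal sketch of dummy coding with an explicitly chosen baseline category. The category labels are hypothetical placeholders, not the dataset’s actual race codes; “other” plays the role of the 5th category that gets no column of its own.

```python
import numpy as np

def dummy_code(values, categories, baseline):
    """Encode a categorical variable as 0/1 indicator columns.

    One column is created for every category except `baseline`, so a row
    of all zeros means "this person is in the baseline category".
    Returns (column_names, indicator_matrix).
    """
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    columns = [c for c in categories if c != baseline]
    matrix = np.array([[int(v == c) for c in columns] for v in values])
    return columns, matrix

# Hypothetical category labels with "other" as the baseline.
categories = ["group_1", "group_2", "group_3", "group_4", "other"]
columns, X = dummy_code(
    ["group_2", "group_4", "other", "group_1"], categories, baseline="other"
)
print(columns)  # ['group_1', 'group_2', 'group_3', 'group_4']
print(X)
# [[0 1 0 0]
#  [0 0 0 1]
#  [0 0 0 0]   <- third row: "other"
#  [1 0 0 0]]
```

In pandas, `pd.get_dummies(..., drop_first=True)` does the same job, except that it drops the first category level rather than one you choose.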

Dimensionality Reduction/Latent Variable Modeling: The data contained many proxy variables, variables that try to quantify something that isn’t directly measurable, e.g. physical activity or socioeconomic status. I used Principal Component Analysis (PCA) not only to reduce the dimensionality of the dataset but also to create latent variables that roughly represent the underlying quantity the proxy variables were trying to capture. For example, I used PCA to turn five variables that represented physical activity into 2 principal components, which then acted as latent variables for physical activity as a whole.

I used PCA to turn the five variables representing physical activity into two principal components. (The numbers on the sides are index numbers)
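A numpy-only sketch of this step, written from scratch rather than with whatever library was used here (scikit-learn’s `PCA` class is the usual choice). It assumes the five physical-activity variables sit in the columns of a matrix and standardizes them first; standardizing is an assumption, made because such variables are typically measured in different units. The data in the usage example is synthetic: five columns driven by two hidden factors plus a little noise.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto its top principal components.

    Columns are standardized first so that variables measured in different
    units contribute comparably (assumes no column is constant).
    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

# Synthetic example: 5 "activity" columns generated by 2 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
loadings = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.0], [0.0, 1.0], [0.1, 0.9]])
X = factors @ loadings.T + 0.1 * rng.normal(size=(200, 5))

scores, ratio = pca(X, n_components=2)
print(scores.shape)  # (200, 2)
print(round(float(ratio.sum()), 3))  # close to 1: two components keep most variance
```

The sign of each component is arbitrary, so a high score does not automatically mean “more active”; the loadings have to be checked before interpreting a component.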

Causal Modeling

I created several causal models from the dataset with three different causal-discovery algorithms: LiNGAM, GES, and PC.

An (extremely messy) matplotlib rendering of the causal model discovered by the PC algorithm.

Interestingly, no conclusive causal relationship was found between the hypertension variable and the lifestyle factors. I believe this is because the hypertension variable is binary and fails to capture the continuous condition that hypertension actually is. This belief is supported by the fact that causal relationships were found between the lifestyle factors and systolic and diastolic blood pressure, the continuous measurements that define hypertension.

Under the 2017 ACC/AHA guideline, you have hypertension if your blood pressure is consistently at or above 130/80 mmHg (systolic on top, diastolic on bottom), that is, a systolic reading of at least 130 mmHg or a diastolic reading of at least 80 mmHg (older guidelines used 140/90 mmHg). The hypertension variable is essentially a binary label that results from systolic and diastolic blood pressure, and indeed the only causal relationships found with it were between systolic blood pressure, diastolic blood pressure, and whether the patient took medication for hypertension.
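To make the thresholds concrete, here is a small sketch of the 2017 ACC/AHA adult blood-pressure categories. It classifies a single reading, which is a simplification: a real diagnosis is based on readings averaged over several occasions. The last line illustrates why a binary label loses information: two readings only 2 mmHg apart land on opposite sides of it.

```python
def classify_bp(systolic, diastolic):
    """Adult blood-pressure category (2017 ACC/AHA), readings in mmHg."""
    if systolic >= 140 or diastolic >= 90:
        return "stage 2 hypertension"
    if systolic >= 130 or diastolic >= 80:
        return "stage 1 hypertension"
    if systolic >= 120:  # 120-129 systolic with diastolic below 80
        return "elevated"
    return "normal"

def has_hypertension(systolic, diastolic):
    """Binary label: stage 1 or stage 2 hypertension."""
    return systolic >= 130 or diastolic >= 80

print(classify_bp(118, 76))  # normal
print(classify_bp(125, 78))  # elevated
print(classify_bp(125, 82))  # stage 1 hypertension
print(classify_bp(142, 85))  # stage 2 hypertension
print(has_hypertension(129, 79), has_hypertension(131, 79))  # False True
```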

With the three causal models of hypertension, I aggregated all the causal links found pointing into systolic and diastolic blood pressure. At this stage, the links only indicate that a causal relationship is present; they don’t indicate its strength (with the exception of LiNGAM, which also estimates edge weights). I estimated these weights with standard linear regression, regressing each blood-pressure variable on the causes discovered for it.
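A sketch of that aggregation step, assuming each algorithm’s output has already been turned into a set of directed (cause, effect) pairs. PC and GES can leave some edges unoriented, so that conversion is itself an assumption here. The edge sets and the variable names `x1` to `x4` below are made up for illustration and are not the models’ actual output. `min_votes=1` takes the union of all links, while a higher value keeps only links that several algorithms agree on.

```python
from collections import Counter

def parents_of(target, edge_sets, min_votes=1):
    """Collect the causes of `target` found by several discovery algorithms.

    edge_sets: mapping of algorithm name -> set of (cause, effect) edges
    min_votes: how many algorithms must agree; 1 means take the union.
    """
    votes = Counter(
        cause
        for edges in edge_sets.values()
        for cause, effect in edges
        if effect == target
    )
    return sorted(c for c, n in votes.items() if n >= min_votes)

# Made-up edge sets for illustration only.
edges = {
    "PC": {("x1", "systolic"), ("x2", "systolic"), ("x1", "diastolic")},
    "GES": {("x1", "systolic"), ("x3", "systolic")},
    "LiNGAM": {("x1", "systolic"), ("x2", "systolic"), ("x4", "systolic")},
}
print(parents_of("systolic", edges))               # ['x1', 'x2', 'x3', 'x4']
print(parents_of("systolic", edges, min_votes=2))  # ['x1', 'x2']
```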

The three causal models showed that systolic blood pressure was causally affected by confounders — gender, age, and race — and also the principal components for smoking and physical activity.

Diastolic blood pressure was found to be causally affected by confounders — gender, age, and obesity (but not race) — and also fat consumption, the average amount of salt used during meals, and the principal components for physical activity.

With linear regression, we can generate equations that describe systolic and diastolic blood pressure as they relate to certain lifestyle factors.
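A minimal numpy sketch of fitting one such equation, assuming the parents of the target have already been chosen from the causal graph. If the relationships are linear and there are no hidden confounders, the least-squares coefficients estimate the causal weights. The variable names and the “true” equation used to generate the data are made up, and the data is noise-free, so the fit should simply recover them; real measurements will not be this clean.

```python
import numpy as np

def fit_structural_equation(parents, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    parents: (n, k) array holding the target's causal parents, one per column
    y:       (n,) array with the target, e.g. systolic blood pressure
    Returns (intercept, coefficients).
    """
    A = np.column_stack([np.ones(len(parents)), parents])  # intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

# Synthetic data from a made-up equation: sbp = 100 + 0.5*age - 3*activity.
rng = np.random.default_rng(1)
age = rng.uniform(20, 80, size=100)
activity = rng.normal(size=100)  # stand-in for a physical-activity component
sbp = 100 + 0.5 * age - 3.0 * activity

intercept, weights = fit_structural_equation(np.column_stack([age, activity]), sbp)
print(intercept, weights)  # approximately 100.0 and [0.5, -3.0]
```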

However, the equations generated were far from accurate. After trying many different linear-regression approaches, the best R² score I achieved was 2% for the equation modeling diastolic blood pressure and 12% for systolic. R² measures the fraction of the variation in an outcome that a model explains, which makes it a good indicator of how predictive the model is, and we want to be able to predict systolic and diastolic blood pressure with our causal equations.
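For reference, R² is usually defined as follows, where $y_i$ is person $i$’s measured blood pressure, $\hat{y}_i$ is the equation’s prediction for that person, and $\bar{y}$ is the mean over all $n$ people:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

With $R^2 = 0.12$, the squared prediction errors for systolic pressure still add up to 88% of its total variation around the mean.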

I believe this is largely due to a lack of data, rather than a failure of the causal modeling or linear regression techniques. In fact, a correlation matrix of the data shows that hypertension, systolic blood pressure, and diastolic blood pressure were only weakly correlated with anything other than each other and the confounders (like age and race).
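Reading one row of a correlation matrix can be sketched with numpy. The column names and the synthetic data below are made up: systolic pressure is generated to depend strongly on age and only weakly on sodium, plus noise, which mimics a pattern where a confounder dominates the correlations.

```python
import numpy as np

def correlations_with(data, names, target):
    """Pearson correlation of every other column with `target`, strongest first."""
    corr = np.corrcoef(data, rowvar=False)  # columns are the variables
    t = names.index(target)
    pairs = [(names[j], float(corr[t, j])) for j in range(len(names)) if j != t]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)

# Synthetic data; the intake values and coefficients are made up.
rng = np.random.default_rng(2)
n = 500
age = rng.uniform(20, 80, n)
sodium = rng.normal(3.4, 1.0, n)
systolic = 90 + 0.6 * age + 0.5 * sodium + rng.normal(0, 10, n)

ranked = correlations_with(
    np.column_stack([age, sodium, systolic]), ["age", "sodium", "systolic"], "systolic"
)
for name, r in ranked:
    print(f"{name:>7}: {r:+.2f}")
```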


For example, there is little correlation between sodium intake and systolic or diastolic blood pressure, even though previous studies have established a link between the two. Because this dataset does not reproduce an association that is already well established, it likely does not contain enough data to recover valid causal relationships.

The dataset only contained 4408 patients, which is far from enough to create conclusive causal relations. Future steps will include integrating data from other studies, such as future NHANES studies and other longitudinal health studies, in order to establish valid causal relationships.

Recommendation Generation

Even though the causal equations we generated are largely inaccurate, we can still use them as a proof-of-concept for the idea of recommendation generation and reverse counterfactual reasoning.

Standard counterfactual reasoning involves predicting an alternate reality given an intervention. Every time you ask “What would happen if I…?”, you are reasoning counterfactually. Counterfactual reasoning is used everywhere in our daily lives and is essential to human planning and decision-making.
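For a linear structural equation, a “What would happen if I…?” question can be answered with Pearl’s three steps: abduction, action, prediction. The sketch below assumes a single hypothetical fitted equation for systolic pressure with made-up coefficients. It also assumes the changed variables don’t affect each other, so downstream effects inside the causal graph are ignored.

```python
import numpy as np

def counterfactual_outcome(intercept, coefs, x_observed, y_observed, changes):
    """Counterfactual outcome for one person under a linear structural equation.

    Model: y = intercept + coefs . x + u, where u is the part of this
    person's outcome that the equation does not explain.
    """
    coefs = np.asarray(coefs, dtype=float)
    x_observed = np.asarray(x_observed, dtype=float)
    # 1. Abduction: infer this person's unexplained term u from what we saw.
    u = y_observed - (intercept + coefs @ x_observed)
    # 2. Action: apply the intervention to the lifestyle variables.
    x_new = x_observed + np.asarray(changes, dtype=float)
    # 3. Prediction: recompute the outcome, keeping the same u.
    return float(intercept + coefs @ x_new + u)

# Hypothetical fitted equation: sbp = 100 + 0.5*age - 3.0*activity
intercept, coefs = 100.0, [0.5, -3.0]
person = [50.0, -1.0]  # age 50, activity score -1.0
observed_sbp = 140.0   # measured; the equation alone predicts 128

# "What would my systolic pressure be if my activity score were 2 points higher?"
print(counterfactual_outcome(intercept, coefs, person, observed_sbp, [0.0, 2.0]))  # 134.0
```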

Reverse counterfactual reasoning involves predicting how to create an alternate reality. In standard counterfactual reasoning, the alternate reality is the ‘dependent variable’, and the intervention is the ‘independent variable’ that affects it. In reverse counterfactual reasoning, the roles are swapped: the alternate reality is fixed in advance, and the interventions, the variables that have to change to make that reality come true, are what we solve for. When you ask “How can I live a better life?”, that’s a reverse counterfactual question. You’re fixing your alternate reality (a better life) and trying to find the interventions that would make it come true. Essentially, reverse counterfactual reasoning is a combinatorial search problem: it asks which combination of changes will lead to the desired result.

Reverse counterfactual reasoning is quite a bit harder than standard counterfactual reasoning. It requires not only a complete causal understanding of the situation, in order to predict alternate realities, but also the ability to compare and choose among all of those predictions.

I implement reverse counterfactual reasoning as a proof-of-concept in a pretty simple, brute-force way. I find the combination of lifestyle variables that results in the healthiest blood pressure by building a search space of all combinations and picking the best one according to some criterion (e.g. the minimum change that brings diastolic blood pressure below 80 mmHg). In this case, the search space isn’t very large, because the causal equations include only a few lifestyle factors (only physical activity and smoking for systolic blood pressure!). The results from this implementation were fairly meaningless because the causal equations were inaccurate, but the search approach itself appears to work.
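A sketch of that brute-force search. The prediction and cost functions are passed in, so any fitted equation can be plugged in; the diastolic equation, the allowed changes, and the effort weights in the usage example are all made up for illustration. `itertools.product` enumerates every combination, so the search space is the product of the option-list lengths, which stays small only when there are few lifestyle variables.

```python
import itertools

def minimal_change(predict, options, target, cost):
    """Brute-force reverse counterfactual search.

    predict: maps a tuple of changes (one per lifestyle variable) to the
             predicted outcome, e.g. diastolic pressure in mmHg
    options: one list of candidate changes per variable
    target:  the outcome must end up strictly below this value
    cost:    maps a tuple of changes to how much effort it asks of the person
    Returns (changes, predicted_outcome) for the cheapest qualifying
    combination, or None if no combination reaches the target.
    """
    best = None
    for changes in itertools.product(*options):  # every combination
        outcome = predict(changes)
        if outcome >= target:
            continue
        effort = cost(changes)
        if best is None or effort < best[0]:
            best = (effort, changes, outcome)
    return None if best is None else best[1:]

# Hypothetical diastolic equation for one person (made-up numbers):
# currently 86 mmHg, -1.5 mmHg per extra weekly exercise hour,
# +0.2 mmHg per cigarette smoked per day.
def predict(changes):
    extra_hours, cigarette_change = changes
    return 86 - 1.5 * extra_hours + 0.2 * cigarette_change

def cost(changes):
    extra_hours, cigarette_change = changes
    return extra_hours + 0.3 * abs(cigarette_change)

result = minimal_change(
    predict, options=[[0, 1, 2, 3, 4], [0, -5, -10]], target=80, cost=cost
)
print(result)  # ((4, -5), 79.0)
```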

I am also currently looking into the possibilities of using a genetic algorithm or reinforcement learning approach for reverse counterfactual reasoning.

The Future

This personalized AI approach utilizing causal modeling to generate personal health plans for people with hypertension is just a proof-of-concept for a larger vision.

Hypertension itself rarely causes death directly. It leads to preventable chronic diseases like heart disease, which do.

These preventable chronic diseases, like heart disease, obesity, and diabetes, are the leading causes of death and disability in the United States, and every year more than 1.7 million people die because of them.

Similar to hypertension, people die from these preventable chronic diseases because they don’t know the personal steps they should take in order to recover from or avoid their condition.

This is the vision of Topi, a moonshot company that aims to create a world where 100% of preventable chronic diseases are prevented.


Hypertension is just the first step. Eventually we want to create causal models for all of these chronic diseases in order to generate personal health plans.

Topi utilizes causal modeling of complex genomic, epigenetic, and lifestyle data to generate personalized health plans for you. Learn more about Topi here.

This approach towards alleviating hypertension with causal modeling is a proof-of-concept for Topi that solely utilizes lifestyle variables (and does not include genetic and epigenetic factors). I plan to create causal models of all these factors and then integrate them in order to create a complete causal understanding of hypertension.

Stay tuned for the future as I update and continue working on causal modeling and Topi :)

Thanks so much for reading! If you want to talk or connect with me, shoot me an email at kevinrowang@gmail.com

Written by

Hey, I’m Kevin! 15-year-old innovator super passionate about Artificial General Intelligence. Interested in both global challenges and philosophical problems ;)
