Heart Attack Exploratory Data Analysis & Machine Learning with Zero-True

Vinson Huang
Published in Zero-True
4 min read · Mar 3, 2024

Every beat of your heart is a testament to the marvel of life. Your heart works tirelessly, pumping about 2,000 gallons of blood through the body each day, delivering oxygen and nutrients to cells while removing waste products. It’s safe to say that your heart is an incredible organ that sustains your life in more ways than you may realize.

Despite its incredible capabilities, the heart is not invincible. One of the most serious threats it faces is a heart attack. Heart attacks, medically known as myocardial infarctions, are among the leading causes of death worldwide, taking hundreds of thousands of lives each year. However, as technology evolves, we can use it to improve our understanding of heart health and develop better treatments and prevention strategies.

In this article, we will use Zero-True to explore the process of conducting Exploratory Data Analysis on heart attack data and creating a very simple machine-learning model we can use for prediction.

I’ve found a relatively clean and well-maintained dataset on Kaggle to explore. It contains 14 columns:

  • age — Age (years)
  • sex — Sex (0: Female, 1: Male)
  • cp — Chest pain type (0: Asymptomatic, 1: Typical angina, 2: Atypical angina, 3: Non-anginal pain)
  • trestbps — Resting blood pressure (mmHg)
  • chol — Serum cholesterol (mg/dl)
  • fbs — If fasting blood sugar > 120 mg/dl (1: Yes, 0: No)
  • restecg — Resting electrocardiographic results (0: Hypertrophy, 1: Normal, 2: ST-T wave abnormality)
  • thalach — Maximum heart rate achieved (bpm)
  • exang — If the person has exercise-induced angina (1: Yes, 0: No)
  • oldpeak — ST depression induced by exercise relative to rest
  • slope — Slope of the peak exercise ST segment (0: Downsloping, 1: Flat, 2: Upsloping)
  • ca — Number of major vessels colored by fluoroscopy (0–3)
  • thal — Thalassemia (1: Fixed defect, 2: Normal, 3: Reversible defect)
  • target — Heart disease (1: No, 0: Yes)

Getting Started

Let's begin by creating a Python cell in our notebook and importing all the needed libraries and the data set.

Note: The heart dataset is located in the same directory as the notebook.

import zero_true as zt
import pandas as pd
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

heart = pd.read_csv("./heart.csv")
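Before plotting anything, it can help to confirm that the file loaded the way you expect. The sketch below is not part of my original notebook. It runs a few basic checks on a tiny in-memory stand-in (the three rows are made up for illustration and only mimic the column layout), so it works without heart.csv. The helper name check_frame is my own.

```python
import io
import pandas as pd

# Made-up rows that only mimic the column layout described above.
SAMPLE_CSV = """age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
57,0,0,120,354,0,1,163,1,0.6,2,0,2,1
67,1,0,160,286,0,0,108,1,1.5,1,3,2,0
"""

def check_frame(df: pd.DataFrame) -> dict:
    """Summarize shape, missing values, and class balance of the last column."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing": int(df.isna().sum().sum()),
        "target_counts": df.iloc[:, -1].value_counts().to_dict(),
    }

sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = check_frame(sample)
print(summary)
```

In the notebook you would call check_frame(heart) instead of using the stand-in.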

Data Visualization

To visualize our data, we can create a histogram for each column in this dataset by copy-pasting code, but why do that when you can use Zero-True’s Plotly Integration and UI components?

Let’s begin by creating an autocomplete input that allows the user to select which column to view:

axis_select = zt.Autocomplete(id="x_axis", label="Select Data to Visualize", items=list(heart.columns[:-1]))

You should notice a dropdown pop-up below your code cell. Pretty neat!

If the user selects a value, we convert that column to strings so Plotly treats its values as categories instead of scaling a numeric x-axis. After that, we can simply create a histogram using Plotly Express and display it in our notebook using Zero-True!

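if axis_select.value:
    # Work on a copy so the numeric columns in heart stay intact for training
    plot_df = heart.copy()
    plot_df[axis_select.value] = plot_df[axis_select.value].astype(str)
    fig = px.histogram(plot_df, x=axis_select.value)
    zt.PlotlyComponent.from_figure(id='plotly', figure=fig)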

Now, you can instantly generate a histogram chart for all the columns in this dataset with a few simple lines of code!

There are many other ways to use Plotly to visualize data in Zero-True, but that is a task left up to the reader :)

Training & Testing

We’ll be training a simple Logistic Regression model.

First, begin by preparing the dataset for training:

X = heart.iloc[:, :-1].values
y = heart.iloc[:,-1].values

We can then split the dataset into training and testing sets:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
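To get a feel for what test_size=0.20 means in rows, here is a small sketch of the arithmetic. As far as I know, scikit-learn rounds the test share up when test_size is a fraction; the helper split_sizes is my own, and the 303-row count is an assumption about the file rather than something stated in this article.

```python
import math
from typing import Tuple

def split_sizes(n_rows: int, test_size: float) -> Tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(test_size * n_rows)  # test share is rounded up
    n_train = n_rows - n_test               # the rest is used for training
    return n_train, n_test

# 303 rows is an assumed size for heart.csv; swap in len(heart) in the notebook.
n_train, n_test = split_sizes(303, 0.20)
print(n_train, n_test)
```

Under that assumption the model is fit on 242 rows and evaluated on 61.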

Finally, we train the model and test its accuracy:

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
pred = model.predict(X_test)

print(accuracy_score(y_test, pred))
> 0.8852459016393442

As you can see here, the Logistic Regression model we trained has an accuracy of about 88.5% on the test set. Not bad!
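Accuracy is just correct predictions divided by total predictions, so the printed number can be turned back into counts. The sketch below does that with Python's fractions module. The denominator cap of 1000 is an arbitrary choice of mine, and the result is the fraction in lowest terms, so the true counts could in principle be a multiple of it.

```python
from fractions import Fraction

reported = 0.8852459016393442  # the value printed above

# Closest fraction with a small denominator (1000 is an arbitrary cap).
ratio = Fraction(reported).limit_denominator(1000)
correct, total = ratio.numerator, ratio.denominator
print(f"{correct} of {total} correct, {total - correct} wrong")
```

Reading the score this way is a reminder that it rests on a small test set, so a handful of rows can move it by several percentage points.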

Try it Yourself!

You can utilize the UI components from Zero-True to create your own inputs to test your model.

Here’s the gist of what we’ll implement:

We’ll use Zero-True’s from_dict method to map values so that the inputs are user-friendly. In this example, we’re creating a dictionary for the person’s sex and displaying the keys in Zero-True’s SelectBox.

sex_dict = {"Female":0, "Male":1}
user_sex = zt.SelectBox.from_dict(id="sex", label="Sex", items_map=sex_dict)

For inputs that don’t need a dictionary, such as numeric fields, we can simply create a NumberInput (e.g., for age) where the user can enter a numeric value:

user_age = zt.NumberInput(id="age", label="Age")

After ensuring all the fields are filled, we can finally use the user inputs to make a prediction:

user_inputs = [user_age.value, sex_dict.get(user_sex.value, None)]

filled = all(i is not None for i in user_inputs)
user_data = [float(i) for i in user_inputs if filled]
if user_data:
    pred = model.predict([user_data])
    if pred[0] == 0:
        zt.Text(id="result", text="Heart Disease")
    else:
        zt.Text(id="result", text="No Heart Disease")

Of course, the example code above won’t work as is: the model was trained on all 13 feature columns, so you’ll need to make an input for each one. Here’s the link to my finished notebook if you want to check it out for yourself. Please keep in mind you should not use this to actually diagnose someone.
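One way to keep thirteen inputs manageable is to collect them in a dictionary and build the feature row in a fixed order. This is a minimal sketch in plain Python, not the code from my finished notebook. The helper build_row and the example values are hypothetical, and the feature order is assumed to match the column list at the top of this article.

```python
from typing import Dict, List, Optional

# Feature order assumed to match the column list at the top of the article.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

def build_row(inputs: Dict[str, Optional[float]]) -> Optional[List[float]]:
    """Return the 13 values in training order, or None if any is missing."""
    values = [inputs.get(name) for name in FEATURES]
    if any(v is None for v in values):
        return None
    return [float(v) for v in values]

# Hypothetical values, as they might come from thirteen UI components.
example = {"age": 54, "sex": 1, "cp": 0, "trestbps": 130, "chol": 250,
           "fbs": 0, "restecg": 1, "thalach": 150, "exang": 0,
           "oldpeak": 1.0, "slope": 2, "ca": 0, "thal": 2}
row = build_row(example)
incomplete = build_row({"age": 54})  # missing fields give None
```

When row is not None, it can be passed to the model as model.predict([row]), just like user_data above.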

Voilà! We can now test out our own data!

Conclusion

There you have it! We’ve created a dynamic and interactive notebook using Zero-True to visualize data and test a model! Check out Zero-True on GitHub if you liked this tutorial!
