Data Analysis of UCI Heart Disease Dataset using Pharo

In this tutorial, we will do a simple data analysis on a small part of the UCI Heart Disease dataset using Pharo.

First, we need to install DataFrame for Pharo. For this, go to the Playground of your new Pharo image, then type and execute the following script:

Metacello new
    baseline: 'DataFrame';
    repository: 'github://PolyMathOrg/DataFrame/src';
    load.

Here’s the original dataset on Kaggle. For the sake of simplicity, we’ll take the first six attributes and the first five samples.

The attributes are:

age: age in years

sex: (1 = male; 0 = female)

cp: chest pain type

trestbps: resting blood pressure (in mm Hg on admission to the hospital)

chol: serum cholesterol in mg/dl

fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false).

Let’s create a DataFrame with the above values:

heart := DataFrame
    withRows: #(
        (63 1 3 145 233 1)
        (37 1 2 130 250 0)
        (41 0 1 130 204 0)
        (56 1 1 120 236 0)
        (57 0 0 120 354 0))
    columnNames: #(age sex cp trestbps chol fbs).

Figure: The created DataFrame
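Before analysing the table, it can help to confirm its shape. The snippet below is a minimal sketch that rebuilds the same DataFrame and asks for its size and column names. It assumes the DataFrame library loaded above provides numberOfRows, numberOfColumns, columnNames and column:, which may differ between library versions.

```smalltalk
| heart |
"Rebuild the table from the tutorial so the snippet runs on its own in a Playground."
heart := DataFrame
    withRows: #(
        (63 1 3 145 233 1)
        (37 1 2 130 250 0)
        (41 0 1 130 204 0)
        (56 1 1 120 236 0)
        (57 0 0 120 354 0))
    columnNames: #(age sex cp trestbps chol fbs).

heart numberOfRows.      "number of samples, 5 for this table"
heart numberOfColumns.   "number of attributes, 6 for this table"
heart columnNames.       "the attribute names"
heart column: #chol.     "a single attribute, returned as a DataSeries"
```

Evaluate each line with "Print it" or "Inspect it" to see the result.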

Now let’s begin the Data Analysis.

1. Let’s see how many people are older than 50:
heart select: [ :row |
    (row at: #age) > 50 ].

We see that individuals 1, 4, and 5 are older than 50.
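The row numbers can be cross-checked without the DataFrame library. This is a small sketch using only standard Pharo collections; the ages array is copied by hand from the age column of the table above, and the variable names are illustrative.

```smalltalk
| ages olderThan50 |
"Ages copied from the age column, in row order."
ages := #(63 37 41 56 57).

"Collect the positions (row numbers) whose age is greater than 50."
olderThan50 := ((1 to: ages size)
    select: [ :index | (ages at: index) > 50 ]) asArray.

olderThan50. "#(1 4 5)"
```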

2. Let’s compare the resting blood pressure of men and women. For this, let’s take the average resting blood pressure (trestbps, in mm Hg) for each sex:

heart
    group: #trestbps
    by: #sex
    aggregateUsing: #average.

Here, we can see that the average resting blood pressure in this sample is higher for males (sex = 1, about 131.7 mm Hg) than for females (sex = 0, 125 mm Hg). With only five samples, this comparison is purely illustrative.
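The grouped averages can be verified by hand. The sketch below uses only standard Pharo collections: the trestbps values are copied from the table and split by the sex column, using the coding from the attribute list (1 = male, 0 = female). The variable names are illustrative.

```smalltalk
| maleBp femaleBp |
"trestbps values of rows 1, 2 and 4 (sex = 1)."
maleBp := #(145 130 120).
"trestbps values of rows 3 and 5 (sex = 0)."
femaleBp := #(130 120).

maleBp average.    "(395/3), about 131.67 mm Hg"
femaleBp average.  "125 mm Hg"
```

Note that average returns an exact Fraction when the division is not whole; send asFloat to the result to get a decimal value.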

3. Let’s look at the maximum and minimum cholesterol levels in men and women:

heart
    groupBy: #sex
    aggregate: {
        #chol using: #max as: #maxchol.
        #chol using: #min as: #minchol }.

We can see that the maximum serum cholesterol level for men (sex = 1) is 250 mg/dl and the minimum is 233 mg/dl, whereas for women (sex = 0) the maximum is 354 mg/dl and the minimum is 204 mg/dl. In this sample, women have both the highest and the lowest serum cholesterol levels.
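As a sanity check on the grouped result, the same extremes can be read off with standard Pharo collections. This sketch copies the chol values from the table and splits them by the sex column, using the coding from the attribute list (1 = male, 0 = female); the variable names are illustrative.

```smalltalk
| maleChol femaleChol |
"chol values of rows 1, 2 and 4 (sex = 1)."
maleChol := #(233 250 236).
"chol values of rows 3 and 5 (sex = 0)."
femaleChol := #(204 354).

maleChol max.    "250"
maleChol min.    "233"
femaleChol max.  "354"
femaleChol min.  "204"
```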