Using Support Vector Machine (SVM) Classifier in Python to Predict Heart Disease with Framingham Dataset

Abdul Qureshi
Nov 30, 2019


In this study we will examine the Framingham data set and try to predict TenYearCHD (the ten-year risk of coronary heart disease) from the other features, and then we will evaluate how well our model performs.

First we will read the data using the pandas read_csv() function, then look at its structure using the .info() method, its summary statistics using describe(), and its first few rows using head().
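
A minimal sketch of this loading step is given here. The helper name load_and_inspect, the file name framingham.csv and the tiny inline sample are assumptions for illustration only; the sample rows are made up so that the snippet runs without the real file.

```python
import io

import pandas as pd

# Made-up rows standing in for the real file; the values are illustrative only.
SAMPLE_CSV = """age,sysBP,prevalentHyp,TenYearCHD
39,106.0,0,0
46,121.0,0,0
48,127.5,0,0
61,150.0,1,1
46,130.0,0,0
"""


def load_and_inspect(source):
    """Read a CSV into a DataFrame and print its structure, summary and first rows."""
    df = pd.read_csv(source)
    df.info()             # column dtypes and non-null counts
    print(df.describe())  # summary statistics of the numeric columns
    print(df.head())      # first five rows
    return df


# With the real data this would be something like load_and_inspect("framingham.csv").
df = load_and_inspect(io.StringIO(SAMPLE_CSV))
```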

Next we will look at how the variables correlate with each other using the corr() function.
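
As a rough sketch, the correlation check could look like the following. The data frame here is synthetic (randomly generated, not the Framingham data), so the printed numbers say nothing about the real data set; only the corr() call and the sorting pattern are the point.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data, NOT the Framingham data set.
rng = np.random.default_rng(42)
n = 200
age = rng.integers(32, 70, size=n)
sysBP = 90 + 0.8 * age + rng.normal(0, 15, size=n)
df = pd.DataFrame({
    "age": age,
    "sysBP": sysBP,
    "prevalentHyp": (sysBP > 140).astype(int),
    "TenYearCHD": (rng.random(n) < (age - 30) / 100).astype(int),
})

# Pairwise Pearson correlations between all numeric columns.
corr_matrix = df.corr()

# Correlation of every column with the target, strongest positive first.
target_corr = corr_matrix["TenYearCHD"].sort_values(ascending=False)
print(target_corr)
```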

It looks like age, sysBP and prevalentHyp have a comparatively strong positive correlation with TenYearCHD.

Now we will drop the rows that contain missing (NA) values.
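
A small sketch of dropping incomplete rows with pandas is shown here. The data frame is a made-up sample with two missing entries, used only to show the effect of dropna().

```python
import numpy as np
import pandas as pd

# Made-up sample with two missing entries; the values are illustrative only.
df = pd.DataFrame({
    "age": [39, 46, 48, 61, 46],
    "sysBP": [106.0, 121.0, np.nan, 150.0, 130.0],
    "glucose": [77.0, np.nan, 70.0, 103.0, 85.0],
    "TenYearCHD": [0, 0, 0, 1, 0],
})

print(df.isna().sum())    # number of missing values per column

# Drop every row that has at least one missing value.
df_clean = df.dropna()
print(len(df), "rows before,", len(df_clean), "rows after")
```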

Next we create the pipeline for the numeric variables, which replaces missing values with the column median. We also split the data into training and test sets (80% / 20%) and set a random state so that the results can be reproduced.

In this step we fit the numeric pipeline and transform the variables according to the parameters stated above.
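
The numeric steps can be sketched roughly as follows. The data is synthetic, the column list num_attribs is an assumption, and the StandardScaler step is an addition that the text does not mention; it is included here only because SVMs tend to be sensitive to feature scale.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, NOT the Framingham data set.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "age": rng.integers(32, 70, size=n).astype(float),
    "sysBP": rng.normal(130, 20, size=n),
    "TenYearCHD": rng.integers(0, 2, size=n),
})
df.loc[df.index % 10 == 0, "sysBP"] = np.nan  # inject some missing values

# 80% / 20% split with a fixed random state for reproducibility.
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)

num_attribs = ["age", "sysBP"]  # assumed list of numeric columns
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # fill gaps with the median
    ("scaler", StandardScaler()),                   # assumption: not in the text
])

# Fit on the training data only, then reuse the fitted pipeline on the test data.
X_train_num = num_pipeline.fit_transform(train_set[num_attribs])
X_test_num = num_pipeline.transform(test_set[num_attribs])
```

Fitting on the training set only keeps the medians and scaling statistics from leaking information out of the test set.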

Next we set up the pipeline for the categorical variables: missing values are filled with the most frequent value, and the categories are then encoded with a one-hot encoder.

Next we fit the pipeline for the categorical variables.
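
A sketch of such a categorical pipeline is given here. Treating education as the categorical column is an assumption, and the eight values are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up categorical column with two missing entries (assumed column name).
df_cat = pd.DataFrame(
    {"education": [1.0, 2.0, np.nan, 4.0, 2.0, 3.0, 2.0, np.nan]}
)

cat_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),  # fill with the mode
    ("onehot", OneHotEncoder(handle_unknown="ignore")),    # one column per category
])

# fit_transform returns a sparse matrix; convert it to a dense array for display.
X_cat = cat_pipeline.fit_transform(df_cat)
X_cat_dense = X_cat.toarray()
print(X_cat_dense)
```

In this sample the most frequent value is 2.0, so the two missing entries end up in the second one-hot column.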

In the next step, we combine the numerical and categorical pipelines into a single preprocessing pipeline, apply it to the training set, and define the X (features) and y (target) variables.
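
One way to combine the two pipelines is scikit-learn's ColumnTransformer, sketched here. This is an assumption: the original code may have used a different combiner (for example FeatureUnion). The data is synthetic, the column lists are assumed, and the StandardScaler step is not mentioned in the text.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in training set, NOT the Framingham data set.
rng = np.random.default_rng(1)
n = 60
train_set = pd.DataFrame({
    "age": rng.integers(32, 70, size=n).astype(float),
    "sysBP": rng.normal(130, 20, size=n),
    "education": np.tile([1.0, 2.0, 3.0, 4.0], n // 4),
    "TenYearCHD": rng.integers(0, 2, size=n),
})
train_set.loc[train_set.index % 7 == 0, "sysBP"] = np.nan
train_set.loc[train_set.index % 9 == 0, "education"] = np.nan

num_attribs = ["age", "sysBP"]  # assumed numeric columns
cat_attribs = ["education"]     # assumed categorical column

num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),  # assumption: not in the text
])
cat_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# Route each group of columns through its own pipeline; sparse_threshold=0
# forces a dense array as output.
full_pipeline = ColumnTransformer(
    [("num", num_pipeline, num_attribs), ("cat", cat_pipeline, cat_attribs)],
    sparse_threshold=0,
)

X = full_pipeline.fit_transform(train_set.drop(columns="TenYearCHD"))
y = train_set["TenYearCHD"].to_numpy()
print(X.shape, y.shape)
```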

Next, I will fit a Support Vector Classifier (SVC) on the training set and then use the test set to see what kind of predictions we get, both for the Support Vector Machine and for a Random Forest model.
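
A minimal sketch of training and scoring both models follows. The features come from make_classification as a stand-in for the preprocessed data, the hyperparameters are illustrative defaults, and accuracy is an assumed metric, since the text does not say which one was used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed features, NOT the Framingham data.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "SVC": SVC(random_state=42),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)      # train on the training set
    y_pred = model.predict(X_test)   # predict on the held-out test set
    scores[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```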

On the test set, the Random Forest model performs better than the SVM model by a slight margin.

When we group TenYearCHD by age, we can also see that as the age group goes up, the chance of TenYearCHD goes up as well.
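
The grouping by age can be sketched as follows. The data is synthetic and deliberately generated so that risk rises with age, so it only illustrates the mechanics; the bin edges in age_bins are also an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data, NOT the Framingham data set.
rng = np.random.default_rng(7)
n = 2000
age = rng.integers(32, 70, size=n)
df = pd.DataFrame({
    "age": age,
    "TenYearCHD": (rng.random(n) < (age - 30) / 100).astype(int),
})

# Bucket ages into decades: [30, 40), [40, 50), [50, 60), [60, 70).
age_bins = [30, 40, 50, 60, 70]  # assumed bin edges
df["age_group"] = pd.cut(df["age"], bins=age_bins, right=False)

# Share of positive TenYearCHD cases within each age group.
chd_rate = df.groupby("age_group", observed=True)["TenYearCHD"].mean()
print(chd_rate)
```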

References:

  • Code reused from the reference below.

https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291
