Week #6 Heart Disease Detection

Harun Alperen Toktaş
bbm406f19 · Jan 13, 2020

Introduction: Hello everyone again! Last week, we talked about the decision tree algorithm and its results, and we noted that a single decision tree tends to overfit. This week, to address that overfitting problem, we will talk about the random forest algorithm.

Random Forest:

The random forest algorithm is a more flexible version of the decision tree algorithm. It generalizes better and classifies new samples more accurately, which improves the overall accuracy rate.

For each tree in the random forest, we create a new data set by selecting random samples from the original data set, and we run the decision tree algorithm on this new data set.

The important thing to note here is that the new data set contains the same number of samples as the original data set, and the same sample can be selected more than once. This is sampling with replacement, also known as bootstrapping.
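As a minimal sketch of this bootstrap step (the array names here are illustrative, not from our project code), one sample with replacement can be drawn in NumPy like this:

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.arange(10).reshape(5, 2)   # toy data set: 5 samples, 2 features
y = np.array([0, 1, 0, 1, 1])

# Draw 5 indices with replacement: the same sample may appear more than once.
idx = rng.integers(0, len(X), size=len(X))
X_boot, y_boot = X[idx], y[idx]

# The bootstrap sample has the same number of samples as the original.
assert X_boot.shape == X.shape
```

Each tree in the forest is then trained on its own bootstrap sample.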

When constructing a decision tree on these randomly sampled data sets, we use a randomly selected subset of the features at each split. The number of features in this subset is a parameter and should be tuned.
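A quick sketch of this feature-subset step, assuming an illustrative feature count of 13 (the subset size used here, the square root of the feature count, is a common default choice, not necessarily the value we tuned for our project):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 13                        # illustrative feature count
m = max(1, int(np.sqrt(n_features)))   # sqrt(n) is a common default subset size

# At each split, consider only a random subset of m distinct features.
feature_subset = rng.choice(n_features, size=m, replace=False)
```

Restricting each split to a random feature subset is what decorrelates the trees from one another.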

Building the trees randomly in this way produces a wide variety of trees. This variety is what makes the random forest algorithm more effective than a single decision tree. We repeat this tree-building process a number of times (for example, 100).

To classify a new instance, we run it through all of the random trees we have built and take the majority vote among their predictions.
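The majority vote can be sketched with the standard library; the per-tree predictions below are hypothetical placeholder values, not results from our trees:

```python
from collections import Counter

# Hypothetical per-tree predictions for one test instance
# (0 = no heart disease, 1 = heart disease).
tree_predictions = [1, 0, 1, 1, 0, 1, 1]

# The forest's output is the class predicted by the most trees.
majority_class, votes = Counter(tree_predictions).most_common(1)[0]
print(majority_class)  # → 1 (five of the seven trees voted for class 1)
```

With an even number of trees, ties are possible; a common convention is to break them by class order or by predicted probability.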

Results:

In our 1st data set, the random forest algorithm achieved 93.55% accuracy.
In our 2nd data set, the random forest algorithm achieved 86.39% accuracy.
In our 3rd data set, the random forest algorithm achieved 71.92% accuracy.

For all three data sets, the random forest algorithm yielded better results than the decision tree algorithm.

With this latest work, we have completed our project. Across all of our studies, the best result was given by the random forest algorithm, as you can see from the graph. We found this to be expected, because the data sets contain both continuous and discrete features. You can see the comparative results for all 3 data sets in the table. We end our posts with this blog. See you in other studies!
