Machine Learning Decision Trees Implementation
with ID3 algorithm using Python and XML
In this post, we are going to look at a program written in Python 3 using pandas and the XML libraries dicttoxml and lxml. We are going to use XML to represent the decision tree as a hierarchically nested structure.
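To make the idea of a nested representation concrete, here is a minimal sketch of how a decision tree stored as nested dictionaries maps onto nested XML elements. It is not the code from this post: it uses the standard library's xml.etree.ElementTree instead of dicttoxml and lxml so that it runs without extra installs, and the helper name dict_to_xml and the toy tree are made up for illustration.

```python
import xml.etree.ElementTree as ET


def dict_to_xml(tag, node):
    """Hypothetical helper: turn a nested dict into nested XML elements.

    Each dict key becomes a child element; anything that is not a dict
    is treated as a leaf and stored as the element's text.
    """
    elem = ET.Element(tag)
    if isinstance(node, dict):
        for key, child in node.items():
            elem.append(dict_to_xml(key, child))
    else:
        elem.text = str(node)
    return elem


# Toy tree, invented for this example: attribute -> value -> subtree or class.
toy_tree = {
    "outlook": {
        "sunny": {"humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
    }
}

root = dict_to_xml("DecisionTree", toy_tree)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Every path from the root element down to a leaf reads as one decision rule. Note that this sketch uses attribute values directly as tag names, so it only works when those values are valid XML names.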
I will explain how the code works, part by part. The whole code is hosted on GitHub, along with the dataset used in the example.
We are going to use Entropy as the impurity measure and Information Gain as the decision measure.
→ Entropy measures the impurity of a set of examples S.
Entropy(S) = 0 if all examples in S belong to the same class, and
Entropy(S) = 1 if S contains equal numbers of positive and negative examples.
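In the formula used here, Entropy(S) = -Σ p_i · log_k(p_i), where p_i is the proportion of examples in S that belong to class i, the base k of the logarithm is the number of distinct classes in our dataset rather than 2. This is done in order to constrain the entropy to the range between 0 and 1.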
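A minimal sketch of this entropy calculation is shown below. It is not the code from the repository: it uses only the standard library, and the function name entropy and the n_classes parameter are my own choices. The base of the logarithm is passed in explicitly so that subsets are measured against the number of classes in the whole dataset.

```python
import math
from collections import Counter


def entropy(labels, n_classes=None):
    """Entropy of a list of class labels, using log base n_classes.

    n_classes is assumed to be the number of distinct classes in the
    whole dataset; if omitted, it is taken from the labels themselves.
    """
    if n_classes is None:
        n_classes = len(set(labels))
    # With fewer than two classes (or no examples) there is no impurity.
    if n_classes < 2 or not labels:
        return 0.0
    total = len(labels)
    counts = Counter(labels)
    return sum(
        -(count / total) * math.log(count / total, n_classes)
        for count in counts.values()
    )


pure = entropy(["yes", "yes", "yes"], n_classes=2)
balanced = entropy(["yes", "no", "yes", "no"], n_classes=2)
three_way = entropy(["a", "b", "c"], n_classes=3)
print(pure, balanced, three_way)
```

A pure set gives an entropy of 0, and a set split evenly across all classes gives an entropy of 1 (up to floating-point rounding), whether there are two classes or three.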
→ Gain(S, A) is the expected reduction in entropy due to sorting (splitting) the examples in S on an attribute A.
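As a rough illustration, the gain of an attribute can be computed as the entropy of the whole set minus the weighted average entropy of the subsets produced by splitting on that attribute. The sketch below assumes the data is a list of dictionaries rather than a pandas DataFrame, and the names information_gain, rows, attribute and target are hypothetical, not taken from the repository.

```python
import math
from collections import Counter, defaultdict


def entropy(labels, n_classes):
    """Entropy of class labels with log base n_classes (0.0 if n_classes < 2)."""
    if n_classes < 2 or not labels:
        return 0.0
    total = len(labels)
    return sum(
        -(count / total) * math.log(count / total, n_classes)
        for count in Counter(labels).values()
    )


def information_gain(rows, attribute, target):
    """Expected reduction in entropy of `target` after splitting on `attribute`."""
    labels = [row[target] for row in rows]
    n_classes = len(set(labels))

    # Group the target labels by the value of the attribute.
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])

    # Weighted average entropy of the subsets after the split.
    remainder = sum(
        len(group) / len(rows) * entropy(group, n_classes)
        for group in groups.values()
    )
    return entropy(labels, n_classes) - remainder


# Toy data, invented for this example.
rows = [
    {"windy": "yes", "humid": "high", "play": "no"},
    {"windy": "yes", "humid": "low", "play": "no"},
    {"windy": "no", "humid": "high", "play": "yes"},
    {"windy": "no", "humid": "low", "play": "yes"},
]
gain_windy = information_gain(rows, "windy", "play")
gain_humid = information_gain(rows, "humid", "play")
print(gain_windy, gain_humid)
```

In this toy data, "windy" separates the classes perfectly and so has the highest possible gain, while "humid" tells us nothing about the class and has zero gain. ID3 picks the attribute with the highest gain at each node.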