Machine Learning Decision Trees Implementation

with ID3 algorithm using Python and XML

Tarun Gupta
The Startup


In this post, we will look at a program written in Python 3 that uses pandas together with the XML libraries dicttoxml and lxml. XML is used to represent the decision tree as a hierarchically nested structure.
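To make the idea of a hierarchically nested tree concrete, here is a minimal sketch. It swaps in the standard library's xml.etree.ElementTree in place of dicttoxml and lxml so that it runs without extra installs, and the helper name `dict_to_xml`, the root tag `DecisionTree`, and the tiny weather-style tree are all assumptions made up for illustration, not taken from the program discussed in this post.

```python
import xml.etree.ElementTree as ET


def dict_to_xml(tag, node):
    """Recursively turn a nested dict (or a leaf label) into an XML element.

    Hypothetical helper: dict keys become child tags, leaf values become text.
    """
    elem = ET.Element(tag)
    if isinstance(node, dict):
        for key, child in node.items():
            elem.append(dict_to_xml(key, child))
    else:
        elem.text = str(node)
    return elem


# Illustrative tree: attribute -> value -> (sub-tree or class label).
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

root = dict_to_xml("DecisionTree", tree)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Each level of nesting in the XML corresponds to one decision in the tree, so following a path from the root to a leaf reads off a single classification rule.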

I will walk through the code part by part. The whole code is hosted on GitHub, along with the dataset used in the example.

We are going to use Entropy as the impurity measure and Information Gain as the decision measure.

→ Entropy measures the impurity of a set of examples S.

Entropy(S) = 0 if all examples in S belong to the same class, and

Entropy(S) = 1 if S contains equal numbers of positive and negative examples.
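Written out, the entropy can be expressed as follows. The symbols are assumptions introduced here for illustration: k is the number of distinct classes in the dataset and p_i is the proportion of examples in S that belong to class i, with the usual convention that a term with p_i = 0 contributes 0.

```latex
\mathrm{Entropy}(S) = -\sum_{i=1}^{k} p_i \log_{k} p_i
```

For two classes with p_1 = p_2 = 1/2 this gives -2 · (1/2) · log_2(1/2) = 1, and for a single class with p_1 = 1 it gives 0, matching the two cases above.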

Note that the logarithm here uses a base equal to the number of distinct classes in our dataset, rather than the usual base 2. This is done in order to constrain the entropy to the range 0 to 1, however many classes there are.
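The entropy described above can be sketched in a few lines. This is a minimal illustration, not the code from the repository: the function name `entropy` and the parameter `n_classes` are assumptions, and it uses only the standard library so that it accepts any iterable of labels (a pandas column would work as well).

```python
import math
from collections import Counter


def entropy(labels, n_classes=None):
    """Entropy of a collection of class labels, with the log base set to the
    number of distinct classes so the result stays between 0 and 1.

    n_classes (hypothetical parameter): number of classes in the whole
    dataset; defaults to the number of distinct labels seen here.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if len(counts) <= 1:
        return 0.0  # empty or pure set: no impurity
    base = n_classes if n_classes is not None else len(counts)
    return -sum(
        (c / total) * math.log(c / total, base) for c in counts.values()
    )


balanced = entropy(["yes", "yes", "no", "no"])   # evenly split, two classes
pure = entropy(["yes", "yes", "yes"])            # all one class
three_way = entropy(["a", "b", "c"])             # evenly split, three classes
```

Passing `n_classes` explicitly matters when the function is called on a subset that happens to contain fewer classes than the full dataset, since the base should stay the same across all subsets.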

→ Gain is the expected reduction in entropy from splitting the examples on an attribute A.
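As a rough sketch of this definition, the gain of an attribute is the entropy of the whole set minus the size-weighted entropy of the subsets produced by splitting on that attribute. The function names, the `rows` format (a list of dicts, such as pandas produces with `df.to_dict("records")`), and the toy data are assumptions for illustration and may differ from the code in the repository.

```python
import math
from collections import Counter, defaultdict


def entropy(labels, n_classes):
    """Entropy with log base n_classes, so values lie between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if len(counts) <= 1:
        return 0.0  # empty or pure set
    return -sum(
        (c / total) * math.log(c / total, n_classes) for c in counts.values()
    )


def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over values v of A of
    (|S_v| / |S|) * Entropy(S_v)."""
    labels = [row[target] for row in rows]
    n_classes = len(set(labels))  # base of the log, fixed for all subsets

    # Group the target labels by the value the attribute takes in each row.
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])

    weighted = sum(
        (len(subset) / len(rows)) * entropy(subset, n_classes)
        for subset in subsets.values()
    )
    return entropy(labels, n_classes) - weighted


# Toy data (made up): "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
]
gain_outlook = information_gain(rows, "outlook", "play")
gain_windy = information_gain(rows, "windy", "play")
```

ID3 would pick the attribute with the highest gain as the next node, which in this toy data is "outlook".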
