Decision Trees using Sklearn

Anjali Pal · Published in Analytics Vidhya · Jan 24, 2021

“Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don’t think AI will transform in the next several years.” — Andrew Ng

With decision trees, a machine can make well-reasoned decisions without human intervention. A decision tree is a supervised learning algorithm used to solve both regression and classification problems. It is a predictive modelling approach that gives a graphical representation of decisions and every possible outcome of those decisions.

A typical decision tree starts with a root node, splits the data at each internal decision node based on an attribute test, and ends in leaf nodes that hold the final prediction.

The major challenge in building a decision tree lies in selecting the attribute to split on at each node. This selection is driven by one of the following two metrics:

  1. Entropy (Information Gain): Entropy is a measure of randomness (impurity) among the data points at a node. Information gain is the reduction in entropy achieved by a split, so the attribute whose split lowers entropy the most yields the highest information gain and is chosen (see the sketch after this list). This criterion tends to favour attributes with many distinct values, which produce partitions with small counts.
  2. Gini Index: A measure of how often a randomly chosen element would be incorrectly labelled if it were labelled randomly according to the class distribution at the node. The attribute with the lower Gini index is preferred. It avoids the logarithm computation, so it is cheaper and well suited for larger partitions.
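To make these metrics concrete, here is a minimal sketch computing both criteria on a toy label array. It assumes NumPy, and the helper names entropy, gini, and information_gain are mine, not from the article:

```python
import numpy as np

def entropy(labels):
    # Shannon entropy: -sum(p * log2(p)) over the class proportions p.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini impurity: 1 - sum(p^2) over the class proportions p.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, children):
    # Parent entropy minus the size-weighted entropy of the child partitions.
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

labels = ["setosa"] * 2 + ["versicolor"] * 2
print(entropy(labels))  # 1.0 (a 50/50 split is maximally impure for two classes)
print(gini(labels))     # 0.5
print(information_gain(labels, [["setosa"] * 2, ["versicolor"] * 2]))  # 1.0, a perfect split
```

A pure node scores 0 on both metrics, which is why splits that isolate a single class are preferred.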

Now, I’ll explain the decision tree classifier with the help of the Iris dataset.

Step One

Always get to know the dataset before starting (I’ve used iris.DESCR for this). It helps you understand what the data includes and its data types.
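The article reads the dataset’s description via iris.DESCR; a sketch of that step, assuming the data is loaded with scikit-learn’s load_iris:

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.DESCR)          # full description: features, classes, summary statistics
print(iris.feature_names)  # ['sepal length (cm)', ..., 'petal width (cm)']
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
```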

Step two and three

The next step is always to figure out your independent and target variables. Once this is done, you can use the sklearn library to build a decision tree classifier.
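The article’s exact split and hyperparameters aren’t shown, so the test size and random_state below are assumptions; a minimal sketch of separating the variables and fitting the classifier:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = iris.data, iris.target  # independent variables and the species target

# Hold out a test set; random_state is an assumption, for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

clf = DecisionTreeClassifier()  # the default criterion is 'gini'
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on the held-out set
```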

Decision Tree
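The original post shows the fitted tree as an image. One way to reproduce such a plot, assuming matplotlib is available, is scikit-learn’s plot_tree:

```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(12, 8))
plot_tree(
    clf,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True,  # colour each node by its majority class
)
plt.show()
```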

To understand how the tree above makes predictions, let’s walk through some examples.

Case 1:

Take sepal_length = 2.5, sepal_width = 1, petal_length = 1.5, petal_width = 2. The root node question is petal_length <= 2.45, which is True, and hence the class is setosa.

Case 2:

Take sepal_length = 2.5, sepal_width = 1, petal_length = 2.46, petal_width = 2. The root node question is petal_length <= 2.45, which is False, so we move to the next question, petal_width <= 1.75, which is also False. The next question is petal_length <= 4.85, which is True. Then comes the question sepal_length <= 5.95, which is also True, and hence the class is versicolor.

To test the predictions, I’m using the predict method below.
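A sketch of that step for the two cases above (the names samples and predictions are mine; the expected classes follow the walkthrough, though a tree trained on a different split could answer differently):

```python
# Feature order: [sepal_length, sepal_width, petal_length, petal_width]
samples = [
    [2.5, 1.0, 1.50, 2.0],  # Case 1
    [2.5, 1.0, 2.46, 2.0],  # Case 2
]
predictions = clf.predict(samples)
print(iris.target_names[predictions])  # expected per the walkthrough: ['setosa' 'versicolor']
```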

Results

Running predict on the two cases returns setosa for Case 1 and versicolor for Case 2, matching the walkthrough above.

Hope this helps you. If you have any questions, feel free to add a comment or ping me on LinkedIn.

To check out my projects and other articles, visit my website at https://anjali001.github.io/ ; my bot “Grey” is there to welcome you.
