Gentle Introduction to Cost-Sensitive Decision Tree

Zahra Ahmad · Published in Analytics Vidhya · 4 min read · Mar 10, 2021


In this article, I will show how a decision tree can be used on an imbalanced dataset as a cost-sensitive supervised learning algorithm.

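Before going further, here is a minimal sketch of the idea in scikit-learn (my own illustration, not code from this article): the `class_weight` parameter of `DecisionTreeClassifier` assigns a higher misclassification cost to the minority class, which is the simplest way to make a decision tree cost-sensitive.

```python
# A hedged sketch: cost-sensitive decision tree on an imbalanced dataset
# using scikit-learn. The weights below are illustrative, not prescriptive.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy data: roughly 95% class 0, 5% class 1
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

# Misclassifying the rare class is assumed to cost 10x more
clf = DecisionTreeClassifier(class_weight={0: 1, 1: 10}, random_state=0)
clf.fit(X, y)
```

Instead of an explicit weight dictionary, `class_weight="balanced"` can be used to set the weights inversely proportional to the class frequencies.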

The decision tree is a commonly used algorithm for classification and regression. For classification, it uses a tree structure to classify instances. Its main merit is that it is highly readable.

As a tree structure, it consists of one root node, several internal nodes, and several leaf nodes. Leaf nodes hold the final decisions. Internal nodes test feature values, and the instances at a node are routed to its child nodes according to the outcome of the feature test.

The root node contains all the instances, so each path from the root node to a leaf node is a sequence of tests. The goal is to learn a decision tree with strong generalization ability, one that handles new instances it has never seen before.
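The root-to-leaf test sequence can be made concrete with a tiny hypothetical node structure (the class and field names below are my own, purely for illustration): each internal node tests one feature against a threshold, and each leaf stores a class label.

```python
# Hypothetical illustration of the root -> internal -> leaf test sequence.
class Node:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.label = label          # set only on leaf nodes

    def predict(self, x):
        if self.label is not None:  # leaf node: return the decision
            return self.label
        # internal node: route the instance to a child via the feature test
        child = self.left if x[self.feature] <= self.threshold else self.right
        return child.predict(x)

# Root tests feature 0 against 0.5; the two leaves decide the class
tree = Node(feature=0, threshold=0.5,
            left=Node(label="neg"), right=Node(label="pos"))
```

Classifying an instance is then just a walk from the root down to a leaf: `tree.predict([0.2])` follows the left branch, `tree.predict([0.9])` the right.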

Loss Function in Decision Tree

The loss function of a decision tree is a regularized maximum likelihood function. Selecting the optimal tree from all possible trees is an NP-hard problem, so a heuristic strategy is typically used to obtain a sub-optimal solution. The process of building a decision tree follows a divide-and-conquer strategy.
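The divide-and-conquer strategy can be sketched as: greedily pick the feature test that best splits the current instances, partition them between the children, and recurse until a node is pure. The sketch below is my own minimal version, using Gini impurity as a stand-in splitting criterion.

```python
# A rough sketch of greedy, divide-and-conquer decision tree building.
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build(rows, labels):
    """Divide and conquer: find the best feature test, split, recurse."""
    if len(set(labels)) == 1:                  # pure node -> leaf
        return {"leaf": labels[0]}
    best = None
    for f in range(len(rows[0])):              # try every feature...
        for t in sorted({r[f] for r in rows}): # ...and every threshold
            li = [i for i, r in enumerate(rows) if r[f] <= t]
            ri = [i for i, r in enumerate(rows) if r[f] > t]
            if not li or not ri:
                continue
            # weighted impurity of the two children
            score = (len(li) * gini([labels[i] for i in li])
                     + len(ri) * gini([labels[i] for i in ri])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, li, ri)
    if best is None:                           # no split separates the data
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, f, t, li, ri = best
    return {"feature": f, "threshold": t,
            "left":  build([rows[i] for i in li], [labels[i] for i in li]),
            "right": build([rows[i] for i in ri], [labels[i] for i in ri])}

def predict(node, x):
    """Follow the test sequence from the root down to a leaf."""
    while "leaf" not in node:
        node = (node["left"] if x[node["feature"]] <= node["threshold"]
                else node["right"])
    return node["leaf"]
```

A cost-sensitive variant would weight each label's contribution to the impurity (or to the leaf's majority vote) by its misclassification cost, so splits that isolate the expensive class are preferred.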
