[ML Shot of the Day]: Discretization of Continuous Attributes

Handling continuous features in Decision Trees

Choosing the optimal splitting point for continuous attributes in Decision Trees

Pritish Jadhav
Published in Geek Culture · 4 min read · Jun 5, 2021



A Crash Course on Decision Trees and Splitting Measures:

  • Decision Trees and their variants (Random Forests, XGBoost, CatBoost) are widely used in the Machine Learning world, including competitions.
  • Training a Decision Tree for a classification problem involves recursively splitting the data into smaller subsets until each leaf node contains data from a single class, or another stopping criterion (such as a maximum depth) is met.
  • Different measures (Information Gain, Gini Index, Gain Ratio) are used for determining the best possible split at each node of the Decision Tree; a minimal sketch of these measures follows this list.
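To make these measures concrete, here is a minimal sketch of the Gini Index, entropy, and Information Gain in plain NumPy. The function names and the toy label array are illustrative choices, not taken from any particular library.

import numpy as np

def gini_index(labels):
    # Gini impurity: 1 - sum(p_k^2) over the class proportions p_k.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Shannon entropy: -sum(p_k * log2(p_k)); np.unique only returns
    # classes that are present, so there is no log(0) to guard against.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent_labels, child_label_lists):
    # Parent entropy minus the size-weighted entropy of the child nodes.
    n = len(parent_labels)
    weighted = sum(len(c) / n * entropy(c) for c in child_label_lists)
    return entropy(parent_labels) - weighted

# Toy binary labels and one candidate split of them into two children.
labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])
left, right = labels[:4], labels[4:]
print(gini_index(labels))                       # ~0.469
print(information_gain(labels, [left, right]))  # ~0.549

A pure node (all labels identical) scores 0 under both impurity measures, which is why the weighted child impurity is what a learner tries to drive down at every split.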

Splitting Measures for growing Decision Trees:

  • Recursively growing a tree involves selecting an attribute and a test condition that divide the data at a given node into smaller, purer subsets.
  • The measures used for determining the best split compute the degree of impurity of the child nodes.
  • Computing the impurity of the child nodes with respect to each candidate attribute and test condition makes it possible to rank the candidates and keep the split that yields the purest children, as shown in the sketch after this list.
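For the continuous attributes this article is about, a common approach (used by CART-style learners) is to sort the feature values, take the midpoints between consecutive distinct values as candidate thresholds, and keep the threshold whose children have the lowest weighted impurity. Below is a self-contained sketch of that search using the weighted Gini Index; the age/bought feature and labels are made-up toy data.

import numpy as np

def gini_index(labels):
    # Gini impurity: 1 - sum(p_k^2) over the class proportions p_k.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split_point(values, labels):
    # Sort by feature value, then evaluate every midpoint between
    # consecutive distinct values as a candidate threshold.
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    distinct = np.unique(values)
    candidates = (distinct[:-1] + distinct[1:]) / 2.0
    n = len(labels)
    best_threshold, best_score = None, np.inf
    for threshold in candidates:
        mask = values <= threshold
        left, right = labels[mask], labels[~mask]
        # Weighted Gini impurity of the two children; lower is purer.
        score = (len(left) / n) * gini_index(left) \
              + (len(right) / n) * gini_index(right)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

age = np.array([22, 25, 30, 35, 40, 45, 50, 55])  # toy continuous feature
bought = np.array([0, 0, 0, 1, 1, 1, 1, 0])       # toy binary target
print(best_split_point(age, bought))              # (32.5, 0.2)

Only midpoints between distinct sorted values need to be tested, because any threshold between the same pair of neighbors produces exactly the same partition of the data.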
