Decision Trees (Part 1)

Dr. Roi Yehoshua · Published in AI Made Simple · Mar 15, 2023 · 16 min read

Decision trees are powerful and versatile machine learning models built from simple decision rules inferred from the training data.

Decision trees can be used for various types of learning tasks (e.g., classification, regression, ranking), and they can handle different types of data (e.g., numerical and categorical data). In addition, they are very easy to understand and interpret even by non-experts, since the learned model can be visualized graphically (a “white-box” model).

Since decision trees are a fairly extensive topic, this article is split into two parts. The first part discusses what decision trees are and how to build them from a given data set. The second part will show you how to use the decision tree classes in Scikit-Learn, and explore more advanced topics, such as tree pruning and regression trees.

Decision Tree Definition

A decision tree is a tree in which each internal (non-leaf) node is labeled with an input feature, and the edges leaving it are labeled with the feature's possible values. Each leaf node is labeled with one of the classes.
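
To make this structure concrete, here is a minimal sketch (my own illustration, not code from the article or from a library) of how such a tree could be represented in Python: internal nodes test a feature, each edge corresponds to one of its values, and leaves hold a class label.

```python
class Node:
    """A node in a (categorical) decision tree."""
    def __init__(self, feature=None, children=None, label=None):
        self.feature = feature            # feature tested at an internal node
        self.children = children or {}    # maps a feature value to a child node
        self.label = label                # class label (set only at leaf nodes)

    def is_leaf(self):
        return self.label is not None


def predict(node, sample):
    """Route a sample (a dict mapping feature -> value) from the root to a leaf."""
    while not node.is_leaf():
        node = node.children[sample[node.feature]]
    return node.label
```

Classifying a new example then amounts to following a single root-to-leaf path, testing one feature at each level.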

For example, consider the following decision tree for predicting whether an employee will be promoted based on their gender, seniority level, marital status, and whether they hold an academic degree.

A decision tree that predicts whether an employee will get a promotion
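
Each root-to-leaf path in such a tree is equivalent to a conjunction of feature tests, so the whole tree can be read as a set of nested if/else rules. The sketch below shows a hypothetical promotion tree over the features mentioned above (the actual tree in the figure may split on different features and values):

```python
# Hypothetical tree, for illustration only -- not the tree shown in the figure.
def predict_promotion(employee):
    if employee["seniority"] == "senior":
        # Senior employees: the decision depends on holding an academic degree
        if employee["has_degree"]:
            return "promoted"
        return "not promoted"
    elif employee["seniority"] == "junior":
        return "not promoted"
    else:  # mid-level seniority
        return "promoted"


print(predict_promotion({"seniority": "senior", "has_degree": True}))  # promoted
```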
