
…rs) that are as “pure” as possible. This means that as we build the decision tree, we always choose the split that yields the most information, that is, the largest reduction in impurity. More concretely, we choose a split value such that each region is largely made up of data points from on…
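To make the idea of choosing the most informative split concrete, here is a minimal sketch for a single numeric feature. It assumes Shannon entropy as the impurity measure (other measures such as Gini impurity are also common), and the function names `entropy` and `best_split` as well as the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, gain) for the split with the largest information gain.

    Assumption: candidate thresholds are midpoints between consecutive
    distinct feature values, a common but not the only choice.
    """
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    labels = [label for _, label in pairs]
    parent = entropy(labels)
    best_threshold, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between identical values
        left, right = labels[:i], labels[i:]
        # Impurity after the split, weighted by region size.
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(labels)
        gain = parent - weighted
        if gain > best_gain:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = gain
    return best_threshold, best_gain


# Toy data: two well-separated classes along one feature.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
threshold, gain = best_split(xs, ys)  # threshold 6.5, gain 1.0 bit
```

In this toy case the chosen threshold leaves both regions perfectly pure, so the split removes all of the parent's entropy.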
…its a regression line to the data points of each region, creating a jagged piecewise-linear fit. However, trees constructed this way are more prone to overfitting, especially in regions with few data points, where noise has a disproportionate influence on the fitted line.
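The per-region line fitting described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a full tree learner: the split thresholds are given by hand rather than learned, each region gets an ordinary least-squares line, and the names `fit_line`, `fit_piecewise`, and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def fit_piecewise(xs, ys, thresholds):
    """Fit one line per region; regions are the half-open intervals [lo, hi)."""
    edges = [float("-inf")] + sorted(thresholds) + [float("inf")]
    segments = []
    for lo, hi in zip(edges, edges[1:]):
        region = [(x, y) for x, y in zip(xs, ys) if lo <= x < hi]
        slope, intercept = fit_line([x for x, _ in region],
                                    [y for _, y in region])
        segments.append((lo, hi, slope, intercept))
    return segments


def predict(segments, x):
    """Evaluate the line belonging to the region that contains x."""
    for lo, hi, slope, intercept in segments:
        if lo <= x < hi:
            return slope * x + intercept
    raise ValueError("x is not covered by any segment")


# Noise-free toy data with a kink at x = 5 (made up for illustration).
xs = [float(i) for i in range(10)]
ys = [2 * x if x < 5 else 20 - x for x in xs]
segments = fit_piecewise(xs, ys, thresholds=[4.5])
```

A region containing only two or three points still produces a line, but that line is then determined almost entirely by those few points, which is where the overfitting risk mentioned above comes from.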