Complexity parameter is essentially the kappa value

The complexity parameter (cp) in decision tree models is not as complex as it seems.

Michael Lai
Feb 5, 2024

Many people who are relatively new to data science may have been baffled by the complexity parameter (cp) used when building a classification and regression tree (CART). In the R package rpart, cp is an important control parameter for growing the initial tree before it is pruned. Studied in detail, it turns out to be analogous to the kappa value used to measure agreement between two sets of classifications.
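To make the cp rule concrete, here is a minimal sketch in Python (chosen only for illustration; it is not rpart's code). It assumes the rule described in rpart's documentation for `cp`, that a split which does not decrease the overall lack of fit by a factor of cp is not attempted, with lack of fit taken as the misclassification count under unit costs and scaled so that the root node has relative error 1. The helper names `misclassified` and `split_cp` and the toy CH/MM labels are hypothetical.

```python
from collections import Counter


def misclassified(labels):
    """Node risk under 0/1 loss: observations outside the node's majority class."""
    if not labels:
        return 0
    return len(labels) - max(Counter(labels).values())


def split_cp(root, parent, left, right):
    """Risk removed by splitting `parent` into `left` and `right`,
    scaled by the root's risk so that the root has relative error 1.
    Assumption: mirrors rpart's documented cp rule with 0/1 loss."""
    gain = misclassified(parent) - misclassified(left) - misclassified(right)
    return gain / misclassified(root)


# Hypothetical toy data: 10 customers, 6 buy CH and 4 buy MM.
root = ["CH"] * 6 + ["MM"] * 4
left = ["CH"] * 5 + ["MM"] * 1   # one candidate split of the root
right = ["CH"] * 1 + ["MM"] * 3

cp = 0.01  # rpart's default cp
improvement = split_cp(root, root, left, right)
print(improvement, improvement >= cp)  # prints: 0.5 True
```

The split fixes 2 of the root's 4 misclassifications, so its scaled improvement is 0.5, far above the default cp of 0.01. Written as rates, the scaled improvement of a first split is (accuracy - majority-class rate) / (1 - majority-class rate), here (0.8 - 0.6) / (1 - 0.6) = 0.5, which has the same (observed - baseline) / (1 - baseline) shape as kappa.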

Let’s explore kappa first and then come back to cp.

Kappa in the confusion matrix

You have probably come across the confusion matrix produced by R’s caret package when evaluating binary classification models. The following confusion matrix and statistics are copied from this bookdown example, in which a decision tree is built to predict which of two orange juice brands (CH or MM) a customer will buy.
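Kappa compares the observed agreement between predictions and reference labels (the accuracy, p_o) with the agreement expected by chance from the row and column totals (p_e): kappa = (p_o - p_e) / (1 - p_e). A minimal Python sketch follows; because the matrix from the bookdown example is not reproduced here, the counts below are hypothetical and are not the orange juice results.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (list of rows of counts).

    Rows are predicted classes and columns are reference classes, as in
    caret's printout; transposing the matrix leaves kappa unchanged.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    # Observed agreement p_o: share of cases on the diagonal (the accuracy).
    p_o = sum(matrix[i][i] for i in range(k)) / n
    # Chance agreement p_e: sum over classes of (row share * column share).
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical counts (not the bookdown results): CH listed first, then MM.
confusion = [[40, 10],
             [10, 40]]
print(round(cohens_kappa(confusion), 3))  # prints: 0.6
```

Here p_o = 0.8 and p_e = 0.5, so kappa = 0.3 / 0.5 = 0.6: the model closes 60% of the gap between chance agreement and perfect agreement.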

The category CH is arbitrarily assigned as positive, with MM being negative. In the…
