…which features to choose and what conditions to use for splitting, along with knowing when to stop. Decision trees tend to become very complex and overfitted, which means the error on the training set will be low but high on the validation set. A smaller tree with fewer splits might lead to lower variance and better interpretation at the cost…
…ke a binary split we use different metrics; the most popular ones are the Gini index and cross-entropy. The Gini index is a measure of total variance across the K classes. In regression problems we use the variance or the mean deviation from the median.
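To make this concrete, here is a minimal sketch of how these impurity measures can be computed and used to score a candidate split. It assumes NumPy, and the helper names (`gini`, `cross_entropy`, `regression_impurity`) are illustrative, not taken from any particular library:

```python
import numpy as np

def gini(labels):
    """Gini index of the labels at a node: G = sum_k p_k * (1 - p_k),
    where p_k is the proportion of class k at that node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return np.sum(p * (1 - p))

def cross_entropy(labels):
    """Cross-entropy (deviance) impurity: D = -sum_k p_k * log(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def regression_impurity(targets):
    """Regression analogue: variance of the targets at the node
    (mean absolute deviation from the median is an alternative)."""
    return np.var(targets)

# Score a candidate binary split by the weighted impurity of its children;
# the split with the lowest weighted impurity is preferred.
y_left, y_right = np.array([0, 0, 0, 1]), np.array([1, 1, 1, 0])
n = len(y_left) + len(y_right)
split_score = (len(y_left) * gini(y_left) + len(y_right) * gini(y_right)) / n
print(split_score)  # 0.375
```

In this sketch a perfectly pure split would score 0, so lower weighted impurity indicates a better candidate split.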
…ective, it is unjustifiable to use single point estimates for the weights as the basis for any classification.
Bayesian neural networks, on the other hand, are more robust to overfitting and can easily learn from small datasets. The Bayesian approach further offers uncertainty estimates via its parameters, in the form of probabilit…