Understanding Generalization Error in Machine Learning

Yi-xin
3 min read · Oct 28, 2018


What determines a model’s ability to generalize to new, unseen data?

Definition

Firstly, let’s define “generalization error”.

In supervised learning applications in machine learning and statistical learning theory, generalization error (also known as the out-of-sample error) is a measure of how accurately an algorithm is able to predict outcome values for previously unseen data. (Wikipedia)
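
One common way to formalize this (a sketch using symbols of my own choosing, not taken from the quote): for a model f trained on a sample D, a loss function \ell, and the underlying data distribution P, the generalization error is the expected loss on a fresh example drawn from P:

\text{generalization error of } f = \mathbb{E}_{(x, y) \sim P}\big[\,\ell\big(f(x; D),\, y\big)\,\big]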

Notice that the gap between predictions and observed data arises from model inaccuracy, sampling error, and noise. Some of these errors are reducible, but some are not. Choosing the right algorithm and tuning its parameters can improve model accuracy, but we will never be able to make our predictions 100% accurate.

Bias-variance decomposition

An important way to understand generalization error is bias-variance decomposition.

Intuitively speaking, bias is the part of the error that would remain even with unlimited data. A model has high bias when, for example, it fails to capture meaningful patterns in the data. Bias is measured by the difference between the model’s expected prediction, averaged over training datasets D, and the true value at a given input x (X = x).

In contrast with bias, variance reflects an algorithm’s sensitivity to the particular data it sees: it measures how much the learned model would change if a different training dataset were used. A model has high variance when, for instance, it tries so hard to fit the observed data that it captures not only the meaningful patterns but also the meaningless noise (overfitting).

Mathematical Notations

Using regression as an example, we first set up the notation.
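
The equations below are a sketch of the standard squared-loss setup, following the Zhou reference listed at the end; the symbols are my reconstruction rather than the original figures. Fix a test point x, let y be its true value, y_D the (possibly noisy) value recorded in a training dataset D, and f(x; D) the prediction of a model trained on D. Averaging over training datasets D:

\bar{f}(x) = \mathbb{E}_D\big[f(x; D)\big] \quad \text{(expected prediction)}

\mathrm{bias}^2(x) = \big(\bar{f}(x) - y\big)^2 \quad \text{(squared bias)}

\mathrm{var}(x) = \mathbb{E}_D\big[\big(f(x; D) - \bar{f}(x)\big)^2\big] \quad \text{(variance)}

\varepsilon^2 = \mathbb{E}_D\big[(y_D - y)^2\big] \quad \text{(noise)}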

Now we can decompose the generalization error.
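
Assuming the noise has zero mean, the expected squared error at x, averaged over training datasets, splits into the three terms defined above (a standard result under squared loss):

E(f; D) = \mathbb{E}_D\big[\big(f(x; D) - y_D\big)^2\big] = \mathrm{bias}^2(x) + \mathrm{var}(x) + \varepsilon^2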

Interpretation

Bias measures the deviation between the expected output of our model and the true values, so it indicates how well our model fits the underlying problem.

Variance measures how much the outputs of our model change when a different training dataset is used; it captures the impact of dataset perturbations.

Noise is the irreducible error: a lower bound on the generalization error for the current task that no model can eliminate. It indicates the intrinsic difficulty of the task.

Together, these three components determine the model’s ability to generalize to new, unseen data rather than just the data it was trained on.

Bias-Variance Tradeoff

[Figure: Bias-Variance Tradeoff as a Function of Model Capacity]

Generalization error can be measured by MSE. As model capacity increases, bias decreases because the model fits the training data better. However, variance also increases: a model sophisticated enough to capture more patterns of the current dataset is also more sensitive to a change of dataset, even one drawn from the same distribution. As data scientists, our challenge lies in finding the optimal capacity, the point where the combined contribution of bias and variance is lowest.
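
To make the tradeoff concrete, here is a minimal simulation sketch (my own illustration, not from the original post or the reference): polynomial degree plays the role of model capacity, and squared bias and variance are estimated by retraining on many training sets resampled from the same noisy sine curve. It assumes numpy and scikit-learn are available; names like bias_variance and true_fn are just for this example.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def true_fn(x):
    # The "true" relationship we are trying to learn.
    return np.sin(2 * np.pi * x)

x_test = np.linspace(0, 1, 100)

def bias_variance(degree, n_datasets=200, n_samples=30, noise_sd=0.3):
    """Estimate averaged squared bias and variance for a polynomial of a given degree."""
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        # Draw a fresh training set from the same distribution each time.
        x_train = rng.uniform(0, 1, n_samples)
        y_train = true_fn(x_train) + rng.normal(0, noise_sd, n_samples)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(x_train.reshape(-1, 1), y_train)
        preds[i] = model.predict(x_test.reshape(-1, 1))
    mean_pred = preds.mean(axis=0)                        # expected prediction over datasets
    bias2 = np.mean((mean_pred - true_fn(x_test)) ** 2)   # squared bias, averaged over x
    var = np.mean(preds.var(axis=0))                      # variance, averaged over x
    return bias2, var

for degree in (1, 3, 9, 15):
    b2, v = bias_variance(degree)
    print(f"degree={degree:2d}  bias^2={b2:.3f}  variance={v:.3f}  bias^2+var={b2 + v:.3f}")
```

Typically the squared bias shrinks and the variance grows as the degree increases, reproducing the U-shaped total error sketched in the figure above.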

Reference

Zhou, Z.-H. (2016). Machine Learning (in Chinese). Tsinghua University Press, 1st edition.

I am a master’s student in Data Science at the University of San Francisco. I write blog posts to explain data science concepts and issues that I find important or interesting.

Welcome to connect with me on LinkedIn!
