The Bias-Variance Tradeoff

Dr. Roi Yehoshua · AI Made Simple · Feb 27, 2023

The bias-variance tradeoff is an important concept in machine learning. It describes the tension between a model’s ability to reduce its error on the training set (its bias) and its ability to generalize well to new, unseen examples (its variance).

In general, as we make our model more complex (e.g., by adding more nodes to a decision tree), its bias decreases, since the model adapts to the specific patterns and peculiarities of the training set (learning the training examples “by heart”). As a consequence, the model loses its ability to generalize and provide good predictions on the test set, i.e., its variance increases.
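To make this concrete, here is a minimal sketch (assuming scikit-learn and a synthetic noisy sine-wave dataset, both illustrative choices rather than anything from the article itself) that fits decision tree regressors of increasing depth and prints their training and test errors. As the depth grows, the training error keeps shrinking while the test error eventually starts to rise, which is the tradeoff in action.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: a noisy sine wave (an assumption made purely for illustration)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for depth in [1, 2, 4, 8, 16]:
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, tree.predict(X_train))
    test_mse = mean_squared_error(y_test, tree.predict(X_test))
    print(f"max_depth={depth:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```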

Formal Analysis

The errors in a model’s predictions can be decomposed into three components:

  1. Intrinsic noise in the data itself. This noise may arise for various reasons, such as internal noise in the physical devices that generated our measurements, or errors made by the humans who entered the data into our databases.
  2. The bias of the model, which represents the difference between the model’s average prediction (across different training sets) and the true labels of the data.
  3. The variance of the model, which represents how much the model’s predictions vary across different training sets.

In the next sections, we are going to prove the following statement:

Expected Prediction Error = Bias² + Variance + Noise
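Before the formal proof, the decomposition can also be checked empirically. Below is a hedged simulation sketch (the true function sin(x), the noise level, and the decision tree model are all assumptions for illustration, not part of the proof): we repeatedly draw training sets from a known data-generating process, fit a model on each, and compare the average squared error at a fixed test point with the sum of the squared bias, the variance, and the noise.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
f = np.sin                  # assumed true function
noise_std = 0.3             # assumed intrinsic noise level
x0 = np.array([[1.0]])      # fixed test point
n_train, n_repeats = 50, 2000

preds = np.empty(n_repeats)       # prediction at x0 for each training set
sq_errors = np.empty(n_repeats)   # squared error against a fresh noisy label at x0
for i in range(n_repeats):
    X = rng.uniform(-3, 3, size=(n_train, 1))
    y = f(X).ravel() + rng.normal(scale=noise_std, size=n_train)
    model = DecisionTreeRegressor(max_depth=3).fit(X, y)
    preds[i] = model.predict(x0)[0]
    y0 = f(x0)[0, 0] + rng.normal(scale=noise_std)
    sq_errors[i] = (y0 - preds[i]) ** 2

bias_sq = (preds.mean() - f(x0)[0, 0]) ** 2   # squared bias at x0
variance = preds.var()                        # variance of predictions at x0
noise = noise_std ** 2                        # irreducible noise
print(f"average squared error    : {sq_errors.mean():.3f}")
print(f"bias² + variance + noise : {bias_sq + variance + noise:.3f}")
```

With enough repetitions, the two printed numbers should agree up to simulation error, which is exactly what the statement above asserts.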
