Bias, Variance, Irreducible Error and the Model Complexity Trade-off

Sanjay Singh
Nov 21, 2019 · 4 min read

Supervised Machine Learning can be summarized as shown below.

Training data is fed to the algorithm, which derives the target function. Test data is then fed into this target function to get predictions.

As an example, the simple linear regression algorithm follows the equation h(X) = b0 + b1X1.

Fig 1: Supervised Learning Model Summary

The algorithm uses the training data to derive the coefficients b0 and b1. These coefficient values are plugged into the formula to produce the target function.

After the coefficients are derived and the target function is formulated, the test data is passed to the target function to get predictions.
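To make the flow concrete, here is a minimal sketch in Python using scikit-learn. The data values are made up purely for illustration (they are not taken from the figures below):

```python
# Minimal fit-then-predict sketch for simple linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: one feature X1 and target y
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

model = LinearRegression()
model.fit(X_train, y_train)  # the algorithm derives b0 and b1 here

b0, b1 = model.intercept_, model.coef_[0]
print(f"Target function: h(X) = {b0:.2f} + {b1:.2f} * X1")

# Test data is passed through the derived target function to get predictions
X_test = np.array([[6.0]])
print("Prediction:", model.predict(X_test))
```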

The graph below is a representation of simple linear regression. Blue stars represent training data and the red star represents test/validation data.

Fig 2: Simple Linear Regression

The trend line in red shows the target function's value for each feature value X1. It is just a straight line calculated to minimize the loss, measured with the root mean square error (RMSE) equation.
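For reference, RMSE is simply the square root of the mean squared difference between actual and predicted values. A minimal sketch, with illustrative numbers:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean square error: sqrt of the average squared residual
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

print(rmse([2.1, 4.0, 6.2], [2.0, 4.1, 6.0]))  # made-up values
```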

This is a simple algorithm with various assumptions. The biggest assumption is that the training data follows a straight line. While following this assumption, the algorithm might not account for some of the data points (e.g., the two blue stars in the top left) and treat them as noise or outliers. Such assumptions are called Bias.

Such assumptions keep the algorithm simple, and a generalization (the straight line) is found easily. Because it has not considered a few of the data points, such cases are called Underfitting.

Fig 3: Underfitting

How to reduce Bias?

How about considering each data point and not assuming they follow a straight-line trend?

This makes the algorithm very complex and results in something like the figure below. The problem with this approach is that there is no generalization. The algorithm has considered each and every data point and tried to match predicted values with the actual Y values. In doing so it has learned too much and cannot generalize. Such cases are called Overfitting.

That means when it is given the test data (red star) it will not know what to do, because it doesn't have any generalized trend to follow. This is called Variance.

Fig 4: Overfitting
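To see both extremes side by side, here is a small sketch on hypothetical noisy data (not the figures above). A straight line (degree 1) corresponds to high Bias as in Fig 3, while a high-degree polynomial corresponds to high Variance as in Fig 4: it typically achieves a much lower training error, but the gap between its training and test errors is much larger.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_curve(x):
    # Hypothetical "true" relationship plus random noise
    return np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

X_train = np.linspace(0, 1, 15)
y_train = noisy_curve(X_train)
X_test = np.linspace(0.03, 0.97, 10)
y_test = noisy_curve(X_test)

# Degree 1 = straight line (underfits); degree 9 = wiggles through
# the training points (overfits).
for degree in (1, 9):
    coeffs = np.polyfit(X_train, y_train, degree)
    for name, X, y in (("train", X_train, y_train), ("test", X_test, y_test)):
        err = np.sqrt(np.mean((y - np.polyval(coeffs, X)) ** 2))
        print(f"degree {degree} {name} RMSE = {err:.3f}")
```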

Now if we redraw the summary shown in Fig 1, it looks like Fig 5.

Fig 5: Supervised Learning Summary

Oh yes, we did not talk about Irreducible Error. These kinds of errors are introduced at the data source level. Suppose one of the data sources is an IoT device that is not working well. It might send data with a lot of noise. Such errors are called Irreducible Errors.
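A tiny sketch of the idea, with a made-up faulty-sensor example: even a model that recovers the true signal perfectly is still left with the sensor noise, which no amount of modeling can remove.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
true_signal = 3.0 + 0.5 * x                          # what the sensor should report
sensor_noise = rng.normal(scale=1.5, size=x.size)    # noise from the faulty device
y_observed = true_signal + sensor_noise

# Even a perfect model (predicting the true signal exactly) keeps this error:
best_possible_rmse = np.sqrt(np.mean((y_observed - true_signal) ** 2))
print(f"Error no model can remove: RMSE = {best_possible_rmse:.2f}")
```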

Below is a summary of the different kinds of errors.

Fig 6: Different kinds of errors
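These three sources are often summarized by the well-known decomposition of expected prediction error: Error = Bias² + Variance + Irreducible Error. Bias and Variance depend on the model we choose; the Irreducible Error does not.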

Based on what we have seen so far, it looks like:

  1. Simple algorithms have high Bias and low Variance
  2. Complex algorithms have low Bias and high Variance

If we were to plot this, it would look something like Fig 7.

Fig 7: Algorithm Complexity, Bias and Variance relationship

Simple algorithms like Linear Regression and Logistic Regression have high Bias but low Variance.

Complex algorithms like Decision Trees, KNN and SVM have low Bias but high Variance.
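As a rough illustration of this contrast (the dataset and settings here are made up), compare a linear model with an unrestricted decision tree on the same noisy data. The linear model typically shows similar, fairly high training and test errors (high Bias), while the tree drives its training error to nearly zero yet its test error stays well above that (high Variance).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)  # non-linear signal + noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("Linear Regression", LinearRegression()),
                    ("Decision Tree", DecisionTreeRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    train_rmse = mean_squared_error(y_tr, model.predict(X_tr)) ** 0.5
    test_rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: train RMSE = {train_rmse:.2f}, test RMSE = {test_rmse:.2f}")
```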

How do we trade off Bias and Variance? How do we get the best of both worlds (simple algorithms and complex algorithms)?

That's for the next article!

