Over-fitting and Under-fitting explained

Shashank Bhagat · Analytics Vidhya · Jun 22, 2020 · 4 min read

In this article, we will discuss Bias, Variance, Under-fitting, Over-fitting, and their trade-off, using both regression and classification problems.

In very simple terms, Bias can be summarized as the error calculated on the training data, and Variance as the error calculated on the test (unseen) data for a machine learning model. The error itself is the difference between the actual value and the model's prediction.

Formula for error calculation: Error = Actual value − Predicted value

The relation between an independent variable X and the dependent variable Y can be written as:

Y = f(X) + c

where c is the error term (the irreducible noise in the data).

We can consider f^(X) as the model derived from f(X) using linear regression or any other modelling technique. The squared error for a data point x in the sample space X is then calculated as (Y − f^(x))².

We will be considering regression models with varying polynomial degrees: when the degree of the polynomial is 1 it is a linear regression, for degree 2 it is a quadratic regression, and so on.
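As a quick illustration in code, here is a minimal sketch, assuming NumPy and a small made-up training set, of fitting polynomials of degree 1, 2 and 3 to the same points:

```python
import numpy as np

# Hypothetical training data, roughly quadratic with a little noise
x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_train = np.array([1.2, 3.9, 9.1, 15.8, 25.3, 36.2])

# Fit a least-squares polynomial of each degree to the same training points
models = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    models[degree] = np.poly1d(coeffs)
    print(f"degree={degree} coefficients: {np.round(coeffs, 3)}")
```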

Let’s consider a few data points on the graph.

Training data points plotted to show over-fitting and under-fitting

With training data, we would be calculating the Bias.

As per the figure for polynomial degree = 1, we can see that a straight line runs through the training data points. The summed error, i.e. the difference between the points and the line, would be high in this case, so we can term it High Bias.

For the figure with polynomial degree = 2, we can see that a curve fits very close to the data points. In this case we would have a considerably lower error value, which can be termed Low Bias.

For the figure with polynomial degree = 3, the curve fits the training data points perfectly. So we would have a very low error value, and this would also be termed Low Bias.
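To make this concrete, a small sketch (same made-up data as above) that measures the training error for each degree, the quantity used here as a proxy for Bias:

```python
import numpy as np

x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_train = np.array([1.2, 3.9, 9.1, 15.8, 25.3, 36.2])

for degree in (1, 2, 3):
    model = np.poly1d(np.polyfit(x_train, y_train, deg=degree))
    # Mean squared error on the training points: higher degrees hug the
    # points more closely, so this error (the "Bias" above) keeps dropping.
    train_mse = np.mean((y_train - model(x_train)) ** 2)
    print(f"degree={degree}: training MSE = {train_mse:.3f}")
```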

Training and Test data points plotted to show over-fitting and under-fitting

In the above figure, we have added test data points. With test data, we would be calculating the Variance.

In the figure with degree = 1, we can see that, just as with the training data, the line does not fit the test data either. In this case the error value would be high, which can be termed High Variance.

In the figure with degree = 2, the curve fits the test data quite well. The error value would be low, so this would be Low Variance as well.

The figure with degree = 3 worked perfectly on the training data, but it does not fit the test data well, so the error value for the test data would be high. This would be termed High Variance.
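The same comparison on held-out points shows this pattern. A hedged sketch, with data generated from a made-up quadratic function plus noise (the exact numbers vary with the noise draw, but the highest-degree model typically looks best on training data and worst on test data):

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return x ** 2  # made-up underlying relationship

# A small, noisy training set and separate test points
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = true_fn(x_train) + rng.normal(scale=2.0, size=x_train.shape)
x_test = np.array([1.5, 2.5, 3.5])
y_test = true_fn(x_test) + rng.normal(scale=2.0, size=x_test.shape)

for degree in (1, 2, 3):
    model = np.poly1d(np.polyfit(x_train, y_train, deg=degree))
    train_mse = np.mean((y_train - model(x_train)) ** 2)
    test_mse = np.mean((y_test - model(x_test)) ** 2)
    # Under-fit: both errors high. Good fit: both low.
    # Over-fit: training error near zero, test error high again.
    print(f"degree={degree}: train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")
```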

To summarize the above explanation: a model with high Bias and high Variance suffers from Under-fitting, a model with low Bias and high Variance suffers from Over-fitting, and a good model always has both low Bias and low Variance.

This is also called the Bias-Variance trade-off.

From a classification model perspective, let us similarly consider three different models and compare their error percentages on the training and test data.
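As a hedged sketch of that comparison, assuming scikit-learn and a synthetic dataset, one could look at the training and test error percentages of three classifiers of increasing complexity:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, purely illustrative
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Three models of increasing complexity: too simple, reasonable, too complex
for name, depth in [("under-fit", 1), ("good fit", 4), ("over-fit", None)]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    clf.fit(X_train, y_train)
    train_err = 100 * (1 - clf.score(X_train, y_train))
    test_err = 100 * (1 - clf.score(X_test, y_test))
    print(f"{name:>9}: train error = {train_err:.1f}%, test error = {test_err:.1f}%")
```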

The following graph, plotting the degree of the polynomial against the error, also gives a clear picture of how to arrive at a generalized model.

Degree of polynomial vs Error
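A sketch of how such a graph could be produced, using the same kind of made-up polynomial data as above and assuming matplotlib is available:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

def true_fn(x):
    return x ** 2  # made-up underlying relationship

x_train = np.linspace(0, 6, 15)
y_train = true_fn(x_train) + rng.normal(scale=3.0, size=x_train.shape)
x_test = np.linspace(0.2, 5.8, 10)
y_test = true_fn(x_test) + rng.normal(scale=3.0, size=x_test.shape)

degrees = range(1, 7)
train_err, test_err = [], []
for d in degrees:
    model = np.poly1d(np.polyfit(x_train, y_train, deg=d))
    train_err.append(np.mean((y_train - model(x_train)) ** 2))
    test_err.append(np.mean((y_test - model(x_test)) ** 2))

# Training error keeps falling with degree; test error falls, then rises again
plt.plot(degrees, train_err, marker="o", label="Training error (Bias)")
plt.plot(degrees, test_err, marker="o", label="Test error (Variance)")
plt.xlabel("Degree of polynomial")
plt.ylabel("Error (MSE)")
plt.legend()
plt.show()
```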

Every generalized model with low Bias and low Variance has its total error calculated as per the following formula:

Error = Bias² + Variance + Irreducible Error

Here the total error is formed by the Bias, the Variance, and the Irreducible Error, which is also termed the noise in the data set.
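For completeness, a hedged sketch of how this decomposition can be checked numerically: repeatedly re-sample a training set, refit a deliberately simple model, and compare bias² + variance + noise against the average squared error at one fixed point (the data-generating function and noise level are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return x ** 2  # made-up underlying relationship

noise_sd = 2.0   # irreducible noise in the data
x0 = 3.5         # point at which the error is decomposed
n_repeats = 2000

predictions, squared_errors = [], []
for _ in range(n_repeats):
    # Fresh training sample on every round
    x_train = rng.uniform(0, 6, size=20)
    y_train = true_fn(x_train) + rng.normal(scale=noise_sd, size=20)
    model = np.poly1d(np.polyfit(x_train, y_train, deg=1))  # deliberately too simple

    pred = model(x0)
    y0 = true_fn(x0) + rng.normal(scale=noise_sd)  # fresh test observation at x0
    predictions.append(pred)
    squared_errors.append((y0 - pred) ** 2)

predictions = np.array(predictions)
bias_sq = (predictions.mean() - true_fn(x0)) ** 2
variance = predictions.var()
irreducible = noise_sd ** 2

# The two printed numbers should roughly agree
print(f"bias^2 + variance + noise = {bias_sq + variance + irreducible:.2f}")
print(f"average squared error     = {np.mean(squared_errors):.2f}")
```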

I hope this gives you an insight into one of the most fundamental concepts behind machine learning models. Thank you for reading!

Shashank Bhagat is a Software Developer and a Machine Learning Enthusiast writing for Analytics Vidhya.