UNDERFIT and OVERFIT Explained

Aarthi Kasirajan
5 min read · Jun 8, 2020
This article covers underfitting and overfitting through the lens of bias and variance.

The main aim of any model is to find the best-fit line, the one that satisfies most (if not all) of the data points in the dataset.

In a regression model (our case here), that means finding the line that best describes the given data points.

While searching for this best-fit line, the line does not necessarily have to cover every single point in the dataset. What we ultimately want is a line that, when extrapolated, predicts future data values accurately.

Depending on the precision level, this best-fit line falls into one of 3 types: underfit, good fit, and overfit.

[Figure: graphical representation of underfit, good fit, and overfit]

Informally, the error produced on the training dataset reflects bias, and the error on the testing dataset reflects variance. The aim of any model is to achieve both low bias and low variance.
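As a rough illustration of that framing, here is a minimal sketch using scikit-learn; the sine-shaped data is synthetic, invented just for this example. It fits a simple linear model and compares the error on the training set (our stand-in for bias) against the error on a held-out test set (our stand-in for variance):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data invented for this example: y is a noisy sine curve of x
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# Training error stands in for bias; test error stands in for variance
train_err = mean_squared_error(y_train, model.predict(X_train))
test_err = mean_squared_error(y_test, model.predict(X_test))
print(f"training error: {train_err:.3f}, testing error: {test_err:.3f}")
```

A straight line cannot follow a sine curve, so both errors come out high here, which is exactly the underfit pattern described next.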

Let us take them one by one. In an underfit model, the best-fit line fails to cover many of the data points present. Thus there is a high chance of error on the training dataset itself (high bias), and eventually on the testing dataset as well (high variance).

In an overfit model, the best-fit line covers every single data point. You might think: isn't that exactly what we want? But no. This may be well and good for the training dataset (low bias), but when a testing dataset is fed to the same model, the error on it will be high (high variance).

Only with a good-fit line is the model positioned such that any new point to be predicted is predicted accurately. It retains both low bias and low variance.
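To make the three cases concrete, here is a sketch (again on made-up sine data) that fits polynomials of three different degrees to the same points; degree 1 plays the underfit role, a moderate degree the good fit, and a very high degree the overfit:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(1)
X = np.sort(rng.uniform(0, 6, size=(60, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for degree in (1, 4, 15):  # underfit, good fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree:2d} | "
          f"train MSE {mean_squared_error(y_train, model.predict(X_train)):.3f} | "
          f"test MSE {mean_squared_error(y_test, model.predict(X_test)):.3f}")
```

Typically the degree-15 model drives its training error close to zero while its test error blows up, mirroring the low-bias/high-variance description above.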

[Figure: finding the best fit through a graphical representation of training and testing error vs. model complexity]

The best way to find out when to stop iterating on the model can be explained through a graph. In the graph above, both training and testing error start out high. As complexity increases, i.e. as iterations increase, the training error drops to an insignificant or sufficiently low value. The testing error, on the other hand, reaches a low point and then starts increasing again. The best fit comes at the point where both of these are sufficiently low.
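Here is a sketch of how such a curve might be generated, using polynomial degree as a stand-in for model complexity (the data is again synthetic, invented for the example):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(2)
X = np.sort(rng.uniform(0, 6, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=80)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

degrees = range(1, 16)
train_errs, test_errs = [], []
for degree in degrees:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_errs.append(mean_squared_error(y_train, model.predict(X_train)))
    test_errs.append(mean_squared_error(y_test, model.predict(X_test)))

# Training error keeps falling with complexity; test error dips, then rises.
best_degree = list(degrees)[int(np.argmin(test_errs))]
print(f"degree with lowest test error: {best_degree}")
```

Picking the complexity where the test error bottoms out is the graphical stopping rule this paragraph describes.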

[Figure: collective understanding of bias and variance with respect to underfitting and overfitting]

In the figure above, the underfit model's predictions are far from the actual values: high bias and high variance. The overfit model, by contrast, predicts the training data with a high level of accuracy, but when test data is input, it is unable to predict the values precisely. Only in a best-fit model are both training and testing data predicted accurately.

Let us look at underfitting and overfitting from another angle, with the help of another example…

A machine is trained (supervised learning) to learn what is a ball and what is not. The machine is fed plenty of data in which all kinds of ball images are input to the model. The model then has to learn what characteristics a ball has and how to recognize one. Let us now see what an underfit, best-fit, and overfit version of this model would look like.

In the underfit case, the model would detect a moon or an apple as a ball too, because they are also round in shape. The model has not learned enough to identify the object properly.

Conversely, if the machine is fed over-specific parameters, for instance that a ball is always within a certain diameter and always has lines across its surface, then it would fail to recognize a golf ball or a table-tennis ball, since their diameters are smaller. Such a model is an overfit model.

To understand underfitting and overfitting even better, let us look at one more situation.

This is a typical classification problem in which two classes (in this case, dots and crosses) are to be separated by the best possible line.

In an underfit model, the line is too straight and fails to account for many data points (i.e. high bias and high variance). In an overfit model, the line is so accurate with respect to the training data (low bias) that when the same model is given test data, the chances of the test predictions going wrong are high (i.e. high variance).

But with a good-fit line, the training data is classified with quite high accuracy, and the testing data is also predicted with a fair degree of accuracy (i.e. low bias and low variance).
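The same pattern can be reproduced in a classifier. In the sketch below, the two interleaved classes are synthetic, and a decision tree's maximum depth stands in for how flexible the separating boundary is; this is one illustrative choice, not the only one:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Two interleaved classes (the "dots and crosses") with some noise
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 4, None):  # shallow = underfit, unlimited = overfit
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth} | "
          f"train acc {tree.score(X_train, y_train):.2f} | "
          f"test acc {tree.score(X_test, y_test):.2f}")
```

A depth-1 tree draws a boundary that is "too straight" (underfit), while an unlimited-depth tree carves around every training point and loses accuracy on the test set (overfit).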

To overcome the problem of overfitting, we introduce a penalty term that stops the model from chasing every training point, accepting slightly more training error in exchange for a best-fit line that generalizes a little further.

To do this, we introduce regularization methods, namely Ridge and Lasso regression.
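As a preview of what those look like in code, here is a minimal sketch in scikit-learn; alpha sets the strength of the penalty term, and the values used here are arbitrary choices for illustration (the data is synthetic):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with many features, only two of which matter
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 30))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("plain", LinearRegression()),
                    ("ridge", Ridge(alpha=1.0)),   # L2 penalty shrinks weights
                    ("lasso", Lasso(alpha=0.1))]:  # L1 penalty zeroes some out
    model.fit(X_train, y_train)
    print(f"{name}: train R2 {model.score(X_train, y_train):.2f}, "
          f"test R2 {model.score(X_test, y_test):.2f}")
```

In practice alpha is tuned, for example with cross-validation, rather than fixed by hand.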
