Evaluating a model for overfitting and underfitting

Of course, if there is more than one independent variable, it is difficult to visualize underfitting or overfitting directly. In that case, it can be inferred by plotting the "Learning Curve".

A Learning Curve is the plot of Jtest and Jtrain against the training set size. If there is high bias, there will be very little difference between the Jtest and Jtrain curves. If there is high variance, there will be a big difference between the Jtest and Jtrain curves.
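The idea above can be sketched in a few lines of code. This is a minimal illustration, not the article's own data or code: it fits a straight line (a high-bias hypothesis) to synthetic quadratic data and prints the training and held-out costs for growing training set sizes, which is exactly what a learning curve plots.

```python
# Sketch of a learning curve, assuming a squared-error cost.
# All data and model choices here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic nonlinear data: y = x^2 + noise
x = rng.uniform(-3, 3, 200)
y = x**2 + rng.normal(0, 1, 200)

# Hold out a cross-validation set
x_train, y_train = x[:150], y[:150]
x_cv, y_cv = x[150:], y[150:]

def cost(theta, xs, ys):
    """Mean squared error J for a polynomial hypothesis theta."""
    return np.mean((np.polyval(theta, xs) - ys) ** 2) / 2

degree = 1  # a straight line underfits the quadratic data (high bias)
for m in (10, 50, 100, 150):
    theta = np.polyfit(x_train[:m], y_train[:m], degree)
    j_train = cost(theta, x_train[:m], y_train[:m])
    j_cv = cost(theta, x_cv, y_cv)
    print(f"m={m:3d}  J_train={j_train:6.2f}  J_cv={j_cv:6.2f}")
```

As m grows, the two costs settle close to each other, which is the high-bias signature the following plots illustrate.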

This is the plot of the training set vs. the hypothesis. The plot shows high bias. Training set size: 1553.

High-bias hypothesis.
Different plots of Jtest, Jcv, and Jtrain vs. the number of training examples.

In all three cases, it is obvious that the placement of Jtest, Jcv, and Jtrain depends entirely on the random selection of the feature matrix. We never know how these will fare against each other; it all depends on the nature of the independent variables (features). But one thing will be constant: Jtest, Jcv, and Jtrain will be close to each other in a high-bias case.

Following is another hypothesis plot with high bias. Training set size: 12.

And following are the Jtest and Jtrain plots for the above data set —

All the above cases are of high bias. Always note that the magnitude of the cost values might be misleading. Focus only on the relative distance between Jtest and Jtrain: when they stay close to each other, that indicates high bias.
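Why can the magnitude mislead? One reason is that the cost depends on the scale of the target variable. The hypothetical snippet below (synthetic data, not the article's) fits the same underfitting line to the same data measured in two different units; the absolute costs differ by a factor of a million, but the gap relative to Jtrain is identical.

```python
# Illustrative sketch: cost magnitudes depend on the scale of y,
# but the relative gap between J_train and J_cv does not.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 100)
y = x**2 + rng.normal(0, 1, 100)

def fit_and_costs(y_scaled):
    x_tr, y_tr = x[:70], y_scaled[:70]
    x_hold, y_hold = x[70:], y_scaled[70:]
    theta = np.polyfit(x_tr, y_tr, 1)  # an underfitting straight line
    def j(xs, ys):
        return np.mean((np.polyval(theta, xs) - ys) ** 2) / 2
    return j(x_tr, y_tr), j(x_hold, y_hold)

for scale in (1.0, 1000.0):  # e.g. the same quantity in different units
    jt, jc = fit_and_costs(y * scale)
    print(f"scale={scale:7.1f}  J_train={jt:14.2f}  J_cv={jc:14.2f}  "
          f"gap/J_train={(jc - jt) / jt:+.3f}")
```

The `gap/J_train` ratio is unchanged by the rescaling, which is why the relative distance, not the raw cost, is the thing to read off the curve.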

Here is another example with a training set size of 50. This also shows high bias.

The plot of the cost functions for the above set is as follows —

Now, consider a case of high variance, where we use a polynomial of degree 8.

Plot of 8th degree polynomial

The learning curve is as follows. The difference between Jcv and Jtrain has an increasing trend; it is not converging. This big difference between Jcv and Jtrain is a sign of high variance.

Learning Curve for 8th degree polynomial
