Bias and Variance

May 13 · 3 min read

In this blog I will explain the concepts of bias and variance.

Let’s first get clear about overfitting and underfitting.

A best fit line may pass through each and every point in the training data and yet fail to fit the testing data. This inability of the best fit line to fit the testing data while fitting the training data perfectly is called overfitting. In other words, overfitting is a scenario in which a model performs very well on the training set but poorly on the test set.

In underfitting, the error is high for both the training data and the testing data. In other words, underfitting is a scenario in which a model performs poorly on both the training set and the test set.

BIAS — error on the training data

VARIANCE — error on the testing data

Bias and variance in regression:

Consider three models with polynomial degrees 1, 2, and 3. The model with degree 1 has a straight best fit line, the model with degree 2 has a curved best fit line, and the model with degree 3 has an even curvier line that passes through more of the training points than the other two.

When the degree of polynomial (DOP) is 1, the error is high for both the training and the testing data.

When the DOP is 3, the error is very low for the training data but high for the testing data.

When the DOP is 2, accuracy is high for both the training and the testing data. This means both the variance (testing error) and the bias (training error) are low.
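The comparison above can be sketched with NumPy’s polyfit. The data and random seed here are invented for illustration, and a degree of 9 stands in for the article’s degree 3 so that the overfitting is clearly visible:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: the true relationship is a noisy quadratic,
# so a degree-2 polynomial is the "right" model.
x = rng.uniform(-3, 3, 60)
y = 0.5 * x**2 - x + 1 + rng.normal(0, 0.5, x.size)
x_train, y_train = x[:40], y[:40]
x_test, y_test = x[40:], y[40:]

def errors(degree):
    """Fit a polynomial of the given degree on the training set
    and return (training MSE, testing MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

for degree in (1, 2, 9):
    train, test = errors(degree)
    print(f"degree {degree}: train MSE = {train:.3f}, test MSE = {test:.3f}")
```

Typically this shows degree 1 with both errors high (underfitting), degree 9 with the lowest training error but a worse testing error (overfitting), and degree 2 with both errors low.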

“A model with low bias and low variance is a good model.”

Bias and variance in classification:

In model 1, the testing error (variance) is high but the training error (bias) is low, so this is overfitting.

In model 2, both the testing error (variance) and the training error (bias) are high, so this is underfitting.

In model 3, both the testing error (variance) and the training error (bias) are low, so this is the best model of the three.
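The three classification models can be mimicked with a small k-nearest-neighbour classifier written in NumPy. The two-cluster data, seed, and the choices k = 1, 15, 199 are my own assumptions: k = 1 memorises the training set (model 1, overfitting), k = 199 all but ignores the inputs (model 2, underfitting), and a moderate k behaves like model 3.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: two overlapping 2-D Gaussian clusters, one per class.
n = 200
X = np.vstack([rng.normal(0.0, 1.2, (n, 2)), rng.normal(1.5, 1.2, (n, 2))])
y = np.array([0] * n + [1] * n)

# Shuffle, then split into training and testing halves.
idx = rng.permutation(2 * n)
X, y = X[idx], y[idx]
X_tr, y_tr, X_te, y_te = X[:n], y[:n], X[n:], y[n:]

def knn_error(k, X_eval, y_eval):
    """Error rate of a k-nearest-neighbour classifier that stores the
    training half and votes among the k closest training points."""
    d = np.linalg.norm(X_eval[:, None, :] - X_tr[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    pred = (y_tr[nearest].mean(axis=1) > 0.5).astype(int)
    return float(np.mean(pred != y_eval))

for k in (1, 15, 199):
    print(f"k = {k:3d}: train error = {knn_error(k, X_tr, y_tr):.2f}, "
          f"test error = {knn_error(k, X_te, y_te):.2f}")
```

With k = 1 the training error is exactly zero (every point is its own nearest neighbour) while the testing error is not, the classic overfitting signature; with k = 199 the prediction is essentially the majority class, so both errors are high.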

Representation of bias and variance:

Plotting the degree of polynomial against the error value gives a graph like the one above. As the degree of polynomial increases, the training error keeps reducing, while the testing error reduces up to a certain value and then starts increasing again.

We should select the model which has both low training error (bias) and low testing error (variance). The diagram shows how to select a generalized model with low bias and low variance.
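That selection rule can be sketched as a small validation curve: sweep the degree, record both errors, and pick the degree whose testing error is lowest. The data and the degree range here are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented noisy quadratic data, split into interleaved halves.
x = np.sort(rng.uniform(-3, 3, 80))
y = 0.5 * x**2 - x + 1 + rng.normal(0, 0.5, x.size)
x_tr, y_tr, x_te, y_te = x[::2], y[::2], x[1::2], y[1::2]

degrees = list(range(1, 9))
train_curve, test_curve = [], []
for deg in degrees:
    coeffs = np.polyfit(x_tr, y_tr, deg)
    train_curve.append(np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2))
    test_curve.append(np.mean((np.polyval(coeffs, x_te) - y_te) ** 2))

# Select the generalized model: the degree with the lowest testing error.
best = degrees[int(np.argmin(test_curve))]
print("training errors:", [round(e, 3) for e in train_curve])
print("testing errors: ", [round(e, 3) for e in test_curve])
print("selected degree:", best)
```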

Bias and variance in decision trees and random forests:

A decision tree will overfit the data if we keep splitting until the leaves cannot get any purer. An individual decision tree therefore has low bias and high variance. When many such trees are combined through bootstrap aggregation (bagging), as in a random forest, the high variance is reduced to a lower variance while the bias stays low.

“The bias-variance tradeoff is handled by moving from a single decision tree to a random forest.”

Nerd For Tech

From Confusion to Clarification

NFT is an Educational Media House. Our mission is to bring the invaluable knowledge and experiences of experts from all over the world to the novice. To know more about us, visit https://www.nerdfortech.org/. Don’t forget to check out Ask-NFT, a mentorship ecosystem we’ve started

Written by

sri hari

Student from Coimbatore Institute of Technology, R and D engineer trainee
