Bias and Variance Tradeoff

Amol Chavan
Published in Analytics Vidhya
3 min read · Jun 26, 2020

When it comes to the accuracy of model predictions, one must look at the prediction errors. Data scientists have long shown that there is always a tradeoff between bias and variance: no model can achieve both the lowest possible bias and the lowest possible variance at the same time. Understanding these errors helps us avoid the mistakes of underfitting and overfitting, and thus build meaningful models.

What are Bias and Variance?

Bias and variance are prediction errors. They can also be described as the difference, or gap, between the actual values and our predictions.

Bias

Bias is the difference between the average prediction of our model and the actual value (which we are trying to predict). Below are the inferences we can draw from bias.

o High Bias: There is a large gap between the average of the predicted values and the actual value, so our model is paying too little attention to the given data.

o Low Bias: The gap between the average of the predicted values and the actual value is minimal, so we can say that our model is performing well on the given data.

Okay, think of it this way:

People with biased opinions are more likely to make wrong assumptions, and their predictions will therefore be inaccurate. Imagine a highly biased politician and how their decisions could put an entire nation in danger.
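To make this concrete, here is a minimal Python sketch that measures bias as the difference between the average prediction and the average actual value; the numbers are made up purely for illustration:

```python
import numpy as np

# Hypothetical actual values and our model's predictions
y_actual = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
y_pred = np.array([11.0, 11.5, 13.0, 15.0, 16.5])

# Bias: difference between the average prediction and the average actual value
bias = np.mean(y_pred) - np.mean(y_actual)
print(f"Bias: {bias:.2f}")  # -0.60 here: on average, the model predicts too low
```

A value near zero means low bias; a large magnitude means the model is systematically off in one direction.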

Variance

Variance is the variability of our model's prediction for a given data point, or the change in prediction accuracy of an ML model between training data and test data.

If my model has a prediction accuracy (A) on the training data and a prediction accuracy (B) on the test data, then my variance would be (A - B).

o High Variance: The model is paying too much attention to the training data and does not perform well on the test data (which it has not seen before).

o Low Variance: The difference between the prediction accuracies on the training and test data is small, so we can consider this model for prediction.
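Here is a minimal scikit-learn sketch of this (A - B) gap; the synthetic dataset and the choice of a decision tree are just assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy dataset (hypothetical) and a fully grown tree, which tends to overfit
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
A = model.score(X_train, y_train)  # accuracy on training data
B = model.score(X_test, y_test)    # accuracy on test data (unseen)
print(f"A = {A:.2f}, B = {B:.2f}, variance (A - B) = {A - B:.2f}")
```

A fully grown decision tree usually scores close to 100% on the training data, so the (A - B) gap makes the high-variance behaviour easy to see.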

Underfitting and Overfitting

Image source: https://tutorialspoint.dev/

Graph-1: The gap between the data points and the prediction line is considerably high, indicating high bias. Few or none of the data points lie on the prediction line. Hence this is an example of underfitting.

Graph-3: Here we can see that the prediction line passes through every data point. This usually happens when we have high variance. The model has paid too much attention to the training data, so there is a high chance of large errors on unseen data. Since all the data points lie on the prediction line, this is an example of overfitting.

Graph-2: This graph shows that the majority of data points are near the prediction line, so the error is minimized and the bias is low. This can be our ideal model.
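We can reproduce the spirit of these three graphs in code. The sketch below uses a toy sine-wave dataset (an assumption for illustration) and fits polynomials of increasing degree: degree 1 underfits like Graph-1, degree 4 behaves like Graph-2, and degree 15 overfits like Graph-3:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 30))[:, None]   # 30 noisy training points
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)
X_test = np.linspace(0, 1, 100)[:, None]      # unseen test points
y_test = np.sin(2 * np.pi * X_test).ravel()

for degree in (1, 4, 15):  # underfit, balanced, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_mse = mean_squared_error(y, model.predict(X))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

The degree-1 line has high error on both sets (high bias), while the degree-15 curve has near-zero training error but a much larger test error (high variance).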

Why is There a Tradeoff between Bias and Variance?

If my algorithm or model is very simple and has only a few parameters, then I am likely to get high bias and low variance, which results in underfitting. However, if my model is complex and has a large number of parameters, then it will have high variance and low bias, which puts my model in an overfitting condition. Either extreme leaves my model making poor predictions.

So to avoid either of these situations, I must look for an optimal balance between bias and variance. The model should be neither too complex nor too simple.
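One common way to search for that balance (a sketch, not something from the article) is to sweep the model complexity and keep the setting with the lowest cross-validated error; here I reuse the toy sine-wave data from the previous sketch:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 30))[:, None]
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)

# Too low a degree -> high bias (underfit); too high -> high variance (overfit).
# Cross-validation scores each degree on data the fit has not seen.
cv_mse = {}
for degree in range(1, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()

best = min(cv_mse, key=cv_mse.get)
print(f"Degree with lowest cross-validated MSE: {best}")
```

The winning degree is the one that balances the two errors, which is exactly the sweet spot the tradeoff curve describes.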

Image source: Andrew Ng's course

Just have a look at the image above; it explains the tradeoff: as bias increases, variance decreases, and vice versa.

This is how knowing and understanding prediction errors plays an important role in Machine Learning.

Thanks for reading, and happy learning!
