How to make peace between Bias and Variance?

Are Bias and Variance constantly arguing and destroying your model? Read on to learn how to balance both…

Aishwarya Nair
Published in DataDrivenInvestor
4 min read · Feb 23, 2019

This post gives a detailed description of all the points mentioned in the InShort story. If you just want to revise or are short on time, you can view the series here: https://medium.com/series/eca8ea2d6653

What is Bias?

Bias refers to erroneous assumptions that the model makes about the data. High Bias, or Underfitting, means that the model is not able to capture the trend or pattern in the data. It is usually caused by a hypothesis function that is too simple or has too few features.

How to identify High Bias?

A model with High Bias performs poorly on both the training set and the test set because it is unable to identify patterns in the data. Evaluation metrics like accuracy or F1 score are very low for such models, because the difference between the predicted and actual values is large.
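Here is a minimal sketch of that symptom. Everything in it is an assumption made for illustration: the data is a made-up noisy sine wave, the "too simple" model is a straight line, and mean squared error (MSE, where lower is better) stands in for accuracy or F1 because this is a regression toy.

```python
import numpy as np

rng = np.random.default_rng(0)
NOISE_SD = 0.1  # assumed noise level of the synthetic data


def make_data(n):
    """Draw n points from a noisy sine wave (an illustrative, made-up dataset)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, NOISE_SD, n)
    return x, y


def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


x_train, y_train = make_data(200)
x_test, y_test = make_data(200)

# A straight line (degree 1) is too simple to follow a sine wave.
line = np.polyfit(x_train, y_train, deg=1)
train_mse = mse(line, x_train, y_train)
test_mse = mse(line, x_test, y_test)

print(f"noise floor (variance of the noise): {NOISE_SD ** 2:.3f}")
print(f"train MSE: {train_mse:.3f}")
print(f"test MSE:  {test_mse:.3f}")
```

If both errors come out far above the noise floor and close to each other, that is the High Bias pattern: the model is bad everywhere, not only on unseen data.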

How to Fix High Bias?

We can add more features or perform feature engineering to add more meaningful factors to the data. This can help the model understand the data better. Increasing the degree of the polynomial in the hypothesis function can also help combat high bias: models with High Bias are too simple, and a higher degree increases the complexity, thereby reducing Bias. But you can only increase the complexity up to a certain point, because after that point the cross-validation error starts increasing (read ahead to know why). You can also try decreasing the alpha parameter of Regularization, which weakens the penalty on the model's weights (https://medium.com/@aishanair21/the-art-of-regularization-caca8de7614e).
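To see the polynomial-degree fix in action, here is a small sketch. The sine-wave data, the sample sizes and the degrees 1, 3 and 5 are all assumptions picked for the demo, and a single held-out validation set is used to keep it short.

```python
import numpy as np

rng = np.random.default_rng(1)


def make_data(n, noise_sd=0.1):
    """Noisy sine wave; a made-up dataset used only for illustration."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_sd, n)
    return x, y


x_train, y_train = make_data(200)
x_val, y_val = make_data(200)

val_mse = {}
for degree in (1, 3, 5):
    # Each extra polynomial term is effectively one more feature for the model.
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    val_mse[degree] = float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))
    print(f"degree {degree}: train MSE {train_err:.3f}, "
          f"validation MSE {val_mse[degree]:.3f}")
```

On data like this, the straight line should have the largest error on both sets, and the error should drop once the model has enough curvature to follow the wave.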

What is Variance?

Variance refers to how sensitive the model is to the particular training data it sees, that is, how much its predictions would change if it were trained on a different sample. High variance, or Overfitting, means that the model fits the available data but does not generalise well to new data. It is usually caused by a hypothesis function that is too complex and tries to fit every point in the training set exactly, producing a lot of unnecessary curves and angles that reflect noise rather than the underlying pattern.

How to identify High Variance?

A model with High variance performs very well on the training set but poorly on the testing or cross-validation set. It is unable to generalise and performs poorly on any data set it has not seen before. Hence, the training accuracy will be high and the test accuracy will be low.
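The gap between training and test performance can be reproduced with a toy example. The setup below is assumed purely for illustration: a degree-9 polynomial is fitted to only 12 noisy points from a sine wave, and MSE (lower is better) is used in place of accuracy.

```python
import numpy as np

rng = np.random.default_rng(2)
NOISE_SD = 0.3  # assumed noise level of the synthetic data


def make_data(n):
    """Noisy sine wave; a made-up dataset used only for illustration."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, NOISE_SD, n)
    return x, y


x_train, y_train = make_data(12)   # deliberately tiny training set
x_test, y_test = make_data(500)

# Ten coefficients for twelve points: flexible enough to chase the noise.
coeffs = np.polyfit(x_train, y_train, deg=9)
train_mse = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
test_mse = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))

print(f"train MSE: {train_mse:.4f}")
print(f"test MSE:  {test_mse:.4f}")
print(f"gap (test - train): {test_mse - train_mse:.4f}")
```

A training error well below the test error is the High variance signature: the model has memorised its training points instead of learning the pattern.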

How to Fix High Variance?

You can reduce High variance by reducing the number of features in the model. There are several methods available to check which features are important and which don't add much value to the model. Increasing the size of the training set can also help the model generalise. Decreasing the degree of the polynomial can help decrease the model complexity and fix the problem of high variance. Regularization is a popular method used to overcome the problem of Overfitting (https://medium.com/@aishanair21/the-art-of-regularization-caca8de7614e).
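Two of these fixes, more training data and Regularization, can be sketched on a toy problem. The assumptions here are mine: the data is a made-up noisy sine wave, the overfitting model is a degree-9 polynomial, the regularizer is a plain L2 (ridge) penalty, and alpha = 1e-3 is an arbitrary value chosen for the demo, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(3)
DEGREE = 9  # assumed, deliberately too flexible for 12 points


def make_data(n, noise_sd=0.3):
    """Noisy sine wave; a made-up dataset used only for illustration."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_sd, n)
    return x, y


def design(x):
    """Polynomial feature matrix with columns 1, x, ..., x^DEGREE."""
    return np.vander(x, DEGREE + 1, increasing=True)


def fit_ridge(x, y, alpha):
    """L2-regularised least squares; alpha=0 gives the plain fit.

    Solved as an augmented least-squares problem, which avoids
    forming X^T X explicitly.
    """
    X = design(x)
    n_coef = X.shape[1]
    A = np.vstack([X, np.sqrt(alpha) * np.eye(n_coef)])
    b = np.concatenate([y, np.zeros(n_coef)])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w


def mse_on(w, x, y):
    """Mean squared error of weights w on (x, y)."""
    return float(np.mean((design(x) @ w - y) ** 2))


x_small, y_small = make_data(12)
x_large, y_large = make_data(500)
x_test, y_test = make_data(500)

w_overfit = fit_ridge(x_small, y_small, alpha=0.0)
w_more_data = fit_ridge(x_large, y_large, alpha=0.0)
w_ridge = fit_ridge(x_small, y_small, alpha=1e-3)

mse_overfit = mse_on(w_overfit, x_test, y_test)
mse_more_data = mse_on(w_more_data, x_test, y_test)
mse_ridge = mse_on(w_ridge, x_test, y_test)

print(f"12 points, no penalty:  test MSE {mse_overfit:.3f}")
print(f"500 points, no penalty: test MSE {mse_more_data:.3f}")
print(f"12 points, ridge:       test MSE {mse_ridge:.3f}")
print(f"weight norm without / with ridge: "
      f"{np.linalg.norm(w_overfit):.1f} / {np.linalg.norm(w_ridge):.1f}")
```

More data pins the flexible model down, while the ridge penalty shrinks the weights so the curve cannot swing as wildly. How much ridge helps on the test set depends on alpha, which is usually tuned with cross-validation.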

How to maintain a balance of Bias and Variance?

Bias and Variance pull in opposite directions: making the model simpler increases the bias but decreases the variance, whereas making it more complex decreases the bias but increases the variance. How can we find the optimum point for a good model?

Photo credits: ebc (http://www.ebc.cat/author/eduard-bonadagmail-com/)

As shown in the above figure, there exists a point where the cross-validation error starts going upwards, because the variance is now increasing faster than the bias is decreasing. This is the point where the model should stop increasing its complexity and use the parameters defined by that point on the curve. In the figure, this is close to where the Bias and Variance curves intersect, but in general the optimal model complexity is wherever the total error is lowest, which need not be exactly at the intersection. At this point, the model has the best available trade-off between bias and variance, without underfitting or overfitting.
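One common way to locate that point in practice is to measure the cross-validation error at several complexity levels and keep the lowest. The sketch below assumes a made-up noisy sine-wave dataset, polynomial degree as the complexity knob, and a hand-rolled 5-fold split; the helper name `cv_error` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

# Made-up dataset: 60 noisy samples of a sine wave.
x = rng.uniform(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 60)


def cv_error(x, y, degree, k=5):
    """Average validation MSE of a polynomial of the given degree over k folds."""
    indices = np.arange(len(x))
    errors = []
    for fold in np.array_split(indices, k):
        train = np.setdiff1d(indices, fold)  # everything outside this fold
        coeffs = np.polyfit(x[train], y[train], deg=degree)
        errors.append(np.mean((np.polyval(coeffs, x[fold]) - y[fold]) ** 2))
    return float(np.mean(errors))


cv_errors = {degree: cv_error(x, y, degree) for degree in range(1, 11)}
best_degree = min(cv_errors, key=cv_errors.get)

for degree, err in cv_errors.items():
    marker = "  <- lowest" if degree == best_degree else ""
    print(f"degree {degree:2d}: CV MSE {err:.3f}{marker}")
```

The printed column is a numeric version of the cross-validation curve: error falls while bias is being removed, then creeps back up as variance takes over. The chosen degree depends on the data and the random split, so treat it as an estimate rather than an exact answer.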

If you liked my article and are looking for more such posts on Data Science in layman's terms with minimal Math, please clap or follow me on Medium. If you have queries, you can connect with me on LinkedIn (https://www.linkedin.com/in/aishwarya-nair-21091994/). Thanks for reading all the way here, and stay tuned for more!

🦄 Data Scientist @trivago. Logical, Rational and Analytical. An ENTJ woman who tries to conquer the world one data point at a time …