It has always been hard for me to pin down what these terms really represent. Whenever I look them up, I come across the same one-line definition: high bias means underfitting and high variance means overfitting. But it's worth understanding bias and variance properly, because they are an important concept when deciding which algorithm to use for a problem.
Before discussing the Bias-Variance Tradeoff, we need the basic concepts behind it. So let's start with the basics.
What is Bias?
Bias is the simplifying assumptions made by a model to make the target function easier to learn.
- Low Bias: The model makes fewer assumptions about the form of the target function.
- High Bias: The model makes more assumptions about the form of the target function.
Examples of low-bias machine learning algorithms include Decision Trees, k-Nearest Neighbors and Support Vector Machines.
Examples of high-bias machine learning algorithms include Linear Regression, Linear Discriminant Analysis, and Logistic Regression.
What is Variance?
Variance is the amount that the estimate of the target function will change if different training data were used.
- Low Variance: Small changes to the estimate of the target function with changes to the training dataset.
- High Variance: Large changes to the estimate of the target function with changes to the training dataset.
Examples of low-variance machine learning algorithms include Linear Regression, Linear Discriminant Analysis, and Logistic Regression.
Examples of high-variance machine learning algorithms include Decision Trees, k-Nearest Neighbors and Support Vector Machines.
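We can see this difference directly with a small simulation. The sketch below (a minimal pure-Python example; the sine target function, noise level, and sample sizes are all arbitrary choices of mine) trains a low-variance model (simple linear regression) and a high-variance model (1-nearest-neighbor, standing in for the k-Nearest Neighbors family) on many different random training sets, and measures how much each model's prediction at one fixed point jumps around:

```python
import random
import math
import statistics

random.seed(0)

def true_fn(x):
    # The "unknown" target function we are trying to estimate.
    return math.sin(2 * math.pi * x)

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x in one dimension."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar
    return a, b

x_query = 0.5
lin_preds, knn_preds = [], []

# Train both models on 300 different random training sets and record
# the prediction each one makes at the same query point.
for _ in range(300):
    xs = [random.random() for _ in range(20)]
    ys = [true_fn(x) + random.gauss(0, 0.3) for x in xs]
    a, b = fit_linear(xs, ys)
    lin_preds.append(a + b * x_query)
    # 1-NN: predict the target of the closest training point.
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_query))
    knn_preds.append(ys[nearest])

# Variance = spread of the predictions across training sets.
print("linear regression prediction variance:",
      round(statistics.pvariance(lin_preds), 4))
print("1-NN prediction variance:",
      round(statistics.pvariance(knn_preds), 4))
```

The 1-NN prediction swings with every new training set because it depends on a single noisy neighbor, while the linear fit averages over all 20 points and barely moves.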
Now that we have some basic knowledge of bias and variance, let's look at two other important terms that go with them: underfitting and overfitting.
Underfitting happens when a model is unable to capture the underlying pattern of the data. These models usually have high bias and low variance. It happens when we have too little data to build an accurate model, or when we try to fit a linear model to nonlinear data. Such models, like linear and logistic regression, are too simple to capture the complex patterns in the data.
Overfitting happens when our model captures the noise along with the underlying pattern in the data. It happens when we train our model too much on a noisy dataset. These models have low bias and high variance. They tend to be very flexible models, like decision trees, which are prone to overfitting.
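Both failure modes show up clearly when we compare train and test error. Here is a small pure-Python sketch (the sine target, noise level, and the three models are my own illustrative choices): a constant "predict the mean" model underfits, 1-nearest-neighbor memorizes the training noise and overfits, and a 5-nearest-neighbor model sits in between:

```python
import random
import math

random.seed(1)

def true_fn(x):
    return math.sin(2 * math.pi * x)

def make_data(n, noise=0.2):
    xs = [random.random() for _ in range(n)]
    ys = [true_fn(x) + random.gauss(0, noise) for x in xs]
    return xs, ys

def knn_predict(xs, ys, x, k):
    """Average the targets of the k training points nearest to x."""
    idx = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in idx) / k

def mse(xs, ys, predict):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

x_tr, y_tr = make_data(100)
x_te, y_te = make_data(300)

y_mean = sum(y_tr) / len(y_tr)
models = {
    "mean (underfit)": lambda x: y_mean,
    "5-NN (good fit)": lambda x: knn_predict(x_tr, y_tr, x, 5),
    "1-NN (overfit)": lambda x: knn_predict(x_tr, y_tr, x, 1),
}
errors = {}
for name, predict in models.items():
    errors[name] = (mse(x_tr, y_tr, predict), mse(x_te, y_te, predict))
    print(f"{name}: train MSE = {errors[name][0]:.3f}, "
          f"test MSE = {errors[name][1]:.3f}")
```

The overfit 1-NN model has zero training error (each training point is its own nearest neighbor) yet a worse test error than 5-NN, while the underfit mean model is poor on both.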
Why Is There a Bias-Variance Tradeoff?
If our model is too simple and has very few parameters, it may have high bias and low variance. On the other hand, if our model has a large number of parameters, it is likely to have high variance and low bias. So we need to find a good balance that neither overfits nor underfits the data.
This tradeoff in complexity is why there is a tradeoff between bias and variance. An algorithm can’t be more complex and less complex at the same time.
There is no escaping the relationship between bias and variance in machine learning.
- Increasing the bias will decrease the variance.
- Increasing the variance will decrease the bias.
In reality, we cannot calculate the real bias and variance terms because we do not know the actual underlying target function. Nevertheless, as a framework, bias and variance provide the tools to understand the behavior of machine learning algorithms in the pursuit of predictive performance.
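In a simulation, though, we can pick the target function ourselves, and then both terms become measurable. The sketch below (pure Python, with my own arbitrary sine target and noise settings) estimates bias² and variance for a high-bias model (linear regression) and a high-variance model (1-nearest-neighbor) by averaging predictions over many resampled training sets:

```python
import random
import math

random.seed(2)

def true_fn(x):
    # Known only because this is a simulation.
    return math.sin(2 * math.pi * x)

grid = [i / 20 for i in range(1, 20)]   # evaluation points
n_sets, n_points, noise = 300, 25, 0.25

def fit_linear(xs, ys):
    """OLS fit y = a + b*x; returns a prediction function."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) / \
        sum((xi - xbar) ** 2 for xi in xs)
    a = ybar - b * xbar
    return lambda x: a + b * x

def fit_1nn(xs, ys):
    """1-nearest-neighbor; returns a prediction function."""
    return lambda x: ys[min(range(len(xs)), key=lambda i: abs(xs[i] - x))]

results = {}
for name, fit in [("linear (high bias)", fit_linear),
                  ("1-NN (high variance)", fit_1nn)]:
    # Predictions from many models, each trained on a fresh dataset.
    preds = []
    for _ in range(n_sets):
        xs = [random.random() for _ in range(n_points)]
        ys = [true_fn(x) + random.gauss(0, noise) for x in xs]
        model = fit(xs, ys)
        preds.append([model(x) for x in grid])
    # Average prediction at each grid point across all training sets.
    avg = [sum(p[j] for p in preds) / n_sets for j in range(len(grid))]
    # bias^2: squared gap between average prediction and the truth.
    bias_sq = sum((avg[j] - true_fn(g)) ** 2
                  for j, g in enumerate(grid)) / len(grid)
    # variance: spread of predictions around their own average.
    var = sum(sum((p[j] - avg[j]) ** 2 for p in preds) / n_sets
              for j in range(len(grid))) / len(grid)
    results[name] = (bias_sq, var)
    print(f"{name}: bias^2 = {bias_sq:.4f}, variance = {var:.4f}")
```

The linear model has large bias² (it cannot bend with the sine curve) but small variance; the 1-NN model is the mirror image, which is the tradeoff in action.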
In this post, we discussed the basics of Bias, Variance, Overfitting, Underfitting, and the Bias-Variance Tradeoff.
You now know that:
- Bias is the simplifying assumptions made by the model to make the target function easier to approximate.
- Variance is the amount that the estimate of the target function will change given different training data.
- The trade-off is the tension between the error introduced by the bias and the error introduced by the variance.
Do you have any questions about the topics discussed above? Ask them in the comments section, and I will do my best to answer.
Thank you for reading! Clap if you liked it!