# Elucidating Bias, Variance, Underfitting, and Overfitting

**Overfitting, underfitting, and the bias-variance tradeoff** are foundational concepts in machine learning. They are important because they describe the state of a model based on its performance. The best way to understand these terms is to see them as a tradeoff between the **bias** and the **variance** of the model. Let's start with the phenomena of overfitting and underfitting.

**Overfitting** occurs when a statistical model or machine learning algorithm **captures the noise** of the data. Intuitively, overfitting occurs when the model or the algorithm fits the data too well. Specifically, overfitting occurs if the model or algorithm shows **low bias** but **high variance**. Overfitting is often a result of an excessively complicated model, and it can be prevented by fitting multiple models and using validation or cross-validation to compare their predictive accuracies on test data.
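The cross-validation idea mentioned above can be sketched in plain NumPy: fit candidate models of different complexity and compare their average error on held-out folds. This is a minimal illustration on made-up data; the sine target, the noise level, and the candidate polynomial degrees (1, 4, 15) are arbitrary choices, not a prescription.

```python
import numpy as np

# Toy data: noisy samples of a smooth target function.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 60)
y = np.sin(x) + rng.normal(scale=0.5, size=x.size)

def cv_error(degree, k=5):
    """Mean held-out MSE of a degree-`degree` polynomial over k folds."""
    idx = np.arange(x.size)
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # everything not in this fold
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[fold])
        errs.append(np.mean((pred - y[fold]) ** 2))
    return float(np.mean(errs))

# An underfit (degree 1), a reasonable (degree 4), and an overfit
# (degree 15) model, compared on data they were not trained on.
scores = {d: cv_error(d) for d in (1, 4, 15)}
best_degree = min(scores, key=scores.get)
```

The too-simple and too-flexible models both tend to score worse on the held-out folds than the moderate one, which is exactly how cross-validation helps us pick a model that neither underfits nor overfits.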

**Underfitting** occurs when a statistical model or machine learning algorithm **cannot capture the underlying trend** of the data. Intuitively, underfitting occurs when the model or the algorithm does not fit the data well enough. Specifically, underfitting occurs if the model or algorithm shows **low variance** but **high bias**. Underfitting is often a result of an excessively simple model.

Both overfitting and underfitting lead to **poor predictions** on new data sets.

Well, let's understand bias and variance in simpler terms. (**Very simple terms!**)

**What is Bias?**

Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the model.

**Simple definition:** “The error resulting from the training data!”

**What is Variance?**

Variance is the variability of model prediction for a given data point or a value that tells us the spread of our data. A model with high variance pays a lot of attention to training data and does not generalize on the data which it hasn’t seen before.

**Simple definition:** “The error resulting from the test data!”
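The formal definitions above can be made concrete with a small simulation: train the same model class on many independently drawn training sets and inspect its prediction at one fixed input point. The average prediction minus the true value is the bias; the spread of the predictions across training sets is the variance. This is a sketch in plain NumPy; the sine target, noise level, and polynomial degrees are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
true_f = np.sin     # ground-truth function (an arbitrary choice)
x0 = 1.0            # the single input point we analyse
n_sets = 200        # number of independent training sets

def bias_and_variance(degree):
    """Estimate bias and variance at x0 for a polynomial of `degree`
    by refitting it on many fresh training sets."""
    preds = np.empty(n_sets)
    for i in range(n_sets):
        x = rng.uniform(-3, 3, 30)
        y = true_f(x) + rng.normal(scale=0.3, size=x.size)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x0)
    bias = preds.mean() - true_f(x0)   # average prediction vs correct value
    return bias, preds.var()           # variance = spread of the predictions

b_simple, v_simple = bias_and_variance(1)   # too simple: large bias
b_flex, v_flex = bias_and_variance(10)      # very flexible: large variance
```

The straight-line model lands far from the true value no matter which training set it sees (high bias, low variance), while the flexible model is right on average but its prediction jumps around with every new training set (low bias, high variance).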

Well, to make the concepts clearer, I have divided them into two parts: bias and variance in the case of **Regression** models as well as **Classification** models.

Considering **Regression** models:

We can see clearly that Model-1 and Model-3 are **Underfitting** and **Overfitting** respectively.

Model-1 has not captured the underlying trend properly; the model is too simple, so naturally both the training and the test accuracy suffer!

As we discussed earlier, **“bias is the error resulting from the training set, while variance is the error resulting from the test set!”** Model-1 has low accuracy on both the train and the test set, i.e. high bias (**high training error**) and, in this rough shorthand, high variance (**high testing error**). (Strictly speaking, an underfit model has *low* variance in the formal sense: its predictions barely change across training sets, and its test error is high because of the bias, not the variance.)

Similarly, for Model-3, the model has fit the training data too closely, which is why it fails on the test data (**low test accuracy**). Since the training accuracy for Model-3 is high and the test accuracy is low, Model-3 has low bias (**low training error**) and high variance (**high testing error**).

Considering Model-2: since Model-2 is in the “**Just Right**” condition, the model performs well on the training set as well as the test set. Hence the model has high training accuracy (**low bias, i.e. low training error**) and high testing accuracy (**low variance, i.e. low testing error**).
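The Model-1/2/3 story can be reproduced numerically: fit three polynomials of increasing flexibility on the same training set and compare their training and test errors. This is a sketch on made-up data; the degrees 1, 4, and 15 are arbitrary stand-ins for Model-1, Model-2, and Model-3.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-3, 3, 80))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
x_tr, y_tr = x[::2], y[::2]     # every other point for training
x_te, y_te = x[1::2], y[1::2]   # the rest held out for testing

def train_test_mse(degree):
    """Training and test MSE of a degree-`degree` polynomial fit."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))
    return mse(x_tr, y_tr), mse(x_te, y_te)

tr1, te1 = train_test_mse(1)    # "Model-1": high train AND test error
tr2, te2 = train_test_mse(4)    # "Model-2": low train and low test error
tr3, te3 = train_test_mse(15)   # "Model-3": lowest train, higher test error
```

Note that the training error only ever goes down as the model gets more flexible; it is the gap between training and test error that exposes Model-3's overfitting.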

Now, let’s consider the case of **Classification** models. Please have a look at the image below!

Here we have 3 Models, which have the following training and testing errors.

As we can see, Classification Model-1 has a low training error (2%) but a high testing error (18%). Applying the concepts explained earlier, we can conclude that the model has low bias (low training error) and high variance (high testing error), i.e. **Model-1 is clearly overfitting**!

Similarly, we can conclude that our Classification Model-2 is clearly an underfitting model. Coming to Model-3: this model can be considered the **most generalized, and therefore the recommended,** model!
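The same pattern can be reproduced for classification with two deliberately extreme models: a 1-nearest-neighbour classifier, which memorises the training set (overfitting), and a constant majority-class predictor, which ignores the inputs entirely (underfitting). This is a sketch on made-up 2-D blob data; the blob centres, spread, and split sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two overlapping 2-D Gaussian blobs, one per class.
n = 200
X = np.vstack([rng.normal(0, 1.2, (n, 2)), rng.normal(2, 1.2, (n, 2))])
y = np.array([0] * n + [1] * n)
perm = rng.permutation(2 * n)
X, y = X[perm], y[perm]
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

def knn_predict(X_query, k):
    """k-nearest-neighbour majority vote against the training set."""
    d = np.linalg.norm(X_query[:, None, :] - X_tr[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_tr[nearest].mean(axis=1) > 0.5).astype(int)

def error_pct(pred, target):
    return 100 * float(np.mean(pred != target))

# 1-NN memorises the data: each point's nearest neighbour is itself,
# so the training error is exactly 0%, while the test error is not.
overfit_train = error_pct(knn_predict(X_tr, 1), y_tr)   # 0.0
overfit_test = error_pct(knn_predict(X_te, 1), y_te)

# A constant majority-class prediction underfits: roughly 50% error
# on both splits for these balanced classes.
majority = int(y_tr.mean() > 0.5)
underfit_train = error_pct(np.full(300, majority), y_tr)
underfit_test = error_pct(np.full(100, majority), y_te)
```

The 1-NN model shows the low-training-error / high-test-error signature of Model-1 above, while the majority predictor's errors are high (and similar) on both splits, just like an underfit classifier.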

Well, that was the explanation of underfitting, overfitting, bias, and variance for regression and classification models respectively!

We are done with the explanation part; now let's have a look at the **graphical plotting** of these concepts. Please have a look at the figure below!

Considering Figure 3, the **dotted** line passing through the points marks the sweet spot for which we should design our model, which would be the “**Most Generalized Model**”.

This was all from my side! If you found this blog helpful, do like (clap!) it, and comment with your views if I missed any point; these reviews help me grow and bring better content next time!

Also, connect with me on **LinkedIn**! (**I love to connect with amazing people like you!**)

Also, follow and check out my **GitHub** for my work and project contributions!

**Thank you** for the precious time you spent on my blog!