Analyzing the Bias-Variance Tradeoff, but theoretically.

Queen · ml-concepts.com · Mar 11, 2022 · 8 min read

This article explains the bias-variance tradeoff technically, but in a non-technical style.


Let’s start this way,

An English professor will define bias to you as prejudice, or an unfair judgment. That same professor will most likely define variance as friction, conflict, or change, and tradeoff as balancing opposing factors that cannot both be satisfied at the same time. There’s nothing wrong with our English professor’s definitions.

However, a machine learning engineer might only agree with the definition of ‘tradeoff’. To an ML engineer, bias is the difference between the average prediction of a model and the true value it is trying to predict, and variance is how much the model’s predictions change when it is trained on different samples of the data. Our ML engineer’s interpretation of ‘bias and variance’ will barely differ from a statistician’s.

We are not about to talk about statistics though. Hold on a minute!

A little Background Story

When a machine learning model is built, its accuracy is measured by how close the values it predicts are to the actual values. The difference between a predicted value and the actual value is known as the error, and these errors are used to evaluate the model. They fall into two groups: reducible errors and irreducible errors. Irreducible errors arise from natural randomness and variability in the data itself, also known as noise, and cannot be removed no matter how good the model is. Reducible errors, on the other hand, are errors that can be reduced to ensure higher accuracy of the model. Quite simple!

Furthermore, bias and variance errors are classified as reducible errors. I hope this is beginning to make more sense.
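
To make the split between reducible and irreducible error concrete, here is a minimal NumPy sketch. Everything in it (the true relationship y = 2x, the noise level, the deliberately bad model) is an assumption made up for illustration, not taken from any real dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assume the "truth" is y = 2x, but every observation carries noise.
x = rng.uniform(0, 10, 200)
noise = rng.normal(0, 1.0, 200)           # irreducible: baked into the data
y = 2 * x + noise

# A deliberately bad model (wrong slope) adds reducible error on top of the noise.
bad_pred = 1.5 * x
ideal_pred = 2 * x                        # even the true function can't beat the noise

print("error of the true function:", np.mean((y - ideal_pred) ** 2))  # ~1.0 (noise variance)
print("error of the bad model    :", np.mean((y - bad_pred) ** 2))    # noise + reducible part
```

No matter how clever the model, it cannot beat the first number; that floor is the irreducible error, and everything above it is the reducible part.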

Let’s see a dummy example of how a supervised machine learning algorithm works, to give us a better understanding. Suppose (A = 1, B = 2, C = 3, …) were given to us as a dataset and fed into our model. The expected response (actual value) should be (D = 4). But our model outputs (D = 3) as its predicted value, which is obviously wrong. The difference between the actual value and the value predicted by our model is known as ‘BIAS’. If this difference is small (D = 3, as in our dummy example), the model has a low level of bias; if the difference is significant (say D = 44), the model has a high level of bias, which suggests the model is making overly simple assumptions about the data. That simplicity has its advantages, but regardless, low-bias models are generally preferable.
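
As a hedged sketch of that dummy example in code: treat the letters as positions 1, 2, 3, 4, and compare a deliberately over-simple model (always predict the training average) with a slightly more flexible straight-line fit. Both model choices are mine, purely for illustration:

```python
import numpy as np

positions = np.array([1, 2, 3])   # A, B, C
values = np.array([1, 2, 3])
actual_d = 4                      # the expected response for D

# An overly simple model: always predict the average of the training values.
simple_pred = values.mean()       # predicts 2 for D -> large gap -> high bias

# A slightly more flexible model: fit a straight line through the points.
slope, intercept = np.polyfit(positions, values, deg=1)
line_pred = slope * 4 + intercept # predicts ~4 -> small gap -> low bias

print("simple model error:", abs(actual_d - simple_pred))   # ~2.0
print("linear model error:", abs(actual_d - line_pred))     # ~0.0
```

The gap between the prediction and the true value of D plays the role of bias here: the over-simple model misses by a lot, while the more flexible one barely misses at all.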

(Image source: https://developers.google.com/machine-learning/crash-course/training-and-test-sets/splitting-data)

Bias generally occurs when a model is not flexible enough to capture the structure of the training data. This, however, has its advantage: a model whose built-in assumptions happen to suit a particular dataset will model that dataset well, even though it might do worse on other datasets. You can fit our English professor’s definition of bias as prejudice here and it should be fine.

“While bias error can be associated with the training error, variance is more affiliated with the test error.”

Using the same dummy example from the previous paragraph, the bias error came from simplicity: the model did not have enough to learn from the training data, and that produced the poor predicted value. Variance is the opposite case: the model learns too much from the training data, including its noise, and as a result is unable to predict accurately on new data. The more sensitive the model is to the particular training data it was given, the higher the variance, and hence the larger the variability in its predictions. The less sensitive it is, the lower the variance error in the system.

This is why a model can work accurately with the training data and still produce errors on the test data.
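
One way to see this sensitivity in code is to train the same flexible model on several fresh noisy samples and watch how much its prediction at a single test point jumps around. The underlying sine function, the noise level, and the polynomial degree below are all assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
test_x = 0.5                               # one fixed test point
predictions = []

for _ in range(20):
    # A fresh noisy sample of the same underlying relationship each time.
    x = rng.uniform(0, 1, 15)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 15)

    # A very flexible model (high-degree polynomial) chases the noise.
    coeffs = np.polyfit(x, y, deg=9)
    predictions.append(np.polyval(coeffs, test_x))

print("spread of predictions at x = 0.5:", np.var(predictions))
```

A less flexible model (say, a straight line) fitted to the same twenty samples would show a much smaller spread at that point, which is exactly the low-variance behaviour described above.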

Furthermore,

We cannot, however, talk about bias and variance without involving the concepts of overfitting and underfitting.

There is always a perfect story to explain these concepts, so here is my attempt to make it as simple as possible. I hope you get the intent.

So, I went shoe shopping with my siblings at our favorite store. On that Saturday afternoon, after my siblings and I unanimously agreed to get three matching pairs, the sales attendant returned with only two pairs of sneakers (size 36 and size 40) instead of three. Unfortunately, size 38 was out of stock, and that’s my shoe size. Size 36, my younger brother’s size, would have been too small for my feet unless I folded my toes in, which would have been uncomfortable (Underfit). And my elder sister’s size 40 would still have room for a third foot even if I squeezed both of mine into just one of her shoes, because the shoe is simply too big for me (Overfit).

Now, let’s relate this to machine learning models, bias, and variance. Let’s put the models in our shoes. Literally!

Technically,

“While bias error can be associated with the training error, variance is more affiliated with the test error.” The higher the training error, the higher the bias. The larger the gap between the training error and the test error, the higher the variance.

Say a model is provided with a training dataset and fits it so well that the training error is close to zero (Low Bias). But when different test data is supplied, there are significant differences and variations in its predicted values (High Variance). *If you did not get this, you need to read the ‘little background story’ again.*

The model described above, with low bias and high variance, is Overfitting. This is a perfect case of, “you don’t want to be in my shoes. It’s not your size. Too wide for your feet to fit in.” Imagine if I had hurriedly tried on the size 40 in the store and it had looked manageably fit, only to get home, wear it, and discover how uncomfortable the bigger size really is. In machine learning terms, the model did very well on the training data, just as I thought the size 40 could manageably fit. The model learned too much from the training data, noise and all. But with the test data, it performed very poorly.

But when a simple model is provided with insufficient training data (which leads to High Bias), the model is unable to properly learn from the training data, so its predictions are inaccurate on the test data as well. Such a model is Underfitting. In another case, if I decide to just follow my gut and go for size 36, my younger brother’s shoe size, without even trying it on to see if it fits, it shouldn’t come as a surprise if I am unable to wear the shoe for the planned function because my foot can’t fit into the small size. I did not even try wearing it at all before buying it. Again, relating this to underfitting: because the model is too simple and supplied with insufficient training data, it has too little to learn from, and as a result cannot predict accurate results on the test data.
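
Here is a minimal sketch contrasting the two failure modes, with a too-simple and a too-flexible polynomial fitted to the same noisy sample. The sine function, the noise level, and the two degrees are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    # Noisy observations of the same underlying (assumed) relationship.
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(20)

for degree, label in [(1, "underfit (too simple)"), (12, "overfit (too flexible)")]:
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)
    test_err = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
    print(f"{label:24s} train MSE = {train_err:.3f}  test MSE = {test_err:.3f}")
```

The too-simple model has a high error on both sets (high bias); the too-flexible one nails the training set but falls apart on the test set (high variance).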

Of what use is learning so much from the training data if the results still won’t be accurate on new data (Overfitting)?

Why learn at all if there is insufficient training data to learn from (Underfitting)?

Tradeoff…

Now, where is the balance?

How can bias and variance ever be balanced? Increasing bias decreases variance, and decreasing bias increases variance; they seem to be on opposite sides. Well… not totally.

If I cannot get my perfect shoe size (38), it’s either I take my elder sister’s size 40 down two sizes to a 38 that fits me, or I take my younger brother’s size 36 up two sizes to make it a perfect fit for me.

Obviously, an overfit or underfit machine learning model isn’t an appropriate algorithm for modeling a dataset and predicting accurate output values. What we want instead is a model with a balanced fit.

A model that has sufficient training data to learn from, such that the training error is close to zero (Low Bias), and whose predictions on different test data are not scattered or inaccurate and do not vary significantly (Low Variance), is an accurate model. A Balanced fit.
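
In code, one common way to land on that balance is to sweep model complexity and keep the setting where the test (or validation) error bottoms out. Continuing the earlier polynomial sketches, with the data and the use of degree as the complexity knob again being my own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Fresh noisy training and test samples of the same assumed relationship.
x_train = rng.uniform(0, 1, 40)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 40)
x_test = rng.uniform(0, 1, 40)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 40)

best_degree, best_test_err = None, float("inf")
for degree in range(1, 13):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    test_err = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
    if test_err < best_test_err:
        best_degree, best_test_err = degree, test_err

print(f"balanced fit around degree {best_degree} (test MSE = {best_test_err:.3f})")
```

Very low degrees lose out through bias, very high degrees through variance; the balanced fit sits somewhere in between.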

This is how bias and variance can both be balanced in a supervised machine learning model: by keeping both of them low. There are also different methods of ensuring our model is a balanced fit.

  1. Use a dataset of significant size and quality, so that the training data is neither too cumbersome for the model to learn from properly nor too sparse for the model to learn enough.
  2. Use an appropriate algorithm for the dataset at hand.
  3. Use ensemble methods, which combine multiple models into a higher-quality predictive model; averaging models in this way is a suitable way to reach a balanced fit (see the sketch after this list).
  4. Choose a model of suitable complexity: not so simple that it makes overly simple assumptions (bias), and not so complex that it introduces variance error into our machine learning model.
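
As a rough illustration of point 3, here is a hand-rolled bagging sketch in plain NumPy: the same flexible model is trained on many bootstrap resamples of the data and the predictions are averaged, which typically shrinks the spread (variance) of the final prediction. The data, the polynomial model, and the ensemble size are assumptions for illustration only; libraries such as scikit-learn provide ready-made versions of this idea.

```python
import numpy as np

rng = np.random.default_rng(4)
test_x = 0.5
single_preds, bagged_preds = [], []

for _ in range(30):                        # 30 independent training sets
    x = rng.uniform(0, 1, 60)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 60)

    # One flexible model trained on the full sample.
    coeffs = np.polyfit(x, y, deg=9)
    single_preds.append(np.polyval(coeffs, test_x))

    # Bagging: average the same model over 25 bootstrap resamples.
    members = []
    for _ in range(25):
        idx = rng.integers(0, len(x), len(x))
        c = np.polyfit(x[idx], y[idx], deg=9)
        members.append(np.polyval(c, test_x))
    bagged_preds.append(np.mean(members))

print("variance of the single model:", np.var(single_preds))
print("variance of the bagged model:", np.var(bagged_preds))
```

Averaging does little for bias, which is why bagging is usually paired with flexible, low-bias base models.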

Points to note:

  • Bias is measured by how close a model’s predictions are to the truth.
  • Variance is measured by how much a model’s predictions on the test data scatter when it is trained on different samples.
  • Lower bias generally goes hand in hand with more flexibility in a model.
  • High bias is a result of models that are too simple, which results in underfitting.
  • In the case of simple models, adequate and diverse training data should be supplied.
  • Overfitting is a result of models that are too complex.
  • Ensemble methods like bagging reduce variance.
  • High Variance and Low Bias = Overfit Model.
  • Low Variance and High Bias = Underfit.
  • Low variance and Low Bias = Balanced fit.
  • An accurate machine learning model is a model with a balanced fit.

I hope I was able to simplify bias, variance, the bias-variance tradeoff, overfitting, and underfitting.

