The Bias-Variance Tradeoff


How to reduce Bias and Variance error in your model


In the process of building a predictive Machine Learning model, we come across Bias and Variance errors. The Bias-Variance Tradeoff is one of the most popular tradeoffs in Machine Learning. Here, we will go over what Bias error and Variance error are, the sources of these errors, and how you can work to reduce them in your model.

How does Machine Learning differ from traditional programming?

The high school definition of a program was simple: a program is a set of rules that tells the computer what to do and how to do it. This is one of the main differences between traditional programming and Machine Learning.

In traditional programming, the programmer defines the rules. The rules are usually well defined, and a programmer often has to spend a good amount of time debugging code to ensure that it runs smoothly.

In Machine Learning, while we still code, we do not define the rules. We build a model and feed it our expected results (supervised ML), or we allow the model to come up with its own results (unsupervised ML). The main focus in Machine Learning is to improve the accuracy of the initial guesses the model makes.

Because there are no set rules and the model has to make guesses, Bias and Variance errors creep in, depending on how simple or complex these guesses or assumptions are.

The types of errors in the prediction of a model are:

  • Bias error
  • Variance error
  • Irreducible error

The prediction error of a model (under squared-error loss) can be decomposed as seen below:

Prediction error = Bias² + Variance + Irreducible error

We can work to improve our model by reducing the Bias and Variance error, but not the irreducible error.
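To make the decomposition concrete, here is a minimal numpy sketch (the quadratic true function, noise level, and sample sizes are arbitrary choices of mine, not from the article) that estimates the squared Bias and the Variance of a straight-line model at one test point by refitting it on many freshly drawn training sets:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Toy ground truth: f(x) = x^2, observed as y = f(x) + noise.
    return x ** 2

def fit_and_predict(x0, degree, n_train=30):
    """Fit a polynomial of the given degree to one fresh noisy
    training sample and return its prediction at x0."""
    x = rng.uniform(-1, 1, n_train)
    y = true_f(x) + rng.normal(0, 0.1, n_train)
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x0)

x0 = 0.5
preds = np.array([fit_and_predict(x0, degree=1) for _ in range(500)])

bias_sq = (preds.mean() - true_f(x0)) ** 2  # squared Bias at x0
variance = preds.var()                      # Variance at x0
print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```

The straight line cannot represent the curve, so its squared Bias stays large no matter how many training sets we average over; a more flexible model would shrink the Bias term and inflate the Variance term instead.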

Bias-Variance Tradeoff

The Bias and Variance errors are inversely related: reducing the Bias tends to increase the Variance, and vice versa.

[Figure: models A, B, and C fit to the same dataset]

Can you guess which of the models has high Bias error and which has high Variance error?

If you guessed A has high Bias and B has high Variance, you’re right.

If our target value is Y and our features are denoted by x:

Y = f(x) + e

where e is the error which is normally distributed.

The error is the difference between the predicted value and the actual value. We square the errors so that negative errors do not cancel out positive errors, i.e. the Sum of Squared Errors.
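As a quick illustration (the numbers here are made up), the Sum of Squared Errors takes only a couple of lines of numpy:

```python
import numpy as np

y_actual = np.array([3.0, 5.0, 7.5, 9.0])     # hypothetical actual values
y_predicted = np.array([2.8, 5.4, 7.0, 9.6])  # hypothetical predictions

# Squaring keeps negative and positive errors from cancelling out.
sse = np.sum((y_actual - y_predicted) ** 2)
print(f"Sum of Squared Errors: {sse:.2f}")
```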

If we compare the errors of A and B on the training data using the sum of squared errors, A will have the higher error. B has zero error, or very little, because the line (model) fits too closely to the training data.

However, when we compare the errors of A and B on the testing data, the error of B increases drastically while the error of A remains fairly constant. Our ideal model is C: it represents the data just well enough, without too much Variance or Bias on either the training or the testing data.
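A minimal sketch of this comparison, on synthetic data, with polynomial models standing in for A (degree 1, high Bias), B (degree 15, high Variance), and C (degree 3): the overfit model B wins on the training data but loses badly on the testing data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from a cubic trend plus noise, split into train and test.
x = np.sort(rng.uniform(-2, 2, 60))
y = x ** 3 - x + rng.normal(0, 0.5, x.size)
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

def sse(degree, x_eval, y_eval):
    # numpy may warn that the degree-15 fit is poorly conditioned;
    # that instability is part of the point.
    coeffs = np.polyfit(x_train, y_train, degree)
    return np.sum((np.polyval(coeffs, x_eval) - y_eval) ** 2)

for name, degree in [("A (underfit)", 1), ("B (overfit)", 15), ("C (just right)", 3)]:
    print(f"{name}: train SSE = {sse(degree, x_train, y_train):.2f}, "
          f"test SSE = {sse(degree, x_test, y_test):.2f}")
```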

Bias Error

When a Machine Learning model is unable to capture the true relationship between the features and target of the data, we have an error called Bias.

The Machine Learning model makes assumptions based on the available data. If the assumptions are too simple, the model may not be able to accurately account for the relationship between the features and the target, thereby producing inaccurate predictions.

Mathematically, Bias can be defined as the difference between the model’s average predicted values and the true values.

Linear models such as Linear Regression and Logistic Regression, which make simple assumptions, have high Bias, while models such as Decision Trees and Support Vector Machines have low Bias.
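A small scikit-learn sketch of this difference, on made-up data with a nonlinear (sine) relationship: the linear model cannot capture the curve (high Bias), while the Decision Tree fits the training data almost perfectly (low Bias, at the cost of high Variance).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)  # nonlinear target

for model in (LinearRegression(), DecisionTreeRegressor(random_state=0)):
    model.fit(X, y)
    # R^2 on the training data: noticeably lower for the high-Bias
    # linear model, near 1.0 for the low-Bias (but high-Variance) tree.
    print(type(model).__name__, round(model.score(X, y), 3))
```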

[Figure: Bias error in a model]

When the model is so simplistic that it does not capture even the important features of the training data, the model is said to be underfitted.

Underfitting occurs when the model cannot accurately fit the training data, and it therefore performs poorly even on the training data.

Underfitting occurs when the model has high bias but low variance, a result of an excessively simple model.

Reducing Bias

  1. Change the model: One of the first steps in reducing Bias is to simply change the model. As stated above, some models have high Bias while others do not. Do not use a linear model if the features and target of your data do not in fact have a linear relationship.
  2. Ensure the data is truly representative: Ensure that the training data is diverse and represents all possible groups or outcomes. In the event of an imbalanced dataset, use weighting or penalized models. There has been discussion of the poor accuracy of facial recognition models in identifying people of color. One possible source of such error is that the training dataset was not diverse, and the model did not have enough training data to clearly identify persons of color.
  3. Parameter tuning: This requires an understanding of the model and its parameters, and the algorithm’s documentation is a good place to start. Every model has a list of parameters which it takes as inputs, and tweaking these parameters may give you the desired results (see the sketch after this list). You can also build your own algorithms from scratch.
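As an illustration of point 3, here is a hedged sketch of parameter tuning with scikit-learn’s GridSearchCV; the dataset is synthetic and the parameter grid is an arbitrary example, not a recommendation.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Hypothetical regression dataset standing in for your own data.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Search over tree depths: too shallow underfits (Bias),
# too deep overfits (Variance).
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 6, 8, None]},
    cv=5,
)
search.fit(X, y)
print("best max_depth:", search.best_params_["max_depth"])
```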

Variance Error

In a bid to avoid Bias error, be careful not to fall into another error called Variance error. Bias error and Variance error are polar opposites.

While Bias error is caused by simplified assumptions, or the model being unable to account for the effect of the features on the target, Variance error is the result of the model making overly complex assumptions.

In other words, the model has fit too closely to the training data. While it may return a high accuracy on the training data, this does not necessarily translate into a good predictive model. In fact, it may even result in higher errors on new data.

The problem with the model fitting too closely to the training data is that it also captures noise or features which should in fact not be taken into consideration.

[Figure: Variance error in a model]

This can also result in a phenomenon known as Overfitting.

Overfitting occurs when a model has high variance and low bias. When a model fits too well with the training dataset such that it captures noise, it is said to have Overfit the training data. This will negatively impact the predictive power of the model.

The training data refers to the data which the model is built on. Remember, in Machine Learning we carry out a form of history matching: the model is trained on past data (labeled or unlabeled) and, when our model is ready, we introduce new data to it.

For validation, we split the available data into training and testing sets. The testing data is used to validate the performance of the model.

If our model returns a high accuracy on the training data but performs poorly on the testing data, we can conclude that the model has fit too closely to the training data and therefore cannot generalize to new data.
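A minimal sketch of this check using scikit-learn (the dataset and model are just stand-ins): a large gap between training and testing accuracy is the telltale sign of Overfitting.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A fully grown tree tends to memorize the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", round(model.score(X_train, y_train), 3))
print("test accuracy:", round(model.score(X_test, y_test), 3))
```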

Reducing Variance Error

  1. Ensemble Learning: A good way to tackle high Variance is to train your data using multiple models. Ensemble learning is able to leverage both weak and strong learners in order to improve model prediction (see the sketch after this list). In fact, most winning solutions in Machine Learning competitions make use of Ensemble Learning.
  2. Train Model with More Data: This sounds tricky. Why add more data when the Variance is high? More data improves the signal-to-noise ratio, which reduces the Variance of the model. Also, when the model has more data, it is better able to come up with a general rule which will also apply to new data.
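As a sketch of point 1 (the dataset and hyperparameters are arbitrary stand-ins), the comparison below pits a single deep Decision Tree against a Random Forest, an ensemble that averages many trees to smooth out the Variance of any one of them:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Cross-validated accuracy: averaging many decorrelated trees (bagging)
# typically beats any single deep, high-Variance tree.
for model in (DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(n_estimators=200, random_state=0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, round(scores.mean(), 3))
```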

Conclusion

Now we have come to the end. We have defined Bias and Variance errors, their sources, and the steps you can take to reduce these errors in your model.

Remember, in simple terms:

  • Bias is the result of oversimplified model assumptions
  • Variance occurs when the model’s assumptions are too complex

Written by Anita Igbine
