Bias-Variance Tradeoff: A quick introduction

Shaayan Hussain · Published in Analytics Vidhya · Aug 19, 2020 · 5 min read

The bias-variance tradeoff is one of the most important yet overlooked and misunderstood topics in ML. So here we want to cover it in as simple and short a way as possible.

Let’s start with the basics and see why this concept is important and how it should be used. We want to keep this crisp, so we’ll talk in pointers at times. By the end of this, you will know:

  • Bias
  • Variance
  • Their relationship
  • The importance of their tradeoff
  • How to diagnose your model’s condition and take the necessary steps

So what exactly does the bias-variance tradeoff have to do with performance?

You build a model. The model doesn’t perform well. You want to improve the performance but don’t know where to start.

A diagnosis is important because it pinpoints the areas of improvement. You need to clearly identify the components that are leading to a poor model.

Issue: Bad model performance

Focus area for the fix: Prediction error

Before jumping to the topic, just know this.

Total Error = Bias^2 + Variance + Irreducible error

a. Total error = The prediction error that we are trying to minimize.

b. Bias error = The difference between the model’s average prediction and the correct value.

c. Variance = The variability of the model’s prediction for a given data point (how much the result for the same point changes if the training data is changed).

d. Irreducible error = The inherent noise in the data, arising from how the data itself is distributed. It is just the way the data is, and essentially nothing can be done about it.
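To make these definitions concrete, here is a minimal simulation sketch, assuming scikit-learn and NumPy are available; the `true_fn` signal, the noise level, and all other constants are made up for illustration. It retrains the same model on many freshly drawn training sets and estimates the squared bias and the variance at fixed test points:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def true_fn(x):
    # The underlying signal; we know it only because this is a simulation.
    return np.sin(x)

x_test = np.linspace(0, 6, 50)           # fixed query points
n_runs, noise = 200, 0.3
preds = np.empty((n_runs, x_test.size))

for i in range(n_runs):
    # Draw a fresh training set each run (i.e., "change the training data").
    x_tr = rng.uniform(0, 6, 40)
    y_tr = true_fn(x_tr) + rng.normal(0, noise, 40)
    model = DecisionTreeRegressor(max_depth=3).fit(x_tr.reshape(-1, 1), y_tr)
    preds[i] = model.predict(x_test.reshape(-1, 1))

avg_pred = preds.mean(axis=0)
bias_sq = ((avg_pred - true_fn(x_test)) ** 2).mean()  # b. squared bias
variance = preds.var(axis=0).mean()                   # c. variance
print(f"bias^2 ~ {bias_sq:.3f}, variance ~ {variance:.3f}, "
      f"irreducible ~ {noise**2:.3f}")
```

Because the data is synthetic, the irreducible term is simply the noise variance we injected; on real data it is unknown.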

Okay, these are the formal definitions. How do we visualize them and understand them in plain terms?

Goal — Low Bias, Low Variance

Fig 1: Bias-Variance representation

Let’s go through each of the possible combinations and understand them practically using the above representation.

a. High bias, high variance: the worst case. Results are not close to the target (high bias) and not even consistent in any direction (high variance).

b. High bias, low variance: Results are not close to the target (high bias) but are consistent in one direction (low variance).

c. Low bias, high variance: Results are close to the target (low bias) but not consistent around it (high variance).

d. Low bias, low variance: the best case. Results are close to the target (low bias) and consistent around it (low variance).

Now the question is: why is it a tradeoff? Why not simply go and get low bias and low variance? This is because of how bias and variance are related through model complexity: each comes at the cost of the other, and when you try to improve one, the other tends to get worse. It’s like cooking: on a low flame it takes forever; turn the flame up and the food starts burning. You have to find the point where both are balanced.

Fig. 2
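As a rough numerical illustration of that balance point, here is a hedged sketch, again assuming scikit-learn; the synthetic data and the choice of polynomial degree as the “complexity knob” are illustrative. Training error keeps falling as complexity grows, while validation error falls and then rises again:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(0, 6, 200).reshape(-1, 1)
y = np.sin(X.ravel()) + rng.normal(0, 0.3, 200)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for degree in (1, 3, 9, 15):  # the complexity knob
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(f"degree {degree:2d}: "
          f"train MSE {mean_squared_error(y_tr, model.predict(X_tr)):.3f}, "
          f"val MSE {mean_squared_error(y_val, model.predict(X_val)):.3f}")
```

Low degrees underfit (both errors high), very high degrees overfit (train error low, validation error climbing), and the sweet spot sits in between.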

Ideal model: Learns the underlying patterns in the training data just optimally, producing a generalized algorithm that works on similar unseen data as well.

Overfitting: The model builds an algorithm fitted very tightly to the training data specifically. As a result, it cannot handle the variations that come with unseen data.

An overfitting model can be understood as the “frog in the well” that became too comfortable in its present scenario (the training data), but whose understanding won’t help it survive in a different surrounding (the test data).

Underfitting: The model builds a very loose-fitting algorithm that doesn’t even work on the training data, because it oversimplified everything and never learned the patterns. Thus it cannot give correct answers.

An underfitting model is like a person who thinks he has learned a skill after just taking the intro session and picking up the buzzwords, or who calls himself a cricket player just because he knows how to hit a ball.

You can read the detailed explanation below:

https://medium.com/analytics-vidhya/understanding-how-machine-learning-is-just-like-the-human-learning-process-801a0bca3e56

The goal is to build a model that gives the right results most of the time, on unseen data as well as on the training data.

Overfitting models have high variance, and underfitting models have high bias.

What should I keep in mind to diagnose and fix these issues in practice?

  • Identify whether your model suffers from overfitting or underfitting. Use the model’s train and test accuracy for this: high train accuracy with much lower test accuracy points to overfitting, while low accuracy on both points to underfitting (see the sketch after this list).
  • Once the issue is identified, take measures as follows.
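A minimal sketch of that diagnosis rule; the thresholds (`gap_tol`, `low_bar`) are made-up illustrative values, not standard cutoffs:

```python
def diagnose(train_acc: float, test_acc: float,
             gap_tol: float = 0.05, low_bar: float = 0.75) -> str:
    """Rough bias/variance diagnosis from train and test accuracy."""
    if train_acc - test_acc > gap_tol:
        return "high variance (overfitting): good on train, poor on test"
    if train_acc < low_bar:
        return "high bias (underfitting): poor even on train"
    return "looks balanced: train and test scores are close and acceptable"

print(diagnose(train_acc=0.98, test_acc=0.81))  # -> high variance
print(diagnose(train_acc=0.62, test_acc=0.60))  # -> high bias
```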

a. Problem: High variance (this is solved the way overfitting is solved).

Let’s see each solution and how exactly it solves the issue.

  • Add more training data: You have learned things that are too data-specific. Here’s more data to broaden your general understanding, so that you are no longer tied to one dataset.
  • Data augmentation: I don’t have much data, so let me modify the current data to create more variations and present them to you for a better understanding.
  • Reduce the complexity of the model: You have learned unnecessary stuff. These specific details are not required; retain only what applies everywhere and let go of the rest to simplify.
  • Bagging (short for Bootstrap Aggregating): You give a different answer every time I change the training data a little. Let me randomly sample the data and hand you all the samples. You train a predictor on each sample and collect all the different results. Then put the learning together by aggregating the results and give me one final answer, which will stay consistent.

Note: The different predictors need to have minimal correlation so that they make “different errors” (not to be confused with different models: we have one model with different predictors, each giving results on a different sample). A minimal bagging sketch follows below.
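This sketch assumes scikit-learn is available; the synthetic dataset and hyperparameters are illustrative. A single deep decision tree is a classic high-variance learner, and bagging many of them usually stabilizes the predictions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# A single deep tree: low bias but high variance.
tree = DecisionTreeClassifier(random_state=0)
# Bagging: train many trees on bootstrap samples and aggregate their votes.
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                           random_state=0)

print("single tree :", cross_val_score(tree, X, y, cv=5).mean().round(3))
print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean().round(3))
```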

b. Problem: High bias (this is solved the way underfitting is solved).

  • Add features: You concluded that Person A won’t be able to repay a loan because he is old (one feature), just because an old Person B couldn’t repay his. But you also need to look at annual income, repayment history, etc. (other features) before deciding.
  • Use a model with higher complexity: We need to replace you with someone who understands better than you how the different parts of the data relate to each other and work together.
  • Boosting: I don’t trust you alone. You create predictors and ask each of them to answer. We’ll ask each predictor about the logic behind the parts it got right, and whenever some part is right, we’ll add that logic to the rule. Each predictor has its shortcomings, but together they cover for each other, working as a team to build a well-fitting complex rule (see the sketch after the note below).

Note: The team of weak learners should have minimal correlation between them; otherwise, everyone would get the same sections right and some sections would be left answered incorrectly.
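And a minimal boosting sketch under the same assumptions (scikit-learn, synthetic data). Each decision stump is a high-bias weak learner on its own; AdaBoost combines them sequentially, with each new stump focusing on the mistakes of the ones before it:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# A decision stump alone: high bias, too simple to capture the pattern.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
# Boosting: fit stumps sequentially, each reweighting the previous errors.
boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=200, random_state=0)

print("single stump  :", cross_val_score(stump, X, y, cv=5).mean().round(3))
print("boosted stumps:", cross_val_score(boosted, X, y, cv=5).mean().round(3))
```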

Hope this helped you understand the topic and gave you enough to leverage the concept as well.

Let us know your feedback. Thanks for reading!

Sources:

Fig 1, Fig 2: http://scott.fortmann-roe.com/docs/BiasVariance.html
