Bias-Variance TradeOff

Abhigyan · Published in Analytics Vidhya · 3 min read · May 24, 2020

In machine learning, the bias–variance tradeoff is a property of predictive models: models with lower bias tend to have higher variance in their estimates across training datasets, and vice versa.

To understand this clearly, let’s see what bias and variance mean.

What is Bias?

  • Bias is error introduced into the model by oversimplification in the machine learning algorithm.
  • It can lead to under-fitting: during training, the model makes simplifying assumptions to make the target function easier to learn.
  • Bias is essentially the difference between your model’s expected predictions and the true values.

Low bias machine learning algorithms — Decision Trees, k-NN and SVM

High bias machine learning algorithms — Linear Regression, Logistic Regression

  1. Low Bias: Suggests fewer assumptions about the form of the target function.
  2. High Bias: Suggests more assumptions about the form of the target function.
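As a small illustration (mine, not from any particular library), here is a NumPy sketch of bias as oversimplification: a straight line fitted to a quadratic target underfits no matter how much data it sees, while a model of the right form gets close to the noise floor. The data and degrees are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x**2 + rng.normal(0, 0.1, size=x.size)  # quadratic target + small noise

# High-bias model: a straight line assumes a linear target, so it
# underfits the curvature regardless of how much data it is given.
line_coef = np.polyfit(x, y, deg=1)
# Lower-bias model: a quadratic matches the true form of the target.
quad_coef = np.polyfit(x, y, deg=2)

def mse(coef):
    """Mean squared error of a fitted polynomial on the training data."""
    return np.mean((np.polyval(coef, x) - y) ** 2)

print(f"linear MSE:    {mse(line_coef):.3f}")  # large: error from oversimplification
print(f"quadratic MSE: {mse(quad_coef):.3f}")  # close to the noise level
```

The gap between the two errors is the price of the linear model’s extra assumptions — that is bias.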

Refer to this article to understand what over-fitting and under-fitting models mean.

What is Variance?

  • Variance is error introduced into your model by an overly complex machine learning algorithm: the model learns noise from the training dataset and therefore performs badly on the test dataset.
  • It can lead to high sensitivity and over-fitting.

Low-variance machine learning algorithms include Linear Regression, Linear Discriminant Analysis and Logistic Regression.

High-variance machine learning algorithms include Decision Trees, k-Nearest Neighbors and Support Vector Machines.

  1. Low Variance: Suggests small changes to the estimate of the target function with changes to the training dataset.
  2. High Variance: Suggests large changes to the estimate of the target function with changes to the training dataset.
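To make “changes to the estimate with changes to the training dataset” concrete, here is an illustrative sketch (the target function, noise level and polynomial degrees are my assumptions): we refit a model on many fresh noisy samples of the same target and measure how much its prediction at one fixed point varies.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
true = np.sin(np.pi * x)  # the fixed underlying target function

def prediction_variance(deg, x0=0.3, trials=200):
    """Refit a degree-`deg` polynomial on fresh noisy samples of the same
    target and return the variance of its prediction at x0."""
    preds = []
    for _ in range(trials):
        y = true + rng.normal(0, 0.3, size=x.size)  # new training set each time
        preds.append(np.polyval(np.polyfit(x, y, deg), x0))
    return np.var(preds)

var_simple = prediction_variance(deg=1)   # stable across datasets (but biased)
var_complex = prediction_variance(deg=9)  # chases the noise in each dataset
print(f"deg=1 prediction variance: {var_simple:.4f}")
print(f"deg=9 prediction variance: {var_complex:.4f}")
```

The complex model’s prediction at the same point jumps around far more from one training set to the next — that spread is variance.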

Normally, as you increase the complexity of your model, you will see a reduction in error due to lower bias. However, this only happens up to a particular point. As you continue to make your model more complex, you end up over-fitting it, and your model will start suffering from high variance.
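This complexity-versus-error curve can be sketched with a simple experiment (the sine target, noise level and degrees are assumptions for the demo): sweep the polynomial degree and compare training error against error on an independent test draw.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(n=40):
    """Draw n points from a fixed sine target plus Gaussian noise."""
    x = np.linspace(-1, 1, n)
    return x, np.sin(np.pi * x) + rng.normal(0, 0.25, size=n)

x_train, y_train = sample()
x_test, y_test = sample()  # an independent draw from the same target

train_err, test_err = {}, {}
for deg in (1, 3, 7, 12):
    coef = np.polyfit(x_train, y_train, deg)
    train_err[deg] = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_err[deg] = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    print(f"deg={deg:2d}  train MSE={train_err[deg]:.3f}  test MSE={test_err[deg]:.3f}")
# Training error keeps falling as the degree grows, while test error
# drops at first (bias shrinks) and then typically creeps back up
# (variance grows) — the U-shaped curve described above.
```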

Bias–Variance Trade-Off:

The goal of any supervised machine learning algorithm is to achieve low bias and low variance for good prediction performance. However, you cannot reduce both at once.

  • Increasing the bias will decrease the variance.
  • Increasing the variance will decrease the bias.

But the trade-off can be managed in a few algorithms:

  1. The k-nearest neighbors algorithm has low bias and high variance, but this can be changed by increasing the value of k: more neighbors contribute to each prediction, which increases the bias and decreases the variance of the model.
  2. The support vector machine algorithm has low bias and high variance, but this can be changed through the C parameter, which controls how many violations of the margin are allowed in the training data; allowing more margin violations increases the bias and decreases the variance.
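The k-NN knob is easy to see in a small NumPy sketch (the 1-D data and the particular k values are my choices for illustration): with k=1 the model memorizes the training set exactly, and raising k smooths the fit by averaging over more neighbors.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 60))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.size)

def knn_predict(x_train, y_train, x_query, k):
    """Predict each query as the mean target of its k nearest neighbors."""
    return np.array([
        y_train[np.argsort(np.abs(x_train - q))[:k]].mean() for q in x_query
    ])

train_mse = {k: np.mean((knn_predict(x, y, x, k) - y) ** 2) for k in (1, 5, 25)}
for k, err in train_mse.items():
    print(f"k={k:2d}  training MSE={err:.3f}")
# k=1 reproduces the training data exactly (zero training error, high
# variance); larger k averages over more neighbors, smoothing the fit
# and trading variance for bias, exactly as described above.
```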

Like my article? Do give me a clap and share it, as that will boost my confidence. Also, I post new articles every Sunday, so stay connected for future articles in this basics of data science and machine learning series.

Also, do connect with me on LinkedIn.

Photo by Frederick Tubiermont on Unsplash
