Lightbulb Moment: Bias vs. Variance in Statistics
My only formal statistics background was an introductory course at my university that blazed through material faster than most students could truly learn the concepts. So, as I have been trudging along on my journey to learn more about machine learning, I have come across plenty of statistical concepts I only half remembered, as I'm sure anyone else on the same learning path has too.
So, when I saw an article titled WTF is the Bias-Variance Tradeoff (Infographic)?, I thought “finally — something more my style.” If you are a visual learner [and interested in the bias-variance tradeoff, as strange as it might sound], I highly recommend checking out the article I linked above.

If your elementary science classes were anything like mine, you heard the concept of accuracy vs. precision explained at least once a year for about eight years. If you're not familiar with it, it describes exactly what Figure 1 depicts, just with different vernacular. So, when I saw the familiar dartboard images, the lightbulb went on for me.
The relation to machine learning stems from the fact that every model faces a tradeoff between bias and variance: think linear regression, neural networks, clustering algorithms.
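For what it's worth, the textbook way to see this (at least for squared-error loss) is the decomposition: expected error = bias² + variance + irreducible noise. For a fixed amount of data, pushing one of the first two terms down tends to nudge the other up, and that tug-of-war is the tradeoff.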
High-bias models are ones that can be very precise (low variance) while missing out on a certain level of accuracy (that's the high bias). This is because the models are not sensitive enough to the data. Soooo, what does that actually mean… how can a model lack sensitivity? Well, algorithms like linear regression and naive Bayes tend toward the less complex end of the machine-learning-algorithm-complexity scale (err, if there were one). An algorithm like linear regression will fail to capture any curvature in the data because its underlying assumption is that it's operating on linearly related data!
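To make that concrete, here's a minimal sketch (my own toy example with synthetic quadratic data and scikit-learn, not anything from the article): a straight-line model fit to curved data simply can't bend to meet it.

```python
# Toy illustration (synthetic data): fit plain linear regression to data whose
# true relationship is quadratic, and watch the straight line miss the curvature.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(100, 1)), axis=0)
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=100)  # quadratic truth plus noise

model = LinearRegression().fit(X, y)
print("Training MSE:", mean_squared_error(y, model.predict(X)))
# The error stays high no matter how much data you add; that systematic miss is the bias.
```

No amount of extra data fixes that; you have to give the model more flexibility (a polynomial feature, a different algorithm) to buy the bias down.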
High-variance models are ones that can be quite accurate (low bias) but lack the necessary precision (high variance). These types of models can be too sensitive to the noise in the data and tend to overfit the training data. For these, think along the lines of random forests and nearest neighbors, much more complex than the run-of-the-mill linear regression algorithm. Heck, even the names sound more intimidating! These algorithms would sit on the more complex end of that magical machine-learning-algorithm-complexity scale.
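Again, a quick hand-rolled sketch (synthetic data, scikit-learn, and my own choice of a 1-nearest-neighbor regressor as the poster child; the article doesn't use this example): the model nails the training set and then stumbles on data it hasn't seen.

```python
# Toy illustration (synthetic data): a 1-nearest-neighbor regressor memorizes the
# training set (near-zero training error) but does noticeably worse on held-out data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsRegressor(n_neighbors=1).fit(X_train, y_train)
print("Train MSE:", mean_squared_error(y_train, knn.predict(X_train)))  # essentially zero
print("Test MSE:", mean_squared_error(y_test, knn.predict(X_test)))     # the noise it memorized
```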
Soooo, now that we know these phenomena exist, how do we account for them? Well, truth be told, I don't really know. It depends on the problem, the data, and the data scientist ;) However, there are a number of actions you can build into a workflow to reduce both bias and variance, which together make up the total error. Think about matching the most appropriate algorithm to the problem, feature engineering, cross-validation, and tuning model hyperparameters (a sketch of that last idea is below). Unfortunately for you and me, there is no one-size-fits-all answer, but then again, that would take the fun out of it.
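As one example of what that looks like in practice, here's a minimal cross-validation sketch (again my own synthetic data and my own choice of k-nearest-neighbors; the article doesn't prescribe this): let the held-out folds pick the value of k that balances memorizing noise against oversmoothing.

```python
# Toy sketch (synthetic data): use 5-fold cross-validation to pick k for k-NN.
# Small k -> low bias, high variance; large k -> higher bias, lower variance.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)

search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": list(range(1, 31))},
    cv=5,                                   # 5-fold cross-validation
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("Best k according to cross-validation:", search.best_params_["n_neighbors"])
```

The chosen k usually lands somewhere in the middle of the grid, which is the whole point: the sweet spot is rarely at either extreme.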
Cheers!
