Machine Learning Crash Course: Part 4 — The Bias-Variance Dilemma


Source: Brian Stacey, Fukushima: The Failure of Predictive Models
Notice there’s no kink this time, so the line isn’t as steeply sloped on the right.


While the linear model (the blue line) follows the data (the purple X’s), it misses the underlying curving trend of the data. A quadratic model would have been better.
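To make the underfitting concrete, here is a minimal sketch that fits both a line and a quadratic to the same curved data and compares their errors. Everything in it is an assumption for illustration: the data is made up (exactly y = x^2, with no noise), and `fit_polynomial`, `predict`, and `mean_squared_error` are hypothetical helpers written for this example, not code from the original post.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) c = A^T y.

    Returns coefficients [c0, c1, ..., c_degree] for c0 + c1*x + c2*x^2 + ...
    """
    n = degree + 1
    # Build the augmented matrix [A^T A | A^T y].
    rows = []
    for i in range(n):
        row = [sum(x ** (i + j) for x in xs) for j in range(n)]
        row.append(sum(y * x ** i for x, y in zip(xs, ys)))
        rows.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def predict(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def mean_squared_error(coeffs, xs, ys):
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data with a curving trend: y = x^2 on x = -3..3 (an assumption).
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]

linear = fit_polynomial(xs, ys, degree=1)
quadratic = fit_polynomial(xs, ys, degree=2)

linear_mse = mean_squared_error(linear, xs, ys)
quadratic_mse = mean_squared_error(quadratic, xs, ys)
print("linear MSE:", linear_mse)
print("quadratic MSE:", quadratic_mse)
```

On this toy data the best possible line is flat, so no amount of extra data would help it; the model family itself is too simple. The quadratic fit drives the error to essentially zero.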

The Bias-Variance Dilemma

Our example of underfitting from above. The blue line is our model, and the purple X’s are the data that we are trying to predict.
Approximating data using a very complex model. Notice how the line tends to over-interpolate between points. Even though there is a general upward trend in the data, the model predicts huge oscillations.
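The oscillation effect is easy to reproduce. The sketch below uses Lagrange interpolation as a stand-in for "a very complex model": it builds the one polynomial that passes exactly through every training point. The post does not say which model produced the figure, so this choice, the made-up data, and the helper name `lagrange_predict` are all assumptions.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate at x the unique degree-(n-1) polynomial through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Made-up data (an assumption): upward trend y = x plus alternating +/-0.5 "noise".
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.5, 0.5, 2.5, 2.5, 4.5, 4.5, 6.5]

# The model reproduces every training point, so the training error is zero...
train_error = max(abs(lagrange_predict(xs, ys, x) - y) for x, y in zip(xs, ys))

# ...but between the first two points, which both have y = 0.5,
# the prediction dips far below anything seen in the data.
between = lagrange_predict(xs, ys, 0.5)

print("max training error:", train_error)
print("prediction at x = 0.5:", between)
```

A perfect score on the training points says nothing about the spaces between them. Here the prediction halfway between two points that share the same value lands below every value in the data set, even though the overall trend is upward.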
An example of high “variance” in everyday life from xkcd. ‘Girls sucking at math’ in this case is an overgeneralization based on only a single data point.

Explaining the Dilemma


What we end up measuring in the real world is our “perfect” function plus noise.
This noise manifests as disorganized data points. What we want to do is recover the perfect function (in green) from these data points.
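In symbols, one common way to write this setup is shown below. The notation is an assumption introduced here rather than taken from the post: f is the perfect function, \hat{f} is the model we fit to a randomly drawn training set, and \varepsilon is zero-mean noise that is independent of that training set.

```latex
y = f(x) + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \quad \operatorname{Var}(\varepsilon) = \sigma^2

\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Under those assumptions, the expected squared error at a point splits into three parts. A model that is too simple pays mostly through the bias term, a model that is too flexible pays mostly through the variance term, and no model can remove the noise term.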

Resolving the Dilemma

Cross-validation with k=3. We divide the entire training set into three parts, then train our model three times. Each time, a different part is held out as the validation set and the remaining two parts make up our actual training set.
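The splitting procedure in the caption can be sketched in a few lines of plain Python. The helper names `k_fold_splits` and `cross_validate`, the toy data, and the "always predict the training mean" model are all assumptions made for illustration; they are not from the original post.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validate(xs, ys, k, fit, error):
    """Average the validation error of `fit` over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(error(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)


# Hypothetical baseline model: ignore x and always predict the training mean.
def fit_mean(train_xs, train_ys):
    return sum(train_ys) / len(train_ys)


def mse_of_mean(model, val_xs, val_ys):
    return sum((model - y) ** 2 for y in val_ys) / len(val_ys)


# Toy data (an assumption): nine points on the line y = x + 1.
xs = [float(i) for i in range(9)]
ys = [x + 1.0 for x in xs]

folds = list(k_fold_splits(len(xs), 3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)

score = cross_validate(xs, ys, 3, fit_mean, mse_of_mean)
print("3-fold CV error:", score)
```

Every point is used for validation exactly once and for training exactly twice. To compare candidate models, you would run `cross_validate` once per candidate and keep the one with the lowest average validation error. This sketch keeps the data in its original order for readability; in practice you would usually shuffle it before splitting so that each fold looks like the whole data set.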


Machine Learning @ Berkeley

Written by

A student-run organization at UC Berkeley working on ML applications in industry, academic research, and making ML education more accessible to all
