Understanding Bias and Variance

Deepika Rana
Udacity-Bertelsmann AI Track Scholars
5 min read · Feb 11, 2020

Applied deep learning is a highly iterative process: a cycle is repeated many times to find a good choice of network for your application. One of the things that determines how quickly progress is made is how efficiently you can go around this cycle, and splitting the data into train, development, and test sets can increase that efficiency.
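
As a minimal sketch of that split (assuming a hypothetical feature matrix X and label vector y, and using scikit-learn's train_test_split), the data could be divided into train, dev, and test sets roughly like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10,000 examples with 20 features and binary labels
X = np.random.rand(10000, 20)
y = np.random.randint(0, 2, size=10000)

# First carve off 20% as a holdout, then split the holdout evenly into dev and test
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.2, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X_hold, y_hold, test_size=0.5, random_state=0)

print(len(X_train), len(X_dev), len(X_test))  # 8000 1000 1000
```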

Suppose we have a data set as shown in the figure. If a straight line is fit to the data, say a logistic regression, it is not a very good fit: this is a classifier with high bias, and it is underfitting the data, as shown in figure (a). At the opposite end, if an incredibly complex classifier, such as a deep neural network with many hidden units, is fit to the data, it may fit the training points perfectly, but that does not look like a great fit either: this is a classifier with high variance, and it is overfitting the data, as shown in figure (c). Somewhere in between there is a classifier with a medium level of complexity that fits the data reasonably; that one looks just right, as shown in figure (b).
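
To make the underfit / just right / overfit distinction concrete, here is a small sketch on hypothetical noisy 1-D data, fitting polynomial regressions of increasing degree with scikit-learn; degree 1 behaves like the high-bias straight line, a moderate degree is roughly right, and a very high degree overfits the noise:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)  # noisy target

for degree in (1, 4, 15):  # underfit, reasonable, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    # Training R^2 keeps rising with complexity, but very high degrees
    # mostly memorize the noise rather than the underlying curve.
    print(degree, round(model.score(X, y), 3))
```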

Consider an example of dog picture classification, where a dog is a positive example and a cat is a negative example. The two key numbers to look at to understand bias and variance are the train set error and the dev (development) set error.

For the sake of argument, assume that recognizing dogs in pictures is something the human eye can do nearly perfectly.

Hypothetically, say the training set error is 1% and the dev set error is 11%. In this example, the algorithm performs well on the training set but relatively poorly on the dev set. It is overfitting the training set and not generalizing well to the held-out dev (cross-validation) set, which means it has high variance. Hence, by looking at the training set error and the dev set error, one can diagnose that the algorithm has high variance.

Now say a different algorithm has a 14% training set error and a 15% dev set error. Assuming humans achieve roughly 0% error (they can look at these pictures and just tell whether it's a dog or not), the algorithm is not doing well even on the training set. It is not fitting the training data well, so it is underfitting, which means it has high bias. In contrast, it generalizes at a reasonable level to the dev set, since dev set performance is only 1% worse than training set performance; but the high bias problem remains.

In another example, there is a 14% training set error, which is pretty high bias, but the dev set error is even worse, around 30%. This algorithm has high bias, because it is not doing well on the training set, and high variance, so it has the worst of both worlds.

Finally, if there is a 0.5% training set error and a 1% dev set error, then we have a dog classifier with only 1% error, which means the algorithm has low bias and low variance.

This analysis is predicated on the assumption that human-level performance gets nearly 0% error or, more generally, that the optimal error, sometimes called the Bayes error, is nearly 0%.
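
The same diagnostic logic can be written down as a small sketch, with hypothetical error values (expressed as fractions), an assumed Bayes error of roughly 0%, and an arbitrary threshold for what counts as a large gap:

```python
def diagnose(train_error, dev_error, bayes_error=0.0, gap=0.05):
    """Rough bias/variance diagnosis from train and dev set errors."""
    high_bias = (train_error - bayes_error) > gap     # not fitting the training set
    high_variance = (dev_error - train_error) > gap   # not generalizing to the dev set
    return high_bias, high_variance

# The four cases from the text: high variance; high bias;
# high bias and high variance; low bias and low variance.
for train_err, dev_err in [(0.01, 0.11), (0.14, 0.15), (0.14, 0.30), (0.005, 0.01)]:
    print(train_err, dev_err, diagnose(train_err, dev_err))
```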

Looking at the training set error and the dev set error helps in diagnosing whether the algorithm has a bias problem, a variance problem, or both. It turns out that this information helps in improving the algorithm's performance much more systematically.

When training a neural network, the path to be followed is:

After training an initial model, the first thing to ask is: does the algorithm have high bias?

If it does have high bias and does not even fit the training set well, some things to try are picking a bigger network, with more hidden layers or more hidden units, or training it for longer. This may work, or it may not. There are also many different neural network architectures, and a new architecture better suited to the problem may help; maybe it works, maybe not. Getting a bigger network almost always helps, whereas training longer doesn't always help. So when training a learning algorithm, try these things to get rid of the bias problem, and keep going back until the model fits the training set at least reasonably well. Usually, a big enough network can fit the training data, although if the images are very blurry it may be impossible. But if a human can do well on the task, that is, if the Bayes error is not too high, then training a big enough network should let the algorithm perform well at least on the training set, enough to fit or even overfit it.
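
As a sketch of the "bigger network" remedy (assuming Keras and a hypothetical 20-feature binary classification task; the layer and unit counts are arbitrary), capacity can be scaled up like this:

```python
import tensorflow as tf

def build_model(hidden_layers=2, hidden_units=64):
    """Hypothetical binary classifier; more layers/units means more capacity."""
    layers = [tf.keras.Input(shape=(20,))]
    layers += [tf.keras.layers.Dense(hidden_units, activation="relu")
               for _ in range(hidden_layers)]
    layers += [tf.keras.layers.Dense(1, activation="sigmoid")]
    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

small = build_model(hidden_layers=2, hidden_units=16)    # may underfit (high bias)
large = build_model(hidden_layers=4, hidden_units=128)   # more capacity to fit the training set
# large.fit(X_train, y_train, epochs=50)  # training longer (more epochs) is the other lever
```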

Once the bias is reduced to an acceptable amount, the next question is: does the algorithm have a variance problem?

To answer that, evaluate the performance on the dev set. If there is high variance, the best way to solve the problem is to get more data, but sometimes more data is not available. Regularization can also be tried to reduce overfitting. Finding a more appropriate neural network architecture can sometimes reduce the variance as well as the bias, though it is harder to be systematic about this. So try these things and keep going back until there is an algorithm with both low bias and low variance, at which point you are done.
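
And as a sketch of the regularization remedy (same hypothetical Keras setup as above), L2 weight decay and dropout can be added to curb overfitting when more data is not available:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),  # L2 penalty on the weights
    layers.Dropout(0.5),                                     # randomly drop units during training
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# Passing validation_data=(X_dev, y_dev) to model.fit makes it easy to watch
# whether the train/dev error gap actually narrows as regularization is added.
```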

This gives the basic structure of how to organize a machine learning problem to diagnose bias and variance, and then to select the right operation to make progress on the problem.

References:

Coursera (deeplearning.ai)

Udacity
