Machine Learning — Bias-Variance Tradeoff #4

Ufuk Çolak · Published in Nerd For Tech · May 19, 2021

Hello everyone! In my previous article, I examined the methods we use to evaluate the success of machine learning models. In this article, we will look at the relationship between bias and variance.

Bias-Variance Tradeoff

While creating models, we aim for the model to predict as accurately as possible on new data. We try to achieve this by balancing variance and bias. At the point where we achieve this balance, we obtain a good-fit model.

We talked about overfitting in the previous article. What is overfitting? We fit our model on the training portion of the data set that we split into training and test sets. If we over-optimize the model, it learns the training data too well, essentially memorizing it. Then, when we ask it to make predictions on new data it has never seen, its prediction performance drops. That is not acceptable either. To avoid it, we need to examine the model's errors.

When evaluating the success of a model, we work with two types of error.

  • Training Error: the error measured on the data set used to build the model.
  • Test Error: the error measured on the data set used to test the fitted model.

There is typically a gap between the training error and the test error; a minimal sketch of measuring both is given below.
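The sketch below is illustrative only: it assumes a scikit-learn setup and uses a synthetic regression data set in place of real data. It splits the data into training and test sets, fits a simple model, and reports the two errors separately.

```python
# Minimal sketch: measuring training error vs. test error (synthetic data).
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data stands in for the real data set.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Training error: error on the data the model was built with.
train_mse = mean_squared_error(y_train, model.predict(X_train))
# Test error: error on data the model has never seen.
test_mse = mean_squared_error(y_test, model.predict(X_test))

print(f"Training MSE: {train_mse:.2f}")
print(f"Test MSE:     {test_mse:.2f}")
```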

To see whether the model we have fitted is a good fit, we will use the image below, which you will encounter frequently.

Before examining the image, let's look at the concepts that will help us interpret it. We often try to make the fitted model more flexible. So what does flexibility mean? It is the ability of the model to follow the functional structure of the data. We can also think of flexibility in terms of variance/variability.

So what does high variance mean? It means the model represents the structure within the data set very flexibly, following it closely.

When we examine the underfitted and overfitted models in the graphic, the most flexible with respect to the structure of the data is the overfitted model, because it has matched itself to the pattern of the data. This shows that it is a function with high variance. It represents the data set very well; it has learned the structures in the data set almost perfectly.

Another concept, bias, expresses the difference between the actual and predicted values. Accordingly, the underfitted model is biased. Why? Because it could not represent all the points of the data; in other words, the model is not successful enough at predicting the actual values. We can say that underfitted models are unable to capture the underlying pattern of the data. These models have high bias and low variance: they may represent some observations well, but not others.

The desirable model is the one with low bias and low variance. High flexibility, that is, capturing the structure of the data set extremely well, is not always what we want. Generally, as the flexibility of the prediction function increases (toward overfitting), that is, as its ability to represent the structure of the data set increases, its bias decreases. However, the generalizability of the function also decreases.
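To make this concrete, here is a sketch of the flexibility idea, assuming scikit-learn and a small synthetic data set (both are my illustrative assumptions, not from the article): polynomials of increasing degree are fitted to noisy data, where a low degree underfits (high bias) and a very high degree overfits (high variance).

```python
# Sketch: underfitting vs. overfitting as polynomial degree (flexibility) grows.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 80)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=80)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

for degree in [1, 4, 15]:  # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

With settings like these, the degree-1 fit typically shows high error on both sets (high bias), while the degree-15 fit drives the training error down but lets the test error climb (high variance).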

How do we choose the good-fit model, the one with low bias and low variance?

We choose it based on the test error. In other words, if the average test error is high because the model is too rigid, we prefer a more flexible model. As the model becomes more flexible, variance increases and bias decreases. But if this does not stop at a certain point, increasing flexibility keeps increasing the variance, and the model ends up memorizing the training data.

In short, this situation is called the bias-variance trade-off.

Training vs Test Error

In the graph, the X-axis represents model complexity and the Y-axis represents prediction error. As we train the fitted model, the training error starts to fall. Since we are optimizing the model, it is normal for its error to drop, but at the same time the model complexity increases.

A decrease in the error on the test set also indicates that we are making successful predictions. However, at the point where the test error starts to increase, the model has learned the training set very well but can no longer explain the test set. As model complexity increases and the training error decreases (suggesting growing success), we see that after a certain point the test error stops decreasing the way the training error does. In short, we have to make a decision at the point where the training error keeps falling but the test error starts to rise. Here the learning curve visual helps us decide where to stop; we try to find the optimum point.
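As one way to reproduce this picture in code, here is a sketch assuming scikit-learn, a synthetic data set, and a decision tree's depth as the complexity axis (all of these are illustrative assumptions, not the article's setup):

```python
# Sketch: training vs. test error as model complexity (tree depth) grows.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=1)

depths = np.arange(1, 15)
train_scores, test_scores = validation_curve(
    DecisionTreeRegressor(random_state=1), X, y,
    param_name="max_depth", param_range=depths,
    scoring="neg_mean_squared_error", cv=5,
)

# Convert negative MSE scores back to positive errors, averaged over folds.
train_err = -train_scores.mean(axis=1)
test_err = -test_scores.mean(axis=1)

for d, tr, te in zip(depths, train_err, test_err):
    print(f"max_depth={d:2d}  train MSE={tr:10.1f}  test MSE={te:10.1f}")

# Training error keeps falling with depth, while test error stops improving
# and starts to rise past the optimum.
print("best depth by test error:", depths[test_err.argmin()])
```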

Model Tuning

We evaluate the models we build with various machine learning algorithms using the success criteria mentioned in the last article (such as MSE, RMSE, and accuracy rate), and we know the error estimation approaches (such as cross-validation) for doing so. We also now know another critical issue that affects model performance when fitting a model: the bias-variance trade-off (overfitting or underfitting). Finally, we will try to optimize the predictive performance of our models.
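For reference, here is a minimal sketch of such an evaluation, assuming scikit-learn, a synthetic regression data set, and a ridge regression model (all illustrative assumptions): 10-fold cross-validation scored with MSE and converted to RMSE.

```python
# Sketch: evaluating a model with cross-validation and RMSE.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=15.0, random_state=3)

# 10-fold cross-validation; scikit-learn reports MSE as a negative score.
neg_mse = cross_val_score(Ridge(alpha=1.0), X, y,
                          cv=10, scoring="neg_mean_squared_error")
rmse_scores = np.sqrt(-neg_mse)

print("RMSE per fold:", np.round(rmse_scores, 2))
print("Mean RMSE:    ", np.round(rmse_scores.mean(), 2))
```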

Model Parameters: These are the parameters internal to the model whose values are learned during the training phase, for example, the beta coefficients in the formula of regression models.
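A small sketch of what "learned during training" means in practice, assuming scikit-learn and illustrative synthetic data: after fitting a linear regression, the beta coefficients and the intercept can be read off the fitted model.

```python
# Sketch: model parameters (beta coefficients) estimated during training.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=7)

model = LinearRegression().fit(X, y)

# These values were not set by us; they were estimated from the training data.
print("beta coefficients:", model.coef_)
print("intercept:        ", model.intercept_)
```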

Model Hyperparameters: These are parameters that cannot be learned from the data; they are set by the user and then tuned using the data. In other words, they are values that must be specified outside of the training procedure and adjusted to obtain a model with optimum performance, for example, the learning rate in gradient boosting methods or the K value in k-nearest neighbors.
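As an example of tuning such a hyperparameter, here is a sketch, assuming scikit-learn and a synthetic data set, that searches a grid of K values for k-nearest neighbors with cross-validation; the grid itself is an arbitrary illustration.

```python
# Sketch: tuning the K hyperparameter of k-nearest neighbors with GridSearchCV.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=5)

# K cannot be learned from the data directly; we search over candidate values.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 15, 25]}

search = GridSearchCV(KNeighborsRegressor(), param_grid,
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

print("best K:", search.best_params_["n_neighbors"])
print("best CV MSE:", -search.best_score_)
```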

Our goal is always to predict with as little error as possible. For example, when predicting a house price, we want to estimate a value of 600K as something like 605K. The dozens of machine learning algorithms we use for this purpose will each have their internal model parameters and external model hyperparameters.

See you in the next article, on the data processing steps.
