Fitting

Sushmita warvate
3 min read · Feb 29, 2024


Whenever a machine learns from data with the help of an algorithm, we are trying to find a function, called the fit function, that captures the relationship between x and y.

Whatever algorithm you choose, fitting it can run into two main problems, called overfitting and underfitting.

Overfitting:

First we divide the data into two parts: D Train and D Test. If the model is trained on D Train and the query points also come from D Train, high accuracy tells us very little, because the model may simply have memorized those points without really learning.

Overfitting is when the model does not make accurate predictions on testing data. When a model is trained on too much detail, it starts learning from the noise and inaccurate entries in the data set, and testing on the test data then results in high variance. The model fails to categorize the data correctly because of too many details and noise. Overfitting is often caused by non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models. Ways to avoid overfitting include using a linear algorithm if the data is linear, or constraining parameters such as the maximal depth if we are using decision trees.

Whenever your train accuracy is very high and your test accuracy is very low, that point is called overfitting. It means your model has learned only the D Train points, not D Test; it has not captured the real relationship between x and y.
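The memorization behaviour described above can be sketched with a toy Python example. The data and the `MemorizerModel` class here are made up purely for illustration:

```python
# A minimal sketch of overfitting: a "model" that simply memorizes D Train.
d_train = {0: 'a', 1: 'b', 2: 'a', 3: 'b'}   # x -> y pairs seen in training
d_test  = {4: 'a', 5: 'b'}                   # unseen query points

class MemorizerModel:
    """Learns nothing about the x-y relationship; just stores exact pairs."""
    def fit(self, data):
        self.table = dict(data)
    def predict(self, x):
        # Falls back to an arbitrary guess for any unseen x.
        return self.table.get(x, 'a')

model = MemorizerModel()
model.fit(d_train)

train_acc = sum(model.predict(x) == y for x, y in d_train.items()) / len(d_train)
test_acc  = sum(model.predict(x) == y for x, y in d_test.items()) / len(d_test)

print(train_acc)  # 1.0 -- perfect on D Train
print(test_acc)   # 0.5 -- much worse on D Test
```

Perfect training accuracy with poor test accuracy is exactly the gap that signals overfitting.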

Underfitting:

When the model is trained on D Train and the query points come from D Test, the model may fail on both; this is underfitting. Underfitting refers to a model that can neither model the training data nor generalize to new data. It is easy to recognize because it leads to high error even on the training set.

A machine learning algorithm is said to underfit when the model is too simple to capture the complexities of the data. It represents the inability of the model to learn the training data effectively, resulting in poor performance on both the training and testing data. In simple terms, an underfit model is inaccurate, especially when applied to new, unseen examples. It mainly happens when we use a very simple model with overly simplified assumptions. To address underfitting, we need to use more complex models, with enhanced feature representation and less regularization.

Whenever your D Train accuracy is low and your D Test accuracy is also low, that point is called underfitting. It means your model learned neither D Train nor D Test.
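A too-simple model can also be sketched in a few lines of Python. Here a hypothetical `ConstantModel` tries to fit data that actually follows y = 2x, so it fails on both D Train and D Test (all names are illustrative):

```python
# A minimal sketch of underfitting: a constant model is too simple for y = 2x.
x_train, y_train = [1, 2, 3, 4], [2, 4, 6, 8]
x_test,  y_test  = [5, 6],       [10, 12]

class ConstantModel:
    """Always predicts the mean of the training targets, ignoring x."""
    def fit(self, ys):
        self.c = sum(ys) / len(ys)
    def predict(self, x):
        return self.c

model = ConstantModel()
model.fit(y_train)   # c = 5.0

def mse(xs, ys):
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

print(mse(x_train, y_train))  # 5.0  -- high error even on D Train
print(mse(x_test, y_test))    # 37.0 -- and worse on D Test
```

High error on the training data itself is the telltale sign of underfitting.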

Best fit:

When your training accuracy is high and your test accuracy is also high, that point is called the best fit function.

A good fit is the sweet spot between underfitting and overfitting. This is the model we aim for.

As far as a machine learning algorithm is concerned, a good fit is when both the training error and the test error are minimal. As the algorithm learns, the model's error on the training data decreases over time, and so does the error on the test dataset.

Train Error:

Whenever a query point comes from D Train, the error we measure is called the training error. The training error is defined as the average loss that occurred during the training process. It is given by:

Train Error = (1/m_t) · Σ (y_i − ŷ_i)²

Here, m_t is the size of the training set and the loss function is the square of the difference between the actual output and the predicted output.
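The average-squared-loss definition above translates directly into code. The small `y_actual`/`y_predicted` arrays below are made-up example values:

```python
# Training error as the average squared loss over D Train:
# (1/m_t) * sum((y_i - y_hat_i)^2), per the formula above.
y_actual    = [3.0, 5.0, 7.0]   # true outputs for the training points
y_predicted = [2.5, 5.5, 7.0]   # model predictions on the same points

m_t = len(y_actual)             # size of the training set
train_error = sum((y - y_hat) ** 2
                  for y, y_hat in zip(y_actual, y_predicted)) / m_t
print(train_error)  # 0.1666... = (0.25 + 0.25 + 0.0) / 3
```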

Test Error:

When the model is trained on D Train and the query points come from D Test, the error we get is called the test error.

The test error decreases as model complexity increases up to a certain point, and then starts increasing. In the figure, comparing model 1 and model 2, model 1 is definitely better, because the test error of model 2 is very high compared to model 1.
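This fall-then-rise of the test error can be sketched by comparing three toy models of increasing complexity on noisy data that roughly follows y = 2x. The models and data below are illustrative, not from the article:

```python
# Sketch of test error vs. model complexity on noisy data y ≈ 2x.
x_train, y_train = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
x_test,  y_test  = [5, 6],       [10.1, 11.9]

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Too simple: always predict the training mean (underfits).
mean_y = sum(y_train) / len(y_train)
err_constant = mse([mean_y] * len(x_test), y_test)

# Right complexity: ordinary least-squares line.
xbar = sum(x_train) / len(x_train)
slope = (sum((x - xbar) * (y - mean_y) for x, y in zip(x_train, y_train))
         / sum((x - xbar) ** 2 for x in x_train))
intercept = mean_y - slope * xbar
err_linear = mse([slope * x + intercept for x in x_test], y_test)

# Too complex for this data: 1-nearest-neighbor memorizes D Train (overfits).
def nn_predict(x):
    return min(zip(x_train, y_train), key=lambda p: abs(p[0] - x))[1]
err_nn = mse([nn_predict(x) for x in x_test], y_test)

# Test error falls, then rises again, as complexity grows past the sweet spot.
print(err_constant, err_linear, err_nn)
```

Running this, the linear model (the "good fit") has the lowest test error, while both the too-simple and the too-complex models do worse, tracing out the U-shape described above.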

