Grasping the main concept of overfitting and underfitting:

Solving machine learning problems

Alamin Musa Magaga
Analytics Vidhya
3 min read · Jul 23, 2020


Almost everyone practicing machine learning has come across overfitting and underfitting, and many of us have been stuck trying to figure out how to get a good fit, or a good machine learning model. If you are one of them, congratulations: this post gives you a short, precise, straight-to-the-point explanation of underfitting, overfitting, and a good fit.

What is underfitting?

The word under clearly suggests something lesser: below the standard, requirement, or expectation; not enough.

When a machine learning model performs poorly even on the training data, this is called underfitting.

  • Underfitting is when the model exhibits low variance and high bias
  • Underfitting may be caused by too small a dataset or too small a neural network
  • When the model performs poorly on both the training and the test dataset, it is underfitting

In underfitting, the machine learning model does not have enough variables and parameters to capture the underlying pattern, and a more advanced model will perform better.

Solutions to avoid underfitting

  • Use a more robust, more advanced model with more parameters and variables
  • Use a larger dataset and a larger neural network (note: too large a network may lead to overfitting)
  • Increase the complexity of the model
  • Increase the training duration (number of epochs)
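As a minimal sketch on made-up toy data (not from the article): a straight line fitted to a quadratic relationship underfits, while increasing the model's complexity to a degree-2 polynomial captures the pattern and drives the error down:

```python
import numpy as np

# Hypothetical toy data: a quadratic relationship a straight line cannot capture.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x**2 + rng.normal(scale=0.3, size=x.shape)

# Degree-1 fit (too simple -> underfits) vs. degree-2 fit (matches the data).
for degree in (1, 2):
    coeffs = np.polyfit(x, y, degree)
    preds = np.polyval(coeffs, x)
    mse = np.mean((y - preds) ** 2)
    print(f"degree={degree}  training MSE={mse:.3f}")
```

The degree-2 error settles near the noise level, while the linear model's error stays large no matter how long you train it: the fix for underfitting here is more model capacity, not more epochs.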

What is overfitting?

During training, the loss initially decreases on both the training and validation data. After some time, the training loss keeps decreasing while the validation loss begins to increase; when this happens, the model is overfitting.

In overfitting, the model memorizes the answers in the training data and does not generalize to the test data.

  • Overfitting happens when the machine learning model performs very well on the training data but poorly on the test data
  • In overfitting, the model exhibits high variance and low bias

In overfitting, the model becomes too specialized in solving the training data and performs worse when validated on the test data.
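A quick sketch of this memorization effect, on hypothetical data of my own (not from the article): a degree-9 polynomial pushed through only 10 training points fits them almost exactly, yet does badly on held-out points, while a simple line generalizes:

```python
import numpy as np

# Hypothetical noisy linear data, split into train/test halves.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + rng.normal(scale=0.2, size=x.shape)
x_train, y_train = x[::2], y[::2]   # 10 points for training
x_test, y_test = x[1::2], y[1::2]   # 10 points held out

def train_test_mse(degree):
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)
    test = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
    return train, test

# Degree 9 can interpolate all 10 training points (memorization), so the
# training error collapses while the held-out error grows.
for degree in (1, 9):
    tr, te = train_test_mse(degree)
    print(f"degree={degree}  train MSE={tr:.5f}  test MSE={te:.5f}")
```

This is exactly the "low training error, high testing error" signature of high variance and low bias described above.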

Solutions to avoid overfitting

  • Apply regularization techniques such as L1 and L2 regularization (also called lasso and ridge), which are commonly used.
  • Use early stopping: halt training once the validation loss stops improving.
  • Reduce the size of the neural network.
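To make L2 (ridge) regularization concrete, here is a minimal numpy sketch on invented data (the toy dataset, the polynomial degree, and the alpha value are my own illustrative choices): ridge adds alpha times the identity to the normal equations, which shrinks the coefficients compared with the unregularized least-squares fit:

```python
import numpy as np

# Hypothetical noisy samples of a smooth function.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

def ridge_fit(x_tr, y_tr, degree, alpha):
    """Closed-form ridge on polynomial features:
    w = (X^T X + alpha * I)^-1 X^T y   (alpha=0 gives plain least squares)."""
    X = np.vander(x_tr, degree + 1)
    return np.linalg.solve(X.T @ X + alpha * np.eye(degree + 1), X.T @ y_tr)

degree = 9
w_plain = ridge_fit(x_train, y_train, degree, alpha=0.0)
w_ridge = ridge_fit(x_train, y_train, degree, alpha=1e-3)

X_test = np.vander(x_test, degree + 1)
for name, w in (("alpha=0", w_plain), ("alpha=1e-3", w_ridge)):
    mse = np.mean((y_test - X_test @ w) ** 2)
    print(f"{name:>10}  ||w|| = {np.linalg.norm(w):10.2f}  test MSE = {mse:.4f}")
```

The regularized solution has a much smaller coefficient norm, which is precisely how the penalty keeps a high-capacity model from chasing noise in the training set.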

Understanding good fit

A good fit is the target, the result everyone expects from his or her machine learning model.

Learning curves are widely used as a display, analysis, and diagnostic tool in machine learning: they plot performance on the training and validation data as training progresses.

The learning curve shows whether the machine learning model is underfit, overfit, or a good fit.
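As an illustration (the data and model here are hypothetical), a learning curve can be built by hand: train on growing slices of the data and track training versus validation error. For a well-specified model, the two curves converge to a low value, which is the "good fit" signature; a persistent gap would suggest overfitting, and two high converged curves would suggest underfitting:

```python
import numpy as np

# Hypothetical data with a quadratic pattern plus noise.
rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, 200)
y = x**2 + rng.normal(scale=0.3, size=x.shape)
x_train, y_train = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

# Fit a degree-2 polynomial on growing training slices and record both errors.
for n in (10, 50, 150):
    coeffs = np.polyfit(x_train[:n], y_train[:n], 2)
    tr = np.mean((y_train[:n] - np.polyval(coeffs, x_train[:n])) ** 2)
    va = np.mean((y_val - np.polyval(coeffs, x_val)) ** 2)
    print(f"n={n:>3}  train MSE={tr:.3f}  val MSE={va:.3f}")
```

Plotting these pairs against n gives the familiar learning-curve picture: both errors end up near the noise floor, so this model is a good fit.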

Conclusion

From this short write-up, you now understand that an underfit model has high training and testing error, while an overfit model has low training error but high testing error. Among the recommended techniques for reducing overfitting are reducing the complexity of the model and regularization (lasso and ridge); for underfitting, increasing the number of parameters, the size of the dataset, and the complexity of the model will prove effective in tackling it.


Data Scientist | Developer | Embedded System Engineer | Zindi Ambassador | Omdena Kano Lead | Youth Opportunities Ambassador | CTO YandyTech