Overfitting and Underfitting in Machine Learning

Anju Rajbangshi
Published in Analytics Vidhya
5 min read, Apr 22, 2020


In this article, we look at two of the most discussed and most important concepts in machine learning, both of which relate to the performance of a model.

How do we know whether a model is performing well? Which model should we choose?

Generally, we try out different models on the training dataset and, based on a few of the points below, get an idea of which model to choose:

· Low delta value: The delta value is the difference between the model's accuracy on the training dataset and its accuracy on the test dataset.

For example, I applied linear regression and a decision tree algorithm to the training dataset of a classification problem.

From the table above, we can see that the delta value for the decision tree (5%) is lower than the delta value for linear regression (20%); hence the decision tree would perform best in this scenario.

Note: All else being equal, the lower the delta value, the better the model generalizes from the training data to unseen data.

· How accurate its predictions are.

· How well it performs on an independent test dataset or on unseen data.

We expect the model to learn from the training set and then maintain good accuracy even when it is fed independent data it has not seen before.
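To make the delta value concrete, here is a minimal Python sketch of how it could be computed. The helper names `accuracy` and `delta_value` are hypothetical, the labels and predictions are made up purely for illustration, and treating the delta as an absolute difference is an assumption on my part; none of this reproduces the numbers from the example earlier in the article.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def delta_value(train_accuracy, test_accuracy):
    """Gap between train and test accuracy (assumed here to be absolute)."""
    return abs(train_accuracy - test_accuracy)


# Hypothetical labels and predictions, invented for this illustration only.
y_train_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_train_pred = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # 9 of 10 correct
y_test_true = [1, 0, 0, 1, 1]
y_test_pred = [1, 0, 1, 1, 0]                  # 3 of 5 correct

train_acc = accuracy(y_train_true, y_train_pred)
test_acc = accuracy(y_test_true, y_test_pred)
delta = delta_value(train_acc, test_acc)

print(f"train accuracy: {train_acc:.0%}")
print(f"test accuracy:  {test_acc:.0%}")
print(f"delta value:    {delta:.0%}")
```

In this toy case the model scores much higher on the data it was trained on than on the held-out data, so the delta value is large. When comparing real models, the same two helpers would be applied to each model's predictions on the train and test splits.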

What are underfitting and overfitting, and how are they related to machine learning algorithms?
