Training, validation, and test set in Machine Learning

Why do we need the three of them?

Published in Analytics Vidhya, Aug 26, 2020

If we think about what a Machine Learning model does, we can see that its main job is to find the rules governing the relationship between input and output. Once those rules are found, the idea is to apply them to new data and make predictions about the corresponding output.
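To make this concrete, here is a minimal sketch of our own, not taken from the original article. The "rule" is a straight line y = slope * x + intercept learned by ordinary least squares from a few input/output pairs, and it is then applied to an input the model has never seen. The data and the helper names fit_line and predict are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Learn slope and intercept of y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the least-squares slope.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: Tuple[float, float], x: float) -> float:
    """Apply the learned rule to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data generated by the rule y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)
print(model)                 # (2.0, 1.0)
print(predict(model, 10.0))  # 21.0
```

Here the learned rule is exact because the toy data is noise-free; with real data the fitted rule only approximates the true relationship, which is why how well it generalizes has to be measured.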

Hence, since making predictions is the final goal of an ML algorithm, it is pivotal for the model to generalize properly rather than being too closely adapted to the data it was trained on.

In this article, we are going to examine the different options you have for splitting your data whenever you train an ML model.

Train and evaluate the model on the whole dataset

Needless to say, this first approach will lead to a biased result. If we evaluate the model on the very same dataset it was trained on, we will probably face the curse of overfitting, which happens whenever the model is too closely adapted to the training data, and we will have no way to detect it. In the evaluation phase, we will probably get a very high score, yet it is not the score we are looking for: it is probably derived from the fact that the algorithm learnt…
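The bias described above is easy to reproduce. The sketch below is our own illustration, not the author's code: it uses a 1-nearest-neighbour classifier, chosen because it simply memorises the training points, on hypothetical randomly generated data in which 20% of the labels are flipped as noise. The helper names make_noisy_data, predict_1nn and accuracy are made up for this example. Scored on the same data it was trained on, the model looks perfect; scored on points it never saw, it does not.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def make_noisy_data(n: int, noise: float, rng: random.Random) -> Tuple[List[Point], List[int]]:
    """Hypothetical toy data: label is 1 when x + y > 1, flipped with probability `noise`."""
    points, labels = [], []
    for _ in range(n):
        p = (rng.random(), rng.random())
        label = 1 if p[0] + p[1] > 1.0 else 0
        if rng.random() < noise:
            label = 1 - label  # label noise that a good model should NOT learn
        points.append(p)
        labels.append(label)
    return points, labels


def predict_1nn(train_x: List[Point], train_y: List[int], p: Point) -> int:
    """1-nearest-neighbour: copy the label of the closest training point (pure memorisation)."""
    best = min(
        range(len(train_x)),
        key=lambda i: (train_x[i][0] - p[0]) ** 2 + (train_x[i][1] - p[1]) ** 2,
    )
    return train_y[best]


def accuracy(train_x: List[Point], train_y: List[int],
             eval_x: List[Point], eval_y: List[int]) -> float:
    """Fraction of evaluation points whose predicted label matches the true one."""
    correct = sum(predict_1nn(train_x, train_y, p) == y for p, y in zip(eval_x, eval_y))
    return correct / len(eval_x)


rng = random.Random(0)
data_x, data_y = make_noisy_data(400, noise=0.2, rng=rng)

# Approach 1: train and evaluate on the whole dataset.
score_same_data = accuracy(data_x, data_y, data_x, data_y)

# For comparison: train on the first 300 points, evaluate on the 100 the model never saw.
score_held_out = accuracy(data_x[:300], data_y[:300], data_x[300:], data_y[300:])

print(score_same_data)  # 1.0: every point's nearest neighbour is itself
print(score_held_out)   # lower, since the memorised noise does not carry over to new points
```

The perfect score on the whole dataset says nothing about how the model behaves on new data; only the held-out score reveals that part of what was learnt was noise.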

Written by Valentina Alto