Training, validation, and test set in Machine Learning

Why do we need the three of them?

Valentina Alto
Published in Analytics Vidhya
5 min read · Aug 26, 2020


If we think about what a Machine Learning model does, its main job is to find the rules governing the relationship between input and output. Once those rules are found, the idea is to apply them to new data and make predictions about the corresponding output.

Since prediction is the final goal of an ML algorithm, it is pivotal for the model to generalize well and not be too closely adapted to the data it was trained on.

In this article, we are going to examine the different options you have when training an ML model.

Train and evaluate the model on the whole dataset

Needless to say, this first approach will lead to a biased result. If we evaluate the model on the very same dataset it was trained on, we will probably face the curse of overfitting, which happens whenever the model is too closely adapted to the training data. In the evaluation phase, we will probably get a very high score, yet it is not the score we are looking for: it probably derives from the fact that the algorithm learnt…
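To see why a score computed on the training data can be misleading, here is a minimal sketch in plain Python. It is not taken from the article: the toy dataset, the every-fifth-label "noise", and the helper names `predict` and `accuracy` are all assumptions made for illustration. The model is a 1-nearest-neighbour classifier, which simply memorises its training points, so it is an extreme case of a model that is too adapted to the data it has seen.

```python
# Hypothetical toy data: 100 points on [0, 1). The "true" rule is x > 0.5,
# and every fifth label is flipped to imitate the noise found in real data.
N = 100
points = [(i / N, (i / N > 0.5) != (i % 5 == 0)) for i in range(N)]

# Hold out the odd-indexed points; the model only ever sees the even ones.
train = points[0::2]
held_out = points[1::2]


def predict(x, memory):
    """1-nearest-neighbour: return the label of the closest memorised point."""
    _, label = min(memory, key=lambda p: abs(p[0] - x))
    return label


def accuracy(data, memory):
    """Fraction of points in `data` whose label the model predicts correctly."""
    hits = sum(predict(x, memory) == y for x, y in data)
    return hits / len(data)


train_score = accuracy(train, train)        # evaluated on data it memorised
held_out_score = accuracy(held_out, train)  # evaluated on data it never saw

print(f"score on training data: {train_score:.2f}")
print(f"score on held-out data: {held_out_score:.2f}")
```

Because every training point is its own nearest neighbour, the score on the training data is perfect by construction, noise included. The score on the held-out points is lower, and it is the more honest estimate of how the model would behave on new data.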


Data&AI Specialist at @Microsoft | MSc in Data Science | AI, Machine Learning and Running enthusiast