Use of Cross Validation in Machine Learning
Data is costly, but setting aside some of it for cross-validation is a must. Here’s how and why to do it!
Data is the currency modern organisations run on. For companies that actively deploy machine learning algorithms, data is even more important: for them, it is oil.
To understand the need for techniques like cross-validation, let us first look at all the buckets that data goes into.
We all understand that models are trained on data, and that the more data we have, the better those models tend to be. But no company would dare release a model without testing it first, so some data has to be set aside for testing. Beyond training and testing, there is a third step that many people have not heard of: validation.
Validation is the process of making sure that the model generalizes well. A model generalizes when it is built using one set of data and still performs well on a completely different set of data. The validation set is that other bucket of data: we use it to estimate the generalization error and to adjust the model until that error comes down.
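To make the three buckets concrete, here is a minimal sketch of how a dataset might be divided into training, validation, and test sets. The function name `split_train_val_test`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for illustration, not a prescribed recipe; in practice a library helper such as scikit-learn's `train_test_split` would usually do this job.

```python
import random


def split_train_val_test(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into train, validation, and test buckets.

    Hypothetical helper for illustration; the fractions are assumptions.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    # Copy before shuffling so the caller's data is left untouched.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]                # held back until the very end
    val = shuffled[n_test:n_test + n_val]   # used to check generalization while tuning
    train = shuffled[n_test + n_val:]       # everything else is used to fit the model
    return train, val, test


rows = list(range(100))
train, val, test = split_train_val_test(rows)
print(len(train), len(val), len(test))  # 60 20 20
```

Each row lands in exactly one bucket, so the model is never scored on data it was trained on.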