Use of Cross Validation in Machine Learning

Data is costly but setting aside some data for cross validation is a must. Here’s how and why one must do it!

Rishi Sidhu
AI Graduate


Data is the currency modern organisations run on. For companies that actively deploy machine learning algorithms, data is even more important: for them it is oil.

Photo by Tyler Nix on Unsplash

To understand the need for techniques like cross-validation, let us first look at all the buckets data goes into.

Data Buckets

We all understand that data is used to train models, and that the more data we have, the better those models can be trained. But no company would dare release a model without testing it first, so some data has to be set aside for testing. Beyond training and testing there is a third bucket that many people have not heard of: validation.
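The three buckets can be sketched in a few lines of Python. This is only an illustration: the helper name `split_buckets`, the 60/20/20 proportions, and the fixed seed are assumptions made for the example, not values the article prescribes.

```python
import random

def split_buckets(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into train, validation and test buckets.

    Hypothetical helper; the default fractions are illustrative only.
    """
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("validation and test fractions must leave room for training data")
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for a repeatable split
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]          # everything left over is used for training
    return train, validation, test

rows = list(range(100))                        # stand-in for 100 data records
train, validation, test = split_buckets(rows)
print(len(train), len(validation), len(test))  # 60 20 20
```

Every record lands in exactly one bucket, so the model is never tested on data it was trained on.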

Validation is the process of making sure that the model generalizes well. Generalization is when a model is built using one set of data and still performs well on a completely different set of data. The validation set is that other bucket of data: we use it to measure the generalization error and to adjust the model until that error comes down.
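One way to picture the role of the validation bucket is to let it pick a model setting. The sketch below is a toy under stated assumptions: the data is synthetic (y = x² plus noise), the model is a small hand-written 1-D k-nearest-neighbour regressor, and the candidate values of k are arbitrary. None of these come from the article.

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mean_squared_error(train, held_out, k):
    """Mean squared error of the k-NN model on a held-out bucket."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in held_out]
    return sum(errors) / len(errors)

rng = random.Random(0)
points = []
for _ in range(150):
    x = rng.uniform(-3, 3)
    points.append((x, x * x + rng.gauss(0, 1.0)))   # synthetic data (assumption)

# The points are already in random order, so slicing gives three random buckets.
train, validation, test = points[:90], points[90:120], points[120:]

candidate_ks = [1, 3, 5, 9, 15, 31]                  # arbitrary candidates
val_errors = {k: mean_squared_error(train, validation, k) for k in candidate_ks}
best_k = min(val_errors, key=val_errors.get)         # setting chosen on validation data

# With k=1 each training point is its own nearest neighbour, so the training
# error is zero; that number says nothing about unseen data.
train_error_k1 = mean_squared_error(train, train, 1)
test_error = mean_squared_error(train, test, best_k) # final check on untouched data

print("training error with k=1:", train_error_k1)
print("validation error per k:", val_errors)
print("chosen k:", best_k, "test error:", test_error)
```

The training error alone would favour k=1, which simply memorizes the data. The validation bucket is what exposes how each setting behaves on data the model has not seen, and the test bucket is kept aside for one last unbiased check.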

The use of a validation set

