What Is Cross-Validation? And Its Importance in Data Science!

SagarDhandare
Published in Geek Culture
2 min read · Jun 23, 2021

“Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two parts: one used to learn or train our model and the other used to validate our model.”

Have you understood the above lines?

Let’s see in simple words,

When we build a machine learning model from a dataset, we frequently split that dataset into a training set and a testing set. The training set is used for learning/training the model, and the testing set is used for validating it. Now suppose we train our model on the full given dataset using some algorithm and then measure its accuracy on that same data. The accuracy might be 90%, 95%, or maybe even 100%. What does that mean?
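For example, the usual split looks something like this in scikit-learn (a minimal sketch; the iris dataset and logistic regression model are just illustrative assumptions, not part of the article):

```python
# A minimal sketch of the usual train/test split with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Accuracy on data the model has already seen can look deceptively high.
print("Train accuracy:", model.score(X_train, y_train))
print("Test accuracy:", model.score(X_test, y_test))
```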

Is our model good?

Is our model ready for predicting future data?

The answer is NO.

Why is our model not good? Why is it not ready for predicting future data even after giving us 100% accuracy? Because the model was trained on that same dataset, it already knows the data, so a high score there tells us nothing about how well it generalizes. When we try to predict on new data, the accuracy can turn out very bad, because the model has not seen that data before. It will fail to give us good accuracy and a generalized model.

When a model gives us good accuracy on the training dataset but fails to give good accuracy whenever new data comes in, the model is overfitted.

To handle this type of problem, Cross-Validation comes into the picture. It divides the dataset into two parts (train and test). On the train part it trains the model, and on the test part, which is unseen data for the model, it makes predictions. After that, we check how well the model works. If the model gives us good accuracy on the test data, it means our model is good and we can trust it.
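Here is a minimal sketch of this idea using scikit-learn's cross_val_score (the dataset and model are again only assumptions for illustration):

```python
# Cross-validation: each fold trains on one part of the data and
# evaluates on the held-out part.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5 folds -> 5 accuracy scores, each computed on data the model did not train on.
scores = cross_val_score(model, X, y, cv=5)
print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```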


Cross-Validation is a very powerful tool, as it helps us make better use of our data.

Also, one of the most common reasons to use cross-validation is parameter tuning.
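For instance, scikit-learn's GridSearchCV scores every candidate parameter with cross-validation and keeps the best one (a rough sketch; the SVC model and the parameter grid below are my own assumptions for illustration):

```python
# Parameter tuning with cross-validation via GridSearchCV.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Each candidate value of C is evaluated with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```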

Types of Cross-Validation:

  1. Hold Out Method
  2. Leave One Out Cross-Validation (LOOCV)
  3. K-Fold Cross-Validation
  4. Stratified Cross-Validation
  5. Time Series Cross-Validation
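As a quick preview before the next article, each of these types has a corresponding splitter in scikit-learn (the hold-out method is simply train_test_split); the snippet below is only a rough sketch of where they live and how a splitter is used:

```python
# Splitters in sklearn.model_selection that roughly match the list above.
import numpy as np
from sklearn.model_selection import (
    train_test_split,   # 1. Hold Out Method
    LeaveOneOut,        # 2. Leave One Out Cross-Validation (LOOCV)
    KFold,              # 3. K-Fold Cross-Validation
    StratifiedKFold,    # 4. Stratified Cross-Validation
    TimeSeriesSplit,    # 5. Time Series Cross-Validation
)

# Example: a 5-fold splitter yields train/test index pairs.
X_demo = np.arange(10).reshape(-1, 1)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in kf.split(X_demo):
    print("train:", train_idx, "test:", test_idx)
```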

In our next article, we will look at each of these types of Cross-Validation in detail.

Conclusion:

In this article, we have learned about the importance of Cross-Validation in Data Science and the different types of Cross-Validation techniques.

Please feel free to drop your comments or advice, or point out any mistakes.😊

Connect me on: LinkedIn | GitHub | Email

HAPPY LEARNING!!! ❤🥀
