Day 50 of 100DaysofML

Charan Soneji · 100DaysofMLcode · Aug 5, 2020

K-Fold. It is one of the concepts we almost always use during the testing phase, and it is part of cross-validation, whereby we verify how well our ML model performs on unseen data.

Watch the video below for an overall gist of the topic.

That video covers the important things you need to know about K-Fold. Some points to keep in mind:

  • K-fold cross-validation is a procedure used to estimate the skill of a model on new data.
  • There are common tactics you can use to select the value of k for your dataset.
  • There are commonly used variations on cross-validation, such as stratified and repeated K-fold, that are available in scikit-learn (see the sketch after this list).
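As a minimal sketch of those two variants, here is how StratifiedKFold and RepeatedKFold behave on a hypothetical toy dataset (the data below is made up purely for illustration):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, RepeatedKFold

# Hypothetical toy data: 6 samples with a 2:1 class imbalance.
X = np.zeros((6, 1))
y = np.array([0, 0, 0, 0, 1, 1])

# Stratified K-fold keeps the class ratio roughly the same in every fold.
skf = StratifiedKFold(n_splits=2)
for train_idx, test_idx in skf.split(X, y):
    print("test labels:", y[test_idx])  # both classes appear in each test fold

# Repeated K-fold reruns the whole K-fold procedure with fresh shuffles.
rkf = RepeatedKFold(n_splits=2, n_repeats=3, random_state=0)
print(sum(1 for _ in rkf.split(X)))  # 2 folds x 3 repeats = 6 splits
```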

Essentially, in cross-validation we divide our entire dataset into different segments (folds); we train our model on some of these segments and test it on the remaining ones.

Have a look at the diagram below:

K-fold, conceptually

Here, we can see how we hold out a specific segment for testing and use the rest as the training dataset, and we repeat this for every segment. This way, the model is evaluated more reliably, because every segment has served as test data exactly once (and as training data in the other iterations). The algorithm can be a little expensive in terms of time, since the model is trained k times, but overall it does a good job.
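To make this concrete, here is a minimal sketch of running that whole loop with scikit-learn's cross_val_score; the iris dataset and logistic regression are illustrative stand-ins, not part of the original post:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in dataset and model for the sketch.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=5 runs 5-fold cross-validation: every fold is used as test data exactly once.
scores = cross_val_score(model, X, y, cv=5)
print(scores)         # one score per fold
print(scores.mean())  # average performance across all folds
```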

K-Fold can be easily implemented using scikit-learn, and I'd suggest checking the documentation, whose link I've mentioned below.

Have a look at the example you can create using scikit-learn, where 4 different samples are split for training and testing: across the loop, each sample lands in the test set exactly once and in the training set for the remaining splits.
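Here is a minimal sketch of such a training example, assuming scikit-learn's KFold with n_splits=2 on 4 toy samples:

```python
import numpy as np
from sklearn.model_selection import KFold

# Four toy samples; only the indices matter for seeing the splits.
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

kf = KFold(n_splits=2)
for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)

# Output:
# TRAIN: [2 3] TEST: [0 1]
# TRAIN: [0 1] TEST: [2 3]
```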

Covered a pretty easy topic today. I shall be writing a bit more on DL tomorrow. Thanks for reading. Keep learning.

Cheers.
