Why Is Trying One Train/Test Split Not Enough?

Amjad El Baba
3 min read · Feb 8, 2022


Machine Learning is not only about preparing and splitting data and fitting a model to predict some output(s); an equally important part is choosing the model that performs best on your data. As with anything you make or any task you accomplish, what matters most is how correct and effective your work is. So we should make sure we deal with the problems that come up along the way.

Long story short, a main problem faced in ML projects is overfitting, and K-Fold Cross Validation (CV) is one of the ways that help in dealing with it. So what we are going to explore is what K-Fold CV really is and the theory of how it works.

Hey, allow me to answer a question that may be hovering in your mind if you don’t have prior knowledge of overfitting: it’s the case where the model works great on data it has seen before (the training set) but worse on data it has not seen yet (the testing set). You can explore it more here.

K-Fold Cross Validation Definition

Fig. 1

Everyone takes a turn in the loop. In ML lingo, K-Fold CV is all about evaluating the performance of an ML model over a number of folds/parts. The theory behind this concept is that the dataset is split into K folds, and each fold serves as the testing set exactly once while the remaining K-1 folds are used for training.

So rather than worrying about which part of the data is best used for testing, CV solves this issue by using all the parts, one at a time, and summarizing the results at the end (typically by averaging the scores across the folds).

The figure above shows this with 4 splits (called four-fold cross validation here): at each split, a different part of the dataset is used as the testing set.
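To make the rotation concrete, here is a minimal sketch in plain Python of how four folds could be carved out of a small dataset. The helper name `k_fold_indices` and the sample count of 8 are made up for illustration; in practice, libraries such as scikit-learn provide a ready-made `KFold` splitter for this.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                     # this fold's turn as the testing set
        train = indices[:start] + indices[start + size:]       # every other fold is used for training
        yield train, test
        start += size


# Hypothetical toy setup: 8 samples, 4 folds (four-fold cross validation).
folds = list(k_fold_indices(n_samples=8, k=4))
for split_number, (train, test) in enumerate(folds, start=1):
    print(f"Split {split_number}: train={train} test={test}")
```

Each sample index lands in the testing set exactly once, which is the "everyone takes a turn" idea. Real data is usually shuffled before splitting; that step is left out here to keep the sketch short.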

Cross Validation Example

Fig. 2

Let’s say we applied K-Fold CV to choose which algorithm performs best on our data. In Fig. 2, Linear Regression works best, since it has the highest number of correct predictions.
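As a rough illustration of this kind of comparison, the sketch below runs four-fold CV on two simple models and averages the score over the folds. Everything in it is an assumption made for the example: the synthetic data, the two models (a mean baseline and a least-squares line, not necessarily the ones in Fig. 2), and the metric (mean squared error, where lower is better, instead of a count of correct predictions).

```python
import random
from statistics import mean


def k_fold_splits(n_samples, k):
    """Return a list of (train_indices, test_indices), one pair per fold."""
    indices = list(range(n_samples))
    fold_size = n_samples // k  # this sketch assumes n_samples is divisible by k
    splits = []
    for fold in range(k):
        test = indices[fold * fold_size:(fold + 1) * fold_size]
        train = indices[:fold * fold_size] + indices[(fold + 1) * fold_size:]
        splits.append((train, test))
    return splits


def fit_mean_baseline(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Simple least-squares line: y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def cross_val_mse(fit, xs, ys, k):
    """Average the test-fold mean squared error over k folds."""
    fold_errors = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mean([(model(xs[i]) - ys[i]) ** 2 for i in test]))
    return mean(fold_errors)


# Synthetic data (an assumption for this example): y = 2x + 1 plus some noise.
rng = random.Random(0)
order = list(range(20))
rng.shuffle(order)  # shuffle so each fold covers the whole range of x
xs = [float(i) for i in order]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

scores = {
    "mean baseline": cross_val_mse(fit_mean_baseline, xs, ys, k=4),
    "linear regression": cross_val_mse(fit_line, xs, ys, k=4),
}
for name, score in scores.items():
    print(f"{name}: average MSE over 4 folds = {score:.2f}")
print("Best model:", min(scores, key=scores.get))
```

Because every model is scored on the same folds, the comparison does not depend on one lucky or unlucky train/test split.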

In case you don’t know the two algorithms mentioned, please check my previous blog posts explaining 1 & 2.


Thanks for your time and let’s boost our knowledge!


Amjad El Baba

An AI engineer with a passion for writing, always curious and eager to share what I learn. I enjoy taking ideas and turning them into something relatable.