3 Cross-validation techniques for evaluating machine learning models

Pasquale Di Lorenzo
6 min read · Dec 27, 2022


Cross-validation is a common technique used in machine learning to evaluate the performance of a model. It involves dividing the data into a number of folds or partitions, training the model on some of the folds, and evaluating it on the remaining folds. This process is repeated multiple times, with each fold serving as the test set in turn.

One of the main benefits of cross-validation is that it allows us to get a better estimate of the model’s performance on unseen data. By training and evaluating the model on different folds of the data, we can get a better sense of how well the model is likely to generalize to new, unseen data. This is especially important when the dataset is small, as in these cases the model may have a high variance, meaning that it is highly sensitive to the specific training data used.
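As a quick illustration of this workflow, here is a minimal sketch using scikit-learn. The synthetic dataset and the logistic regression model are illustrative assumptions, not something prescribed by this article:

```python
# A minimal sketch of cross-validated evaluation with scikit-learn.
# The dataset and model below are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data: 1000 examples, 20 features
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

model = LogisticRegression(max_iter=1000)

# Evaluate with 5-fold cross-validation and report per-fold and mean accuracy
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"Per-fold accuracy: {scores}")
print(f"Mean accuracy: {scores.mean():.3f}")
```

Each entry in `scores` comes from a model trained on the other folds and tested on the held-out fold, and the mean is the cross-validated estimate of performance.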

There are several types of cross-validation techniques, including:

1- K-fold cross-validation:

This is the most common type of cross-validation. It involves dividing the data into K folds, where K is commonly set to 5 or 10. The model is trained on K-1 folds and evaluated on the remaining fold. This process is repeated K times, with each fold serving as the test set in turn. The final performance measure is the average performance across all K iterations.

Example:

Here is an example of K-fold cross-validation using a simple classification dataset:

Suppose we have a dataset with 1000 examples, 500 of which belong to class 0 and 500 of which belong to class 1. We want to evaluate a binary classification model using 10-fold cross-validation.

First, we divide the data into 10 folds, with 100 examples in each fold.

Next, we perform the following steps 10 times:

1. Set aside one of the folds as the test set and use the remaining 9 folds as the training set.

2. Train the model on the training set.

3. Evaluate the model on the test set.

4. Record the performance measure (e.g., accuracy, precision, etc.) for this iteration.

After completing these steps for each of the 10 folds, we calculate the average performance measure across all 10 iterations. This gives us an overall estimate of the model’s performance on the dataset.

For example, if the model achieved an accuracy of 80% on the first iteration, 75% on the second iteration, and so on, the average accuracy would be calculated as (80 + 75 + … + 85)/10 = 79%. This would be our overall estimate of the model’s performance on the dataset.

In this way, K-fold cross-validation allows us to get a better estimate of the model’s performance on unseen data by training and evaluating it on different folds of the data. It is a useful technique for evaluating the robustness and generalizability of a machine learning model.
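The procedure above can be written directly with scikit-learn's KFold splitter. This is a minimal sketch assuming a synthetic, roughly balanced 1000-example dataset and a logistic regression model; both are illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# Synthetic balanced dataset: 1000 examples, ~500 per class (illustrative assumption)
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.5, 0.5], random_state=0)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

fold_accuracies = []
for train_idx, test_idx in kf.split(X):
    # Train on the 9 training folds, evaluate on the held-out fold
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    fold_accuracies.append(accuracy_score(y[test_idx], preds))

# Average performance across the 10 folds
print(f"Mean 10-fold accuracy: {np.mean(fold_accuracies):.3f}")
```

The loop mirrors the steps listed above: each fold is used exactly once as the test set, and the final number is the mean of the 10 per-fold accuracies.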

2- Stratified K-fold cross-validation:

This is similar to K-fold cross-validation, but it is used when the data is imbalanced (i.e., when there are significantly more examples of one class than the other). In this case, the folds are stratified, meaning that they are constructed in such a way as to ensure that each fold contains a representative proportion of each class.

Example:

Here is an example of stratified K-fold cross-validation using a simple classification dataset:

Suppose we have a dataset with 1000 examples, 200 of which belong to class 0 and 800 of which belong to class 1. We want to evaluate a binary classification model using 10-fold stratified cross-validation.

First, we divide the data into 10 folds, ensuring that each fold contains a representative proportion of each class (i.e., 20 examples of class 0 and 80 examples of class 1 in each fold). This is important because the data is imbalanced, with significantly more examples of class 1 than class 0.

Next, we follow the same procedure as in standard K-fold cross-validation: for each of the 10 folds, we set the fold aside as the test set, train the model on the remaining 9 folds, evaluate it on the held-out fold, and record the performance measure (e.g., accuracy, precision, etc.). After all 10 iterations, we average the recorded measures to obtain an overall estimate of the model's performance on the dataset.

In this way, stratified K-fold cross-validation allows us to get a better estimate of the model’s performance on unseen data by training and evaluating it on different folds of the data, while also ensuring that each fold contains a representative proportion of each class. This is especially important when the data is imbalanced, as it helps to ensure that the model is evaluated on a balanced set of examples.
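A minimal sketch with scikit-learn's StratifiedKFold is shown below. The imbalanced synthetic dataset (roughly 200 examples of class 0 and 800 of class 1) and the logistic regression model are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# Imbalanced synthetic dataset: roughly 200 examples of class 0 and 800 of class 1
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.2, 0.8], random_state=0)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

fold_accuracies = []
# Note: split() needs y so that each fold preserves the class proportions
for train_idx, test_idx in skf.split(X, y):
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    fold_accuracies.append(accuracy_score(y[test_idx], preds))

print(f"Mean stratified 10-fold accuracy: {np.mean(fold_accuracies):.3f}")
```

The only practical difference from plain K-fold is that the splitter sees the labels, so every fold keeps approximately the 20/80 class ratio of the full dataset.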

3- Leave-one-out cross-validation:

This is a special case of K-fold cross-validation where K is equal to the size of the dataset. In this case, the model is trained on all but one example in the dataset, and evaluated on the remaining example. This process is repeated for each example in the dataset, resulting in a separate performance measure for each example.

Example:

Here is an example of leave-one-out cross-validation using a simple classification dataset:

Suppose we have a dataset with 1000 examples, 500 of which belong to class 0 and 500 of which belong to class 1. We want to evaluate a binary classification model using leave-one-out cross-validation.

First, we set aside one example as the test set and use the remaining 999 examples as the training set. We train the model on the training set and evaluate it on the test set, recording the performance measure (e.g., accuracy, precision, etc.) for this iteration.

Next, we repeat this process for each of the 1000 examples in the dataset. This means that the model is trained on 999 examples and evaluated on 1 example 1000 times, resulting in 1000 separate performance measures.

Finally, we calculate the average performance measure across all 1000 iterations. This gives us an overall estimate of the model’s performance on the dataset.

Because each test set contains only one example, the performance on each iteration is simply whether the model classified that example correctly (100%) or not (0%). The average across all 1000 iterations is therefore the fraction of held-out examples classified correctly; for example, if the model was correct on 790 of the 1000 iterations, the estimated accuracy would be 79%. This would be our overall estimate of the model’s performance on the dataset.

Leave-one-out cross-validation is computationally expensive, as it requires training and evaluating the model once for every example in the dataset (1000 times in this example). However, it can be useful when the dataset is small, since each model is trained on nearly all of the available data, which tends to give a less biased estimate of the model’s performance.
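A minimal sketch with scikit-learn's LeaveOneOut splitter is shown below. The small synthetic dataset is an illustrative assumption, chosen to keep the number of training runs modest:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Small synthetic dataset; LOO trains one model per example, so keep N modest
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

model = LogisticRegression(max_iter=1000)

# Each "fold" is a single example, so each per-iteration score is 0.0 or 1.0
scores = cross_val_score(model, X, y, cv=LeaveOneOut(), scoring="accuracy")
print(f"Number of iterations: {len(scores)}")
print(f"Estimated accuracy (fraction correct): {scores.mean():.3f}")
```

Here the mean of the 0/1 scores is exactly the fraction of examples the model classified correctly when held out, which is the leave-one-out estimate of accuracy.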

In summary, cross-validation is a useful technique for evaluating the performance of a machine learning model and estimating its ability to generalize to new, unseen data. It allows us to get a better estimate of model performance, particularly when the dataset is small or imbalanced, and helps us choose the best model for a given task.

See you in the next article on machine learning!
