Comparing K-Fold Cross-Validation Methods: Strategies for Effective Model Evaluation in Diverse Data Scenarios

5 min readMar 26, 2023

K-fold cross-validation is a widely used method for assessing the performance of a machine learning model by dividing the dataset into multiple smaller subsets, or “folds,” and training and testing the model iteratively. This technique helps to reduce the bias and variance in the model’s performance estimates. There are several variations of k-fold cross-validation, and I’ll discuss a few of the most common ones and compare them:

Regular K-Fold Cross-Validation: In this method, the dataset is randomly partitioned into k equal-sized subsets or “folds.” The model is trained k times, with each fold being used as the testing set once and the remaining k-1 folds being used as the training set. The performance metrics from each fold are averaged to obtain a final performance estimate for the model. This method is simple and effective but may not work well when the data distribution is imbalanced or has a skewed class distribution.
Stratified K-Fold Cross-Validation: This variation is designed to handle imbalanced datasets or those with a skewed class distribution. In stratified k-fold cross-validation, each fold is made by preserving the same percentage of samples for each class as in the complete dataset. This ensures that each fold has a similar class distribution, leading to more accurate performance estimates. Stratified k-fold cross-validation is generally preferred when working with classification problems with imbalanced datasets.
Time Series K-Fold Cross-Validation: Time series data has a temporal ordering, and this should be preserved when performing cross-validation. In this method, the dataset is split into k folds, maintaining the temporal order of the data. In each iteration, the model is trained on the first n-1 folds and tested on the nth fold. Time series k-fold cross-validation is particularly useful when working with time series data, as it prevents data leakage and provides more realistic performance estimates.
Group K-Fold Cross-Validation: This method is suitable for datasets where there are logical groupings within the data, such as data points collected from different subjects or experiments. In group k-fold cross-validation, data points are grouped based on these logical groupings, and the folds are made such that each fold contains data from different groups. This ensures that the model is evaluated on data from different groups and prevents overfitting to any specific group.

In summary, each of these k-fold cross-validation methods has its advantages and is suitable for different types of data and problem domains:

Regular K-Fold: General-purpose, works well with balanced datasets
Stratified K-Fold: Preferred for classification problems with imbalanced datasets
Time Series K-Fold: Suitable for time series data
Group K-Fold: Useful when data has logical groupings that need to be considered

When selecting a cross-validation method, it’s essential to consider the dataset’s characteristics and the problem domain to ensure accurate and reliable performance estimates.

In this example, I will show how to implement the mentioned k-fold cross-validation methods using scikit-learn library in Python. We’ll use different datasets and models for each method to demonstrate a suitable application.

First, let’s import the necessary libraries:

import numpy as np
from sklearn import datasets
from sklearn.model_selection import KFold, StratifiedKFold, TimeSeriesSplit, GroupKFold
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import accuracy_score, mean_squared_error

Regular K-Fold Cross-Validation:

# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Create a logistic regression model
model = LogisticRegression(solver='liblinear', multi_class='auto')

# Apply regular K-Fold cross-validation
kf = KFold(n_splits=5)
accuracies = []

for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))

print("Regular K-Fold accuracies:",  [f"{accuracy:.3f}" for accuracy in accuracies])

Regular K-Fold accuracies: ['0.917', '0.917', '0.889', '0.943', '1.000']

Stratified K-Fold Cross-Validation:

# Apply stratified K-Fold cross-validation
skf = StratifiedKFold(n_splits=5)
accuracies = []

for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))

print("Stratified K-Fold accuracies:", [f"{accuracy:.3f}" for accuracy in accuracies])

Stratified K-Fold accuracies: ['0.917', '0.944', '0.944', '1.000', '1.000']

Time Series K-Fold Cross-Validation:

# Load a time series dataset (Boston housing)
boston = datasets.load_boston()
X, y = boston.data, boston.target

# Create a linear regression model
model = LinearRegression()

# Apply time series K-Fold cross-validation
tscv = TimeSeriesSplit(n_splits=5)
mse_values = []

for train_index, test_index in tscv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse_values.append(mean_squared_error(y_test, y_pred))

mse_v = [f"{mse_val:.3f}" for mse_val in mse_values]
print("Time Series K-Fold mean squared errors:", mse_v)

Time Series K-Fold mean squared errors: ['1.509', '0.588', '0.420', '0.734', '0.444']

Group K-Fold Cross-Validation:

# Load the Wine dataset
wine = datasets.load_wine()
X, y = wine.data, wine.target

# Create a logistic regression model
model = LogisticRegression(solver='liblinear', multi_class='auto')

# Define groups based on the assumption that wines are sorted by producers
groups = np.array([i // (len(X) // 5) for i in range(len(X))])

# Apply group K-Fold cross-validation
gkf = GroupKFold(n_splits=5)
accuracies = []

for train_index, test_index in gkf.split(X, y, groups):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))

print("Group K-Fold accuracies:", [f"{accuracy:.3f}" for accuracy in accuracies])

Group K-Fold accuracies: ['0.974', '0.971', '0.771', '0.971', '0.914']

For a broader understanding of various performance assessment methods in machine learning, we encourage you to read our insightful article titled “Evaluating Machine Learning Models: A Guide to Selecting the Right Performance Assessment Method for Your Dataset and Task.” This comprehensive guide covers not only k-fold cross-validation but also other popular techniques, such as the holdout method, leave-one-out cross-validation, leave-p-out cross-validation, repeated random subsampling, and bootstrapping. By delving into each method’s strengths, weaknesses, and ideal use cases, the article aims to help you choose the most suitable evaluation method for your specific task and dataset. Don’t miss this opportunity to expand your knowledge on machine learning model evaluation and refine your approach to building accurate and reliable models.

👏 Don’t forget to give this article some claps and share it with your network to support my work! Feel free to follow my Medium profile for more insightful content on machine learning and data science. Thank you for being so supportive! 🚀

Comparing K-Fold Cross-Validation Methods: Strategies for Effective Model Evaluation in Diverse Data Scenarios

Written by Sahel Eskandar