What Are Cross-Validation & Hyperparameter Tuning?

Nurul Huda · Published in AlmaBetter · Jun 6, 2021 · 6 min read

For any machine learning model we build, we need to validate its stability. We face a series of choices: which predictive variables to use, what types of models to try, what arguments to supply to those models, and so on. We make these choices in a data-driven way by measuring the model quality of the various alternatives. A train/test split, which divides the dataset into training and test portions, is one method of measuring model quality on held-out data. Cross-validation extends this approach to model scoring (or "model validation"). Compared to a train/test split, cross-validation gives you a more reliable measure of your model's quality, though it takes longer to run.

Let's try to understand the term "cross-validation". It is a technique for validating model effectiveness by training the model on one subset of the input data and testing it on a previously unseen subset of the input data. It is popular in applied machine learning for comparing and selecting a model for a given predictive modeling problem because it is easy to understand, easy to implement, and generally results in skill estimates with lower bias (i.e., less optimistic) than other methods, such as a simple train/test split.

How does cross-validation work?

The procedure has a single parameter called k that refers to the number of groups a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation. When a specific value for k is chosen, it may be used in place of k in the name of the method, such as k=5 becoming 5-fold cross-validation.

We split the dataset into k subsets (known as folds), train on k-1 of the folds, and hold out the remaining fold for evaluating the trained model. We iterate k times, with a different fold reserved for testing each time.

Following are the steps to perform cross-validation on a dataset:

  • Randomly split your entire dataset into k folds or subsets.
  • For each fold in your dataset, build your model on the other k-1 folds of the dataset. Then, test the model on the kth fold to check its effectiveness.
  • Repeat this until each of the k folds has served as the test set.
  • The average of your k recorded accuracies is called the cross-validation accuracy and will serve as your performance metric for the model, as sketched in the code below.
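
To make these steps concrete, here is a minimal sketch of the loop using scikit-learn's KFold. The iris dataset, logistic regression model, and k=5 are illustrative choices, not part of the procedure itself:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)

# Step 1: randomly split the dataset into k=5 folds.
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for train_idx, test_idx in kf.split(X):
    # Steps 2-3: train on k-1 folds, test on the held-out fold.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    scores.append(accuracy_score(y[test_idx], preds))

# Step 4: the average of the k accuracies is the cross-validation accuracy.
print("Fold accuracies:", np.round(scores, 3))
print("Cross-validation accuracy:", np.mean(scores))
```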

Let's compare cross-validation and train/test split in machine learning.

Train/test split:

The input data is divided into two parts, a training set and a test set, in a ratio such as 70:30 or 80:20. Its biggest disadvantage is high variance: the quality estimate depends heavily on which observations happen to land in the test set. The training data, for which the dependent variable is known, is used to train the model. The test data is used to make predictions from the model that has already been trained on the training data; it has the same features as the training data but is not part of it.
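
In scikit-learn, a simple train/test split looks like this; the 70:30 ratio, dataset, and model are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data as the test set (a 70:30 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# This single score depends heavily on which rows landed in the
# test set, which is the high-variance problem described above.
print("Test accuracy:", model.score(X_test, y_test))
```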

Cross-Validation:

It overcomes the disadvantage of the train/test split by splitting the dataset into several train/test groups and averaging the results. It can be used when we want to optimize a model trained on the training dataset for the best performance. It makes more efficient use of the data than a single train/test split, as every observation is used for both training and testing.
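
In scikit-learn, the whole split-train-score-average loop collapses into a single call, cross_val_score. A minimal sketch, again with an illustrative dataset and model:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# cross_val_score handles the k splits, training, and scoring internally.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("Fold scores:", np.round(scores, 3))
print("Mean CV accuracy:", scores.mean())
```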

Disadvantages of Cross-Validation:

Below are some limitations of the cross-validation technique:

  • Under ideal conditions it provides an optimal estimate, but with inconsistent data it may produce drastically misleading results. This is one of the big disadvantages of cross-validation, as there is no certainty about the type of data we will encounter in machine learning.
  • In predictive modeling, the data evolves over time, which can create differences between the training and validation sets. For example, if we create a model to predict stock market values and train it on the previous 5 years of stock prices, the actual values over the next 5 years may be drastically different, so it is difficult to expect correct output in such situations.

Applications of Cross-Validation:

Some of the applications are:

  • This technique can be used to compare the performance of different predictive modeling methods.
  • It has great scope in the medical research field.
  • It can also be used for meta-analysis, as data scientists in the field of medical statistics already do.

In short, cross-validation is a very useful technique for assessing the effectiveness of your model, particularly when you need to mitigate overfitting. It also helps in determining the hyperparameters of your model, in the sense of which hyperparameter values will result in the lowest test error.

Now let's discuss hyperparameters, which are parameters that cannot be learned directly through the regular training process. They are usually fixed before the actual training begins. These parameters express important properties of the model, such as its complexity or how fast it should learn. Unlike ordinary parameters, hyperparameters are specified by the practitioner when configuring the model.

Some examples of model hyperparameters include:

  • The penalty in a logistic regression classifier, i.e., L1 or L2 regularization.
  • The learning rate for training a neural network.
  • The C and sigma hyperparameters for support vector machines.
  • The k in k-nearest neighbors.
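
In code, such hyperparameters are fixed when the model object is constructed, before any training happens. A minimal sketch with scikit-learn; the specific values are illustrative defaults, not recommendations (note that scikit-learn's RBF SVM exposes gamma rather than sigma, with gamma playing the role of 1/(2σ²)):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Penalty type and regularization strength for logistic regression.
log_reg = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)

# C and the RBF kernel width for a support vector machine.
svm = SVC(C=1.0, kernel="rbf", gamma="scale")

# k, the number of neighbors, for k-nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=5)
```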

We do not know in advance which hyperparameter values would generate the best model output, so we have a search procedure explore candidate configurations and select the best one automatically. This selection procedure is known as hyperparameter tuning. Each model has its own set of hyperparameters that need to be tuned for optimal output. For every model, our goal is to minimize the error, that is, to have predictions as close as possible to the actual values; this is the major objective of hyperparameter tuning.

There are different approaches to hyperparameter tuning. Some of them are:

  • Manual Search: We select some hyperparameter values for a model based on gut feeling and experience, train the model with them, and check the performance measures. This process is repeated with other values for the same hyperparameters until acceptable accuracy is achieved, or the error stops improving.
  • Random Search: Here, we provide a statistical distribution for each hyperparameter from which values may be randomly sampled. It evaluates only a fixed number of hyperparameter settings, moving within the grid in random fashion to find the best set of hyperparameters. This approach reduces unnecessary computation.
  • Grid Search: This is arguably the most basic hyperparameter tuning method. With this technique, we simply build a model for each possible combination of the provided hyperparameter values, evaluate each model, and select the architecture that produces the best results. Each model is fit to the training data and evaluated on the validation data. One drawback is that it goes through every combination of hyperparameters, which makes grid search computationally very expensive. For example, for C = [0.1, 0.2, 0.3, 0.4, 0.5] and alpha = [0.1, 0.2, 0.3, 0.4], the search constructs a model for all 20 combinations; if C=0.3 and alpha=0.2 yields the highest performance score, say 0.726, that combination is selected. A code sketch of grid and random search follows this list.
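
Here is a minimal sketch of grid search and random search using scikit-learn's GridSearchCV and RandomizedSearchCV. An SVM's C and gamma stand in for the C/alpha example above, and all grid values and distributions are illustrative:

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid search: try every combination of the listed values,
# scoring each candidate with 5-fold cross-validation.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 0.2, 0.3, 0.4, 0.5],
                "gamma": [0.1, 0.2, 0.3, 0.4]},
    cv=5,
)
grid.fit(X, y)
print("Grid search best params:", grid.best_params_)
print("Grid search best CV score:", grid.best_score_)

# Random search: sample a fixed number of settings (n_iter=10)
# from distributions instead of exhausting the whole grid.
rand = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions={"C": loguniform(1e-2, 1e2),
                         "gamma": loguniform(1e-3, 1e1)},
    n_iter=10,
    cv=5,
    random_state=42,
)
rand.fit(X, y)
print("Random search best params:", rand.best_params_)
```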

With hyperparameter tuning and cross-validation, we can improve the performance of our machine learning models. We need to select the tuning method best suited to our problem so that the model's performance is desirably high.
