What is Hyperparameter Tuning (Cross-Validation and Holdout Validation)?

[ML0to100] — S1E16

Sanidhya Agrawal
5 min read · Jun 10, 2020

Suppose you are hesitating between two types of models (say, a linear model and a polynomial model): how can you decide between them?

  • One option is to train both and compare how well they generalize using the test set.
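To make this concrete, here is a minimal sketch (not from the original article) of that comparison, assuming scikit-learn and synthetic data; every dataset, degree, and variable name below is illustrative:

```python
# Compare a linear model and a polynomial model on a held-out test set.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic, slightly curved data (placeholder for a real dataset).
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X.ravel() ** 2 + X.ravel() + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

linear_model = LinearRegression()
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

for name, model in [("linear", linear_model), ("polynomial", poly_model)]:
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.3f}")
```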

Now suppose that the linear model generalizes better, but you want to apply some regularization to avoid overfitting.

How do you choose the value of the regularization hyperparameter?

  • One option is to train 100 different models using 100 different values for this hyperparameter.
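As a hedged sketch of that idea, the loop below fits Ridge regression (one common regularized linear model; the article does not name a specific one) with 100 candidate values of the regularization strength alpha and keeps the value that scores best on an evaluation set. The data and names are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Placeholder data and evaluation split.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() + rng.normal(scale=1.0, size=200)
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.2, random_state=0)

best_alpha, best_mse = None, float("inf")
for alpha in np.logspace(-3, 3, 100):  # 100 candidate hyperparameter values
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    mse = mean_squared_error(y_eval, model.predict(X_eval))
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse
print(f"best alpha = {best_alpha:.4g} (eval MSE = {best_mse:.3f})")
```

Note the trap the article is about to describe: if `X_eval` here is your test set, you are tuning the hyperparameter to that particular test set, and the measured error becomes an optimistic estimate.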

Suppose you find the best hyperparameter value that produces a model with the lowest generalization error — say, just 5% error.

You launch this model into production, but unfortunately, it does not perform as well as expected and produces 15% error. What just happened?

The problem is that you measured the generalization error multiple times on the same test set, and you adapted the model and hyperparameters to produce the best model for that particular test set. This means that the model is unlikely to perform as well on new data.

A common solution to this problem is called holdout validation:

Holdout validation-

In holdout validation, the dataset is split into three parts:

Training Set, Validation Set, and Test (Holdout) Set

What is a Training Set?

  • A training set is the subsection of a dataset from which the machine learning algorithm uncovers, or “learns,” relationships between the features and the target variable.
  • It is the sample of data used to fit the model.
  • A training dataset is a dataset of examples used for learning, that is, to fit the parameters (e.g., weights) of a model.

What is a Validation Set?

  • A validation dataset is a dataset of examples used to tune the hyperparameters of a model. It is also known as the development set, or dev set.

What is a Test (or Holdout) Set?

  • A test dataset (also called a holdout set; the two terms mean the same thing) is a dataset that is independent of the training dataset. If a model fit to the training dataset also fits the test dataset well, minimal overfitting has taken place. It is used to estimate the generalization error.
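As an illustration of this three-way split, here is a sketch assuming scikit-learn's train_test_split is available; the data and ratios are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data; substitute your own feature matrix X and labels y.
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(1000, 1))
y = X.ravel() + rng.normal(scale=1.0, size=1000)

# First split off the test set (20% of the data) ...
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
# ... then split the remainder into training (60% overall)
# and validation (20% overall): 0.25 of 80% is 20%.
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42)
```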

The validation dataset is used during training to track the performance of your model on “unseen” data. “Unseen” is in quotes because, although the model never directly sees the validation data, you optimize the hyperparameters to decrease the loss on the validation set (a rising validation loss signals overfitting).

However, by doing so, you may overfit the hyperparameters to the validation set (the loss will be low on that specific validation set, but worse on any other unseen set). That is why you usually keep a third set, called the test set (or held-out set), which is your truly unseen data; you test the performance of your model on it only once, after training your final model.

source- https://www.kdnuggets.com/2017/08/dataiku-predictive-model-holdout-cross-validation.html

You simply hold out part of the training set (NOT the test set) to evaluate several candidate models and select the best one. The new held-out set is called the validation set (or sometimes the development set, or dev set).

You train multiple models with various hyperparameters on the reduced training set (full training set minus the validation set), and you select the model that performs best on the validation set.

After this holdout validation process, you train the best model on the full training set (including the validation set), and this gives you the final model.

Lastly, you evaluate this final model on the test set to get an estimate of the generalization error.
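Putting the whole procedure together, here is a hedged end-to-end sketch with scikit-learn; the model family (Ridge) and the candidate alpha values are illustrative choices, not the article's:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Placeholder data.
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(500, 1))
y = X.ravel() + rng.normal(scale=1.0, size=500)

# Full training set vs. test set, then hold out a validation set.
X_full, X_test, y_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_full, y_full, test_size=0.25, random_state=42)

# Train candidates on the reduced training set, score them
# on the validation set, and pick the winner.
candidates = {alpha: Ridge(alpha=alpha).fit(X_train, y_train)
              for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]}
best_alpha = min(
    candidates,
    key=lambda a: mean_squared_error(y_val, candidates[a].predict(X_val)))

# Retrain the best model on the FULL training set (train + validation) ...
final_model = Ridge(alpha=best_alpha).fit(X_full, y_full)
# ... and evaluate it exactly once on the test set.
print("test MSE:", mean_squared_error(y_test, final_model.predict(X_test)))
```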

This solution usually works quite well. However,

  • if the validation set is too small, then model evaluations will be imprecise: you may end up selecting a suboptimal model by mistake.
  • if the validation set is too large, then the remaining training set will be much smaller than the full training set. Why is this bad?

Well, since the final model will be trained on the full training set, it is not ideal to compare candidate models trained on a much smaller training set. It would be like selecting the fastest sprinter to participate in a marathon.

One way to solve this problem is to perform repeated cross-validation, using many small validation sets.

Cross-validation-

Each model is evaluated once per validation set after it is trained on the rest of the data. By averaging out all the evaluations of a model, you get a much more accurate measure of its performance.

There is a drawback, however: the training time is multiplied by the number of validation sets.
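For illustration, here is a minimal sketch of k-fold cross-validation using scikit-learn's cross_val_score (the model and alpha values are placeholder choices). With cv=5, each model is trained five times, once per held-out fold, so the training cost grows fivefold, as noted above:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Placeholder data.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() + rng.normal(scale=1.0, size=200)

# Average the five per-fold scores to compare hyperparameter values.
for alpha in [0.1, 1.0, 10.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y,
                             scoring="neg_mean_squared_error", cv=5)
    print(f"alpha={alpha}: mean MSE = {-scores.mean():.3f}")
```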

source- https://en.wikipedia.org/wiki/Cross-validation_(statistics)

Read Next- What are Data Mismatch and their potential solutions in Machine Learning [S1E17]

Summary Cheat Sheets, Notes, Flash Cards, Google Colab Notebooks, code, etc. will all be provided in further lessons as required.

Read through the whole ‘S1’ [ML0to100] series to learn about-

  • What Machine Learning is, what problems it tries to solve, and the main categories and fundamental concepts of its systems.
  • The steps in a typical Machine Learning project
  • Learning by fitting a model to data
  • Optimizing a cost function
  • Handling, cleaning and preparing data
  • Selecting and engineering features
  • Selecting a model and tuning hyperparameters using cross-validation
  • The challenges of Machine Learning, in particular, underfitting and overfitting (the bias/variance trade-off)
  • The most common learning algorithms: Linear and Polynomial Regression, Logistic Regression, k-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forests, and Ensemble methods
  • Reducing the dimensionality of the training data to fight the “curse of dimensionality”
  • Other unsupervised learning techniques, including clustering, density estimation, and anomaly detection

Part II, Neural Networks and Deep Learning, covers the following topics:

  • What neural nets are and what they’re good for
  • Building and training neural nets using TensorFlow and Keras
  • The most important neural net architectures: feedforward neural nets for tabular data, convolutional nets for computer vision, recurrent nets and long short-term memory (LSTM) nets for sequence processing, encoder/decoders and Transformers for natural language processing, and autoencoders and generative adversarial networks (GANs) for generative learning
  • Techniques for training deep neural nets
  • How to build an agent (e.g., a bot in a game) that can learn good strategies through trial and error, using Reinforcement Learning
  • Loading and preprocessing large amounts of data efficiently
  • Training and deploying TensorFlow models at scale

Disclaimer: This series is based on the notes that I created for myself from various books I have read, so some of the text could be an exact quote from a book out there. I would have cited the book, but even I don't know which book a given paragraph comes from, as these are compiled notes. This works out well for the reader, who gets the best of all the promising ML books on the market, compiled in one place.
