Fundamental concepts for Model Selection and Model Evaluation — Part 2

The intuition behind core concepts for addressing the central issues in machine learning.

Shubham Patel
Analytics Vidhya
7 min read · May 4, 2020

--

If you have not gone through Part 1 yet, I recommend reading it first. In Part 1, we talked about the basic concepts needed for selecting a better class of model. In this article, we will first discuss hyperparameters and then focus on concepts for Model Evaluation.

I am dropping the mind map here as a visual summary, but go through the article first and come back to it later.

Model Selection & Model Evaluation concepts — Mind Map

We will cover the following topics in this article.

  • The meaning and use of hyperparameters.
  • How hyperparameters differ from model parameters.
  • Hyperparameter Tuning.
  • Cross-Validation strategies (K-Fold CV).
  • GridSearchCV.

Hyperparameter Tuning

As we discussed in the last article, overfitting of a model can be reduced by Regularization and Hyperparameter Tuning. We covered Regularization there; in this article, we will look at hyperparameter tuning.

Hyperparameter Tuning is the process of finding the optimum value of a hyperparameter to control the model’s complexity so that it won’t overfit. We will see how to achieve this, but before moving forward let’s clear up some basic concepts first.

Parameters & Hyperparameters

Parameters are the values a model needs to learn in order to make predictions. For example, in the case of regression, the coefficients of the variables are learned from the data, and these values are called parameters.

A hyperparameter is different: do note that it is not going to be a part of your final model. Instead, a hyperparameter is something the learning algorithm uses to build the model.

Let’s see exactly what the hyperparameters are for different classes of models.

  • For regression models, recall that their complexity increases with the number of variables. Here the number of variables is our hyperparameter, and we need to find its optimum value.
  • For tree models such as Decision Trees, it is the depth of the tree that affects the complexity of the model. So here the depth of the tree is the hyperparameter which needs to be optimised.
  • For neural networks, it is the number of connections that affects the complexity; hence dropout is used, and the percentage of neurons to be dropped can be considered a hyperparameter.

These are some examples of hyperparameters. Please be clear that a hyperparameter is not something that can be known in advance, nor is its value fixed; it must be found by trying out different values and seeing which fits best.
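
To make the distinction concrete, here is a minimal sketch (assuming scikit-learn; the Ridge regressor and the toy dataset are my own choices for illustration, not part of this article’s running example): the hyperparameter alpha is set before learning, while the coefficients are learned from the data.

# Minimal sketch: hyperparameter vs parameters (assumes scikit-learn is installed).
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# A toy regression dataset, purely for illustration.
X, y = make_regression(n_samples=200, n_features=10, noise=10, random_state=42)

# alpha is a hyperparameter: we choose it before training; the model never learns it.
model = Ridge(alpha=1.0)
model.fit(X, y)

# coef_ and intercept_ are parameters: the learning algorithm estimated them from the data.
print("Learned coefficients (parameters):", model.coef_)
print("Learned intercept (parameter):", model.intercept_)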

We emphasise the fact that we need to find an optimal hyperparameter value. Let’s see how we can do that.

Cross-validation strategy

Wikipedia definition

Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set.

Let’s understand it with an example: assume we are solving a linear regression problem where we want to predict “y” based on the 50 independent variables we have. Here the number of variables to use in our model is a hyperparameter, and our goal is to find its optimum value.

For model building, we need to figure out two things:

  • The number of variables to use.
  • Which variables to use?
    - For this, we can use RFE (Recursive Feature Elimination). We won’t cover RFE in this article; however, you can go through the links in the “Further Reading” section.

One approach could be to use the Hold-Out strategy to split the data into a training and a test set, and then try building models for different values of the hyperparameter.

Illustration of Hold-Out Strategy
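
As a rough sketch of this Hold-Out approach (assuming scikit-learn; using RFE with a plain linear regression is just one possible way to pick k variables, not something prescribed here), we could loop over a few candidate values and score each model on the same test set:

# Hold-Out sketch: every candidate k is scored on the same test set,
# which is exactly the "peeking" problem discussed below.
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=50, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for k in [5, 10, 20, 30, 50]:
    # Keep the k "best" variables, then fit a linear regression on them.
    rfe = RFE(LinearRegression(), n_features_to_select=k)
    rfe.fit(X_train, y_train)
    print(f"k={k:2d}  test R^2 = {rfe.score(X_test, y_test):.3f}")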

But there are problems in this approach.

Problem 1 — We break the basic rule against peeking into the test set by evaluating our models on the same test set for different values of the hyperparameter.

Problem 2 — The results will depend on one specific train and test split. What if our data distribution is not uniform, i.e. our test split has a different distribution than the training set?

Problem 3 — We are doing this process manually; a lot of effort would be needed to test different models over a large range of hyperparameter values.

Let’s see how can we tackle these problems.

Solving Problem 1 (peeking into the test set) & Problem 2 (dependency on a specific data split)

The problem of peeking into the test data can be avoided by breaking the dataset into train, validation and test sets, so that we can iteratively validate our model on the validation set instead of the test set. But do notice that with this approach we end up with less data for training, and we rarely have abundant data available. So we have another problem to solve (but only if we have limited data; otherwise you are good to go with a plain validation set).
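
A simple way to get such a three-way split (sketched here with scikit-learn’s train_test_split; the exact proportions are just an assumption) is to hold out the test set first and then split the remainder into train and validation:

# Train / validation / test split sketch: the test set is locked away and
# only the validation set is used while tuning the hyperparameter.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=50, noise=10, random_state=42)

# First hold out 20% as the test set, then carve out a validation set
# from the remaining data (0.25 of the remaining 80% = 20% of the whole).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # roughly 60% / 20% / 20% of the data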

This is where the Cross-Validation strategy can be used to avoid eating further into the training data. We will look at the most common cross-validation technique, K-Fold Cross-Validation. Refer to the links in the “Further Reading” section for other validation strategies.

In K-Fold Cross-Validation, we divide the dataset into a training and a test set only, and the training set is further divided into K folds; let’s take K as 4 here.

Illustration of K-Fold Cross-Validation. Do note that the test data shown above is actually a validation set sampled from the training data.

Now, instead of validating a single model on the same held-out set, we can train multiple models, each validated on a different fold. The results of a K-Fold Cross-Validation run are often summarised with the mean of the model scores, as shown in the above image.
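
In code, a K-Fold run for one candidate value of the hyperparameter could look like the sketch below (assuming scikit-learn; K = 4 as in the illustration, and 10 variables picked via RFE purely as an example). The real test set is assumed to have been held out beforehand and is never touched here.

# K-Fold Cross-Validation sketch: train and validate 4 times, then average the scores.
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Stand-in for the training portion of the data (test set already locked away).
X_train, y_train = make_regression(n_samples=500, n_features=50, noise=10, random_state=42)

model = RFE(LinearRegression(), n_features_to_select=10)  # one candidate hyperparameter value
scores = cross_val_score(model, X_train, y_train, cv=4, scoring="r2")
print("Fold scores:", scores)
print("Mean CV score:", scores.mean())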

Peeking is a consequence of using test-set performance to both choose a hypothesis and evaluate it. The way to avoid this is to really hold the test set out — lock it away until you are completely done with learning and simply wish to obtain an independent evaluation of the final hypothesis. (And then, if you don’t like the results … you have to obtain, and lock away, a completely new test set if you want to go back and find a better hypothesis.)

— Quoted by Jason Brownlee in his article, from Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd edition, 2009, page 709.

Using K-Fold Cross-Validation we have solved both problems.

  • Problem 1 — we avoided peeking into the test set by iteratively training and validating our model on multiple folds of the training set.
  • Problem 2 — our model is trained and validated on different data splits, which removes the dependency of the results on one specific split.

Solving Problem 3 (manual efforts in hyperparameter tuning)

We know how to avoid peeking into the test set and how to remove the dependency on a specific data split. Now we need some way to automate the hyperparameter tuning process; clearly, when we have variables in the range of 100–150, it won’t be possible to carry out this process manually.

This is where we can use the Grid Search technique; it automates the process of finding the optimum hyperparameter value for your model.

Illustration of GridSearchCV

GridSearchCV is a term that refers to the combination of Grid Search and Cross-Validation; here we need to provide GridSearchCV with a range of values for our hyperparameter.

See the above image for more clarity; it shows the hyperparameter tuning process for values ranging from 1 to 50 using GridSearchCV.
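
A hedged sketch of that process with scikit-learn’s GridSearchCV is shown below: the grid covers 1 to 50 variables, cross-validated with 4 folds, and RFE wrapped around a linear regression is again only my assumed way of selecting the variables.

# GridSearchCV sketch: try every candidate number of variables with 4-fold CV,
# pick the best one, and only then evaluate once on the locked-away test set.
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=500, n_features=50, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

param_grid = {"n_features_to_select": list(range(1, 51))}
search = GridSearchCV(RFE(LinearRegression()), param_grid, cv=4, scoring="r2")
search.fit(X_train, y_train)

print("Best number of variables:", search.best_params_)
print("Best mean CV score:", search.best_score_)
print("Final test score:", search.score(X_test, y_test))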

That’s all, folks! Now I recommend you go through the visual summary (mind map) once.

I hope you got the intuition behind these fundamental concepts and understood how we can tackle common challenges when solving machine learning problems.

Summary

  • We saw what we mean by a hyperparameter and how it differs from model parameters.
  • Hyperparameters are not a part of our final model, but they are used by the learning algorithm to control the complexity of (regularise) the model.
  • Hyperparameters need to be tuned, and we saw how we can do that while avoiding some fundamental problems.
  • The Cross-Validation technique can be used to solve the problem of peeking into the test set.
  • The Grid Search technique can be combined with Cross-Validation to automate the tedious process of validating our model over different hyperparameter values.
