HYPERPARAMETERS - LET'S TUNE

Tanmay Ghodeswar
Published in ADGVIT
8 min read · Aug 3, 2021

One would think a Machine Learning model consists of just two elements:

1. Input data (also called training data) which is a collection of individual records (instances) containing the features important to your machine learning problem.

2. The parameters, which are the variables that your chosen machine learning technique adjusts to fit your data.

Hyperparameters are often NOT given the importance they deserve!

Hyperparameters are the variables that govern the training process itself. For example, part of setting up a deep neural network is deciding how many hidden layers of nodes to use between the input layer and the output layer, and how many nodes each layer should use. These variables are not directly related to the training data. They are configuration variables.

One thing to note is that parameters change during a training job, while hyperparameters are usually constant during a job.

💡The dataset used is Breast Cancer Wisconsin (Diagnostic) Data Set.

It is a dataset that requires us to apply a binary classification algorithm to the given features to predict whether the cancer is Benign (B) or Malignant (M).

📖Dataset link

You might want to go through the information in the aforementioned link if you want a clear understanding of what the features of the dataset are and what we are trying to achieve here.

For simplicity, we have already pre-processed the data, which includes encoding the data, separating the dependent and independent variables, and splitting them into training and test sets.

Data Preprocessing
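Since the preprocessing code appears only as an image, here is a minimal sketch of those steps; the file name and column names (data.csv, id, diagnosis) are assumptions about the Kaggle copy of the dataset and may need adjusting.

```python
# Minimal preprocessing sketch (assumed file/column names from the Kaggle CSV).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.read_csv("data.csv").dropna(axis=1, how="all")   # drop any empty trailing column

# Encode the target: Benign (B) -> 0, Malignant (M) -> 1
y = LabelEncoder().fit_transform(df["diagnosis"])

# Independent variables: everything except the id and target columns
X = df.drop(columns=["id", "diagnosis"], errors="ignore")

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the features (helps Logistic Regression in particular)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```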

Next, for the model, we used the Random Forest classification and Logistic Regression algorithms (yes, both of them) through the Scikit-Learn library. We used two models to get better clarity while tuning the hyperparameters using GridSearchCV and RandomizedSearchCV.

Visualization of Random Forest Classification

Where the model parameters specify how to transform the input data into the desired output, the hyperparameters define how our model is actually structured.

Model Training using Random Forest and Logistic Regression
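A rough sketch of the two baseline models with default hyperparameters, assuming the train/test splits from the preprocessing step above:

```python
# Baseline models before any hyperparameter tuning.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

rf_clf = RandomForestClassifier(random_state=0)
rf_clf.fit(X_train, y_train)

log_clf = LogisticRegression(random_state=0, max_iter=1000)
log_clf.fit(X_train, y_train)

# Evaluate both baselines on the held-out test set
for name, clf in [("Random Forest", rf_clf), ("Logistic Regression", log_clf)]:
    print(name)
    print(classification_report(y_test, clf.predict(X_test)))
```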

Unfortunately, there is no universal rule for deciding how a given hyperparameter (for example, the learning rate α of gradient descent, a method typically used in linear regression) should be updated to reduce the loss and find the optimal model architecture. Thus, we generally resort to experimentation to figure out what works best.

Let’s have a look at some tuning methods:

Grid Search. Define a search space as a grid of hyperparameter values and evaluate every position in the grid.

Random Search. Define a search space as a bounded domain of hyperparameter values and randomly sample points in that domain.

TPOT: TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Bayesian Optimization: Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate.

📝GridSearchCV:

Grid search is arguably the most basic hyperparameter tuning method. With this technique, we simply build a model for each possible combination of the hyperparameter values provided, evaluate each model, and select the architecture that produces the best results.

Visualization of Grid Layout

To use grid search, all parameters must be of type INTEGER, CATEGORICAL, or DISCRETE.

Syntax for GridSearchCV

It runs through all the different parameters that are fed into the parameter grid and produces the best combination of parameters, based on a scoring metric of your choice (accuracy, f1, etc.).
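Since the actual call is shown only as a screenshot, here is a rough sketch of what that GridSearchCV setup can look like for the Random Forest Classifier; the candidate values in the grid are assumptions for illustration, not the exact grid from the article.

```python
# Hypothetical grid for illustration; the article's exact grid is not reproduced here.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

param_grid = {
    "n_estimators": [10, 50, 100, 200],   # integer-valued hyperparameter
    "criterion": ["gini", "entropy"],     # categorical hyperparameter
}

grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="accuracy",   # any scoring metric of your choice
    cv=5,                 # 5-fold cross-validation per combination
)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)   # the article reports n_estimators=50 as the best value
print(classification_report(y_test, grid_search.best_estimator_.predict(X_test)))
```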

The exhaustive search identified the best parameter for our Random Forest Classifier to be n_estimators=50. The classification report for the Random Forest Classifier is:

GridSearchCV for Random Forest Classifier
Classification Report for Random Forest Classifier using GridSearchCV

Similarly, when we apply GridSearchCV to Logistic Regression, we identify the best parameter to be C=5. The classification report for Logistic Regression is:

GridSearchCV for Logistic Regression
Classification Report for Logistic Regression using GridSearchCV
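The same idea applies to Logistic Regression; the grid of C values below is again an assumption chosen only to illustrate the search.

```python
# Hypothetical grid of regularisation strengths; the article reports C=5 as the best value.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

log_grid = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1, 5, 10, 100]},
    scoring="accuracy",
    cv=5,
)
log_grid.fit(X_train, y_train)
print(log_grid.best_params_)
```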

Here is a breakdown of what GridSearchCV did in the blocks above:

1. estimator: estimator object being used

2. param_grid: dictionary that contains all of the parameters to try

3. scoring: evaluation metric to use when ranking results

4. cv: cross-validation, the number of cv folds for each combination of parameters

The estimator object, in this case the Random Forest Classifier or Logistic Regression, must be set up appropriately, with the data scaled according to its distribution and the type of classifier being used. The scoring metric can be any metric of your choice; however, just like the estimator object, it should be chosen based on the type of problem the project is trying to solve. The other two parameters of the grid search are where its limitations come into play.

Each model would be fit to the training data and evaluated on the validation data. As you can see, this is an exhaustive sampling of the hyperparameter space and can be quite inefficient.

📝RandomizedSearchCV:

Random search differs from grid search in that we no longer provide a discrete set of values to explore for each hyperparameter; rather, we provide a statistical distribution for each hyperparameter from which values may be randomly sampled.

Visualization of Random Layout
Syntax for RandomizedSearchCV

In contrast to GridSearchCV, not all parameter values are tried out, but rather a fixed number of parameter settings is sampled from the specified distributions.
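As a rough sketch (the distributions and n_iter are assumptions, chosen only to show sampling instead of exhaustive search), RandomizedSearchCV can be set up like this for both models:

```python
# Sample hyperparameter values from distributions instead of a fixed grid.
from scipy.stats import randint, loguniform
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Random Forest: sample n_estimators from a discrete range
rf_random = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(10, 200)},
    n_iter=20,            # number of sampled parameter settings
    scoring="accuracy",
    cv=5,
    random_state=0,
)
rf_random.fit(X_train, y_train)
print(rf_random.best_params_)

# Logistic Regression: sample C on a log scale
log_random = RandomizedSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-2, 1e2)},
    n_iter=20,
    scoring="accuracy",
    cv=5,
    random_state=0,
)
log_random.fit(X_train, y_train)
print(log_random.best_params_)
```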

When applied to the Random Forest Classifier, the best parameter turns out to be n_estimators=50. The classification report for the Random Forest Classifier is:

RandomizedSearchCV for Random Forest Classifier
Classification Report for Random Forest Classifier using RandomizedSearchCV

And when applied to Logistic Regression, the best parameter turns out to be C=100.

The classification report for Logistic Regression is:

RandomizedSearchCV for Logistic Regression
Classification Report for Logistic Regression using RandomizedSearchCV

One of the main theoretical motivations for using random search in place of grid search is the fact that, in most cases, hyperparameters are not equally important.

Random search works best under that assumption. While it isn't always true, it holds for most datasets.

📝TPOT

It makes use of the popular Scikit-Learn machine learning library for data transforms and machine learning algorithms and uses a Genetic Programming stochastic global search procedure to efficiently discover a top-performing model pipeline for a given dataset.

To use this, we install a Python library known as TPOT and import TPOTClassifier from it.

Now we fit the TPOTClassifier model to our independent and dependent variables. The classifier will then directly give us the best possible pipeline and set of parameters for the dataset, i.e. those that give the best accuracy.

TPOT
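For reference, a minimal TPOTClassifier sketch is given below; generations, population_size and the other settings are assumptions rather than the exact values used in the article.

```python
# Genetic-programming search over whole pipelines with TPOT.
from tpot import TPOTClassifier
from sklearn.metrics import classification_report

tpot = TPOTClassifier(
    generations=5,        # number of evolution rounds (assumed, kept small)
    population_size=20,   # pipelines per generation (assumed)
    cv=5,
    scoring="accuracy",
    random_state=0,
    verbosity=2,
)
tpot.fit(X_train, y_train)

print(tpot.score(X_test, y_test))
print(classification_report(y_test, tpot.predict(X_test)))

# Optionally export the best pipeline found as a Python script
tpot.export("best_pipeline.py")
```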

The classification report of the classifier given by TPOT is:

Classification Report for using TPOT

📝Bayesian Optimization

Bayesian Optimization is used in machine learning to tune the hyperparameters of a given model on a validation dataset.

To understand Bayesian Optimization, we first need to understand what global optimization means.

Global optimization is the problem of finding a set of inputs that gives the minimum or maximum value of a given objective function.

Bayesian Optimization provides a well-defined technique based on Bayes Theorem to initiate a search of a global optimization problem that is efficient and effective. It works by building a probabilistic model of the objective function, called the surrogate function, that is then searched efficiently with an acquisition function before candidate samples are chosen for evaluation on the real objective function.

Formula for Bayes' Theorem used in Bayesian Optimization
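For reference, Bayes' theorem can be written as:

P(A | B) = P(B | A) · P(A) / P(B)

In Bayesian Optimization this is usually read as: the posterior belief about the objective function, given the evaluations made so far, is proportional to the likelihood of those evaluations under a candidate function times the prior over functions.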

There are many libraries that can help you perform Bayesian Optimization, but here we chose to use the HyperOpt-Sklearn library. You can use the link below to install it.

💡HyperOpt-Sklearn library link-

Once installed, from hpsklearn we import HyperoptEstimator, any_classifier, and any_preprocessing, and from the hyperopt library we import tpe.

So now we apply Bayesian Optimization on the dataset and extract the best classifier.

Bayesian Optimization
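A sketch of how that can be wired up with HyperoptEstimator; max_evals and trial_timeout are assumptions chosen to keep the search short.

```python
# Bayesian-style search over classifiers and preprocessing with HyperOpt-Sklearn.
from hpsklearn import HyperoptEstimator, any_classifier, any_preprocessing
from hyperopt import tpe
from sklearn.metrics import classification_report

estim = HyperoptEstimator(
    classifier=any_classifier("clf"),
    preprocessing=any_preprocessing("pre"),
    algo=tpe.suggest,      # Tree-structured Parzen Estimator search
    max_evals=50,          # number of candidate evaluations (assumed)
    trial_timeout=60,      # seconds per trial (assumed)
)
estim.fit(X_train, y_train)

print(estim.score(X_test, y_test))
print(estim.best_model())   # inspect the winning classifier and preprocessing
print(classification_report(y_test, estim.predict(X_test)))
```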

The classification report of the classifier obtained after using the Bayesian Optimization is:

Classification Report for using Bayesian Optimization

⚔Conclusion

Going through the different tuning methods, Bayesian Optimization comes out best on this dataset: its score is 1.0 and there are no false negatives or false positives in the classification report (0% error). However, that might not be the case on every dataset. Even here, although the TPOT classifier would generally be expected to find a pipeline that gives good predictions, we notice that the random and grid search methods give better tuning. Finally, as time passes and new and better technologies come into existence that make these tasks easier, the learning process should never stop!

For a basic understanding of the terms True Negative, True Positive, False Positive, and False Negative:

💡If you wish to go through the whole code, follow the GitHub repo link given below:

🖋Authors

Tanmay Ghodeswar

Ankit Sidana
