#07 Hyperparameter Tuning: how to improve model accuracy drastically

The hyperparameter optimization problem

Akira Takezawa
Coldstart.ml
6 min read · Feb 10, 2019


Hola! Welcome to the #ShortcutML Series! A cheat note for everyone!

This is for anyone who wants to know …

  • Reason: applying a model isn’t the end of ML
  • Big Picture: a summary of the main tuning methods
  • Code: the simplest Python code for each method

— — —

Why should you read this?

In short, a hyperparameter is a setting in the ML modeling process that has to be decided manually, by a human.

Put more concretely: if you are familiar with scikit-learn, hyperparameters are the arguments you pass inside the parentheses when constructing a model:

# example
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(max_depth=5)  # <- max_depth is a hyperparameter!

Depending on the combination of hyperparameters, the decision boundary produced by your estimator changes, as in the figure below:

(Figure source: https://qiita.com/sz_dr/items/f3d6630137b184156a67)
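To see this effect yourself, here is a minimal sketch that fits the same DecisionTreeClassifier with two different max_depth values and compares test accuracy. The iris dataset here is my own assumption for illustration; any small classification dataset would do:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# iris is an illustrative assumption, not from the original post
X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_iris, y_iris, random_state=0)

for depth in [1, 5]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_tr, y_tr)
    # a different max_depth yields a differently shaped decision
    # boundary, which shows up as a different test accuracy
    print(depth, model.score(X_te, y_te))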

I hope this makes the importance of hyperparameter tuning a little clearer. Don’t worry, it isn’t a complicated idea, and you only need to remember three methods, each just a few lines of code!

Let’s get started.

— — —

Menu

  1. Comprehensive List of Hyperparameters in ML
  2. 3 ways to optimize Hyperparameters in ML

1. Comprehensive List of Hyperparameters in ML


Broadly speaking, there are many hyperparameters across ML implementations. Here are some examples (a short code illustration follows the list):

  • Regularization in SVM
  • Depth in Decision Tree
  • The number of trees in Random Forest
  • The algorithm of Gradient Descent
  • Kernel for SVM
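In scikit-learn, each of these appears as a constructor argument; the values below are illustrative, not recommendations:

from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier

SVC(C=1.0, kernel="rbf")                  # regularization (C) and kernel for SVM
DecisionTreeClassifier(max_depth=5)       # depth of a decision tree
RandomForestClassifier(n_estimators=100)  # number of trees in a random forest
SGDClassifier(learning_rate="optimal")    # gradient descent schedule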

2. 3 ways to optimize Hyperparameters

(Figure source: https://www.rco.recruit.co.jp/career/engineer/blog/44/)

What is Regularization?

There are two main hyperparameters for an SVM:

  • C: the soft-margin cost parameter, which controls the influence of each individual support vector and trades error penalty against stability.
  • gamma: the width of the Gaussian radial basis function kernel. Technically, a large gamma leads to low-bias, high-variance models (prone to overfitting), and vice versa; the sketch below makes this concrete.
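To make the gamma trade-off concrete, here is a minimal sketch. The iris dataset is again my own assumption, chosen purely for illustration:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# iris is an illustrative assumption, not from the original post
X_iris, y_iris = load_iris(return_X_y=True)

for gamma in [0.001, 100]:
    model = SVC(gamma=gamma)
    cv_score = np.mean(cross_val_score(model, X_iris, y_iris, cv=5))
    train_score = model.fit(X_iris, y_iris).score(X_iris, y_iris)
    # a large gamma tends to push the training score up while the
    # cross-validated score drops: low bias, high variance (overfitting)
    print(gamma, round(train_score, 3), round(cv_score, 3))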

And now we have 3 ways to tune these two hyperparameters for SVM:

  1. Grid search
  2. Random search
  3. Bayesian optimization

1. Grid search

Step 1. Model Selection: choose an ML model as an estimator. This time I use SVM.

from sklearn.svm import SVC
estimator = SVC()
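The snippets below use X_train and y_train (and, later, X and y), which never get defined in the post. A minimal setup, assuming scikit-learn’s built-in digits dataset since the post mentions a digits classification task later:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# digits is assumed here because the post refers to a digits task below
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)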

Step 2. Hyperparameter Listing: list every value you want to test for each hyperparameter.

# scikit-learn defaults: C=1.0, kernel='rbf', gamma='auto_deprecated'
hparams = {"C": [0.1, 1, 10, 100],
           "kernel": ["linear", "rbf", "poly"],
           "gamma": [0.001, 0.0001]}

Step 3. Grid Search Time: put the estimator and the hyperparameter list into GridSearchCV.

Here you can also set cv, the number of cross-validation splits, and scoring, the evaluation metric. Grid search then exhaustively evaluates all 4 × 3 × 2 = 24 combinations, each with 5-fold cross-validation.

from sklearn.model_selection import GridSearchCV
GS_estimator = GridSearchCV(estimator, hparams, cv=5, scoring="accuracy")
GS_estimator.fit(X_train, y_train)
print(GS_estimator.best_params_)
>>> {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'} # best parameter
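Besides best_params_, GridSearchCV also exposes the best cross-validated score and a model already refit on the full training set, which you can use directly:

print(GS_estimator.best_score_)             # best mean cross-validated accuracy
best_model = GS_estimator.best_estimator_   # refit on the whole training set
y_pred = best_model.predict(X_test)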

2. Random search

Random search samples a fixed number of hyperparameter combinations at random instead of exhaustively trying every one; the scikit-learn API mirrors grid search:

from sklearn.model_selection import RandomizedSearchCV
RS_estimator = RandomizedSearchCV(estimator, hparams, cv=5, scoring="accuracy", random_state=1)
RS_estimator.fit(X_train, y_train)
RS_estimator.best_params_
>>> {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'} # same result!
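Unlike grid search, random search only evaluates n_iter sampled combinations, so you control the budget, and you can hand it continuous distributions instead of fixed lists. A sketch, assuming a reasonably recent scipy:

from scipy.stats import loguniform  # requires scipy >= 1.4
from sklearn.model_selection import RandomizedSearchCV

hparam_dists = {"C": loguniform(1e-2, 1e3),
                "gamma": loguniform(1e-5, 1e-1),
                "kernel": ["linear", "rbf", "poly"]}
RS_estimator = RandomizedSearchCV(estimator, hparam_dists, n_iter=20,
                                  cv=5, scoring="accuracy", random_state=1)
RS_estimator.fit(X_train, y_train)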

3. Bayesian optimization

(Further reading: Shallow Understanding on Bayesian Optimization)

This one is a next-level method. The drawback of the two methods above is that they are computationally expensive: in machine learning, computational efficiency is always a big issue. Bayesian optimization is an alternative that addresses this problem.

Bayesian optimization imitates human inference: it uses the results of previous trials to decide which hyperparameter values are most promising to try next.

Keywords:

  • Local Optimization
  • Global Optimization
  • Sequential Optimization

For simplicity, I will use the bayesian-optimization library rather than scikit-learn. You can install it with pip:

pip install bayesian-optimization

Again I use a Support Vector Machine as the model, this time for the digits classification task.

It will take several steps, but don’t forget that our final goal is to get the best hyperparameters for our classification model.

Here is a breakdown of the logical process:

  1. Goal: create an estimator (model) with the highest accuracy
  2. How: get the best set of hyperparameters
  3. How: try multiple combinations of hyperparameters and observe the accuracy score
  4. How: select the set of hyperparameters with the best accuracy

First, I define the objective function around an SVM estimator, so that it returns an accuracy score to maximize. For the sake of generalization, it uses 10-fold cross-validation and returns the mean of the ten test scores.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

def estimator(C, gamma):
    # initialize the model with the candidate hyperparameters
    model = SVC(C=C, gamma=gamma, degree=1, random_state=0)
    # evaluate it with 10-fold cross-validation
    result = cross_validate(model, X, y, cv=10)
    # return the mean test score across the 10 folds
    return np.mean(result['test_score'])
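Before handing the objective to the optimizer, it is worth a quick sanity check that it runs and returns a score (the exact value depends on your data):

print(estimator(C=1.0, gamma=0.001))  # mean 10-fold CV accuracy for one candidate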

Next, we define the search range for the two SVM hyperparameters. Unlike the grid above, these are continuous bounds that the optimizer can sample anywhere inside.

hparams = {"C": (0.0001, 10000), "gamma": (0.0001, 10000)}

Now we attach a hyperparameter optimization algorithm to our model.

from bayes_opt import BayesianOptimization
# hand the objective function and the hyperparameter bounds to the optimizer
svc_bayesopt = BayesianOptimization(estimator, hparams)

Now we run the optimization and watch the trial process. We have to set the following parameters for our hyperparameter tuning run:

  • init_points: number of function parameter sets to pick up at first
  • n_iter: number of estimation trials
  • acq: the acquisition function used to estimate the most promising next step

# maximize runs the optimization loop
svc_bayesopt.maximize(init_points=5, n_iter=10, acq='ucb')
>>> #output
| iter | target | C | gamma |
-------------------------------------------------
| 1 | 0.4 | 9.138e+0 | 7.418e+0 |
| 2 | 0.4 | 4.77e+03 | 718.1 |
| 3 | 0.4 | 3.526e+0 | 2.886e+0 |
| 4 | 0.4 | 5.459e+0 | 5.718e+0 |
| 5 | 0.4 | 9.996e+0 | 1.432e+0 |
| 6 | 0.4 | 36.27 | 9.986e+0 |
| 7 | 0.9267 | 7.203 | 10.88 |
| 8 | 0.4 | 1.681 | 938.9 |
| 9 | 0.4 | 5.272e+0 | 1e+04 |
| 10 | 0.4 | 1e+04 | 1e+04 |
| 11 | 0.9867 | 2.096e+0 | 0.0001 |
| 12 | 0.4 | 1.948e+0 | 7.072e+0 |
| 13 | 0.4 | 1e+04 | 4.46e+03 |
| 14 | 0.98 | 1.096e+0 | 0.0001 |
| 15 | 0.4 | 2.549e+0 | 1e+04 |

Finally, we get the answer: the hyperparameter set that scored the best accuracy.

print(svc_bayesopt.max)
>>> {'params': {'C': 9402.684872249694, 'gamma': 0.0001}, 'target': 0.9800000000000001}

We’ve done it! Having identified the best hyperparameters for our model, we simply apply them, and we have our final estimator.

# example: refit with the tuned hyperparameters
from sklearn.metrics import mean_squared_error

final_model = SVC(C=9402.684872249694, gamma=0.0001)
final_model.fit(X_train, y_train)
y_pred = final_model.predict(X_test)
# measure the error of the estimator with the RMSE metric
np.sqrt(mean_squared_error(y_test, y_pred))
>>> 0.05..
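Two small asides, as sketches rather than fixes: since this is a classification task, accuracy_score is arguably a more natural metric than RMSE, and you can read the tuned values out of the optimizer instead of copying them by hand:

from sklearn.metrics import accuracy_score

# read the tuned values programmatically instead of hard-coding them
best = svc_bayesopt.max["params"]
final_model = SVC(C=best["C"], gamma=best["gamma"])
final_model.fit(X_train, y_train)

# accuracy is the usual metric for a classification task
print(accuracy_score(y_test, final_model.predict(X_test)))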
