#07 Hyperparameter Tuning: how to improve model accuracy drastically

The hyperparameter optimization problem

Akira Takezawa
Coldstart.ml
6 min read · Feb 10, 2019


Hola! Welcome to the #ShortcutML Series! A cheat note for everyone!

This is for anyone who wants to know …

  • Reason: applying a model isn’t the end of ML
  • Big Picture: a summary of the main tuning methods
  • Code: the simplest Python code for each method

— — —

Why should you read this?

In short, a hyperparameter is a setting in the ML modeling process that has to be decided manually, by a human.

Put more concretely: if you are familiar with scikit-learn, hyperparameters are the arguments you pass inside the parentheses when constructing a model:

# example
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(max_depth=5)  # <- max_depth is a hyperparameter!

Depending on the combination of hyperparameters, the decision boundary produced by your estimator changes, as in the figure below:

(Figure source: https://qiita.com/sz_dr/items/f3d6630137b184156a67)
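To see this effect yourself, here is a minimal sketch that fits the same DecisionTreeClassifier with two different max_depth values and compares test accuracy. The iris dataset here is my own assumption for illustration; any small classification dataset would do:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# iris is an illustrative assumption, not from the original post
X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_iris, y_iris, random_state=0)

for depth in [1, 5]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_tr, y_tr)
    # a different max_depth yields a differently shaped decision
    # boundary, which shows up as a different test accuracy
    print(depth, model.score(X_te, y_te))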

I hope this makes the importance of hyperparameter tuning a little clearer. Don’t worry, it isn’t a complicated idea, and you only need to remember three methods, each just a few lines of code!

Let’s get started.

— — —

Menu

  1. Comprehensive List of Hyperparameters in ML
  2. 3 ways to optimize Hyperparameters in ML

1. Comprehensive List of Hyperparameters in ML


Broadly speaking, there are many hyperparameters across ML implementations. Here are some examples (a short code illustration follows the list):

  • Regularization in SVM
  • Depth in Decision Tree
  • The number of trees in Random Forest
  • The algorithm of Gradient Descent
  • Kernel for SVM
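In scikit-learn, each of these appears as a constructor argument; the values below are illustrative, not recommendations:

from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier

SVC(C=1.0, kernel="rbf")                  # regularization (C) and kernel for SVM
DecisionTreeClassifier(max_depth=5)       # depth of a decision tree
RandomForestClassifier(n_estimators=100)  # number of trees in a random forest
SGDClassifier(learning_rate="optimal")    # gradient descent schedule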

2. 3 ways to optimize Hyperparameters

(Figure source: https://www.rco.recruit.co.jp/career/engineer/blog/44/)

What is Regularization?

There are two main hyperparameters for an SVM:

  • C: the soft-margin cost parameter, which controls the influence of each individual support vector and trades error penalty against stability.
  • gamma: the width of the Gaussian radial basis function kernel. Technically, a large gamma leads to low-bias, high-variance models (prone to overfitting), and vice versa; the sketch below makes this concrete.
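To make the gamma trade-off concrete, here is a minimal sketch. The iris dataset is again my own assumption, chosen purely for illustration:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# iris is an illustrative assumption, not from the original post
X_iris, y_iris = load_iris(return_X_y=True)

for gamma in [0.001, 100]:
    model = SVC(gamma=gamma)
    cv_score = np.mean(cross_val_score(model, X_iris, y_iris, cv=5))
    train_score = model.fit(X_iris, y_iris).score(X_iris, y_iris)
    # a large gamma tends to push the training score up while the
    # cross-validated score drops: low bias, high variance (overfitting)
    print(gamma, round(train_score, 3), round(cv_score, 3))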

And now we have 3 ways to tune these two hyperparameters for SVM:

  1. Grid search
  2. Random search
  3. Bayesian optimization

1. Grid search

Step 1. Model Selection: choose an ML model as an estimator. This time I use SVM.

from sklearn.svm import SVC
estimator = SVC()
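The snippets below use X_train and y_train (and, later, X and y), which never get defined in the post. A minimal setup, assuming scikit-learn’s built-in digits dataset since the post mentions a digits classification task later:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# digits is assumed here because the post refers to a digits task below
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)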

Step 2. Hyperparameter Listing: list every value you want to test for each hyperparameter.

# scikit-learn defaults: C=1.0, kernel='rbf', gamma='auto_deprecated'
hparams = {"C": [0.1, 1, 10, 100],
           "kernel": ["linear", "rbf", "poly"],
           "gamma": [0.001, 0.0001]}

Step 3. Grid Search Time: put the estimator and the hyperparameter list into GridSearchCV.

Here you can also set cv, the number of cross-validation splits, and scoring, the evaluation metric. Grid search then exhaustively evaluates all 4 × 3 × 2 = 24 combinations, each with 5-fold cross-validation.

from sklearn.model_selection import GridSearchCV
GS_estimator = GridSearchCV(estimator, hparams, cv=5, scoring="accuracy")
GS_estimator.fit(X_train, y_train)
print(GS_estimator.best_params_)
>>> {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'} # best parameter
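Besides best_params_, GridSearchCV also exposes the best cross-validated score and a model already refit on the full training set, which you can use directly:

print(GS_estimator.best_score_)             # best mean cross-validated accuracy
best_model = GS_estimator.best_estimator_   # refit on the whole training set
y_pred = best_model.predict(X_test)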

2. Random search

Random search samples a fixed number of hyperparameter combinations at random instead of exhaustively trying every one; the scikit-learn API mirrors grid search:

from sklearn.model_selection import RandomizedSearchCV
RS_estimator = RandomizedSearchCV(estimator, hparams, cv=5, scoring="accuracy", random_state=1)
RS_estimator.fit(X_train, y_train)
RS_estimator.best_params_
>>> {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'} # same result!
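Unlike grid search, random search only evaluates n_iter sampled combinations, so you control the budget, and you can hand it continuous distributions instead of fixed lists. A sketch, assuming a reasonably recent scipy:

from scipy.stats import loguniform  # requires scipy >= 1.4
from sklearn.model_selection import RandomizedSearchCV

hparam_dists = {"C": loguniform(1e-2, 1e3),
                "gamma": loguniform(1e-5, 1e-1),
                "kernel": ["linear", "rbf", "poly"]}
RS_estimator = RandomizedSearchCV(estimator, hparam_dists, n_iter=20,
                                  cv=5, scoring="accuracy", random_state=1)
RS_estimator.fit(X_train, y_train)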

3. Bayesian optimization

(Further reading: Shallow Understanding on Bayesian Optimization)

This one is a next-level method. The drawback of the two methods above is that they are computationally expensive: in machine learning, computational efficiency is always a big issue. Bayesian optimization is an alternative that addresses this problem.

Bayesian optimization imitates human inference: it uses the results of previous trials to decide which hyperparameter values are most promising to try next.

Keywords:

  • Local Optimization
  • Global Optimization
  • Sequential Optimization

For simplicity, I will use the bayesian-optimization library rather than scikit-learn. You can install it with pip:

pip install bayesian-optimization

Again I use a Support Vector Machine as the model, this time for the digits classification task.

It will take several steps, but don’t forget that our final goal is to get the best hyperparameters for our classification model.

Here is a breakdown of the logical process:

  1. Goal: create an estimator (model) with the highest accuracy
  2. How: get the best set of hyperparameters
  3. How: try multiple combinations of hyperparameters and observe the accuracy score
  4. How: select the set of hyperparameters with the best accuracy

First, I define the objective function around an SVM estimator, so that it returns an accuracy score to maximize. For the sake of generalization, it uses 10-fold cross-validation and returns the mean of the ten test scores.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

def estimator(C, gamma):
    # initialize the model with the candidate hyperparameters
    model = SVC(C=C, gamma=gamma, degree=1, random_state=0)
    # evaluate it with 10-fold cross-validation
    result = cross_validate(model, X, y, cv=10)
    # return the mean test score across the 10 folds
    return np.mean(result['test_score'])
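Before handing the objective to the optimizer, it is worth a quick sanity check that it runs and returns a score (the exact value depends on your data):

print(estimator(C=1.0, gamma=0.001))  # mean 10-fold CV accuracy for one candidate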

Next, we define the search range for the two SVM hyperparameters. Unlike the grid above, these are continuous bounds that the optimizer can sample anywhere inside.

hparams = {"C": (0.0001, 10000), "gamma": (0.0001, 10000)}

Now we attach a hyperparameter optimization algorithm to our model.

from bayes_opt import BayesianOptimization
# hand the objective function and the hyperparameter bounds to the optimizer
svc_bayesopt = BayesianOptimization(estimator, hparams)

Now we run the optimization and watch the trial process. We have to set the following parameters for our hyperparameter tuning run:

  • init_points: number of function parameter sets to pick up at first
  • n_iter: number of estimation trials
  • acq: the acquisition function used to estimate the most promising next step

# maximize runs the optimization loop
svc_bayesopt.maximize(init_points=5, n_iter=10, acq='ucb')
>>> #output
| iter | target | C | gamma |
-------------------------------------------------
| 1 | 0.4 | 9.138e+0 | 7.418e+0 |
| 2 | 0.4 | 4.77e+03 | 718.1 |
| 3 | 0.4 | 3.526e+0 | 2.886e+0 |
| 4 | 0.4 | 5.459e+0 | 5.718e+0 |
| 5 | 0.4 | 9.996e+0 | 1.432e+0 |
| 6 | 0.4 | 36.27 | 9.986e+0 |
| 7 | 0.9267 | 7.203 | 10.88 |
| 8 | 0.4 | 1.681 | 938.9 |
| 9 | 0.4 | 5.272e+0 | 1e+04 |
| 10 | 0.4 | 1e+04 | 1e+04 |
| 11 | 0.9867 | 2.096e+0 | 0.0001 |
| 12 | 0.4 | 1.948e+0 | 7.072e+0 |
| 13 | 0.4 | 1e+04 | 4.46e+03 |
| 14 | 0.98 | 1.096e+0 | 0.0001 |
| 15 | 0.4 | 2.549e+0 | 1e+04 |

Finally, we get the answer: the hyperparameter set that scored the best accuracy.

print(svc_bayesopt.max)
>>> {'params': {'C': 9402.684872249694, 'gamma': 0.0001}, 'target': 0.9800000000000001}

We’ve done it! Having identified the best hyperparameters for our model, we simply apply them, and we have our final estimator.

# example: refit with the tuned hyperparameters
from sklearn.metrics import mean_squared_error

final_model = SVC(C=9402.684872249694, gamma=0.0001)
final_model.fit(X_train, y_train)
y_pred = final_model.predict(X_test)
# measure the error of the estimator with the RMSE metric
np.sqrt(mean_squared_error(y_test, y_pred))
>>> 0.05..
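Two small asides, as sketches rather than fixes: since this is a classification task, accuracy_score is arguably a more natural metric than RMSE, and you can read the tuned values out of the optimizer instead of copying them by hand:

from sklearn.metrics import accuracy_score

# read the tuned values programmatically instead of hard-coding them
best = svc_bayesopt.max["params"]
final_model = SVC(C=best["C"], gamma=best["gamma"])
final_model.fit(X_train, y_train)

# accuracy is the usual metric for a classification task
print(accuracy_score(y_test, final_model.predict(X_test)))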
