Different types of Hyper-Parameter Tuning.

This article shows Python implementations of different hyperparameter tuning techniques using a RandomForest model.


Contents:

→ Importance of Hyper-Parameter Tuning!
→ Hyperparameter Tuning/Optimization
→ Defining Functions
→ Checking Performance on Base Model
→ Different Hyperparameter Tuning Methods

1. GridSearch
2. RandomSearch
3. Successive Halving
4. Bayesian Optimizers
5. Manual Search

→ Difference between Parameters and Hyperparameters
→ Conclusion

Hyperparameters are the soul of any model in today’s ML world. Their values cannot be learned from the data and must be passed manually, and they control the whole learning process.

Hyperparameters need to be set before fitting the data in order to get a more robust and optimized model.

Importance of Hyper-Parameter Tuning!

  1. The goal of any model is to achieve minimum error; hyperparameters help achieve that, as they are responsible for the outcome of any ML model.
  2. They influence the convergence of any ML algorithm to a large extent.

Hyperparameter Tuning/Optimization

The process that involves the search of the optimal values of hyperparameters for any machine learning algorithm is called hyperparameter tuning/optimization.

I will use the pulsar star data; you can download the data from the Kaggle link.

Complete Code can be found in my GitHub repo.

Defining Functions

Function to evaluate Train Set.

from sklearn.metrics import precision_score, recall_score, f1_score

def eval_model_train(model):
    # calculate the metrics on train data
    pred = model.predict(x_train)
    Precision = precision_score(y_train, pred)
    Recall = recall_score(y_train, pred)
    F1_Score = f1_score(y_train, pred)
    return pred, Precision, Recall, F1_Score

Function to evaluate Test Set

def eval_model_test(model):
    # calculate the metrics on test data
    pred = model.predict(x_test)
    Precision = precision_score(y_test, pred)
    Recall = recall_score(y_test, pred)
    F1_Score = f1_score(y_test, pred)
    return pred, Precision, Recall, F1_Score

Function to calculate the time taken

def exec_time(start, end):
    diff_time = end - start
    m, s = divmod(diff_time, 60)
    h, m = divmod(m, 60)
    s, m, h = int(round(s, 0)), int(round(m, 0)), int(round(h, 0))
    return f"{h}:{m}:{s}"

Checking Performance on Base Model

→ Checking default Parameters of the RandomForest Base Model

from sklearn.ensemble import RandomForestClassifier
from pprint import pprint
import time

Rf_model = RandomForestClassifier()
pprint(Rf_model.get_params())

start_base = time.time()
Rf_model.fit(x_train, y_train)
end_base = time.time()
basemodel_time = exec_time(start_base, end_base)
basemodel_time

Performance on Train set

_, precision_basetrain, recall_basetrain, f1_basetrain = eval_model_train(Rf_model)
print("Precision = {} \nRecall = {} \nf1 = {}".format(precision_basetrain, recall_basetrain, f1_basetrain))

Performance on Test set

_, precision_basetest, recall_basetest, f1_basetest = eval_model_test(Rf_model)
print("Precision = {} \nRecall = {} \nf1 = {}".format(precision_basetest, recall_basetest, f1_basetest))

Different Hyperparameter Tuning Methods:

1. GridSearch:

  • Grid search forms every possible combination of the hyperparameter values passed in the grid, evaluates each one, and returns the best.
  • This leads to an exhaustive search through the entire grid.
  • GridSearch may suffer from the Curse of Dimensionality: the more parameter values we pass, the more time and compute the search takes.

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (higher feature count) that do not occur in low-dimensional spaces (lower feature count).
This means the more dimensions we add, the more the search grows in time complexity, ultimately making this strategy inconvenient (the short sketch below makes the growth concrete).
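To see why, the number of candidates is simply the product of the number of values per hyperparameter, so every added hyperparameter multiplies the cost. A minimal sketch (the toy grid below mirrors the one defined in the next step; the numbers are only illustrative):

import numpy as np

# A toy grid: six hyperparameters with 2-3 candidate values each
toy_grid = {
    "n_estimators":      [200, 850, 1500],
    "max_features":      ["auto", "sqrt"],
    "max_depth":         [10, 45, 80],
    "min_samples_split": [2, 10, 15],
    "min_samples_leaf":  [1, 4, 9],
    "bootstrap":         [True, False],
}

n_combinations = int(np.prod([len(v) for v in toy_grid.values()]))
print(n_combinations)      # 3*2*3*3*3*2 = 324 candidate models
print(n_combinations * 5)  # with 5-fold cross-validation: 1620 fits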

Providing a dictionary of hyperparameters

import numpy as np

# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start=200, stop=1500, num=3)]
# Number of features to consider at every split
max_features = ['auto', 'sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 80, num=3)]
# Minimum number of samples required to split a node
min_samples_split = [2, 10, 15]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 4, 9]
# Method of selecting samples for training each tree
bootstrap = [True, False]

para = {'n_estimators': n_estimators,
        'max_features': max_features,
        'max_depth': max_depth,
        'min_samples_split': min_samples_split,
        'min_samples_leaf': min_samples_leaf,
        'bootstrap': bootstrap}

pprint(para)  # print our grid of hyperparameter values

OUTPUT:

Now, we fit the GridSearch model to find the set of optimal hyperparameter values.

The model will try out 324 combinations of hyperparameters, which gives you an idea of how grid search increases the time complexity:
2 values of bootstrap
3 values of max_depth
2 values of max_features
3 values of min_samples_leaf
3 values of min_samples_split
3 values of n_estimators
which gives a combination count of 2*3*2*3*3*3 = 324

from sklearn.model_selection import GridSearchCV

start_gridsearch = time.time()
grid_search = GridSearchCV(estimator=Rf_model,
                           param_grid=para,
                           scoring="f1",
                           cv=5, n_jobs=-1, verbose=1)
# Fit the grid search model
grid_search.fit(x_train, y_train)
end_gridsearch = time.time()

gridsearchmodel_time = exec_time(start_gridsearch, end_gridsearch)
gridsearchmodel_time

grid_search.best_params_  # outputs the set of best hyperparameter values

OUTPUT:

Performance on Train Set

_, precision_gridtrain, recall_gridtrain, f1_gridtrain = eval_model_train(grid_search)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_gridtrain, recall_gridtrain, f1_gridtrain))

Performance on Test Set

_, precision_gridtest, recall_gridtest, f1_gridtest = eval_model_test(grid_search)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_gridtest, recall_gridtest, f1_gridtest))

2. RandomSearch:

  • Random Search removes the exhaustive search done by GridSearch by sampling combinations of values at random.
  • Since the selection of parameters is completely random, it yields high variance between runs.
  • For example,
    instead of checking all 100 combinations, RandomSearch checks only 50 random ones.
  • However, there is a trade-off for the lower time complexity: it is good at exploring a wide range of values and usually reaches a very good combination quickly, but it does not guarantee to find the best parameter combination.

Using the same dictionary of hyperparameters

Now, we fit the RandomSearch model. This will take some time to execute, depending on the size of the data.

Note:
→ The most important argument in RandomizedSearchCV is n_iter, which controls the number of different combinations to try (a small sketch follows this note).
→ cv is the number of folds to use for cross-validation. Increasing the cv folds reduces the chances of overfitting, but increases the run time.
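As a minimal sketch (reusing the Rf_model and para defined above), an explicit n_iter caps the search at 50 sampled combinations; if n_iter is not passed, RandomizedSearchCV falls back to its default of 10:

from sklearn.model_selection import RandomizedSearchCV

# Sample only 50 of the 324 possible combinations instead of trying them all
random_search_50 = RandomizedSearchCV(estimator=Rf_model,
                                      param_distributions=para,
                                      n_iter=50,
                                      scoring="f1",
                                      cv=5,
                                      random_state=42,
                                      n_jobs=-1)
# random_search_50.fit(x_train, y_train)  # fit exactly like the search below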

from sklearn.model_selection import RandomizedSearchCV

start_randomsearch = time.time()
random_search = RandomizedSearchCV(estimator=Rf_model, param_distributions=para,
                                   cv=5, verbose=1, random_state=42, scoring="f1", n_jobs=-1)
# Fit the random search model
random_search.fit(x_train, y_train)
end_randomsearch = time.time()
randomsearchmodel_time = exec_time(start_randomsearch, end_randomsearch)
randomsearchmodel_time
random_search.best_params_

OUTPUT:

Performance on Train Set

_, precision_randtrain, recall_randtrain, f1_randtrain = eval_model_train(random_search)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_randtrain, recall_randtrain, f1_randtrain))

Performance on Test Set

_, precision_randtest, recall_randtest, f1_randtest = eval_model_test(random_search)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_randtest, recall_randtest, f1_randtest))

3. Successive Halving:

Scikit-learn also provides the HalvingGridSearchCV and HalvingRandomSearchCV estimators, which can be used to search a parameter space using successive halving. Note that they are still experimental and have to be enabled via sklearn.experimental (shown in the code below).

  • Successive halving (SH) is like a tournament among candidate parameter combinations.
  • SH is an iterative selection process where all candidates (the parameter combinations) are evaluated with a small amount of resources at the first iteration.
  • Only some of these candidates are selected for the next iteration, which will be allocated more resources.
  • For parameter tuning, the resource is typically the number of training samples, but it can also be an arbitrary numeric parameter such as n_estimators in a random forest (the sketch after this list walks through one such schedule).
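Roughly, the schedule looks like the sketch below. This is a simplified hand-rolled illustration, not scikit-learn's internal logic; the candidate count of 324 matches the grid above, while the sample counts and the elimination factor of 3 (scikit-learn's default) are only illustrative.

def halving_schedule(n_candidates, min_resources, max_resources, factor=3):
    # Each round keeps roughly 1/factor of the candidates and gives the
    # survivors factor-times more resources (here: training samples).
    rounds = []
    resources = min_resources
    while n_candidates > 1 and resources <= max_resources:
        rounds.append((n_candidates, resources))
        n_candidates = max(1, n_candidates // factor)
        resources *= factor
    return rounds

for n_cand, res in halving_schedule(n_candidates=324, min_resources=500, max_resources=14000):
    print(f"{n_cand:>4} candidates evaluated on {res} training samples")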

3.1 — Halving GridSearch

Using the same dictionary of hyperparameters

# HalvingGridSearchCV is still experimental and has to be enabled explicitly
from sklearn.experimental import enable_halving_search_cv  # noqa
from sklearn.model_selection import HalvingGridSearchCV

start_halvinggrid = time.time()
Halving_grid_search = HalvingGridSearchCV(estimator=Rf_model, param_grid=para,
                                          cv=5, verbose=1, random_state=42, n_jobs=-1)
# Fit the halving grid search model
Halving_grid_search.fit(x_train, y_train)
end_halvinggrid = time.time()
halvinggridmodel_time = exec_time(start_halvinggrid, end_halvinggrid)
halvinggridmodel_time

Checking Best Parameters

Halving_grid_search.best_params_

Performance on Train Set

_, precision_halvinggridtrain, recall_halvinggridtrain, f1_halvinggridtrain = eval_model_train(Halving_grid_search)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_halvinggridtrain, recall_halvinggridtrain, f1_halvinggridtrain))

Performance on Test Set

_, precision_halvinggridtest, recall_halvinggridtest, f1_halvinggridtest = eval_model_test(Halving_grid_search)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_halvinggridtest, recall_halvinggridtest, f1_halvinggridtest))

3.2 — Halving RandomSearch

Using the same dictionary of hyperparameters

from sklearn.model_selection import HalvingRandomSearchCV

start_halvingrandom = time.time()
Halving_random_search = HalvingRandomSearchCV(estimator=Rf_model, param_distributions=para,
                                              cv=5, n_jobs=-1, verbose=1)
# Fit the halving random search model
Halving_random_search.fit(x_train, y_train)
end_halvingrandom = time.time()
halvingrandommodel_time = exec_time(start_halvingrandom, end_halvingrandom)
halvingrandommodel_time

Checking Best Parameters

Halving_random_search.best_params_

Performance on Train Set

_, precision_halvingrandtrain, recall_halvingrandtrain, f1_halvingrandtrain = eval_model_train(Halving_random_search)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_halvingrandtrain, recall_halvingrandtrain, f1_halvingrandtrain))

Performance on Test Set

_, precision_halvingrandtest, recall_halvingrandtest, f1_halvingrandtest = eval_model_test(Halving_random_search)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_halvingrandtest, recall_halvingrandtest, f1_halvingrandtest))

Complete Code can be found in my GitHub repo.

4. Bayesian Optimizers:

4.1 — Hyperopt

Hyperopt is a Python library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions.

Defining Search Space

from hyperopt import hp

space = {
    "n_estimators": hp.choice("n_estimators", [200, 850, 1500]),
    "max_depth": hp.quniform("max_depth", 10, 80, 5),
    "max_features": hp.choice("max_features", ["auto", "sqrt"]),
    "min_samples_split": hp.choice("min_samples_split", [2, 10, 15]),
    "min_samples_leaf": hp.choice("min_samples_leaf", [1, 4, 9]),
    "bootstrap": hp.choice("bootstrap", [True, False])
}

Defining Function to minimize

from hyperopt import STATUS_OK
from sklearn.model_selection import cross_val_score

def tune_random(params):
    params["max_depth"] = int(params["max_depth"])  # quniform samples floats
    rand = RandomForestClassifier(**params, n_jobs=-1)
    score = cross_val_score(rand, x_train, y_train, scoring="f1", cv=5).mean()
    # fmin minimizes the loss, so return the negative F1 to maximize it
    return {"loss": -score, "status": STATUS_OK}

Minimizing the function

from hyperopt import fmin, tpe, Trials

start_hpot = time.time()
trials = Trials()
best = fmin(
    fn=tune_random,
    space=space,
    algo=tpe.suggest,
    max_evals=100,
    trials=trials
)
end_hpot = time.time()
hpotmodel_time = exec_time(start_hpot, end_hpot)
hpotmodel_time

Checking Best Parameters

print("Best: {}".format(best))

Fitting Base Model with the set of best Parameters

rf_hyperopt = RandomForestClassifier(n_estimators=200,
                                     max_depth=35,
                                     max_features='auto',
                                     min_samples_split=10,
                                     min_samples_leaf=9,
                                     bootstrap=True).fit(x_train, y_train)

Performance on Train Set

_, precision_hptrain, recall_hptrain, f1_hptrain = eval_model_train(rf_hyperopt)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_hptrain, recall_hptrain, f1_hptrain))

Performance on Test Set

_, precision_hptest, recall_hptest, f1_hptest = eval_model_test(rf_hyperopt)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_hptest, recall_hptest, f1_hptest))

4.2 — Optuna

Optuna is an open-source hyperparameter optimization framework that offers:

  • Eager dynamic search spaces (see the sketch after this list)
  • Efficient sampling and pruning algorithms
  • Easy integration
  • Good visualizations
  • Distributed optimization
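"Eager dynamic search spaces" refers to Optuna's define-by-run API: the space is built while the objective runs, so one suggestion can depend on another. A minimal, standalone sketch (the max_samples branch is just an illustrative conditional, separate from the RandomForest objective used below):

import optuna

def toy_objective(trial):
    bootstrap = trial.suggest_categorical("bootstrap", [True, False])
    if bootstrap:
        # this parameter only exists in trials where bootstrap sampling is on
        max_samples = trial.suggest_float("max_samples", 0.5, 1.0)
    n_estimators = trial.suggest_int("n_estimators", 200, 1500)
    return 0.0  # placeholder score; a real objective would train and score a model

# toy_study = optuna.create_study(direction="maximize")
# toy_study.optimize(toy_objective, n_trials=20)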

Defining Function

import optuna

def objective(trial):
    n_estimators = trial.suggest_int("n_estimators", 200, 1500)
    max_features = trial.suggest_categorical("max_features", ["auto", "sqrt"])
    max_depth = trial.suggest_int("max_depth", 10, 80, log=True)
    min_samples_split = trial.suggest_int("min_samples_split", 2, 15)
    min_samples_leaf = trial.suggest_int("min_samples_leaf", 1, 9)
    bootstrap = trial.suggest_categorical("bootstrap", [True, False])

    rand = RandomForestClassifier(n_estimators=n_estimators, max_features=max_features,
                                  max_depth=max_depth, min_samples_leaf=min_samples_leaf,
                                  min_samples_split=min_samples_split,
                                  bootstrap=bootstrap)
    score_cr = cross_val_score(rand, x_train, y_train, n_jobs=-1, cv=5, scoring='f1')
    score = score_cr.mean()
    return score

Creating Study

study = optuna.create_study(direction='maximize')  # the objective returns an F1 score, which we want to maximize

Minimizing the Function

start_optuna = time.time()
optuna.logging.set_verbosity(optuna.logging.WARNING)
study.optimize(objective, n_trials=100)
end_optuna = time.time()
optunamodel_time = exec_time(start_optuna, end_optuna)
optunamodel_time

Checking Best Parameters

for key, value in study.best_trial.params.items():
    print(f'{key}: {value}')

Fitting Base Model with the set of best Parameters

rf_optuna = RandomForestClassifier(**study.best_trial.params).fit(x_train,y_train)

Performance on Train Set

_, precision_opttrain, recall_opttrain, f1_opttrain = eval_model_train(rf_optuna)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_opttrain, recall_opttrain, f1_opttrain))

Performance on Test Set

_, precision_opttest, recall_opttest, f1_opttest = eval_model_test(rf_optuna)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_opttest, recall_opttest, f1_opttest))

Plotting Optimization History

optuna.visualization.plot_optimization_history(study)

4.3 — Scikit-Optimize

  • Sequential model-based optimization
  • Built on NumPy, SciPy, and Scikit-Learn
  • Open source, commercially usable

Skopt:
Defining Search Space

from skopt.space import Integer, Categorical

space = [
    Integer(200, 1500, name="n_estimators"),
    Integer(10, 80, name="max_depth"),
    Categorical(["auto", "sqrt"], name="max_features"),
    Integer(2, 15, name="min_samples_split"),
    Integer(1, 9, name="min_samples_leaf"),
    Categorical([True, False], name="bootstrap")
]

Defining Objective Function to minimize

from skopt import gp_minimize
from skopt.utils import use_named_args

@use_named_args(space)  # this decorator maps the positional values to the names defined in the space
def objective(**params):
    Rf_model.set_params(**params)
    # gp_minimize minimizes the objective, so return the negative F1 score
    return -cross_val_score(Rf_model, x_train, y_train,
                            cv=5, n_jobs=-1, scoring="f1").mean()

Minimizing the Objective Function

start_skopt = time.time()
tune_rand_gp = gp_minimize(objective, space, random_state=1234)
end_skopt = time.time()
skoptmodel_time = exec_time(start_skopt, end_skopt)
skoptmodel_time

Checking Best Parameters

print(f"Best parameters: \n") 
print(f'n_estimators={tune_rand_gp.x[0]}')
print(f'max_depth={tune_rand_gp.x[1]}')
print(f'max_features={tune_rand_gp.x[2]}')
print(f'min_samples_split={tune_rand_gp.x[3]}')
print(f'min_samples_leaf={tune_rand_gp.x[4]}')
print(f'bootstrap = {tune_rand_gp.x[5]}')

Fitting Base Model with the set of best Parameters

rf_skopt = RandomForestClassifier(n_estimators=200,
                                  max_depth=67,
                                  max_features='sqrt',
                                  min_samples_split=2,
                                  min_samples_leaf=9,
                                  bootstrap=True).fit(x_train, y_train)

Performance on Train Set

_, precision_sktrain, recall_sktrain, f1_sktrain = eval_model_train(rf_skopt)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_sktrain, recall_sktrain, f1_sktrain))

Performance on Test Set

_, precision_sktest, recall_sktest, f1_sktest = eval_model_test(rf_skopt)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_sktest, recall_sktest, f1_sktest))

Plotting Convergence Graph

plot_convergence(tune_rand_gp)

4.4 — BayesSearchCV

At the time of writing, BayesSearchCV is not compatible with scikit-learn version 0.24.
To use BayesSearchCV, downgrade scikit-learn to 0.23.2.

Defining Search Space

from skopt import BayesSearchCV

param_bayes = {
    "n_estimators": Integer(200, 1500),
    "max_depth": Integer(10, 80),
    "max_features": Categorical(["auto", "sqrt"]),
    "min_samples_split": Integer(2, 15),
    "min_samples_leaf": Integer(1, 9),
    "bootstrap": Categorical([True, False])
}

Fitting the BayesSearchCV

bayes_rf = BayesSearchCV(Rf_model,
                         search_spaces=param_bayes,
                         cv=5,
                         scoring="f1",
                         refit=True)

start_bayes = time.time()
bayes_rf.fit(x_train, y_train)
end_bayes = time.time()
bayesmodel_time = exec_time(start_bayes, end_bayes)
bayesmodel_time

Checking Best Parameters

bayes_rf.best_params_

Performance on Train Set

_, precision_bayestrain, recall_bayestrain, f1_bayestrain = eval_model_train(bayes_rf)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_bayestrain, recall_bayestrain, f1_bayestrain))

Performance on Test Set

_, precision_bayestest, recall_bayestest, f1_bayestest = eval_model_test(bayes_rf)
print("Precision = {} \n Recall = {} \n f1 = {}".format(precision_bayestest, recall_bayestest, f1_bayestest))

Plotting Objective

from skopt.plots import plot_objective
import matplotlib.pyplot as plt

bayes_rf_plot = plot_objective(bayes_rf.optimizer_results_[0],
                               dimensions=["n_estimators", "max_depth", "max_features", "min_samples_split", "min_samples_leaf", "bootstrap"],
                               n_minimum_search=int(1e8))
plt.show()

5. Manual Search:

  • Manual Search can be done on the basis of our judgment/experience.
  • We train the model with hyperparameter values that we assign manually, evaluate its performance, and start the process again.
  • This loop is repeated until a satisfactory score is achieved (a minimal sketch of such a loop follows this list).
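A minimal sketch of such a loop, reusing the article's x_train/y_train and F1 scoring; the three configurations are purely illustrative, hand-picked values:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

manual_configs = [
    {"n_estimators": 300, "max_depth": 20, "min_samples_leaf": 4},
    {"n_estimators": 800, "max_depth": 40, "min_samples_leaf": 1},
    {"n_estimators": 1500, "max_depth": None, "min_samples_leaf": 9},
]

best_score, best_config = -1.0, None
for config in manual_configs:
    score = cross_val_score(RandomForestClassifier(**config, n_jobs=-1),
                            x_train, y_train, cv=5, scoring="f1").mean()
    print(config, round(score, 3))
    if score > best_score:
        best_score, best_config = score, config

print("Best manual config:", best_config)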

Difference between Parameters and Hyperparameters

→ Model Parameters: These are learned while the model is training on the data.
Model parameters differ from experiment to experiment and depend entirely on the type of data passed and the task being solved.

Some examples of model parameters include:

  • The weights in an artificial neural network(ANN).
  • The support vectors in a support vector machine.
  • The coefficients in linear regression or logistic regression.
  • For NLP tasks: word frequency, sentence length, noun or verb distribution per sentence, the number of specific character n-grams per word, lexical diversity, etc.

→ Hyperparameters: These are the values that must be set before training and that the model expects to be passed in order to obtain optimal performance on any given data, for any task.

Some examples of model hyperparameters include:

  • The learning rate for training a neural network.
  • The C and sigma hyperparameters for support vector machines.
  • The k in k-nearest neighbors.
  • The depth of a tree in decision trees (the short example after this list makes the distinction concrete).
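A short, standalone example of the distinction (the data from make_classification is only a stand-in): the hyperparameter C is chosen by us before fitting, while the coefficients are parameters learned from the data during fit.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression(C=0.5)      # hyperparameter: set before training
clf.fit(X, y)                        # parameters are learned here
print(clf.coef_, clf.intercept_)     # the learned coefficients (parameters)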


Conclusion

After using all the different methods, we create a DataFrame from the results so that we can compare each of the techniques.

import numpy as np
import pandas as pd

models = ['RandomForest', 'RandomForest_gridsearch',
          'RandomForest_randomsearch', 'RandomForest_Halvinggrid',
          'RandomForest_Halvingrandom', 'RandomForest_hyperopt',
          'RandomForest_optuna', 'RandomForest_skopt',
          'RandomForest_bayes']
model_time = [basemodel_time, gridsearchmodel_time,
              randomsearchmodel_time, halvinggridmodel_time,
              halvingrandommodel_time, hpotmodel_time,
              optunamodel_time, skoptmodel_time, bayesmodel_time]
model_precision_train = [precision_basetrain, precision_gridtrain,
                         precision_randtrain, precision_halvinggridtrain,
                         precision_halvingrandtrain, precision_hptrain,
                         precision_opttrain, precision_sktrain,
                         precision_bayestrain]
model_recall_train = [recall_basetrain, recall_gridtrain,
                      recall_randtrain, recall_halvinggridtrain,
                      recall_halvingrandtrain, recall_hptrain,
                      recall_opttrain, recall_sktrain, recall_bayestrain]
model_f1_train = [f1_basetrain, f1_gridtrain, f1_randtrain,
                  f1_halvinggridtrain, f1_halvingrandtrain,
                  f1_hptrain, f1_opttrain, f1_sktrain, f1_bayestrain]
model_precision_test = [precision_basetest, precision_gridtest,
                        precision_randtest, precision_halvinggridtest,
                        precision_halvingrandtest, precision_hptest,
                        precision_opttest, precision_sktest,
                        precision_bayestest]
model_recall_test = [recall_basetest, recall_gridtest,
                     recall_randtest, recall_halvinggridtest,
                     recall_halvingrandtest, recall_hptest,
                     recall_opttest, recall_sktest, recall_bayestest]
model_f1_test = [f1_basetest, f1_gridtest, f1_randtest,
                 f1_halvinggridtest, f1_halvingrandtest,
                 f1_hptest, f1_opttest, f1_sktest, f1_bayestest]

comp_dict = {"models": models,
             "model_time": model_time,
             "model_precision_train": [round(i, 3) for i in model_precision_train],
             "model_precision_test": [round(i, 3) for i in model_precision_test],
             "model_recall_train": [round(i, 3) for i in model_recall_train],
             "model_recall_test": [round(i, 3) for i in model_recall_test],
             "model_f1_train": [round(i, 3) for i in model_f1_train],
             "model_f1_test": [round(i, 3) for i in model_f1_test]}
comparison = pd.DataFrame(comp_dict)
comparison

→ Sorting with respect to f1 score on test

comparison.set_index('models').sort_values('model_f1_test', ascending = False).head(3)

→ Sorting with respect to difference between f1 score on train and test

comparison['Diff_f1_train_test'] = np.abs(comparison['model_f1_train'] - comparison['model_f1_test'])
comparison.set_index('models').sort_values('Diff_f1_train_test').head(3)

After sorting the values with respect to the F1 score on the train and test sets, it turns out that the Bayesian techniques worked the best.

However, in a production environment we not only need the best result, we also need it as quickly as possible, and in that respect RandomSearch performed the best.

Complete Code can be found in my GitHub repo.

Like my article? Do give me a clap and share it, as that will boost my confidence.
Also, check out my other posts and stay connected for future articles in this series on the basics of data science and machine learning.

Also, do connect with me on LinkedIn.

