This article will explain in simple terms what grid search is and how to implement grid search using
sklearn in python.
What is grid search?
Grid search is the process of performing hyper parameter tuning in order to determine the optimal values for a given model. This is significant as the performance of the entire model is based on the hyper parameter values specified.
Why should I use it?
If you work with ML, you know what a nightmare it is to stipulate values for hyper parameters. There are libraries that have been implemented, such as
GridSearchCV of the
sklearn library, in order to automate this process and make life a little bit easier for ML enthusiasts.
How does it work?
Here’s a python implementation of grid search using
GridSearchCV of the
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVRgsc = GridSearchCV(
'C': [0.1, 1, 100, 1000],
'epsilon': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10],
'gamma': [0.0001, 0.001, 0.005, 0.1, 1, 3, 5]
cv=5, scoring='neg_mean_squared_error', verbose=0, n_jobs=-1)
First, we need to import
GridSearchCV from the
sklearn library, a machine learning library for python. The
estimator parameter of
GridSearchCV requires the model we are using for the hyper parameter tuning process. For this example, we are using the
rbf kernel of the Support Vector Regression model(SVR). The
param_grid parameter requires a list of parameters and the range of values for each parameter of the specified estimator. The most significant parameters required when working with the
rbf kernel of the SVR model are
c, gamma and
epsilon. A list of values to choose from should be given to each hyper parameter of the model. You can change these values and experiment more to see which value ranges give better performance. A cross validation process is performed in order to determine the hyper parameter value set which provides the best accuracy levels.
grid_result = gsc.fit(X, y)
best_params = grid_result.best_params_best_svr = SVR(kernel='rbf', C=best_params["C"], epsilon=best_params["epsilon"], gamma=best_params["gamma"],
tol=0.001, cache_size=200, verbose=False, max_iter=-1)
We then use the best set of hyper parameter values chosen in the grid search, in the actual model as shown above.
If you want to know how to use grid search and cross-validation and train and test a model check out this article.
Hope this article helped you. Until next time…Adios!