Origin of wine part 6
Introduction
Hyperparameter tuning is a technique that helps me find the best parameters for a model so that the model performs well.
I can use GridSearchCV from Scikit-Learn to find the best parameters for a particular model, then train the model with those parameters later to get good performance.
Code
Hyperparameter Tuning
Split dataset
The data is split into a training set and a test set.
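A minimal sketch of the split, using a toy corpus in place of the preprocessed wine descriptions from the earlier parts (the real feature and label columns come from that dataset):

```python
from sklearn.model_selection import train_test_split

# Toy data standing in for the preprocessed wine descriptions and
# their labels from earlier parts of this series.
X = ["fruity and crisp", "oaky with vanilla", "dry and mineral",
     "sweet and floral", "bold tannins", "light and citrusy",
     "earthy and rich", "bright acidity", "smooth and buttery",
     "peppery finish"]
y = ["white", "red", "white", "white", "red",
     "white", "red", "white", "white", "red"]

# Hold out 20% of the data as the test set; random_state makes the
# split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```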
Create pipeline
The dataset’s preprocessed description needs to be transformed into numerical data and then fed into the model for hyperparameter tuning. To organize these two tasks into a flow from start to end, Scikit-Learn provides a way to achieve it: the pipeline.
A pipeline is a sequence of segments where each segment is a transformer. The last segment in a pipeline has to be an estimator (model).
Pipeline accepts a list of tuples, where the first element of each tuple is the name of the transformer and the second is the transformer object. The last element in the list is the estimator (model).
In this pipeline, the data is first transformed into numerical data by TfidfVectorizer, then goes through feature selection with SelectKBest, and finally is fed into the LinearSVC model.
SelectKBest is optional and can be removed from the pipeline.
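The pipeline described above can be sketched like this; the step names ("tfidf", "select", "clf") and the value of k are my own choices, not necessarily the ones used in this series:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

# Each tuple is (name, transformer); the last step is the estimator.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),      # text -> numerical features
    ("select", SelectKBest(chi2, k=1000)),  # optional feature selection
    ("clf", LinearSVC()),              # final estimator
])
```

Removing the `("select", ...)` tuple from the list drops the optional SelectKBest step without affecting the rest of the pipeline.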
Tuning
It is necessary to specify a range of parameters for tuning, then use GridSearchCV to find the best parameters.
The more parameters there are, the longer the search takes.
NLPTransformer is not included in the pipeline because it would make the search take a long time: for each possible combination of parameters, the training data would need to be preprocessed again.
After GridSearchCV has finished, we can inspect the best parameters and score.
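A sketch of the search on a toy corpus (SelectKBest omitted for brevity; the parameter values here are illustrative, not the grid used in this series). Parameter names follow the "step__parameter" convention so GridSearchCV knows which pipeline step each one belongs to:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy training data; the real search runs on the preprocessed descriptions.
X_train = ["fruity and crisp", "oaky with vanilla", "dry and mineral",
           "sweet and floral", "bold tannins", "light and citrusy",
           "earthy and rich", "peppery finish"]
y_train = ["white", "red", "white", "white", "red", "white", "red", "red"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])

# Candidate values for each parameter; every combination is tried.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(X_train, y_train)

# Inspect the best parameters and cross-validation score.
print(search.best_params_)
print(search.best_score_)
```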
Conclusion
Instead of trying possible parameters for a model manually, GridSearchCV helps me find the best parameters for the model automatically. It reduces complexity and increases productivity.
Next
Now it is ready to train the model with the best parameters.