Origin of wine part 6
Introduction
Hyperparameter tuning is a technique that helps me find the best parameters for a model so that the model performs well.
I can use GridSearchCV from Scikit-Learn to find the best parameters for a particular model, then train the model with those parameters later to get good performance.
Code
Hyperparameter Tuning
Split dataset
The data is split into a training set and a test set.
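A minimal sketch of the split, using a toy corpus in place of the preprocessed wine descriptions from the earlier parts (the real feature and label columns come from that dataset):

```python
from sklearn.model_selection import train_test_split

# Toy data standing in for the preprocessed wine descriptions and
# their labels from earlier parts of this series.
X = ["fruity and crisp", "oaky with vanilla", "dry and mineral",
     "sweet and floral", "bold tannins", "light and citrusy",
     "earthy and rich", "bright acidity", "smooth and buttery",
     "peppery finish"]
y = ["white", "red", "white", "white", "red",
     "white", "red", "white", "white", "red"]

# Hold out 20% of the data as the test set; random_state makes the
# split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```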
Create pipeline
The dataset’s preprocessed description needs to be transformed into numerical data and then fed into the model for hyperparameter tuning. To organize these two tasks into a flow from start to end, Scikit-Learn provides a way to achieve it: the pipeline.
A pipeline is a sequence of segments where each segment is a transformer. The last segment in a pipeline has to be an estimator (model).
Pipeline accepts a list of tuples, where the first element of each tuple is the name of the transformer and the second is the transformer object. The last element in the list is the estimator (model).
In this pipeline, the data is first transformed into numerical data by TfidfVectorizer, then goes through feature selection with SelectKBest, and finally is fed into the LinearSVC model.
SelectKBest is optional and can be removed from the pipeline.
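The pipeline described above can be sketched like this; the step names ("tfidf", "select", "clf") and the value of k are my own choices, not necessarily the ones used in this series:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

# Each tuple is (name, transformer); the last step is the estimator.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),      # text -> numerical features
    ("select", SelectKBest(chi2, k=1000)),  # optional feature selection
    ("clf", LinearSVC()),              # final estimator
])
```

Removing the `("select", ...)` tuple from the list drops the optional SelectKBest step without affecting the rest of the pipeline.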
Tuning
It is necessary to specify a range of parameters for tuning, then use GridSearchCV to find the best parameters.
The more parameters there are, the longer the search takes.
NLPTransformer is not included in the pipeline because it would make the search take a long time: for each possible combination of parameters, the training data would need to be preprocessed again.
After GridSearchCV has finished, we can inspect the best parameters and score.
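A sketch of the search on a toy corpus (SelectKBest omitted for brevity; the parameter values here are illustrative, not the grid used in this series). Parameter names follow the "step__parameter" convention so GridSearchCV knows which pipeline step each one belongs to:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy training data; the real search runs on the preprocessed descriptions.
X_train = ["fruity and crisp", "oaky with vanilla", "dry and mineral",
           "sweet and floral", "bold tannins", "light and citrusy",
           "earthy and rich", "peppery finish"]
y_train = ["white", "red", "white", "white", "red", "white", "red", "red"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])

# Candidate values for each parameter; every combination is tried.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(X_train, y_train)

# Inspect the best parameters and cross-validation score.
print(search.best_params_)
print(search.best_score_)
```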
Conclusion
Instead of trying possible parameters for a model manually, GridSearchCV helps me find the best parameters for the model automatically. It reduces complexity and increases productivity.
Next
Now it is ready to train the model with the best parameters.