Model Selection and Hyperparameter Tuning on Amazon Kindle Book Reviews with Python
Sentiment analysis on book reviews with model selection and hyperparameter optimization
Introduction
This article aims to select and deploy the best-performing machine learning model for sentiment analysis on a dataset of book reviews from the Amazon Kindle Store.
In a previous article, we optimized a Support Vector Machine (SVM) classifier on an IMDB movie review dataset. Although SVM is a strong algorithm for classification problems, is it also the best choice here? In this new project, the goal is to add a model selection step.
By pipelining multiple models we can compare them on different aspects, including accuracy and efficiency.
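To make the idea concrete, here is a minimal sketch of such a comparison using scikit-learn. The toy reviews, the three candidate classifiers, and the choice of TF-IDF features are assumptions made for illustration only; they are not necessarily the models or preprocessing used later in this article. Each candidate is wrapped in the same pipeline, so accuracy and fit time are measured under identical conditions.

```python
import time

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy reviews invented for illustration; NOT taken from the Kindle dataset.
texts = [
    "I loved this book, a wonderful story",
    "Great characters and a gripping plot",
    "Absolutely fantastic read, highly recommend",
    "Beautiful writing and a satisfying ending",
    "One of the best novels I have read",
    "A delightful and engaging book",
    "Terrible book, a complete waste of time",
    "Boring plot and flat characters",
    "I hated the ending, very disappointing",
    "Poorly written and full of errors",
    "One of the worst novels I have read",
    "A dull and frustrating book",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = positive, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Hypothetical shortlist of candidate models.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
    "naive_bayes": MultinomialNB(),
}

results = {}
for name, clf in candidates.items():
    # Same vectorizer for every candidate, so only the classifier differs.
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])

    start = time.perf_counter()
    pipe.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    accuracy = accuracy_score(y_test, pipe.predict(X_test))
    results[name] = {"accuracy": accuracy, "fit_seconds": fit_seconds}

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.2f}, fit={scores['fit_seconds']:.4f}s")
```

On a dozen sentences the numbers mean little; the point is the structure: one loop, one shared preprocessing step, and one results table that records both a quality metric and a cost metric per model.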
The data
The dataset, which you can find at the following link, is composed of roughly 1,000,000 book reviews [1][6]. The goal is to produce a high-performing sentiment analyzer by training it on one portion of the dataset. If you want a refresher on what sentiment analysis is, I suggest a quick read of this article, which covers all the basics.
The CSV file has 10 columns: