Performing sentiment analysis on Amazon comments

Using the fastText library to predict ratings from Amazon comments

Evelyn Hull
CodeX
3 min read · Jul 22, 2021

Photo by Christian Wiediger on Unsplash

I am currently running an experiment to find possible purchase biases in my Amazon shopping history. The first step is to run a sentiment analysis on the product comments. For this I am using the fastText library for text classification; the model is based on this article on Kaggle.

Retrieving the data

In the Kaggle link, you can download the dataset already formatted and ready to train the model. I had to build another dataset in Spanish, because my account is from Mexico and most of its comments are in Spanish. You can download the Spanish dataset here.

To train a model in fastText, each line of the training file must start with its labels, each one prefixed with the keyword “__label__”, followed by the text associated with them. In this case we have two labels: label 1 means comments rated with 1–2 stars, and label 2 means comments rated with 4–5 stars.
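As an illustration of that format, here is a minimal Python sketch that turns a star rating and a comment into one fastText training line. The helper name `to_fasttext_line` and the sample comments are hypothetical; only the label mapping (1–2 stars to label 1, 4–5 stars to label 2, 3 stars left out) comes from this article.

```python
def to_fasttext_line(stars, text):
    """Return one fastText training line, or None when the comment is skipped."""
    if stars in (1, 2):
        label = "__label__1"  # negative: 1-2 stars
    elif stars in (4, 5):
        label = "__label__2"  # positive: 4-5 stars
    else:
        return None  # 3 stars (or an invalid rating) is left out
    # fastText reads one example per line, so collapse any line breaks.
    clean_text = " ".join(text.split())
    return f"{label} {clean_text}"


# Hypothetical sample comments, only to show the resulting lines.
samples = [
    (5, "Great vacuum, it keeps my floors clean."),
    (1, "Stopped working\nafter two weeks."),
    (3, "It is okay."),
]

lines = []
for stars, text in samples:
    line = to_fasttext_line(stars, text)
    if line is not None:
        lines.append(line)

print("\n".join(lines))
# __label__2 Great vacuum, it keeps my floors clean.
# __label__1 Stopped working after two weeks.
```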

Why don’t we include comments rated with 3 stars? Neutral comments carry no clear positive or negative signal, so they can hurt the precision of a sentiment analysis model.

Training the model

To train our model we run fastText’s supervised command. We pass it our train.ft.txt data as input and save the resulting model under the name model_amzon; the supervised keyword tells fastText that we are going to train a supervised classifier.
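The training step could also be sketched with the official `fasttext` Python package, assuming it is installed separately. `train_supervised` and `save_model` belong to that package; the wrapper `train_sentiment_model` and the `.bin` extension on the output file are illustrative choices, and the default hyperparameters are used because the article does not mention any others.

```python
from pathlib import Path


def train_sentiment_model(train_path="train.ft.txt", model_path="model_amzon.bin"):
    """Train a supervised fastText classifier and save it to model_path."""
    if not Path(train_path).is_file():
        raise FileNotFoundError(f"training file not found: {train_path}")

    # Imported here so the file check above works even without the package.
    import fasttext  # third-party: pip install fasttext

    # Roughly equivalent to the command line:
    #   fasttext supervised -input train.ft.txt -output model_amzon
    model = fasttext.train_supervised(input=train_path)
    model.save_model(model_path)
    return model


if __name__ == "__main__":
    if Path("train.ft.txt").is_file():
        train_sentiment_model()
    else:
        print("train.ft.txt not found; download the dataset first")
```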

Testing the model

To test our model, we first need to look at its accuracy. We run fastText’s test command on our trained model with the test.ft.txt data, a file with comments that our model hasn’t seen, and the model tries to predict their labels.
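A hedged sketch of this evaluation with the `fasttext` Python package follows. A model's `test` method returns the number of samples, precision at 1, and recall at 1; since every comment here has exactly one label, precision at 1 should match what this article calls accuracy. The helper `evaluate` and its dictionary keys are hypothetical names.

```python
def evaluate(model, test_path="test.ft.txt"):
    """Score the model on held-out data and return the results as a dict."""
    # fastText's test() returns (number of samples, precision@1, recall@1).
    n_samples, precision, recall = model.test(test_path)
    return {"samples": n_samples, "precision@1": precision, "recall@1": recall}


# Usage, assuming the fasttext package and a trained model file exist:
#   import fasttext
#   model = fasttext.load_model("model_amzon.bin")
#   print(evaluate(model))
# Roughly equivalent to the command line:
#   fasttext test model_amzon.bin test.ft.txt
```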

The accuracy of this model on the Kaggle dataset is 0.916, which is pretty cool: we have a reliable model. We can try it out using the predict command in fastText, which lets us use our trained model to predict the most likely label for any text we enter. I am going to use some comments on the Roomba cleaner from Amazon.
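In Python, a prediction could look like the sketch below, again assuming the `fasttext` package. Its `predict` method returns the predicted labels together with their probabilities; the helper `predict_rating` and the `LABEL_MEANING` table are hypothetical names that simply restate the label mapping used in this article.

```python
LABEL_MEANING = {"__label__1": "1-2 stars", "__label__2": "4-5 stars"}


def predict_rating(model, comment):
    """Return (meaning of the most likely label, its probability)."""
    # predict() handles one line at a time, so line breaks are replaced first.
    labels, probabilities = model.predict(comment.replace("\n", " "))
    label = labels[0]
    return LABEL_MEANING.get(label, label), float(probabilities[0])


# Usage, assuming the fasttext package and a trained model file exist:
#   import fasttext
#   model = fasttext.load_model("model_amzon.bin")
#   print(predict_rating(model, "I hate it: it is far too loud."))
```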

Trying out a very obvious comment:

I hate it: This was supposed to be an autonomous helper to assist in keeping my floors clean. The cleaning part actually works pretty well. The first thing I noticed was how loud it is for such a small device…..

We got a good prediction: label 1, meaning 1–2 stars.

The iRobot 675 has changed my life. I’m a mom to 4 young children living in a 1880 square foot house on a large wooded lot. My kids are outside A LOT and track in all kinds of dirt, mud, leaves, grass, etc. My husband, while I love him, is also kind of a mess and has a bad habit of not wiping his shoes and bringing in a ton of sawdust from his projects.

A good prediction as well: we got label 2, meaning 4–5 stars.

In general, we are going to get reliable answers from our sentiment analysis. Just keep in mind that people sometimes give 5 stars and then write that the product sucks, so if you are going to use the model on this kind of data, remember to remove that noise first.
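One possible way to spot that kind of noise, offered only as a suggestion and not as something done in this experiment, is to flag comments whose star rating disagrees with the predicted label and review them by hand. The helper `find_mismatches` and the toy predictor below are hypothetical; with a real model, `predict_label` could wrap the model's `predict` call.

```python
EXPECTED_LABEL = {1: "__label__1", 2: "__label__1", 4: "__label__2", 5: "__label__2"}


def find_mismatches(reviews, predict_label):
    """Return (stars, predicted_label, text) for reviews that look inconsistent.

    reviews: iterable of (stars, text) pairs.
    predict_label: callable that maps a text to a fastText-style label.
    """
    mismatches = []
    for stars, text in reviews:
        if stars not in EXPECTED_LABEL:
            continue  # 3-star comments have no expected label
        predicted = predict_label(text)
        if predicted != EXPECTED_LABEL[stars]:
            mismatches.append((stars, predicted, text))
    return mismatches


# Toy predictor standing in for a trained model, only for this example.
def toy_predict(text):
    return "__label__1" if "sucks" in text.lower() else "__label__2"


reviews = [(5, "It sucks."), (5, "Love it."), (1, "This sucks.")]
print(find_mismatches(reviews, toy_predict))
# [(5, '__label__1', 'It sucks.')]
```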
