Sentiment Analysis With Bag of Words

Subhani Shaik
Published in The Startup
4 min read · Sep 9, 2020
[Image: Sentiment analysis with bag of words. Source: revuze.it]

Sentiment analysis is the process of determining whether a piece of text is positive or negative. It allows businesses to identify their customers' sentiments towards products or services through reviews and online feedback.


In other words, sentiment analysis lets you explore the mindset of your customers and study the state of your product or service from their point of view.

This makes sentiment analysis a great tool for tasks such as:

  1. Product reviews
  2. Market research
  3. Customer service
  4. Social media monitoring
  5. Reputation management

Sentiment analysis is a field of natural language processing (NLP).

INTRODUCTION

The dataset we are using contains restaurant reviews, and we are going to use sentiment analysis to find whether a particular review is positive or negative. If the review is negative, we will display the following message.

Thank you <customer first name> for taking the time to write a review. We apologize for the inconvenience caused. We hope we will get a chance to serve you better in the near future.

If the review is positive, we will display the following message.

<customer first name>, thank you so much for that awesome review. We look forward to serving you again in the near future.

Steps to be Followed

  1. Importing the libraries.
  2. Importing the dataset.
  3. Cleaning the data.
  4. Creating the Bag of words.
  5. Training and classification.
  6. Confusion matrix.
  7. Predicting a customer review.

Step1: Importing the Libraries

Step2: Importing the Dataset
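
A minimal sketch of this step using only the Python standard library. The tab-separated layout and the column names `Review` and `Liked` (1 = positive, 0 = negative) are assumptions, as is the file name in the comment; a small inline sample stands in for the real file so the snippet runs on its own.

```python
import csv
import io

# Hypothetical sample standing in for the real dataset file. The column
# names "Review" and "Liked" (1 = positive, 0 = negative) are assumptions.
SAMPLE_TSV = (
    "Review\tLiked\n"
    "Wow... Loved this place.\t1\n"
    "Crust is not good.\t0\n"
    "The service was great!\t1\n"
)

def load_reviews(handle):
    """Read tab-separated reviews into a list of (text, label) pairs."""
    # QUOTE_NONE keeps quotation marks inside reviews as ordinary characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row["Review"], int(row["Liked"])) for row in reader]

# With a real file (name assumed):
#   open("reviews.tsv", newline="", encoding="utf-8")
dataset = load_reviews(io.StringIO(SAMPLE_TSV))
for text, label in dataset:
    print(label, text)
```

A tab-separated file is a convenient choice for reviews because commas and quotation marks occur naturally inside the text.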

Step3: Cleaning the data

The raw reviews contain some information that does not help in determining whether a review is positive or negative.

For example, words like a, an, the, was, on, etc. do not have any impact on the decision. These words are called stopwords. We also have words like loved, stopped, loving, etc., which we will have to convert to their root form (stemming).
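
The cleaning step can be sketched as follows. This is an illustration under stated assumptions, not the article's exact code: the stopword set is a tiny hand-picked sample, and `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer such as the Porter stemmer shipped with NLTK.

```python
import re

# Tiny illustrative stopword set (an assumption); real lists are much longer.
STOPWORDS = {"a", "an", "the", "was", "on", "is", "it", "this", "and", "to", "of"}

def crude_stem(word):
    """Strip a common suffix; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling: "stopp" -> "stop".
            if (suffix != "s" and len(stem) > 2
                    and stem[-1] == stem[-2] and stem[-1] not in "aeiou"):
                stem = stem[:-1]
            return stem
    return word

def clean_review(text):
    """Keep letters only, lowercase, drop stopwords, stem what remains."""
    words = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return " ".join(crude_stem(w) for w in words if w not in STOPWORDS)

print(clean_review("Wow... Loved this place."))  # wow lov place
print(crude_stem("stopped"), crude_stem("loving"))  # stop lov
```

Note that "not" is left out of the stopword set on purpose: negations carry sentiment, so removing them can flip the meaning of a review.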

Step4: Creating a bag of words

The bag of words is the simplest way to represent text as numbers: each review becomes a vector that counts how often each word of a fixed vocabulary appears in it, ignoring word order.
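
To make the representation concrete, here is a minimal sketch built with `collections.Counter`. The three cleaned reviews are made-up examples; a library vectorizer would produce the same kind of count matrix.

```python
from collections import Counter

# Hypothetical cleaned reviews (already lowercased and stripped of stopwords).
corpus = ["wow love place", "crust not good", "love food love service"]

# Vocabulary: every distinct word, sorted so the column order is stable.
vocabulary = sorted({word for doc in corpus for word in doc.split()})

def to_vector(document, vocabulary):
    """Count how many times each vocabulary word occurs in the document."""
    counts = Counter(document.split())
    return [counts[word] for word in vocabulary]

matrix = [to_vector(doc, vocabulary) for doc in corpus]

print(vocabulary)
for row in matrix:
    print(row)
```

Each row is one review and each column is one vocabulary word, so the third review has a 2 in the "love" column.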

Step5: Training and Classification

We split the dataset into training and test sets, then fit a classification model on the training set.

Classification Algorithms

  1. Linear classifiers: logistic regression, Naive Bayes.
  2. Nearest neighbors.
  3. Support vector machine.
  4. Decision tree.
  5. Random forest, etc.

We can use any one of them. I tried all of them and, surprisingly, Naive Bayes gave the best accuracy.
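
The split-then-train flow can be sketched in plain Python. The article does not say which Naive Bayes variant was used, so this sketch assumes a multinomial model over word counts with add-one (Laplace) smoothing; the eight reviews, the 25% test ratio, and the function names are all illustrative.

```python
import math
import random
from collections import Counter, defaultdict

def split_dataset(samples, test_ratio=0.25, seed=0):
    """Shuffle a copy of the samples and cut it into train and test parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def train_naive_bayes(samples):
    """Collect per-class review counts and per-class word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        words = text.split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.split():
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical cleaned reviews: 1 = positive, 0 = negative.
samples = [
    ("love place", 1), ("great food great service", 1),
    ("food good", 1), ("love staff", 1),
    ("crust not good", 0), ("bad service", 0),
    ("not tasty", 0), ("slow bad food", 0),
]

train_set, test_set = split_dataset(samples)
model = train_naive_bayes(train_set)
correct = sum(predict(model, text) == label for text, label in test_set)
print(f"accuracy on held-out reviews: {correct / len(test_set):.2f}")
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps a word that is absent from one class from zeroing out that class entirely.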

Step6: Confusion matrix

A confusion matrix is a tool for evaluating the performance of a classification model: it counts how many reviews of each actual class were assigned to each predicted class.

The accuracy of our model is 73%, which means that 73% of its predictions are correct.
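
The relationship between the confusion matrix and accuracy can be shown with a short sketch. The label lists below are invented for illustration and are not the article's results; rows are actual classes and columns are predicted classes.

```python
def confusion_matrix(actual, predicted):
    """2x2 matrix for labels 0/1: rows = actual class, columns = predicted."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels: 1 = positive, 0 = negative.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
print(cm)            # [[3, 1], [1, 3]]
print(accuracy(cm))  # 0.75
```

The off-diagonal cells are the errors: `cm[0][1]` counts negative reviews predicted as positive, and `cm[1][0]` counts positive reviews predicted as negative.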

Step7: Final step - Predicting a customer review
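
A minimal sketch of this final step, turning a prediction into one of the two reply messages from the introduction. The function name `reply_to_review` and the keyword-based `toy_classifier` are hypothetical; the toy classifier is only a placeholder for the trained model applied to the cleaned, vectorized review.

```python
NEGATIVE_REPLY = (
    "Thank you {name} for taking the time to write a review. "
    "We apologize for the inconvenience caused. We hope we will get "
    "a chance to serve you better in the near future."
)
POSITIVE_REPLY = (
    "{name}, thank you so much for that awesome review. "
    "We look forward to serving you again in the near future."
)

def reply_to_review(first_name, review, classify):
    """Pick a reply; classify is any callable returning 1 (positive) or 0."""
    template = POSITIVE_REPLY if classify(review) == 1 else NEGATIVE_REPLY
    return template.format(name=first_name)

def toy_classifier(review):
    """Placeholder for demonstration only, not a real sentiment model."""
    return 1 if "love" in review.lower() else 0

print(reply_to_review("Asha", "Loved the food!", toy_classifier))
print(reply_to_review("Ravi", "Crust is not good.", toy_classifier))
```

Passing the classifier in as an argument keeps the messaging logic independent of whichever model ends up performing best.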

I hope you find this article helpful. If I’ve missed anything, let me know in the comment section.

What do you think about this blog? Comment!!!


Thanks for reading!


Hi, I’m Shaik Subhani, currently pursuing an MSc in Data Analytics with Banking and Finance at Sheffield Hallam University, UK.