Simple Sentiment Analysis — Python

Athisha R K
Published in Analytics Vidhya
Aug 1, 2021

In the modern era, every individual creates a ton of data, and the majority of it is unstructured. A company’s ability to draw insights from unstructured data sets it apart from its competitors.

Let me explain this with an example. Lakhs of people around the world visit Disneyland every year. The team has tons of data in the form of visitor reviews and ratings. Applying modern techniques (such as ML and NLP) can help them identify what worked and which areas need improvement. Sentiment Analysis is one such tool: it helps in understanding the overall tone of a visitor’s review.

Sentiment Analysis

Sentiment Analysis measures a person’s inclination towards something using Natural Language Processing (NLP) and text analysis. In simple words, this technique lets us categorize a visitor’s review as positive or negative.

VADER

The NLTK library in Python ships with VADER (Valence Aware Dictionary and sEntiment Reasoner), a ready-to-use, lexicon- and rule-based sentiment analyzer tuned for social media text. By the end of this tutorial, you will be able to do simple sentiment analysis in Python using VADER.

Pre-requisites

  • Familiarity with Google Colab Notebooks, Jupyter Notebooks (or any other equivalent tool)
  • Basic understanding of pandas DataFrames and the NLTK library

Dataset

The dataset I’ll be using has about 42,000 reviews of three Disneyland branches (Paris, California, and Hong Kong), posted by visitors on Tripadvisor. You can download the dataset from Kaggle.

Column Description (Source: Kaggle)

Expected Output

Sentiment_Tag: Positive/Negative

Download Dataset

You can either download the dataset directly from Kaggle or fetch it using a Kaggle API token. Refer to my Colab Notebook for the steps to follow.

Download Required Libraries

Download NLTK and vader_lexicon

First, we need to install the NLTK library and download the VADER lexicon (the vader_lexicon resource).
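A minimal sketch of this setup step, assuming NLTK is already installed (for example with `pip install nltk`, or `!pip install nltk` in a notebook cell) and that the machine has network access for the one-time download. The helper name `ensure_vader_lexicon` is hypothetical; it is not part of NLTK.

```python
def ensure_vader_lexicon() -> bool:
    """Make sure the VADER lexicon is available locally.

    Returns True if the lexicon was already present or was downloaded.
    """
    # Imported here so that defining the helper does not itself require NLTK.
    import nltk

    try:
        # NLTK keeps the lexicon under its "sentiment" resource folder.
        nltk.data.find("sentiment/vader_lexicon.zip")
        return True
    except LookupError:
        # One-time download; needs network access.
        return bool(nltk.download("vader_lexicon", quiet=True))
```

Calling `ensure_vader_lexicon()` once at the top of the notebook is enough; later runs find the cached copy and skip the download.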

Required Libraries

Import Required Libraries

Read Dataset

Read dataset using Pandas.read_csv()
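As a rough sketch, reading the file could look like the following. The file name `DisneylandReviews.csv` and the `latin-1` encoding are assumptions about the Kaggle download, and `load_reviews` is a hypothetical helper; the two sample rows are made up so the snippet runs without the real file.

```python
import io

import pandas as pd


def load_reviews(source, encoding="latin-1"):
    """Read the reviews CSV into a DataFrame.

    `source` may be a file path or an open file-like object.
    The latin-1 default is an assumption about the Kaggle file; switch to
    "utf-8" if your copy was saved that way.
    """
    return pd.read_csv(source, encoding=encoding)


# Tiny in-memory stand-in for the real file (made-up rows), so the sketch
# runs without the download. With the real data you would pass the path,
# e.g. load_reviews("DisneylandReviews.csv")  # assumed file name
sample_csv = io.StringIO(
    "Review_ID,Review_Text\n"
    "1,Great day out\n"
    "2,Queues were too long\n"
)
data = load_reviews(sample_csv)
print(data.head())
```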

Dataset Overview

data.shape and data.info()

The dataset has 42,656 reviews and 6 features, with a mix of int64 and object columns.
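For readers new to pandas, here is a small sketch of the two calls on a made-up, three-row stand-in frame (the real dataset is much larger, as noted above).

```python
import pandas as pd

# Made-up stand-in rows; the real dataset has 42,656 rows and 6 columns.
data = pd.DataFrame(
    {
        "Review_ID": [101, 102, 103],
        "Review_Text": ["Loved it", "Too crowded", "Worth the trip"],
    }
)

n_rows, n_cols = data.shape  # (number of rows, number of columns)
print(f"{n_rows} reviews, {n_cols} features")

# info() prints each column's dtype and non-null count; it returns None.
data.info()
```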

Prepare Dataset

We require only the review id and review text for our analysis.

Extract Review_ID and Review_Text
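One way to keep just these two columns is sketched below. The `Branch` column and all row values are stand-ins chosen for illustration; only `Review_ID` and `Review_Text` are named in this article.

```python
import pandas as pd

# Made-up rows; "Branch" is an assumed extra column to show what gets dropped.
data = pd.DataFrame(
    {
        "Review_ID": [101, 102],
        "Branch": ["Hong Kong", "Paris"],
        "Review_Text": ["Loved it", "Too crowded"],
    }
)

# Double brackets select a list of columns and return a DataFrame;
# .copy() makes the subset independent of the original frame.
reviews = data[["Review_ID", "Review_Text"]].copy()
print(reviews)
```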

A Quick Example using VADER

Let’s go ahead and apply VADER to the first review, as shown below.

Extract First Review

If you’ve ever been to Disneyland anywhere you’ll find Disneyland Hong Kong very similar in the layout when you walk into main street! It has a very familiar feel. One of the rides its a Small World is absolutely fabulous and worth doing. The day we visited was fairly hot and relatively busy but the queues moved fairly well.

Sentiment Analysis for the first Review — rev

In the resulting dictionary:
* pos: proportion of the text rated positive
* neg: proportion of the text rated negative
* neu: proportion of the text rated neutral
* compound: a normalized overall score that ranges from -1 (very negative) to +1 (very positive)

The pos, neg, and neu values add up to about 1. From the result, we can conclude that the review is mostly neutral.
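The scoring step for a single review can be sketched as follows, assuming NLTK and the vader_lexicon resource are available. `SentimentIntensityAnalyzer` and its `polarity_scores` method come from NLTK; `make_analyzer`, `dominant_category`, and `describe_review` are hypothetical helpers added for illustration.

```python
def make_analyzer():
    """Create a VADER analyzer (needs NLTK and the vader_lexicon resource)."""
    # Imported here so the helpers below can be used without NLTK loaded.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer()


def dominant_category(scores):
    """Return whichever of 'neg', 'neu', 'pos' has the largest share."""
    return max(("neg", "neu", "pos"), key=lambda key: scores[key])


def describe_review(text, analyzer):
    """Score one review; report its dominant category and compound score."""
    # polarity_scores returns a dict with keys neg, neu, pos, compound.
    scores = analyzer.polarity_scores(text)
    return dominant_category(scores), scores["compound"]
```

With the first review stored in `rev`, `describe_review(rev, make_analyzer())` would return the largest of the three proportions together with the compound score, which is one way to back up a reading such as "mostly neutral".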

For All Reviews…

The same call can be applied to every review with a for loop, storing the results in four lists: neg, pos, neu, and compound.

For all reviews…
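A minimal sketch of such a loop, assuming the scoring function is passed in as a parameter so the loop does not depend on how the analyzer was built. `score_reviews` and `vader_scorer` are hypothetical names.

```python
def vader_scorer():
    """Build VADER's scoring function (needs NLTK and vader_lexicon)."""
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    # Create the analyzer once and reuse its method for every review.
    return SentimentIntensityAnalyzer().polarity_scores


def score_reviews(texts, polarity_scores):
    """Score each review and collect the four values in parallel lists."""
    neg, pos, neu, compound = [], [], [], []
    for text in texts:
        scores = polarity_scores(str(text))  # str() guards non-string cells
        neg.append(scores["neg"])
        pos.append(scores["pos"])
        neu.append(scores["neu"])
        compound.append(scores["compound"])
    return neg, pos, neu, compound
```

With the review column at hand, the call would be along the lines of `neg, pos, neu, compound = score_reviews(data["Review_Text"], vader_scorer())`.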

Append Results to Dataframe

Append neg, pos, neu, compound to data frame
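Attaching the four lists as columns might look like this sketch. `append_scores` is a hypothetical helper, and the numbers are placeholders rather than real VADER output.

```python
import pandas as pd


def append_scores(df, neg, pos, neu, compound):
    """Return a copy of `df` with the four score lists added as columns."""
    return df.assign(neg=neg, pos=pos, neu=neu, compound=compound)


# Made-up rows standing in for the review frame.
data = pd.DataFrame(
    {"Review_ID": [1, 2], "Review_Text": ["Loved it", "Too crowded"]}
)

# Placeholder numbers, NOT real VADER output; they only show the mechanics.
data = append_scores(
    data,
    neg=[0.0, 0.4],
    pos=[0.6, 0.0],
    neu=[0.4, 0.6],
    compound=[0.6, -0.3],
)
print(data.columns.tolist())
```

Each list must have one entry per row, in the same order as the reviews were scored.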

Assign Sentiment Tags

We can assign sentiment tags either by setting a threshold on the compound score or by comparing the negative and positive scores. For demonstration purposes, I tag each review with whichever of its positive and negative scores is larger.

Assign Sentiment_Tag
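Both tagging rules can be sketched as small functions. The tie-break in `tag_from_pos_neg` and the default threshold in `tag_from_compound` are assumptions made for this illustration, and the scores are placeholders, not real VADER output.

```python
def tag_from_pos_neg(pos, neg):
    """Tag a review by whichever score is larger.

    Ties are tagged "Positive" here; that tie-break is an assumption.
    """
    return "Positive" if pos >= neg else "Negative"


def tag_from_compound(compound, threshold=0.0):
    """Alternative: tag by thresholding the compound score.

    The 0.0 default is an illustrative choice, not taken from the article.
    """
    return "Positive" if compound >= threshold else "Negative"


# Placeholder scores (not real VADER output) for two reviews.
pos_scores = [0.30, 0.05]
neg_scores = [0.10, 0.25]
tags = [tag_from_pos_neg(p, n) for p, n in zip(pos_scores, neg_scores)]
print(tags)  # ['Positive', 'Negative']
```

On the DataFrame, the same comprehension over `zip(data["pos"], data["neg"])` can be assigned to `data["Sentiment_Tag"]`, and `data["Sentiment_Tag"].value_counts()` then gives the number of reviews per tag.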

Final Result

Final Result

According to our analysis, 4K+ reviews are negative. There are likely better models than VADER that could classify these reviews more accurately. You could also build your own model using ML and other familiar techniques!

For further processing, we can save the results as a CSV file.

Write data frame to CSV file
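A short sketch of the export, with a hypothetical `save_results` helper, an assumed output file name, and made-up rows.

```python
import pandas as pd


def save_results(df, path):
    """Write the DataFrame to CSV without the numeric row index."""
    df.to_csv(path, index=False)


# Made-up rows standing in for the tagged results.
data = pd.DataFrame(
    {"Review_ID": [1, 2], "Sentiment_Tag": ["Positive", "Negative"]}
)

# Called without a path, to_csv returns the CSV text, handy for a quick look.
print(data.to_csv(index=False))

# With the real results you would write a file, e.g.
# save_results(data, "disneyland_sentiment.csv")  # assumed file name
```

Passing `index=False` keeps the pandas row index out of the file, so reading it back yields the same columns.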

Athisha R K
Infrastructure Engineer @ Lowe’s India | Python Programmer