Sentiment Analysis on Instagram captions in under 80 lines of code.

Niharika Pandit
Published in Analytics Vidhya
5 min read · Jul 20, 2020

Sentiment analysis, one of the most successful and well-known natural language processing techniques, helps determine the sentiment or opinion of the author of a piece of text. It is a powerful way to gauge people's emotions and attitudes. Many businesses use it to identify customer sentiment toward products, brands or services in online conversations and to collect feedback.

Nowadays, everyone and their dog (no joke) has an Instagram account. Sentiment analysis on Twitter is already very common. Instagram captions (the text a user writes when posting a photo) are also a great source of textual data that can be mined and analysed, since people tend to pick a caption that matches their mood or state of mind.

If you just want to access the code, click here.

Sentiment Analysis using VADER

Over the last few years, many packages have been introduced to make sentiment analysis easier to implement. For more information about such packages and a detailed comparison, check out this article by neptune.ai. VADER is one such package.

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a model for text sentiment analysis that is sensitive to both the polarity (positive/negative) and the intensity (strength) of emotion. VADER sentiment analysis relies on a dictionary that maps lexical features to emotion intensities known as sentiment scores. It is a lexicon-based algorithm, meaning that it looks up the polarity of each lexical feature (word) and combines these into a score for the whole text. This approach works well for simple judgements where only the author's overall sentiment is required. It is efficient, and the polarity score of each lexicon entry can be inspected as well.

The general rule for the VADER algorithm, based on the compound polarity score, is:

If the compound score is >= 0.05, the sentence is positive.

If the compound score is <= -0.05, the sentence is negative.

Otherwise, the sentence is neutral.
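As a small illustration, the rule can be written as a plain Python function. This is only a sketch: the name classify_compound is made up for this example, and the cut-offs are the ones suggested in the VADER README (compound >= 0.05 is positive, compound <= -0.05 is negative, anything in between is neutral).

```python
def classify_compound(compound):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"  # scores strictly between -0.05 and 0.05


# A few hand-picked scores to show the three branches.
for score in (0.62, -0.31, 0.0):
    print(score, "->", classify_compound(score))
```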

Installation of VADER

Installing vaderSentiment is simple. The most common way is to use pip.

> pip install vaderSentiment

To verify that the installation was successful:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

If there is no error, you are good to go! If the import fails, try another installation method. Here is their GitHub repository.

Scraping Instagram Captions


This process only works on public accounts, since only their captions are visible. It works as follows:

  • Getting the Instagram handle of the person from the user
  • Opening the profile page using its URL
  • Locating the Instagram captions
  • Scraping the 10 most recent captions from the page

The Python library used here is Beautiful Soup, which is widely used for pulling data out of HTML and XML files. It can navigate, search and iterate through the parsed page to extract the desired data. Before actually scraping, it is always a good idea to inspect the HTML structure of the web page.

After determining the fields and understanding the format of the HTML page, the following code was used to scrape the data:

Instagram Caption Scraping for Sentiment Analysis
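For readers who want a feel for the steps without opening the gist, here is a rough sketch. It is not the author's code. It assumes the profile page embeds its data as JSON in a script tag starting with "window._sharedData = ", which was the layout around 2020; Instagram has changed its pages since then and may show a login wall instead. All function names here are hypothetical.

```python
import json

# Assumption: the profile data sits in a <script> tag with this prefix.
SHARED_DATA_PREFIX = "window._sharedData = "


def fetch_profile_html(username):
    """Download a public profile page (requires `pip install requests`)."""
    import requests  # imported lazily so the helpers below work without it

    url = f"https://www.instagram.com/{username}/"
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    return response.text


def find_shared_data(html):
    """Return the embedded JSON as a dict, or None if the page has none."""
    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        text = (script.string or "").strip()
        if text.startswith(SHARED_DATA_PREFIX):
            return json.loads(text[len(SHARED_DATA_PREFIX):].rstrip(";"))
    return None


def extract_captions(shared_data, limit=10):
    """Collect up to `limit` captions, most recent first, skipping empty ones."""
    # Assumption: this key path matches the 2020-era page structure.
    user = shared_data["entry_data"]["ProfilePage"][0]["graphql"]["user"]
    captions = []
    for post in user["edge_owner_to_timeline_media"]["edges"]:
        if len(captions) >= limit:
            break
        caption_edges = post["node"]["edge_media_to_caption"]["edges"]
        if caption_edges:  # posts without a caption have an empty list
            captions.append(caption_edges[0]["node"]["text"])
    return captions
```

With these helpers, the whole pipeline would be something like extract_captions(find_shared_data(fetch_profile_html("instagram"))), keeping in mind that find_shared_data returns None when the page no longer has the expected layout.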

Sentiment Analysis

Now that we have the data, sentiment analysis can be run on it, and VADER makes this easy. A SentimentIntensityAnalyzer object is initialized, and the polarity scores of each caption are then generated with the polarity_scores() method that VADER provides.

The function below is used for sentiment analysis:
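A minimal sketch of such a function (not the author's original gist) might look like this. The names build_analyzer and analyze_caption are invented for the example. The only VADER calls used are SentimentIntensityAnalyzer() and polarity_scores(), which returns a dictionary with the keys neg, neu, pos and compound. The sketch scores the whole caption at once, which is a simplification.

```python
def build_analyzer():
    """Create a VADER analyzer (requires `pip install vaderSentiment`)."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer()


def analyze_caption(caption, analyzer):
    """Print VADER's scores for one caption and return the compound score."""
    scores = analyzer.polarity_scores(caption)  # keys: neg, neu, pos, compound
    print(caption)
    print("Overall sentiment dictionary:", scores)
    # neg, neu and pos are proportions of the text, shown here as percentages.
    for key, name in (("neg", "negative"), ("neu", "neutral"), ("pos", "positive")):
        print(f"{scores[key] * 100:.1f}% {name}")
    return scores["compound"]
```

A call would look like analyze_caption("What a wonderful day!", build_analyzer()). Passing the analyzer in as an argument means it is created once and reused for all captions.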

The diagram below makes the compound (polarity) score easier to understand.

Working of VADER
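To make the compound score a bit more concrete: in the vaderSentiment source, the summed valence x of the words in a text is squashed into the range [-1, 1] as x / sqrt(x² + alpha), with alpha = 15 by default. The sketch below re-implements only that normalization step for illustration. The real library applies further heuristics (punctuation, capitalization, negation and so on) before it gets to this point, and the raw sums used here are made-up inputs.

```python
import math


def normalize(raw_score, alpha=15):
    """Squash a summed valence score into [-1, 1]; alpha=15 is VADER's default."""
    norm = raw_score / math.sqrt(raw_score * raw_score + alpha)
    return max(-1.0, min(1.0, norm))  # clamp to guard against rounding


# Made-up raw sums: larger magnitudes move the result toward -1 or +1.
for raw in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"{raw:+.1f} -> {normalize(raw):+.3f}")
```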

Results

When the code is run, the program scrapes the 10 most recent Instagram captions from the given username.

Any public Instagram account can be chosen. Here, the official Instagram account “@instagram” is used, since its captions tend to be long.

The program scrapes the recent captions and analyzes them. Here’s an example:

Result Example 1

The program prints the caption and the overall sentiment dictionary, followed by the individual percentages for the sentence or sentences in it.

Result Example 2

The screenshot above is another example of caption analysis, this time with a shorter caption. The same procedure is followed nonetheless.

After 10 such outputs, each giving the sentiment of an individual caption, we can group them by labelling positive captions as 1 and negative captions as 0. If the grouped array contains more ones than zeros, the user is termed positive, and vice versa.

Finally, we get the output at the end of the execution of the program as:

User is on the positive side (if the overall polarity is positive)

OR

User is on the negative side (if the overall polarity is negative)
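The grouping step can be sketched in a few lines. The function name overall_verdict is made up for this example. The text does not say what happens when there are as many ones as zeros, so treating a tie as negative is an assumption of this sketch.

```python
def overall_verdict(labels):
    """labels: list of 1 (positive caption) and 0 (negative caption)."""
    ones = sum(labels)
    zeros = len(labels) - ones
    if ones > zeros:
        return "User is on the positive side"
    # Assumption: a tie is treated as negative, since "more ones than zeros"
    # is the only condition given for a positive verdict.
    return "User is on the negative side"


# Hypothetical labels for 10 captions: 7 positive, 3 negative.
labels = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
print(overall_verdict(labels))
```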

This post is just the overview of this small project. The entire code can be found below:

Further Scope:

This is a very naïve model, but the project can be expanded, and more powerful libraries such as spaCy can be used for further analysis. For example, they could be used to extract the words or phrases in Instagram captions that correspond to particular emotions (happy, sad and more).
