Sentiment Analysis of Comments on LHL’s Facebook Page

Learning how to use the Facebook Graph API and Google Cloud Natural Language API

Update 2018–08–21

In July 2018, Facebook put more API restrictions in place, which means that the method outlined below for obtaining a personal access token no longer works:

Today, developers can run test queries using our Graph API Explorer App. We will deprecate the app today, July 2, and developers will then need to use their own apps’ access tokens to do test queries on the Graph API Explorer.

From now on, to download comments, you’ll need an active Facebook application of your own to obtain an access token.

The original post is below.


Breaking news

This was an eventful morning in Singapore.

But what do the people think?

PM of Singapore, Lee Hsien Loong

Let’s try to gauge public response to these statements based on Facebook comments. To do this, we will use:

  • Python 3
  • the Facebook Graph API to download comments from Facebook
  • the Google Cloud Natural Language API to perform sentiment analysis

First we will download the comments from a Facebook post using the Facebook Graph API. In this blog post, we’ll use this post on LHL’s Facebook page responding to his siblings’ statements: https://www.facebook.com/leehsienloong/posts/1505690826160285.

The approach we will use can be easily adapted to any post on a public Facebook page, for example Lee Hsien Yang’s original post.

We will then use the Google Cloud Natural Language API to classify the comments on these posts as either positive, neutral or negative, and calculate the proportion of positive, neutral and negative comments on each of these posts.

Important Note

Facebook is by no means a source of objective views, and the comments we will find on these posts are extremely likely to contain some degree of bias. There is also the possibility of moderation, further reducing the objectivity of our findings. This blog post should be taken purely as an exercise in using the Facebook Graph API and Google Cloud Natural Language API, and not as a basis for drawing any conclusions about Singapore’s political situation.

Getting started

Setting up Python 3

Downloading and installing Python 3 is not within the scope of this blog post. You can consult the Python documentation here. But just to check your Python version, open your Python prompt and inspect its output. You should see something similar to:

Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

If you see Python 3.x.x, you’re good to go. If you see Python 2.x.x and you are running Linux/MacOS, you can try using python3 instead of python. Otherwise, you will need to install Python 3 (or convert the code to Python 2 on your own).
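If you would rather check the version from inside a script, a minimal sketch along these lines should work. The helper name is_python3 is made up for illustration and is not part of the scripts in this post.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple is Python 3 or newer."""
    return version_info[0] >= 3


# stop early with a readable message instead of failing later on a syntax error
if not is_python3():
    sys.exit('This post assumes Python 3, found {}.{}'.format(*sys.version_info[:2]))
```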

We will also be using the requests and google-cloud-language libraries for making HTTP requests and performing sentiment analysis. To install these, run:

# if you are using the python3 command, you may need to use pip3 here as well
pip install --upgrade requests google-cloud-language

(Update 2018–01–05) This post has been updated to use the Google Cloud Python Library v0.26.1 and newer. If you encounter errors similar to AttributeError: module 'google.cloud.language' has no attribute 'LanguageServiceClient' then you could try rerunning the above command to update your packages.

Getting a Facebook Graph API Access Token

You will need a Facebook account and an access token to use the Graph API. We can get a token using the Facebook Graph API Explorer at https://developers.facebook.com/tools/explorer/.

The Graph API Explorer

Click on “Get Token” at the top-right, followed by “Get User Access Token”. You will be presented with a large dialog for selecting permissions. There’s no need to check anything though, since LHL’s Facebook page is public. Click “Get Access Token” at the bottom-right, and your access token will be filled into the Access Token field. Leave this here for now.

Creating a Google Cloud Platform project

Unlike AWS, resources on Google Cloud Platform are grouped by project. Go to https://cloud.google.com/natural-language/docs/getting-started and follow steps 1–6 to set up a project. You may also need to install the Google Cloud SDK to use gcloud.

Downloading Facebook comments

Our objective here is to download all the comments on LHL’s post on his Facebook page responding to his siblings:

https://www.facebook.com/leehsienloong/posts/1505690826160285

We will do this by traversing the Facebook Graph API.

How the Facebook Graph API works

The Graph API exposes Facebook data as a graph made up of connected entities. These entities can be anything on Facebook, for example a User, a Page, or even a Comment. A full list of entities can be found at the Graph API Reference.

Entities are linked by edges, which represent the connections between them. For example, a Post entity is linked to many Comment entities, each representing one comment on that post.

We need to find the Post entity that corresponds to LHL’s post, and go through all the Comment entities connected to it. To simplify things, the Post entity is referenced by an ID made up of the ID of the user or page who made the post and the ID of the post itself (which can be found in the post URL), joined by an underscore.
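As a rough sketch of how that combined ID turns into a Graph API URL, the snippet below pulls the post ID out of the post URL and joins it to the page ID. The function names post_id_from_url and comments_url are made up for illustration; the page ID and API version are the ones used in the download script later in this post.

```python
from urllib.parse import urlparse

GRAPH_API_VERSION = 'v2.9'


def post_id_from_url(post_url):
    """Return the last path segment of a post URL, which is the post ID."""
    path = urlparse(post_url).path
    return path.strip('/').split('/')[-1]


def comments_url(page_id, post_id, version=GRAPH_API_VERSION):
    """Build the comments endpoint for a post referenced as <page_id>_<post_id>."""
    return 'https://graph.facebook.com/{}/{}_{}/comments'.format(
        version, page_id, post_id)


post_id = post_id_from_url(
    'https://www.facebook.com/leehsienloong/posts/1505690826160285')
print(comments_url('125845680811480', post_id))
```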

To get LHL’s page ID through the Graph API Explorer, we just enter his page username into the query box:

LHL’s Facebook page ID

After we get his page ID, we can find his post:

And just by appending /comments to the post ID, we can get all the comments made on the post:

However, we don’t want to use the Graph API Explorer to manually save all the comments, so we will use a Python script instead (updated 2018–01–15):

import requests
import signal
import sys

graph_api_version = 'v2.9'
access_token = 'YOUR_FACEBOOK_ACCESS_TOKEN_HERE'

# LHL's Facebook user id
user_id = '125845680811480'

# the id of LHL's response post at https://www.facebook.com/leehsienloong/posts/1505690826160285
post_id = '1505690826160285'

# the graph API endpoint for comments on LHL's post
url = 'https://graph.facebook.com/{}/{}_{}/comments'.format(graph_api_version, user_id, post_id)

comments = []

# set limit to 0 to try to download all comments
limit = 200


def write_comments_to_file(filename):
    print()

    if len(comments) == 0:
        print('No comments to write.')
        return

    with open(filename, 'w', encoding='utf-8') as f:
        for comment in comments:
            f.write(comment + '\n')

    print('Wrote {} comments to {}'.format(len(comments), filename))


# register a signal handler so that we can exit early
def signal_handler(signal, frame):
    print('KeyboardInterrupt')
    write_comments_to_file('comments.txt')
    sys.exit(0)


signal.signal(signal.SIGINT, signal_handler)

r = requests.get(url, params={'access_token': access_token})
while True:
    data = r.json()

    # catch errors returned by the Graph API
    if 'error' in data:
        raise Exception(data['error']['message'])

    # append the text of each comment into the comments list
    for comment in data['data']:
        # remove line breaks in each comment
        text = comment['message'].replace('\n', ' ')
        comments.append(text)

    print('Got {} comments, total: {}'.format(len(data['data']), len(comments)))

    # check if we have enough comments
    if 0 < limit <= len(comments):
        break

    # check if there are more comments
    if 'paging' in data and 'next' in data['paging']:
        r = requests.get(data['paging']['next'])
    else:
        break

# save the comments to a file
write_comments_to_file('comments.txt')

By default, this script will just stop after downloading 200 comments. You can adjust the limit variable inside the script to download more, or set it to 0 to try to download everything. You can also use Ctrl-C to interrupt the download and save all the comments you downloaded so far.

This script works because we can get the same output as the Graph API Explorer by requesting the Graph API URL directly. The script does several things:

  1. Make an HTTP request to get the comments on LHL’s post
  2. Save the text of the comments on the post into a Python list
  3. Check if there are any more comments (using the paging cursors returned in the request, refer to https://developers.facebook.com/docs/graph-api/using-graph-api for more information about paging).
  4. Save the comments we got into a file.
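To see the paging logic on its own, here is a minimal sketch of the same loop with the HTTP call swapped out for a function we pass in, so it can be tried without an access token. The names collect_comments, fetch_page and fake_pages are made up for illustration, and the fake pages only mimic the data and paging.next fields that the script above reads.

```python
def collect_comments(fetch_page, first_url, limit=0):
    """Follow paging.next links and return the comment texts.

    fetch_page is any callable that takes a URL and returns the parsed JSON
    response as a dict (a stand-in for requests.get(url).json()).
    """
    messages = []
    url = first_url
    while url:
        data = fetch_page(url)
        if 'error' in data:
            raise RuntimeError(data['error']['message'])
        for comment in data.get('data', []):
            # remove line breaks so that each comment fits on one line
            messages.append(comment.get('message', '').replace('\n', ' '))
        # stop once we have enough comments (0 means no limit)
        if 0 < limit <= len(messages):
            break
        # no 'next' link means this was the last page
        url = data.get('paging', {}).get('next')
    return messages


# two fake pages standing in for Graph API responses
fake_pages = {
    'page1': {'data': [{'message': 'first\ncomment'}, {'message': 'second'}],
              'paging': {'next': 'page2'}},
    'page2': {'data': [{'message': 'third'}]},
}


def fake_fetch(url):
    return fake_pages[url]


print(collect_comments(fake_fetch, 'page1'))
```

Passing the fetch function in as an argument is only a convenience for trying the loop offline; the script above does the same thing with requests directly.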

Analysing the comment sentiment

Now that we have a list of comments we want to analyse, we can use the Google Cloud Natural Language API to get the sentiment of each comment. The Cloud Natural Language API does many things, but in this blog post we will only use the sentiment analysis feature, which will inspect a block of text and determine if the prevailing emotion is positive, negative or neutral.
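The API reports sentiment as a score between -1.0 (negative) and 1.0 (positive), together with a magnitude. As a minimal sketch, turning a score into one of the three labels could look like the function below. The name classify and the neutral_band parameter are made up for illustration; with the default of 0.0 only a score of exactly zero counts as neutral, which is the rule used in this post.

```python
def classify(score, neutral_band=0.0):
    """Map a sentiment score in [-1.0, 1.0] to a label.

    neutral_band is a hypothetical knob: scores within +/- neutral_band of
    zero are treated as neutral.
    """
    if score > neutral_band:
        return 'positive'
    if score < -neutral_band:
        return 'negative'
    return 'neutral'


print(classify(0.6), classify(-0.2), classify(0.0))
```

A wider band (say 0.25) would move weakly positive and weakly negative comments into the neutral group, so the proportions depend on this choice.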

The comments.txt file we generated in the previous section contains the text of each comment on LHL’s Facebook post, one on each line (this is why we removed the line breaks from each comment). Now we will go through this list and determine if the sentiment of each comment is positive, negative or neutral, and calculate the overall proportion of each sentiment (updated 2018–01–05):

import signal
import sys

from google.cloud import language
from google.api_core.exceptions import InvalidArgument

# create a Google Cloud Natural Language API Python client
client = language.LanguageServiceClient()


# a function which takes a block of text and returns its sentiment and magnitude
def detect_sentiment(text):
    """Detects sentiment in the text."""

    document = language.types.Document(
        content=text,
        type=language.enums.Document.Type.PLAIN_TEXT)

    sentiment = client.analyze_sentiment(document).document_sentiment

    return sentiment.score, sentiment.magnitude


# keep track of count of total comments and comments with each sentiment
count = 0
positive_count = 0
neutral_count = 0
negative_count = 0


def print_summary():
    print()
    print('Total comments analysed: {}'.format(count))

    # avoid dividing by zero if we exit before any comment was analysed
    if count == 0:
        return

    print('Positive : {} ({:.2%})'.format(positive_count, positive_count / count))
    print('Negative : {} ({:.2%})'.format(negative_count, negative_count / count))
    print('Neutral : {} ({:.2%})'.format(neutral_count, neutral_count / count))


# register a signal handler so that we can exit early
def signal_handler(signal, frame):
    print('KeyboardInterrupt')
    print_summary()
    sys.exit(0)


signal.signal(signal.SIGINT, signal_handler)

# read our comments.txt file
with open('comments.txt', encoding='utf-8') as f:
    for line in f:
        # use a try-except block since we occasionally get language not supported errors
        try:
            score, mag = detect_sentiment(line)
        except InvalidArgument as e:
            # skip the comment if we get an error
            print('Skipped 1 comment: ', e.message)
            continue

        # increment the total count
        count += 1

        # depending on whether the sentiment is positive, negative or neutral, increment the corresponding count
        if score > 0:
            positive_count += 1
        elif score < 0:
            negative_count += 1
        else:
            neutral_count += 1

        # calculate the proportion of comments with each sentiment
        positive_proportion = positive_count / count
        neutral_proportion = neutral_count / count
        negative_proportion = negative_count / count

        print(
            'Count: {}, Positive: {:.3f}, Neutral: {:.3f}, Negative: {:.3f}'.format(
                count, positive_proportion, neutral_proportion, negative_proportion))

print_summary()

You can hit Ctrl-C to abort the analysis halfway and view the results so far. Running this script, we will get output like:

...
Count: 379, Positive: 0.657, Neutral: 0.190, Negative: 0.153
Count: 380, Positive: 0.655, Neutral: 0.192, Negative: 0.153
Count: 381, Positive: 0.656, Neutral: 0.192, Negative: 0.152
Count: 382, Positive: 0.657, Neutral: 0.191, Negative: 0.152
Count: 383, Positive: 0.658, Neutral: 0.191, Negative: 0.151
Count: 384, Positive: 0.659, Neutral: 0.190, Negative: 0.151
...

And finally:

Total comments analysed: 781
Positive : 530 (67.86%)
Negative : 109 (13.96%)
Neutral : 142 (18.18%)
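As a quick check of the arithmetic, the percentages above can be recomputed from the three counts. This is only a sketch using the numbers printed above; the variable names are made up for illustration.

```python
counts = {'positive': 530, 'negative': 109, 'neutral': 142}

# the three groups should add up to the total number of analysed comments
total = sum(counts.values())

# share of each sentiment as a fraction of the total
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print('{:<8}: {:.2%}'.format(label, share))
```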

Conclusion

Based on our sentiment analysis of the 781 comments we analysed on LHL’s Facebook post, we see that about 68% of comments are positive. While this could be interpreted as a sign of strong public support for our PM, we also need to take into account the fact that visitors to a Facebook page may be biased towards that page, and the possibility of comment moderation.

Potential improvements

While testing my scripts, I had to run the sentiment analysis script a few times, making unnecessary requests for the same comments each time. Given the dynamic nature of Facebook posts, the code could be modified to cache comments by ID and only analyse the sentiment of new comments. This is especially important since the Google Cloud Natural Language API only has a free quota of 5k API calls per month.
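A minimal sketch of that caching idea is below. It assumes the download script is changed to keep each comment’s id field alongside its text (the current script only saves the text), and that the cache is stored as a JSON file. The names load_cache, save_cache and analyse_with_cache are made up for illustration, and analyse stands in for a function such as detect_sentiment.

```python
import json
import os


def load_cache(path):
    """Load previously saved results, or start with an empty cache."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding='utf-8') as f:
        return json.load(f)


def save_cache(cache, path):
    """Write the cache back to disk as JSON."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(cache, f)


def analyse_with_cache(comments, analyse, cache):
    """Analyse only the comments whose ID is not in the cache yet.

    comments maps comment ID -> text, analyse(text) returns a score.
    Returns the number of calls made to analyse.
    """
    calls = 0
    for comment_id, text in comments.items():
        if comment_id not in cache:
            cache[comment_id] = analyse(text)
            calls += 1
    return calls
```

On a second run, only comments posted since the last run would trigger an API call, provided the cache was saved with save_cache and loaded again with load_cache.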

Source code

The source code for the examples above can be found at https://gist.github.com/yi-jiayu/b94d9df77007d8a6683f3df0990da0f6.

Ending notes

This also happens to be my first blog post, and I hope it was a palatable introduction to the Facebook Graph API and Google Cloud Platform. And if you are Singaporean and take the bus, do check out my Telegram bot, @BusEtaBot, for getting bus ETAs!

Acknowledgements

Wong Jun Kai
Edmond To
Robin Lee

Revision history

2018–01–15: Updated code for analysing sentiment to use the Google Cloud Python Library v0.26.1 (https://cloud.google.com/natural-language/docs/python-client-migration)