Text Mining on Facebook Posts

Social media has become an integral part of life for most of us. It can be an addiction, but it is a great source of information as well. There is so much content there, in the form of text, pictures, videos, and even ads, that it is overwhelming at times. It acts as a source and a sink of data at the same time, and the hidden patterns in this data make it quite interesting for some of us.

These hidden details and patterns can give us insights that we would otherwise not be aware of. In this article, we will analyze a Facebook post to uncover some of them. R has been used to carry out this text mining exercise, but you can use any tool of your choice.

There is an authentication process to access Facebook from R which is explained well in this post. Authentication is done using an ‘app_id’ and an ‘app_secret’ key.
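As a rough sketch of that step, assuming the Rfacebook package (which the getPost() call used later suggests), the OAuth token could be created with fbOAuth(). The wrapper name authenticate_facebook and the placeholder credentials are hypothetical and only serve the illustration.

```r
# Minimal sketch, assuming the Rfacebook package is installed.
# authenticate_facebook() is a hypothetical helper name, not part of Rfacebook.
authenticate_facebook <- function(app_id, app_secret) {
  # fbOAuth() opens a browser window to complete the OAuth handshake
  # and returns a token object that later calls can reuse.
  Rfacebook::fbOAuth(app_id = app_id, app_secret = app_secret)
}

# Usage (placeholders, replace with the values of your own Facebook app):
# fb_oauth <- authenticate_facebook("YOUR_APP_ID", "YOUR_APP_SECRET")
```

The token can be stored with save() so that the browser step is not repeated in every session.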

The next step is to select a post for our analysis. In this case, a post from the Facebook group of Sanctuary Asia has been selected. The post seems to be about atrocities committed against animals by humans. We'll now copy the post ID for this particular post from the URL.

The highlighted text above is the post ID we need to access this post from R. In the getPost() function, we will enter the copied post ID and store the results in a variable named “details”.
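The call described above might look like the following sketch. It assumes getPost() comes from the Rfacebook package and that fb_oauth is a token created during authentication; fetch_post is a hypothetical wrapper, and the post ID is left as a placeholder because it is not reproduced in the text.

```r
# Minimal sketch, assuming the Rfacebook package.
# fetch_post() is a hypothetical wrapper around Rfacebook::getPost().
fetch_post <- function(post_id, token, n = 500) {
  # n caps how many likes and comments are downloaded for the post.
  Rfacebook::getPost(post = post_id, token = token, n = n)
}

# Usage (the post ID is a placeholder for the value copied from the URL):
# details <- fetch_post("<post ID copied from the URL>", fb_oauth)
# names(details)  # a list with the elements "post", "likes" and "comments"
```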

We can check some basic statistics, like the number of likes, comments, and shares, related to the post by the following simple commands.
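One way to pull out these numbers is sketched below. It assumes that details has the shape returned by Rfacebook's getPost(), where the post element is a one-row data frame with the columns likes_count, comments_count, and shares_count; post_stats is a hypothetical helper name.

```r
# Minimal sketch: extract the basic counters from the fetched post.
# Assumes `details$post` is the one-row data frame returned by getPost().
post_stats <- function(details) {
  details$post[, c("likes_count", "comments_count", "shares_count")]
}

# Usage:
# post_stats(details)
```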

More interesting are people's reactions to the post, since reactions bring out the common sentiment. As can be seen below, most users are not happy, which suggests they care about wildlife.
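The reaction counts can be sketched as follows, assuming Rfacebook's getReactions() function, which returns one row per post with a column per reaction type (likes_count, love_count, haha_count, wow_count, sad_count, angry_count). The helper names fetch_reactions and dominant_reaction are hypothetical.

```r
# Minimal sketch, assuming the Rfacebook package.
fetch_reactions <- function(post_id, token) {
  Rfacebook::getReactions(post = post_id, token = token)
}

# Return the name of the reaction column with the highest count
# for the first post in the data frame.
dominant_reaction <- function(reactions) {
  count_cols <- grep("_count$", names(reactions), value = TRUE)
  counts <- unlist(reactions[1, count_cols])
  names(counts)[which.max(counts)]
}

# Usage (the post ID is a placeholder):
# reactions <- fetch_reactions("<post ID copied from the URL>", fb_oauth)
# dominant_reaction(reactions)
```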

We can dig deeper and analyze the comments on the post. Let's count the number of comments with the command length(details$comments$id). There are 76 comments.

It is always good practice to clean your text before analyzing it. Cleaning the text involves removing punctuation, numbers, extra white space, and symbols. Before doing that, make sure you have the tm, ggplot2, stringr, and wordcloud packages installed. After cleaning the text, we will create a Term Document Matrix, which is a table containing every word that appears in the comments along with how often it occurs. We can use this Term Document Matrix to create a word cloud as shown below.
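The cleaning and counting steps could be sketched with tm roughly as below. The function names build_term_frequencies and plot_wordcloud are hypothetical, and the removal of English stop words is an extra step assumed here (it is not mentioned above) so that words like "the" do not dominate the cloud.

```r
library(tm)

# Minimal sketch: clean the comments and return term frequencies,
# sorted from most to least frequent.
build_term_frequencies <- function(comments) {
  corpus <- VCorpus(VectorSource(comments))
  corpus <- tm_map(corpus, content_transformer(tolower))
  # Assumption: drop common English stop words (not part of the steps above).
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  # Replace punctuation and other symbols with a space.
  strip_symbols <- content_transformer(
    function(x) gsub("[^[:alnum:][:space:]]", " ", x)
  )
  corpus <- tm_map(corpus, strip_symbols)
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, stripWhitespace)

  # Rows are terms, columns are comments.
  tdm <- TermDocumentMatrix(corpus)
  sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
}

# Draw a word cloud from a named vector of term frequencies.
plot_wordcloud <- function(freq, min_freq = 1) {
  wordcloud::wordcloud(words = names(freq), freq = freq,
                       min.freq = min_freq, random.order = FALSE)
}

# Usage:
# freq <- build_term_frequencies(details$comments$message)
# plot_wordcloud(freq)
```

Note that TermDocumentMatrix() ignores words shorter than three characters by default, so very short tokens do not appear in the result.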

The more frequent words are larger than the less frequent ones. We can also plot the 20 most frequent words used in the comments.
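A bar chart of the most frequent terms can be sketched with ggplot2 as follows. It assumes freq is a named numeric vector of term frequencies (for example, the row sums of the Term Document Matrix); plot_top_terms is a hypothetical helper name.

```r
library(ggplot2)

# Minimal sketch: horizontal bar chart of the n most frequent terms.
# `freq` is assumed to be a named numeric vector (term -> count).
plot_top_terms <- function(freq, n = 20) {
  top <- head(sort(freq, decreasing = TRUE), n)
  df <- data.frame(term = names(top), count = as.numeric(top))
  ggplot(df, aes(x = reorder(term, count), y = count)) +
    geom_col() +
    coord_flip() +
    labs(x = "Term", y = "Frequency")
}

# Usage:
# plot_top_terms(freq, n = 20)
```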

There is one more interesting package called SentimentAnalysis. It can classify a string of text as positive, neutral, or negative. Let’s see how it performs on our set of Facebook comments.
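A minimal sketch of this classification is shown below. It assumes the analyzeSentiment() and convertToDirection() functions of the SentimentAnalysis package and picks the QDAP dictionary score as the basis for the label; which dictionary was actually used for the results in this article is not stated, so that choice is an assumption. classify_comments is a hypothetical helper name.

```r
library(SentimentAnalysis)

# Minimal sketch: label each comment as negative, neutral, or positive.
classify_comments <- function(comments) {
  # One row of dictionary-based scores per comment.
  scores <- analyzeSentiment(comments)
  data.frame(
    # Assumption: use the QDAP dictionary score to derive the label.
    sentiment = convertToDirection(scores$SentimentQDAP),
    comment = comments,
    stringsAsFactors = FALSE
  )
}

# Usage:
# head(classify_comments(details$comments$message), 11)
```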

The table above contains the first 11 comments and their sentiments as classified by the SentimentAnalysis package. The sentiment classifications (first column) are quite appropriate, and the package could be of great help in figuring out the overall sentiment when there are hundreds or thousands of comments.

This was an attempt to demonstrate the power of text analytics. I hope you liked it. You can find the full code and data here.