The 2016 Philippine Elections according to Twitter
Where can I find this?
Check out Halalan16.ph
Why are you doing this?
SFW version: At BaseUp, we needed to learn more about displaying live data using Twitter, Redis, RabbitMQ, Tornado, websockets and d3js. Might as well build something useful, we thought.
NSFW version: You know that one scene in Silicon Valley where they start talking about the atrocious things they would have to perform at Disrupt just to get people to vote for them and then all of a sudden they keep going deeper into a rabbit hole? Yeah, it’s like that.
Okay, so what does it do?
We basically wanted to keep track of 4 things.
1. Which of the presidential and vice-presidential candidates gets the most tweets?
2. How strongly/poorly related is one candidate to another?
3. How do people feel about each candidate?
4. Where do people tweet the most about each candidate?
How did we do this? I’m glad you asked.
1. Total number of tweets per candidate
We simply increment a tally whenever a candidate is mentioned. We display this using a bar graph (fig. 1).
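A minimal sketch of that tally, assuming a small hand-written alias table (the `CANDIDATES` dictionary and both helper functions are hypothetical, not the project's actual code) and naive substring matching:

```python
from collections import Counter

# Hypothetical alias table for illustration; the real list of names
# lives in the project's names.txt.
CANDIDATES = {
    "Leni Robredo": ["leni robredo", "leni"],
    "Mar Roxas": ["mar roxas", "roxas"],
}


def mentioned_candidates(text, candidates=CANDIDATES):
    """Return each candidate whose alias appears in the tweet (at most once each)."""
    lowered = text.lower()
    return [
        name
        for name, aliases in candidates.items()
        if any(alias in lowered for alias in aliases)
    ]


def update_tally(tally, text):
    """Add one to the tally of every candidate mentioned in the tweet."""
    tally.update(mentioned_candidates(text))
    return tally


tally = Counter()
update_tally(tally, "Why is leni robredo surging and mar roxas isn't?")
update_tally(tally, "Leni all the way")
```

Substring matching is crude (an alias can match inside a longer word), so a real version would probably want word boundaries.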
2. Relationship between each candidate
We check whether a tweet mentions more than one candidate. If it does, we draw a line between those candidates on the chord graph (fig. 2).
The process is very simple, though. The first name that the processor recognizes becomes the point of origin and each subsequent name becomes a point of destination. Basically, it treats the first name as the subject of the sentence (i.e. the tweet) and any subsequent name as the object. To illustrate, here’s a snippet of a live tweet:
“Why is leni robredo surging and mar roxas isn’t?…”
The program draws a line from Congresswoman Robredo to Mr. Roxas and colors the line black, which is the color assigned to Congresswoman Robredo. If the sentence were “Why is mar roxas surging and leni isn’t?” instead, the program would draw a line from Mr. Roxas to Congresswoman Robredo and color the line yellow, which is the color assigned to Mr. Roxas.
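The first-name-wins rule can be sketched as follows. The `ALIASES` table and the function names are assumptions for illustration; the colors are the ones mentioned above.

```python
import re

# Hypothetical alias table; colors as described in the text above.
ALIASES = {
    "Leni Robredo": ["leni robredo", "leni", "robredo"],
    "Mar Roxas": ["mar roxas", "roxas"],
}
COLORS = {"Leni Robredo": "black", "Mar Roxas": "yellow"}


def first_positions(text):
    """Map each mentioned candidate to the index of their earliest mention."""
    lowered = text.lower()
    positions = {}
    for name, aliases in ALIASES.items():
        hits = [
            match.start()
            for alias in aliases
            for match in re.finditer(r"\b" + re.escape(alias) + r"\b", lowered)
        ]
        if hits:
            positions[name] = min(hits)
    return positions


def chord_links(text):
    """Link the first-mentioned candidate (origin) to every later one."""
    positions = first_positions(text)
    ordered = sorted(positions, key=positions.get)
    if len(ordered) < 2:
        return []  # a single candidate gives us nothing to connect
    origin = ordered[0]
    return [
        {"source": origin, "target": target, "color": COLORS[origin]}
        for target in ordered[1:]
    ]
```

Each returned dictionary is one line on the chord graph, colored after its origin.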
3. How people feel about a candidate
What we have now is a live graph, so we don’t display any overall scores yet. (We’ll update this after the elections.)
What we do, though, is simple. Whenever we read a tweet, we run some computations (well, technically, Natural Language Toolkit does) that tell us whether the author feels positively or negatively about a candidate.
A real scientist would have trained a classifier to discern between positive and negative tweets. However, we are amateurs (lazy ones at that) so we decided to use the Vader sentiment analysis tool. You can read more about it here.
Here are live samples:
“May the best woman win talaga Leni Robredo! Huhu. Outclassing everybody!”
Sentiment score: 80% positive
“Naku nag uumpisa na ang dayaan. Ang sarap lumipat ng mars pag nanalo talaga si Mar Roxas. Jusko po kawawa naman tayo”
Sentiment score: 72.6% negative
Is it foolproof? Nope. Here are some samples of false positives and false negatives:
“Grace Poe’s enemies are afraid that you will win the Presidency, that’s why they r moving heaven and earth to DQ you”
Sentiment score: 77% positive
“no doubt, sen. miriam did santiago is the most qualified among the presidential candidates.”
Sentiment score: 73% negative
4. Mapping leading candidate volumes per geolocation
Simple tally and then we plot this on a map. Unfortunately there’s not a lot of real time data here. Out of 3000 posts, for example, we only got 4 with coordinates. We’ll hold off on this until after the elections. By then, hopefully, we’ll have a larger sample.
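Here is a rough sketch of that tally. It assumes each tweet is a dictionary shaped like Twitter's JSON, where `coordinates` is either null or a GeoJSON point with longitude first. The `candidates` key and the grid-cell rounding are our own hypothetical additions.

```python
from collections import Counter, defaultdict


def tally_by_location(tweets, precision=1):
    """Count candidate mentions per rounded (lat, lon) cell.

    Returns the per-cell counters and the number of tweets that carried
    coordinates at all, so coverage can be reported next to the map.
    """
    cells = defaultdict(Counter)
    located = 0
    for tweet in tweets:
        point = tweet.get("coordinates")
        if not point:
            continue  # most tweets have no geotag
        lon, lat = point["coordinates"]  # GeoJSON order: longitude, latitude
        located += 1
        cell = (round(lat, precision), round(lon, precision))
        cells[cell].update(tweet.get("candidates", []))
    return cells, located


sample = [
    {"coordinates": None, "candidates": ["Mar Roxas"]},
    {"coordinates": {"type": "Point", "coordinates": [121.0, 14.6]},
     "candidates": ["Grace Poe"]},
    {"candidates": ["Grace Poe"]},
]
cells, located = tally_by_location(sample)
```

The leading candidate for a cell is then `cells[cell].most_common(1)`.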
When did you start collecting data?
We started collecting data officially on April 12, 2016. We intend to run the project until the end of the elections. Since this started as a science project, we really are more concerned about how we use the underlying technology.
Okay now talk nerdy to me.
Disclaimer: This part goes into the meat of the code. If you would rather watch paint dry, click here. You can skip to the results section below.
Laurent did this a little bit differently. You can read more about his study here. In his case, the flow looked like this:
In our case, since we wanted to use web sockets to display real-time data, we do it like this:
There are two main scripts running. The first script sets up the stream listener, which basically authenticates with Twitter and then reads the Twitter stream:
Once again, this is a little different from Laurent’s version as sometime between 2012 and now, Twitter started using OAuth authentication.
The stream_filter method then filters the Twitter stream based on the entries in names.txt.
The Listener class is the same as Laurent’s except we made some slight modifications to the on_status method.
Instead of French, for example, we just concern ourselves with the posts written by authors whose language is English. Just like Laurent’s code, however, this part also handles publishing the messages to the queue.
Where we did make several changes, however, is in the process_post method.
Several things are going on here. 1. In lines 11–24, if we find any tweets in Filipino, we try to translate them into English before processing them.
2. In lines 46–51, we get the sentiment scores of the tweets. There’s a lot going on here. First, after scoring a tweet, we only use the score if it is higher than 0.5, because the analyzer seems to return a lot of false positives. This is partly because we’re lazy and did not train our own classifier. Another reason is that we translate some of the tweets from Filipino to English, which compounds the issue.
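The threshold step might look something like the sketch below. It takes a VADER-style score dictionary (the `neg`, `neu`, `pos` and `compound` keys returned by NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)`). Applying the 0.5 cutoff to the larger of `pos` and `neg` is an assumption here, and `confident_sentiment` is a hypothetical name.

```python
THRESHOLD = 0.5  # cutoff mentioned in the text


def confident_sentiment(scores, threshold=THRESHOLD):
    """Return (label, value) when the dominant score clears the threshold.

    `scores` is a VADER-style dict. Returns None when neither the positive
    nor the negative proportion is strong enough to be trusted.
    """
    pos = scores.get("pos", 0.0)
    neg = scores.get("neg", 0.0)
    label, value = ("positive", pos) if pos >= neg else ("negative", neg)
    if value > threshold:
        return label, value
    return None  # too weak: skip this tweet's sentiment


strong = confident_sentiment({"neg": 0.0, "neu": 0.2, "pos": 0.8, "compound": 0.9})
weak = confident_sentiment({"neg": 0.1, "neu": 0.6, "pos": 0.3, "compound": 0.2})
```

Dropping weak scores trades coverage for precision: fewer tweets contribute to the graph, but the ones that do are less likely to be noise.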
Next, we wire up the web socket. We used a simple Tornado web server.
Super simple stuff! When you start the web server, it listens on port 8888, displays index.html and opens a socket handler. Once a socket is opened, it retrieves the data in db.get_persons().
What we really want, however, is to update the graphs when a tweet is processed.
This magic happens in lines 25–26.
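For readers who just want the idea, the push-on-update pattern can be sketched without any web framework. The `Broadcaster` and `FakeClient` classes below are hypothetical stand-ins. The only thing borrowed from Tornado is the `write_message` method name that its socket handlers expose.

```python
import json


class Broadcaster:
    """Keep track of open sockets and push every update to all of them."""

    def __init__(self):
        self.clients = set()

    def register(self, client):
        self.clients.add(client)  # call when a socket opens

    def unregister(self, client):
        self.clients.discard(client)  # call when a socket closes

    def publish(self, payload):
        """Serialize the payload once and send it to every open client."""
        message = json.dumps(payload)
        for client in list(self.clients):
            try:
                client.write_message(message)
            except Exception:
                # A failed write usually means the socket is gone: drop it.
                self.unregister(client)
        return len(self.clients)


class FakeClient:
    """Stand-in for a socket handler, used only to demonstrate the flow."""

    def __init__(self):
        self.sent = []

    def write_message(self, message):
        self.sent.append(message)


broadcaster = Broadcaster()
browser = FakeClient()
broadcaster.register(browser)
broadcaster.publish({"name": "Grace Poe", "tweets": 3})
```

In the real server, the open and close callbacks of the socket handler would call `register` and `unregister`, and the tweet processor would call `publish`.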
Next up, we need to handle the messages from the server and use the data to update the graphs.
Here’s the rest of the code.
Okay, now talk like a politician.
This study is purely academic. We are not affiliated with nor are we endorsing any candidate. This study did not receive funding from anyone.
What are the next steps for this project?
We could think of a few things off the bat.
- It would be nice if people could just use the service to search for real-time keyword sentiments. E.g. “I wonder how people feel about tonight’s episode of Game of Thrones”
- It would be nice if we could compile streams from multiple social media platforms and compare the sentiments gathered. Yes, even Snapchat.
- Better tests.
Any early returns, so far?
There seems to be no correlation between raw tweet volume and poll results.
From the sentiment graph (fig. 5) as of April 13, 2016, it seems that Mr. Roxas is the most polarizing candidate.
Vice President Binay is the only candidate who is mentioned with each of the other presidential candidates, whereas Senator Poe has the most mentions with other candidates.
After running the script for a day, we’re getting results that are consistent with the Ateneo mock elections (at least for the presidential candidates).
Here’s how the Ateneo poll turned out for the presidential candidates:
- Mar Roxas, with 37.2% of the votes cast, or 875 votes
- Rodrigo Duterte, with 28.29% of the votes cast, or 665 votes
- Grace Poe, with 13.7% of the votes cast, or 322 votes
- Miriam Defensor Santiago, with 11.23% of the votes cast, or 264 votes
- Jejomar Binay, with 3.02% of the votes cast, or 71 votes
Vice presidential candidate poll results:
- Leni Robredo, with 74.86% of the votes cast, or 1760 votes
- Ferdinand Marcos Jr., with 10.38% of the votes cast, or 244 votes
- Alan Peter Cayetano, with 8.55% of the votes cast, or 210 votes
- Francis Escudero, with 2.89% of the votes cast, or 68 votes
- Antonio Trillanes IV, with 0.89% of the votes cast, or 21 votes (not measured by us)
- Gregorio Honasan, with 0.47% of the votes cast, or 11 votes.
We added average sentiment scores per candidate on Saturday, 16-April-2016 and it paints a different picture. With the negative press that Mayor Duterte has recently been receiving, it’s not really a surprise that his sentiment score is pretty low.
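One way to keep such an average without storing every score is a running mean. The sketch below assumes scores are signed (positive tweets above zero, negative tweets below). The `SentimentAverages` class is hypothetical, not the project's code.

```python
from collections import defaultdict


class SentimentAverages:
    """Running mean of sentiment scores per candidate."""

    def __init__(self):
        self._count = defaultdict(int)
        self._mean = defaultdict(float)

    def add(self, candidate, score):
        """Fold one new score into the candidate's mean and return it."""
        self._count[candidate] += 1
        n = self._count[candidate]
        # Incremental update: new_mean = old_mean + (score - old_mean) / n
        self._mean[candidate] += (score - self._mean[candidate]) / n
        return self._mean[candidate]

    def average(self, candidate):
        """Return the current mean, or None if no scores were recorded."""
        if self._count.get(candidate, 0) == 0:
            return None
        return self._mean[candidate]


averages = SentimentAverages()
averages.add("Candidate A", 0.8)
averages.add("Candidate A", -0.6)
```

After those two scores the mean for "Candidate A" sits at roughly 0.1, and a candidate with no scored tweets returns `None` rather than a misleading zero.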
From the last poll results, Congresswoman Robredo is leading, followed by Senators Marcos, Escudero, Cayetano, Trillanes (not measured) and Honasan, respectively, which is also what we’re seeing in the graphs.
This project couldn’t have been done without VADER.
Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for
Sentiment Analysis of Social Media Text. Eighth International Conference on
Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.