Election 2016 — Debate Three on Twitter.

John Swain
Neo4j Developer Blog
Oct 26, 2016
Overall conversation map showing users and the connections between them based on retweets.

Introduction

In a previous analysis, we looked at the difference in structure between Clinton and Trump supporters in the Twitter conversation. In particular, we examined how the most influential users were distributed and how this indicated that the mainstream media and establishment users were aligned with the Clinton side of the debate.

Update: The O’Reilly book “Graph Algorithms on Apache Spark and Neo4j” is now available as a free ebook download from neo4j.com.

A recent article published by the BBC, based on research by @pnhoward, indicated that a significant number of bots were creating Twitter traffic.

During the third debate, we also collected 2.5 million tweets from 780k users and decided to analyse these phenomena in a little more detail.

Our data set was slightly different from the set used by Professor Howard. We collected a wider range of tweets about the election and used machine learning community detection to identify the sides rather than relying on hashtags. Here is an article with some more detail about our methodology.

Identifying Influential Users

We use two basic measures for influence within a particular conversation.

1. Voice — within a given conversation (in this case debate three of the Presidential Election) the influence of a user is called their Voice. This is measured using several graph algorithms, which together provide an overall rank score for Voice.

2. Authority — at Right Relevance we measure the influence of users on social media in over 50k Topics. The measure of a user’s influence within a given topic is called their Authority.

We can illustrate these measures in conversation maps which show the connections between users representing communication (retweets between users) and the importance of the user by the size of each node.
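As a rough illustration of how node size in such a map can be derived, here is a minimal PageRank sketch over a retweet graph. It is not the Right Relevance implementation: Voice combines several graph algorithms, and this shows only one of them. The edge direction (retweeter pointing to the original author), the damping factor, and the user names are all assumptions made for the example.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (retweeter, original_author) edges."""
    nodes = sorted({user for edge in edges for user in edge})
    out_links = defaultdict(list)
    for retweeter, author in edges:
        out_links[retweeter].append(author)

    n = len(nodes)
    rank = {user: 1.0 / n for user in nodes}
    for _ in range(iterations):
        # Rank held by users who retweet nobody is spread evenly over all users.
        dangling = sum(rank[user] for user in nodes if not out_links[user])
        new_rank = {user: (1 - damping) / n + damping * dangling / n for user in nodes}
        for retweeter in nodes:
            targets = out_links[retweeter]
            for author in targets:
                # Each retweet passes on an equal share of the retweeter's rank.
                new_rank[author] += damping * rank[retweeter] / len(targets)
        rank = new_rank
    return rank


# Hypothetical retweet edges, invented for this example.
retweets = [
    ("alice", "news_desk"),
    ("bob", "news_desk"),
    ("carol", "news_desk"),
    ("carol", "bob"),
    ("dave", "bob"),
]
scores = pagerank(retweets)
most_retweeted = max(scores, key=scores.get)
```

In this toy graph the account that attracts the most retweets ends up with the highest score, which is the property the node sizes in the maps are meant to convey.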

Twitter conversation map showing important users (size determined by page rank) and connections indicating communication between users (retweets)

The first thing to notice about this map is that the number of users associated with Trump appears to be much bigger than the number on the Clinton side. This would (prima facie) support the theory that there are a large number of bots tweeting pro-Trump messages. Inspection of the map also indicates that there are relatively fewer important users (indicated by size) on the Trump side.

Close up of the Clinton side of the map.
Close up of the Trump side of the map.

Ratios of Influencers to Non Influencers

If we define influence in this network using the definition of Authority above, we can filter the two groups of users to show just those users with influence scores above a specified threshold and compare those with the number of users below the threshold (non influencers).

If we set the score for Authority at 70 the ratios are as follows:

This shows that, pro rata, the Clinton side has almost twice as many Influencers as the Trump side.
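The comparison described above can be sketched as follows. This is only an illustration: the function name, the data layout, and the sample scores are invented, and treating a score of exactly 70 as an Influencer is an assumption. The sides are deliberately labelled `side_a` and `side_b` so the toy numbers are not mistaken for our results.

```python
def influencer_ratio_by_side(authority_by_user, side_by_user, threshold=70):
    """Return the ratio of influencers to non-influencers for each side."""
    counts = {}
    for user, side in side_by_user.items():
        influencers, others = counts.get(side, (0, 0))
        # Assumption: a score equal to the threshold counts as an influencer.
        if authority_by_user.get(user, 0) >= threshold:
            influencers += 1
        else:
            others += 1
        counts[side] = (influencers, others)
    return {
        side: (influencers / others if others else float("inf"))
        for side, (influencers, others) in counts.items()
    }


# Invented Authority scores and side assignments, for illustration only.
authority = {"a1": 90, "a2": 75, "a3": 10, "a4": 20,
             "b1": 80, "b2": 5, "b3": 5, "b4": 30, "b5": 10}
sides = {"a1": "side_a", "a2": "side_a", "a3": "side_a", "a4": "side_a",
         "b1": "side_b", "b2": "side_b", "b3": "side_b", "b4": "side_b",
         "b5": "side_b"}
ratios = influencer_ratio_by_side(authority, sides)
```

Comparing ratios rather than raw counts is what makes the result pro rata: a side with more users overall does not automatically appear to have more Influencers.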

We are not making any judgement about whether the Topics in which a user has Authority are worthy or valuable. We are just measuring the minimum Authority score a user has within any Topic. For example, in this context, a high score in a Topic like TCOT has the same value as a high score in Journalism. You can find a full list of Topics we found in this conversation in our Tableau dashboard.

However, if we just look at those users with the highest Voice in the conversation (by removing all users with a low Voice score) we can make a qualitative assessment of the types of users that are associated with each side by a visual inspection of the users on both sides.

Conversation map showing only users with a high Voice score — the most influential users in the conversation.

Looking at the visualisation, it is clear that there is a much higher proportion of mainstream media and establishment users on the Clinton side. The Trump side has a much higher proportion of users who are specifically Trump-supporting accounts.

Right Relevance Insights usually focuses on these ‘important’ users within a conversation. For this post, however, we are interested in the users identified as less ‘important’ but who generate a large volume of Tweets.

Filtering Bots

If we use Prof. Howard’s definition of a Bot (more than 50 Tweets in the day) we can observe the ratio of Bots to non Bots in the two sides of the conversation in our analysis.
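A minimal sketch of that rule, assuming each tweet is available as a (user, timestamp) pair and each user has already been assigned to a side. The function names, the data layout, and the sample tweets are hypothetical, and the day boundary is taken from the timestamp as given, with no time zone handling.

```python
from collections import Counter
from datetime import datetime

BOT_THRESHOLD = 50  # tweets per day, from the definition quoted above


def flag_high_volume_users(tweets, threshold=BOT_THRESHOLD):
    """Return users who posted more than `threshold` tweets on any single day."""
    per_user_day = Counter((user, ts.date()) for user, ts in tweets)
    return {user for (user, _day), count in per_user_day.items() if count > threshold}


def bot_share_by_side(flagged, side_by_user):
    """Return the fraction of users on each side that were flagged."""
    totals = Counter(side_by_user.values())
    bots = Counter(side for user, side in side_by_user.items() if user in flagged)
    return {side: bots[side] / totals[side] for side in totals}


# Invented tweets: one account just over the threshold, one exactly on it,
# and one ordinary account.
tweets = (
    [("chatty", datetime(2016, 10, 19, 20, i % 60)) for i in range(51)]
    + [("borderline", datetime(2016, 10, 19, 21, i % 60)) for i in range(50)]
    + [("casual", datetime(2016, 10, 19, 22, i)) for i in range(3)]
)
sides = {"chatty": "side_a", "borderline": "side_a", "casual": "side_b"}

flagged = flag_high_volume_users(tweets)
shares = bot_share_by_side(flagged, sides)
```

Note that the rule is a strict "more than": an account with exactly 50 tweets in the day is not flagged. A volume threshold like this is a blunt instrument, since it will also catch very active humans.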

Note: our data set includes a total of 780k Twitter users. Before our initial analysis we filter out small users and those we identify as bots/spammers with a very high probability. This reduces the number of users to 550k. We use a combination of different techniques to identify these simple bots at this initial stage. We then reduce the size of the data set for visualisation to a much smaller 17k users; these are the maps shown in this post. The 17k set contains just the most important users in the network by various measures, including PageRank and betweenness centrality.

What is interesting, therefore, is that a significant number of bots, by Prof. Howard’s definition, still remain in our data set. We would characterise these as more ‘sophisticated’ bots.

Contrast of conversation maps with bots included (left) and bots removed (right).

The ratios of bots to all users are as follows.

In the tweets we analysed there are 3x as many bots on the Trump side as the Clinton side.

Therefore, our analysis also provides evidence in support of Professor Howard’s assertion that there is a much higher proportion of Bots supporting the Trump campaign than supporting Clinton.

Conclusion

Right Relevance Insights is an application designed to find the valuable and relevant information in social media conversations. The noise generated by Bots is one of the elements that pollutes conversations and makes it hard to find relevant information.

In this short example we have shown how the noise from Bots can be isolated. Once isolated it is possible to examine both parts of the conversation:

1. Noise — examining the noise created by the Bots is useful for identifying malevolent agents who are attempting to exert influence in support of a specific cause.

2. Non Noise — by removing the noise it is possible to identify who is actually having the most effective influence within the conversation.

These are the Top Influencers within the conversation identified by Right Relevance Insights:

Tables of top users by various measures of influence.

Brexit and the Michael Buble Effect

The Brexit vote in the UK had a very similar conversation structure. There was a very clear concentration of major media and establishment users within the Remain side of the conversation. We identified this prior to the vote.

We coined the phrase ‘The Michael Buble Effect’ to describe the way in which influence is important within a particular interest group.

If you are Michael Buble, there are a finite number of people on earth who will buy your latest album. To maximise the revenue from album sales it is only important to enthuse these people — it is totally pointless spending any effort trying to convince people who are not in this group.

Crucially, it is unimportant how much you alienate the people who are not in your supporters group. No matter how much a person hates Michael Buble they cannot buy fewer than zero records, so their opinion is irrelevant. In fact, the more that non supporters hate, the more determined the supporters become. This phenomenon also applies in a two-horse race election — no one can cast a negative vote.

In the Brexit debate it was clear that the Leave campaign were running a campaign aimed at enthusing a very narrow set of interests and completely alienating those with opposing views. The gamble, of course, is that there are enough people within the core supporting group to win the vote. A gamble which paid off in the Brexit vote but has left very bitter divisions within the UK as a result.

Social media (including Twitter) is a useful reflection of society as a whole. However, it is only a reflection and contains very significant skews and biases.

The key issue is that we don’t know the exact number of people who will vote for Trump and who can be energised by strong rhetoric. Brexit shows that this strategy can be successful, and it flies in the face of the conventional wisdom of appealing to floating voters with a reasoned argument based on persuasion.

There is a common perception that Hillary Clinton is winning the election comfortably. The assertion that there are a large number of bots ‘supporting’ Donald Trump plays to this perception by suggesting that the noisy support for Trump is not real. Based on what we observed in the Brexit vote, where there was larger ‘hidden’ support for one side than for the other, we would advise some caution over thinking the election is already won by Hillary Clinton.

Free download: O’Reilly “Graph Algorithms on Apache Spark and Neo4j”

John Swain
Neo4j Developer Blog

Customer Engineer, Smart Analytics at Google Cloud. #chasingscratch golfer. Opinions are my own and not representative of Google.