US Presidential Election 2016 — Twitter Analysis

Initial analysis of the Twitter conversation surrounding the election and how to find interesting and relevant stories in a very noisy environment.

John Swain
9 min read · Oct 2, 2016

Update: David Fahrenthold broke this story on 7th October.

Update: Since this post was drafted, news has broken about Donald Trump’s tax returns. The people highlighted in this post are at the forefront of the quality reporting on this issue — follow them here:

At Right Relevance we provide information about influence on social media, particularly Twitter. We have a free service where you can discover topical influencers on thousands of topics. We also offer an API framework for accessing our influencer data, which we call “Relevance as a Service”. In addition, we undertake deeper consultancy projects to provide detailed analysis of topics.

I have recently blogged about Twitter influence on subjects including the NHS, Brexit, the World Economic Forum at Davos and Climate Change.

Here we will introduce our analysis of the US Presidential Election.

Orientation

When conducting an analysis of an overall social network conversation on a specific subject it is important to first get an understanding of the general “landscape”.

We began collecting data on 9th August with a series of Track Phrases collecting raw tweets from the Twitter API. The Track Phrases are designed to collect a wide range of tweets about the election and be as balanced as possible with respect to both sides. This process has already collected in excess of 120 million tweets.

In order to gain an understanding of the overall structure we have initially analysed the tweets of September 11th when there was a balance of stories about both sides.

We collected 2.6 million tweets during the day from over 300k users.

In order to make sense of this volume of information we use a process called the OODA Loop which we have adapted for social media analysis. To support this process of analysis we use several software applications which provide a combination of visual & dashboard tools.

Conventional Twitter performance analysis uses straightforward measures such as the number of retweets, or the occurrence of hashtags indicating support for a cause. These simple metrics are useful, but are subject to various problems, including abuse from spam-bot accounts and skewing towards users with a very large number of followers.

Graph Theory is a way of correcting for these problems and gaining insight into influence within an overall subject of interest. A graph, in this context, is a representation of data which emphasises the connections between entities. This is ideal for a social network conversation on Twitter which consists of communication between users.
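To make the idea concrete, here is a minimal sketch of how a batch of tweets could be turned into such a graph. It is an illustration only and not a description of any production system: the tweet fields `user`, `retweet_of` and `mentions`, the function name and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_interaction_graph(tweets):
    """Return {source_user: {target_user: weight}} from retweets and mentions.

    Each retweet or mention adds 1 to the weight of the edge from the
    author of the tweet to the user being retweeted or mentioned.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tweet in tweets:
        author = tweet["user"]
        targets = list(tweet.get("mentions", []))
        if tweet.get("retweet_of"):
            targets.append(tweet["retweet_of"])
        for target in targets:
            if target != author:  # ignore self-mentions
                graph[author][target] += 1
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {user: dict(neighbours) for user, neighbours in graph.items()}


# Hypothetical sample data, not real tweets.
sample_tweets = [
    {"user": "alice", "retweet_of": "bob", "mentions": []},
    {"user": "alice", "mentions": ["bob", "carol"]},
    {"user": "carol", "retweet_of": "bob"},
]
interaction_graph = build_interaction_graph(sample_tweets)
```

The edge weights count how often one user retweeted or mentioned another, which is the kind of information a follower count or a raw retweet total does not capture.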

The software underpinning all our systems is the Neo4j graph database. Neo4j allows us to store all the data and relationships between the entities in a native graph format. In subsequent posts we will delve more into the advantages of this technology, along with some of the challenges and the solutions we developed to overcome them. In this post we will focus on illustrating what is possible.

Conversation Map

The first part of the analysis is to look at a visualisation of the overall conversation.

Here is the conversation map for 11th September.

Zoomable High Res Version here

In these maps, each dot (or node) represents a Twitter user and the lines joining them represent communication between users, according to retweets and mentions. The size of the dot represents the PageRank of the user.

The width of the line between users indicates the quantity of links. Groups of users with large numbers of links (retweets and mentions) between them stand out as thick and dark lines.
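For readers unfamiliar with PageRank, the following is a minimal sketch of the standard power-iteration form of the algorithm on a small weighted graph. It is a textbook version under stated assumptions, not the implementation behind the maps: the graph format (`{source: {target: weight}}`), the damping factor of 0.85, the iteration count and the sample users are all illustrative choices.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    graph maps each user to {target_user: weight}, where the weight is the
    number of retweets/mentions sent from the user to the target.
    """
    nodes = set(graph)
    for neighbours in graph.values():
        nodes.update(neighbours)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        dangling = 0.0  # rank held by users with no outgoing links
        for node in nodes:
            out_edges = graph.get(node, {})
            total_weight = sum(out_edges.values())
            if total_weight == 0:
                dangling += rank[node]
                continue
            for target, weight in out_edges.items():
                new_rank[target] += damping * rank[node] * weight / total_weight
        # Spread the rank of dangling users evenly over everyone.
        for node in nodes:
            new_rank[node] += damping * dangling / n
        rank = new_rank
    return rank


# Hypothetical example: alice and carol both point at bob.
example_graph = {"alice": {"bob": 2, "carol": 1}, "carol": {"bob": 1}}
ranks = pagerank(example_graph)
top_user = max(ranks, key=ranks.get)
```

In this toy graph `bob` ends up with the highest score because the other users direct their retweets and mentions at him, regardless of how many followers anyone has.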

The colours indicate communities, or “flocks”, of users that communicate frequently with each other and may, therefore, share common interests.

A machine-learning algorithm detects each “community”, and the layout of the nodes is produced using a force-directed algorithm.
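The post does not name the community detection algorithm used, so the sketch below swaps in a simple, well-known one, weighted label propagation, purely to show the principle: users repeatedly adopt the label that carries the most edge weight among their neighbours until nothing changes. The function name, the deterministic tie-breaking rule and the sample edges are assumptions made for illustration.

```python
from collections import defaultdict


def label_propagation(edges, max_rounds=20):
    """Group users into communities with weighted label propagation.

    edges is an iterable of (user_a, user_b, weight) tuples, treated as
    undirected. Ties are broken deterministically: a user keeps its current
    label if that label is among the best, otherwise it takes the
    alphabetically smallest of the best labels.
    """
    neighbours = defaultdict(dict)
    for a, b, weight in edges:
        neighbours[a][b] = neighbours[a].get(b, 0) + weight
        neighbours[b][a] = neighbours[b].get(a, 0) + weight

    labels = {node: node for node in neighbours}  # every user starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):
            votes = defaultdict(float)
            for other, weight in neighbours[node].items():
                votes[labels[other]] += weight
            best = max(votes.values())
            candidates = sorted(l for l, v in votes.items() if v == best)
            if labels[node] in candidates:
                continue
            labels[node] = candidates[0]
            changed = True
        if not changed:
            break

    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(communities.values(), key=sorted)


# Hypothetical example: two tightly knit groups joined by one weak link.
sample_edges = [
    ("a", "b", 3), ("b", "c", 3), ("a", "c", 3),
    ("d", "e", 3), ("e", "f", 3), ("d", "f", 3),
    ("c", "d", 1),
]
flocks = label_propagation(sample_edges)
```

Note that only the pattern of communication is used as input. Nothing about who the users are, or what they said, is needed to separate the two groups.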

The partisan nature of the conversation is immediately obvious, but there are also other themes and nuances which can be detected even at this high level, as we will illustrate later.

Conversation Themes

“Deplorables” & “Health Issues” were the big stories on that day. That is reflected in the Terms and Hashtags detected in the conversation, as can be seen in this screenshot of our Insights dashboard.

In this list each row contains the terms detected in the flocks of users as illustrated by the colouring in the network map above.

This was a day when the news (and Twitter) was dominated by the two major stories. The terms in the list above are detected by algorithmic topic analysis, and on a day like 11th Sept this illustrates that pretty much everyone is talking about these subjects.
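The topic analysis itself is not described in this post. As a much simpler stand-in, counting the most frequent hashtags inside each flock produces a list of a similar shape. In the sketch below the tweet fields (`user`, `hashtags`), the `flock_of` mapping and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def top_hashtags_per_flock(tweets, flock_of, top_n=3):
    """Return {flock_name: [most frequent hashtags]}.

    flock_of maps a user name to the name of the flock they belong to.
    Tweets from users without a flock are ignored.
    """
    counts = defaultdict(Counter)
    for tweet in tweets:
        flock = flock_of.get(tweet["user"])
        if flock is None:
            continue
        # Lower-case so that #Debate and #debate are counted together.
        counts[flock].update(tag.lower() for tag in tweet["hashtags"])
    return {
        flock: [tag for tag, _ in counter.most_common(top_n)]
        for flock, counter in counts.items()
    }


# Hypothetical sample data.
flock_of = {"u1": "flock_a", "u2": "flock_a", "u3": "flock_b"}
sample_tweets = [
    {"user": "u1", "hashtags": ["Debate"]},
    {"user": "u2", "hashtags": ["debate", "polls"]},
    {"user": "u3", "hashtags": ["polls"]},
    {"user": "u4", "hashtags": ["ignored"]},
]
top_terms = top_hashtags_per_flock(sample_tweets, flock_of, top_n=1)
```

On a day dominated by one or two stories, the rows of such a table look very similar across flocks, which is exactly the problem described above.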

In this context it is obvious that finding balance, nuance and alternative or independent voices and stories is not easy. This is a very important problem with social media and is exacerbated by the fact that conventional social media search and tools are designed to show you content you will like and agree with — this is called the ‘Filter Bubble’.

We shall illustrate how graph theory methods can help gain a better understanding of the conversation beyond the superficial.

Top Users

We measure the top performing users in 4 categories: Top Overall, Top Connectors, Top Interesting and Top Talked About. The way these measures are calculated is described in more detail in this post.

Here is the top overall table.

The top overall list contains some of the obvious users you would expect to see along with some less obvious ones. The algorithm used to calculate these rankings adjusts for factors such as follower count, to give a balanced view of who made an impact on Twitter during the period.

To illustrate the point let’s look at why Daniel Dale is ranked so highly.

The first thing to notice is that Daniel Dale is in the top 20 users in three of the main categories we measure. Note that Hillary Clinton & Donald Trump (despite being massively dominant overall) are each in the top 20 in only 2 categories.

Top Twitter Influencer tables — metrics described here

This is one of the tweets that Daniel Dale posted on 11th Sept. Despite the other big stories on this day, Daniel’s tweet was widely retweeted by other influential users.

We can learn some other things about Daniel Dale from this analysis by looking at the communities of users he is a member of. In other words, by detecting communities of users by the communication between them we can find groups of people who share common interests. We call the communities we detect Flocks and Tribes.

Flocks and Tribes

Flocks

Firstly, we can look at the flocks of users identified by colours in the network map above. Here is a list of the main flocks which are listed in our dashboard. They are named after the most influential user in each flock.

List of main Flocks in the network illustrated in the map above.

Highlighted in blue is the Flock named after David Fahrenthold. Daniel Dale is a member of this flock.

Here are the tables showing the most influential users in just this flock. It is clear that this is a large group of users in the broadly liberal press and Clinton-supporting group.

This flock of users is detected using community detection algorithms and reflects a group of people who were closely connected with each other and are therefore likely to share similar values.

The information used to group these people together is entirely contained in the conversation on Twitter in one single day. No other information is used in this method of community detection.

Tribes

A tribe is a community where the shared interests within the community are fixed or slowly changing. The classic example is fans of sports teams; people rarely change allegiance.

At Right Relevance we classify Tribes by the topics of influence that users share. We provide information on over 50 thousand topics.

Here is a list of the topics in which Daniel Dale is influential as monitored by the Right Relevance service.

List of all topics in which Daniel Dale is influential. From rightrelevance.com

You can see that these topics cover a wide range of issues including those not relevant to the election.

Our election analysis keeps only the topics that are relevant to the election, which are listed below.

List of topics in which Daniel Dale is influential relevant to the election.

Putting it all together

By using the techniques of conversation mapping, algorithmic community detection and influencer identification it is possible to start with zero knowledge and quickly assimilate the overall picture, the trending topics and the influential users.

With the benefit of hindsight, the information discussed here seems quite understandable, even obvious. However, while the stories are developing, the ability to rapidly assimilate and understand what is important and relevant in a very crowded environment is a crucial asset.

Flocking Behaviour

In the sections above we have drawn attention to the difference between flocks of users who are actively engaged during the period of analysis and the users in groups of closely aligned interests of a more permanent and intrinsic nature, which we call tribes.

Following and engaging with tribes is valuable but it is limited. Being a member of a tribe is inherently passive. The critical element in finding and exploiting influence is to reach those people when they are engaged and part of a flock of other active people.

…marketing strategies that focus on targeting a few “special” individuals are bound to be unreliable…marketers should adopt a “portfolio” approach, targeting a large number of potential influencers and harnessing their average effect

Duncan J Watts

Everything is Obvious*

*Once You Know the Answer

Using the Right Relevance dashboard we can do just that.

In the sections above we have two groups of users: the flock of users who are connected in the conversation to Daniel Dale and David Fahrenthold, and the tribes of which Daniel Dale is a member.

By combining these groups we can find those users who are members of the news/journalism tribes and currently active in the Fahrenthold flock.

In the RR Insights dashboard we can select the tribes to filter.
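Conceptually, this combination is a set intersection: users who belong to any of the selected tribes and who are also in the chosen flock. The sketch below shows that idea only. The function name, the data layout and the sample user names are hypothetical and do not describe how the dashboard is implemented.

```python
def active_tribe_members(flock_members, tribes, selected_tribes):
    """Return users who are in the flock AND in at least one selected tribe.

    flock_members: set of user names active in the flock
    tribes: {tribe_name: set of user names}
    selected_tribes: iterable of tribe names chosen as the filter
    """
    tribe_members = set()
    for tribe in selected_tribes:
        tribe_members |= tribes.get(tribe, set())
    return flock_members & tribe_members


# Hypothetical sample data.
flock = {"reporter_1", "reporter_2", "fan_account", "campaign_staffer"}
tribes = {
    "journalism": {"reporter_1", "reporter_2", "reporter_3"},
    "news": {"reporter_2", "anchor_1"},
    "golf": {"fan_account"},
}
matches = active_tribe_members(flock, tribes, ["journalism", "news"])
```

Here `reporter_3` and `anchor_1` are excluded because they belong to a relevant tribe but were not active in the flock, while `fan_account` is active but not in a selected tribe.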

So, we analysed over 2m tweets and over 300k users who were active in the Twitter conversation on 11th September.

Looking beyond the obvious stories of the day we have demonstrated how we can quickly find the most important and influential people who are actively engaged in the conversation.

Here is the table showing those users who are involved in reporting the stories regarding Trump financial matters. This list was discovered entirely with machine learning and no human curation up to this point.

And some of those users highlighted on the network map.

Some Other Stories we Found

Trump Washington DC

The TrumpDC hotel Twitter account is in the top 4 in two of the categories of influence and 11th overall.

The measure of “Talked About” is calculated as the ratio of the number of mentions a user gets to the number of retweets, corrected for the number of followers they have.
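The exact formula is not given here, so the following is only one possible reading of that description, written down to show the shape of the calculation. Both the `+ 1` smoothing on retweets and the logarithmic follower correction are assumptions introduced for this sketch and should not be taken as the formula actually used.

```python
import math


def talked_about_score(mentions, retweets, followers):
    """One possible 'Talked About' style score (illustrative only).

    mentions / retweets measures how much a user is talked about rather
    than simply amplified; dividing by a function of the follower count
    stops very large accounts from dominating.
    """
    # Assumption: add 1 to retweets to avoid division by zero.
    ratio = mentions / (retweets + 1)
    # Assumption: dampen by the log of the audience size; the +10 keeps the
    # divisor at or above 1 for accounts with no followers.
    return ratio / math.log10(followers + 10)


# Hypothetical accounts with the same mentions and retweets.
small_account = talked_about_score(mentions=50, retweets=9, followers=990)
large_account = talked_about_score(mentions=50, retweets=9, followers=999_990)
```

Under these assumptions, an account with a modest following that attracts many mentions scores higher than a very large account with the same raw numbers, which is consistent with a hotel account appearing near the top of this category.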

The person who scooped the Hillary Clinton stumble.

The response (with hindsight, an overreaction) to Hillary’s stumble.

Next

In future posts we will continue to report on interesting stories in the election campaign and will also dig deeper into the technology.

John Swain

Customer Engineer, Smart Analytics at Google Cloud. #chasingscratch golfer. Opinions are my own and not representative of Google.