Rugby World Cup — First Week on Twitter

John Swain
Oct 8, 2015

Overview

The Rugby World Cup 2015 started on 18th September in England. For the duration of the competition I will be tracking the conversation taking place on Twitter, with particular attention to the overall structure of the conversation and the communities of interest that are created.

OBJECTIVE

The overall objective of this kind of analysis is to give a wider context of how Twitter conversations are structured and the topics of conversation within the whole community. By tracking the overall conversation, groups (or communities) of users can be observed and particular subjects of interest can be more easily identified and tracked.

Conventional Twitter analysis focusses on tracking retweets and hashtags as measures of engagement. These tools are useful and important, but they suffer from one significant drawback: they view the world from the viewer’s perspective. For example, if you are a brand tracking a hashtag that you have created, you only see the conversation and engagement with people who are already engaged with your brand. What you don’t see is what the rest of the world is talking about. In the case of brand association by sponsorship or affiliation with an event like the Rugby World Cup, that means there are large numbers of people engaging with the conversation who have a natural affinity for your brand but with whom you have no direct engagement.

The objective of this community detection based analysis of Twitter is to overcome these shortcomings and give a wider perspective on the overall conversation.

The analysis so far (23rd September) is of 1.2m Tweets. This is a large enough volume of traffic to make the task of finding overall meaning difficult without the use of machine learning tools.

OODA Loop

My approach to this problem is driven by the military doctrine of the OODA Loop.

The key element of the OODA doctrine is:

OBSERVING AND REACTING TO UNFOLDING EVENTS MORE RAPIDLY THAN AN OPPONENT

This is absolutely critical on a social media platform like Twitter, where events, conversations and communities are evolving in real time. I have developed a tool to automate large parts of the Observe and Orient phases of the process in a situation where there is an overwhelming amount of information being created in a short period of time.

The specific objectives can be broken down as follows:

  1. Tracking the performance of individual Twitter accounts or groups of associated accounts, particularly in the context of the overall conversation about the competition. This is useful for brands or PR organisations wishing to track how well their marketing campaign is performing in context. Often a particular topic of interest is dominated by a few important users; however, there is also a huge variety of different conversations taking place. These conversations may be much more aligned with your particular interests or values and it may be easier to engage with them if you can find them effectively.
  2. Understand the general zeitgeist and discover new topics of interest and the influential Users in the conversation. This can help find communities of people who share a high degree of affinity with your values or brand. This could be to find new customers; reinforce your brand image; understand political influence and opinion; or find sources of news.
  3. Find and observe competitor activity. Only by observing the whole conversation is it possible to understand how well your competitors are doing in relation to you and enable you to take action to maintain or improve your performance.

ANALYSIS METHOD

The Twitter data is collected into a graph database which records the Users, Tweets, Retweets and the relationships between them. The analysis is carried out using the R programming language. There is more information about this in a previous blog post. In addition to the techniques described in that post, I use a machine learning algorithm called Latent Dirichlet Allocation (LDA) to uncover the topics of conversation that are taking place in individual communities.
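To make the data model concrete, here is a minimal sketch of how Tweets could be turned into a weighted user-to-user graph. It is written in Python rather than R, uses an in-memory dictionary instead of a graph database, and the record layout and the `build_interaction_graph` name are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical records: (tweet_id, author, retweeted_author or None, mentioned users)
tweets = [
    ("t1", "alice", None, ["bob"]),
    ("t2", "bob", "alice", []),
    ("t3", "carol", "alice", ["bob"]),
]

def build_interaction_graph(records):
    """Return {source_user: {target_user: weight}} from retweets and mentions."""
    graph = defaultdict(lambda: defaultdict(int))
    for _tweet_id, author, retweeted, mentions in records:
        if retweeted is not None:
            graph[author][retweeted] += 1   # author retweeted this user
        for user in mentions:
            graph[author][user] += 1        # author mentioned this user
    return {src: dict(targets) for src, targets in graph.items()}

graph = build_interaction_graph(tweets)
```

Each edge points from the user who acted to the user who was retweeted or mentioned, so attention flows towards the accounts people are engaging with.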

Give Blood

An example of finding information which would otherwise be difficult to uncover is the Give Blood campaign.

This is not a campaign with the budget of a major sponsor and is therefore not able to compete for a large share of the overall traffic. During the period between the first and second round of games (21st-22nd Sept) the @GiveBloodNHS account did not make the top 100 Twitter Accounts ranked for overall performance (more later on how that is calculated). However, community analysis reveals that they have created a significant impact in a smaller way. This impact can be seen in several ways:

Visualisation. The overall Twitter map for the period looks like this:

You can clearly see where the most active conversations are taking place and the parties that are involved — even at this very high level view.

Here in the lower part of the map is a small community which is separated out by the visualisation algorithm.

This is a small community of only 48 Twitter users. However, even within a network of millions of Tweets, Retweets, and mentions the community of users who engage with this campaign can be identified as a distinct community.

Formal Community Detection. Using a machine learning algorithm for community detection, we can identify the main communities in the overall conversation. Interestingly (as you can see from the visualisation), the Rugby World Cup creates a very subtle community structure with a clearly identifiable degree of integration and interaction. This contrasts very strongly with topics that create highly divided communities, e.g. the Ebola Crisis. In the case of the Give Blood community, the community detection algorithm identifies the 48 people that are most important in this community. This information is then used to further illustrate that this is a coherent community.
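The post does not say which community detection algorithm is used. Many popular ones work by searching for a partition that maximises a score called modularity, so as a rough illustration here is a small Python sketch of that score for an undirected, unweighted graph. The toy graph and the `modularity` function name are assumptions for the example, not part of the tool described here.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}   # number of edges with both ends inside each community
    degree = {}     # sum of node degrees inside each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}

q_split = modularity(edges, split)     # the two triangles as separate communities
q_merged = modularity(edges, merged)   # everything lumped into one community
```

The split partition scores higher than the merged one, which is the signal a modularity-based algorithm follows when it separates a small group such as the Give Blood community from the rest of the map.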

Tags & Words. Once we have identified the important users in the community we can analyse the words and hashtags that are prevalent within the Tweets making up the conversation.

These can be seen as a word cloud:

By way of contrast here is the equivalent word cloud for the Visit England account:

You can clearly see that these communities have a coherent set of words about a discernible topic.

Word clouds can be useful as an indicator that a community does have a coherent set of shared values and interests. However, the technique relies on a simple word count of the content of the Tweets in the community (once all the irrelevant words have been stripped out). There is a more sophisticated machine learning technique for creating a set of topics which can be uncovered in the text.
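The word count behind a word cloud can be sketched in a few lines of Python. The stop-word list, the sample Tweets and the `word_counts` name below are invented for illustration; a real analysis would use a much fuller stop-word list.

```python
import re
from collections import Counter

# Illustrative stop words only; not the list used in the analysis.
STOP_WORDS = {"the", "a", "to", "and", "of", "in", "is", "for", "rt"}

def word_counts(tweets, stop_words=STOP_WORDS):
    """Count words, hashtags and mentions after removing links and stop words."""
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+", " ", text.lower())   # drop links
        tokens = re.findall(r"[#@]?\w+", text)               # keep #tags and @names
        counts.update(t for t in tokens if t not in stop_words)
    return counts

sample = [
    "Give blood and save a life #RWC2015",
    "RT Proud to give blood today #RWC2015 http://example.com/x",
]
counts = word_counts(sample)
```

The resulting counts are what a word cloud renders as font sizes, which is why the technique cannot tell apart two communities that merely share common vocabulary.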

Formal Topic Analysis. Using a machine learning algorithm (LDA) as mentioned above it is possible to create a set of all the topics (made up of key words) that are contained within the entire set of the text from each Tweet and associate these topics with the communities that have been identified by the community detection algorithms.
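For readers who want to see what LDA is doing under the hood, here is a compact Python sketch of one standard way to fit it, collapsed Gibbs sampling. This is not the implementation used for the analysis (which is done in R); the function name, hyperparameter values and toy documents are all assumptions chosen to keep the example short.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA. docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]        # tokens in doc d with topic k
    topic_word = [dict() for _ in range(num_topics)]    # times word w has topic k
    topic_total = [0] * num_topics                      # tokens assigned to topic k
    assignment = []
    for d, doc in enumerate(docs):                      # random initial topics
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        assignment.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignment[d][i]                    # remove the current assignment
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignment[d][i] = k                    # record the resampled topic
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1
    # Smoothed share of each topic in each document.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Up to three most frequent words per topic.
    top_words = [
        [w for w in sorted(words, key=lambda w: (-words[w], w)) if words[w] > 0][:3]
        for words in topic_word
    ]
    return theta, top_words

docs = [
    ["blood", "donate", "nhs"], ["donate", "blood", "save"],
    ["try", "scrum", "tackle"], ["scrum", "try", "kick"],
]
theta, top_words = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` gives the mix of topics in one document. Averaging those rows over the Tweets of a community is one plausible way to associate topics with communities, though the post does not spell out how that step is done.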

Here is a list of the topics generated by the algorithm:

Again you can see that there is a discernible, coherent subject matter in each of the topics identified, and you can scan this list to find topics that are relevant to your own particular area of interest.

Here is another one from the last 2 days:

This topic helps identify that the Oddballs Twitter campaign has generated some interest.

You may notice that the topic that identifies the Give Blood campaign community does not include the word “Blood” which is obvious in the word cloud for that community. That is because the topic analysis creates topics that are shared between different communities which can help identify communities with similar or overlapping interests.

As well as finding communities you might like to know more about, this technique may help identify communities you might rather avoid (if you are not into Lads Humour).

Performance Analysis

In the above analysis I have deliberately focused on smaller users and communities to illustrate the power of these techniques for finding subtle and hidden topics and influential people.

Obviously one of the most important uses is to show who is doing well overall including the big communities and very influential people and accounts.

There are three measures I have used to calculate and rank performance.

Top Overall

The “Top Overall” list shows the top Twitter Users in the overall graph for each time period.

The ranking takes into account the number of Tweets and Retweets and the overall position of the User in the Network of other important contributors. The algorithm used to calculate this is a variant of the PageRank algorithm, which I have adapted to even out the advantage that very big Twitter accounts have.
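For reference, the standard PageRank calculation can be sketched in Python as below. This is the textbook power-iteration version, not the adapted variant described above, whose details are not given here. The account names and the `pagerank` function are illustrative assumptions.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank. graph maps each node to the nodes it links to."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is shared out evenly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for u, targets in graph.items():
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank

# Hypothetical retweet graph: an edge u -> v means u retweeted v.
retweets = {
    "fan1": ["englandrugby"],
    "fan2": ["englandrugby"],
    "fan3": ["englandrugby", "fan1"],
    "englandrugby": [],
}
ranks = pagerank(retweets)
top_account = max(ranks, key=ranks.get)
```

Because rank flows along retweet edges, an account retweeted by well-ranked users scores higher than one retweeted by the same number of obscure users.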

Top Connectors

Connectors are Users that sit between different communities of Users and can pass information between those communities. This is calculated using a measure of this ability to connect called Betweenness.
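Betweenness counts how many shortest paths between other pairs of Users pass through a given User. A minimal Python sketch using Brandes' algorithm on an undirected, unweighted toy graph is shown here; the graph and the `betweenness` function name are assumptions for illustration, and the real analysis may treat direction and weights differently.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an undirected, unweighted graph.

    adj maps each node to a list of neighbours (every edge listed both ways).
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # nodes in order of distance from s
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        paths = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        paths[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):         # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != s:
                score[w] += dependency[w]
    # Each unordered pair of endpoints was counted from both ends.
    return {v: value / 2 for v, value in score.items()}

# Two triangles joined by the bridge c - d.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = betweenness(adj)
```

In this toy graph only `c` and `d` have non-zero scores, because every path between the two triangles has to cross the bridge between them. That is the role a Connector plays between communities.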

Top Interesting

Interesting Users are not so prominent in the network and are a much more eclectic mix of Users who have something interesting to say which makes a big impact with other relatively important Users.

The ranking for interestingness is measured by calculating how much better the user performs than would be expected given the number of followers.
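One simple way to express "better than expected given the number of followers" is to fit a trend line of retweets against followers and rank Users by how far above the line they sit. The Python sketch below does this with a least-squares fit on log-scaled values. The choice of a log-log linear fit, the sample numbers and the `interestingness` name are all assumptions for illustration, not a description of the actual calculation.

```python
import math

def interestingness(users):
    """users: list of (name, followers, retweets_received). Returns {name: residual}."""
    xs = [math.log10(followers + 1) for _, followers, _ in users]
    ys = [math.log10(retweets + 1) for _, _, retweets in users]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x
    # Positive residual: more retweets than the follower count would predict.
    return {
        name: y - (intercept + slope * x)
        for (name, _, _), x, y in zip(users, xs, ys)
    }

# Invented figures purely for illustration.
sample_users = [
    ("national_team", 1_000_000, 1000),
    ("broadcaster", 100_000, 100),
    ("club", 10_000, 10),
    ("casual_fan", 1_000, 1),
    ("witty_fan", 1_000, 200),
]
residuals = interestingness(sample_users)
most_interesting = max(residuals, key=residuals.get)
```

Here the account with few followers but an unusually well-shared Tweet comes out on top, even though the biggest account has far more retweets in absolute terms.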

So far there are rankings for the following periods:

PRE TOURNAMENT

The community structure for this period looks like this:

Here you can see from the visualisation that there is one very strong community with English Rugby at the centre. The Irish, Scottish & Springboks created their own communities along with some media organisations like ITV, 5Live & The Guardian.

Notable individuals include Mo Farah, Justin Rose, Nigel Owens & David Cameron along with Kensington Palace.

The big ‘winner’ appears to be DoveMen who created a significant community and are the most influential Connector and 18th Most Interesting.

Top Overall

Top Connectors

Top Interesting

DAY ONE — ENGLAND V FIJI

The community structure for this period looks like this:

Day One was the opening match between England & Fiji. This can be clearly seen from the overall structure of the conversation which is dominated by the English rugby fraternity and press with Fiji Rugby integrated into this community.

The Springboks have clear differentiation and there are one or two interesting small communities including Paddy Power and The Sport Bible.

Top Overall

Top Connectors

Top Interesting

DAY TWO — SA V JAPAN

The community structure for this period looks like this:

Day Two saw the matches between SA & Japan and Ireland & Canada. Again you can clearly see these communities on the map. Other teams that played that day are also visible.

In the rankings J.K. Rowling makes it to #3 Overall with a very witty Tweet — and the backing of 5.5m Followers!

Top Overall

Top Connectors

Top Interesting

DAY THREE — NZ V ARGENTINA

The community structure for this period looks like this:

Sunday was a much quieter day on Twitter after the storm of the Japan result. The smaller volume of traffic is reflected in a less dense map but interestingly the community structure is much more defined.

You can clearly see the NZ & Argentina and Wales & Uruguay conversations as you would expect.

The Daily Telegraph has a strong community based on the quality of their Rugby correspondents.

Was Vladimir Putin really the most interesting Tweeter of the day?

Top Overall

Top Connectors

Top Interesting

BREAK DAYS WEEK ONE

The community structure for this period looks like this:

The break sees a quiet few days with a clear community breakdown of the conversations.

Scotland & Japan are gearing up for their game with Japan still riding high after their famous win.

However, I reckon John Beattie was the star of the show with his informed take on the concussion issue.

Top Overall

Top Connectors

Top Interesting

Conclusion

Twitter is potentially a very powerful tool for communication. However, in order to communicate your message effectively it is critical (borrowing again from the military jargon) to have a good understanding of the overall environment in which you are operating.

The OODA loop concept gives a strong framework for the way in which I believe it is necessary to approach gaining insight from social media before making decisions about future Actions.

Twitter (and other social media) meet many of the criteria defined in the OODA loop for a fast-changing environment. Therefore, the focus has to be firmly on the Observe and Orient phases of the process.

I have built a tool which allows rapid analysis of Twitter conversations based on the breakdown of the larger picture into smaller communities of interest. This allows a daily (or more frequent) cycle of observation and orientation which can drive subsequent actions and help define the corresponding tests to measure performance.

John Swain

Customer Engineer, Smart Analytics at Google Cloud. #chasingscratch golfer. Opinions are my own and not representative of Google.