COP21 Paris Twitter Analysis

Day One — Shaping Up

John Swain
Dec 1, 2015

First Published on COP21Live on 1st December 2015.

The COP21 conference began on 30 November 2015 in Paris. We are collecting the Twitter traffic related to the conference in order to identify influential Users.

We use some graph theory measures to even out the effect of Users with a very large number of Followers, e.g. @POTUS, who will always have a high degree of influence on any topic. This helps identify Users, Tweets & Topics of interest that are not just the ones covered by the main players and mainstream media. For more information on the techniques we use, see this post.

On day one we can start to see some basic shape and structure to the conversation as it develops. The purpose of this post is to illustrate how we can visualise this structure and begin to identify the communities (or Tribes) of Users who share a common interest. In future posts we will develop the idea of communities and show which Users have the most influence.

By 5pm on the first day we had collected approximately 600,000 Tweets in English. From this set we can do some initial analysis to see how the structure of the conversation is shaping up.

The first step in gaining understanding is to create an overall map showing the major Users and the way they communicate with each other.

This is the map for the initial data.

Click here for a zoomable high resolution version

In these network maps each dot is a node that represents a User and each line is a communication between Users (Retweet or Mention).
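As a rough illustration of how such a network can be assembled, the sketch below turns a handful of Tweets into directed edges. The Tweet records, the field names (`user`, `text`) and the `build_edges` helper are hypothetical and chosen for this example; it is not the pipeline used to produce the maps.

```python
import re
from collections import Counter

# Matches @username; a Retweet appears in the text as "RT @user: ...",
# so the same pattern captures both Retweets and Mentions.
MENTION = re.compile(r"@(\w+)")

def build_edges(tweets):
    """Return a Counter mapping (source, target) -> number of interactions."""
    edges = Counter()
    for tweet in tweets:
        author = tweet["user"].lower()
        for target in MENTION.findall(tweet["text"]):
            target = target.lower()
            if target != author:  # ignore self-mentions
                edges[(author, target)] += 1
    return edges

# Hypothetical sample data, not taken from the collected Tweets.
tweets = [
    {"user": "alice", "text": "RT @UNFCCC: Welcome to #COP21"},
    {"user": "bob", "text": "Good point @alice and @UNFCCC"},
    {"user": "alice", "text": "Thanks @bob"},
]
edges = build_edges(tweets)
```

Each key in `edges` is one line on the map, and the count could serve as a line weight if repeated interactions should matter more.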

The nodes are arranged by an algorithm which separates out the groups of nodes with high connectivity.

The size of a node is a measure of its overall importance in the network, calculated with the famous PageRank algorithm from how important all of that node’s connections are.
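To make the idea concrete, here is a minimal power-iteration sketch of PageRank. It assumes unweighted directed edges and the conventional damping factor of 0.85; the `pagerank` function and the sample edges are illustrative only and may differ from the settings used for the maps.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is shared among all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for source in nodes:
            for target in out_links[source]:
                # Each node passes its rank on equally to the nodes it points at.
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank

# Hypothetical edges: three Users mention "unfccc", which mentions "alice".
edges = [
    ("alice", "unfccc"),
    ("bob", "unfccc"),
    ("carol", "unfccc"),
    ("unfccc", "alice"),
]
ranks = pagerank(edges)
```

In this toy network `unfccc` ranks highest, and `alice` ranks above `bob` and `carol` because her single inbound link comes from the most important node. That is the sense in which a node's size reflects the importance of its connections rather than a raw count of them.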

The colours indicate separate communities of Users. The communities are detected by a machine learning algorithm. As you can see, the community detection and the layout of the nodes on the map correlate to some extent, and there are clear groups of Users identified by colour.
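The post does not name the algorithm used, so the sketch below uses label propagation purely as a stand-in to show what community detection does: every node repeatedly adopts the label most common among its neighbours until nothing changes. The `label_propagation` function, the undirected treatment of edges and the deterministic tie-break are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def label_propagation(edges, max_rounds=20):
    """Give each node the label most common among its neighbours until stable."""
    neighbours = defaultdict(set)
    for a, b in edges:  # treat each communication as an undirected tie
        neighbours[a].add(b)
        neighbours[b].add(a)

    labels = {node: node for node in neighbours}  # every node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):
            counts = Counter(labels[n] for n in neighbours[node])
            best = max(counts.values())
            # Deterministic tie-break (largest label) keeps this example
            # reproducible; implementations normally break ties at random.
            winner = max(label for label, c in counts.items() if c == best)
            if labels[node] != winner:
                labels[node] = winner
                changed = True
        if not changed:
            break
    return labels

# Hypothetical network: two tightly knit groups joined by a single bridge.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("d", "e"), ("d", "f"), ("e", "f"),
    ("c", "d"),
]
communities = label_propagation(edges)
```

Nodes that end up sharing a label would be drawn in the same colour. Here the two triangles settle into two communities despite the bridge between `c` and `d`.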

In the screenshot below you can clearly identify that there is a separate community of Users tweeting from India.

If you are wondering why @FT is in this community, it is because @narendramodi wrote this piece in the FT about the summit.

You can also see that some communities are identified as having a very distinct separation from the main body.

Here is the UK’s @mailonline.

The main body of the map shows some clear separation of the Users in terms of the layout. You can see the main players and get a sense of the strong lines of communication between the main UN bodies and some of the most important world figures.

However, the community detection has created one very large community, coloured in blue, and does not separate out the sub-communities within this group.

There are two things to note:

  1. We are attempting to visualise a complex network representing the nuances of human communication in a 2D map. This can provide some useful information but can never truly illustrate the multidimensional nature of human interaction.
  2. Community detection algorithms make a judgement about which single community to place each User in. Obviously, in real life a single person or organisation interacts with many different communities.

In order to gain more information from this network we run different community detection algorithms over different periods of time and observe how Users are separated into many different communities.
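One simple way to compare several runs is sketched below. It assumes each run produces a mapping from User to community id; the run names, the data and both helper functions are hypothetical and only illustrate the bookkeeping, not the analysis actually performed.

```python
from collections import defaultdict

def memberships(partitions):
    """Collect, for each User, the community assigned by every named run."""
    result = defaultdict(dict)
    for run_name, partition in partitions.items():
        for user, community in partition.items():
            result[user][run_name] = community
    return dict(result)

def co_membership(partitions, user_a, user_b):
    """Fraction of runs containing both Users that placed them together."""
    shared = [p for p in partitions.values() if user_a in p and user_b in p]
    if not shared:
        return 0.0
    together = sum(1 for p in shared if p[user_a] == p[user_b])
    return together / len(shared)

# Hypothetical results from a coarse and a fine-grained detection run.
partitions = {
    "coarse": {"alice": 0, "bob": 0, "carol": 0},
    "fine": {"alice": 0, "bob": 1, "carol": 0},
}
by_user = memberships(partitions)
alice_bob = co_membership(partitions, "alice", "bob")
alice_carol = co_membership(partitions, "alice", "carol")
```

Pairs that stay together across most runs suggest a stable shared interest, while pairs that split under a finer algorithm hint at sub-communities inside a large group.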

The network map below illustrates the same period and layout as the one above with the colouring determined by an alternative community detection algorithm that breaks down the Users into much smaller communities.

Click here for a zoomable high resolution version

Now it is possible to identify a much richer set of communities with discernible topics of interest.

As mentioned above, the information we can gain from these visualisations can be useful and interesting but is also limited. Mining information from millions of Tweets is complex. Twitter conversations tend to be dominated by Users with large Follower bases and exposure in the mainstream press. Automated bots and spammers also generate a lot of noise and very little information of value.

These visualisations provide a useful overview and context to inform the process of extracting more detailed information.

Conclusion

Graph theory and network analysis are useful tools for discovering influential Users, content and communities in large Twitter conversations.

We have visually illustrated some of the basic concepts involved in using these methods and shown how a high-level view of the network can provide valuable insight into the structure of the overall conversation.

The two visualisations of the network shown above illustrate several points:

  1. There is a clear overall structure with a main body consisting of the main UN bodies and world leaders together with elements of the press and NGOs.
  2. There are also communities of Users who are clearly separate from this main body with an identifiably distinct set of interests.
  3. The nature of this kind of conference is that there is an overall consensus amongst a large group of the participants and interested parties. There is also a very large number of sub-communities of interest, and Users may be in many different communities as they have varying interests.

This is a snapshot in time. The nature of these communities changes, with Users coming together and separating into many communities as the topics they discuss vary over time.

In future posts we will examine how this cycle is analysed to create more detailed and valuable information about the most influential Users overall and the Tribes that form around varying topics of interest.

John Swain

Customer Engineer, Smart Analytics at Google Cloud. #chasingscratch golfer. Opinions are my own and not representative of Google.