Climate Change Twitter Analysis — 2016

John Swain
Neo4j Developer Blog
14 min read · Feb 14, 2017

First published (in modified form) on Carbon Brief on 31 January 2017.

Intro by Carbon Brief…

Last year, Carbon Brief asked John Swain from Right Relevance to produce a map which visualised the climate change conversation on Twitter. In April, we published his map which was produced using data gathered over a few weeks. However, we wanted to produce something more substantial and insightful than this initial snapshot which captured just a brief moment in time. So we asked him to continue gathering data throughout the rest of 2016. Here, he explains his findings…

Background

In early 2016, we were commissioned by Carbon Brief to produce analysis of the conversation on Twitter about climate change. Data was collected during March 2016. Following the interest and reaction to the first analysis we produced for Carbon Brief last March, we agreed to continue collecting data and produce a much richer analysis of the whole year. But, first, it’s worth restating what the original brief was.

In summary, the object of the project was:

“To show, both visually and by ranking, who the key influencers are on Twitter for the term “climate change”. To show both how they “cluster” and interconnect. To show the volume of interaction and where the “hotspots” of activity are within the climate change Twitter universe. To use a transparent and bias-free methodology to both harvest and represent the data.”

The search used during this period was for tweets in English matching the following query: “global warming” OR (global AND warming) OR “climate change” OR (climate AND change) OR globalwarming OR climatechange OR #climate
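As a rough illustration, the logic of this query can be approximated in Python. This is a sketch only, not the collection code used for the analysis: the function name `matches_query`, the tokenisation and the example tweets are all assumptions, and Twitter's own matching rules may differ. Note that the two quoted phrases are already covered by the looser AND clauses.

```python
import re

def matches_query(text: str) -> bool:
    """Approximate the keyword query on a single tweet (hypothetical helper)."""
    tokens = re.findall(r"#?\w+", text.lower())
    words = {t.lstrip("#") for t in tokens}          # hashtags also count as words
    hashtags = {t for t in tokens if t.startswith("#")}
    return (
        {"global", "warming"} <= words               # covers "global warming" too
        or {"climate", "change"} <= words            # covers "climate change" too
        or "globalwarming" in words
        or "climatechange" in words
        or "#climate" in hashtags
    )

# Invented example tweets, for illustration only.
tweets = [
    "Global warming is accelerating",
    "Why is the climate debate so slow to change?",
    "Time to act #ClimateChange",
    "Global temperature spiral, 1850 to 2016",
]
matched = [t for t in tweets if matches_query(t)]
```

The last example is not matched because it never uses one of the tracked terms, even though a human reader would recognise the subject.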

Using this collected data, the principle of the analysis methodology is to examine the connections that users make by communicating with each other on Twitter via retweets, mentions and replies. We measure influence using the PageRank and betweenness centrality algorithms. More details of the methodology can be found in this article.
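To give a feel for how PageRank rewards being retweeted or mentioned by well-connected users, here is a minimal power-iteration sketch in plain Python. It is not the implementation used for this analysis; the edge direction (from the user who retweets to the user being retweeted), the damping factor of 0.85 and the user names are assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Basic PageRank by power iteration on a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing links is shared out evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank

# Hypothetical interactions: (user who retweets, user being retweeted).
edges = [("ann", "carl"), ("bob", "carl"), ("dee", "carl"), ("carl", "ann")]
ranks = pagerank(edges)
top_user = max(ranks, key=ranks.get)
```

In this toy graph "carl" ranks highest, and "ann" comes second because the only link "carl" gives goes to her.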

We have now updated the analysis to discover the most influential users on Twitter over the course of the whole year. The analysis covers the period of February to the middle of December during which we collected over 13m tweets from more than 3m individual users.

For this much bigger graph, which includes tweets collected over a much longer period, it was necessary to filter some of the noise in order to produce an analysis which is consistent with the initial analysis conducted in March 2016.

Visualisation

The conversation can be visualised as a map, as shown below.

Within the map it is easy to identify groups of users which form communities of interest at a global level.

The width of the line between users indicates the quantity of links. Groups of users with large numbers of links (retweets and mentions) between them stand out as thick and dark lines. The “force-directed” algorithm is set up to organise users into groupings of strong mutual interactions, which tends to reflect mutual interests.

Additionally, you can see who is engaging whom by the direction of the “catherine wheel” of lines coming out of a dot. If the lines suggest a clockwise motion, that person is being mentioned and retweeted a lot. Conversely, users whose lines suggest an anti-clockwise motion are mostly retweeting or mentioning others. These users can be seen round the fringes and may be producing a lot of spam tweets without being caught by the filter we apply to remove very obvious bots.
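The direction of the lines corresponds to a simple count of interactions received versus interactions sent. The following sketch shows that count on invented data; the threshold used to flag "mostly sending" users is an arbitrary assumption and is not the bot filter used in the analysis.

```python
from collections import Counter

def direction_profile(interactions):
    """Count, per user, interactions received and interactions sent."""
    received = Counter()
    sent = Counter()
    for source, target in interactions:
        sent[source] += 1
        received[target] += 1
    users = set(sent) | set(received)
    return {u: {"received": received[u], "sent": sent[u]} for u in users}

# Hypothetical (source, target) pairs: source retweets or mentions target.
interactions = [
    ("fan1", "scientist"), ("fan2", "scientist"), ("fan3", "scientist"),
    ("spammy", "fan1"), ("spammy", "fan2"), ("spammy", "scientist"),
]
profile = direction_profile(interactions)

# Users who send a lot but are never engaged with (assumed threshold of 3).
mostly_sending = sorted(
    u for u, p in profile.items() if p["sent"] >= 3 and p["received"] == 0
)
```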

This is a very high level view of the main groups of users and there are many smaller groups which can be seen by visual inspection of the map at a detailed level.

In addition to the visual layout for identification of communities, we used machine learning community detection to identify how communities form within the conversation. The colours in the map represent the communities detected in this way, which generally align with the layout but illustrate a slightly richer set of communities than those identified by the positional layout.

Tables of Main Influencers

We use other graph algorithms to identify the most ‘influential’ users in the overall conversation. We use several measures to indicate different types of influence.

We have produced tables ranking the most influential users over the year in five categories, as described in this article. The five categories are:

Top Overall

Top overall influence is measured by combining the quantity of connections (retweets, mentions, replies), the quality of the connections (measured by PageRank) and the reach of the User’s tweets. These are adjusted to discount the skew towards Users with a very large number of followers.

Top Connectors

The value of a User’s connectedness is measured using an algorithm called Betweenness Centrality. This measures how well a User is connected on the paths between all other Users, compared with everyone else. It was introduced as a measure for quantifying the control of a human on the communication between other humans in social networks.
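For readers who want to see the idea concretely, betweenness centrality can be computed with Brandes' algorithm. The sketch below is a plain-Python version for a small undirected, unweighted graph; treating interactions as undirected and the example graph itself are simplifying assumptions, and a graph library would normally be used on real data.

```python
from collections import deque

def undirected(edges):
    """Build an adjacency map (node -> set of neighbours) from edge pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def betweenness(adjacency):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised)."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths.
        order = []
        predecessors = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}

# Two tightly knit groups of three, joined only through "c" and "d".
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
scores = betweenness(undirected(edges))
```

Here "c" and "d" sit on every shortest path between the two groups, so they score highest while the other users score zero.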

Most ‘Interesting’ Users

The Interesting metric finds smaller Users who made a relatively high impact. It compares how well a User does in the overall ranking compared to how well they would be expected to do given the number of followers they have.

It is useful for finding niche or local stories in a large network where they are difficult to find.
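One simple way to express this comparison is as a difference between two rankings: where a user would be expected to rank on follower count alone, and where they actually rank overall. The sketch below is an assumption about how such a measure could be built, not the formula used for the tables, and the figures are invented.

```python
def interestingness(users):
    """users maps name -> (followers, overall_score); returns name -> rank gain."""
    by_followers = sorted(users, key=lambda u: users[u][0], reverse=True)
    by_score = sorted(users, key=lambda u: users[u][1], reverse=True)
    expected = {u: i for i, u in enumerate(by_followers)}
    actual = {u: i for i, u in enumerate(by_score)}
    # Positive values: the user ranks higher than follower count alone predicts.
    return {u: expected[u] - actual[u] for u in users}

# Hypothetical users: (followers, overall influence score).
users = {
    "big_news": (30_000_000, 0.60),
    "celebrity": (15_000_000, 0.90),
    "scientist": (20_000, 0.95),
    "local_blog": (5_000, 0.10),
}
gains = interestingness(users)
most_interesting = max(gains, key=gains.get)
```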

Top Talked About

Measures how much a user is talked about rather than responded to. It is the ratio of the number of times a user is mentioned to the number of times the user is retweeted, adjusted for the number of tweets the user makes and the number of followers the user has.

This can indicate that a user is not active on Twitter but is being talked about in the wider world which is reflected by other users mentioning the user on Twitter.
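The core of this measure is a mentions-to-retweets ratio. The sketch below shows only that ratio on invented counts: it leaves out the adjustments for tweet volume and followers, whose exact form is not given here, and the +1 in the denominator is an assumption added so that a user with no retweets still gets a finite score.

```python
def talked_about_ratio(mentions: int, retweets: int) -> float:
    """Mentions per retweet; +1 is an assumed smoothing term, not from the article."""
    return mentions / (retweets + 1)

# Hypothetical counts for three kinds of user.
users = {
    "newsmaker": {"mentions": 5000, "retweets": 0},
    "broadcaster": {"mentions": 800, "retweets": 4000},
    "scientist": {"mentions": 300, "retweets": 600},
}
ratios = {name: talked_about_ratio(**counts) for name, counts in users.items()}
ranking = sorted(ratios, key=ratios.get, reverse=True)
```

A user who is mentioned constantly but never retweeted ends up far ahead of everyone else on this ratio.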

Top Brokers

“Brokers” are connectors between communities of users. Connectors (see above) measures how well connected users are between all other users. Brokers are those which connect (either by retweeting/replying themselves or being retweeted/mentioned by others) between different communities.
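A very simplified way to picture brokerage is to count how many other communities each user interacts with. The sketch below assumes communities have already been assigned; the function name, the community labels and the interactions are all hypothetical, and this is not the broker measure used for the tables.

```python
from collections import defaultdict

def broker_reach(interactions, community):
    """For each user, count the distinct other communities they are linked to."""
    linked = defaultdict(set)
    for source, target in interactions:
        if community[source] != community[target]:
            # A cross-community link counts for both ends of the interaction.
            linked[source].add(community[target])
            linked[target].add(community[source])
    return {user: len(linked[user]) for user in community}

# Hypothetical community assignments and (source, target) interactions.
community = {"a": "science", "b": "science", "c": "media",
             "d": "politics", "e": "media"}
interactions = [("a", "b"), ("a", "c"), ("d", "a"), ("c", "e")]
reach = broker_reach(interactions, community)
```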

Dashboard

We show the information in a set of tables which can be seen in a Tableau Dashboard.

Click here to see the Tables in the live dashboard.

Click the Top Tables tab to see the tables of influential users.

These are the five most important tables.

Screenshot taken from Interactive Tableau Dashboard

The Right Relevance Topical Influence Service provides information on how influential Twitter users are in over 50,000 Topics, based on their followers/following and links to published articles. The topic scores are generated by an algorithm based on the relationships between influential users rather than any human assessment of influence.

For example here is the list of Topics in which Al Gore is influential. The number after each topic reflects their influence on a 0–100 scale.

With the integration of these Influence scores we can filter these tables by particular areas of Influence. In the screenshot below there is a list of the Topics in the middle section. If a Topic is selected it only shows users who have influence in that Topic.

For example, if the “Renewable Energy” topic is selected on the main dashboard page, the resulting tables show the most influential users with topical influence in the Renewable Energy topic.

Influential users in the Renewable Energy Topic.

Leonardo DiCaprio

Over the last few days of February, when the actor Leonardo DiCaprio won an Oscar and cited climate change in his acceptance speech, the Twitter conversation about climate change was dominated by him.

The following image from the original analysis shows this surge in influence during that period.

Twitter Network Map, week ending 06 March 2016, highlighting the volume of users retweeting and mentioning Leonardo DiCaprio.

This effect of users with a very large number of followers having an extreme influence is something that is hard to avoid with conventional analysis based on simple metrics like retweets. The techniques we use can correct for this effect to gain a better understanding of who is a genuine influencer within a subject. The techniques for overcoming short term distortions were discussed in the original article.

Since then Leonardo DiCaprio has continued to be involved in the climate change conversation. In this analysis over a much longer period, he is identified as a user with a major influence on the subject of climate change.

Over the course of the year DiCaprio has been a consistent tweeter on the subject of climate change and has been involved in other projects to promote the cause he is advocating, including the film Before the Flood. What our analysis shows is that DiCaprio has significant influence due to his number of followers, but that his engagement in the conversation is also significant and that he is a genuinely influential person on the subject of climate change.

Flocks and Tribes

We identify two types of group or community, both referenced above: Flocks and Tribes. Both are identified by machine learning algorithms and not by any human decision-making process. They are identified in different ways and differ in how they relate to time.

Flocks are the main groups you see identified by position and colour in the visual map. The connections between these users are by the tweeting activity (retweets, mentions, replies) so are a reflection of how people group within the conversation. This is a temporal phenomenon e.g. the US Election and related issues. The flocks detected by the algorithm include the “UNFCCC”, “Fox News” and even “Carbon Brief”. Click on the flock’s name in the dashboard to reveal more analysis and data about that detected community.

Tribes are detected by the relationships between users’ followers and following. These are the groups identified by Topical influence, such as the “renewable energy” topic above: when you select a Topic from the list in the dashboard, the users are filtered to show the Tribe of users for that Topic. These relationships tend to reflect a more permanent set of shared interests.

Using a football metaphor, tribes are the supporters of teams. Flocks form to discuss a particular game, player transfer, etc.


Climate Change Scientists

The reverse of the Leonardo DiCaprio effect can be seen when looking for the influence of climate scientists within the Twitter conversation.

We have called this the ‘Expert Problem’. This occurs because experts tweet using technical language that is not picked up by the terms we use to collect tweets. One solution would be to track lots of technical terms, but if technical terms are added it is impossible to avoid introducing significant bias in favour of particular users.

For example, including tweets that contain “carbon dioxide” or “greenhouse” could capture lots of tweets that are not connected to climate change. Including terms like “climate science” may introduce a significant skew in favour of the academic community making them seem more influential overall than they actually are.

This is in conflict with our objective of identifying those who make the most impact and have influence in the general conversation about climate change.

What we can see, however, is how scientists gain influence through their ideas in two specific ways:

  1. By influencing others who do have a big voice within the general conversation.
  2. By having an idea which gains widespread notice — a scientific meme.

Michael Mann

Michael Mann is an example of the former. Mann, the director of the Earth System Science Center at Penn State University, is one of the world’s most prominent and quoted climate scientists.

In the main tables Michael Mann is highly placed as a Connector, in the Interesting category and Overall.

Michael Mann in the overall tables

As can be seen from Michael Mann’s Twitter banner he does not have a very large number of followers compared to many of the users in the top overall table. The position that Michael occupies in the conversation map is a clue to the way in which he achieves influence on Twitter.

Here is the section he occupies and notice how close this is to many other mainstream influential users.

This map is the ego network for Michael Mann within the conversation map. An ego network consists of all the other users that are directly connected to Michael and all the connections between those users.

Michael Mann ego network within the twitter conversation map.

This map illustrates two things:

  1. Michael Mann is connected to a large part of the most important users in the overall network
  2. The users in the ego network also communicate with each other.
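Both points can be checked on any edge list. The sketch below extracts an ego network and measures how densely the ego's direct contacts are linked to each other; the function names, the density measure and the example edges are assumptions made for illustration and do not use the real conversation data.

```python
def ego_network(edges, ego):
    """Return the ego plus direct contacts, and every edge among those users."""
    neighbours = {b for a, b in edges if a == ego} | {a for a, b in edges if b == ego}
    members = neighbours | {ego}
    inner_edges = [(a, b) for a, b in edges if a in members and b in members]
    return members, inner_edges

def alter_density(edges, ego):
    """Share of possible undirected links among the ego's contacts that exist."""
    members, inner_edges = ego_network(edges, ego)
    alters = members - {ego}
    pairs = {frozenset((a, b)) for a, b in inner_edges
             if ego not in (a, b) and a != b}
    possible = len(alters) * (len(alters) - 1) / 2
    return len(pairs) / possible if possible else 0.0

# Hypothetical interactions between users.
edges = [("ego", "a"), ("b", "ego"), ("c", "ego"),
         ("a", "b"), ("b", "c"), ("c", "d")]
members, inner_edges = ego_network(edges, "ego")
density = alter_density(edges, "ego")
```

A high density means the ego's contacts also talk to each other, which is the pattern described in the second point above.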

Here is the ego network for CNN for comparison.

It is clear that Michael Mann is more connected within the conversation amongst influential users who are themselves more engaged with other influential users.

To put this in context, CNN has over 30m followers.

When CNN tweets about climate change, the overall reach (the total number of potential displays in users’ timelines) is therefore very high.

This table shows the difference in some of the key metrics between these users. Notice the big differences in total reach and BTW (betweenness).

Comparison of key metrics for CNN and Michael Mann

On the dashboard click ‘User Detailed List’ to view this table.

This illustrates two very different ways in which influence is created. CNN has a large following of users from the general public, whilst Michael Mann is much less well known generally but has a high connectivity with important users in the network.

Ed Hawkins

Another example of the expert problem is this tweet from Ed Hawkins, who exerts influence in the second way identified above: the spread of a powerful idea.

Ed Hawkins is a climate scientist from the UK with a fairly small number of followers.

In May 2016 Ed Hawkins tweeted about a graphical representation of the change in global temperatures since 1850. This went viral but was not picked up by our search term which looks for explicit phrases about climate change.

However, a human can clearly see that it is about climate change. This is a clear example of the expert problem.

A similar example from later in the year did not have anywhere near the same impact.

The effect of Ed Hawkins’s viral tweet is picked up in our analysis because many users tweeted about it using terms that are matched by our search.

One way this phenomenon is measured by our analysis is the “Interestingness” measure.

As mentioned above “Interestingness”:

“Compares how well a user does in the overall ranking compared to how well they would be expected to do given the number of followers they have.”

So, whilst Ed Hawkins has a small number of followers and a lot of his tweets are aimed at a scientific rather than a general audience, we can still detect his influence overall.

Presidential Election

A clearly visible section of the map is the area shown below which is easily identifiable as the main participants in the US Presidential election.

Section of the conversation map with users involved in the Presidential Election

There are two main points to note about the election and the result.

Firstly, it was a significant feature of the election that climate change was not widely discussed within the debate. We carried out a significant analysis of the election in which we found very little mention of climate change in the main topics of discussion.

Secondly, it is worth examining how Donald Trump’s influence is detected within this analysis. Whilst it is certainly the case that Donald Trump is highly influential within any subject of general interest to society as a result of becoming the President, the way in which this is represented in the conversation illustrates a fundamental shift in the nature of the global discussion of climate change as seen on Twitter.

In the tables Trump is the most talked about and the most influential overall, a fair reflection of the year’s events. He is also one of the Top Connectors.

Notice the blue bars in the “Talked About” table. These illustrate the actual score of each user and it is very apparent just how much more highly Donald Trump scores in the “Talked About” measure than other users.

As mentioned above the “Talked About” measure:

“…can indicate that a user is not active on Twitter, but is being talked about in the wider world which is reflected by other users mentioning the user.”

This table shows that Donald Trump has no retweets at all in the entire 13m set of tweets collected despite the same table showing that Trump tweeted 469 times. In other words, all of Trump’s influence comes from other users mentioning him.

The explanation for this seemingly unlikely fact is that the 469 tweets from Donald Trump were made before the collection started but replied to by users during the period of collection.

This shows that Donald Trump has not been active in the climate change conversation himself during our period of analysis.

If we plot the Talked About Ratio against the number of retweets the chart shows how far apart Donald Trump is from other important people.

Shift of Influence

In this conversation the single most important person is not engaged at all in the actual conversation. His power comes from an external source. There are other powerful and well-known people whose external power is reflected in the Twitter conversation, but nowhere near to the same extent as Donald Trump.

If we look again at the overall map and the different groups we identified earlier, we can observe that, across the different views and topical interests reflected, most of the most influential people are involved in the conversation.

What we can observe is that this has changed: the single most important individual, with immense power over policy decisions regarding climate change, is no longer involved in the conversation on Twitter.

It is beyond the scope of our analysis to understand what this means in the wider world but it does indicate a profound change.


John Swain
Neo4j Developer Blog

Customer Engineer, Smart Analytics at Google Cloud. #chasingscratch golfer. Opinions are my own and not representative of Google.