Analysis Of Twitter Social Network

Published in

Social Media: Theories, Ethics, and Analytics

10 min read · Oct 6, 2020

Twitter is a major platform where people share their opinions, letting out their thoughts on a subject in at most 280 characters. One of its key features is that you can join and add to a topic using hashtags, emojis, etc. But what is most interesting about social media, and about Twitter in particular for this post, is that it creates connections: networks that can be studied to understand how people interact or how news and opinions spread. Twitter holds a lot of data that can be used for many purposes, but before we can use this data, it has to be extracted.

Formula 1 racing is the most-watched motorsports event in the world. Marques such as Ferrari, McLaren, Mercedes, and Honda have competed in it at various times since its inauguration in 1950.

I got interested in extracting tweets with the hashtag “f1”, which relates to Formula 1, the most-watched motorsports event in the world. Formula 1 has been one of the premier forms of racing around the world since its inaugural season in 1950. The word “formula” in the name refers to the set of rules to which all participants’ cars must conform. A Formula One season consists of a series of races known as Grands Prix (French for “grand prizes” or “great prizes”), which take place worldwide on purpose-built circuits and on public roads. Formula One cars are the fastest regulated road-course racing cars in the world, owing to very high cornering speeds achieved through the generation of large amounts of aerodynamic downforce.

Tools

Python: a programming language
Tweepy: a Python library for accessing the Twitter API
NetworkX: a Python library for studying graphs and networks
Pandas: a data manipulation and analysis library
Matplotlib: a plotting library
JSON: the data format in which tweets are delivered
Gephi: an open-source network analysis and visualization software package

In this article, I’m going to explain the steps I went through to extract data from Twitter. First of all, you have to obtain Twitter API credentials from the Twitter Developer website: an API key, an API secret key, an access token, and an access token secret.

https://developer.twitter.com/en

After receiving approval, we create a new app by filling out its details and, lastly, generate the access tokens, keeping them in a safe place.

With that, I had obtained my keys and tokens.
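One way to keep the four credentials in a safe place is to read them from environment variables instead of pasting them into the notebook. This is only a sketch, not the setup used for this post: the variable names and the helper `load_credentials` are hypothetical.

```python
import os

# Hypothetical environment variable names; use whatever names suit your setup.
REQUIRED = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)

def load_credentials(env=None):
    """Return the four Twitter credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

With Tweepy 3.x, these values would typically be passed to `tweepy.OAuthHandler(api_key, api_secret)`, followed by `auth.set_access_token(access_token, access_token_secret)`, and `tweepy.API(auth)` then gives the `api` object used in the streaming code.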

I used Tweepy to extract the Twitter data. In a Jupyter notebook, we can use the Tweepy Python library to connect with our Twitter credentials, stream real-time tweets related to a term of interest, and then save them into a “.txt” file.

import tweepy

# Assumes `api` is an authenticated tweepy.API object (Tweepy 3.x)
class MyStreamListener(tweepy.StreamListener):

    # maxTweetCount = 50
    def on_status(self, status):
        print(status.text)

runtime = 60  # this means one minute
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(api.auth, myStreamListener)

I used a simple Twitter stream listener to collect 1000 tweets containing the hashtag “f1”, and had the stream save directly to the “.txt” file.

class StreamSaver(tweepy.StreamListener):
    def __init__(self, filename, max_num_tweets=1000, api=None):
        self.filename = filename
        self.num_tweets = 0
        self.max_num_tweets = max_num_tweets
        tweepy.StreamListener.__init__(self, api=api)

    def on_data(self, data):
        # Write the raw JSON directly to the file
        with open(self.filename, 'a') as tf:
            tf.write(data)
        self.num_tweets += 1
        # Print progress every 100 tweets
        if self.num_tweets % 100 == 0:
            print(self.num_tweets)
        # Returning False stops the stream once the limit is reached
        if self.num_tweets >= self.max_num_tweets:
            return False

    def on_error(self, status):
        print(status)

saveStream = StreamSaver(filename='f1tweets.txt', max_num_tweets=1000)
mySaveStream = tweepy.Stream(api.auth, saveStream)
mySaveStream.filter(track=['#f1'])
mySaveStream.disconnect()

Now we can read all the data we gathered, which is stored in the “.txt” file, into a pandas DataFrame.

We will use this information to graph how the people who tweet about F1 interact with each other. There are three types of interactions between two Twitter users that we are interested in: retweets, replies, and mentions. The JSON retrieved for each Tweet object includes a User object that describes the author of the Tweet and an entities object that includes arrays of hashtags and user mentions, among others.
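To make the three interaction types concrete, here is a minimal sketch that pulls them out of a single tweet dictionary. The field names follow the Tweet object described above; the function name `extract_interactions` and the sample tweet are made up for illustration.

```python
import json

def extract_interactions(tweet):
    """Return (kind, source_user, target_user) tuples for one tweet dict."""
    author = tweet["user"]["screen_name"]
    interactions = []

    # Retweet: the original tweet is nested under 'retweeted_status'
    retweeted = tweet.get("retweeted_status")
    if retweeted:
        interactions.append(("retweet", author, retweeted["user"]["screen_name"]))

    # Reply: the target is named directly on the tweet
    reply_to = tweet.get("in_reply_to_screen_name")
    if reply_to:
        interactions.append(("reply", author, reply_to))

    # Mentions: listed under entities.user_mentions
    for mention in tweet.get("entities", {}).get("user_mentions", []):
        interactions.append(("mention", author, mention["screen_name"]))

    return interactions

# A made-up tweet, trimmed to the fields used here
sample = json.loads("""
{"user": {"screen_name": "fan_a"},
 "in_reply_to_screen_name": null,
 "retweeted_status": {"user": {"screen_name": "F1"}},
 "entities": {"user_mentions": [{"screen_name": "F1"}]}}
""")
print(extract_interactions(sample))
```

In this sample the retweeted account shows up both as a retweet and as a mention, so you may want to de-duplicate before counting interactions.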

The dataset for this social network analysis is then stored in a DataFrame. Once you’ve collected your data, you set up a pandas DataFrame containing the information of interest: who tweeted, how many followers they have, whether it was a retweet and, if so, of whom, and whether it was a reply and, if so, to whom.

import pandas as pd

tweets_df = pd.read_json("f1tweets.txt", lines=True)
tweets_df.columns

You can see the information we collected after stream listening.

Index(['created_at', 'id', 'id_str', 'text', 'source', 'truncated',
       'in_reply_to_status_id', 'in_reply_to_status_id_str',
       'in_reply_to_user_id', 'in_reply_to_user_id_str',
       'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place',
       'contributors', 'retweeted_status', 'quoted_status_id',
       'quoted_status_id_str', 'quoted_status', 'quoted_status_permalink',
       'is_quote_status', 'quote_count', 'reply_count', 'retweet_count',
       'favorite_count', 'entities', 'favorited', 'retweeted', 'filter_level',
       'lang', 'timestamp_ms', 'extended_entities', 'possibly_sensitive',
       'display_text_range', 'extended_tweet'],
      dtype='object')

From the columns displayed, we are interested in:

Author of the Tweet: name (screen_name) and ID (id) inside the user object.
Twitter users mentioned in the text of the Tweet: name and ID can be found as screen_name and id in user_mentions, inside entities.
Account being retweeted: screen_name and id inside the user object of retweeted_status.
User to whom the tweet replies: in_reply_to_screen_name and in_reply_to_user_id.
Tweet to which the tweet replies: in_reply_to_status_id.
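The code that turns the raw tweets into rows is not shown in this post, so the following is only a sketch of how one row could be derived from a tweet dictionary. The helper names `account_age_days` and `tweet_to_row` are hypothetical, and the timestamp format is an assumption based on the Twitter v1.1 API (for example "Wed Oct 10 20:19:24 +0000 2018").

```python
from datetime import datetime, timezone

TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"  # assumed v1.1 timestamp format

def account_age_days(created_at, now):
    """Days between account creation and 'now' (a timezone-aware datetime)."""
    created = datetime.strptime(created_at, TWITTER_TIME)
    return (now - created).total_seconds() / 86400

def tweet_to_row(tweet, now):
    """Flatten one tweet dict into the columns used for the network."""
    user = tweet["user"]
    row = {
        "author": user["screen_name"],
        "reply_to": tweet.get("in_reply_to_screen_name") or "",
        "age": account_age_days(user["created_at"], now),
        "followers": user["followers_count"],
        "retweet_of": "",
        "rtfollowers": 0,
        "rtage": 0.0,
        "text": tweet["text"],
    }
    # Fill in the retweet columns only when the tweet is a retweet
    retweeted = tweet.get("retweeted_status")
    if retweeted:
        rt_user = retweeted["user"]
        row["retweet_of"] = rt_user["screen_name"]
        row["rtfollowers"] = rt_user["followers_count"]
        row["rtage"] = account_age_days(rt_user["created_at"], now)
    return row
```

A list of such rows, for example `[tweet_to_row(t, now) for t in tweets_data]`, can be passed straight to `pd.DataFrame`.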

After setting up and organizing the DataFrame, we can see how it looks.

print(len(tweets_data))
tweets = pd.DataFrame(rows_list)
print(tweets)

The resulting DataFrame looks something like this. The first line of output is the number of tweets collected. Note that rows 0 to 3 are retweets and row 4 is a reply; “age” is the number of days since the Twitter account was created:

1885
               author       reply_to          age  followers     retweet_of  \
0     rizzidomenikoff                 2007.929225        393    F1_Profesor   
1          vetteclerc                 3373.098229       8059  sebvettelnews   
2          Trancients                 4157.092072        493             F1   
3         Emiliano_HN                  803.409109        110  sebvettelnews   
4        BiaRosenburg  StonedDoomguy  4152.568322       4303                  
...               ...            ...          ...        ...            ...   
1880      ansiogenaaf                 2638.076157       1896  sebvettelnews   
1881        Bledars53                 1780.614144         92          f1_fr   
1882     GuillaumeHl2                  323.730475         55                  
1883      LH_Sena1007                 1276.116192        252         F1Gate   
1884        inigo_sim                  389.397338        166  sebvettelnews   

      rtfollowers        rtage  \
0              84  3257.574421   
1          125012  2848.625903   
2         4876011  4056.905718   
3          125012  2848.625903   
4               0     0.000000   
...           ...          ...   
1880       125018  2848.625903   
1881         3294  2483.938970   
1882            0     0.000000   
1883        67590  4106.454757   
1884       125018  2848.625903   

                                                   text  
0                                RT @F1_Profesor: AMOR.  
1     RT @sebvettelnews: Sebastian celebrating his v...  
2     RT @F1: Faster hands than Bruce Lee, this guy ...  
3     RT @sebvettelnews: Sebastian celebrating his v...  
4     @StonedDoomguy Primeiro projeto: minha #F1, mi...  
...                                                 ...  
1880  RT @sebvettelnews: #OnThisDay in 2009\n\nSebas...  
1881  RT @f1_fr: Honda quitte la F1, mais prolonge e...  
1882  Je suis tellement triste que Nico Hulkenberg a...  
1883  RT @F1Gate: メルセデスF1代表 「ハミルトンはペナルティを避けなければならない」...  
1884  RT @sebvettelnews: #OnThisDay in 2009\n\nSebas...  

[1885 rows x 8 columns]

We can easily find out the most-retweeted IDs in the DataFrame.

tweets['retweet_of'].value_counts()

Not surprisingly, Formula 1’s official Twitter account, F1, is retweeted the most. The blank entry at the top of the list (521) counts the tweets that are not retweets.

                 521
F1               399
F1Gate           232
LegendarysF1      34
Formula1arg1      27
                ... 
F1MonkeySeat       1
mint15fam          1
Investidea1        1
F1Daviderusso      1
AsistAnaliz        1
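To rank only actual retweets, the empty retweet_of values can be dropped before counting. A small sketch of the idea using only the standard library, on a made-up list of values:

```python
from collections import Counter

# Made-up 'retweet_of' values; an empty string means the tweet is not a retweet
retweet_of = ["F1", "", "F1Gate", "F1", "", "sebvettelnews", "F1"]

# Skip the empty strings so only real retweets are counted
counts = Counter(name for name in retweet_of if name)
print(counts.most_common(2))
```

In pandas, the same filter can be written as `tweets.loc[tweets['retweet_of'] != '', 'retweet_of'].value_counts()`.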

Now we’ll extract all of the information we need to create a directed or undirected graph with NetworkX that we can then visualize in Gephi. We will use this information to build a graph, or network, of who’s retweeting whom, keeping track of the age in days and the number of followers of each user so we can filter on those factors if we like.

A graph has two main elements: nodes, and edges, the lines that connect two nodes. The ability to reach one node from another by following edges, or paths, is what makes graphs so powerful for representing different networks. A graph can also be classified as directed or undirected. It is directed when the edges have a specific orientation, normally drawn as an arrow to indicate direction, and undirected when the edges have no orientation.
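A tiny illustration of the difference, using plain Python containers rather than a graph library; the account names are invented.

```python
# Each interaction is (source, target): "source retweeted target"
interactions = [("ann", "F1"), ("F1", "ann"), ("bob", "F1")]

# Directed: (ann, F1) and (F1, ann) are different edges
directed_edges = set(interactions)

# Undirected: orientation is dropped, so both collapse into one edge
undirected_edges = {frozenset(pair) for pair in interactions}

print(len(directed_edges))    # 3
print(len(undirected_edges))  # 2
```

In NetworkX, the corresponding classes are `nx.DiGraph` and `nx.Graph`.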

In my analysis, users are the nodes. If there is any sort of interaction between two users (a retweet, reply, or mention), an edge is created to connect their nodes. We can work with a directed graph if we are interested in which user retweets which other user. If we only care that an interaction is present, regardless of orientation, we can use an undirected graph. I decided to go with a directed graph to see which users are retweeting whom.

We will use NetworkX, a Python library for creating and studying the structure of complex networks such as social networks. We initialize the graph by calling NetworkX’s DiGraph() class constructor.

import networkx as nx
 
# Create a new directed graph
G = nx.DiGraph()

Next I wrote code to iterate through the data we pulled into the DataFrame earlier, row by row, and construct a directed graph of who’s retweeting whom. Each directed edge represents the relationship “is retweeted by”; the higher the weight of an edge, the more often one account is retweeted by the other. Each node represents an individual account on Twitter and has attributes that track its number of followers and its age in days.

for index, row in tweets.iterrows():
    # Gather the data out of the row
    this_user_id = row['author']
    author = row['retweet_of']
    followers = row['followers']
    age = row['age']
    rtfollowers = row['rtfollowers']
    rtage = row['rtage']
    # .... (the rest of the loop is not shown)
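The rest of the loop is not shown, so the following is only a guess at the kind of bookkeeping it performs, written with plain dictionaries. It counts how often each account is retweeted by each other account and records followers and account age per node. The function name `build_retweet_edges` and the sample rows are hypothetical.

```python
from collections import Counter

def build_retweet_edges(rows):
    """Return (nodes, edges) for an 'is retweeted by' graph.

    nodes maps screen name -> {'followers', 'age'};
    edges maps (retweeted_account, retweeter) -> number of retweets.
    """
    nodes = {}
    edges = Counter()
    for row in rows:
        retweeter = row["author"]
        nodes[retweeter] = {"followers": row["followers"], "age": row["age"]}
        original = row["retweet_of"]
        if not original:
            continue  # not a retweet, so no edge
        nodes[original] = {"followers": row["rtfollowers"], "age": row["rtage"]}
        edges[(original, retweeter)] += 1
    return nodes, edges

# Made-up rows with the same columns as the DataFrame
rows = [
    {"author": "ann", "followers": 10, "age": 100.0,
     "retweet_of": "F1", "rtfollowers": 4876011, "rtage": 4056.9},
    {"author": "ann", "followers": 10, "age": 100.0,
     "retweet_of": "F1", "rtfollowers": 4876011, "rtage": 4056.9},
    {"author": "bob", "followers": 5, "age": 50.0,
     "retweet_of": "", "rtfollowers": 0, "rtage": 0.0},
]
nodes, edges = build_retweet_edges(rows)
print(edges[("F1", "ann")])  # 2
```

Each entry could then be added to the NetworkX graph with `G.add_node(name, **attrs)` and `G.add_edge(source, target, weight=count)`. Rows in this shape can be obtained from the DataFrame with `tweets.to_dict("records")`.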

We can now check the number of nodes and edges of the Graph created.

There are 1532 nodes and 1165 edges present in the Graph

The last thing I did was save a GraphML file that we can then read into Gephi. Start Gephi up and open the file.

nx.write_graphml(G, 'f1tweets1.graphml')

First, we import the graphml file into Gephi and choose directed. After we successfully import the graph, this is how it looks.

You can now look at the information it contains by clicking on the “Data Laboratory” button at the top.

Then we can click on the “Overview” button. Initially the data looks messy, so now we run a layout on it. From the “Layout” section I chose “ForceAtlas”, as it’s fast and good at showing relationships in a network.

There are several clusters in the network, and we can see the nodes and edges in each group. However, we still cannot see which nodes are at the center of the bigger clusters or whom they interact with. In the next step I show the name of each node and give it a color so the structure is easier to see.

Now I look at network centrality, which captures the importance of a node’s position in the network. I also compute Betweenness Centrality, which measures how often a node lies on the shortest paths between other nodes and so indicates its “strength” or “influence” in a social network. In the result, the nodes have different sizes and the edges have different thicknesses.
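Gephi computes this value for you. To show what the number means, here is a compact sketch of Brandes' algorithm for an unweighted directed graph. It is an illustration under those assumptions, not Gephi's implementation; the function name and the toy graph are made up, and the scores are not normalized.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality; adj maps node -> list of successors."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma)
        order = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies
        delta = dict.fromkeys(nodes, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Toy chain a -> b -> c: only b sits on a shortest path between two others
print(betweenness({"a": ["b"], "b": ["c"]}))
```

In the toy chain, b gets a score of 1 because the single shortest path from a to c passes through it, while a and c get 0.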

The big nodes have high betweenness: they control collaboration between disparate clusters in the network. We still cannot see which users the nodes belong to or how strongly they interact with each other, so we run the Modularity report on the graph. The colors in the Modularity report indicate the different communities determined by this algorithm; basically, it shows which users are being retweeted and are more densely connected to each other than to the rest of the network. I also added labels so we can see which users they are.

In the graph above, we see that, not surprisingly, F1, Formula 1’s official Twitter page, is the center of one of the major node clusters/communities (in pink).

We can see that there are 424 communities and the users retweeted the most are F1 and F1Gate, which is a Formula 1 news site.

The group with pink nodes makes up 20.1% of the total data. Its members are accounts that intensely retweet Formula 1’s Twitter account. The group with green nodes makes up 12.08% of the total data. Its members are accounts that intensely retweet F1Gate’s Twitter account. The other groups are much smaller than these two in terms of accounts being retweeted with the hashtag “f1”.
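For readers curious about the score behind the communities, the sketch below evaluates modularity for a given split of an undirected, unweighted graph. Gephi's Modularity report goes further and searches for a split with a high score, so this only illustrates the quantity being optimized; the function name and toy graph are made up.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict node -> community label.
    """
    m = len(edges)
    inside = {}  # number of edges with both ends in the community
    degree = {}  # sum of the degrees of the community's nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    # Q = sum over communities of (share of edges inside) - (expected share)
    return sum(inside.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two separate triangles, each treated as its own community
triangles = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]
split = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
print(modularity(triangles, split))  # 0.5
```

Putting all six nodes into a single community gives a score of 0, which is why a split that matches the real clusters scores higher.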

In the end, social network analysis lets us learn a great deal about what goes on in social media: how users interact and what sort of interactions take place. I feel it is a very powerful tool for analysing social media data, and its results can be used to understand, and even shape, how users engage with certain topics and certain accounts. This can be especially beneficial when promoting something or building a brand.

Written by Pratik Parija