Assessing the Importance of Nodes in a Twitter Network

Michael Kelley
INST414: Data Science Techniques
Feb 25, 2022

Using Twitter’s developer portal, I registered an app under my account, which gave me access to Twitter’s API for collecting data. From this, I pulled my account’s follower connections to analyze how my account is connected to others, and how those accounts are in turn connected to even more accounts. My goal was to analyze the social network created by all of these links (followers).

One insight I hope to extract from this data is the extent to which I am socially connected to friends, and how this might influence the content that I see on Twitter. A practical decision that might be influenced by this insight is to what extent I will actively connect my account to others in the form of following, and how I may try to curate my Twitter feed based on the default posts I’m seeing with my existing social network.

The primary source of my network data is Twitter. Specifically, my own Twitter account was used as the central subject of my analysis. The accounts that appear in the resulting graph are my own and those connected to me through a direct follow. The nodes in my network represent individual Twitter accounts. The edges in the network represent the connection established when one Twitter account follows another.
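As a minimal sketch of the node and edge definitions above, the follower relationships can be represented as a directed graph in NetworkX, with each edge pointing from the follower to the account being followed. The function name build_follower_graph and the account names are placeholders for illustration, not the code or data from my notebook.

```python
import networkx as nx


def build_follower_graph(account, followers):
    """Build a directed graph with one edge per follow: follower -> account."""
    graph = nx.DiGraph()
    graph.add_node(account)  # keep the central account even with no followers
    for follower in followers:
        graph.add_edge(follower, account)
    return graph


# Placeholder names, not real accounts.
graph = build_follower_graph("me", ["follower_a", "follower_b", "follower_c"])
print(graph.number_of_nodes(), graph.number_of_edges())
```

Using a directed graph keeps the direction of each follow, which matters once the network includes accounts that follow each other unevenly.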

The structure of this graph is relatively simple. My Twitter account sits at the center, with a visible connection between it and each account that follows me. Importance in this graph is a relatively loose concept, with the most important nodes being the ones with the most connections. As this graph is relatively small in scale, the most important node is the one representing my own account, since every edge in the graph touches it and it provides the most insight into how I am connected to others through social media. The other nodes are all of equal importance in this version of the graph, because each has exactly one connection.
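One way to make this notion of importance concrete is in-degree centrality, which scores each node by the share of other nodes that point to it. This is a sketch under the assumption that importance means "number of followers within the graph"; the 17 placeholder follower names stand in for real accounts.

```python
import networkx as nx

# Star-shaped follower graph: 17 placeholder followers all follow "me".
graph = nx.DiGraph()
for i in range(17):
    graph.add_edge(f"follower_{i}", "me")

# In-degree centrality = incoming edges / (number of nodes - 1).
centrality = nx.in_degree_centrality(graph)

# Sort from most to least important and show the top few.
ranked = sorted(centrality.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked[:3]:
    print(f"{name}: {score:.2f}")
```

In a graph shaped like this, the central account gets the highest possible score and every follower scores zero, which matches the observation that the other nodes are all equally important.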

To extract data from Twitter’s API and create the graphs I used in this analysis, I used Jupyter Notebook via Anaconda. This was my interface for writing the code that retrieved all necessary data and generated the appropriate graphs from it. I also used Twitter’s developer portal to obtain the credentials required to access data through their API.
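The retrieval step might look roughly like the sketch below. It assumes Tweepy 4.x and its tweepy.Client interface to version 2 of the Twitter API, which may differ from the calls used in my notebook; the helper name follower_edges, the token string, and the handle are placeholders.

```python
def follower_edges(client, user_id, username, max_results=100):
    """Return (follower, followed) username pairs for one account.

    `client` is assumed to be a tweepy.Client (Tweepy 4.x, Twitter API v2).
    Only the first page of results is read, which is enough for a small account.
    """
    response = client.get_users_followers(id=user_id, max_results=max_results)
    followers = response.data or []  # data is None when there are no followers
    return [(follower.username, username) for follower in followers]


# Example wiring (not executed here; requires real credentials):
#   import tweepy
#   client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")
#   me = client.get_user(username="your_handle").data
#   edges = follower_edges(client, me.id, me.username)
```

Returning plain (follower, followed) pairs keeps the data collection separate from the graphing step, so the same list can be passed directly to NetworkX.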

The main bugs I encountered were programming errors caused by trying to use parts of the API without importing them first, and by trying to access attributes under the wrong name. These were simple mistakes resulting from my relative inexperience with this particular software. I corrected them by taking the time to learn more about how to work with Twitter’s API and the proper way to call the attributes I needed for my analysis. Additionally, some features were simply not available to me at the Essential access level I was using through Tweepy. I was able to quickly apply for higher access so that I could make slightly more use of the API. Finally, I ran into a random_state_index error in NetworkX when creating my graph. This was a version-specific error that was resolved by upgrading to a newer version.

The preceding image is the graph generated by my data. From this, I can see the extent to which I am directly connected to other users via Twitter. I can see that there are currently 17 accounts following my own account, and that this can potentially influence the suggested content that I encounter.

The main limitation of this data is that I only observed the immediate connections to myself. In the future, I could gain better insight by looking not only at my own connections, but at the connections of my followers as well. This would give me a much greater understanding of the network in which I have immersed myself on Twitter, and how many different people could potentially be influencing my online social experience.
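Extending the analysis to followers of followers amounts to a breadth-first walk outward from my account. The sketch below shows the idea with a small hand-written sample in place of live API calls; collect_edges, get_followers, and the sample accounts are all hypothetical names for illustration.

```python
def collect_edges(root, get_followers, depth=2):
    """Collect (follower, followed) edges up to `depth` hops from `root`.

    `get_followers` is any function that maps an account name to a list
    of follower names, such as a wrapper around an API call.
    """
    edges = set()
    seen = {root}
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for account in frontier:
            for follower in get_followers(account):
                edges.add((follower, account))
                if follower not in seen:  # visit each account only once
                    seen.add(follower)
                    next_frontier.append(follower)
        frontier = next_frontier
    return edges


# Made-up sample data standing in for real follower lists.
sample = {"me": ["a", "b"], "a": ["c", "b"], "b": ["c"]}
edges = collect_edges("me", lambda account: sample.get(account, []))
print(sorted(edges))
```

With real data, each extra hop multiplies the number of API requests, so rate limits would likely become the practical constraint on depth.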

Overall, while my network analysis is fairly small in scale and not very ambitious on its own, it can serve as a model for further, future analysis. Using similar methods, I could generate graphs to analyze a much wider network and obtain data on a much greater quantity of users. The nodes in this sort of analysis would have greater variety in importance, and we could determine who the biggest “influencers” in any given group of people on social media are.
