Insights on the ChangeMyView Subreddit
Grace O’Neill
Reddit is a popular online community that encourages people to actively engage in conversation on a wide variety of topics. On this website, subreddits provide channels for users to narrow their interests to a specific topic and a group of people that might share a common interest. The subreddit “ChangeMyView” is one such place. In this particular forum, members are encouraged to share their perspective on a topic of their choosing, in hopes that a commenter might challenge their view.
I set out to gather data from ChangeMyView to learn which members are most active in the comments and most prolific in engaging with others. This data would be useful to the moderators of the community, as it would identify the main players in the forum. The community relies heavily on engagement, and members would be interested in knowing who the top contributors are.
For this analysis, the more a user engaged with the community, the more important they were considered to be in the forum. In such a network, the accounts participating in and engaging with the community are the nodes, and the edges represent interactions between two users.
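The node-and-edge model described above can be sketched in a few lines of plain Python. This is only an illustration: the account names are made up, and the function name build_adjacency and the pair layout (comment author, account being replied to) are assumptions rather than the structure used in the original notebook.

```python
from collections import defaultdict


def build_adjacency(interactions):
    """Map each account (node) to the set of accounts it interacted with (edges)."""
    adjacency = defaultdict(set)
    for commenter, recipient in interactions:
        # An interaction links both accounts, so the edge is stored in both directions.
        adjacency[commenter].add(recipient)
        adjacency[recipient].add(commenter)
    return dict(adjacency)


# Hypothetical sample data: (comment author, author being replied to)
sample_interactions = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "alice"),
]
adjacency = build_adjacency(sample_interactions)
```

Storing neighbors in a set means that repeated exchanges between the same two accounts count as a single edge, which matches how a simple undirected graph treats them.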
To collect the data needed to analyze this web-based network, I created a Google Colab notebook to access the Reddit API. Once access was granted, 200 comment submissions were collected and paired with their corresponding Reddit accounts. Browsing the comments, I found the topics discussed to be quite diverse and fascinating, ranging from recent celebrity news to serious discussions on racism and prejudice. The Python library NetworkX was then used to study the network and create a diagram for visualization.
To make sense of the data, I applied a centrality metric using NetworkX to find the most central accounts in the ChangeMyView subreddit. Using the function degree_centrality, the top accounts were collected and returned. The top 5 were as follows: DeltaBot, Economy-Phase8601, Andalib_Odulate, Hellomyyfriend, and NotADoctorAnymore. These accounts had the highest degree centrality, a measure of how many connections (edges) a node has, which made them the most important accounts by this metric.
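To show what the degree centrality score measures, here is a minimal sketch that computes it by hand rather than through NetworkX. It assumes an undirected graph given as a list of edges, and it divides each node's degree by n - 1 (the largest degree possible in a simple graph with n nodes), which is the normalization NetworkX uses for graphs with more than one node. The edge list and the helper name top_accounts are hypothetical, not taken from the original notebook.

```python
def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n <= 1:
        # Choice made for this sketch: a lone node gets a score of 0.
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def top_accounts(edges, k=5):
    """Return the k accounts with the highest degree centrality."""
    scores = degree_centrality(edges)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical interactions: "hub" talks to everyone, "c" only talks to "hub".
sample_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
scores = degree_centrality(sample_edges)
top_two = top_accounts(sample_edges, k=2)
```

In this sample, "hub" is connected to all three other accounts and therefore scores 1.0, while "c" has a single connection and scores 1/3.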
Some bugs were encountered as data was gathered using Python. First, an error occurred whenever the function reached a value of None. I was unable to add None as a node to the graph of my network, so I had to make the function skip over these values. I also struggled with getting through all 200 submissions. This was not a bug per se; it simply took a great deal of time to load on my home network.
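One way to skip the None values is to filter them out before any node or edge is added to the graph. The sketch below assumes each interaction is a pair of account names and that a missing account shows up as None (commonly the case when an account has been deleted, though that cause is an assumption here). The function name clean_interactions and the sample data are hypothetical.

```python
def clean_interactions(raw_pairs):
    """Keep only interactions where both accounts are known (not None)."""
    return [
        (commenter, recipient)
        for commenter, recipient in raw_pairs
        if commenter is not None and recipient is not None
    ]


# Hypothetical sample: two of the three pairs are missing an account.
raw_pairs = [("alice", "bob"), (None, "bob"), ("carol", None)]
usable_pairs = clean_interactions(raw_pairs)
```

Filtering up front keeps the graph-building step simple, since every pair that reaches it is safe to add.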
The main takeaway from this analysis is that while a few outstanding accounts were most central in the forum, engagement was fairly evenly distributed for the most part. This was not too surprising, seeing as the subreddit is home to over 1 million members and many frequent posters. Aside from the top member DeltaBot (whose high centrality score is due to its position as overseer of the subreddit's delta system), centrality scores did not vary significantly.