Reddit data visualisation
For our second data assignment in UQR2215, “Developing Meaningful Indicators”, we were tasked with creating a data visualisation from Reddit’s database of top posts.
The dataset gives some information on each post, such as the number of upvotes and downvotes, the number of comments, and the Reddit score. Intuitively, the “score” is simply the number of upvotes less the number of downvotes for a post, but I found that Reddit uses several kinds of scores and formulas beyond this.
Reddit starts by calculating a simple `ups - downs` score, which it puts on a logarithmic scale. The reasoning is that a link with a score of 5000, for example, is not necessarily much better than one with 4000 (or, at a smaller scale, 30 versus 10). The logarithmic scale reduces the impact of additional votes unless a post has significantly more of them.
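To make the effect of the logarithmic scale concrete, here is a minimal sketch. The function name `log_score` is my own, and base 10 is an assumption, chosen because the hot ranking is described in terms of "10 times" the score.

```python
import math


def log_score(ups: int, downs: int) -> float:
    """Signed base-10 logarithm of the net vote count (ups - downs).

    Hypothetical helper for illustration; base 10 is an assumption.
    """
    net = ups - downs
    # Clamp to 1 so that a net score of 0 maps to log10(1) = 0.
    magnitude = math.log10(max(abs(net), 1))
    if net > 0:
        return magnitude
    if net < 0:
        return -magnitude
    return 0.0


if __name__ == "__main__":
    # Compare how far apart 4000 and 5000 end up once log-scaled.
    for ups in (10, 30, 4000, 5000):
        print(ups, round(log_score(ups, 0), 3))
```

On this scale, going from 4000 to 5000 net votes adds less than 0.1 to the score, while a tenfold increase always adds exactly 1.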
I also found that special rankings, such as hot posts, use their own ranking algorithm:
… (it) still uses upvotes minus downvotes, but with a time factor which causes older stories to fall down the list. That time factor is 12.5 hours, and the hot score for the post is said to follow a logarithmic decay, which means that for one link to be higher than another, it must have 10 times the total score after 12.5 hours.
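The description above can be sketched in a few lines of Python. This loosely follows the hot formula from Reddit's formerly public source code, so the site's current ranking may well differ. The name `hot_score` and the use of a plain Unix timestamp are my assumptions; shifting the time origin changes every score by the same constant and so does not affect the ordering.

```python
import math
from datetime import datetime, timedelta, timezone

# 12.5 hours expressed in seconds: the time factor quoted above.
TIME_FACTOR_SECONDS = 45000


def hot_score(ups: int, downs: int, posted: datetime) -> float:
    """Illustrative hot score: log-scaled net votes plus a recency bonus.

    Hypothetical sketch, not Reddit's production code.
    """
    net = ups - downs
    order = math.log10(max(abs(net), 1))
    sign = 1 if net > 0 else -1 if net < 0 else 0
    # Newer posts have larger timestamps, so older posts sink over time.
    recency = posted.timestamp() / TIME_FACTOR_SECONDS
    return sign * order + recency


if __name__ == "__main__":
    earlier = datetime(2024, 1, 1, tzinfo=timezone.utc)
    later = earlier + timedelta(hours=12.5)
    # An older post needs 10x the net votes to tie with one 12.5 hours newer.
    print(hot_score(1000, 0, earlier), hot_score(100, 0, later))
```

Because each 12.5 hours of recency is worth one unit and each tenfold increase in net votes is also worth one unit, a post needs ten times the net score to match a post submitted 12.5 hours later.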
Since the dataset gives each Reddit score only as a single value in the Excel sheet, with nothing that would let me analyse the formula behind it, I decided to drop the Reddit score column and explore the other data provided to gain more insight into these posts.
To start off, it is perhaps worth thinking about how often we are truly engaged in the content that we are looking at, in our day-to-day social media usage. For popular social media interfaces like Instagram or Facebook, many users spend perhaps a few seconds looking at the picture (or other content), giving a like, before moving on. But for a platform populated with written articles and related threads, do we observe a similar phenomenon? How often do people rethink the content, and voluntarily express an opinion through a comment? Or do people also merely leave an upvote/downvote and move on too?
Hence, it would be interesting to use the Reddit dataset to find out whether the number of upvotes or downvotes has any bearing on the level of engagement or amount of discussion on these posts. For this visualisation, I chose a scatter plot to show how the number of upvotes or downvotes relates to the amount of discussion a post generates (measured by the number of comments left on it).
I first started with the number of upvotes.
From the chart, it does seem that the greater the number of upvotes, the greater the number of comments left on the posts, which suggests a higher level of engagement with the content. This is broadly in line with the intuition that a post with more votes is also more likely to draw engagement, that is, readers who go on to express an opinion about its content.
However, it is interesting that the number of comments rises much faster with the number of downvotes than it does with the number of upvotes.
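One way to put a number on this "rate of increase" is to fit a straight line through each scatter plot and compare the slopes. The sketch below is not how the charts in this post were produced. The helper `ols_slope` is a hypothetical name, and the values are invented toy numbers, not figures from the Reddit dataset.

```python
def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope of the ordinary least-squares line of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    return covariance / variance


if __name__ == "__main__":
    # Invented toy values for illustration only (NOT from the dataset).
    upvotes = [1000, 2000, 3000, 4000]
    downvotes = [100, 200, 300, 400]
    comments = [50, 100, 150, 200]

    print("comments per upvote:  ", ols_slope(upvotes, comments))
    print("comments per downvote:", ols_slope(downvotes, comments))
```

Each slope is expressed in comments per additional vote, so a steeper line for downvotes means each extra downvote is associated with more additional comments than each extra upvote.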
These results seem to suggest that people engage more actively when they are upset or dissatisfied with the content of a post. Such content may elicit stronger emotions and, correspondingly, a greater desire to have a stake in the discussion and express a personal opinion. In contrast, when readers agree with a post or feel positively about it (at the same number of upvotes), there is often little need to express an opinion. This may lead to a more passive reaction towards the post, or even a certain indifference to it after mechanically clicking an upvote.
While many users focus on the number of “likes” on their posts, perhaps it is time to look at the level of dislike as an alternative measure of the amount of attention received.
https://www.reddit.com/r/TheoryOfReddit/comments/9fyk8n/downvotes_are_worth_counting/