A Visual Way of Peering into r/wallstreetbets’ Recent Chaos
Introduction
Like most trends these days, the GameStop and Reddit debacle seems to have left as quickly as it came. Despite being five years too late to this topic, I’d like to add some value to the conversation by looking at the impact of the new traffic on the subreddit. I’d also like to apply a tool that could pick out the most talked-about subjects on r/wallstreetbets, to catch the next stock trend early.
Analysing the Traffic Surge
New Traffic Drove Up Those Upvotes
To begin, I wanted to look at the top 100 posts from the subreddit to see how the recent news had affected the most upvoted posts. One surprising note from the graph below was the lack of any top post before the 27th of January, when the media attention arrived. Another was the timing of the top posts: they fall on the 29th of January, a peak in the GameStop stock price, and on the 2nd and 3rd of February, when the price dropped to around $82.
I wanted to compare this with the largest sample I could get, 1000 posts, which was the limit of the API I used. The aim is to determine how the new traffic affected the average behaviour of top posts on the subreddit.
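The per-day view of top posts described above can be sketched roughly as follows. This is a minimal illustration, not the code behind the article’s graph: it assumes each post’s creation time is available as a Unix timestamp (Reddit’s API exposes this as `created_utc`), and `posts_per_day` is a hypothetical helper name.

```python
from collections import Counter
from datetime import date, datetime, timezone
from typing import Dict, Iterable


def posts_per_day(created_utc: Iterable[float]) -> Dict[date, int]:
    """Count how many posts were created on each UTC calendar day.

    `created_utc` is assumed to hold Unix timestamps, one per post.
    The result is ordered from the earliest day to the latest.
    """
    days = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).date() for ts in created_utc
    )
    return dict(sorted(days.items()))
```

Plotting these counts as a bar chart for the top 100 and the top 1000 would show whether the most upvoted posts bunch around particular dates.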
Bigger Audiences Lead to Bigger Titles
One striking difference between the top 100 and the top 1000 is the number of noun phrases found in the title. Noun phrases generally contain the subject matter of the post. Two or three noun phrases were enough to get a top post, though in the past couple of weeks, posts with 7+ noun phrases were getting the upvotes as well.
Longer titles carry more of the message, which may entice more upvotes. The histograms below show a noticeable difference in the distribution.
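A rough sketch of how the noun-phrase counts behind such histograms could be produced. The article does not say which tool extracted the noun phrases, so using TextBlob’s `noun_phrases` property here is an assumption, and both helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def extract_noun_phrases(title: str) -> List[str]:
    """Return the noun phrases TextBlob finds in a post title.

    Assumption: TextBlob is installed and its corpora have been fetched
    (python -m textblob.download_corpora). The import sits inside the
    function so the counting helper below can be used without TextBlob.
    """
    from textblob import TextBlob

    return list(TextBlob(title).noun_phrases)


def phrase_count_histogram(phrases_per_title: Iterable[List[str]]) -> Dict[int, int]:
    """Map 'number of noun phrases in a title' to 'number of titles'."""
    counts = Counter(len(phrases) for phrases in phrases_per_title)
    return dict(sorted(counts.items()))
```

Running `phrase_count_histogram` over the phrases extracted from the top 100 titles, and again over the top 1000, would give two distributions that can be compared side by side.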
Negative Posts Don’t Boost the Likes
To end this section, I’ll shift gears and look at the sentiment of the top posts.
To determine how positive or negative some text is, we’ll need to use a sentiment analyser. For this, I used TextBlob’s pre-built function. Afterwards, to label the different types of sentiment, I used K-Means Clustering.
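The two steps above might look roughly like the sketch below. It makes several assumptions: the polarity score comes from TextBlob’s `sentiment.polarity` (a value between -1 and 1), three clusters are used (negative, neutral, positive), and a small hand-written one-dimensional k-means stands in for whichever K-Means implementation was actually used. All function names are hypothetical.

```python
from typing import List, Sequence

LABELS = ["negative", "neutral", "positive"]


def title_polarities(titles: Sequence[str]) -> List[float]:
    """Score each title with TextBlob's pre-built sentiment analyser.

    Assumption: TextBlob is installed; it is imported here so the
    clustering helpers below run without it.
    """
    from textblob import TextBlob

    return [TextBlob(title).sentiment.polarity for title in titles]


def kmeans_1d(values: Sequence[float], k: int = 3, iterations: int = 100) -> List[int]:
    """Plain k-means (Lloyd's algorithm) on scalar values, with k >= 2.

    Centres start evenly spaced between the smallest and largest value,
    so cluster 0 ends up with the lowest values and cluster k-1 the highest.
    """
    lo, hi = min(values), max(values)
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    assignment: List[int] = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        new_assignment = [
            min(range(k), key=lambda c: abs(v - centres[c])) for v in values
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centres[c] = sum(members) / len(members)
    return assignment


def label_sentiment(polarities: Sequence[float]) -> List[str]:
    """Turn polarity scores into negative / neutral / positive labels."""
    return [LABELS[cluster] for cluster in kmeans_1d(polarities, k=3)]
```

Letting the clusters decide where “neutral” ends, rather than fixing thresholds by hand, means the labels adapt to however the subreddit’s scores happen to be spread.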
Looking at both diagrams below, one thing stands out: the small number of negative posts. Both diagrams show neutral and positive posts dominating, with similar distributions. Figure six shows a higher number of comments on the neutral and positive posts.
A final note on the sentiment analysis above: three factors could explain the distributions.
The tool may flag posts as neutral without understanding the context of certain words used on the subreddit.
A considerable number of posts feature a piece of media, such as a picture or a video, which leads to less detailed titles and leaves the context to the media.
There is a noticeable number of flaired posts that bear a neutral title but cause a lot of engagement, for example a trading update from a user.
The r/wallstreetbets Word Cloud Generator
Let’s explore this tool that can peek into the most-talked-about subjects.
To do this, I combined all the noun phrases from each title and ran them through the wordcloud library. It’s easy to set up and more interpretable than other tools. I adapted the code from this article.
Below is the word cloud generator object, initialised to produce the graphs.
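from wordcloud import WordCloud

# `stopwords` (a set of words to skip) and `all_keywords` (the noun phrases
# joined into one string) are assumed to be defined earlier.
wordcloud = WordCloud(width=1200, height=700,
                      background_color='white',
                      colormap='Oranges',
                      max_words=200,
                      stopwords=stopwords,
                      min_font_size=8)
wordcloud.generate(all_keywords)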
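The `all_keywords` string passed to `generate` is not shown in the article. One way it might be assembled from per-title noun phrases is sketched below; the helper names are hypothetical, and the input is assumed to be a list of noun phrases for each title.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_keyword_string(phrases_per_title: Iterable[List[str]]) -> str:
    """Flatten every title's noun phrases into one space-separated string."""
    return " ".join(
        phrase for phrases in phrases_per_title for phrase in phrases
    )


def phrase_frequencies(phrases_per_title: Iterable[List[str]]) -> Dict[str, int]:
    """Count how often each whole noun phrase appears across all titles."""
    return dict(
        Counter(phrase for phrases in phrases_per_title for phrase in phrases)
    )
```

The string form suits `generate`, which splits the text back into words. If multi-word phrases should stay intact, the dictionary from `phrase_frequencies` could instead be passed to the library’s `generate_from_frequencies` method.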
Top 100
In the word cloud below, the only mentions are GameStop and the related players.
Newest 100
The newest 100 posts on the subreddit (as of the 11th of February 2021) produced a different picture. Cannabis stocks took centre stage, and the discussion shifted.
Hottest 100
On Reddit, “Hot” ranks a subreddit’s trending posts based on time and upvotes. In the hottest 100 posts, more cannabis shares are mentioned in the titles.
Conclusion
I expected some of these results, though a couple stood out. One is definitely the lack of a boost for negative posts, especially given that social media companies allegedly promote negative posts to drive engagement. This subreddit doesn’t seem to fit that mould.
I find the word cloud a simple tool to visualise something insightful from a subreddit with a lot of info. I hope you too found it insightful.
GitHub Repository: https://github.com/malcolmrite-dsi/WSB_Stock_Screener