A Visual Way of Peering into r/wallstreetbets’ Recent Chaos

Malcolm Wright
Published in The Startup
5 min read · Feb 12, 2021
Figure 1: Top noun phrases from the Top 100 posts on r/wallstreetbets. Created using the WordCloud Package with Python

Introduction

Most trends these days leave as quickly as they come, and that seems to be the case with the GameStop and Reddit debacle. Despite being a couple of weeks late to this topic, I’d like to add some value to the conversation by looking at the impact of the new traffic on the subreddit. I’ll also apply a tool that can pick out the most talked-about subjects on r/wallstreetbets, to catch the next stock trend early.

Analysing the Traffic Surge

New Traffic Drove Up Those Upvotes

To begin, I wanted to look at the top 100 posts from the subreddit, to see how the recent news had affected the most upvoted posts. One surprise in the graph below is that none of the top posts dates from before the 27th of January, when the media coverage began. Another is the timing of the top posts: they cluster on the 29th of January, a peak in the GameStop share price, and on the 2nd and 3rd of February, when the price dropped to around $82.

Figure 2: Graph showing Upvotes against time (Left), Graph showing Comments against time (Right)

I then wanted to compare this with the largest sample I could get, which was 1000 posts, the limit of the API being used. The aim is to determine how the new traffic affected the average behaviour of top posts on the subreddit.
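The article does not name the API it used, but Reddit listings generally stop at about 1000 items, which matches the ceiling mentioned above. As a rough sketch, assuming the PRAW library and the all-time top listing (both assumptions), the posts could be flattened into plain records like this. The helper `collect_posts` and its field names are hypothetical and exist only for illustration.

```python
from datetime import datetime, timezone


def collect_posts(subreddit, limit=1000):
    """Flatten a subreddit's all-time top listing into plain dicts.

    `subreddit` is assumed to behave like a PRAW Subreddit object, e.g.
    praw.Reddit(...).subreddit("wallstreetbets"). Illustrative helper only.
    """
    rows = []
    for post in subreddit.top(time_filter="all", limit=limit):
        rows.append({
            "title": post.title,
            "upvotes": post.score,
            "comments": post.num_comments,
            # created_utc is a Unix timestamp in seconds
            "created": datetime.fromtimestamp(post.created_utc, tz=timezone.utc),
        })
    return rows
```

Calling it with `limit=100` and `limit=1000` would give the two samples compared throughout the rest of the article.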

Bigger Audiences Lead to Bigger Titles

One striking difference between the top 100 and the top 1000 is the number of noun phrases found in the titles. Noun phrases generally contain the subject matter of the post. Two or three noun phrases were enough to get a top post, though in the past couple of weeks, posts with 7+ noun phrases were getting the upvotes as well.

Figure 3: Noun Phrases in both graphs, Top 100 (Left), Top 1000 (Right)

A plausible explanation is that longer titles carry more of the message, which may entice more upvotes. The histograms below show a noticeable difference between the two distributions.

Figure 4: Histogram distributions of Noun Phrases, Top 100 (Left), Top 1000 (Right)
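The per-title counts behind Figures 3 and 4 could be produced along these lines. This is a minimal sketch: the article does not say which noun-phrase extractor it used, so the extractor is passed in as a function, and `phrase_count_distribution` is a hypothetical helper name.

```python
from collections import Counter


def phrase_count_distribution(titles, extract_phrases):
    """Count noun phrases per title and tally how often each count occurs.

    `extract_phrases` is any callable mapping a title to a list of noun
    phrases. Which extractor the article used is not stated, so it is
    left as a parameter here.
    """
    counts = [len(extract_phrases(title)) for title in titles]
    return counts, Counter(counts)
```

With TextBlob installed (and its corpora downloaded), one option would be `phrase_count_distribution(titles, lambda t: TextBlob(t).noun_phrases)`. The returned `Counter` maps "number of noun phrases" to "number of titles", which is the data a histogram needs.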

Negative Posts Don’t Boost the Likes

To end this section, I’ll shift gears and look at the sentiment of the top posts.

To determine how positive or negative some text is, we’ll need to use a sentiment analyser. For this, I used TextBlob’s pre-built function. Afterwards, to label the different types of sentiment, I used K-Means Clustering.
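To make the clustering step concrete, here is a minimal sketch under stated assumptions. TextBlob exposes a polarity score between -1 and 1 through `TextBlob(title).sentiment.polarity`. The article presumably used a library K-Means on those scores; the plain-Python version below is a stand-in written for illustration, and the scores are made up rather than taken from the subreddit.

```python
def kmeans_1d(scores, k=3, iterations=100):
    """Cluster 1-D values with Lloyd's algorithm; return (labels, centroids).

    Requires k >= 2 and at least k values. Plain-Python stand-in for a
    library K-Means, for illustration only.
    """
    ordered = sorted(scores)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each score joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(s - centroids[c])) for s in scores]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [s for s, lab in zip(scores, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Made-up polarity scores in TextBlob's [-1, 1] range, for illustration only.
scores = [-0.6, -0.5, 0.0, 0.0, 0.05, 0.1, 0.5, 0.6, 0.8]
labels, centroids = kmeans_1d(scores, k=3)
```

The centroids are the cluster means, which is the kind of value a legend showing "the mean of each cluster" would report. The clusters can then be named negative, neutral and positive in order of their means.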

Looking at both diagrams below, one thing stands out: the small number of negative posts. Both diagrams show a dominance of neutral and positive posts, with similar distributions. Figure 6 shows that neutral and positive posts also attract more comments.

Figure 5: Two K-Means Cluster Graphs, the legend represents the mean of each cluster. Top 100 (Left), Top 1000 (Right)
Figure 6: Sentiment Vs. Comments, Top 100 (Left), Top 1000 (Right).

A final note on the sentiment analysis above: three factors could explain the distributions.

The tool may flag posts as neutral without understanding the context of certain words used on the subreddit.

Figure 7: Histogram of the Top 1000 Sentiment Scores

A considerable number of posts feature a piece of media, such as a picture or a video. This leads to less detailed titles, leaving the context to the media.

A noticeable number of flaired posts bear a neutral title but cause a lot of engagement. For example, a trading update from a user.

The r/wallstreetbets Word Cloud Generator

Let’s explore this tool that can peek into the most-talked-about subjects.

To do this, I combined all the noun phrases from each title and ran them through the WordCloud library. It’s easy to put in place and easier to interpret than other tools. I adapted the code from this article.

Below is the WordCloud object, initialised to produce the graphs.

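from wordcloud import WordCloud

# stopwords (words to exclude) and all_keywords (the noun phrases as one
# string) must be defined before this point.
wordcloud = WordCloud(width=1200, height=700,
                      background_color='white',
                      colormap='Oranges',
                      max_words=200,
                      stopwords=stopwords,
                      min_font_size=8)
wordcloud.generate(all_keywords)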
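Before drawing a cloud, it can help to check which phrases will dominate it. The sketch below tallies noun phrases across titles using only the standard library. `phrase_frequencies` is a hypothetical helper, and the sample phrases are made up for illustration.

```python
from collections import Counter


def phrase_frequencies(phrases_per_title, stopwords=frozenset()):
    """Tally noun phrases across titles, skipping any listed in `stopwords`."""
    freq = Counter()
    for phrases in phrases_per_title:
        for phrase in phrases:
            phrase = phrase.lower().strip()
            if phrase and phrase not in stopwords:
                freq[phrase] += 1
    return freq


# Made-up noun phrases standing in for three post titles.
sample = [["GME", "short squeeze"], ["gme", "robinhood"], ["short squeeze"]]
freq = phrase_frequencies(sample, stopwords={"robinhood"})
print(freq.most_common(2))
```

A mapping like `freq` can also be passed to `WordCloud.generate_from_frequencies`, which keeps multi-word phrases such as "short squeeze" as single entries instead of re-splitting a joined string into words.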

Top 100

In the word cloud below, the only mentions are GameStop and the related players.

Top 100 Word Cloud generated by the author using Reddit data

Newest 100

The newest 100 posts on the subreddit (as of the 11th of February 2021) paint a different picture. Cannabis stocks took centre stage, and the discussion shifted.

New 100 Word Cloud generated by the author using Reddit Data

Hottest 100

Reddit’s Hot listing ranks a subreddit’s trending posts based on time and upvotes. In the hottest 100 posts, even more cannabis shares are mentioned in the titles.

Hot 100 Word Cloud generated by the author using Reddit data

Conclusion

I expected some of these results, though a couple stood out. The main one is the lack of a boost for negative posts, especially given that social media companies allegedly promote negative posts to drive engagement. On this evidence, the subreddit doesn't fit that mould.

I find the word cloud a simple tool to visualise something insightful from a subreddit with a lot of info. I hope you too found it insightful.
