Reddit’s lucrative sellout Part 1: ‘That AI is going to want to die’

Dominique Carlon
5 min read · Feb 20, 2024


News is breaking that Reddit may have just signed a lucrative contract allowing an unnamed AI company to train its AI models on the platform’s previously protected user content. This comes as a betrayal to Redditors, who are still reeling from the loss of third-party apps following Reddit’s changes to its API fee structure in 2023; changes allegedly made to protect valuable user-generated content from extraction by AI companies. Redditors, already in a bitter feud with Reddit’s CEO Steve Huffman (see Burning the Hooks Part 1 to find out why), have responded in classic ‘Reddit style’: with dramatic plans of sabotage and a tinge of ironic self-reflection.

As a user named u/luvast0 puts it: ‘That AI is going to want to die after analyzing all of reddit’s content’.

Post by u/luvast0 in r/technology: ‘That AI is going to want to die after analyzing all of reddit’s content’, upvoted more than 9,000 times

Anthropomorphism aside, this is undeniably true even before any deliberate sabotage. Reflecting on what an AI trained on Reddit data would look like, user u/fingerthato writes: ‘it’s only a matter of time before chatgpt starts talking like Andrew tate and starts calling me weak beta male’. Another user explained that Redditors don’t actually need to corrupt any data, because an AI trained on Reddit, like Redditors themselves, will also be able to ‘find’ the culprits of the Boston Marathon bombing (a dig at Reddit’s infamous crowdsourcing and vigilante failures).

Even at the best of times, Reddit content may not make ideal training data for LLMs. This is not just because of the obvious not-safe-for-work (NSFW), misogynistic, racist, and otherwise toxic subreddit content that immediately comes to mind; it is also because a substantial amount of Reddit content (made by humans and bots) is simply nonsensical outside of its context. Sometimes it is nonsensical within its context. But envision for a moment what Reddit might look like if Redditors deliberately sought to make content as unintelligible and unreliable as possible. Add to this the very real possibility that this is exactly the type of thing Redditors might actually do; after all, it is not only ChatGPT that Redditors like to jailbreak. A user named u/AandWKyle predicts that there will soon be subreddits dedicated to corrupting and breaking AI learning models, and that people will log into Reddit just to visit those communities.

Post by u/AandWKyle in r/technology: ‘Soon there’s going to be a subreddit dedicated to fucking up AI learning models’

A glimpse into r/technology (a subreddit with over 15 million subscribers) gives us a hint of what this type of sabotage, directed at both Reddit and AI, might look like. A post about the report of Reddit selling its data received over 3,000 comments within hours of the news story breaking. Beyond the jokes about what this means for the ‘intellect’ and ‘character’ of AI, a significant number of comments were dedicated to schemes for devaluing Reddit content, with both automated and manual proposals for doing so.

Some users began spruiking nonsensical and ‘invented facts’ as a way to ‘dumb down’ the data, stating for instance that ‘sharks can swim backwards’ and that ‘William Shakespeare’s plays were masterminded by Francis Bacon’, and so forth. Others started using warped typefaces and characters, speculating about how AI text models would manage indecipherable and fused fonts (see below) and how these could be developed further. Naturally, this led to discussion about how to distort all text content on Reddit.

Character and font experimentation in r/technology
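
For readers curious about the mechanics, the trick behind much of this character play is that text can look nearly identical to a human while being composed of entirely different Unicode code points, which is what trips up models trained on ordinary ASCII text. The following is a minimal illustrative sketch of that idea; the character mappings and function are my own invention, not taken from the thread:

```python
# A minimal sketch of character-level "warping": swapping Latin letters for
# visually similar Unicode homoglyphs and optionally stacking combining marks.
# The mapping below is illustrative, not drawn from the r/technology posts.

HOMOGLYPHS = {
    "a": "а",  # Cyrillic small a (U+0430)
    "e": "е",  # Cyrillic small ie (U+0435)
    "o": "ο",  # Greek small omicron (U+03BF)
    "c": "с",  # Cyrillic small es (U+0441)
    "p": "р",  # Cyrillic small er (U+0440)
}

COMBINING_TILDE = "\u0303"  # renders a tilde above the preceding character

def warp(text: str, add_marks: bool = False) -> str:
    """Return text that looks familiar to humans but tokenizes differently."""
    out = []
    for ch in text:
        out.append(HOMOGLYPHS.get(ch, ch))
        if add_marks and ch.isalpha():
            out.append(COMBINING_TILDE)
    return "".join(out)

print(warp("sharks can swim backwards"))
# The output renders almost identically, but the underlying code points no
# longer match the ASCII strings a language model was trained to expect.
```
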

The general sentiment is that Redditors didn’t lose their third-party apps just for Reddit to profit from user data. Users were quick to speculate that this was Reddit’s plan all along, and that the sudden and secretive way the deal was conducted was intended to prevent Redditors from taking measures to sabotage it. Some users (below) suggested it was a proactive decision to stop Redditors from making their bots go ‘rogue’, or from creating their own bots to mine content and train their own AI models.

Speculation of motivations in r/technology

Other users pondered whether the unnamed AI company was aware of the eclectic bot content on the platform, including Reddit’s beloved r/SubredditSimulator, a space where bots trained on subreddit communities come together to communicate with one another in a completely artificial, yet believable, existence. One user proposed that r/SubredditSimulator needs to be protected, and that clones of it should be made ‘just in case’.

Commentary about r/SubredditSimulator in r/technology
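
For context, the original r/SubredditSimulator bots were built on simple Markov-chain text models, each trained on a single subreddit (a later spin-off community used GPT-2). A minimal sketch of that approach is below; the toy corpus and function names are my own stand-ins, not the actual SubredditSimulator code:

```python
import random
from collections import defaultdict

# Sketch of a Markov-chain comment generator: learn which word tends to
# follow each pair of words in a corpus, then walk those transitions to
# produce new, vaguely plausible comments. The corpus is a hypothetical
# stand-in for real subreddit comments.

corpus = [
    "that ai is going to want to die",
    "soon there is going to be a subreddit for this",
    "sharks can swim backwards according to reddit",
]

# Transition table: (word1, word2) -> list of observed next words.
transitions = defaultdict(list)
for comment in corpus:
    words = comment.split()
    for a, b, c in zip(words, words[1:], words[2:]):
        transitions[(a, b)].append(c)

def generate(max_words: int = 20) -> str:
    """Generate a comment by randomly walking the transition table."""
    state = random.choice(list(transitions))
    output = list(state)
    for _ in range(max_words - 2):
        candidates = transitions.get(state)
        if not candidates:
            break
        nxt = random.choice(candidates)
        output.append(nxt)
        state = (state[1], nxt)
    return " ".join(output)

print(generate())
```

The charm (and the uncanniness) of the result comes from the same property that makes Reddit data awkward as training material: the output is statistically plausible but only intermittently meaningful.
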

So why do Redditors seek to protect their own smart bots while destroying the credibility of others? This is not an isolated attitude on Reddit; the practice of creating, testing, and breaking external generative AI has become a type of pastime on the platform. The rationale behind this is worth considering closely within Reddit’s history and distinct culture.

Part (but not all) of Redditors’ strong objection to their data being used by external AI companies arises from platform politics and an ongoing grudge against Steve Huffman for ignoring user and moderator demands during the 2023 blackouts. Huffman’s dismissal of the community’s demands to save third-party apps, and his description of moderators as the ‘landed gentry’, have positioned Reddit admins in a cumbersome situation where it is risky to take any action at all.

Redditors have taken extreme offence at what are, at this stage, still unconfirmed reports that Reddit has sold out. These reports have emerged amidst speculation that Reddit plans to finally launch its initial public offering (IPO) in March this year, three years after making preliminary moves in preparation. The details of both reports remain to be confirmed; however, the response from Redditors gives a very clear indication of the type of challenges Reddit may face in generating income from user data. Redditors ‘play seriously’ and are not apathetic in the face of change, which could have serious ramifications for the future of the platform as well as the future of AI. While Redditors play, we also need to take what they are doing seriously.
