Why Every AI Company Is Suddenly Obsessed With Reddit

The 19-year-old company’s data is worth hundreds of millions

Thomas Smith
The Generator

Illustration by the author via Dall-E

AI companies are tripping over themselves to form content partnerships with a seemingly bizarre company: Reddit.

In February, Google announced a $60 million per year agreement with Reddit that allows Google to train its AI systems on Reddit’s data.

Last week, OpenAI announced a similar — and no doubt similarly lucrative — agreement.

Why are the world’s biggest and most powerful AI companies obsessed with an antiquated forum site that most traditional users regard as a biased and snark-filled cesspool?

It all comes down to how today’s large language models are trained. By a coincidence of how the site is set up, Reddit happens to generate the perfect training data for LLMs.

And the 19-year-old company is riding this wave of interest into a sea of enormous profitability, at a major cost to users.

Tell Me What You Really Think

To keep advancing, LLMs like OpenAI’s ChatGPT and Google’s Gemini need to continue ingesting copious amounts of written language.
