Crossmod: A Cross-Community Learning-based System to Assist Reddit Moderators

Eshwar Chandrasekharan
Published in ACM CSCW
4 min read · Oct 31, 2019

This blog post summarizes our new CSCW systems paper introducing an AI-based moderation system for Reddit. This paper will be presented at the 22nd ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) in Austin, Texas.

To keep up with the volume of content created by users, Internet platforms like Facebook, YouTube, and Twitter are known to train machine learning (ML) algorithms on large datasets of past moderation decisions from the platform. Deploying these algorithms without any human oversight can be detrimental; for example, Tumblr “caused chaos” recently when it launched a new, unsupervised anti-porn algorithm on the site. Nonetheless, ML approaches can be especially helpful for algorithmically triaging comments for human moderators to review. However, ML-based approaches face two drawbacks that prevent them from being easily deployed: the scarcity of labeled ground-truth data, and the contextual nature of moderation. We aim to overcome these problems by embracing a sociotechnical partnership with moderators (mods), who understand their community’s norms.

Introducing CrossModerator

We built a new, open-source, AI-based moderation system for assisting moderators of communities on Reddit. We call this system the CrossModerator or Crossmod. Specifically, we adopted a mixed-initiative approach in Crossmod, allowing moderators of subreddits to augment the automatic predictions obtained from cross-community learning with human decisions and oversight. Figure 1 describes the overall system pipeline.

Figure 1: Flowchart depicting Crossmod’s system pipeline. Crossmod makes its moderation decisions by obtaining predictions from an ensemble of cross-community learning-based classifiers. Crossmod wraps this back-end in a sociotechnical architecture that fits into Reddit’s existing Moderation Interface. Our system design allows moderators to easily configure Crossmod using simple conditional statements, and tailor its actions to suit community-specific needs.
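To make the “simple conditional statements” idea concrete, here is a minimal sketch of how a moderator-authored rule could be evaluated against back-end scores. All names here (`agreement_score`, `racism_score`, the `report`/`remove` actions) are illustrative assumptions, not Crossmod’s actual configuration API:

```python
# Hypothetical sketch: moderators express community-specific policy as
# simple conditions over the scores Crossmod's back-end produces.

def evaluate_rules(scores, rules):
    """Return the actions whose conditions hold for this comment's scores."""
    return [rule["action"] for rule in rules if rule["condition"](scores)]

rules = [
    # If most cross-community classifiers would remove the comment, report it.
    {"condition": lambda s: s["agreement_score"] > 0.85, "action": "report"},
    # If a norm-specific classifier is highly confident, remove outright.
    {"condition": lambda s: s["racism_score"] > 0.95, "action": "remove"},
]

scores = {"agreement_score": 0.90, "racism_score": 0.20}
print(evaluate_rules(scores, rules))  # -> ['report']
```

The appeal of this design is that the ML stays behind a rule interface mods already know from AutoModerator: the model supplies scores, but the community decides the thresholds and actions.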

Formative interviews: We conducted a formative interview study with 11 mods from 10 different subreddits to understand the current state of automated moderation tools on Reddit (e.g., AutoModerator), as well as opportunities for extending those tools. We also worked closely and iteratively with these moderators through all stages of building Crossmod. Through our interviews, we found that mods needed tools that adapt and learn. One of the moderators we interviewed said:

“I just need a smarter Automod. Automod is great because it can act on regular expressions. It can ban (spam) bots and report problems. It’s a very strong tool, but it’s a very simple tool. A machine learning model that can learn from past mod actions and remove content would be powerful, especially if it can do what a properly socialized and culturalized moderator can.” — P4

Figure 2: Broad, illustrative overview of how Crossmod works

System design: Developed with iterative, participatory methods, Crossmod is an ML-based moderation system that is freely available and open source. Crossmod’s ML back-end leverages cross-community learning; specifically, it uses classifiers trained on the moderation decisions of 100 other communities over roughly a year. For example, the back-end provides counterfactual estimates of what those 100 communities would do with new content, as well as whether that content resembles racism, homophobia, or other types of abuse. Driven by our formative interviews, Crossmod wraps this back-end in a sociotechnical architecture that fits into existing moderator workflows and practices. Figure 2 depicts an overview of how Crossmod works.
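The counterfactual estimate can be pictured as an ensemble vote: each per-community classifier predicts whether that community would remove the comment, and the fraction voting “remove” is the agreement score. The sketch below uses stand-in lambda classifiers purely for illustration; in Crossmod, these are classifiers trained on real moderation decisions from 100 subreddits:

```python
# Sketch of cross-community learning as an ensemble vote.

def counterfactual_agreement(comment, classifiers):
    """Fraction of community classifiers that would remove this comment."""
    votes = [clf(comment) for clf in classifiers]  # True means "remove"
    return sum(votes) / len(votes)

# Toy stand-ins for trained per-subreddit classifiers (assumed, not real).
classifiers = [
    lambda c: "idiot" in c.lower(),
    lambda c: len(c) < 5,
    lambda c: "idiot" in c.lower() or "stupid" in c.lower(),
]

print(counterfactual_agreement("You are an idiot", classifiers))  # -> 2/3
```

A score near 1.0 means broad cross-community consensus that the comment violates norms, which is exactly the kind of signal a moderator can threshold on without trusting any single model.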

Summative evaluation: Finally, we deployed Crossmod in a controlled environment, simulating real-time conversations from two large subreddits with over 10M subscribers each: r/science and r/Futurology. Two moderators from each subreddit evaluated Crossmod’s moderation recommendations by manually reviewing comments scored by Crossmod, drawn randomly from existing threads in their own subreddit.

Moderators reported that they would have removed 648 (95.3%) of the 680 comments surfaced by Crossmod; however, 637 (98.3%) of these comments were still online at the time of this writing. In other words, moderators reported that those comments should have been removed, but that the current sociotechnical moderation architecture failed to help them do so.

Towards real-time deployment

Our goal is to ultimately push Crossmod into production across Reddit. As Chad Birch, the creator of AutoModerator (or Automod), said:

“A lot of moderators are quite disappointed in how few moderator tools there are. So when something new comes out, they’re pretty quick to adopt that.”

We are currently in conversations with moderators from several subreddits, including r/Futurology, about deploying our system in real time on Reddit. As a next step, we plan to deploy Crossmod as a real-time reporting tool that triages norm-violating comments for further moderator review. We hope that by releasing Crossmod publicly, it can be adopted by moderators and researchers going forward.

For more details about Crossmod, please check out our full paper. For questions and comments about the work, please drop me an email at eshwar3 [at] gatech [dot] edu.

Citation: Eshwar Chandrasekharan, Chaitrali Gandhi, Matthew Wortley Mustelier, and Eric Gilbert. 2019. Crossmod: A Cross-Community Learning-based System to Assist Reddit Moderators. Proc. ACM Hum.-Comput. Interact. 3, CSCW, Article 174 (November 2019), 30 pages. https://doi.org/10.1145/3359276
