Case Study: ML Hate Speech Moderation of Twitter (X)

3 min read · Dec 16, 2023

Industry Problem

The problem I am looking to solve is the prevalence of hate speech on Twitter (X). Hate speech can include discrimination, language that incites violence, and prejudice against groups of people (by religion, gender, sexual orientation, etc.).

Why Use ML/AI to Solve Hate Speech

  • Scalability: millions of pieces of content are generated on Twitter each day
  • Efficiency: it’s not realistic for human moderators to review every piece of content, and ML/AI can detect harmful content before it gets audience impressions
  • Consistency: human judgment, especially on a topic like hate speech, can be incredibly inconsistent

Data Bias Risks

With over 500 million Tweets sent each day, Twitter content is very likely to contain high degrees of bias. To mitigate bias in the data, I would:

  • Build contextual understanding to catch nuances of slang and codewords
  • Account for time-sensitive events that trigger hate speech but may not be included in the training data
  • Seek a diverse group of people to create the training data labeling guidelines
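One way to check whether a diverse labeling group has converged on consistent guidelines is to measure inter-annotator agreement. Below is a minimal sketch using Cohen's kappa on hypothetical binary labels from two annotators (1 = hate speech); the labels and annotator names are illustrative assumptions, not data from the case study:

```python
def cohens_kappa(labels_a: list[int], labels_b: list[int]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled independently at their own base rate.
    pa, pb = sum(labels_a) / n, sum(labels_b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators on the same 8 tweets.
annotator_1 = [1, 0, 1, 1, 0, 0, 1, 0]
annotator_2 = [1, 0, 1, 0, 0, 0, 1, 1]
print(f"kappa={cohens_kappa(annotator_1, annotator_2):.2f}")  # → kappa=0.50
```

A low kappa on a pilot batch would signal that the labeling guidelines need another revision pass before scaling annotation to a vendor workforce.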

Model Built Internally vs. Externally

Avoid External Training Platforms

While faster to deploy, an external platform would be unable to scale to cover 500M tweets daily. It could also be extremely expensive on a metered plan, given the massive volume of data.

Benefits of In-House Model Training

Building models in-house allows Twitter to maintain control over its data, and there are potential cost savings on compute processing over the long run.

Outsourced Training Data Annotation

There will be a lot of data labeling needed. Given the sensitive nature of protected tweets and of viewing sensitive content, I would suggest partnering with a vendor who can provide a large, standardized workforce.

Evaluating Model Results

I would evaluate model performance with a combination of precision and recall. False negatives indicate under-tagging of violating content; I would want the model to maintain 95%+ recall, a stance that assumes hate speech content is very harmful. False positives indicate a poor user experience, because the model is over-tagging for violations; I would set a precision guardrail of 80%+, since users have access to an appeals process to recover their accounts.
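These guardrails can be sketched as a simple check. The evaluation counts below are hypothetical, assumed only to illustrate how the recall and precision thresholds would be applied to a labeled sample:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from raw confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts from a human-labeled evaluation sample.
tp, fp, fn = 950, 150, 40
precision, recall = precision_recall(tp, fp, fn)

RECALL_TARGET = 0.95     # guards against under-tagging (false negatives)
PRECISION_TARGET = 0.80  # guards against over-tagging (false positives)

print(f"precision={precision:.3f}, recall={recall:.3f}")
print("recall OK:", recall >= RECALL_TARGET)
print("precision OK:", precision >= PRECISION_TARGET)
```

In practice, both metrics would be tracked continuously on fresh labeled samples, since the thresholds only hold as long as the underlying content distribution does.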

MVP Design

Deployment Plan

I would take a phased rollout approach to monitor performance and mitigate the risks of widespread mistakes in moderating hate speech. I would prioritize the English language, since the development team would be most familiar with it and could spot-check results. Additionally, I would provide customer comms about HOW and WHY we use ML to moderate content, and the remediation paths available to appeal if the models make mistakes.

  • Wave 1: 5% of Twitter accounts
  • Wave 2: 50% of Twitter accounts
  • Wave 3: 100% of Twitter accounts
  • Wave 4: Non-English Localization

Go-To-Market Plan

  • Pricing Strategy: This should be accessible to ALL users as part of the standard product experience. It doesn’t make sense to exempt users from hate-speech moderation based on what they pay.
  • Distribution Plan: See above in rollout plan
  • Value Proposition to Moderators: Provides a platform to quickly and at-scale review millions of tweets a day for hate speech violations
  • Value Proposition to Twitter Users: Twitter is the safest social media platform to share and receive up-to-date content

Designing for Longevity

In the long-term I would want to consider:

  • Adaptivity of model to change with emerging slang or satire
  • Consideration for reclaimed slurs and language
  • Quality of translation and localized cultural context
  • Adding additional context such as images, videos, links, and account information

Data labeling efforts must be maintained for the model to continue to learn and reflect the current state of the world. A/B testing can help determine whether new or updated models are effective in reducing the prevalence of hate speech.
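One way to read such an A/B test is to compare hate-speech prevalence between the current model and an updated one. The sketch below uses a standard two-proportion z-test; the counts (violating tweets per sampled impressions in each arm) are hypothetical assumptions for illustration:

```python
from math import erf, sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test: returns (z statistic, p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the null hypothesis of equal prevalence.
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF (via erf).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical A/B counts: violating tweets found per 100k sampled impressions.
z, p = two_proportion_z(x1=120, n1=100_000,  # control (current model)
                        x2=90, n2=100_000)   # treatment (updated model)
print(f"z={z:.2f}, p={p:.4f}")
```

A significant drop in prevalence in the treatment arm would support shipping the updated model; a flat result would suggest the retraining effort is not yet paying off.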



Written by Anthony Yun

Product Ops @ Meta | 7 years in Operations Management
