A Content Moderation Scheme for Facebook and Other Social Media

Anmol Parande
Mar 13

In recent years, Facebook has come under fire for its handling of harmful content posted on its platform. The content in question includes fake news, credible threats of violence, terrorism, and even genocide. According to the New York Times, Facebook’s methods for dealing with the onslaught of harmful posts have been sub-optimal. They consist of thousands of contractors sifting through posts using a large set of guidelines spread across PowerPoint slides and Excel spreadsheets. These documents aim to reduce the decision to remove a post to a yes/no question. This system has produced numerous errors, sometimes accidentally enabling the very practices it seeks to prohibit. Among other factors, this has come about because hate speech is difficult to categorize, especially considering that many of the contractors do not understand the language, politics, and culture of the people whose posts they aim to filter. Given these complexities, the best way for Facebook and other social media companies to handle content moderation is to step away from controlling speech and focus on preventing violence.

The basic premise for scaling down content moderation is that no matter what policies a company implements, its moderators will inevitably make errors. The most egregious of these errors are those that cause actual harm to people (e.g., allowing an extremist group in Myanmar to organize and execute genocidal campaigns). These errors are a product of sheer volume. The content moderators that Facebook and other companies contract often have only seconds to decide whether a post should be taken down, because there are simply too many posts to consider each one carefully. Accuracy is sacrificed for speed. Moreover, recalling complex rules in such a short time span is extremely difficult, not to mention that the rules themselves are sometimes incorrect. The only way to allow more thoughtful consideration of each post is to reduce the volume of content that reviewers must sift through. The most effective method of accomplishing this might look like a tiered system, with each layer discarding posts from consideration until the ones left are those which directly call for or create violence.

The first tier of the system would necessarily be algorithmic. Sentiment analysis techniques have proved quite capable of detecting negative sentiments. Non-deep-learning models have been able to achieve nearly 80% accuracy on smaller datasets. Although 80% accuracy is a dismal figure at the scale at which Facebook operates, Facebook’s state-of-the-art deep-learning techniques and enormous access to data should allow that figure to be raised to acceptable rates. In any case, the purpose of the algorithm is only to pass posts expressing negative sentiments on to the next stage of review. Facebook should only care about negative sentiments because any post organizing or advocating violence will, by nature, use words with negative connotations. It is important to recognize that this model would not flag posts praising terrorist organizations or violent individuals. While this is a valid concern, it is not Facebook’s job to be the global arbiter of which groups people support. In fact, Facebook’s current attempts to control discourse about the groups it has deemed “hate groups” have only resulted in inadvertent political meddling and suppression of legitimate speech, as evidenced by its censorship of political parties during the Pakistani election cycle.
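To make the first tier concrete, here is a minimal sketch of such a filter in Python. It is only an illustration: the word lists, the threshold, and the function names are all made up for this example, and a simple word count stands in for the trained sentiment model that a company like Facebook would actually use.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexicon; a production system would use a trained model instead.
NEGATIVE_WORDS = {"kill", "attack", "hate", "destroy", "burn", "hurt"}
POSITIVE_WORDS = {"love", "great", "happy", "thanks", "wonderful"}


def sentiment_score(text: str) -> int:
    """Return (positive hits - negative hits); below zero means negative sentiment."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


def first_tier(posts: List[str], threshold: int = 0) -> Tuple[List[str], List[str]]:
    """Split posts into (forwarded to human review, set aside)."""
    forwarded, set_aside = [], []
    for post in posts:
        if sentiment_score(post) < threshold:
            forwarded.append(post)   # negative sentiment: a human should look
        else:
            set_aside.append(post)   # neutral or positive: not reviewed at this tier
    return forwarded, set_aside


posts = [
    "We will attack them and burn the village tonight",
    "Happy birthday, love you!",
    "Great game last night",
]
forwarded, set_aside = first_tier(posts)
```

With these three sample posts, only the first is forwarded to review. The point of the sketch is the shape of the tier, not the scoring rule: whatever model produces the score, its only job is to shrink the pile that humans have to read.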

After the posts expressing negative sentiments have been collected, they can be put through human review. However, when reviewing these posts, humans should be looking for criteria specifically regarding violence, not the complex and nebulous ruleset which Facebook currently has its content moderators following. It can be difficult to discern whether a post expresses admiration or support for fringe organizations. While it may be easy for short posts written in the content moderator’s language, mistakes are bound to be made for longer posts or those written in a foreign language. The cost of these mistakes is that the censored speech is often mainstream in its country of origin, even if it is not in Western thought. Accordingly, the simplest solution is to not attempt to censor these types of posts. They cause no immediate harm or violence. Instead, the focus should be on removing posts of a threatening or discriminatory nature. These posts cause psychological and sometimes physical harm, and it is easier for humans to identify threats than it is to categorize generic “hate speech.”

Of course, every system needs fail-safes, and for the tiered structure, the fail-safe is to cascade posts through the system. For example, seemingly benign posts which are filtered out by the machine learning model should be put through other models or even a cursory human review system. Likewise, if there are three stages of human review, posts allowed to stay on the site at each stage should be reviewed by a different team at least once for verification. As a whole, this approach minimizes the number of posts looked at because it narrows the search criteria to threats of violence and discrimination, disregarding posts which fall in the nebulous category of hate speech. It also makes room for multiple people to review posts for potential “false negatives” (i.e., posts which are deemed non-threats but actually are threats). In this way, Facebook’s content moderators can filter out the posts that cause the most harm while reducing the amount of accidental censorship.
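The cascade can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a description of any real moderation pipeline: each reviewer, whether a model or a human team, is represented by a hypothetical function that returns True when a post looks like a threat, and the keyword checks below are placeholders for real judgment.

```python
from typing import Callable, List

# A reviewer is any function that returns True when a post looks like a threat.
Reviewer = Callable[[str], bool]


def cascade(post: str, stages: List[Reviewer], verifiers: List[Reviewer]) -> str:
    """Run a post through each stage; an independent check re-examines every 'keep'."""
    for stage, verifier in zip(stages, verifiers):
        if stage(post):
            continue  # still looks threatening: pass it to the next stage
        # The stage cleared the post, so a different reviewer double-checks it.
        if not verifier(post):
            return "keep"
        # The verifier disagreed, so the post stays in the pipeline.
    return "remove"


# Hypothetical stand-ins: keyword checks in place of real models and human teams.
def model_flags(post: str) -> bool:
    return "attack" in post.lower()


def backup_model_flags(post: str) -> bool:
    return "burn" in post.lower()


def team_a_flags(post: str) -> bool:
    return "tonight" in post.lower()


def team_b_flags(post: str) -> bool:
    return "village" in post.lower()


stages = [model_flags, team_a_flags]
verifiers = [backup_model_flags, team_b_flags]

missed_by_model = cascade("They plan to burn the village tonight", stages, verifiers)
benign = cascade("Happy birthday!", stages, verifiers)
```

In this sketch the first post slips past the first model but is caught by the backup check and then removed, while the birthday post is cleared by both the model and its verifier and stays up. Every decision to keep a post is seen by two independent reviewers, which is the fail-safe described above.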