Content Moderation in 2019 : Human vs AI

Arun Gandhi
Jan 11 · 14 min read

The internet, even with all its positives, can be a very dark and disturbing place. The shield of anonymity makes it easy for people to behave in ways that would otherwise be scoffed at by society. This article explores the current state of content moderation and helps you make a more informed decision.

It’s 2019 and we’re uploading and consuming content faster than ever before. In 2017 alone, 1.2 trillion photos were taken and billions of them shared online — that’s an average of ~170 photos per person per year (assuming a world population of 7 billion). Facebook alone sees a staggering 300 million photos uploaded each day and employs an army of 7,500 moderators to review this content.

Source: BusinessInsider

Porn is everywhere

With such a huge spike, there’s also been a rapid increase in people uploading content of a questionable nature, and frankly, we’re struggling to control it. The major social networks, despite what you might think, are choked with NSFW content like porn. It’s a cat-and-mouse game: such content is filtered and users/hashtags/groups banned, but they keep coming back, only smarter and tougher to catch. Here’s a screenshot of an article about obscure Arabic hashtags being used to share porn on Instagram.

Source: Daily Express

With a majority of our content consumption moving to mobile, Apple (App Store) and Google (Play Store) are gate-keepers to the content we’re viewing.

We all heard about the recent issue of Apple removing Tumblr’s app from the App Store after finding child pornography, but it’s only one example of a platform struggling to moderate content and getting penalised.

Source: The Verge

Tumblr might still survive and course-correct, but multitudes of apps have failed as their users left in droves once they became bastions of porn and other offensive content that the administrators were unable to control.

Why should you be worried? If you’re an app owner working with User-Generated Content, or UGC as it’s popularly known, you are potentially exposed to multiple risks: reputational, economic, or even major legal risks. (see: India throws eBay chief into prison)

But first, let’s understand what’s considered “offensive” so we can moderate it better, because it’s not as simple as you might first think.

Definition of “Offensive” Content (?)

Global context

Left: Shakira’s Oral Fixation original cover; Right: the same cover in the Middle East

Operating globally and setting standards for content is tricky. Most companies try to impose the same rules on users from different demographics with varying cultural sensitivities. And that’s where they fail.

Companies expanding globally often run into trouble with local administrations if they haven’t taken the local culture into consideration. In July 2018, the Indonesian government banned TikTok, a popular short-music-video app. Here’s an excerpt from a newspaper article reporting on the issue:

The ministry said that it banned the app because it contains negative videos that are deemed to be a bad influence on the youth….Public sentiment in Indonesia is turning against Tik Tok, which is popular among 13 to 15-year-olds, as it has clips of teens engaging in provocative behavior. One such video depicts a teen dancing. It then cuts to a dead body, apparently a relative of the teen.

Apart from nudity/porn, there are regional rules specific to:

The list can go on, depending on the region you predominantly operate in and the freedom-of-speech standards of that geography.

Here’s an excerpt from the Wikipedia page for Orkut, the once-popular social network:

In 2008 Google announced that Orkut would be fully managed and operated in Brazil, by Google Brazil, in the city of Belo Horizonte. This was decided due to the large Brazilian user base and growth of legal issues

Consider this: the entire operations of a US-based social network were shifted to another country to better adhere to its local laws.

What constitutes Nudity/Porn

Even the basic definition of what constitutes “nudity” or “porn” is highly subjective and as arbitrary as the rules of society. Consider Instagram, which allows “male nipples” but bans “female nipples”.

Some platforms allow nudity in certain special cases.

Consider Tumblr, which recently updated its content rules with some interesting exceptions:

Banned content includes photos, videos, and GIFs of human genitalia, female-presenting nipples, and any media involving sex acts, including illustrations. The exceptions include nude classical statues and political protests that feature nudity. The new guidelines exclude text, so erotica remains permitted. Illustrations and art that feature nudity are still okay — so long as sex acts aren’t depicted — and so are breast-feeding and after birth photos

Let’s see the content guidelines for other major social networks:

I hope I’ve made my point: creating standards for content is really tricky because of its subjective nature.

So let’s assume you’ve created a broad first set of rules that works for your application. The next step is to employ human moderators, rely on your community to “report” such content, use AI to detect it, or, typically, a mix of all three.

Using Human moderators

The key questions you need to answer when employing human moderators are:

How much does it cost? What’s the throughput and response time? How do they typically evaluate video? What will the flow look like? How do you define clear-cut standards to reduce subjectivity especially on edge cases?

We reached out to 7 moderator outsourcing agencies and got back vague(ish?) responses from 4 of them. They are typically BPOs armed with hundreds of data-entry contractors based in low-wage developing economies. You can find their responses here.

UGC Moderators

The price responses we received.

1. Pricing: UGC Moderators is the cheapest option of the 3 for images, at $0.01/image.

2. Turnaround Time: Webpurify mentions a turnaround time of <2 mins. Everyone else is open-ended about it. When dealing with high volumes, a service has to maintain a large workforce of moderators to operate on a near real-time basis, which is imperative for some.

3. Videos: Webpurify also mentions doing videos at $0.15/minute.

Another provider, UGC Moderators, is priced at $2/hour. Assuming a moderator can check five 1-minute videos per minute, that’s 300 video-minutes per hour, or ~$0.007/minute of video.

Consider this for Youtube, where 400 hours of video get uploaded every minute. That’s 24,000 minutes of video per minute. Multiply that by the total number of minutes in a year (60 x 24 x 365 = 525,600) and, at Webpurify’s $0.15/minute, that’s a staggering expense of ~$1.9 billion every year! Even allowing 50% for volume discounts, ~$950 million.
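As a sanity check, the arithmetic above can be sketched in a few lines (using Webpurify's $0.15/minute rate quoted earlier; the 400 hours/minute upload volume is the widely reported YouTube figure):

```python
# Back-of-the-envelope estimate of human video moderation cost at YouTube scale
upload_minutes_per_minute = 400 * 60              # 400 hours/min = 24,000 video-minutes/min
minutes_per_year = 60 * 24 * 365                  # 525,600 minutes in a year
video_minutes_per_year = upload_minutes_per_minute * minutes_per_year

webpurify_rate = 0.15                             # $ per video-minute reviewed
annual_cost = video_minutes_per_year * webpurify_rate
print(f"${annual_cost / 1e9:.2f}B per year")      # → $1.89B per year
```

Even at the cheaper ~$0.007/minute rate, the bill still runs to roughly $88 million a year.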

The subjective nature of deciding what content is allowed to stay makes it important to have a certain number of human moderators in place. But as you can see, they can get very expensive very fast.


An important thing to add is that the job is very disturbing and can cause trauma in the individuals doing it day in and day out. An ex-content moderator sued Facebook, saying violent images caused her PTSD. A great documentary titled “The Moderators” shows the life of some of these individuals:

Even Facebook, with all its iron-clad arrangements, is still exposed to the risk of legal proceedings over “inhumane” work practices. An excerpt from the same New York Times article:

“You’d go into work at 9 a.m. every morning, turn on your computer and watch someone have their head cut off,” a man who chose to remain anonymous but was quoted in the lawsuit told The Guardian last year. “Every day, every minute, that’s what you see. Heads being cut off.”

It’s a tough job.


Despite established, clear guidelines, human moderators are still prone to errors, as they are expected to work fast to handle the high volume and meet their defined SLA. A moderator from an agency we spoke to in India is expected to moderate 10-15 sub-1-minute videos per minute by quickly skimming through them.

They struggle especially on edge cases and end up committing a lot of false positives, i.e. calling something porn which isn’t. This can end up hindering the freedom of speech some of these platforms stand for, and users can revolt against the double standards.

Source: The Mic

To summarise, human moderators are:

So it becomes really important to track whether your moderators are performing satisfactorily.

Metrics to track moderator performance

These are the metrics you should typically track to see how your individual moderators are performing, though you can adopt different metrics based on your business requirements. The metrics are borrowed from machine learning and stress the two things that can hurt the most:

False Positives

Calling something “porn” which is “not porn”

False Negatives

Calling something “not porn” which is porn (hurts the most!)


Accuracy

No. of images correctly identified (porn flagged as porn, safe passed as safe). More of a health metric that you track to ensure you’re on track.


Precision

No. of images identified as porn that actually are porn. The higher the better.

If you have a business where the freedom of speech/expression is critical (for example Reddit), you need to make sure the moderators don’t tag any image that’s abiding by the rules as “not safe”. Your most important metric then is Precision.


Recall

Of the total porn images, how many were detected. The higher the better.

If you have a business where you need to serve your audience healthy, family-suitable content, you need to make sure no image that breaks the rules passes your filters. Your most important metric then is Recall.

F-1 Score

A more holistic metric combining both precision and recall (their harmonic mean). The higher the better.

If you need to strike a balance between not hindering freedom of speech and enforcing strict rules, the F1 score is your metric to track.

Here’s how you calculate them:
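A minimal sketch of these calculations in code (the example confusion-matrix counts below are made up for illustration):

```python
def moderation_metrics(tp, fp, tn, fn):
    """Compute the four metrics from a confusion matrix.
    tp: porn correctly flagged; fp: safe wrongly flagged as porn;
    tn: safe correctly passed; fn: porn wrongly passed as safe."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example: out of 100 reviewed images, a moderator flagged 25 as porn,
# of which 20 really were; they also let 5 porn images slip through.
acc, prec, rec, f1 = moderation_metrics(tp=20, fp=5, tn=70, fn=5)
print(acc, prec, rec)  # 0.9 0.8 0.8
```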

Here’s a flowchart to help you understand the terminology better:

By reviewing a random % sample of each moderator’s daily work and setting benchmarks, you can keep a check on their performance.
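A sketch of what this review loop might look like (the 10% sample rate and the labels are assumptions for illustration, not any agency's actual process):

```python
import random

def sample_for_review(decisions, rate=0.10, seed=42):
    """Pick a random fraction of a moderator's daily decisions for QA review."""
    k = max(1, int(len(decisions) * rate))
    return random.Random(seed).sample(decisions, k)

def agreement_rate(moderator_labels, reviewer_labels):
    """Share of sampled items where the QA reviewer agrees with the moderator."""
    matches = sum(m == r for m, r in zip(moderator_labels, reviewer_labels))
    return matches / len(moderator_labels)

# Hypothetical day of work: the moderator's verdict on 200 images
day = ["porn" if i % 7 == 0 else "safe" for i in range(200)]
sample = sample_for_review(day)
print(len(sample))  # 20 images go to a senior reviewer for benchmarking
```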

Also, we’ve noticed that tagging the sub-category of a discarded post (Gore, Suggestive Nudity, Explicit Nudity, Drugs, etc.) and tracking metrics within these categories is a lot more insightful when planning your future training programs.

Using Artificial Intelligence

There are multiple commercial APIs on the market that detect NSFW content.

Using deep neural networks, these APIs provide machine-learning-as-a-service to moderate content on a user’s platform, primarily detecting nudity, pornography (sexual acts) and gore. The key questions to answer while choosing an API are:

How much does it cost? What’s the response time? What metrics do you use to evaluate their performance? What’s the setup & integration time?

We compared the following APIs :



This is how much they cost per image:

Nanonets is priced the lowest at $0.0009/image followed by Amazon & Microsoft at $0.001/image.

Plotting this:

Pricing per API

The average pricing per image comes out to ~$0.001.

Compare this with the cheapest price for human moderators, which is $0.01/image: human moderators are 10x the price of the AI API providers! Visualizing it through a graph:


The metrics to evaluate remain the same as for human moderators: Accuracy, Precision, Recall and F1. There’s a great article that gives a Comparison of the best NSFW Image Moderation APIs as of 2018 along these metrics.

Setup & Integration

Most of these APIs are web-hosted and easy-to-integrate.

They typically require a few lines of code to integrate, to which you pass your image URL or bytes (raw file).
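An integration usually looks something like the sketch below. The endpoint, field names, and response shape here are hypothetical, for illustration only, not any specific provider's actual API:

```python
import json
import urllib.request

API_URL = "https://api.example-moderation.com/v1/check"  # hypothetical endpoint
API_KEY = "your-api-key"

def check_image(image_url, threshold=0.8):
    """Send an image URL to the moderation API and flag it if the
    returned NSFW score exceeds the threshold."""
    payload = json.dumps({"url": image_url}).encode()
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return is_nsfw(json.load(resp), threshold)

def is_nsfw(result, threshold=0.8):
    """Decision logic, separated so it can be tested without a network call."""
    return result["nsfw_score"] >= threshold

print(is_nsfw({"nsfw_score": 0.93}))  # True
```

Keeping the threshold in your own code (rather than using the provider's binary verdict) lets you tune sensitivity per region or content category.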

Nanonets provides an added advantage of generating a docker image for your model and hosting it on your server.

sudo nvidia-docker run -p 8081:8080 {{model_id}}:gpu

A sample line of code to run the model in a docker container.

Response time

Most APIs promise a response time of 200–300 milliseconds. This, however, does not include travel time between your servers and theirs, and can also vary depending on the size of the image you submit. So you probably want your provider to have a server in your region for fast response times, or just use Nanonets’ docker service and deploy it on-premises.

Compare this with Webpurify’s human moderation service, which promises a response time of <2 mins. That’s hundreds of times slower than the APIs!

To summarise this well, Machine learning based APIs compared to human moderators are:

So, in all, machines are definitely a lot better suited to the job than humans.

So why do we still need human moderators?

Well, the answer is that machines are still not well suited to handling subjectivity and can easily be tricked.

Consider the following image:

You can see the original image here. WARNING: It’s explicit

We tried the above image with 2 of the services mentioned above:


Clarifai wrongly classifying it as SFW with a 91% confidence


Picpurify wrongly classifying it as SFW

So what happened here? The patterns and the see-through nature of the woman’s clothes confused the neural networks, and they were unable to classify the image as NSFW or gave a completely different prediction.

The lack of training data on nude Japanese women in traditional kimonos can create this sort of bias in these APIs, which are mostly based in the US and Europe and train their networks mostly on images of individuals of the majority ethnicity in their region. So in case you have users outside these regions uploading local porn (or other offensive content), most of the ready-to-use APIs might not be of much help.

2. Societal Context

As explored above, what’s okay in one region might be scoffed at in another. As most of the AI API providers are based in Western regions, they typically are not in tune with more conservative parts of the world. So the question of what’s NSFW is very specific to you, your user demographic and the regions you operate in. Clearly a ready-to-use API is not the answer, hence the need for human moderators.

Ariana Grande’s cover art photoshopped to adhere to modesty laws in Iran and Saudi Arabia (source: Petapixel)

3. One size doesn’t fit all

Most API providers give a score of whether the image is acceptable, and might additionally tag it according to their pre-decided meta-tags. Amazon tags its images as follows:

Now, you might have tags of your own, based on the niche you serve, that fall in between these categories. You don’t have an option to create them. Tagging (the backbone of recommendation) is the bread-and-butter of most social UGC apps today, and with any of the ready-to-use APIs you’ll be stuck with the pre-determined tags.

How to reduce dependency on human moderators

Constantly re-training your models to close the remaining gaps is the way to reduce human dependency. Re-training basically means adding your specific NSFW dataset and training “on top” of a pre-existing model. This way the model keeps getting better at identifying things it previously missed.

Say, for example, there are images on your platform that are anti-semitic in nature and you wish to ban them to ensure a hate-free environment, but your chosen API provider doesn’t currently filter such images. If these images follow a typical pattern, you can create a dataset from them and re-train on top of the pre-existing model so that it starts classifying them as “unsafe”.

But most API providers don’t let you do that or it’s included in their “Enterprise” tier.

Enter Nanonets

We at Nanonets recognise this particular issue and have added the ability to add your own images and define additional tags on top of our content moderation model, so that you can improve the overall accuracy for YOU.


Using transfer learning, we train a model that learns from your data and adapts to your needs.

Case Study: Largest Indian Social Network


India’s largest local social network, with over 50M monthly active users, came to us with a very specific problem: their chosen API provider was making errors on Indian images. The accuracy of their previous provider was ~72% on such images.

Why was their existing solution not working?

An ML model is only as good as the data it’s exposed to. Most of the current moderation models available have been trained on generic data. They thus fail when predicting on user-generated content produced locally, using low-quality cameras on budget smartphones in rural India.

These images are very different in content, skin color, camera quality, etc. from the publicly available images one would find on the search engine of your choice or in any public dataset.


We asked the business about the required sensitivity levels for the user demographic they serve, and for about 10,000 images, both positive and negative samples.

We used this data to train a new model on top of our pre-existing model. This allowed us to fine-tune the sensitivity and expose the model to content specific to their platform.


We now had a model whose accuracy improved by over 23 percentage points, jumping to ~95%! The entire exercise, end-to-end, from defining the problem statement to sharing the data to finally delivering the model, took <1 month.


The ability to tune our models to specific demographics and definitions of NSFW makes them much more powerful and adept at dealing with this problem.


NanoNets: Machine Learning API

Thanks to Parv Oberoi and Sarthak Jain
