Content Moderation in 2019 : Human vs AI

Arun Gandhi
Jan 11 · 14 min read

The internet, even with all its positives, can be a very dark and disturbing place. The shield of anonymity makes it easy for people to behave in ways that would otherwise be scoffed at by society. This article explores the current state of content moderation and helps you make a more informed decision.

It’s 2019 and we’re uploading and consuming content faster than ever before. In 2017 alone, 1.2 trillion photos were taken and billions of them shared online — that’s an average of ~170 photos per person per year (assuming a world population of 7 billion). Facebook alone sees a staggering 300 million photos uploaded each day and employs an army of 7,500 moderators to review this content.

Source: BusinessInsider

Porn is everywhere

With such a huge spike, there’s also been a rapid increase in people uploading content of a questionable nature, and frankly, we’re struggling to control it. The major social networks, despite what you might think, are choked with NSFW content like porn. It’s a cat-and-mouse game: such content is filtered and users/hashtags/groups banned, but they keep coming back, only smarter and tougher to catch. Here’s a screenshot of an article about obscure Arabic hashtags being used to share porn on Instagram.

Source: Daily Express

With a majority of our content consumption moving to mobile, Apple (App Store) and Google (Play Store) are gate-keepers to the content we’re viewing.

We all heard about the recent issue of Apple removing Tumblr’s app from the App Store after finding child pornography, but it’s only one example of a platform struggling to moderate content and getting penalised.

Source: The Verge

Tumblr might still survive and course-correct, but multitudes of apps have failed as their users left in droves once they became bastions of porn and other offensive content that the administrators were unable to control.

Why should you be worried? If you’re an app owner working with User-Generated Content, or UGC as it’s popularly known, you are potentially exposed to multiple risks: reputational, economic, or even major legal risks. (see: India throws eBay chief into prison)

But first, let’s understand what’s considered “offensive” so we can moderate it better, because it’s not as simple as you might first think.

Definition of “Offensive” Content (?)

Global context

Left: Shakira’s Oral Fixation original cover; Right: the same cover in the Middle East

Operating globally and setting standards for content is tricky. Most companies try to impose the same rules on users from different demographics with varying cultural sensitivities. And that’s where they fail.

Companies expanding globally often run into trouble with local administrations if they haven’t taken the local culture into consideration. In July 2018, the Indonesian government banned TikTok, a popular short-music-video app. Here’s an excerpt from a newspaper article reporting on the issue:

The ministry said that it banned the app because it contains negative videos that are deemed to be a bad influence on the youth….Public sentiment in Indonesia is turning against Tik Tok, which is popular among 13 to 15-year-olds, as it has clips of teens engaging in provocative behavior. One such video depicts a teen dancing. It then cuts to a dead body, apparently a relative of the teen.

Apart from nudity/porn, there are regional rules specific to:

The list can go on, depending on the region you predominantly operate in and the freedom-of-speech standards of that geography.

Here’s an excerpt from the Wikipedia page for Orkut, the once-popular social network:

In 2008 Google announced that Orkut would be fully managed and operated in Brazil, by Google Brazil, in the city of Belo Horizonte. This was decided due to the large Brazilian user base and growth of legal issues

Consider this: the entire operations of a US-based social network were shifted to another country to better adhere to its local laws.

What constitutes Nudity/Porn

Even the basic definition of what constitutes “nudity” or “porn” is highly subjective and as arbitrary as the rules of society. Consider Instagram, which allows “male nipples” but bans “female nipples”.

Some platforms allow nudity in certain special cases.

Consider Tumblr, which recently updated its content rules with some interesting exceptions:

Banned content includes photos, videos, and GIFs of human genitalia, female-presenting nipples, and any media involving sex acts, including illustrations. The exceptions include nude classical statues and political protests that feature nudity. The new guidelines exclude text, so erotica remains permitted. Illustrations and art that feature nudity are still okay — so long as sex acts aren’t depicted — and so are breast-feeding and after birth photos

Let’s see the content guidelines for other major social networks:

I hope I’ve made my point: creating standards for content is really tricky because of its subjective nature.

So let’s assume you’ve created a broad first set of rules that works for your application. The next step is to employ human moderators, rely on your community to “report” such content, use AI to detect it, or, typically, a mix of all three.

Using Human moderators

The key questions you need to answer when employing human moderators are:

How much does it cost? What’s the throughput and response time? How do they typically evaluate video? What will the flow look like? How do you define clear-cut standards to reduce subjectivity especially on edge cases?

We reached out to 7 moderator outsourcing agencies and got back vague(ish?) responses from 4 of them. They are typically BPOs armed with hundreds of data-entry contractors based in low-wage developing economies. You can find their responses here.

UGC Moderators

The price responses we received.

1. Pricing: UGC Moderators is the cheapest option of the 3 for images, at $0.01/image.

2. Turnaround Time: Webpurify mentions a turnaround time of <2 mins. Everyone else is open-ended about it. When dealing with high volumes, a service has to maintain a large workforce of moderators to operate on a near real-time basis, which is imperative for some.

3. Videos: Webpurify also mentions doing videos at $0.15/minute.

Another provider, UGC Moderators, is priced at $2/hour. Assuming a moderator can check five 1-minute videos per minute, that’s 300 video-minutes per hour, or ~$0.007/minute of video.

Consider this for Youtube, where 400 hours of video get uploaded every minute. That’s 24,000 minutes of video per minute. Multiply that by the total number of minutes in a year (60 x 24 x 365 = 525,600) and, at Webpurify’s $0.15/minute, that’s a staggering expense of ~$1.9 billion every year! Even allowing 50% for volume discounts, ~$950 million.
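As a sanity check, the arithmetic above can be sketched in a few lines (using Webpurify's $0.15/minute rate quoted earlier; the 400 hours/minute upload volume is the widely reported YouTube figure):

```python
# Back-of-the-envelope estimate of human video moderation cost at YouTube scale
upload_minutes_per_minute = 400 * 60              # 400 hours/min = 24,000 video-minutes/min
minutes_per_year = 60 * 24 * 365                  # 525,600 minutes in a year
video_minutes_per_year = upload_minutes_per_minute * minutes_per_year

webpurify_rate = 0.15                             # $ per video-minute reviewed
annual_cost = video_minutes_per_year * webpurify_rate
print(f"${annual_cost / 1e9:.2f}B per year")      # → $1.89B per year
```

Even at the cheaper ~$0.007/minute rate, the bill still runs to roughly $88 million a year.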

The subjective nature of deciding what content is allowed to stay makes it important to have a certain number of human moderators in place. But as you can see, they can get very expensive very fast.


An important thing to add is that the job is very disturbing and can cause trauma in the individuals doing it day in and day out. An ex-content moderator sued Facebook, saying violent images caused her PTSD. A great documentary titled “The Moderators” shows the life of some of these individuals:

Even Facebook, with all its iron-clad arrangements, is still exposed to the risk of legal proceedings over “inhumane” work practices. An excerpt from the same New York Times article:

“You’d go into work at 9 a.m. every morning, turn on your computer and watch someone have their head cut off,” a man who chose to remain anonymous but was quoted in the lawsuit told The Guardian last year. “Every day, every minute, that’s what you see. Heads being cut off.”

It’s a tough job.


Despite established, clear guidelines, human moderators are still prone to errors, as they are expected to work fast to handle the high volume and meet their defined SLA. A moderator from an agency we spoke to in India is expected to moderate 10-15 sub-1-minute videos per minute by quickly skimming through them.

They struggle especially on edge cases and end up committing a lot of false positives, i.e. calling something porn which isn’t. This can end up hindering the freedom of speech some of these platforms stand for, and users can revolt against the double standards.

Source: The Mic

To summarise, human moderators are:

So it becomes really important to track whether your moderators are performing satisfactorily.

Metrics to track moderator performance

These are the metrics you should typically track to see how your individual moderators are performing, though you can adopt different metrics based on your business requirements. The metrics are borrowed from machine learning and stress the two things that can hurt the most:

False Positives

Calling something “porn” which is “not porn”

False Negatives

Calling something “not porn” which is porn (hurts the most!)


Accuracy

No. of images correctly identified (porn flagged as porn, safe passed as safe). More of a health metric that you track to ensure you’re on track.


Precision

No. of images identified as porn that actually are porn. The higher the better.

If you have a business where the freedom of speech/expression is critical (for example Reddit), you need to make sure the moderators don’t tag any image that’s abiding by the rules as “not safe”. Your most important metric then is Precision.


Recall

Of the total porn images, how many were detected. The higher the better.

If you have a business where you need to serve your audience healthy, family-suitable content, you need to make sure no image that breaks the rules passes your filters. Your most important metric then is Recall.

F-1 Score

A more holistic metric combining both precision and recall (their harmonic mean). The higher the better.

If you need to strike a balance between not hindering freedom of speech and enforcing strict rules, the F1 score is your metric to track.

Here’s how you calculate them:
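A minimal sketch of these calculations in code (the example confusion-matrix counts below are made up for illustration):

```python
def moderation_metrics(tp, fp, tn, fn):
    """Compute the four metrics from a confusion matrix.
    tp: porn correctly flagged; fp: safe wrongly flagged as porn;
    tn: safe correctly passed; fn: porn wrongly passed as safe."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example: out of 100 reviewed images, a moderator flagged 25 as porn,
# of which 20 really were; they also let 5 porn images slip through.
acc, prec, rec, f1 = moderation_metrics(tp=20, fp=5, tn=70, fn=5)
print(acc, prec, rec)  # 0.9 0.8 0.8
```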

Here’s a flowchart to help you understand the terminology better:

By reviewing a random % sample of each moderator’s daily work and setting benchmarks, you can keep a check on their performance.
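A sketch of what this review loop might look like (the 10% sample rate and the labels are assumptions for illustration, not any agency's actual process):

```python
import random

def sample_for_review(decisions, rate=0.10, seed=42):
    """Pick a random fraction of a moderator's daily decisions for QA review."""
    k = max(1, int(len(decisions) * rate))
    return random.Random(seed).sample(decisions, k)

def agreement_rate(moderator_labels, reviewer_labels):
    """Share of sampled items where the QA reviewer agrees with the moderator."""
    matches = sum(m == r for m, r in zip(moderator_labels, reviewer_labels))
    return matches / len(moderator_labels)

# Hypothetical day of work: the moderator's verdict on 200 images
day = ["porn" if i % 7 == 0 else "safe" for i in range(200)]
sample = sample_for_review(day)
print(len(sample))  # 20 images go to a senior reviewer for benchmarking
```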

Also, we’ve noticed that tagging the sub-category of a discarded post (Gore, Suggestive Nudity, Explicit Nudity, Drugs, etc.) and tracking metrics within these categories is a lot more insightful when planning your future training programs.

Using Artificial Intelligence

There are multiple commercial APIs on the market that detect NSFW content.

Using deep neural networks, these APIs provide machine-learning-as-a-service to moderate content on a user’s platform, primarily detecting nudity, pornography (sexual acts) and gore. The key questions to answer while choosing an API are:

How much does it cost? What’s the response time? What metrics do you use to evaluate their performance? What’s the setup & integration time?

We compared the following APIs :



This is how much they cost per image:

Nanonets is priced the lowest at $0.0009/image followed by Amazon & Microsoft at $0.001/image.

Plotting this:

Pricing per API

The average pricing per image comes out to ~$0.001.

Compare this with the cheapest price for human moderators, which is $0.01/image: human moderators are 10x the price of the AI API providers! Visualizing it through a graph:


The metrics to evaluate remain the same as for human moderators: Accuracy, Precision, Recall and F1. There’s a great article that gives a Comparison of the best NSFW Image Moderation APIs as of 2018 along these metrics.

Setup & Integration

Most of these APIs are web-hosted and easy-to-integrate.

They typically require a few lines of code to integrate, to which you pass your image URL or bytes (raw file).
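An integration usually looks something like the sketch below. The endpoint, field names, and response shape here are hypothetical, for illustration only, not any specific provider's actual API:

```python
import json
import urllib.request

API_URL = "https://api.example-moderation.com/v1/check"  # hypothetical endpoint
API_KEY = "your-api-key"

def check_image(image_url, threshold=0.8):
    """Send an image URL to the moderation API and flag it if the
    returned NSFW score exceeds the threshold."""
    payload = json.dumps({"url": image_url}).encode()
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return is_nsfw(json.load(resp), threshold)

def is_nsfw(result, threshold=0.8):
    """Decision logic, separated so it can be tested without a network call."""
    return result["nsfw_score"] >= threshold

print(is_nsfw({"nsfw_score": 0.93}))  # True
```

Keeping the threshold in your own code (rather than using the provider's binary verdict) lets you tune sensitivity per region or content category.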

Nanonets provides an added advantage of generating a docker image for your model and hosting it on your server.

sudo nvidia-docker run -p 8081:8080 {{model_id}}:gpu

A sample line of code to run the model in a docker container.

Response time

Most APIs promise a response time of 200–300 milliseconds. This, however, does not include travel time between your servers and theirs, and can also vary depending on the size of the image you submit. So you probably want your provider to have a server in your region for fast response times, or just use Nanonets’ docker service and deploy it on-premises.

Compare this with Webpurify’s human moderation service, which promises a response time of <2 mins. That’s hundreds of times slower than the APIs!

To summarise this well, Machine learning based APIs compared to human moderators are:

So, in all, machines are definitely a lot better suited to the job than humans.

So why do we still need human moderators?

Well, the answer is that machines are still not well suited to handling subjectivity and can easily be tricked.

Consider the following image:

You can see the original image here. WARNING: It’s explicit

We tried the above image with 2 of the services mentioned above:


Clarifai wrongly classifying it as SFW with a 91% confidence


Picpurify wrongly classifying it as SFW

So what happened here? The patterns and the see-through nature of the woman’s clothes confused the neural networks, and they were unable to classify the image as NSFW or gave a completely different prediction.

The lack of training data on nude Japanese women in traditional kimonos can create this sort of bias in these APIs, which are mostly based in the US and Europe and train their networks mostly on images of individuals of the majority ethnicity in their region. So in case you have users outside these regions uploading local porn (or other offensive content), most of the ready-to-use APIs might not be of much help.

2. Societal Context

As explored above, what’s okay in one region might be scoffed at in another. As most of the AI API providers are based in Western regions, they typically are not in tune with more conservative parts of the world. So the question of what’s NSFW is very specific to you, your user demographic and the regions you operate in. Clearly a ready-to-use API is not the answer, hence the need for human moderators.

Ariana Grande’s cover art photoshopped to adhere to modesty laws in Iran and Saudi Arabia (source: Petapixel)

3. One size doesn’t fit all

Most API providers give a score of whether the image is acceptable, and might additionally tag it according to their pre-decided meta-tags. Amazon tags its images as follows:

Now, you might have tags of your own, based on the niche you serve, that fall in between these categories. You don’t have an option to create them. Tagging (the backbone of recommendation) is the bread-and-butter of most social UGC apps today, and with any of the ready-to-use APIs you’ll be stuck with the pre-determined tags.

How to reduce dependency on human moderators

Constantly re-training your models to close the remaining gaps is the way to reduce human dependency. Re-training basically means adding your specific NSFW dataset and training “on top” of a pre-existing model. This way the model keeps getting better at identifying things it previously missed.

Say, for example, there are images on your platform that are anti-semitic in nature and you wish to ban them to ensure a hate-free environment, but your chosen API provider doesn’t currently filter such images. If these images follow a typical pattern, you can create a dataset from them and re-train on top of the pre-existing model so that it starts classifying them as “unsafe”.

But most API providers don’t let you do that or it’s included in their “Enterprise” tier.

Enter Nanonets

We at Nanonets recognise this particular issue and have added the ability to add your own images and define additional tags on top of our content moderation model, so that you can improve the overall accuracy for YOU.


Using transfer learning, we train a model that learns from your data and adapts to your needs.

Case Study: Largest Indian Social Network


India’s largest local social network, with over 50M monthly active users, came to us with a very specific problem: their chosen API provider was making errors on Indian images. The accuracy of their previous provider was ~72% on such images.

Why was their existing solution not working?

An ML model is only as good as the data it’s exposed to. Most of the current moderation models available have been trained on generic data. They thus fail when predicting on user-generated content produced locally, using low-quality cameras on budget smartphones in rural India.

These images are very different in content, skin color, camera quality, etc. from the publicly available images one would find on the search engine of your choice or in any public dataset.


We asked the business about the required sensitivity levels for the user demographic they serve, and for about 10,000 images, both positive and negative samples.

We used this data to train a new model on top of our pre-existing model. This allowed us to fine-tune the sensitivity and expose the model to content specific to their platform.


We now had a model whose accuracy improved by over 23 percentage points, jumping to ~95%! The entire exercise, end-to-end, from defining the problem statement to sharing the data to finally delivering the model, took <1 month.


The ability to tune our models to specific demographics and definitions of NSFW makes them much more powerful and adept at dealing with this problem.


NanoNets: Machine Learning API

Thanks to Parv Oberoi and Sarthak Jain
