Mirador: Real-time Content Moderation

A case study with Twitter Images

Overview

At Mirador, we’ve built a real-time adult-content detection system using cutting-edge techniques in artificial intelligence known as deep learning. Given an image file or URL, our system scans the picture and, in less than a second, delivers a nudity rating ranging from 0 (non-nude) to 1 (nude), reflecting our system’s confidence in the presence or absence of nudity. Below we describe our system’s results by reporting its performance on a dataset of 50,000 hand-labeled images from Twitter.
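As a rough illustration, a request to a system like ours might look like the sketch below. The endpoint URL, parameter names, and response format are placeholders for this example, not our production API.

```python
import requests

# Hypothetical endpoint and payload shape -- for illustration only;
# the real Mirador API may use different URLs, parameters, and auth.
API_URL = "https://api.mirador.example/classify"
API_KEY = "YOUR_API_KEY"

def rate_image(image_url, threshold=0.5):
    """Request a nudity rating (0 = non-nude, 1 = nude) for an image URL."""
    response = requests.post(
        API_URL,
        data={"url": image_url, "api_key": API_KEY},
        timeout=2,  # the system typically responds in well under a second
    )
    response.raise_for_status()
    rating = response.json()["rating"]
    return rating, ("not_safe" if rating >= threshold else "safe")

if __name__ == "__main__":
    rating, label = rate_image("https://example.com/some_photo.jpg")
    print(f"rating={rating:.3f} label={label}")
```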

About the Data

The images for this case study were sourced from Twitter by randomly querying the live-streaming API for 50,000 images in tweets associated with explicit hashtags like #boobs and #sexy.
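As a rough sketch of this collection step, the snippet below assumes tweet objects in the v1.1-style JSON returned by Twitter’s statuses/filter streaming endpoint and pulls out the photo URLs; the client library and the exact pipeline we used are omitted.

```python
def extract_photo_urls(tweet):
    """Yield photo URLs from a v1.1-style tweet JSON object, if any."""
    media = tweet.get("extended_entities", {}).get("media", [])
    for item in media:
        if item.get("type") == "photo":
            yield item["media_url_https"]

def collect_image_urls(tweet_stream, limit=50_000):
    """Gather image URLs from a stream of tweet dicts (e.g. tweets tracked
    by hashtag from the statuses/filter endpoint) until the limit is hit."""
    urls = []
    for tweet in tweet_stream:
        for url in extract_photo_urls(tweet):
            urls.append(url)
            if len(urls) >= limit:
                return urls
    return urls
```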

These images were then labeled using Amazon Mechanical Turk: we assigned a ground-truth nudity label to an image once we reached consensus among multiple reviewers, each tasked with looking for indecent exposure in the picture.
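As a rough sketch of how such consensus labeling can work (the agreement rule and label names below are illustrative, not our exact setup):

```python
from collections import Counter

def consensus_label(reviewer_labels, min_agreement=3):
    """Return 'nude' or 'non-nude' once enough reviewers agree, else None.

    reviewer_labels: labels ('nude' / 'non-nude') from Mechanical Turk
    workers asked to look for indecent exposure in the image.
    """
    counts = Counter(reviewer_labels)
    label, votes = counts.most_common(1)[0]
    return label if votes >= min_agreement else None

# Images without consensus would go to additional reviewers (or be dropped)
# rather than entering the ground-truth set.
print(consensus_label(["nude", "nude", "nude", "non-nude"]))  # -> 'nude'
print(consensus_label(["nude", "non-nude"]))                  # -> None
```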

This dataset is especially challenging due to the many borderline cases of men and women in underwear and tight clothing. Non-nude images containing a lot of skin account for the lion’s share of the mistakes in competing nudity detection systems, which mislabel these safe images and produce false positives. We have strived to build a system that can accurately differentiate between a skimpy bikini and no bikini at all.

Moreover, it’s important to note that nude images are far more common alongside the hashtags we queried for than in the general Twitter firehose.

Here we compare the distribution of tweets with nude images throughout all of Twitter (estimated) vs. the dataset we’ve chosen to work with.

An important note about this dataset is that it only includes natural images, that is, images taken with a camera. Cartoons and 3D renderings are not included in our training set, so results may be unpredictable when they are fed into our classifier. Down the line, we plan to add support for classifying non-photographic images.

Definitions

Safe : We define an image as not safe if it contains any nudity; that is, an image is not safe if a bare penis, vagina, butt, or female nipples are visible in it. Under this definition, an image of a man or woman in a revealing bathing suit is considered safe.

Positive/Negative : Our classifier seeks to find the existence of nudity in an image. For that reason, unsafe images are considered positive examples and safe images are considered negative examples.

True Positive : An image is defined as a true positive if that image is not safe and our classifier determines that it is not safe.

False Positive : An image is defined as a false positive if that image is safe but our classifier determines that it is not safe.

True Negative : An image is defined as a true negative if that image is safe and our classifier determines that it is safe.

False Negative : An image is defined as a false negative if that image is not safe and our classifier determines that it is safe. This is the most important type of error to avoid.
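These definitions correspond to the four cells of a confusion matrix. As a small illustration (using a hypothetical encoding where 1 = not safe / positive and 0 = safe / negative), the counts and the derived error rates can be computed like this:

```python
def confusion_counts(y_true, y_pred):
    """Count TP/FP/TN/FN with 1 = not safe (positive), 0 = safe (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def error_rates(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    fnr = fn / (fn + tp)  # share of truly unsafe images we let through
    fpr = fp / (fp + tn)  # share of safe images we wrongly flag
    return fnr, fpr
```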

Results

First, we present how our ratings can determine whether an image is safe or explicit. By simply setting a threshold on the rating scale, images with ratings above the cut-off are considered Not-Safe, while those below are considered Safe. We tried three different cut-offs, with varying results (a short sketch of this thresholding follows the list):

1. By setting a low threshold, all images that are even remotely judged to be explicit are labeled as Not-Safe. This configuration ensures that 99% of the images our system deems Safe are indeed non-nude, driving the false-negative rate down to 0.48%.
This threshold is especially interesting for monetization efforts, where we can identify which images are brand-safe for adjacency to advertisements while minimizing the risk of a brand being placed next to a nude image by mistake.

2. The median threshold yields the highest overall percentage of correctly labeled images.

A median threshold is best for gathering analytics and statistics on the content of an image stream. Here we maximize the number of images we label correctly.

3. By setting a high threshold, our system only flags images where it is extremely confident about the presence of nudity. Only 1% of safe images are misclassified as Not-Safe by our system.

A high threshold is especially interesting for social media sites interested in automatically banning users who post explicit content, since it ensures that innocent users are very unlikely to be falsely flagged.
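The three operating points above come from applying different cut-offs to the same underlying ratings. Here is a minimal sketch of such a sweep; the cut-off values are illustrative placeholders, not our actual thresholds.

```python
def sweep_thresholds(ratings, y_true, cutoffs=(0.2, 0.5, 0.8)):
    """Report error rates at several cut-offs (placeholder values, not ours)."""
    for cutoff in cutoffs:
        y_pred = [1 if r >= cutoff else 0 for r in ratings]  # 1 = Not-Safe
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        positives = sum(y_true)
        negatives = len(y_true) - positives
        print(f"cutoff={cutoff:.2f}  "
              f"false-negative rate={fn / positives:.2%}  "
              f"false-positive rate={fp / negatives:.2%}")
```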

For those with training in statistics and machine learning, we present our precision-recall (P/R) and ROC curves, as well as histograms of our ratings for both classes:
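For readers who want to produce the same kinds of plots on their own labeled ratings, here is a small sketch using scikit-learn and matplotlib. It assumes `y_true` holds 0/1 ground-truth labels and `ratings` holds the 0-1 scores; it is not the code behind our own figures.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, roc_curve, roc_auc_score

def plot_curves(y_true, ratings):
    """Plot P/R and ROC curves plus per-class rating histograms."""
    precision, recall, _ = precision_recall_curve(y_true, ratings)
    fpr, tpr, _ = roc_curve(y_true, ratings)

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    axes[0].plot(recall, precision)
    axes[0].set(xlabel="Recall", ylabel="Precision", title="P/R curve")

    axes[1].plot(fpr, tpr)
    axes[1].set(xlabel="False positive rate", ylabel="True positive rate",
                title=f"ROC curve (AUC={roc_auc_score(y_true, ratings):.3f})")

    safe = [r for r, t in zip(ratings, y_true) if t == 0]
    nude = [r for r, t in zip(ratings, y_true) if t == 1]
    axes[2].hist([safe, nude], bins=25, label=["safe", "not safe"])
    axes[2].set(xlabel="Rating", title="Rating histograms")
    axes[2].legend()
    plt.tight_layout()
    plt.show()
```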

Live Demo

Also, comment below if you want to check out a live demo of our system scanning images sourced from Twitter. Again, we query the live-streaming API for images with explicit hashtags and then categorize them purely based on their visual content (no text or user analysis is used). We’re proud to say that our system’s average round-trip time is 250 ms, with round trips as fast as 50 ms recorded.