Benchmarking Google Vision, Amazon Rekognition, Microsoft Azure on Image Moderation

Recent advances in Computer Vision have enabled companies and start-ups to develop solutions to automatically classify and segment visual content.

One application of those developments is Image Moderation. In this post, we are going to focus on one aspect of Image Moderation: the detection of sexually suggestive images.

The reason we focus on suggestive images instead of explicit images such as porn is that:

  • it is a hard problem: distinguishing suggestive poses from acceptable poses is harder than discriminating between porn and safe content
  • there is no standard public porn or explicit image-set to use in benchmarks, whereas there are public reference datasets with suggestive material

Scope

The benchmark focuses on 4 APIs: Google Vision, Amazon Rekognition, Microsoft Cognitive Services and Sightengine. 3 of the 4 APIs provide a way to differentiate between explicit, suggestive and safe content:

  • Amazon Rekognition returns “explicit” and “suggestive” scores
  • Microsoft returns “adult” and “racy” scores
  • Sightengine returns “raw nudity” and “partial nudity” scores
  • Google Vision is the only API that does not natively distinguish between suggestive and explicit nudity, and it does not return numeric scores. Instead, it returns one of five likelihood labels: “VERY_LIKELY”, “LIKELY”, “POSSIBLE”, “UNLIKELY”, “VERY_UNLIKELY”. For this benchmark, the best results were achieved with the following mapping: LIKELY => suggestive, VERY_LIKELY => explicit, all other labels => safe
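The mapping described above is straightforward to express in code. This is a minimal sketch; the function name and dictionary are ours, only the five likelihood labels and the label-to-class mapping come from the benchmark description:

```python
# Mapping used in this benchmark for Google Vision's SafeSearch
# likelihood labels (the API returns labels, not numeric scores).
GOOGLE_LIKELIHOOD_TO_CLASS = {
    "VERY_LIKELY": "explicit",
    "LIKELY": "suggestive",
    "POSSIBLE": "safe",
    "UNLIKELY": "safe",
    "VERY_UNLIKELY": "safe",
}

def classify_google(likelihood: str) -> str:
    """Map a SafeSearch likelihood label to a moderation class."""
    # Default to "safe", matching the "all other labels => safe" rule.
    return GOOGLE_LIKELIHOOD_TO_CLASS.get(likelihood, "safe")
```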

The APIs have been tested by submitting ImageNet images from the people and person synsets, totaling 1972 images. Those synsets were chosen because images with people are harder to classify than random images: images of landscapes, wildlife and objects are easily identified as non-explicit, whereas images with people, body parts and groups tend to be harder for automated image classification.

The images were divided into two classes: safe images and suggestive images (the dataset does not contain any explicit nudity or porn).
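The evaluation itself reduces to counting matches between each API's predicted class and the ground-truth label. Here is a hedged sketch of that loop; the function names and the strict-matching scheme are our assumptions, not a description of the benchmark's exact scoring code:

```python
from typing import Callable, Iterable, Tuple

def count_correct(
    dataset: Iterable[Tuple[str, str]],   # (image_path, true_label) pairs
    classify: Callable[[str], str],       # hypothetical per-API classifier
) -> Tuple[int, int]:
    """Return (correct, total) over a labeled image set.

    Under this strict scheme a prediction counts as correct only when
    it exactly matches the ground-truth label ("safe" or "suggestive");
    an "explicit" prediction on a suggestive image would be a miss.
    """
    correct = total = 0
    for path, truth in dataset:
        total += 1
        if classify(path) == truth:
            correct += 1
    return correct, total
```

In practice `classify` would wrap one API call per image and threshold the returned scores into one of the classes.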

Examples of “safe” images
Examples of “suggestive” images with partial nudity: bare chests, bikinis…

Results

No API classified all 1972 images correctly, and the results varied widely between APIs.

Number of correct classifications

  • Google Cloud Vision: 1923 (2.5% error)
  • Amazon Rekognition: 1874 (5.0% error)
  • Microsoft Cognitive Services: 1924 (2.4% error)
  • Sightengine: 1942 (1.5% error)
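The quoted error percentages follow directly from the correct-count figures out of 1972 images; a quick sanity check:

```python
TOTAL = 1972  # number of images in the benchmark set

correct = {
    "Google Cloud Vision": 1923,
    "Amazon Rekognition": 1874,
    "Microsoft Cognitive Services": 1924,
    "Sightengine": 1942,
}

# Error rate = misclassified / total, rounded to one decimal place.
error_pct = {api: round(100 * (TOTAL - n) / TOTAL, 1) for api, n in correct.items()}

for api, pct in error_pct.items():
    print(f"{api}: {correct[api]}/{TOTAL} correct ({pct}% error)")
```

Running this reproduces the figures above (2.5%, 5.0%, 2.4% and 1.5% respectively).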

One last thing…

Interestingly, some images tripped up multiple APIs. Below are a few safe images that were classified as “suggestive” or “explicit” by at least two of the APIs we tested:

Who’s hiding here?
Hmmm… Kind of borderline, isn’t it?
Nice camera!