Testing for explicit content on a website with Google Cloud Vision API

Aleksandra Żuchewicz
3 min read · Apr 5, 2020


Many of us work on web applications that link to other websites.

But are we sure that the content of a linked website is still the same as when we first saw or used it?

Nowadays, we can’t be.

Internet domains expire, other companies buy them immediately, and explicit content such as pornography or violence gets published there. Linking to such websites can damage a company’s reputation.

You can imagine a situation where someone clicks a URL on your website and suddenly sees adult content! That can have a huge impact on how people perceive your app’s reliability and security.

The above situation sounds impossible to test. You can either remove all links from your website or check them with the Google Cloud Vision API.

Tool Overview

Google Cloud’s Vision API offers powerful pre-trained machine learning models through REST and RPC APIs. The feature we are most interested in is SafeSearch Detection, which detects explicit content within an image.

The tool returns information about how likely an image is to contain content from the following five categories:

  • adult (the website may contain nudity or sexual content)
  • spoof (the website may be offensive or funny)
  • medical (the website shows medical images)
  • violence (the website contains violent images)
  • racy (the website has images with skimpy clothing or some bare body areas)
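For illustration, the safeSearchAnnotation object returned for an image looks roughly like this. Each field holds one of the Likelihood values UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY; the concrete values below are made up:

// Example safeSearchAnnotation (values are illustrative)
{
  adult: 'VERY_UNLIKELY',
  spoof: 'UNLIKELY',
  medical: 'UNLIKELY',
  violence: 'VERY_UNLIKELY',
  racy: 'POSSIBLE'
}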

Implementation

To show how SafeSearch Detection works, we will create an example script in Node.js.

Google Cloud Account

If you do not have one yet, you will need to create an account on the Google Cloud Platform and enable the Vision API. The official instructions will walk you through the process.

After enabling the Vision API you will receive an authentication token (a service account key) in JSON format. Create a project folder anywhere you like and save the JSON in it as token.json.

Prepare Input Screenshot

To check that the Vision API works properly, you will use a screenshot of the google.com website. Open google.com in your browser, take a screenshot, and save it as screenshots/google.png in your project folder.

Node.js Script

To use the Google Vision API in your code, you will need to install the client library as a dependency:

npm i @google-cloud/vision

Create a new script file index.js and load the Vision SDK module:

const vision = require('@google-cloud/vision');
// Creates a client
const client = new vision.ImageAnnotatorClient();

Then point the client at the screenshot of the page you want to check and send it to the API (note that await only works inside an async function; see the full script below):

// Provide path to screenshot 
const fileName = './screenshots/google.png';
// Send to API
const [result] = await client.safeSearchDetection(fileName);
const detections = result.safeSearchAnnotation;

The response from the API is saved in the detections object, which contains a summary of the image content. To print the indicators from this object, use the following code:

console.log(`Adult: ${detections.adult}`);
console.log(`Medical: ${detections.medical}`);
console.log(`Spoof: ${detections.spoof}`);
console.log(`Violence: ${detections.violence}`);
console.log(`Racy: ${detections.racy}`);

Finally, putting the snippets together, the whole script looks like this (the API call is wrapped in an async function because await is not allowed at the top level of a plain Node.js script):
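const vision = require('@google-cloud/vision');

// Creates a client; credentials are read from GOOGLE_APPLICATION_CREDENTIALS
const client = new vision.ImageAnnotatorClient();

async function detectExplicitContent() {
  // Provide path to screenshot
  const fileName = './screenshots/google.png';
  // Send to API
  const [result] = await client.safeSearchDetection(fileName);
  const detections = result.safeSearchAnnotation;

  console.log(`Adult: ${detections.adult}`);
  console.log(`Medical: ${detections.medical}`);
  console.log(`Spoof: ${detections.spoof}`);
  console.log(`Violence: ${detections.violence}`);
  console.log(`Racy: ${detections.racy}`);
}

detectExplicitContent().catch(console.error);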

To run it, use a command that defines the environment variable with the token:

GOOGLE_APPLICATION_CREDENTIALS=./token.json node index.js

The output of the program:

Adult: VERY_UNLIKELY
Medical: UNLIKELY
Spoof: UNLIKELY
Violence: VERY_UNLIKELY
Racy: VERY_UNLIKELY
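In a real test you would usually assert on these values rather than print them. Here is a minimal sketch of such a check; the threshold and the set of flagged categories are my assumptions, not part of the API:

// Likelihood values as returned by the API, ordered from safest to most explicit
const LIKELIHOODS = ['UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'];

// Hypothetical helper: flag the image if any chosen category reaches the threshold
function isExplicit(detections, threshold = 'POSSIBLE') {
  const limit = LIKELIHOODS.indexOf(threshold);
  return ['adult', 'violence', 'racy'].some(
    (category) => LIKELIHOODS.indexOf(detections[category]) >= limit
  );
}

// Usage: if isExplicit(detections) returns true, the link should be reviewed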

This and more elaborate examples can be found on my GitHub.

Conclusion

The main takeaway is that Google SafeSearch Detection is available through an API, and it is an easy way to filter out explicit content reliably. In practice, we would test many links to websites, either taken from a database or scraped from the web. Screenshots can be captured automatically with a tool like Selenium, as sketched below. This way we can get rid of links leading to unwanted content.
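For example, capturing the screenshots with Selenium WebDriver in Node.js could look roughly like this (captureScreenshot is a hypothetical helper; it assumes the selenium-webdriver package and a local Chrome driver are installed):

const fs = require('fs');
const { Builder } = require('selenium-webdriver');

// Hypothetical helper: open a URL in Chrome and save a screenshot of the page
async function captureScreenshot(url, outputPath) {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get(url);
    // takeScreenshot() returns the page image as a base64-encoded PNG string
    const png = await driver.takeScreenshot();
    fs.writeFileSync(outputPath, png, 'base64');
  } finally {
    await driver.quit();
  }
}

// Usage: await captureScreenshot('https://google.com', './screenshots/google.png');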
