Detect artificial text in images
In this article we will see how to automatically detect artificial text in images with Python and Sightengine, using the Python SDK that Sightengine provides.
Difference between artificial text and natural text
Artificial text is text that has been added to an image after it was taken, during post-processing, such as a caption, a watermark or an overlaid ad. Natural text is text that was already part of the photographed scene, such as a street sign.
Why detect images with artificial text?
Detecting images with artificial text is useful in several cases:
- Require that users submit or upload images without artificial text
- Hide ads that have been artificially added
- Filter images containing personal information such as phone numbers, email addresses or usernames
- Detect watermarks
Use Sightengine to detect artificial text in images
Sightengine is an API that can detect artificial text and natural text in images.
Sightengine provides SDKs for the following languages: PHP, Node.js and Python. In this article we will use the Python SDK as an example.
You must first register on Sightengine’s website, which is free. Click the Get started button and create an account.
Sightengine works with two keys, which we will need in order to use the SDK in our application. Be careful not to share them: they are secret keys.
We need to install the SDK, which allows us to use the API in our application:
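One way to keep the two keys out of the source code is to read them from environment variables. The sketch below is only an illustration: the variable names SIGHTENGINE_API_USER and SIGHTENGINE_API_SECRET and the helper load_credentials are assumptions made for this example, not names defined by Sightengine.

```python
import os


def load_credentials(env=os.environ):
    """Return the two Sightengine keys read from environment variables.

    The variable names are hypothetical, chosen for this sketch only.
    """
    try:
        return env["SIGHTENGINE_API_USER"], env["SIGHTENGINE_API_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of sending empty keys.
        raise RuntimeError(
            "Missing environment variable: " + missing.args[0]
        ) from None
```

With this approach the keys never appear in the code you commit or share.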
pip install sightengine
We will send our images to Sightengine, which will analyze them and detect whether text is present. Sightengine provides a text model that detects natural or artificial text in an image.
Sightengine returns a JSON response. It contains a text attribute that indicates whether the image contains natural text or artificial text, and a boxes attribute that is an array.
Here is an example of the JSON:
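As a rough illustration of how such a response could be read in Python, the sketch below uses a hypothetical payload. Only the text and boxes attributes come from the description above. The nesting of boxes under text, the field names has_artificial, has_natural, label and prob, the coordinates and every value are assumptions made for this example, so check them against a real response from the API.

```python
# Hypothetical payload: field names other than "text" and "boxes",
# and all values, are assumptions made for this sketch.
sample_response = {
    "status": "success",
    "text": {
        "has_artificial": 0.98,
        "has_natural": 0.02,
        "boxes": [
            {"x1": 0.10, "y1": 0.80, "x2": 0.90, "y2": 0.95,
             "label": "text-artificial", "prob": 0.97},
        ],
    },
}


def classify_text(response, threshold=0.5):
    """Summarize which kinds of text the response reports.

    A score at or above `threshold` counts as a detection; 0.5 is an
    arbitrary default chosen for this example.
    """
    text = response.get("text", {})
    return {
        "artificial": text.get("has_artificial", 0.0) >= threshold,
        "natural": text.get("has_natural", 0.0) >= threshold,
        "box_count": len(text.get("boxes", [])),
    }


print(classify_text(sample_response))
# prints {'artificial': True, 'natural': False, 'box_count': 1}
```

Using .get with defaults keeps the helper from failing when a field is absent, which is useful while the exact response shape is still unverified.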
The API is very easy to use. Let's say you want to moderate the following image:
All you have to do is add your credentials and send the URL of an image to the API.
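A minimal sketch of that call with the Python SDK could look like the following. It assumes that the model is named "text", as the mention of a text model suggests, and that the response exposes a has_artificial score under the text attribute. The helpers check_image_for_text and describe are names invented for this example.

```python
def check_image_for_text(image_url, api_user, api_secret):
    """Send an image URL to Sightengine and return the parsed response."""
    # Imported here so that describe() can be used without the SDK installed.
    from sightengine.client import SightengineClient

    client = SightengineClient(api_user, api_secret)
    # "text" is assumed to be the name of the text detection model.
    return client.check("text").set_url(image_url)


def describe(response, threshold=0.5):
    """Turn a response into a short message.

    The "has_artificial" field name and the 0.5 threshold are
    assumptions made for this sketch.
    """
    score = response.get("text", {}).get("has_artificial", 0.0)
    if score >= threshold:
        return "Artificial text detected !"
    return "No artificial text detected"
```

You would pass your own two keys and the URL of your image to check_image_for_text, then print the message returned by describe.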
Here is the result:
Artificial text detected !
Sightengine is really simple to use, so do not hesitate to try it with your own images. The API can also detect other elements such as nudity or weapons, but that is beyond the scope of this article :-)