Creating a meme stash with state-of-the-art machine learning

Péter Hatvani.


Business problem: I want to use my own memes as reaction images, which are superior to the normie memes of GIF-embedding services.

Solution: tagging my images based on their content and building a database of meme metadata.

This story will be told in several parts, each focusing on one step of how the service was created and the design decisions involved.

  1. Deduplicate the images
  2. Detect text on the images
  3. Detect objects / classify image
  4. Find faces and their emotions

These steps each tackle difficult topics, such as how similar two images can be before they count as the same, how to detect white(ish) characters on a bright background and SO MUCH MORE

AND SO MUCH MORE MEME

In this post, I will give you the gist of the idea.

Reading / Deduplication

Images can be found in many formats in the wild, but JPEG and PNG dominate the internet. Nevertheless, the extension is a misleading guide to whether the image contains an alpha channel (RGBA vs. RGB). This is all well until you try to create a tensor from the image and crash like Icarus with an incomprehensible error message. So the zeroth step converts all incoming images to RGB.
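A minimal sketch of that zeroth step, assuming the images are loaded with Pillow (the helper name and example path are illustrative, not from the original code):

```python
from PIL import Image

def load_as_rgb(path: str) -> Image.Image:
    # Pillow opens PNG/JPEG alike; convert("RGB") drops any alpha channel
    # so every downstream tensor has exactly three channels.
    return Image.open(path).convert("RGB")

image = load_as_rgb("path/to/meme.png")
```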

The images are read with the Pillow package, but image preparation is done with OpenCV because it offers one-line solutions for adaptive thresholding and advanced binarization.
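The hand-off between the two libraries is a simple array conversion. This is a sketch rather than the post's actual code, and it reuses the load_as_rgb helper assumed above:

```python
import numpy as np
import cv2

pil_image = load_as_rgb("path/to/meme.png")
# Pillow delivers RGB, while OpenCV expects BGR numpy arrays,
# hence the explicit channel swap.
cv_image = cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR)
```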

After reading the image, deduplication can commence. To keep only unique images I used perceptual hashing from the ImageHash library. Perceptual hashing means computing a hash that describes an image by its characteristic features rather than its exact bytes. Resizing and simple transformations, even cropping, do not fool pHash because it reduces the image to a simpler form. For a more in-depth introduction, please consult this great article.

Because pHash is quite good at identifying unique images, I organized the images into a dictionary with the hashes as the keys.

current metadata = { 'image_pHash': { 'path': 'path/to/file' } }
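As a sketch, the deduplication pass could look like the following; only the imagehash.phash call is from the post, the rest of the names are my assumptions:

```python
import imagehash

metadata = {}

def register(path: str) -> None:
    image = load_as_rgb(path)
    phash = str(imagehash.phash(image))  # perceptual hash as a hex string
    # Images sharing a pHash are treated as duplicates:
    # only the first path is kept for that key.
    metadata.setdefault(phash, {"path": path})
```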

Reading the characters on the image — OCR

After reading the 3-channel images, I found that they need to be preprocessed for the algorithms to be even somewhat accurate (surprisingly, most OCR — Optical Character Recognition — is done on books that have black characters on white backgrounds). The average meme is the exact opposite: white characters on a vivid background that can be bright and full of edges that confuse the OCR.

confusingly bright image

There are also images that can be processed easily, so I added a pass that runs without any preprocessing.
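That baseline pass can be as simple as handing the RGB image straight to Tesseract through pytesseract; the wrapper function here is hypothetical:

```python
import pytesseract

def ocr_plain(pil_image) -> str:
    # No preprocessing at all: works well enough for clean,
    # high-contrast memes.
    return pytesseract.image_to_string(pil_image)
```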

Preprocessing steps for OCR

To get the most out of my OCR of choice, Tesseract, I have chosen four ways the image can be made more readable (a code sketch follows the list below). Every variant starts by creating a grayscale image with a kernel size of 80: the image is iterated over window by window, finding local maxima, with adaptive thresholding for a smoother grayscale.

RGB to grayscale image with adaptive thresholding
  1. Simple Colour inversion
  2. Binarization with Otsu’s method
  3. Adaptive Gaussian thresholding + Otsu
  4. Stroke-width transformation
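Here is a rough OpenCV sketch of the grayscale step and the four variants. The exact parameter values are my assumptions (OpenCV requires an odd block size, so 81 stands in for the kernel of 80), and the stroke-width transform is only stubbed out because OpenCV has no one-line implementation of it:

```python
import cv2
import numpy as np

def preprocess_variants(bgr: np.ndarray) -> dict:
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    # Adaptive (Gaussian-weighted) thresholding over local windows.
    adaptive = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 81, 2)
    return {
        # 1. Simple colour inversion
        "inverted": cv2.bitwise_not(gray),
        # 2. Binarization with Otsu's method
        "otsu": cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1],
        # 3. Adaptive Gaussian thresholding + Otsu
        "adaptive_otsu": cv2.threshold(adaptive, 0, 255,
                                       cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1],
        # 4. Stroke-width transformation: no OpenCV one-liner,
        #    left as a placeholder here.
    }
```

Each variant can then be fed to the plain OCR pass shown earlier.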

The rest of the preprocessing will be presented in Part 2.
