Predicting Hateful Memes on Social Media

Mukund Poddar
8 min read · Dec 15, 2021


This article was produced as part of the final project for Harvard’s AC215 Fall 2021 course.

Hate speech has become a growing problem on the internet, and on social media specifically, over the past decade. In a 2018 survey by the Anti-Defamation League, a nonprofit that tracks and fights anti-Semitism, 53% of respondents reported being subjected to hateful speech and harassment. For a third of Americans, the survey found, online abuse was in response to their sexual orientation, religion, race, ethnicity, gender identity, or disability. Around the world, the problem grows even more dire as more languages and cultural contexts are added to the mix. As a result of previous missteps, Facebook was recently sued by the Rohingya community for $150 billion over its role in propagating fake news that contributed to the deaths of over 10,000 people in targeted riots in 2017. Despite Facebook’s best efforts to tackle this massive problem, internal documents show that it still identifies only 3–5% of all hate speech on the platform.

Much of the cyber-bullying and hate speech on social media platforms happens in the form of memes targeting a community or specific people. This presents a special challenge to moderation because it is the image and the text together that make a meme hateful or benign. For example, in the images below, the text “Love the way you smell today” could be part of a hateful or a benign meme depending on whether the background is something sweet-smelling like a rose or a foul-smelling skunk. The problem is compounded further once we try to moderate content in other language and cultural contexts.

Proposition

We aim to build a deep-learning-based tool that takes a meme as input and classifies it as hateful or not hateful. We aim to package this as an API that can be consumed by multiple platforms, such as a browser extension that parents could use to keep their children safe from online hate. To allow interaction with this API, we built a front-end using React.js where anyone can upload an image and have it classified.

Datasets

We worked with two datasets for this project: the Hateful Memes Challenge (Kiela, 2021) and the Memotion Analysis (Sharma, 2020) datasets. Both consist of memes found on social media that were manually tagged as offensive or not. In the case of the Hateful Memes dataset, the Facebook AI team modified some of the original memes to guarantee that they are license-compliant. Below we list download links to both datasets.

Exploratory Data Analysis (EDA)

Tools

For our Exploratory Data Analysis (EDA), we chose Google Colab and carried out the analysis in Python using libraries such as NumPy, Pandas, TensorFlow, and Matplotlib.

Downloading and Reading Data

We used Kaggle’s API to load the data into our Colab notebook:

! touch kaggle.json
! echo '{"username":"<YOUR KAGGLE USERNAME>","key":"<YOUR API KEY>"}' > kaggle.json
! pip install kaggle
! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json

# downloads and unzips the Facebook Hateful Meme dataset
! kaggle datasets download -d parthplc/facebook-hateful-meme-dataset --force
! unzip facebook-hateful-meme-dataset.zip

# downloads and unzips the Memotion Analysis dataset
! kaggle datasets download -d williamscott701/memotion-dataset-7k --force
! unzip memotion-dataset-7k.zip

Loading Facebook Hateful Memes dataset

The Facebook Hateful Memes dataset is divided into an ‘img’ folder with all of the memes and three JSON Lines text files (train.jsonl, dev.jsonl, and test.jsonl) with image metadata.

The training and development .jsonl files list each image’s unique identification number (id), its file path (img), a label (label), and a transcript of any text that appears in the picture (text). The test.jsonl file differs from the train and dev files in that it does not include a label field. Below are two examples from the train.jsonl file and a diagram representing the dataset’s file structure:

{"id":1845,"img":"img\/01845.png","label":0,"text":"when you consume too much trans fat"}
{"id":75021,"img":"img\/75021.png","label":1,"text":"where's jaws when you need him?"}
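
These files can be read directly with Pandas; below is a minimal sketch, assuming the files sit in a data/ directory (adjust the paths to wherever the archive was unzipped):

import pandas as pd

# Each .jsonl file holds one JSON object per line, so lines=True parses it directly.
train_df = pd.read_json("data/train.jsonl", lines=True)
dev_df = pd.read_json("data/dev.jsonl", lines=True)

print(train_df.shape)                                   # number of memes and columns
print(train_df[["id", "img", "label", "text"]].head())  # the fields described above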

Sample Images

Methods

Multimodal content can be more difficult to classify than unimodal content. For instance, take the following examples.

By themselves, neither the text nor the images are harmful. However, when combined like so:

The multimodal meme becomes hateful.

However, if we swap out the image but keep the text, the meme again becomes harmless.

Researchers have focused on two broad frameworks for handling multimodal content: early-fusion and late-fusion systems.

Early-fusion systems fuse the image and text before classifying them. This framework is able to detect hateful content even if the image or text, by itself, is not hateful.

The other broad approach is late fusion.

Late-fusion systems classify the image and text separately and then average, or fuse, the two classifications. While these systems are much easier to build, they are also much less effective at understanding multimodal inputs.
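
As a toy illustration of why this matters, here is a minimal sketch; the probabilities and the late_fusion_score helper are hypothetical, not part of any library:

# Toy late-fusion sketch: two independent unimodal classifiers each output a
# "hateful" probability, and the final score is simply their average.
def late_fusion_score(p_image: float, p_text: float) -> float:
    return (p_image + p_text) / 2.0

# A hateful meme assembled from an innocuous image and innocuous text slips
# through, because neither unimodal model sees anything wrong on its own.
print(late_fusion_score(p_image=0.10, p_text=0.15))  # 0.125 -> classified not hateful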

Facebook released the dataset as part of their Hateful Memes Challenge. With the dataset, they also released the results of several baseline models they trained, asking challenge participants to do better than their own models.

While most challenge participants iterated on Facebook’s pretrained multimodal models, we were limited by the fact that Facebook developed these models in PyTorch. In addition, most existing multimodal pre-trained models outside of the Facebook challenge have been officially developed only in PyTorch. For instance, HuggingFace only offers a pre-trained VisualBert model for PyTorch.

We chose to implement our own version of Concat BERT due to the relatively simple architecture that we could efficiently build from scratch.

Concat BERT extracts features from the image using a pre-trained convnet such as VGG-16 or MobileNet and uses BERT for the textual embeddings. These features are then concatenated and run through several dense layers. We experimented with many different hyperparameter setups, convnets that were either frozen or unfrozen, and a few different NLP models, including BERT and Google’s Universal Sentence Encoder.

Here is an example of one such model summary we tested:
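
As a rough illustration of the architecture, a minimal version in TensorFlow might look like the following; it assumes MobileNetV2 for the image branch and the Universal Sentence Encoder for the text branch, and the layer sizes and dropout rate are illustrative rather than our exact tuned model:

import tensorflow as tf
import tensorflow_hub as hub

# Image branch: a frozen MobileNetV2 backbone producing a 1280-d feature vector.
image_in = tf.keras.Input(shape=(224, 224, 3), name="image")
scaled = tf.keras.layers.Rescaling(scale=1.0 / 127.5, offset=-1.0)(image_in)
backbone = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg", weights="imagenet")
backbone.trainable = False  # we also experimented with unfreezing the convnet
image_features = backbone(scaled)

# Text branch: the Universal Sentence Encoder producing a 512-d sentence embedding.
text_in = tf.keras.Input(shape=(), dtype=tf.string, name="text")
encoder = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4", trainable=False)
text_features = encoder(text_in)

# Fusion head: concatenate both embeddings and classify with a few dense layers.
fused = tf.keras.layers.Concatenate()([image_features, text_features])
x = tf.keras.layers.Dense(256, activation="relu")(fused)
x = tf.keras.layers.Dropout(0.3)(x)
output = tf.keras.layers.Dense(1, activation="sigmoid", name="hateful")(x)

model = tf.keras.Model(inputs=[image_in, text_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()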

We initially trained on Colab before moving over to GCP, where we made extensive use of Google’s Deep Learning VM Images, which offer integrated support for JupyterLab.

Results

Unfortunately, no matter how long we trained, whether for a few hours or for 24 hours, all of our ConcatBERT models, regardless of architecture and hyperparameters, seemed to get stuck in a local minimum; the validation accuracy would always plateau at around 63%.

We spent an extensive amount of time working to debug this, and we are very appreciative of the help we received from Shivas, our course Teaching Fellow. While 63% accuracy may sound good, this result is deceptive.

The predictions end up looking like the following for most data:

The model predicts a label of 0 (not hateful) the vast majority of the time. For instance, on the Facebook validation set, our model predicts a label of 0 (not hateful) 94% of the time. This allows it to reach 63% accuracy because 63% of the validation set is labeled not hateful.
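
A quick sanity check on the class balance makes this concrete; the path below is hypothetical and should point to wherever the dataset was unzipped:

import pandas as pd

# A constant "not hateful" predictor scores exactly the majority-class fraction,
# which is roughly the accuracy our trained model reaches on this split.
dev_df = pd.read_json("data/dev.jsonl", lines=True)
not_hateful_share = (dev_df["label"] == 0).mean()
print(f"Share of not-hateful labels in dev: {not_hateful_share:.1%}")
print(f"Accuracy of always predicting 0:    {not_hateful_share:.1%}")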

Interestingly, when we validate our model on the Memotion dataset, we get slightly different results: a validation accuracy of 83%. However, the Memotion dataset is even more imbalanced, with not-hateful memes making up 97% of the data. Our model predicts not hateful on this dataset 84% of the time. It does seem, then, that the model is learning something: it is not always predicting not hateful, and it predicts not hateful at different rates depending on the dataset. Regardless, performance is lacking.

Discussion

Our results show that ConcatBERT is sensitive to the dataset: there is approximately a 20-percentage-point difference in validation accuracy between the Facebook and Memotion datasets, with the latter yielding better performance. In addition, the model overfits to the harmless label, and its predictions are biased towards harmless, since the overwhelming majority of examples in both datasets are labeled not hateful. Given the low validation accuracy, especially on the Facebook data, we hypothesize that the model has trouble learning the multiple contexts of what is considered hateful or offensive when presented with both the image and the text.

For future work, we would like to use larger multimodal models such as VisualBERT or Visual-Linguistic BERT rather than simply concatenating a vision model and a language model. Ideally, we would want our model to learn the relationship between a given text and its corresponding image rather than making classifications based on separate image and language embeddings.

Detecting hateful content remains a challenging task! There is some degree of subjectivity in what different groups of people consider hateful. While there is certainly content that the vast majority of people would find hateful, there is also a lot of content on which people disagree. So, if we as humans cannot come to a definitive conclusion, we have to question whether computers can really learn the concept of hatefulness.

Hence, it is not surprising to hear that Facebook fails to detect more than 95% of the hateful content on its platform. If Facebook or any other entity intends to take down hateful content, this has profound implications. If Facebook’s internal automated system for detecting hateful content is only 3 to 5% effective, is it better to have a model biased towards a hateful classification? How do we keep harmful content offline and prevent it from spreading on social networks without flagging benign content as false positives? Most people agree that the current system of content moderation on social media is not working. Is it then worth creating a model that biases towards the hateful class? All of these issues remain difficult to resolve, both philosophically and technically. We hope that our work can contribute to continuing the discussion around such essential societal issues.
