Multimodal Automated Content Moderation

Tech @ ShareChat
ShareChat TechByte
May 26, 2021

Preserving Integrity at ShareChat & Moj (Part I)

Written by Jatin Mandav, Rishubh Parihar, Srijan Saket, Vikram Gupta, Debdoot Mukherjee

Millions of posts are created every day on ShareChat and Moj, which makes timely and precise detection of Integrity Violating Content (IVC) crucial for ensuring the safety of our users and the integrity of our platform. Given the sheer volume of these posts, it is not possible to moderate every post manually, so our content moderators are assisted by advanced multimodal algorithms that flag suspicious content and route it through the moderation workflow.

Since these posts are in the form of images, text or videos, our content moderation models must process multiple modalities together effectively. For instance, in Figure 1, understanding the post correctly requires building an understanding of the text and the visual together; either modality alone does not provide enough context.

Our mantra is the safety of our users and the integrity of our platform.

Let us start with the first part of a three-part series (part 2, part 3), where we present our journey of developing best-in-class multimodal AI algorithms for the automated moderation of multimodal content. We discuss why various off-the-shelf content moderation solutions do not work for us, and delve into the innovative techniques we employ to deal with the scarcity of labelled data for IVC detection.

Figure 1. Example showing two pictures with the same text but very different contexts. The first is IVC, whereas the second is a safe brand-promotion post.

What is Integrity Violating Content (IVC)?

Our platform sees various types of Integrity Violating Content (IVC): NSFW (Not Safe For Work) posts that can include CSAM, nudity, violence, gore etc., bait posts that may trick the user into sharing, liking or clicking, spammy posts, hate speech, fake news and so on. A few examples of these posts are shown in Figure 2.

NSFW (Not Safe for Work) posts contain pornographic content or depict illegal activities such as violence, gore, suicide etc.

Engagement baits entice the user to like/share a post or to follow the creator, but usually do not have high-quality content.

Spam: similar to engagement baits, this is content where creators repeatedly upload false advertisements to gain a larger reach.

Figure 2. Examples of Integrity Violating Content (IVC) on our content sharing platforms ShareChat and Moj

The aim of the creator of such content is to make the post reach a large audience and to collect likes and shares. Such content often does get a huge number of likes and shares, which fulfils the creator's purpose but results in a bad user experience. Hence, it is important to detect and remove such content from the platform as early as possible.

Content Moderation Workflow

Our content moderation workflow comprises human moderators and AI algorithms that work in tandem to flag Integrity Violating Content (IVC) as soon as it is created on our platforms.

Due to the sheer scale of content posted every day on the platform (3 million+ daily posts), it is not possible to moderate every post manually. To tackle the problem at this scale, our content moderators are assisted by advanced multimodal algorithms that flag suspicious content and route it through the moderation workflow. Figure 3 presents a high-level summary of the workflow.

Content Moderation Policy: Proactive, Reactive and User Triggered

Figure 3. Content moderation workflow

Proactive Approach: Every post created by our creators is examined by a series of AI models to obtain a probability score of the post being IVC. Based on the confidence of these predictions, the post may be flagged for manual review. After a post is flagged as IVC by the models, the action taken on it and its prioritisation for manual review depend on the severity of the detected IVC class. For instance, NSFW posts may be prioritised higher in manual moderation and may invoke a stricter ban on the creator than bait posts. This manual review of flagged posts not only helps us control false rejections, but also provides labels that help the models train better, so that similar posts are automatically discarded subsequently.
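
For illustration, here is a minimal sketch of how such confidence-based routing might look. The thresholds, class names and severity ranks below are hypothetical placeholders, not our production values.

```python
# Hypothetical sketch of confidence-based routing for flagged posts.
# Thresholds, class names and severity ranks are illustrative.
from dataclasses import dataclass

# Higher rank = more severe; used to order the manual-review queue.
SEVERITY = {"nsfw": 3, "hate_speech": 2, "bait": 1, "spam": 1}

@dataclass
class Flag:
    post_id: str
    ivc_class: str
    score: float  # model probability that the post is IVC

def route(flag: Flag, auto_threshold: float = 0.98,
          review_threshold: float = 0.6) -> str:
    """Decide what happens to a post given the model's IVC prediction."""
    if flag.score >= auto_threshold:
        return "auto_discard"    # very confident: take down immediately
    if flag.score >= review_threshold:
        return "manual_review"   # uncertain: send to a human moderator
    return "publish"             # likely safe: let the post through

def review_priority(flag: Flag) -> float:
    # Severe classes (e.g. NSFW) jump ahead of baits in the review queue.
    return SEVERITY.get(flag.ivc_class, 0) + flag.score

flags = [Flag("p1", "bait", 0.7), Flag("p2", "nsfw", 0.65)]
queue = sorted((f for f in flags if route(f) == "manual_review"),
               key=review_priority, reverse=True)
print([f.post_id for f in queue])  # NSFW post reviewed first: ['p2', 'p1']
```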

User feedback: Apart from the automated flagging system, IVC is also reported by users of our platform. Generally, these posts appear on the platform when the models fail to flag them as IVC. To prevent abuse of the report feature, reported posts are routed to the content moderation team for verification. Since our AI models failed to detect these posts, we label them and use these examples for retraining our models.

Reactive Approach: Another strategy we have adopted is to routinely review highly viral content, as it impacts a large number of users. Every post that crosses a preset virality threshold is reviewed manually by expert reviewers.

Performance Metrics: As we constantly improve our models and processes for content moderation, we are guided by the following metrics, which measure our progress in IVC detection even in an adversarial setting where bad actors keep finding new ways to undermine the integrity of our platforms.

  • Views on IVC posts: Despite our best efforts, some IVC posts may slip through the gates and get exposed to some of our users before eventually being discarded. We measure the number of times an IVC post is viewed before its removal.
  • Automatic moderation rate: We measure the fraction of IVC posts that are detected by the AI-based moderation systems, as opposed to user reports or manual review. A toy computation of both metrics is sketched below.
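
The following toy computation shows how both metrics could be derived from moderation logs; the field names and numbers are purely illustrative.

```python
# Toy computation of the two guard-rail metrics. Field names are
# illustrative; a real pipeline would aggregate these from event logs.
ivc_posts = [
    {"post_id": "a", "views_before_removal": 120, "caught_by": "model"},
    {"post_id": "b", "views_before_removal": 15_000, "caught_by": "user_report"},
    {"post_id": "c", "views_before_removal": 40, "caught_by": "model"},
]

# Views on IVC posts: exposure before take-down (lower is better).
total_views = sum(p["views_before_removal"] for p in ivc_posts)

# Automatic moderation rate: share of IVC caught by models (higher is better).
auto_rate = sum(p["caught_by"] == "model" for p in ivc_posts) / len(ivc_posts)

print(total_views, f"{auto_rate:.0%}")  # 15160 67%
```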

Why don’t off-the-shelf solutions work?

Figure 4. Unique properties of the content posted on our platform.

IVC detection for images is a common problem for any content-driven platform, and there are existing solutions for the task such as Google Vision, Clarifai, Amazon Rekognition etc. However, these do not work well for our purpose, for the following reasons:

  1. Multimodal: Most of the above-mentioned models use only visual information to detect low-quality content, whereas our content is multimodal in nature. As mentioned earlier, our models adopt a multimodal approach that processes visual, audio and text data in conjunction to classify IVC.
  2. Multilingual: Our content is multilingual in nature and covers a wide variety of Indian languages, demanding more specialized models tuned for the Indian context.
  3. Cultural and Geographical Context: These solutions are trained on global data and work reasonably well on common scenarios, but miss the cultural and geographical subtleties that are important for correctly classifying more nuanced examples. For instance, censored content (using blurring) may not be considered IVC in some regions but is taken to be offensive in others.
  4. Creator Profile: The profile of the creator also plays an important role in IVC detection. A creator with a good track record is less likely to post such content, so we want our models to take the creator's context into account while grading the content.
  5. Explainability: Since we want to encourage and educate our creators about content quality, we also want our models to explain the reason behind the removal of the post. We take special steps to make our models more explainable.
  6. Non-consensual Content: Non-consensual content is an important problem, as these scenarios require a fine-grained and deeper understanding of the content. Our custom datasets cover such scenarios.
  7. Image Manipulation: We observe that off-the-shelf solutions fail on images that have been manipulated using overlaid image filters. These filters are an important part of our platform, and our IVC models need to be robust to them. Moreover, since overlays and image manipulation tricks evolve over time, continual learning allows our models to capture such patterns better.
  8. Evolving Policies & Trends: The definition of what constitutes IVC content is continually updated to keep up with the latest content trends that emerge on the platform. Our models are retrained at frequent intervals to reflect the changes in policies and capture new trends.

In-house Dataset

We have an in-house dataset of IVC which is updated continuously and meets all of the requirements above. To enhance the explainability of predictions, we annotate IVC posts into further subcategories that cover different types of IVC (e.g. gore, violence, nudity, bait etc.). As manual annotation of millions of posts is time-consuming as well as expensive, we employ smart data augmentation techniques to create large datasets with minimal manual labelling. Moreover, since the ratio of IVC to acceptable content is very small, finding sufficient amounts of IVC content across all the subcategories is a major challenge in itself and needs advanced methods. We describe the initial baselines for these methods below, and will present more recent advances in this direction in Part 3 of this series.

Label Propagation

The core idea of label propagation is to propagate labels from a small set of labelled examples to unlabelled examples based on their similarity with the labelled examples.

To capture similarity, we obtain a good semantic representation of the content, taking all modalities into consideration. We start the process with a set of labelled examples (Figure 5(a)). For every unlabelled example (red circle in Figure 5(b)), we find the nearest neighbours with a high similarity measure in the multimodal embedding space, and assign the same labels to this example, as shown in Figure 5(c). This process repeats for all the unlabelled examples.

Since the amount of labelled data increases with each iteration, finding nearest neighbours efficiently becomes challenging; we use FAISS for this. Although label propagation is effective, it does not work in conjunction with the training of the task-specific model. Moreover, the quality and coverage of the initial samples play an important role in getting good-quality annotations. We therefore leverage active learning to solve this problem.

Figure 5. Label propagation. We start with a few labelled samples (a). For an unlabelled sample (red circle) we find the nearest labelled sample (b) and assign its label to the new sample if the similarity metric crosses a certain threshold (c).
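
Below is a minimal sketch of neighbour-based label propagation using FAISS, assuming the embeddings come from a multimodal encoder; the similarity threshold, `k` and the majority-vote rule are illustrative choices rather than our production settings.

```python
# Minimal label-propagation sketch with FAISS. Embeddings, threshold and k
# are illustrative; real embeddings come from a multimodal encoder.
import numpy as np
import faiss

def propagate_labels(labelled_emb, labels, unlabelled_emb, k=5, threshold=0.0):
    d = labelled_emb.shape[1]
    # Normalise so that inner product == cosine similarity.
    faiss.normalize_L2(labelled_emb)
    faiss.normalize_L2(unlabelled_emb)
    index = faiss.IndexFlatIP(d)
    index.add(labelled_emb)

    sims, nbrs = index.search(unlabelled_emb, k)  # (n, k) similarities / ids
    new_labels = []
    for sim_row, nbr_row in zip(sims, nbrs):
        keep = sim_row >= threshold          # only confident neighbours vote
        if not keep.any():
            new_labels.append(None)          # too far from any labelled post
            continue
        votes = labels[nbr_row[keep]]
        new_labels.append(np.bincount(votes).argmax())  # majority vote
    return new_labels

# Toy usage with random 128-d embeddings and two classes.
rng = np.random.default_rng(0)
lab = rng.standard_normal((100, 128)).astype("float32")
unlab = rng.standard_normal((10, 128)).astype("float32")
y = rng.integers(0, 2, size=100)
print(propagate_labels(lab, y, unlab))
```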

Active Learning

We use active learning to implement continual learning in our models. We have observed that model performance can worsen over time if the models are not updated regularly. This happens because new edge cases enter the system and people find innovative ways to manipulate and fool it.

Continual learning of our models aided with active learning ensures that our models are able to keep up with the dynamic and adversarial nature of the IVC detection problem.

Figure 6. Overall illustration of Active Learning

Active learning also reduces the amount of data that has to be manually annotated, saving us time and effort. In an active learning framework, we first train the model on the labelled training data and then use it to predict on the unlabelled data. All the data points where the model is uncertain about its predictions are sent for manual annotation. The uncertainty can be measured in terms of the entropy of the predictions or the ratio of the probabilities of the top two predicted categories.

High entropy signifies that the model is confused and the example should be sent for manual annotation, while low entropy, caused by a peaky probability distribution, suggests a confident prediction.

The ratio of the probabilities of the first and second best predicted categories is another way of measuring the model's confusion; a ratio close to one means the model is torn between its top two choices. We use these measures to select posts for manual annotation. We also focus on building a label taxonomy so that we can keep a check on model performance for specific categories.
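
A minimal NumPy sketch of both uncertainty measures, assuming `probs` holds the model's softmax outputs over IVC categories; the cut-off values are hypothetical.

```python
# Two common uncertainty scores for active learning, sketched with NumPy.
import numpy as np

def entropy(probs, eps=1e-12):
    # High entropy = flat distribution = model is unsure.
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def top2_ratio(probs):
    # Ratio of the two largest probabilities; close to 1 = model is torn
    # between its top two choices.
    top2 = np.sort(probs, axis=-1)[..., -2:]
    return top2[..., 0] / top2[..., 1]

probs = np.array([
    [0.96, 0.02, 0.02],   # confident -> keep automated decision
    [0.40, 0.35, 0.25],   # uncertain -> send for manual annotation
])
send_to_annotators = (entropy(probs) > 0.9) | (top2_ratio(probs) > 0.8)
print(send_to_annotators)  # [False  True]
```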

Semi-supervised Learning

As noted earlier, positive samples of IVC are very sparse, and thus finding sufficient amounts of IVC data across all the IVC subcategories is a huge challenge in itself. To tackle this problem, we have been working on semi-supervised learning techniques, viz. Mean Teacher models, that can be trained with few labelled samples without compromising performance. We will discuss these techniques in the third part of this series.

Multimodal Model Architectures

Developing neural architectures that can effectively combine representations learnt from visual, audio and text signals, as well as additional context on creators, is a key area of focus for improving IVC detection. We have experimented with different techniques to fuse representations learnt from the different modalities: Early Fusion, Late Fusion and Mix Fusion.

In Early Fusion, we fuse the multimodal features at an early stage, using concatenation, addition etc., and use the aggregated features as input to the IVC classifier. The features of the different modalities are appropriately normalised so that their values are in a similar range.

Figure 7. Early Fusion and Late Fusion of modalities

In Late Fusion, we pass the features of each modality as input to a modality-specific IVC classifier. Each modality produces a probability distribution, and we fuse the predictions of the modalities using another classifier.

We notice interesting results when using a combination of late and early fusion: some modalities are fused early while the others are fused at a later stage. We call this setup Mix Fusion.
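
The sketch below illustrates early and late fusion in PyTorch for two modalities; the feature dimensions and classifier shapes are illustrative, and real encoders would be far larger.

```python
# Sketch of early vs. late fusion in PyTorch. Dimensions are illustrative.
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    def __init__(self, vis_dim=512, txt_dim=256, n_classes=5):
        super().__init__()
        # Concatenate modality features, then classify the joint vector.
        self.classifier = nn.Sequential(
            nn.Linear(vis_dim + txt_dim, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, vis_feat, txt_feat):
        return self.classifier(torch.cat([vis_feat, txt_feat], dim=-1))

class LateFusion(nn.Module):
    def __init__(self, vis_dim=512, txt_dim=256, n_classes=5):
        super().__init__()
        # One classifier per modality; a small head fuses their predictions.
        self.vis_head = nn.Linear(vis_dim, n_classes)
        self.txt_head = nn.Linear(txt_dim, n_classes)
        self.fuser = nn.Linear(2 * n_classes, n_classes)

    def forward(self, vis_feat, txt_feat):
        vis_logits = self.vis_head(vis_feat)
        txt_logits = self.txt_head(txt_feat)
        return self.fuser(torch.cat([vis_logits, txt_logits], dim=-1))

vis, txt = torch.randn(4, 512), torch.randn(4, 256)
print(EarlyFusion()(vis, txt).shape, LateFusion()(vis, txt).shape)
# torch.Size([4, 5]) torch.Size([4, 5])
```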

In the next sections of this post, we delve deeper into the visual classifiers for NSFW detection on images and videos, to make the discussion concrete.

Training Vision Backbone

In this section, we explain some of the vision models that we have developed for detecting IVC content on the platform. We leverage state-of-the-art architectures for training visual classifiers. We bootstrap training using the OpenImages dataset, as it has 5000 classes, including classes such as organs, bikini, swimsuits, blood etc. that can be indicators of NSFW content. Next, we train a multiclass classifier on top of this feature extractor to classify among multiple IVC categories.

Figure 8. PCA embedding space of pre-trained Resnet101 model (left) and multi-class classifier model (right). Different colors represent various categories of NSFW

In Figure 8, we observe that even the pre-trained Resnet101 trained on OpenImages is able to separate out some of the IVC categories. However, some classes overlap in the embedding space (lower-left corner). Training a classifier on our IVC dataset (right) gives a better embedding space and separates the IVC categories well. We do notice an overlap of clusters at the center of the plot, but on investigating further we found that the overlapping clusters correspond to similar IVC categories containing images with similar labels. Fine-tuning the Resnet101 CNN backbone, instead of only using it as a frozen feature extractor, leads to further gains.
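
As a rough sketch, this is what replacing the classification head of a pretrained Resnet101 and then fine-tuning end-to-end can look like with torchvision. Note that we load ImageNet weights here purely for convenience (the backbone in this post is bootstrapped on OpenImages), and the class count and learning rates are hypothetical.

```python
# Sketch of swapping the classification head of a pretrained ResNet101.
# Requires torchvision >= 0.13 for the weights enum API.
import torch
import torch.nn as nn
from torchvision import models

N_IVC_CLASSES = 6  # hypothetical number of IVC subcategories

model = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)

# Stage 1: use the backbone as a frozen feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with an IVC classifier head
# (the new layer is trainable by default).
model.fc = nn.Linear(model.fc.in_features, N_IVC_CLASSES)

# Stage 2 (after the head converges): unfreeze the backbone and fine-tune
# end-to-end, which the post reports gives further gains.
for param in model.parameters():
    param.requires_grad = True

optimizer = torch.optim.SGD([
    {"params": model.fc.parameters(), "lr": 1e-3},
    {"params": (p for n, p in model.named_parameters()
                if not n.startswith("fc")),
     "lr": 1e-4},  # gentler updates for pretrained layers
], momentum=0.9)
```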

Ensemble of Binary Classifiers

The multi-class classifier fine-tuned on our dataset forms a reasonable baseline, but the cross-entropy loss heavily penalizes misclassification among similar IVC classes. For instance, it might be acceptable to confuse nudity with suggestive nudity, but the loss penalizes such mistakes as heavily as any other, which hampers training. To address this issue, we experimented with training multiple shallow but expert binary classification models, each over a subset of IVC categories. We combine the outputs of all these binary expert models using a separate classifier to generate the final IVC classification. We observe an average improvement of 7.9% in recall with this approach compared to a multi-class classifier.

Figure 9. Ensemble of Binary IVC Classifiers
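
A minimal sketch of such an ensemble: shallow binary "expert" heads over shared backbone features, fused by a separate classifier. The number of experts, dimensions and category groupings here are illustrative.

```python
# Sketch of an ensemble of shallow binary experts fused by a classifier.
import torch
import torch.nn as nn

class BinaryExpert(nn.Module):
    """Shallow binary classifier for one subset of IVC categories."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, x):
        return torch.sigmoid(self.net(x))  # P(this IVC group | features)

class ExpertEnsemble(nn.Module):
    def __init__(self, n_experts=4, feat_dim=2048, n_classes=6):
        super().__init__()
        self.experts = nn.ModuleList(BinaryExpert(feat_dim)
                                     for _ in range(n_experts))
        # Separate classifier that combines the experts' probabilities.
        self.combiner = nn.Linear(n_experts, n_classes)

    def forward(self, x):
        expert_probs = torch.cat([e(x) for e in self.experts], dim=-1)
        return self.combiner(expert_probs)  # final IVC logits

feats = torch.randn(8, 2048)  # e.g. backbone features
print(ExpertEnsemble()(feats).shape)  # torch.Size([8, 6])
```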

Leveraging Creator Profile

Tracking the quality of the content published by a creator plays an important role in improving IVC detection. At ShareChat and Moj, we carefully examine the content uploaded by our creators as well as the engagement their posts receive. This helps us encourage creators who produce engaging and entertaining content, while educating creators who are not getting enough views or are violating our policies. We observe that IVC usually comes from creators who are either unaware of the policies and guidelines or are deliberately posting such content. To this end, we add a representation of the creator to our model which takes their history of violations into account. We are also exploring methods to improve these creator representations to capture collusion between offenders on the platform.

We observe an average improvement of ~25% in recall by leveraging the creator history along with content understanding.

Figure 10. Fusing creator profile
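
A minimal sketch of fusing a creator representation with content features; the raw creator features (e.g. past violations, account tenure) and all dimensions are hypothetical.

```python
# Sketch of fusing a creator representation with content features.
import torch
import torch.nn as nn

class CreatorAwareClassifier(nn.Module):
    def __init__(self, content_dim=2048, creator_dim=16, n_classes=6):
        super().__init__()
        # Embed raw creator-history features into a small dense vector.
        self.creator_encoder = nn.Sequential(nn.Linear(creator_dim, 32),
                                             nn.ReLU())
        self.classifier = nn.Linear(content_dim + 32, n_classes)

    def forward(self, content_feat, creator_feat):
        creator_emb = self.creator_encoder(creator_feat)
        fused = torch.cat([content_feat, creator_emb], dim=-1)
        return self.classifier(fused)

content = torch.randn(4, 2048)  # multimodal content features
creator = torch.randn(4, 16)    # e.g. past violations, tenure, reach
print(CreatorAwareClassifier()(content, creator).shape)  # torch.Size([4, 6])
```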

IVC Detection in Videos

In the previous sections, we discussed vision models that process a single image. For IVC detection on videos, we start with a baseline approach that runs inference with the same models on different frames of a video and aggregates the frame-level predictions. To manage computation cost and maintain real-time performance, we sample only a few frames from each video. Frame sampling needs to be performed intelligently, because IVC content can be confined to a few frames that may be missed completely during sampling.

To this end, we extract Hecate frames from the video, which capture the most distinct and clear frames and help us trade off accuracy against computation cost. In the second part of this series, we will discuss how we have significantly improved upon this baseline by efficiently incorporating the spatiotemporal information present in videos.

Figure 11. Frame based model
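
A rough sketch of the frame-based baseline: sample a few frames, score each with the image model, and aggregate. Uniform sampling stands in here for Hecate keyframe extraction, and max-pooling over frames is one plausible aggregation choice, not necessarily the one used in production.

```python
# Baseline video IVC sketch: score sampled frames with the image model
# and aggregate frame-level predictions.
import torch

def sample_frame_indices(n_frames: int, n_samples: int = 8):
    # Uniformly spaced indices; Hecate instead picks diverse, sharp keyframes.
    step = max(n_frames // n_samples, 1)
    return list(range(0, n_frames, step))[:n_samples]

@torch.no_grad()
def video_ivc_scores(frames: torch.Tensor, image_model: torch.nn.Module):
    """frames: (T, C, H, W) tensor of sampled frames."""
    frame_probs = image_model(frames).softmax(dim=-1)  # (T, n_classes)
    # Max over frames: a single strongly violating frame flags the video.
    return frame_probs.max(dim=0).values               # (n_classes,)

# Toy usage with a stand-in "model".
dummy_model = torch.nn.Sequential(torch.nn.Flatten(),
                                  torch.nn.Linear(3 * 8 * 8, 6))
frames = torch.randn(len(sample_frame_indices(240)), 3, 8, 8)
print(video_ivc_scores(frames, dummy_model))
```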

Coming Up Next

This post, the first part of our three-part series, introduced the problem of multimodal content moderation and discussed some initial approaches. However, accurate yet efficient multimodal content moderation calls for more sophisticated mechanisms. In the second part, we will discuss architectures for modelling spatiotemporal information, along with ideas like knowledge distillation to increase the efficiency of video models. In the third part, we will discuss semi-supervised learning approaches for making these models accurate without being data-hungry.

Read the second part and third part here.

Designed by Ritesh Waingankar and Vivek V.
