Can You Label Less by Using Out-of-Domain Data?

Rafal Kocielnik
Trustworthy Social Media
5 min read · Feb 5, 2023

Check out our paper published at the Workshop on Transfer Learning for Natural Language Processing at the NeurIPS 2022 conference.

Social media propagates many toxic, biased, and divisive comments. Despite significant advances in Machine Learning (ML), detecting such content is still very challenging. The definitions of what is considered toxic and the expressions of toxicity evolve quickly and require careful judgment by skilled annotators. Unfortunately, labeling social-media data for dimensions of harmful content is challenging and labor-intensive. At the same time, most ML approaches require vast amounts of labeled data to be effective.

Active & Transfer Learning with Few-shot Instructions (ATF) without any model fine-tuning!

To reduce the annotation effort, we combine several known techniques under one Active Transfer Few-shot Instructions (ATF) approach, which is flexible to varying definitions of harmful content, requires no training or fine-tuning, and can be executed in near real time. ATF uses transfer learning to leverage labels from existing labeled datasets, even if these were labeled under different definitions. It also uses active learning techniques to work with limited annotations and to proactively request the most beneficial annotations from a human labeler. Finally, ATF leverages the existing internal linguistic knowledge of large pre-trained language models (PLMs) via natural-text few-shot prompts. A step-by-step overview of ATF is presented in Fig 1.

Figure 1 — Overview of our Active Transfer Few-shot Instructions (ATF) framework for efficient labeling of text data.
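Below is a minimal sketch of how an ATF-style labeling step could be wired together. It is illustrative rather than the authors' implementation: the HuggingFace model name, the instruction wording, and the helper functions (build_prompt, score_yes, select_for_annotation) are assumptions standing in for the Megatron-based few-shot setup used in the paper.

```python
# A minimal sketch of an ATF-style labeling loop, NOT the authors' exact implementation.
# Assumptions: a HuggingFace causal LM stands in for the Megatron models used in the
# paper, and `source_examples` are (text, label) pairs transferred from SBIC/HASOC.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2-large"  # placeholder; the paper uses NVIDIA Megatron models
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def build_prompt(instruction, source_examples, target_text):
    """Few-shot instruction prompt: transferred exemplars + the unlabeled target document."""
    shots = "\n\n".join(
        f"Text: {text}\nAnswer: {'Yes' if label else 'No'}"
        for text, label in source_examples
    )
    return f"{instruction}\n\n{shots}\n\nText: {target_text}\nAnswer:"

@torch.no_grad()
def score_yes(prompt):
    """Probability mass on ' Yes' vs. ' No' as the next token; used as a soft label."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    logits = model(ids).logits[0, -1]
    yes_id = tokenizer.encode(" Yes")[0]
    no_id = tokenizer.encode(" No")[0]
    probs = torch.softmax(logits[[yes_id, no_id]], dim=-1)
    return probs[0].item()

def select_for_annotation(unlabeled_texts, instruction, source_examples, budget=10):
    """Active-learning step: request human labels for the least confident documents."""
    scores = [score_yes(build_prompt(instruction, source_examples, t)) for t in unlabeled_texts]
    uncertainty = [abs(s - 0.5) for s in scores]  # closest to 0.5 = most uncertain
    ranked = sorted(range(len(scores)), key=lambda i: uncertainty[i])
    return ranked[:budget]  # indices to send to a human labeler
```

The key design choice mirrored here is that no gradients are needed: the transferred source-domain labels enter only through the prompt, and the active-learning step simply re-ranks unlabeled documents by the model's uncertainty.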

Application to Real-World #MeToo Twitter Dataset

We applied our ATF framework to annotate a real-world #MeToo dataset of 7.55 million tweets collected from Twitter between January 2017 and September 2019. We label this dataset for the “Sexually Explicit” and “Toxic” dimensions. We transferred labels from existing human-labeled datasets:

  • Social Bias Frames (SBIC) [Sap’20] — the dataset contains 34k documents in the training set, labeled under categories in which people project social biases and stereotypes onto others. It is labeled for “Offensive” and “Lewd” content, with meta dimensions on whether the harmful speech was intentional (“Intent”) and whether it targets a particular social group (“Group”).
  • Hate Speech and Offensive Content Identification (HASOC) [Mandl’19] — the dataset contains 6k documents from Twitter and Facebook, labeled for “HOF” (whether a post contains hate, offensive, or profane content) and “Target” (whether an insult is targeted or untargeted).

It is worth noting that both datasets have been annotated for dimensions of harmful content different from our intended labeling on #MeToo. Furthermore, the documents in these datasets come from different social media platforms, such as Reddit and Facebook.

Figure 2 — A) Positive and negative transfer from existing datasets (SBIC & HASOC) to Perspective API-labeled dimensions (“Sexually Explicit” and “Toxicity”) on a real-world dataset (#MeToo). The X-axis shows different sizes of NVIDIA’s Megatron model. The Y-axis is the AUC metric of classification performance, where higher values are better. B) Relative impact of target annotation size on transfer effectiveness.

In Fig 2, we see the results of using our ATF framework to transfer labeling knowledge from these existing datasets to our target dimensions of interest on #MeToo. Two scenarios essentially emerge: positive and negative transfer.

  • Positive transfer (#MeToo “Sexually Explicit”) — gains from transfer are sustained across model sizes (Fig 2.A) and annotation sizes (Fig 2.B). As the model size increases, the gains from positive transfer decrease only slightly (a 2.45% gap between the gains for Meg1.3b and Meg22b) and the overall effectiveness of transfer is largely retained (Fig 2.B).
  • Negative transfer (#MeToo “Toxicity”) — small gains can be inconsistent and turn into losses (a 1.91% gain for Meg1.3b, a 3.3% loss for Meg8.3b, and a 0.9% gain for Meg22b).

The main takeaways from these results are that: 1) whether positive or negative transfer occurs is consistent across model sizes (Fig 2.A) and target annotation sizes (Fig 2.B), 2) a higher initial baseline AUC for the models likely contributes to negative transfer, and 3) transfer effectiveness can increase with a small target-domain annotation size, but diminishes with an increasing number of annotations (Fig 2.B).

Insights Into the Positive & Negative Transfers

We perform additional analysis to understand the nature of positive and negative transfers. Comparing the two scenarios, we can also see that the initial baseline performance of the models is consistently higher in the negative transfer scenario (mean AUC of 58.9) than in the positive one (mean AUC of 55.6). In fact, the higher the initial performance of the PLM at a given annotation size (i.e., without source-domain data), the lower the AUC gain from the transfer (ρ = −0.66).

We also examine whether the sheer amount of external data from the source domain impacts transfer effectiveness. We find that the smaller HASOC dataset (6k) actually offers a higher mean gain of 7.54%, compared to a mean gain of 4.76% from the much larger SBIC (34k) in the same setup. We find that the difference in label imbalance between the source and target datasets is not correlated with the AUC gain from transfer (ρ = 0.14). We also find that the correlation between source and target labeling dimensions, estimated on the source dataset (i.e., SBIC or HASOC), is only weakly related to the AUC gain (ρ = −0.27).
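For readers who want to run this kind of analysis on their own results, a minimal sketch is shown below. The function name and the choice of Pearson correlation are assumptions for illustration (the post only reports ρ values); the inputs are arrays of per-configuration baseline AUCs and AUC gains.

```python
# A minimal sketch of the correlation analysis described above; the use of Pearson
# correlation and the function/argument names are assumptions, not the authors' code.
from scipy.stats import pearsonr

def baseline_vs_gain_correlation(baseline_aucs, auc_gains):
    """Correlate the no-transfer baseline AUC with the AUC gain obtained from transfer."""
    rho, p_value = pearsonr(baseline_aucs, auc_gains)
    return rho, p_value
```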

The main takeaway is that negative transfer is more likely to happen if the initial PLM baseline on the task is higher. In other words, if the PLM’s internal knowledge is sufficient to perform well on the task, the value of supplying additional external information via transfer is limited.

Limitations, Applications, and Future Work

One limitation of our work is that the datasets we use rely on crowd-sourced labeling by untrained annotators, which can be noisy and shaped by personal biases and perceptions [Binns’17]. Perspective API labeling has known limitations of its own [Bender’21]. Furthermore, PLMs can themselves be biased and toxic when prompted [Gehman’20], which also likely allows them to detect these dimensions based on their internal knowledge.

Some possible future applications of our ATF method:

  • Noisy pre-labeling of unlabeled datasets and selecting samples for future fine-tuning (e.g., via disagreement-based active learning [11]); a sketch of this selection step follows the list below.
  • With limited initial human labeling of as few as 100 random documents, if the baseline few-shot performance is poor, using pre-labeled out-of-domain data can improve the AUC without expending more human annotation effort.
  • We also plan to use this method to efficiently label custom dimensions of toxicity relevant to #MeToo and other real-world data, which are currently not supported by tools such as Perspective API.
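As a companion to the first application above, here is a sketch of disagreement-based sample selection for fine-tuning. The committee of prompt/model variants and the vote-entropy criterion are illustrative choices, not necessarily those of the cited active-learning work.

```python
# A sketch of disagreement-based selection: pre-label each text with several
# prompt/model variants and keep the most contested ones for human labeling.
import math

def vote_entropy(votes):
    """Entropy of the committee's votes; higher = more disagreement."""
    n = len(votes)
    entropy = 0.0
    for label in set(votes):
        p = votes.count(label) / n
        entropy -= p * math.log(p)
    return entropy

def select_disagreement_samples(texts, committee, budget=100):
    """Rank texts by committee disagreement and return the top `budget` for annotation."""
    scored = []
    for text in texts:
        votes = [labeler(text) for labeler in committee]  # each labeler returns 0 or 1
        scored.append((vote_entropy(votes), text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:budget]]
```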
