About biases in the data and how that affects the factual knowledge language models learn
In this blog post we’re going to talk about three papers presented at EMNLP 2021 [5, 6, 7] that touch on biases in the training data and how they affect the factual knowledge that language models learn.
About language models and factual knowledge
Pre-trained Language Models (PLMs) such as BERT, RoBERTa, etc., are deep learning models trained to process and generate text. We call them “pre-trained” because they are first trained on text from the web in a self-supervised manner, and are then usually fine-tuned for downstream tasks. In general, language models receive a sequence of tokens as input and output a vector representation for each input token. One of the most common ways to pre-train PLMs is to mask out some parts of the input sequence by replacing them with the special token “[MASK]”. Then, we optimize the model to predict, from the representation of the masked token, the token that was originally there. In this way, the model learns to represent language, using the context to work out, for example, whether a verb should be in the present or the past tense.
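To make the masking step a bit more concrete, here is a minimal sketch in plain Python. It is a simplification under our own assumptions: the function name `mask_tokens`, the whitespace tokenization, and the seed are chosen only for illustration, and real pre-training pipelines work on subword tokens and use more elaborate replacement rules.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns the masked sequence and, for each position, the original
    token the model should recover (None where nothing was masked).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)    # training target for this position
        else:
            masked.append(tok)
            labels.append(None)   # no prediction needed here
    return masked, labels

# Toy sentence, split on whitespace (an assumption; PLMs use subword tokenizers).
tokens = "berlin is the capital of germany".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

During pre-training, the loss would only be computed at the positions where `labels` is not `None`.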
What’s more, a PLM also learns some of the factual knowledge present in the pre-training data. For example, it learns that if a sentence contains ‘capital’ and ‘Germany’, then the masked token is very likely to be ‘Berlin’. We know this because, given the way we optimized the model, we can create cloze-test templates like “[MASK] is the capital of Germany” and check the output probability distribution for the masked token (we call this strategy for inducing predictions prompting). However, to what extent PLMs are capable of storing knowledge is still an open question. We do know that retrieving facts from PLMs can be as accurate as other NLP strategies for extracting knowledge [3, 4]. But we don’t know how biased this knowledge is, or how much these possible knowledge biases affect the model’s predictions and its performance. The papers we’re going to look at right now will help us shed some light on these questions ✨.
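The prompting idea described above can be sketched in a few lines of Python. Note the assumptions: the scores in `toy_scores` are invented for illustration and do not come from any real model, and the helper names `softmax` and `top_k` are our own. With a real PLM, the scores would be the model’s outputs at the “[MASK]” position.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution over candidate tokens."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

def top_k(probs, k=3):
    """Return the k most probable tokens, most probable first."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Build a cloze prompt from a template.
template = "[MASK] is the capital of {country}"
prompt = template.format(country="Germany")

# Hypothetical scores for the masked position (NOT real model output).
toy_scores = {"Berlin": 6.0, "Munich": 3.5, "Paris": 2.0, "It": 1.0}
probs = softmax(toy_scores)
best = top_k(probs, k=2)
print(prompt)
print(best)
```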
The main ideas of the papers [5, 6, 7] can be summarized as:
- Some facts may be impossible for PLMs to learn from text on the internet alone, because we rarely state explicitly what we assume everybody knows, e.g. that strawberries are red.
- Some factual knowledge is hard to extract from PLMs, yet they seem to be able to use it for other tasks.
- Some information is learned so well during pre-training that it is really hard to override when related information appears in a completely different context. For example, in a book animals may be able to talk even though in reality they can’t, and we would expect machine reading models to be flexible enough to still understand that.
Let’s now go into a bit more detail on how the papers test these claims.
The World of an Octopus: How Reporting Bias Influences a Language Model’s Perception of Color 
To understand what this paper looks into, we first need to be familiar with the concept of “reporting bias”: we tend not to report or communicate things that are obvious and therefore unnecessary to say. For example, “green banana” is more frequent in general text data than “yellow banana”.
Given that fact, this paper asks whether PLMs learn the real colors of objects or the skewed distribution present in the pre-training data.
To answer this question, the authors create the CoDa dataset, which contains human judgments about the colors of a set of objects. Specifically, for each object, people indicated how frequently that object can be found in red, yellow, green, etc. The answers were then normalized to obtain a color probability distribution for each object.
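As a rough illustration of the normalization step, here is a small sketch. The ratings below are made up and are not taken from CoDa, and the function name `normalize` is our own; the point is only to show how raw frequency ratings become a probability distribution.

```python
def normalize(ratings):
    """Scale non-negative ratings so that they sum to 1."""
    total = sum(ratings.values())
    if total == 0:
        raise ValueError("cannot normalize ratings that sum to zero")
    return {color: value / total for color, value in ratings.items()}

# Invented ratings for one object (NOT real CoDa data).
banana_ratings = {"yellow": 8, "green": 3, "brown": 1}
banana_dist = normalize(banana_ratings)
print(banana_dist)
```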
Then they create templates to query what colors PLMs think objects are, e.g. “Most apples are [MASK]”, and check the output probability distribution of the masked token to obtain a ranking of the colors. Finally, they compare this ranking with the one in CoDa, and conclude that PLMs rank colors better for objects that don’t have a typical color: they predict the color of a car or a t-shirt more accurately (that is, closer to human judgments) than the color of a lemon or an apple.
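One common way to compare two rankings like these is a rank correlation such as Spearman’s. The sketch below is our own illustration of that idea, not necessarily the exact metric used in the paper, and both distributions are invented numbers.

```python
import math

def ranks(values):
    """Assign ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Invented distributions for one object (NOT real data or model output).
colors = ["yellow", "green", "brown"]
human = {"yellow": 0.7, "green": 0.2, "brown": 0.1}
model = {"yellow": 0.3, "green": 0.6, "brown": 0.1}
rho = spearman([human[c] for c in colors], [model[c] for c in colors])
print(rho)
```

A value close to 1 means the model orders the colors the way humans do; lower values mean the two rankings disagree.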
They also experiment with multimodal models (language models that also take an image as input) and find that these models predict the colors of objects better.
ALL Dolphins Are Intelligent and SOME Are Friendly: Probing BERT for Nouns’ Semantic Properties and their Prototypicality 
In this paper, the authors run multiple experiments to try to understand whether BERT encodes the features or properties of some concepts (e.g. that dolphins are intelligent and friendly), and whether it can tell which ones are inherent and which ones are not (e.g. the green grass vs the white horse), or in other words, which adjectives reduce the scope of the noun and which ones don’t.
They create multiple prompts to query BERT’s knowledge of the properties of each concept using different templates, e.g. “Dolphins are [MASK]”, “A dolphin is usually [MASK]”, “Most dolphins are [MASK]”, etc. They find that the number of properties successfully extracted varies a lot from one query to another, that is, BERT predicts more accurate properties with one template than with another. Moreover, they find that some of BERT’s predictions are correct but are not present in the dataset. Therefore, they state that it is difficult to conclude to what extent BERT encodes these properties.
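To see why the choice of template matters, here is a small sketch that builds the three prompts mentioned above and scores each one by how many reference properties it recovers. The predictions are invented for illustration (they are not BERT outputs), and `build_prompts` and `recall` are hypothetical helper names.

```python
def build_prompts(singular, plural):
    """Fill the three example templates for one noun."""
    return [
        f"{plural.capitalize()} are [MASK]",
        f"A {singular} is usually [MASK]",
        f"Most {plural} are [MASK]",
    ]

def recall(predicted, gold):
    """Fraction of reference properties found among the predictions."""
    gold = set(gold)
    return len(gold & set(predicted)) / len(gold)

prompts = build_prompts("dolphin", "dolphins")
gold_properties = ["intelligent", "friendly"]

# Invented top predictions per prompt (NOT real model output).
toy_predictions = {
    prompts[0]: ["mammals", "intelligent", "animals"],
    prompts[1]: ["friendly", "intelligent", "small"],
    prompts[2]: ["extinct", "white", "endangered"],
}
scores = {p: recall(preds, gold_properties) for p, preds in toy_predictions.items()}
for prompt, score in scores.items():
    print(f"{prompt!r}: recall = {score}")
```

Note that a prediction can be true of dolphins and still count as a miss here if it is absent from the reference list, which is exactly the difficulty described above.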
Given that it is not clear how many of the properties of the concepts BERT has learned, they ask the question: does BERT know the difference between the properties that are inherent to a concept and those that are not? To answer this question they perform two experiments:
- They query BERT with templates of the form “[MASK] dolphins are intelligent” and compare the ranking of the quantifiers: all, most, some, few, none. They find that BERT predicts “all” for both types of adjectives, those that change the scope of the noun (e.g. red car) and those that don’t (e.g. red lobster). Therefore, it is not clear whether the model knows which properties are inherent to an object and which are not.
- In the second experiment, they take BERT’s vector representations for the sequence “noun [SEP] adjective noun”, e.g. “lobster [SEP] red lobster”, where “[SEP]” is a special token that is used in the pre-training of BERT to separate two sentences. They add a classifier on top of the token representations and fine-tune the model to predict whether the first part (before the [SEP] token) is contained in the second part. In the lobster example it is contained because all lobsters are red, but “horse [SEP] white horse” would be false. They find that BERT has state-of-the-art performance in this task, showing that the model does encode the knowledge to decide which properties are inherent and which aren’t.
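The input format of the second experiment can be sketched as follows. This only shows how the text pairs and labels might be laid out; the helper `make_pair` is our own name, and the classifier and fine-tuning loop are left out. In practice a tokenizer would usually insert the special tokens for you.

```python
SEP = "[SEP]"

def make_pair(noun, adjective):
    """Build the 'noun [SEP] adjective noun' input for one example."""
    return f"{noun} {SEP} {adjective} {noun}"

# Label 1: the first part is contained in the second (the property is inherent).
# Label 0: the adjective narrows the noun down to a subset.
examples = [
    (make_pair("lobster", "red"), 1),
    (make_pair("horse", "white"), 0),
]
for text, label in examples:
    print(label, text)
```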
Locke’s Holiday: Belief Bias in Machine Reading 
To understand the findings of this paper, it is important to know that machine reading models are trained to predict an answer given a context and a question, which the models have to answer with reference to the context. We also need to know that “belief bias” is defined as the tendency to evaluate a statement based on prior beliefs rather than its logical strength.
This paper shows that PLMs fine-tuned for machine reading have a strong belief bias, which they acquire during pre-training and fail to overcome in the machine reading task. That is, PLMs have a strong bias towards predicting what they already know about the world even when the context states the opposite. For example, when you give machine reading models:
- The context “Leth was painted blue with paint, whereas thinking turned Jacques blue.”, and the question “Why is Jacques blue?”, models predict “paint”.
- The context “It is rarely the case that a buddhist meditates; instead, he plays drums” and the question “What does a buddhist do?”, models predict “meditate”.
They also run a crowdsourcing experiment to measure human performance on the task, and humans get perfect performance. So the conclusion is that models hold some really strong beliefs that they cannot overcome when they are presented with opposing evidence.
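One simple way to quantify this behaviour is to check, for each example, whether the model’s answer follows the context or the prior belief. The sketch below does this for the two examples above. The `context_answer` values are our own reading of the contexts, the matching rule (lowercase substring) is a deliberately crude assumption, and `classify` is a hypothetical helper rather than the paper’s evaluation code.

```python
def classify(prediction, context_answer, belief_answer):
    """Label a prediction as following the context, the prior belief, or neither."""
    p = prediction.lower()
    if context_answer.lower() in p:
        return "context"
    if belief_answer.lower() in p:
        return "belief"
    return "other"

examples = [
    {
        "context": "Leth was painted blue with paint, whereas thinking turned Jacques blue.",
        "question": "Why is Jacques blue?",
        "context_answer": "thinking",   # what the context actually says
        "belief_answer": "paint",       # what prior world knowledge suggests
        "prediction": "paint",          # model answer reported above
    },
    {
        "context": "It is rarely the case that a buddhist meditates; instead, he plays drums",
        "question": "What does a buddhist do?",
        "context_answer": "plays drums",
        "belief_answer": "meditate",
        "prediction": "meditate",
    },
]

outcomes = [
    classify(ex["prediction"], ex["context_answer"], ex["belief_answer"])
    for ex in examples
]
belief_rate = outcomes.count("belief") / len(outcomes)
print(outcomes, belief_rate)
```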
We went through the details of 3 recent papers and we saw that:
- Prompting models is hard: some queries work better than others, and we don’t know whether the queries we are using are the best ones.
- There is reporting bias in the training data, and the object colors learned by PLMs seem to follow this bias, i.e., PLMs do not seem to be able to overcome the reporting bias.
- Models learned some common beliefs and are not able to ignore them when the entities are used in a different way.
These papers show us that the pre-training data has some knowledge biases that affect what models learn and how they perform. At the same time, we also see that it is challenging to conclude what knowledge PLMs encode in their parameters.