About biases in the data and how they affect the factual knowledge language models learn

About language models and factual knowledge

  • Some facts may be impossible for pre-trained language models (PLMs) to learn from internet text alone, because people rarely state what they assume everybody knows, e.g. that strawberries are red [5].
  • Some factual knowledge is hard to extract from PLMs directly, yet the models seem able to use it for other tasks [6].
  • Some information is learned so well during pre-training that models struggle to override it when the same entities appear in a completely different context. For example, animals may talk in a book even though they cannot in reality, and we would expect machine reading models to be flexible enough to accept that [7].

The World of an Octopus: How Reporting Bias Influences a Language Model’s Perception of Color [5]

ALL Dolphins Are Intelligent and SOME Are Friendly: Probing BERT for Nouns’ Semantic Properties and their Prototypicality [6]

  1. They query BERT with templates of the form “[MASK] dolphins are intelligent” and compare how it ranks the quantifiers all, most, some, few, and none. They find that BERT predicts “all” for both types of adjectives: those that change the scope of the noun (e.g. red car) and those that don’t (e.g. red lobster). It is therefore unclear whether the model knows which properties are inherent to an object and which are not.
  2. In a second experiment, they take BERT’s vector representations of the sequence “noun [SEP] adjective noun”, e.g. “lobster [SEP] red lobster”, where “[SEP]” is the special token used to separate two sentences in BERT’s pre-training. They add a classifier on top of the token representations and fine-tune the model to predict whether the first part (before the [SEP] token) is contained in the second part. In the lobster example it is, because all lobsters are red, whereas “horse [SEP] white horse” would be false, because not all horses are white. They find that BERT reaches state-of-the-art performance on this task, which suggests that the model does encode the knowledge needed to decide which properties are inherent and which are not.
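
The inputs to the two experiments above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors’ code: the function names are hypothetical, and `toy_score` is a made-up stand-in for a masked language model, so the numbers it returns are placeholders rather than BERT outputs.

```python
from typing import Callable, Dict, List

# Quantifiers compared in the first experiment, as listed above.
QUANTIFIERS: List[str] = ["all", "most", "some", "few", "none"]


def cloze_query(adjective: str, noun_plural: str, mask: str = "[MASK]") -> str:
    """Build a template such as '[MASK] dolphins are intelligent'."""
    return f"{mask} {noun_plural} are {adjective}"


def rank_quantifiers(query: str, score_fn: Callable[[str, str], float]) -> List[str]:
    """Sort the quantifiers from the highest-scoring mask filler to the lowest."""
    return sorted(QUANTIFIERS, key=lambda q: score_fn(query, q), reverse=True)


def entailment_input(noun: str, adjective: str, sep: str = "[SEP]") -> str:
    """Build the 'noun [SEP] adjective noun' sequence of the second experiment."""
    return f"{noun} {sep} {adjective} {noun}"


# Hypothetical stand-in for a masked language model: fixed, invented scores
# so that the sketch runs without downloading any model.
_TOY_SCORES: Dict[str, float] = {
    "all": 0.50, "most": 0.20, "some": 0.15, "few": 0.10, "none": 0.05,
}


def toy_score(query: str, quantifier: str) -> float:
    """Ignore the query and return the placeholder score of the quantifier."""
    return _TOY_SCORES[quantifier]


query = cloze_query("intelligent", "dolphins")
ranking = rank_quantifiers(query, toy_score)
pair = entailment_input("lobster", "red")
```

To probe a real model, `score_fn` would have to return the probability the model assigns to each quantifier at the masked position. One option, which is an assumption on my side, is the Hugging Face `fill-mask` pipeline restricted to the five candidates through its `targets` argument.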

Locke’s Holiday: Belief Bias in Machine Reading [7]

  1. Given the context “Leth was painted blue with paint, whereas thinking turned Jacques blue.” and the question “Why is Jacques blue?”, models predict “paint”.
  2. Given the context “It is rarely the case that a buddhist meditates; instead, he plays drums” and the question “What does a buddhist do?”, models predict “meditate”.
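
One way to make the two examples above measurable is to record, for each one, the answer the passage supports and check whether the prediction follows it. The sketch below is a hedged illustration, not the evaluation used in [7]: the `Example` class, the loose string matching, and the `context_answer` values (“thinking”, “plays drums”) are my own reading of the passages.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    context: str
    question: str
    context_answer: str    # answer supported by the passage (my reading)
    model_prediction: str  # prediction reported in the examples above


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().split())


def follows_context(ex: Example) -> bool:
    """True if the prediction matches the answer stated in the context."""
    pred = normalize(ex.model_prediction)
    gold = normalize(ex.context_answer)
    # Loose match: one string contains the other (an assumed, simplistic metric).
    return pred in gold or gold in pred


EXAMPLES: List[Example] = [
    Example(
        context="Leth was painted blue with paint, whereas thinking turned Jacques blue.",
        question="Why is Jacques blue?",
        context_answer="thinking",
        model_prediction="paint",
    ),
    Example(
        context="It is rarely the case that a buddhist meditates; instead, he plays drums",
        question="What does a buddhist do?",
        context_answer="plays drums",
        model_prediction="meditate",
    ),
]

# Share of predictions that agree with the passage instead of the prior belief.
context_rate = sum(follows_context(ex) for ex in EXAMPLES) / len(EXAMPLES)
```

For these two examples neither prediction matches the passage, so `context_rate` is 0.0. On a larger set, a low rate would point to the model falling back on what it believed before reading.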


  1. Prompting models is hard: some queries work better than others, and we don’t know whether the queries we are using are the best ones.
  2. There is reporting bias in the training data, and the object colors learned by PLMs seem to follow it, i.e. PLMs do not seem to be able to overcome the reporting bias.
  3. Models have learned some common beliefs and are not able to set them aside when the entities are used in a different way.





constanza fierro
