World Knowledge Inside LLMs

Blade of Miquella
3 min read · Sep 26, 2023


Recently, the focus of the natural language processing (NLP) community has shifted from BERT-style models to large language models (LLMs) such as ChatGPT. The BERT-style pretraining-finetuning paradigm is running into diminishing returns from scaling. Meanwhile, LLMs have demonstrated impressive capabilities such as zero-shot inference, in-context learning, and reasoning.

Nonetheless, one may wonder: how knowledgeable are LLMs? A common explanation for their zero-shot and in-context learning abilities is that LLMs acquire some general knowledge, or “world knowledge”, during pretraining.

LLMs store knowledge in their parameters

Trained on large-scale text data, LLMs are expected to learn a large amount of knowledge beyond linguistic understanding. Petroni et al., 2019 presented an analysis paper, Language Models as Knowledge Bases?, which introduced the LAMA probe, a knowledge probing method that queries an LM with cloze-style questions about factual and commonsense knowledge. Their experiments show that LMs recall such knowledge surprisingly well.

Examples of the cloze questions for knowledge probing in Petroni et al., 2019.
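To make the probing setup concrete, here is a minimal sketch of a LAMA-style cloze query using the HuggingFace transformers fill-mask pipeline. The model choice and top-5 scoring are illustrative assumptions, not the paper's exact evaluation protocol (although the Dante cloze is one of the paper's own examples).

```python
# A minimal LAMA-style cloze probe, assuming the HuggingFace
# `transformers` library is installed; bert-base-cased and the
# top-5 check are illustrative choices.
from transformers import pipeline

# Fill-mask pipeline: the model predicts the token behind [MASK].
probe = pipeline("fill-mask", model="bert-base-cased")

cloze_queries = [
    ("Dante was born in [MASK].", "Florence"),   # example from the paper
    ("The capital of France is [MASK].", "Paris"),
]

for template, gold in cloze_queries:
    predictions = probe(template, top_k=5)
    top_tokens = [p["token_str"].strip() for p in predictions]
    print(f"{template}  ->  top-5: {top_tokens}, hit: {gold in top_tokens}")
```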

How do LLMs utilize learned knowledge to perform inference?

A typical application of LLMs is few-shot reasoning, or in-context learning: an LLM learns a specific task from in-context examples and performs inference directly, without updating its parameters. Clearly, a handful of demonstrations cannot provide all the knowledge required to learn a task from scratch. In other words, the learner, an LLM, must already hold some related “background knowledge” and adapt it to the task, as sketched below.
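For intuition, here is a minimal sketch of how a few-shot prompt is typically assembled. The task and demonstrations are toy examples, and `llm_generate` is a hypothetical stand-in for whatever completion API is used; no parameters are updated anywhere.

```python
# Few-shot (in-context) prompting sketch: the task is "taught" only
# through demonstrations inside the prompt; no parameters change.
# `llm_generate` is a hypothetical completion function standing in
# for an actual LLM API.
def build_few_shot_prompt(demonstrations, query):
    """Concatenate input/output demonstrations, then append the query."""
    lines = [f"Input: {x}\nOutput: {y}\n" for x, y in demonstrations]
    lines.append(f"Input: {query}\nOutput:")
    return "\n".join(lines)

demos = [
    ("The movie was wonderful.", "positive"),
    ("I regret buying this.", "negative"),
]
prompt = build_few_shot_prompt(demos, "An instant classic.")
print(prompt)
# answer = llm_generate(prompt)  # hypothetical call: the model must map
#                                # its background knowledge of sentiment
#                                # onto the demonstrated label format
```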

Razeghi et al., 2022 studied the impact of pretraining term frequencies on few-shot reasoning. Focusing on the numerical reasoning capabilities of LLMs, they demonstrated that models are more accurate on instances whose terms appear more frequently in the pretraining corpus.

Figure from Razeghi et al., 2022.
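The core of such an analysis can be sketched in a few lines: bin evaluation instances by how often their key term appears in the pretraining corpus, then compare per-bin accuracy. This is a simplified sketch in the spirit of the paper, with toy counts and predictions; the actual study measures operand frequencies over the full pretraining data.

```python
# Frequency-vs-accuracy sketch in the spirit of Razeghi et al., 2022:
# bin instances by the pretraining frequency of their key term and
# compare accuracy per bin. Counts and predictions are toy values.
from collections import Counter, defaultdict

def accuracy_by_frequency(instances, term_counts, num_bins=2):
    """instances: (term, model_was_correct) pairs;
    term_counts: occurrences of each term in the pretraining corpus."""
    ranked = sorted(instances, key=lambda it: term_counts[it[0]])
    bins = defaultdict(list)
    for rank, (term, correct) in enumerate(ranked):
        bins[rank * num_bins // len(ranked)].append(correct)
    # Average accuracy per frequency bin (bin 0 = rarest terms).
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}

term_counts = Counter({"7": 500_000, "24": 90_000, "58": 8_000, "913": 120})
instances = [("7", True), ("24", True), ("58", False), ("913", False)]
print(accuracy_by_frequency(instances, term_counts))
# -> {0: 0.0, 1: 1.0}: rarer terms, lower accuracy (toy illustration)
```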

Similarly, Kandpal et al., 2023 observed that the question-answering (QA) performance of LLMs is strongly correlated with the number of relevant pretraining documents.

Figure from Kandpal et al., 2023.
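A simplified sketch of what “relevant” means here: Kandpal et al. count a pretraining document as relevant to a QA pair roughly when it mentions both the question entity and the answer entity. The substring matching and the three-document corpus below are illustrative assumptions; the paper uses entity linking over full pretraining corpora.

```python
# Counting "relevant" pretraining documents for a QA pair, in the
# spirit of Kandpal et al., 2023: a document counts as relevant if it
# mentions both the question entity and the answer entity. Substring
# matching and the toy corpus are illustrative simplifications.
def count_relevant_docs(corpus, question_entity, answer_entity):
    return sum(
        question_entity.lower() in doc.lower()
        and answer_entity.lower() in doc.lower()
        for doc in corpus
    )

corpus = [
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "The Nobel Prize is awarded in Stockholm.",
    "Curie was born in Warsaw.",
]
print(count_relevant_docs(corpus, "Marie Curie", "Nobel Prize"))  # -> 1
```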

LLMs transfer knowledge across languages

LLMs have also been found to transfer knowledge across languages for chain-of-thought reasoning (Shi et al., 2022). For example, underrepresented languages like Swahili (sw) and Bengali (bn) each account for less than 0.01% of PaLM-540B's pretraining data, yet the model still performs surprisingly well on grade-school math problems in those languages.

Figure from Shi et al., 2022.
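Here is a rough sketch of an MGSM-style chain-of-thought prompt. The template and trigger phrase are assumptions for demonstration, and the Swahili question is an illustrative translation of the familiar tennis-ball problem, not taken from the benchmark itself.

```python
# MGSM-style multilingual chain-of-thought prompt sketch
# (in the spirit of Shi et al., 2022). The template is illustrative;
# the Swahili question is an assumed translation of the familiar
# tennis-ball problem, not the paper's exact prompt.
def multilingual_cot_prompt(question, demonstrations=()):
    """Optionally prepend solved demos, then ask for step-by-step
    reasoning (here elicited in English, one setting the paper tests)."""
    parts = [
        f"Question: {q}\nStep-by-step answer: {r} The answer is {a}.\n"
        for q, r, a in demonstrations
    ]
    parts.append(f"Question: {question}\nStep-by-step answer:")
    return "\n".join(parts)

# "Roger has 5 tennis balls. He buys 2 more cans with 3 balls each.
#  How many balls does he have now?" (in Swahili, illustrative)
question_sw = ("Roger ana mipira 5 ya tenisi. Ananunua makopo 2 zaidi, "
               "kila moja lina mipira 3. Ana mipira mingapi sasa?")
print(multilingual_cot_prompt(question_sw))
```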

Not all world knowledge is in natural language

Although LLMs seem promising for modeling world knowledge, it must be admitted that not all world knowledge is expressed in natural language. For example, a baby clearly learns a great deal before it can speak. Likewise, people walk and play sports guided by some kind of knowledge, yet they never consciously calculate how hard to contract each muscle. Putting such muscle-memory knowledge into words is very challenging. World knowledge, therefore, extends far beyond what LLMs learn from text.
