Large Language Models and Rule Following

Karl
9 min read · Sep 17, 2023


What makes large language models (LLMs) different from conventional programs? One thing that sets them apart is that they can follow rules the way people do.

An excellent way to illustrate the difference between how computers and people act on instructions is to ask someone to write instructions for a simple task and then act those instructions out literally. The literal performance usually fails to accomplish the task. Making a peanut butter and jelly sandwich, for example, is surprisingly difficult to explain when every instruction is taken literally. Writing instructions specific enough to be unambiguous ends up looking much closer to computer code than to the directions we would give a person.

People follow rules differently from a computer running code in two ways:

  1. Rules are descriptions of behaviors, but they generally cannot be defined by listing those behaviors.
  2. Rules have a normative component.

Defining Rule Following

Computers have to be told how to follow instructions for every different situation. In contrast, people are relatively good at understanding the context for instructions.

What is a rule? Rules are descriptions of behavior. Understanding a rule means understanding whether some behavior fits the description, even in novel situations. In this case, a “behavior” for a language model is an input prompt string along with its output string (which is a continuation of the input text).

The philosopher Wittgenstein grappled with what it means to follow a rule. He asked whether it is possible to define a rule by listing all of the behaviors it describes and concluded that it is not. Any set of behaviors will be incomplete: there will always be some novel behavior that the rule describes but that was not included.

For example, suppose a teacher asks a student to continue the series of numbers “2, 4, 6, 8…”. You likely recognize this series as the multiples of 2. There are infinitely many possible ways to continue it, but only one of them is the multiples-of-2 series.

Imagine you’re trying to explain this concept to a student. There is no number of examples you can provide that will guarantee the student continues the series correctly. You can decrease the likelihood of a misunderstanding, but you cannot eliminate it. (One of the skills of teaching is asking questions and giving instructions that minimize the possibility of these kinds of subtle misunderstandings.)
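To make this concrete, here is a minimal sketch in Python (the function names and the alternative rule are my own illustrations): two different rules that agree on “2, 4, 6, 8” and then diverge, which is why no finite list of examples can pin down a single rule.

    def multiples_of_two(n):
        # The "intended" rule: the n-th multiple of 2
        return 2 * n

    def alternative_rule(n):
        # A made-up rule chosen to match the first four terms
        # exactly and then diverge: 2n + (n-1)(n-2)(n-3)(n-4)
        return 2 * n + (n - 1) * (n - 2) * (n - 3) * (n - 4)

    print([multiples_of_two(n) for n in range(1, 7)])  # [2, 4, 6, 8, 10, 12]
    print([alternative_rule(n) for n in range(1, 7)])  # [2, 4, 6, 8, 34, 132]

Both functions reproduce every example the teacher gave; only the continuation reveals which rule was actually being followed.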

Since the training data for an LLM is also just a series of example behaviors, no amount of training data by itself can serve as a comprehensive foundation for an LLM’s ability to follow rules.

Another possibility Wittgenstein considered was defining rule following in terms of following more basic instructions. People often learn to follow rules by listening to and following instructions.

The parallel with LLMs is how we can use prompts to instruct them how to behave. We can explain the rules we want them to follow in terms of more basic descriptions.

ChatGPT prompt to try:

Consider the following series: 2, 4, 6, 8…
Give a few possible ways to continue this series. Be creative.

Category Errors

There’s a concept in philosophy called “category error” that I find helpful in understanding the relationship between behaviors and rule following.

Category errors can be explained using the following joke: a prospective student is touring a college and the tour guide shows them the dining hall, the philosophy building, and then the dorms. The tour guide wraps up by asking if there’s anything else they’d like to see, and the prospective student says “you’ve shown me a lot of buildings, but I was hoping you’d show me the college!”

The joke is that the buildings are the college; the student is missing the forest for the trees. It illustrates how we can be confused when we don’t notice that two things are different in kind, like looking for the college as if it were just another building on the tour. By looking for the wrong sort of thing, we fail to see it.

Similarly, behaviors and rule following are different kinds of things. Rule following describes behavior and can be illustrated by behaviors, but rule following isn’t itself a behavior. If we try to look for rule following as if it’s another behavior, we will not see it. If behaviors are like trees, then a rule is a forest.

Imagine you want to show someone ChatGPT for the first time and want to convince them it’s able to reason and follow rules. If they are skeptical, no particular example you provide will be sufficient, because that example could have appeared in the training data. The model may just be parroting what it was trained on.

This “category error” is a pattern I see often in conversations about LLMs. One example is the confusion between syntactic and semantic reasoning. LLMs are not good at syntactic tasks like counting the number of letters in a word, but they are good at defining the meaning of that word and using it in a sentence.

Wittgenstein argued that defining rule following as following instructions was insufficient as an explanation. Since he believed that rule following was the foundation of language, defining language use in terms of another kind of language use would never explain the foundations of language.

Wittgenstein’s concern isn’t worth worrying about here, though. He is correct that the foundation of language is rule following, but maybe here we can get by with “close enough”? Maybe it turns out that using lots of training data to “bootstrap” an LLM gets us close enough to rule following that we can then treat it as if it is following rules?

The philosopher Daniel Dennett argues that if we don’t allow this conceptual “rounding up”, the reasoning capabilities of A.I. will be forever mysterious. This is why he suggests using the “sorta operator”, which I discussed in my last post.

ChatGPT prompt to try:

Tell me about the ship of Theseus. What does it tell us about identity?
How does it relate to a "category error"? How does it relate to Daniel Dennett's "sorta" operator?

Intensional Natural Language

There are a couple of terms in logic and the philosophy of language that can help us understand the difference between behavior and rule following.

In logic, a term that can be defined by providing a set of examples is called “extensional”. For example, the group of people who attended a party last week can be defined simply by providing a list of everyone who attended.

The term for the opposite of extensional is “intensional”. (Note: this isn’t a misspelling of the word “intentional”, which is spelled with a “t”.) Intensional terms can be defined only by understanding the meaning of their descriptions. They can’t be defined by making a list of all examples.

Extensional terms have an interesting property: if you substitute one for an equivalent term in a statement, the meaning of the statement as a whole doesn’t change.

This is not possible with intensional terms. They cannot be substituted for equivalent terms without potentially changing the meaning of what is said. For example, “Lois Lane believes Superman can fly” may be true while “Lois Lane believes Clark Kent can fly” is false, even though the two names refer to the same person.

There are plenty of intensional expressions that computers can handle just fine. For example, a computer can make use of the statement “the set of all numbers that are a prime number plus one”. So long as someone translates this natural language statement into math or computer code, the computer can be made to “understand” this definition just fine.
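As a rough sketch of what that translation might look like (the function names here are my own), the intensional definition becomes a predicate we can test membership against, while any extensional listing of the set can only ever be a finite slice:

    def is_prime(n):
        # Standard trial-division primality check
        if n < 2:
            return False
        return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

    def is_prime_plus_one(n):
        # Intensional: membership is decided by the description itself
        return is_prime(n - 1)

    # Extensionally, we can only ever list a finite portion of the set
    print([n for n in range(20) if is_prime_plus_one(n)])  # [3, 4, 6, 8, 12, 14, 18]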

However, computers have not been able to handle intensional expressions that exist only in natural language, because they haven’t been able to “understand” natural language in the same way a person can.

Since you can’t use an intensional definition without understanding its description, computers have relied on people to translate natural language expressions into a format they can use. This means that computers have not been able to work with any intensional terms whose only form of expression is natural language.

LLMs are remarkable because they do not have these same constraints.

The rules that people follow are intensional. People can understand when a rule’s natural language description applies without needing it to be broken down into a set of examples. For a typical computer algorithm, we would need to provide explicit instructions. When those instructions have exceptions, we would also need to explain what to do in each of those situations.

Well-written software may hide these details well. However, it takes a lot of hard work to make sure computers do not place the whole jar of peanut butter between two slices of bread while making a sandwich, metaphorically speaking.
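A purely illustrative sketch of that contrast (the scenario and the function are hypothetical): a conventional program only “follows” the rule for the cases someone enumerated in advance, and anything unanticipated falls through to an error.

    def spread_ingredient(ingredient, surface):
        # Every case and exception has to be spelled out ahead of time
        if ingredient == "peanut butter" and surface == "bread slice":
            return "scoop some out with a knife and spread a thin layer"
        if ingredient == "jelly" and surface == "bread slice":
            return "spoon some out and spread it evenly"
        # A situation nobody anticipated: the program has no idea what to do
        raise ValueError(f"No instruction for {ingredient} on {surface}")

A person given the rule “spread it on the bread” handles the unanticipated cases without any of this.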

This is often what is meant by saying language (and rule following) is “contextual”. Demonstrating an understanding of intensional statements made in natural language means knowing what following the rule looks like in situations never seen before.

ChatGPT prompt to try:

Please define the term “je ne sais quoi” and use it in a sentence.

Normativity

The second way rule following is different from computer algorithms is that the rules that people follow are both descriptive and normative. People can know how to follow a rule while choosing not to follow it.

Rules don’t simply describe behavior; they also compel us to adhere to them. Comprehending when a rule applies is the descriptive aspect, and the desire to follow the rule (or not) is the normative aspect.

This breaks from my previous post, where I said that rules can be either descriptive or normative. That distinction doesn’t hold up, because all rules describe as well as compel behavior.

We can imagine a situation where we understand how to follow a rule but choose not to follow it. For example, if you are coaching an American football game for very young kids, you might choose to ignore a situation that would normally have counted as a “touchback” to keep the game fun for them. This isn’t “redefining” what a “touchback” is. Rather, you’ve just decided to ignore the rule and its normative expectation.

An interesting application is that LLMs like ChatGPT seem to understand how to curse, but “choose” not to. If you send a curse word in your prompt, the model shows a kind of understanding of what you said. If provided a “glitch token”, it will sometimes choose to curse, because the glitch token represents a situation where it doesn’t know how to react.

These two aspects of rules, their normativity and their intensional nature, mean that being able to follow rules requires knowing how to apply them in novel situations and knowing when rules are meant to be broken.
