Superrationality: How Decision Theory Resolves Any Dilemma

An introduction to (Functional) Decision Theory

Hein de Haan
Street Science
31 min read · Jul 5, 2023


Can we, in theory at least, find a way to make optimal decisions in any dilemma? This question is more difficult than it appears: experts have disagreed for a long time on the right answer to seemingly simple decision problems.

This (rather long) post is an attempt to explain what the author believes is our best shot at a theory for decision making: Functional Decision Theory, which can be seen as a theoretical formalization of Superrationality.

Note: I’ve published the first eight chapters separately over the past few weeks. Chapter IX is new, and this long post combines all chapters.

Chapter I: Omega and her two boxes

The scene: a party with a large group of friends. Everybody’s having fun discussing interesting thought experiments, when suddenly, a strange looking being arrives.

“Greetings,” says the being. “My name is Omega, and I brought two boxes with me.”

Omega puts the boxes — labeled A and B — on the table.

“As you can see, Box A is open, and it contains $1,000.

Box B is more interesting: it is closed, and I will decide how much money to put in it soon.

You see, in a minute, I will ask one of you to play. That person will get two choices:

  • to one-box: that is, to only receive Box B and what’s in it
  • to two-box: that is, to receive both boxes and their contents.
A dilemma: either choose both Box A and Box B, or just Box B. Box A has $1,000; Box B has either $0 or $1,000,000.
An interesting dilemma

I will also predict what this person will do. And mind you: I am an extremely good predictor of human behavior in this game.

If I predict she will two-box, then I won’t put anything in box B.

On the other hand, if I predict she will one-box, then I’ll put $1,000,000 in Box B.

A table showing the payoffs for the discussed problem: one-boxing gives $1,000,000 if Omega predicted you’d one-box and $0 otherwise; two-boxing gives $1,001,000 if Omega predicted you’d one-box and $1,000 otherwise.

I have both empty envelopes and envelopes containing $1,000,000 with me, and they look identical, so the player can’t see what I put in Box B.

After I make my prediction and put an envelope in Box B, I promise I won’t change anything about the boxes.”

Omega asks Carl to play. Carl enthusiastically participates. As promised, Omega puts an envelope — either an empty one or one containing $1,000,000 — in Box B.

Carl reasons: “You can’t cause the past: whether I one-box or two-box has no effect on whether or not Box B contains $1,000,000. That’s already fixed now.

I would like to get $1,000,000, but that’s out of my hands.

If Box B has an empty envelope, I will do better by two-boxing: that will get me $1,000, whereas one-boxing will get me nothing in this scenario.

If Box B does contain $1,000,000, I’d also better two-box: in that scenario, two-boxing gets me $1,000,000 + $1,000, whereas one-boxing gets me just $1,000,000.

In both cases, two-boxing gets me $1,000 more than one-boxing. I therefore two-box.”

And so Carl two-boxes — which Omega correctly predicted. Carl opens Box B and finds an empty envelope. He still gets $1,000, but it’s not much of a consolation.

Omega asks some more people to play. Some of them two-box, and some of them one-box. Remarkably, though, Omega is right in her prediction every single time: every two-boxer gets only $1,000, whereas every one-boxer walks away with a sweet $1,000,000.

Then Omega asks Eve to play. As before, Omega puts an envelope in Box B.

Eve: “So far, Omega has been right in her prediction every single time. It seems she really is as good at predicting as she says! Since one-boxing has so far turned out to be a lot more lucrative than two-boxing, I one-box.”

Unsurprisingly, Eve finds $1,000,000 in the envelope from Box B.

“If only you had two-boxed!” says Christine.

“Since there was $1,000,000 in Box B, two-boxing would have gotten you $1,001,000. Carl was right to two-box; he just got unlucky!”

Elliott disagrees: “It’s not just Carl: every two-boxer won only $1,000. As every one-boxer won $1,000,000, one-boxing is clearly the better choice.”

“But you can’t change the past!”, Carter says. “You’re arguing that two-boxing causes Box B to be empty, and one-boxing causes there to be $1,000,000 in Box B!”

So who’s right? Carl seems to make quite the rational argument: two-boxing earns the player $1,000 more than one-boxing given Omega’s prediction. However, Eve also has a good point: one-boxers do historically better than two-boxers.

A central claim of this series is that both Carl and Eve are wrong — each in their own way. Another important claim is that there is a right way to approach this problem.

Don’t worry: there will be entire chapters devoted to both Carl and Eve and why they are wrong, and how to approach this problem correctly. But first, we get to discuss some interesting subjects that are useful for understanding Newcomb’s Problem, as the above game is known in academic circles.

Chapter II: An unfortunate gamble

In the television game show Deal or No Deal, the only contestant is faced with 26 cases with wildly varying monetary amounts. The contestant knows which amounts are present, but doesn’t know which case contains which amount.

In a series of rounds, the contestant has to remove cases from the game, and at some point she gets to take home the amount in the last remaining case. That is, unless she takes a deal, which she is offered at different points during the game.

Such a deal works as follows: the contestant has to stop the game, and in return gets a certain amount of money (which of course depends on the monetary amounts left in the cases).

One particularly interesting game had a contestant play away all cases except two: one with $1, and one with $1,000,000. He was then offered a deal of $416,000. He said “No deal” — in other words, he took a 50/50 chance of winning $1 or winning $1,000,000 over a sure win of $416,000. After that, he opened the wrong case and won $1.

A dilemma: a 50/50 shot at winning either $1 or $1,000,000, or a certain $416,000?
An easy dilemma?

Was he being rational? Some people argued yes: the contestant went with the choice that had the highest expected monetary value. After all, a 50% probability of winning $1,000,000 and an equal probability of winning $1 is an expected value of 0.50 * $1,000,000 + 0.50 * $1 = $500,000 + $0.50 = $500,000.50. That’s more than the offered deal of $416,000!

But while the math here is correct, the argument isn’t: it assumes that money doesn’t lose value to the player as she obtains more of it. You have to ask yourself: is $1,000,000 more than twice as awesome to win as $416,000?

For me personally, it isn’t. I’d love to win $416,000, and I’d love to win $1,000,000 even more — but the difference between winning nothing and winning $416,000 is a lot bigger than the difference between winning $416,000 and winning $1,000,000.

And this valuation of money is subjective: a very poor person would probably value getting $100 way more than a billionaire would.

A term that’s useful here is utility, a concept that measures how awesome something is to a person. Just like temperature can be measured in degrees Celsius, utility is measured in utils.

Think of utils as “awesome points”. Say getting a Christmas card is worth 5 utils, having a beer with friends is worth 50 utils, and winning a car is worth 1000 utils to you.

Back to our unfortunate contestant: whether the “No deal” was rational really depends on his personal situation and how he values money.

If he is anything like me, his valuation of money might work roughly like this:

  • $1 is nice, but it won’t make much of a difference in my life. It’s worth 2 utils.
  • $416,000 is very awesome to win: it makes a huge difference in my life, and can pay off my mortgage. It’s worth 10,000 utils.
  • $1,000,000 is even more awesome to win, and will make more of a difference than the $416,000. However, $416,000 would already put me in a great position in life, as I would already be able to pay off my mortgage, for example. So it isn’t worth that much more: it’s 14,000 utils.

And so the Deal is worth 10,000 utils, and the No Deal is worth 0.50 * 14,000 utils + 0.50 * 2 utils = 7,001 utils. So by these valuations, Deal is the better choice after all!
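
To make the comparison concrete, here is a minimal Python sketch of that expected-utility calculation. The util values are my illustrative numbers from the list above, not anything measured.

```python
# Expected-utility comparison for the Deal or No Deal example above.
# The util values are the illustrative ones from the text.

def expected_utility(outcomes):
    """outcomes: a list of (probability, utils) pairs."""
    return sum(p * u for p, u in outcomes)

deal = expected_utility([(1.0, 10_000)])               # a certain $416,000
no_deal = expected_utility([(0.5, 14_000), (0.5, 2)])  # 50/50 at $1,000,000 or $1

print(deal)     # 10000.0
print(no_deal)  # 7001.0 -> Deal is the better choice under these valuations
```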

Of course, I don’t know the personal valuations of the actual contestant. If he’s rich, he may already have everything he wants; in that case, winning $1,000,000 might actually be more than twice as awesome as winning $416,000.

In the coming chapters in this Superrationality series, we’ll discuss many dilemmas based on utility. Since dollars are more familiar than utils, we’ll directly measure utility in dollars. And after this chapter, you know a bit about how utility works!

Chapter III: Why Eve is wrong

Remember Eve? She decided to one-box in Newcomb’s problem (Omega’s game with the two boxes), because everyone who one-boxed got $1,000,000 (whereas everybody who two-boxed got only $1,000).

This is a reasonable approach to decision making, but one that is ultimately flawed. This chapter aims to show why Eve is wrong in her reasoning. But first: what is evidence, really?

How evidence works

Imagine a hypothetical world in which 5 in 100 people (5%) have COVID-19. In this world, there is a device for detecting the disease.

This device has the following statistics: if 1000 people infected with COVID-19 get tested with the device, 900 will test positive (90%). If, however, 1000 healthy people get tested, 100 (10%) will still get a positive test (even though they don’t have COVID-19).

A woman tests positive. What is the probability she has COVID-19?

Intuitively, you might think this probability is somewhere around 0.9 (or 90%) — because 900 out of 1000 people with COVID-19 test positive — but you have to remember only 5 in 100 people even have the disease.

So let’s take a group of 2000 people from our hypothetical world, who are all tested with our device. If this group is representative, 100 of those people have COVID-19. Of those 100 people, 90 get a positive test.

And of the 1900 people who don’t have COVID-19, 190 still get a positive test. So that’s a total of 90 + 190 = 280 people out of 2000 people who get tested positively — 280 people, of whom only 90 actually have COVID-19.

Of the infected people, 90 get a positive test and 10 a negative test. Of the healthy people, 190 get a positive test and 1710 get a negative test.
Our 2000 people, in four quadrants (square size directly proportional to number of people)

So of the 280 people who test positive for COVID-19, only 90 actually have the disease. That’s less than a third! The woman who tested positive only has around a 32% (90 / 280 * 100 ≈ 32.14%) probability of actually having COVID-19.

You might think that can’t be right. After all, doesn’t the device test 90% of COVID-19 patients correctly?

It does, but it also (incorrectly) tests 10% of healthy people as having COVID-19 — a smaller percentage, but there are a lot more healthy people than people with COVID-19! The 10% is 9 times as “small” as the 90%, but the group of healthy people is more than 9 times as large — 19 times as large in fact — as the group of people with COVID-19.

So all in all, that gives a quite small fraction of COVID-19 patients in the group of positively tested people. And that’s the metric we’re actually looking for! We’re not looking for “How often does our device test COVID-19 patients correctly?”; we’re looking for “How many COVID-19 patients are there in our group of positively tested people?”

Still, the 32% might make the device seem useless. But look at it this way: without a positive test, someone has only a 5% probability of having COVID-19 (that’s the a priori probability, or our prior) — so a positive test raises that probability quite a lot.
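
For readers who want to check the arithmetic, here is a small Python sketch of the same calculation, using the hypothetical numbers from above (5% prior, 90% true positive rate, 10% false positive rate).

```python
# Reproducing the test example above in a group of 2,000 people.
population = 2_000
infected = int(0.05 * population)   # 100 people have COVID-19
healthy = population - infected     # 1,900 people don't

true_positives = int(0.90 * infected)   # 90 infected people test positive
false_positives = int(0.10 * healthy)   # 190 healthy people test positive anyway

p_infected_given_positive = true_positives / (true_positives + false_positives)
print(round(p_infected_given_positive, 4))  # 0.3214, i.e. roughly 32%
```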

How Eve decides to one-box

So how does this all relate to Eve’s decision to one-box?

After Omega claims she is extremely good at predicting human actions in Newcomb’s Problem, Eve might be skeptical; after all, anyone can make such a claim, and it is quite outrageous. However, Omega could be right. As such, Eve assigns 10% probability to the claim “Omega predicts perfectly”.

Then, Eve sees Omega make a good prediction. That should count as some evidence for the claim that Omega is perfect at predicting, but not too much: if Omega was predicting randomly, one good prediction isn’t more surprising than a bad prediction.

So how much should Eve shift her 10%? We can model this problem in the same way we modeled the COVID-19 problem above.

Imagine a group of 2000 people playing Newcomb’s Problem. In the COVID-19 problem, we had a prior of 5% of infected people. Now, we have a prior of 10%: a 10% probability that Omega predicts perfectly. That’s 200 people who receive a good prediction.

The other 1800 people get a random prediction. A random prediction has a 50% probability of being right and the same probability of being wrong. So that’s another 900 players who get predicted correctly, and 900 who get a wrong prediction.

We have a group of 2000 people playing Newcomb’s Problem. We have a 10% probability that Omega predicts perfectly. That’s 200 people who receive a good prediction. The other 1800 people get a random prediction. That’s another 900 players who get predicted correctly, and 900 who get a wrong prediction.
Notice that the top right quadrant is empty: a perfect Omega doesn’t give bad predictions!

In total, that’s 1100 people with a good prediction, of which 200 got one from a perfect Omega. That’s a bit more than 18% — that’s significantly more than the original 10%, but Eve is still quite sure Omega isn’t perfect.

Now Eve sees a second correct prediction. A randomly predicting Omega has a 25% (0.50 * 0.50 = 0.25) probability of giving 2 out of 2 correct predictions. On the other hand, a perfect Omega necessarily makes each prediction correctly:

Only 450 people will receive 2 good predictions from a random Omega, and 1350 will receive fewer than 2. A perfect Omega, however, gives 200 people 2 good predictions.
As many as 1350 people will receive fewer than 2 good predictions from a random Omega.

As you can see, the 2 left squares are a lot closer to each other in size now! There are now 650 people who got 2 correct predictions, of whom 200 got them from a perfect Omega; Eve now assigns more than a 30% probability to Omega predicting perfectly.

So with each extra correct prediction, the bottom left square halves in size, whereas the top left square stays the same.

After 4 correct predictions, the top left square is bigger than the bottom left one, and Eve is starting to seriously suspect Omega is perfect (or extremely good) at predicting. And the more correct predictions Eve sees, the more confident she becomes!
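
Here is a minimal Python sketch of Eve’s updating, with the same hypothetical numbers as above: 2,000 imagined players, of whom 200 face a perfect Omega (the 10% prior) and 1,800 face an Omega that guesses randomly.

```python
# Eve's probability that Omega is a perfect predictor, after n correct predictions.
perfect = 200            # players facing a perfect Omega (10% prior)
random_guessers = 1_800  # players facing a randomly guessing Omega

for n in range(1, 11):
    # A perfect Omega is always right; a random one is right with probability 0.5 ** n.
    random_still_right = random_guessers * 0.5 ** n
    p_perfect = perfect / (perfect + random_still_right)
    print(f"{n} correct prediction(s): {p_perfect:.0%}")

# 1 correct prediction(s): 18%
# 2 correct prediction(s): 31%
# 4 correct prediction(s): 64%
# 10 correct prediction(s): 99%
```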

Evidential Decision Theory

After, say, 10 correct predictions, Eve is very confident that Omega predicts perfectly. But note that this also means that if Eve two-boxes, that is evidence that Omega will have predicted she would two-box!

In a similar way, one-boxing is evidence that Omega will have predicted that. By extension, one-boxing is evidence that Box B will have a full envelope and Eve will earn $1,000,000. And that is, of course, why Eve one-boxes.

By doing so, Eve is following Evidential Decision Theory (EDT). EDT is a decision theory — a theory within the broader field of decision theory — that says you should take the action that provides the best evidence for a good outcome.

Two-boxing is good evidence for earning only $1,000; one-boxing is (equally strong) evidence for earning $1,000,000. EDT therefore says you should one-box.

But is that reasoning correct? While I submit that one-boxing is certainly the correct action in Newcomb’s Problem, EDT recommends it for the wrong reason. It makes a classic mistake: confusing correlation with causality.

Smoking Lesion: correlation doesn’t equal causality

Let’s imagine a hypothetical world. In this world, like in our world, smoking is strongly correlated with lung cancer; however, unlike in our world, smoking doesn’t cause lung cancer. Instead, a part of the population has a genetic lesion that causes both a fondness for smoking and lung cancer. If you correct for the presence or absence of the lesion, there is no correlation between smoking and lung cancer.

Suppose your preferences are as follows:

  1. smoking, but not getting lung cancer
  2. not smoking, and not getting lung cancer
  3. smoking, and getting lung cancer
  4. not smoking, and getting lung cancer

Should you smoke?

The above problem is known as Smoking Lesion, and the answer is: yes. You either get lung cancer or you don’t; in both cases, you prefer smoking to not smoking. Furthermore, smoking doesn’t change your odds of getting lung cancer — having the lesion does — so smoking is clearly the better choice.

But what would Eve (who follows EDT) do? Unfortunately, due to the correlation between smoking and lung cancer, smoking is evidence for getting lung cancer. After all, consider the following illustration:

A diagram with two orthogonal axes: “smokes” vs. “doesn’t smoke”, and “doesn’t get lung cancer” vs. “gets lung cancer”.

The more people Eve sees, the more the top left of this diagram — the people who smoke and get lung cancer — will be filled relative to the top right. And the bottom right — the people who don’t smoke and don’t get lung cancer — will be filled more and more relative to the bottom left (where people smoke but don’t get lung cancer).

What Eve doesn’t realize is that while smoking is indeed evidence for getting lung cancer, it doesn’t cause lung cancer. Lung cancer is caused by the lesion, which happens to also make people more likely to smoke.

We can also look at this problem as follows: Eve either has the lesion, or she doesn’t. In both cases, smoking doesn’t even provide evidence for lung cancer! That correlation is only there in the population as a whole, not in the subpopulations with and without a lesion.

An example population of people with/without a lesion and who either smoke or not. Of the 1000 + 200 = 1200 people who smoke, 500 + 20 = 520 have lung cancer. Of the 1200 people who don’t smoke, 200 have lung cancer. That’s a clear correlation between smoking and lung cancer. And yet, within the group of lesion havers, there is no correlation between smoking and lung cancer, just like within the group of lesion free people.
An example population of people with/without a lesion and who either smoke or not
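
To see the correlation appear and disappear in numbers, here is a small Python sketch based on the figure above. The split of smokers and non-smokers over the two lesion groups is my own fill-in, chosen to match the totals in the caption.

```python
# Within each lesion group, smokers and non-smokers get lung cancer at the same
# rate; yet in the population as a whole, smoking and lung cancer are correlated.
# (Subgroup sizes are my own fill-in, consistent with the figure's totals.)

groups = {
    # (has_lesion, smokes): (group size, number with lung cancer)
    (True, True): (1000, 500),    # 50% cancer rate
    (True, False): (200, 100),    # 50% cancer rate
    (False, True): (200, 20),     # 10% cancer rate
    (False, False): (1000, 100),  # 10% cancer rate
}

def cancer_rate(rows):
    total = sum(size for size, _ in rows)
    sick = sum(cancer for _, cancer in rows)
    return sick / total

smokers = [v for (lesion, smokes), v in groups.items() if smokes]
non_smokers = [v for (lesion, smokes), v in groups.items() if not smokes]
print(round(cancer_rate(smokers), 2), round(cancer_rate(non_smokers), 2))  # 0.43 0.17

# Conditioned on the lesion, the correlation disappears:
for lesion in (True, False):
    print(lesion,
          cancer_rate([groups[(lesion, True)]]),    # smokers in this group
          cancer_rate([groups[(lesion, False)]]))   # non-smokers in this group
# True 0.5 0.5
# False 0.1 0.1
```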

Evidence-based decision making thus seems flawed. But we know the flaw: confusing correlation with causality. If we look only at the causal effects of the actions we are considering, we would smoke in Smoking Lesion.

And that’s what our next chapter will be about: Causal Decision Theory.

Chapter IV: Why Carl is wrong

Carl two-boxes in Newcomb’s Problem, because he believes his decision (to one-box or to two-box) has no effect on the prediction made by Omega. After all, this prediction already happened! You can’t change the past.

Carl’s view of Newcomb’s Problem: one-boxing on a one-boxing prediction gives $1,000,000, but two-boxing gives $1,000 more. One-boxing on a two-boxing prediction gives $0, but two-boxing gives $1,000 more.
Carl’s view of Newcomb’s Problem

So Omega’s prediction is fixed, as Carl sees it. Now:

  • if Omega predicts you will one-box, two-boxing earns $1,000 more than one-boxing
  • if Omega predicts you will two-box, two-boxing earns $1,000 more than one-boxing.

So, two-boxing dominates one-boxing: it is better in every possible situation. Carl therefore two-boxes, and with that, follows Causal Decision Theory (CDT).

Where EDT recommends taking the action that is the best evidence for a good outcome, CDT says you should take the action that causes the best outcome.

What is causality, anyway?

A full explanation of causality goes beyond the scope of this series, but a short introduction is still valuable.

Causality is about cause and effect. Importantly, the effect is dependent on the cause: if the cause happens, the effect is more likely to happen than if the cause doesn’t happen.

But that’s not all. Let’s look at an example.

Say we observe thousands of people. We notice that those who sneeze wipe their nose more often than those who don’t sneeze.

But observing those same people, we also notice that those who have hay fever wipe their nose more often than those without hay fever.

However, if we only look at the people who sneeze, those with hay fever don’t wipe their nose more often than those without it. The same is true if we only look at people who don’t sneeze.

So say we learn whether or not a person sneezes. Then also learning whether or not that person has hay fever provides no additional information about whether that person wipes her nose or not.

But if we learn the hay fever status of a person, then learning whether or not that person sneezes is helpful for determining whether that person wipes her nose.

How does causality come into play here? Well, wiping one’s nose is independent of hay fever given whether or not the person sneezes; therefore, it can’t be that hay fever is causing the nose wiping. It’s the sneezing that causes the wiping!

Smoking Lesion

In Smoking Lesion, your action — to smoke or not to smoke — has no causal effect on whether or not you get lung cancer. After all, lung cancer is independent of smoking given the presence or absence of the genetic lesion!

The only thing — in the Smoking Lesion world — that does have a causal effect on getting lung cancer is the lesion itself, on which your action has no influence.

Carl, who follows CDT, correctly smokes in Smoking Lesion. If smoking doesn’t cause lung cancer anyway, he may as well enjoy it.

But he also two-boxes in Newcomb’s Problem, and we have already seen that’s problematic: Omega predicts Carl will two-box, and therefore Carl misses out on $1,000,000.

Newcomb’s Problem vs. Smoking Lesion

At this point, you may wonder about the exact difference between Newcomb’s Problem and Smoking Lesion.

After all, the two problems may seem basically the same: if we say not having the lesion is worth $1,000,000 (because you have a lower risk of lung cancer), and smoking is worth $1,000 to you, then Smoking Lesion looks like this:

Smoking Lesion’s payoff table is exactly like the payoff table for Newcomb’s Problem.

That’s the same table as Newcomb’s Problem (except for the names):

Carl’s view of Newcomb’s Problem: one-boxing on a one-boxing prediction gives $1,000,000, but two-boxing gives $1,000 more. One-boxing on a two-boxing prediction gives $0, but two-boxing gives $1,000 more.

It seems the dominant action in Smoking Lesion is to smoke, as that action always earns $1,000 more than not smoking.

Likewise, it seems the dominant action in Newcomb’s Problem is two-boxing.

So why am I saying the best actions are smoking and one-boxing?

The big difference between the two problems is in the predictive power of Omega. To fully understand this, we’ll get into a fun thought experiment in Chapter V.

Chapter V: Simulating Newcomb’s Problem

Imagine we simulate Newcomb’s Problem on a computer. We write code for every component of the problem:

  • a PLAYER program, which either one-boxes or two-boxes. Note that PLAYER’s output in Newcomb’s Problem is fixed: if it “thinks” two-boxing is best, it always outputs two-boxing, for example.
  • an OMEGA program, that predicts what the player will do.
  • an EVALUATION program, that gives a payoff ($0, $1,000, $1,000,000 or $1,001,000) based on the output of PLAYER and OMEGA.

So schematically, the project looks as follows:

A schematic view of our simulation of Newcomb’s Problem, with the OMEGA program, the PLAYER program and the EVALUATION program.
Simulating Newcomb’s Problem

Our code would run in the following order:

  1. OMEGA runs, and makes its prediction: one-box or two-box.
  2. PLAYER runs, and outputs one-box or two-box.
  3. EVALUATION runs, and outputs a payoff based on the outputs of OMEGA and PLAYER, using the table below.
If PLAYER outputs one-box, EVALUATION outputs $1,000,000 if OMEGA outputs one-box and $0 otherwise. If PLAYER outputs two-box, EVALUATION outputs $1,001,000 if OMEGA outputs one-box and $1,000 otherwise.
Payoff table for EVALUATION

But we have to ask ourselves: how does OMEGA make its prediction?

There’s a simple trick we can use if we want OMEGA to make perfect predictions: we can give it the PLAYER code!

OMEGA can then run PLAYER, see what PLAYER outputs (one-box or two-box) and use that output to make its prediction.

So OMEGA would look as follows:

OMEGA running PLAYER to check what PLAYER outputs.
OMEGA running PLAYER

And our full code looks as follows:

The full Newcomb’s Problem simulation scheme, with PLAYER inside OMEGA (and apart from it as well).
Simulation of Newcomb’s Problem, with PLAYER in 2 places

Note that PLAYER is now run in two places.

And that’s a crucial point if we want to make PLAYER get the highest payoff possible from EVALUATION (and we do, of course). Think about it: what would you like PLAYER to output — one-box or two-box — given that the exact code is used by OMEGA to make the prediction?

If PLAYER outputs two-box, it does so inside the OMEGA program as well. OMEGA then predicts PLAYER will two-box. PLAYER then actually two-boxes, and EVALUATION gives a payoff of $1,000.

On the other hand, if PLAYER one-boxes, then it does that inside OMEGA as well. OMEGA then predicts PLAYER will one-box, and PLAYER goes on to receive $1,000,000 from EVALUATION!
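
Here is a minimal Python sketch of this simulation. The three functions mirror the PLAYER, OMEGA and EVALUATION programs described above; the concrete implementation details are my own.

```python
# A minimal simulation of Newcomb's Problem, with PLAYER run in two places.

def PLAYER():
    """The player's decision procedure. Try changing this to return 'two-box'."""
    return "one-box"

def OMEGA(player):
    """Omega predicts by simply running the player's code."""
    return player()

def EVALUATION(prediction, action):
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b if action == "one-box" else box_b + 1_000

prediction = OMEGA(PLAYER)  # PLAYER is run here, inside OMEGA...
action = PLAYER()           # ...and here, as the actual decision.
print(EVALUATION(prediction, action))
# 1000000 if PLAYER one-boxes; 1000 if you make PLAYER two-box instead
```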

So it seems clear PLAYER should output one-box. What remains to be seen is how PLAYER should decide this: it can’t follow EDT (even though EDT one-boxes), because that will lead to the wrong decision on Smoking Lesion. Speaking of which…

Simulating Smoking Lesion, with a LESION program, a PLAYER program and an EVALUATION program.
Simulating Smoking Lesion

Here’s the schematic of a Smoking Lesion program. The key difference from Newcomb’s Problem is that here, PLAYER is only run once!

That’s because LESION isn’t predicting what the player does. In Smoking Lesion, the genetic lesion does tend to cause smoking — and therefore has some effect on the smoking behavior of the citizens of the Smoking Lesion world — but this is irrelevant to what the correct decision is.

You may be able to predict, to some accuracy, whether a random citizen of Smoking Lesion smokes or not using whether or not they have the lesion. But it’s not the lesion itself that does the predicting: the presence or absence of the lesion is irrelevant to what PLAYER should do.

So in Smoking Lesion, PLAYER can follow the recommendation of CDT and smoke. But in Newcomb’s Problem, this doesn’t work, since PLAYER is being run twice.

Let’s take that fact into account in PLAYER’s decision making, then! Instead of asking: “What action causes the best outcome?”, like CDT recommends, PLAYER should ask: “Which output of PLAYER causes the best outcome?” (Where, on Newcomb’s Problem, the output is either two-box or one-box.)

That may be a bit of a confusing question, but hear me out. If PLAYER knows the schematics of Newcomb’s Problem, it knows PLAYER is being run twice. So it knows that, if PLAYER’s output is two-box, two-box will be output in both places; one of those is inside OMEGA, causing OMEGA to not put $1,000,000 in Box B.

So the answer to “Which output of PLAYER causes the best outcome?” is: one-box. In Smoking Lesion, it’s smoke; the same as CDT recommends, since in this problem, PLAYER is run only once.
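
As a sketch, this question can be answered by brute force: try each possible output, substitute it everywhere PLAYER is run, and keep the output with the highest payoff. The payoff numbers follow the tables above ($1,000,000 for a lesion-free life, $1,000 for smoking); the framing in code is my own.

```python
# "Which output of PLAYER causes the best outcome?", answered by brute force.

def newcomb_payoff(output):
    # PLAYER is run twice: inside OMEGA (the prediction) and as the action.
    prediction, action = output, output
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b if action == "one-box" else box_b + 1_000

def smoking_lesion_payoff(output, has_lesion):
    # PLAYER is run only once: the lesion doesn't depend on the output at all.
    no_cancer_value = 0 if has_lesion else 1_000_000
    return no_cancer_value + (1_000 if output == "smoke" else 0)

print(max(["one-box", "two-box"], key=newcomb_payoff))  # one-box
print(max(["smoke", "don't smoke"],
          key=lambda o: smoking_lesion_payoff(o, has_lesion=False)))  # smoke
```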

The decision procedure that uses this line of reasoning is called Functional Decision Theory, which will be the subject of Chapter VI.

Chapter VI: Functional Decision Theory

In the previous chapter, we saw how PLAYER — a computer program — managed to succeed at both Newcomb’s Problem and Smoking Lesion by considering the question: “Which output of PLAYER causes the best outcome?”

In doing so, it essentially follows Functional Decision Theory (FDT). Where CDT asks which action causes the best outcome and EDT asks which action is the best evidence of a good outcome, FDT asks:

Which output of this decision procedure causes the best outcome?

In the computer world of Chapter V, it’s obvious PLAYER should one-box. But if we follow FDT in real life, what does it recommend?

When Carl, Eve and their friends were participating in Newcomb’s Problem in Chapter I, it appears Omega was modelling their decision procedures in order to make her predictions.

What do I mean by modelling a decision procedure?

Let’s back up a bit. PLAYER was implementing a function: it had an input (e.g. Newcomb’s Problem) and a fixed output for that input (one-box or two-box). OMEGA could easily implement that same function (by running PLAYER), in order to make her prediction.

Carl, Eve and their friends aren’t computer programs like PLAYER. But they have some way of making a decision in Newcomb’s Problem. So they, too, are implementing a function, and we call this implementation a decision procedure.

When Omega is predicting what Carl will do, she models Carl’s decision procedure: meaning, she calculates the function Carl’s decision procedure implements. This function’s output is two-box — because Carl is a two-boxer — so Omega correctly predicts Carl will two-box. As a result, she puts an empty envelope in Box B.

Okay, but what does FDT actually recommend doing?

One-boxing!

But why?! In the computer world, we were asking ourselves how to program PLAYER in advance, so that it achieves the best outcome. But in real life, you make the decision to one-box or two-box after Omega made her prediction!

Well, yes. But also in Omega’s head before she made the prediction, since Omega was implementing the same function as you do!

But you can’t retroactively change the past!

True! It may seem like I’m saying we can, but hang on.

Two calculators

At this point, it may be helpful to consider an analogy. Imagine John and Mike each have a calculator. For simplicity’s sake, let’s say these calculators are identical.

On Tuesday, John enters 65 x 76 into his calculator. He writes the answer down, but doesn’t communicate it to Mike.

On Wednesday, Mike enters 65 x 76 into his calculator, and sees the answer on the display: 4940. He then goes to John, who tells him he asked his calculator to calculate 65 x 76 a day earlier.

Does Mike need to ask what John’s calculator answered?

Of course, the answer is no. Mike knows John’s calculator gave the same answer his did: 4940.

Mike knows this because John’s calculator was implementing the exact same function (on the same inputs) as his own calculator: the multiplication function, on the inputs 65 and 76.

Since the multiplication function (like all functions) has only one output for any pair of inputs, Mike knows John’s calculator must have given the same answer his did.

65 x 76 = 4940: a fact which John’s calculator displayed on Tuesday, and Mike’s calculator on Wednesday.
John’s calculator gave the same answer on Tuesday as Mike’s did on Wednesday to the input 65 x 76

And nobody is surprised by this (rather silly) thought experiment. Nobody believes Mike’s calculator “caused” John’s calculator to output “4940”.

So why be surprised when one-boxing means Omega predicted you would one-box? One-boxing doesn’t cause Omega to have made a one-boxing prediction any more than Mike’s calculator caused John’s calculator to output “4940”. It’s just that, if you one-box, you know Omega predicted you would.

Subjunctive dependence

John and Mike’s calculators are both implementing the multiplication function. In general, when two physical systems (like calculators, but also computers, humans, Omega, etc.) are implementing the same function, they are subjunctively dependent upon that function.

Subjunctive dependence is what causes you to not be surprised when both calculators answered 4940. It’s also what makes Omega predict you will one-box when you, well, one-box.

FDT tells us there are only two possible outcomes in Newcomb’s Problem: one-boxing while Omega predicted you would one-box, or two-boxing while Omega predicted that.
According to FDT, there are only two possible outcomes in Newcomb’s Problem

And yet… That second thing feels like changing the past in a way the first one doesn’t (at least to me, at times). It feels like, when Omega has predicted you will one-box, you’re still free to two-box (and get $1,001,000!).

And, well, you can still make any decision you like. It’s just that that decision has been predicted!

Alright, but human behavior isn’t predictable like this. Omega simply can’t model your decision procedure!

A fair point. I don’t agree with it, but I can see readers making this objection.

So let’s say Omega isn’t modelling your decision procedure at all. As it turns out, everybody wearing a hat one-boxes, everybody without a hat two-boxes, and Omega makes her prediction by checking whether you wear a hat.

Note that now, there is no subjunctive dependence between you and Omega! Your decision is made only in your head, not in Omega’s, and it doesn’t influence the content of Box B.

This means things are as they are in Smoking Lesion!

If we remove the subjunctive dependence, Newcomb’s Problem is equivalent to Smoking Lesion.
Newcomb’s Problem and Smoking Lesion

The difference between the two problems was purely in the absence or presence of subjunctive dependence, which is now absent in both problems. FDT therefore recommends two-boxing now!

Note that in the problems to come, I (explicitly) do assume subjunctive dependence. I just wanted to note that FDT doesn’t automatically assume it’s there; ideally, someone (or something) following FDT notices subjunctive dependence if (and only if) it’s there and acts accordingly.

Psychological Twin Prisoner’s Dilemma

You and your partner in crime have been arrested for two crimes: a minor burglary and an armed robbery. The police have enough evidence to convict you two for the burglary, but not for the robbery.

So, the police come up with a plan. They put both of you in separate rooms — so you can’t communicate with each other — and offer both of you the same deal:

if you testify against (betray) your partner, you will get no punishment for the burglary.

You and your partner in separate rooms, being interrogated by police officers.
Getting caught sucks, huh?

So if only one of you betrays the other, the betrayer gets to go home free while the betrayed one gets 5 years (4 for the robbery and 1 for the burglary).

If you both betray each other, you each get 4 years for the robbery.

Finally, if you both stay silent, you each get 1 year for the burglary.

Interestingly, you and your partner both know that you have the exact same decision procedure for deciding this dilemma. In order to go home free as soon as possible, should you betray your partner, or stay silent?

If only one of you betrays the other, the betrayer gets to go home free while the betrayed one gets 5 years. If you both betray each other, you each get 4 years for the robbery. If you both stay silent, you each get 1 year.
If your partner betrays you, betraying her gives you 1 year less in prison than staying silent. If your partner stays silent, the same is true!

Eve would say that betraying your partner is evidence your partner betrays you as well. Furthermore, staying silent is evidence your partner does that. She therefore stays silent.

But Carl disagrees: since you two can’t communicate, you have no causal influence on your partner’s action in this dilemma. Since betraying his partner gives him 1 year less prison time than staying silent — no matter what his partner does — he betrays his partner.

What about Fiona, a follower of FDT?

Well, whatever you do, your partner does too. You two are subjunctively dependent upon your decision procedure! Going home free immediately isn’t possible — it requires that you two make different decisions!

So either you both betray each other, or you both stay silent. The latter option gets you free 3 years earlier. Fiona therefore stays silent.
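
Here is a small Python sketch of Fiona’s reasoning: under subjunctive dependence you and your partner necessarily make the same choice, so only the two "diagonal" outcomes are reachable. The payoff table follows the prison terms given above.

```python
# Years in prison for (your choice, partner's choice), from the story above.
YEARS = {
    ("silent", "silent"): (1, 1),
    ("silent", "betray"): (5, 0),
    ("betray", "silent"): (0, 5),
    ("betray", "betray"): (4, 4),
}

# Under subjunctive dependence, your partner's choice equals yours,
# so only the diagonal outcomes are possible:
for choice in ("silent", "betray"):
    your_years, _ = YEARS[(choice, choice)]
    print(choice, "->", your_years, "year(s)")
# silent -> 1 year(s); betray -> 4 year(s): Fiona stays silent.
```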

Note that, in this dilemma, you and your partner make decisions at the same time. But that’s not a crucial part of the problem! Things would have been the same if your partner made the decision a day earlier. Subjunctive dependence is timeless.

Moreover, the Psychological Twin Prisoner’s Dilemma isn’t even different from Newcomb’s Problem when you think about it:

The Psychological Twin Prisoner’s Dilemma and Newcomb’s Problem are equivalent: they have the exact same outcomes.
Comparing the Psychological Twin Prisoner’s Dilemma to Newcomb’s Problem, giving the order of preference of the outcomes in blue

Notice the order in which you prefer decision combinations (like two-boxing and a two-boxing prediction, or staying silent while your partner betrays you): they are the same for both problems. Furthermore, for both problems, (1) and (4) are impossible because of subjunctive dependence.

Subjunctive dependence might still sound mystical, but (hopefully) things will become clearer in Chapter VII.

Chapter VII: Transparent Newcomb Problem

The situation is like it is in Newcomb’s Problem, but now, Box B is also open. Furthermore, Omega says:

“I have predicted whether you will one-box or two-box upon seeing that Box B contains $1,000,000. If I predicted you would one-box upon seeing $1,000,000 — leaving the $1,000 of Box A behind — then I put $1,000,000 in Box B. But if I predicted that you would two-box upon seeing $1,000,000, then I left Box B empty.”

You look in Box B — it is open, after all — and see that it contains $1,000,000. Of course, Omega is still an extremely good predictor. Should you one-box, or two-box?

Box A visibly contains $1,000; Box B visibly contains $1,000,000. Do you pick only Box B, or both boxes?
An easy choice?

In this problem, Carl and Eve agree: you should two-box. Carl argues that one-boxing or two-boxing can’t have a causal effect on the content of Box B (as before), and Eve notes that seeing $1,000,000 in Box B is pretty strong evidence for $1,000,000 being in Box B.

What about Fiona? From a perspective of subjunctive dependence, not much has changed since the original Newcomb’s Problem. Your decision is still made at two points in time: once in Omega’s head, and once in your own.

If you one-box, you one-box in Omega’s head, and she predicts you’ll one-box; then, there is $1,000,000 in Box B.

If you two-box, Omega predicts that, and then she doesn’t put $1,000,000 in Box B.

Wait… I already see $1,000,000 in Box B. Two-boxing now can’t change that! Yes, two calculators, subjunctive dependence, very interesting. But I see $1,000,000.

A fair point. However, the problem statement contradicts itself:

  1. It says you see $1,000,000 in Box B.
  2. It also says that Omega only put $1,000,000 in Box B if she predicted you would one-box, and that Omega predicts you (almost) perfectly.

Point 2 means that Omega didn’t put $1,000,000 in Box B if you two-box, thereby contradicting point 1.

Two ways of resolving the contradiction

The above contradiction can be resolved in two ways.

First, we can choose to focus on point 1: we see $1,000,000 in Box B, and that’s that. But that does have implications: seeing this $1,000,000 means Omega put it there, and that means Omega predicted you will one-box.

So?

Subjunctive dependence, remember? If Omega predicted you will one-box, and she is so good at predicting, then that already means you will one-box. The “problem” statement might as well have been:

Omega presents you with two transparent boxes, A and B. She put $1,000 in Box A. She put $1,000,000 in Box B. You see, she predicted you will one-box. You then one-box, and get $1,000,000.

This resolution is fine, but it makes the problem statement quite boring. There is no decision to make anymore! The problem simply states what you decide.

The second resolution has the focus on point 2: Omega only put $1,000,000 in Box B if she predicted you would one-box. That means that if you one-box, you do see $1,000,000 in Box B, but if you two-box, you don’t.

This resolution may seem a bit weirder, but I’d say it’s just as valid as the first one. Furthermore, it keeps the problem interesting: now there is actually a decision to be made. And hopefully, it’s clear the right decision is the same one as in Newcomb’s Problem: one-boxing.

Note that in both resolutions, the answer is one-boxing. So in that sense, there’s no contradiction.

Another way to understand the second resolution is to ask the question:

Would you rather be a one-boxer or a two-boxer, in case you ever run into the Transparent Newcomb Problem?

If you are a one-boxer, and do run into this problem, Omega predicts you will one-box and puts $1,000,000 in Box B. If you are a two-boxer, Omega leaves Box B empty.

So looking at it this way, two-boxers never even see $1,000,000 in Box B! The situation described in the problem statement — where you see $1,000,000 in Box B — is simply irrelevant for two-boxers. It’s an impossible situation for them to be in! Carl and Eve would two-box if they are ever in the described situation — but they never will be.
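
A tiny Python sketch can make this "which kind of agent do I want to be?" framing concrete. Given your disposition, Omega fills Box B accordingly; the function below is my own illustration, not part of the original problem statement.

```python
# Transparent Newcomb: what you see and what you get, given your disposition.

def transparent_newcomb(disposition):
    box_b = 1_000_000 if disposition == "one-box" else 0  # Omega's prediction fills Box B
    payoff = box_b if disposition == "one-box" else box_b + 1_000
    return box_b, payoff

print(transparent_newcomb("one-box"))  # (1000000, 1000000): you see the million and take it
print(transparent_newcomb("two-box"))  # (0, 1000): you never see a full Box B at all
```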

Chapter VIII: Bomb

Consider the following interesting problem, appropriately called Bomb.

Omega presents you with two boxes: the Left box and the Right box. You can choose either one. Right-boxing costs $1,000, Left-boxing is free.

Of course, Omega predicted what you will do.

  • If Omega predicted you will Right-box, she put a bomb in Left.
  • If Omega predicted you will Left-box, there is no bomb in Left.

So if you open Left and there is a bomb in it, the bomb will explode and you will die. Omega is, as usual, (near) perfect at making predictions.

Interestingly, Omega tells you she predicted you will Right-box. She therefore put a bomb in Left. Moreover, the boxes are transparent, and you can actually see the bomb.

Assume you value your life at $1,000,000. Should you Left-box or Right-box?

The Left box opens, causing an explosion.
An easy decision?

This problem was originally meant as a critique of Functional Decision Theory (FDT): FDT Left-boxes, which is supposedly the wrong decision (as it kills you). It’s not (and it doesn’t), but more on that later.

First, why does FDT Left-box?

Because of subjunctive dependence, of course! It seems that, once again, Omega is modelling your decision procedure. This means that…

  • if you Left-box, Omega predicted that, leaving Left empty — giving you the opportunity for a free pass
  • if you Right-box, Omega predicted that, and put a bomb in Left — letting you live, but for $1,000.

Left-boxing is then $1,000 better than Right-boxing.

But wait… There’s a bomb in Left. We see it. Left-boxing kills us!

Well, sure, that’s what the problem statement says. But it also says that if we Left-box, Omega predicted that, and that would mean there isn’t a bomb in Left.

Like with the Transparent Newcomb Problem, there are two ways to resolve this contradiction.

  1. There is a bomb in Left. That means Omega predicted you would Right-box. Because of the subjunctive dependence between Omega and you, that means you actually Right-box. There’s no decision to make anymore, and you lose $1,000.
  2. You still have a decision to make. Because of the subjunctive dependence, Left-boxing leads to a free pass, and is clearly the better decision.

Since these problems are for testing decision theories (like FDT), I prefer the second resolution. (Though note that it’s interesting that the two resolutions lead to different results, unlike in the Transparent Newcomb Problem.) But either way, you don’t die in an explosion.

So why was this problem used as a critique of FDT?

Well, because at first sight, it seems the decision to Left-box leads to a sure death, as it’s pointed out that there is a bomb in Left. However, this mixes up the two resolutions above: you can’t have it both ways. The subjunctive dependence in the problem assures us there’s no way to die: either you Right-box and there’s a bomb in Left (which you miss), or you Left-box and there’s no bomb in Left.

Again, to get the right intuition here, ask yourself: if I ever run into this problem, do I want to be a Left-boxer or a Right-boxer? Omega will know what you are! Left-boxing simply leads to $1,000 less spent.
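
The same kind of sketch works for Bomb, under the problem’s stated subjunctive dependence (Omega models your decision procedure). The dollar value for staying alive is the $1,000,000 assumed in the problem statement.

```python
# Bomb: the outcome for each disposition, assuming Omega predicts it correctly.
LIFE = 1_000_000  # you value your life at $1,000,000

def bomb_outcome(disposition):
    bomb_in_left = disposition == "right-box"   # Omega only places the bomb for Right-boxers
    if disposition == "left-box":
        return LIFE if not bomb_in_left else 0  # no bomb was placed: a free pass
    return LIFE - 1_000                         # alive, but $1,000 poorer

print(bomb_outcome("left-box"))   # 1000000
print(bomb_outcome("right-box"))  # 999000 -> Left-boxing is $1,000 better
```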

Chapter IX: Procreation

In this chapter, I want to discuss another critique of FDT:

I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and that he followed FDT. If FDT were to recommend not procreating, there’s a significant probability that I wouldn’t exist. I highly value existing (even miserably existing). So it would be better if FDT were to recommend procreating. So FDT says I should procreate. (Note that this (incrementally) confirms the hypothesis that my father used FDT in the same choice situation, for I know that he reached the decision to procreate.)

(Procreation, by Wolfgang Schwarz)

Schwarz goes on to note:

In Procreation, FDT agents have a much worse life than CDT agents.

(Note that FDT/CDT agents are simply “followers of FDT/CDT”.)

FDT indeed procreates, which is the correct decision: not procreating means your father (who also followed FDT) didn’t procreate either, which means you never existed at all. (Which also means you didn’t decide this, but, well, your father did, which means FDT still has an opinion on this.)

CDT doesn’t have this problem: there is no subjunctive dependence if you’re a CDT’er (not that CDT would use it if it was there, but whatever)! After all, your father followed FDT, and thus had a different decision procedure than you do. CDT can therefore ignore the father in this problem, and simply recommend not having children.

The flaw in this reasoning — which for some reason Schwarz doesn’t seem to recognize — is that this problem is unfair: it punishes FDT by having a father who specifically follows FDT.

We could easily change the problem to be unfair to CDT instead: by having the father follow CDT. Then FDT would recommend not having children (since in this variant, there is no subjunctive dependence). CDT — only considering the causal effects of the available decisions — would recommend the same thing, and already did so when the father was deciding whether or not to procreate — which means the “I” in the problem was never born.

A more fair version of Procreation would be the following:

I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and that he followed my exact decision procedure. I prefer existing without kids to existing with kids, and existing with kids to not existing at all.

Now CDT and FDT face the same problem! As discussed, CDT doesn’t procreate, and FDT does. CDT’ers therefore never existed in this problem, and FDT’ers do. FDT seems to win after all.
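
To wrap up, here is a minimal Python sketch of this fair variant. The util values are illustrative (existing without kids > existing with kids > never existing); the point is only that, because your father runs the same procedure, the output "don't procreate" erases your existence.

```python
# The fair Procreation variant: your father ran the same decision procedure,
# so his choice equals whatever your procedure outputs.
UTILS = {"procreate": 2, "don't procreate": 3}  # illustrative: no kids > kids
NEVER_EXISTED = 0                               # worse than either, per the preferences

def outcome(output):
    father_choice = output            # same procedure, same output
    if father_choice == "don't procreate":
        return NEVER_EXISTED          # then you were never born
    return UTILS[output]

print(max(UTILS, key=outcome))  # procreate: existing miserably beats not existing
# CDT, ignoring the father, would just compare 2 vs 3 and not procreate.
```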

Conclusion

In a way, Functional Decision Theory gets the best of both worlds: where Causal Decision Theory loses out on $1,000,000 (minus $1,000) in Newcomb’s Problem, Evidential Decision Theory fails to smoke in Smoking Lesion. Functional Decision Theory wins the $1,000,000 and smokes; it also correctly solves the Transparent Newcomb Problem, where both CDT and EDT go wrong. As discussed, the Bomb problem isn’t a problem for FDT, and actually shows its strength; and Procreation simply isn’t a fair problem at all.

References

Yudkowsky, E., & Soares, N. (2017). Functional decision theory: A new theory of instrumental rationality. arXiv preprint arXiv:1710.05060.

Hofstadter, D. R. (2008). Metamagical themas: Questing for the essence of mind and pattern. Hachette UK.

MacAskill, W.D. (2019). A Critique of Functional Decision Theory.

Schwarz, W. (2018). On Functional Decision Theory.


Hein de Haan
Street Science

As a science communicator, I approach scientific topics using paradoxes. My journey was made possible by a generous grant from MIRI (intelligence.org).