Military Officers Need a New Decision Theory

Grant Demaree · Onebrief · Feb 12, 2021

When I was younger, I really enjoyed the story of Regulus. The original Regulus, though I also like the Harry Potter character of the same name.

Regulus was a Roman general captured by Carthage. While he was imprisoned awaiting execution, his captors let him go home to negotiate peace. He made a promise: if the Senate rejected the peace deal, he would return to prison. The Senate rejected the deal, but Regulus kept his promise. He came back to prison, where he was tortured to death.

Regulus looking out the bars of his prison cell
Image © Look and Learn, used with permission

Later, I learned that the Senate may have exaggerated the story of his death to win public support for the war. But true or not, the story is the perfect embodiment of a new normative decision theory. If this essay succeeds, you’ll want to be more like the Regulus of the story.

Regulus’ situation is a Newcomblike problem, an instance of a hotly debated class of decision problems. From nuclear deterrence to day-to-day leadership, Newcomblike problems are everywhere in the military. They arise when another actor’s decisions depend on their predictions of your own. Here’s the original Newcomb’s problem:

You’re standing in front of two boxes. The first box is transparent, and you can see that it contains $1,000. The second box is opaque. It has either $1M or $0.

You’re given the option to take one box or both boxes. A Predictor, which is 99% accurate, filled the boxes. If the Predictor believes you will take only the opaque box, then it was filled with $1M. But if the Predictor believes you will take both boxes, then it was left empty.

Before you read on, take a moment to decide what you would do.

The two broadly supported decision theories come to different conclusions on Newcomblike problems.

Evidential Decision Theory claims you should make the decision with the highest expected value, treating your own choice as evidence about what the Predictor did:

If you take one box, the expected value is $990,000.

If you take both boxes, the expected value is $11,000.

It’s not even close: under Evidential Decision Theory, one-boxing is worth 90 times as much as two-boxing.
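Here’s the arithmetic behind those numbers, as a quick Python sketch (the 0.99 is the Predictor’s accuracy from the problem statement):

```python
# Expected value under Evidential Decision Theory:
# condition on your choice, since a 99%-accurate Predictor most likely foresaw it.
P_CORRECT = 0.99

# One-box: the Predictor probably filled the opaque box.
ev_one_box = P_CORRECT * 1_000_000 + (1 - P_CORRECT) * 0

# Two-box: the Predictor probably left the opaque box empty,
# so you usually walk away with only the transparent $1,000.
ev_two_box = P_CORRECT * 1_000 + (1 - P_CORRECT) * (1_000_000 + 1_000)

print(ev_one_box)  # 990000.0
print(ev_two_box)  # 11000.0
```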

Causal Decision Theory claims you should make the decision that causes the best outcomes:

It’s too late to affect the Predictor’s decision. The boxes have already been filled. No matter what the Predictor did, you can make an extra $1,000 by taking both boxes.
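The same payoffs, seen the way Causal Decision Theory sees them: hold the already-filled opaque box fixed and compare the two actions. A minimal sketch:

```python
# Causal Decision Theory's dominance argument: the opaque box was filled
# before you chose, so compare both actions against each possible filling.
for opaque_contents in (1_000_000, 0):
    one_box = opaque_contents
    two_box = opaque_contents + 1_000
    print(f"opaque box = ${opaque_contents:,}: two-boxing gains ${two_box - one_box:,}")
# Either way, two-boxing comes out exactly $1,000 ahead.
```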

This debate has been around since the 1970s, and it’s hard to resolve. One-boxers seem to get better outcomes in a world filled with Newcomblike problems, but two-boxers seem more rational.

To make matters worse, Evidential Decision Theory might have the advantage in Newcomblike problems, but it’s a disaster in other situations. It can’t tell the difference between correlations and causal relationships, and that leads to insane decisions. In the classic smoking-lesion thought experiment, for example, a gene causes both a taste for cigarettes and lung cancer, and smoking itself is harmless; Evidential Decision Theory still tells you not to smoke, because smoking is evidence that you carry the gene.

Neither decision theory seems quite right. Evidential Decision Theory breaks down in the face of confounding variables, while Causal Decision Theory breaks down when other actors are trying to predict you, which makes it particularly bad in blackmail and deterrence situations.

A few years ago, Nate Soares and Eliezer Yudkowsky of the Machine Intelligence Research Institute in Berkeley proposed a new decision theory, Functional Decision Theory. Their original aim was AI safety, but the results have practical implications for every planner.

Instead of trying to optimize any individual decision, they imagine you have a decision function. Like all functions, your decision function takes an input and gives an output:

Input: you see light infantry contact in the woods

Output: you execute Battle Drill 1A

Many of your decisions are more complicated than this one, but the principle is the same.
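In code, a decision function could be as simple as this sketch (the fallback action below is my own illustrative placeholder, not doctrine):

```python
# A decision function maps an observation (input) to an action (output).
def decision_function(observation: str) -> str:
    if observation == "light infantry contact in the woods":
        return "execute Battle Drill 1A"
    # Illustrative default only.
    return "develop the situation and report"

print(decision_function("light infantry contact in the woods"))
# -> execute Battle Drill 1A
```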

Their crucial point is that you’re not the only actor running your decision function. Your adversaries, allies, partners, and all sorts of other actors are simulating a copy of it. Of course, they’re running an imperfect copy.

This means you should optimize for the total effect of all copies of your decision function. For example, in the original Newcomb’s problem, it’s better to have a one-boxing decision function, because both you and the Predictor will run it.
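Here’s a toy Monte Carlo sketch of that idea for Newcomb’s problem. Instead of scoring a single act, it scores whole policies, with the Predictor running a 99%-accurate copy of whichever policy you adopt. This is my illustration of the point, not Soares and Yudkowsky’s formalism.

```python
import random

random.seed(0)
PREDICTOR_ACCURACY = 0.99  # how faithfully the Predictor's copy matches your policy

def average_payoff(policy: str, trials: int = 100_000) -> float:
    """Average winnings from committing to 'one-box' or 'two-box',
    given that the Predictor simulates an imperfect copy of that policy."""
    total = 0
    for _ in range(trials):
        # The Predictor runs its (imperfect) copy of your decision function.
        if random.random() < PREDICTOR_ACCURACY:
            predicted = policy
        else:
            predicted = "two-box" if policy == "one-box" else "one-box"
        opaque = 1_000_000 if predicted == "one-box" else 0
        total += opaque if policy == "one-box" else opaque + 1_000
    return total / trials

print(average_payoff("one-box"))   # roughly 990,000
print(average_payoff("two-box"))   # roughly 11,000
```

The winning policy is the one you’d want every copy of your decision function to run, yours and the Predictor’s alike.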

Let’s make this theory practical. Simple rules won’t capture the entire theory, but they’re a lot more useful in a crisis than “optimize for the total effect of all copies of your decision function.”

To make this theory practical, act as you would have ideally precommitted. Narrow precommitments are already a common strategy: imagine you’re a potential kidnapping target. If you can credibly commit never to pay ransom, you’re less likely to be kidnapped in the first place. Soares and Yudkowsky take it a step further. What if everyone expected you to act as you would have ideally precommitted, even in situations you never could have foreseen?

Following this blanket policy, Newcomb’s Predictor fills the box with $1M, Regulus gets to visit home, and you don’t get kidnapped for ransom. This is a practical manifestation of “optimizing for the total effect of all copies of your decision function.”

As for Regulus, it looks like he made a terrible mistake. If he’d been the kind of person who would break his promise, couldn’t he have avoided a terrible death? But if he’d been the kind of person who would break his promise, the Carthaginians would never have let him go in the first place.

Why do I care so much about decision theory? My team at Onebrief is building the future of military planning. If this is a topic you’re interested in, you can help. We run exercises every month to test Onebrief against a red team. If you’re a military officer serious about planning, you’ll make $500 for participating in a remote planning exercise as a tester. Details here.

Grant Demaree is CEO and co-founder at Onebrief, the software platform for agile military planning.